SNP Data Analysis


New sequencing and marker genotyping technologies promise to accelerate the pace of molecular genetics and diversity research and to increase gains from selection through molecular breeding. Single nucleotide polymorphism (SNP) markers have become popular owing to several key advantages: (1) greater abundance in the genome, (2) increased efficiency through sample multiplexing, and (3) cost-effectiveness compared with simple sequence repeat (SSR) markers and other gel-based genotyping methods. Because they are sequence-based, SNPs can directly reveal associations with genes and regulatory elements. SNP chips enable rapid genome-wide scans at the level of resolution required for diversity analysis, fingerprinting, QTL and association mapping, marker-assisted selection, and genomic prediction. However, the size and complexity of SNP genotyping data often overwhelm researchers unfamiliar with such datasets. To extract useful information from SNP data for genetics, diversity, and breeding applications, researchers must acquire the computer skills to handle and manage SNP data files and to use advanced software that implements the appropriate analyses. Through a combination of classroom lectures and hands-on computer sessions, the course will provide the skills and knowledge required to manage, explore, manipulate, and analyze SNP datasets.


The goal of the course is to introduce participants to, and give them a foundation in, analysis methods for molecular genetics, diversity and population studies, and molecular breeding using SNP datasets. Participants will learn to use stand-alone and web-based software designed to handle medium- to high-density SNP genotyping data and to implement analysis methods relevant to their own research interests.

Specifically, the course aims to:
  • Present an overview of methods for generating SNP genotyping data, from the laboratory to variant discovery and calling
  • Develop skills in SNP dataset processing: handling, graphical genotype visualization, and data manipulation
  • Conduct genetic diversity and population analyses
  • Perform genome-wide association analysis and elementary genomic prediction
  • Perform post-GWAS/QTL bioinformatics analyses: in silico candidate gene identification and QTL lift-over across different rice reference genomes

Course topics:
  • SNP discovery: from laboratory methods to variant discovery by software
  • Searching and retrieving SNP data from SNP-Seek and other genotyping data resources
  • Manipulation and graphical visualization of SNP datasets (using Flapjack and IRRI Galaxy)
  • Genetic diversity and population analysis, including population structure and linkage disequilibrium (LD) (using R)
  • Association analysis and genomic prediction (using R, TASSEL, and IRRI Galaxy)
  • Post-GWAS/QTL bioinformatics: in silico candidate gene discovery and QTL lift-over across rice genomes (using IRRI Galaxy)

Target participants:
  • Researchers and scholars from IRRI and NARES partners
  • Private companies that use SNP datasets in their research projects

Duration:
  • 4.5 days

Prerequisites:
  • Proficiency in using computers for data analysis and data manipulation (familiarity with spreadsheet and text-editor software)
  • Knowledge of intermediate genetics

Contact:
Ms. Achu Arboleda
IRRI Education

International Rice Research Institute
DAPO Box 7777, Metro Manila, Philippines
Fax: (63-2) 891-1292; 580-5699
Phone: (63-2) 845-0563; 580-5600 ext. 2538