Many economically important traits in livestock and plant species are genetically complex, i.e. they are affected by many genes as well as by the environment. Whole-genome sequence data can improve our understanding of the genetic background of these traits and lead to better models and predictions. Ultimately, this should lead to more efficient livestock and crop production and, for example, to better health and welfare of animals. Therefore, the use of sequence data for genomic prediction in animals and plants is being investigated.
In the coming years, whole-genome sequence data are expected to become widely available at reasonable cost. Although sequencing costs are falling, it is still expensive to sequence a large group of individuals. A promising approach is to sequence a core set of individuals and then use this information to ‘estimate’ the whole-genome sequence genotypes of other individuals that have only been genotyped with a panel of DNA markers. This approach, called imputation, yields a large dataset for genomic selection, consisting of individuals with imputed sequence genotypes and phenotypic records.
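To make the idea of imputation concrete, the following is a deliberately simplified sketch, not the method used by any particular study or tool (many dedicated imputation programs instead model haplotypes with hidden Markov models). It assumes phased haplotypes coded 0/1 per variant, a hypothetical sequenced core set stored as `reference`, and a target haplotype in which `None` marks variants not on the target's marker panel. Each missing allele is copied from the reference haplotype that agrees best with the target at the genotyped positions.

```python
from typing import List, Optional

def impute_haplotype(target: List[Optional[int]], reference: List[List[int]]) -> List[int]:
    """Toy imputation: copy missing alleles from the best-matching reference haplotype.

    `target` holds 0/1 alleles at genotyped variants and None elsewhere;
    every haplotype in `reference` is fully observed (the sequenced core set).
    """
    observed = [i for i, allele in enumerate(target) if allele is not None]

    def agreement(ref: List[int]) -> int:
        # Number of genotyped positions where this reference haplotype matches the target.
        return sum(ref[i] == target[i] for i in observed)

    best = max(reference, key=agreement)  # on ties, max() keeps the first best haplotype
    # Keep observed alleles; fill the gaps from the best-matching haplotype.
    return [allele if allele is not None else best[i] for i, allele in enumerate(target)]


# Hypothetical sequenced core set: three haplotypes at 8 variants.
reference = [
    [0, 1, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 1, 1, 1],
]
# Hypothetical target genotyped only at variants 0, 3 and 6 (a low-density panel).
target = [1, None, None, 1, None, None, 1, None]
print(impute_haplotype(target, reference))  # [1, 0, 0, 1, 1, 0, 1, 0]
```

The sketch also hints at why a larger and more diverse core set helps: the more haplotypes in `reference`, the more likely one of them closely matches the target, including at rare variants.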
Important questions are how many core animals must be sequenced to achieve accurate imputation, and whether sequenced individuals from other breeds improve imputation accuracy. In dairy cattle, the international 1000 Bull Genomes Project has provided a core set of sequenced bulls from several commonly used cattle breeds. Results in dairy cattle showed that imputation from 800,000 DNA markers to whole-genome sequence data was accurate when the core set of sequenced individuals was large (≥80 individuals), whereas imputation of individuals genotyped with 60,000 DNA markers gave poor accuracy regardless of the size of the core set. Imputation can be done within a single breed, but sequence data from multiple breeds can also be combined into one reference population; this is especially beneficial for rare DNA variants that are present in multiple breeds.
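Imputation accuracy is commonly summarised per variant by comparing imputed genotypes with true (sequenced) genotypes in individuals that were masked for validation. The sketch below shows two widely used measures, genotype concordance and the Pearson correlation between true and imputed allele dosages (0, 1 or 2 copies of the alternative allele). Which measure underlies the dairy cattle results above is not stated here, and the example genotypes are hypothetical.

```python
from math import sqrt
from typing import Sequence

def concordance(true: Sequence[int], imputed: Sequence[int]) -> float:
    """Fraction of individuals whose imputed genotype equals the true genotype."""
    return sum(t == i for t, i in zip(true, imputed)) / len(true)

def correlation(true: Sequence[float], imputed: Sequence[float]) -> float:
    """Pearson correlation between true and imputed dosages at one variant.

    Returns NaN when either vector has zero variance (e.g. a monomorphic variant),
    because the correlation is undefined in that case.
    """
    n = len(true)
    mean_t, mean_i = sum(true) / n, sum(imputed) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true, imputed))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_t == 0 or var_i == 0:
        return float("nan")
    return cov / sqrt(var_t * var_i)


# Hypothetical validation data: dosages of 8 individuals at one variant.
true_dosage = [0, 1, 2, 1, 0, 2, 1, 0]
imputed_dosage = [0, 1, 2, 0, 0, 2, 1, 1]
print(concordance(true_dosage, imputed_dosage))            # 0.75
print(round(correlation(true_dosage, imputed_dosage), 3))  # 0.795
```

Concordance is easy to interpret but can look high for rare variants simply because most individuals carry the common genotype; the correlation is less sensitive to this, which matters when judging imputation of rare variants across breeds.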
The next steps are to investigate the use of imputed sequence data for genomic selection. Do sequence data lead to better predictions and higher accuracy, as expected? Can functional information present in the whole-genome sequence be incorporated into genomic prediction models to increase prediction accuracy?