Biology-informed multi-population genomic prediction

The accuracy of genomic prediction depends largely on the size of the reference population. Consequently, genomic prediction is less accurate in numerically small populations. In principle, accuracy in numerically small populations can be improved by adding individuals from numerically larger populations to the reference population, an approach known as multi-population genomic prediction.

However, empirical results so far have shown that multi-population genomic prediction yields little or no increase in accuracy compared with within-population genomic prediction models. As part of the Breed4Food TOPBREED project, we implemented and tested a multi-population, multiple genomic relationship matrix (MPMG) model designed to overcome some of the limitations of existing models. In the MPMG model, a limited number of markers that were shown to be significantly associated with the trait of interest were pre-selected and used to build a genomic relationship matrix (GRM). Markers on a traditional 50k SNP chip were then used to build a second GRM, and both GRMs were fitted simultaneously. In addition, information from all individuals in the reference population was weighted by their genetic correlation with individuals in the validation population. The results show that the MPMG model is more accurate than within-population and multi-population genomic prediction models in which all markers are weighted equally in a single GRM.
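The two-GRM idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the project: it assumes VanRaden's (2008) first method for building each GRM, genotypes coded 0/1/2, a hypothetical toy genotype matrix, a hypothetical split into "pre-selected" and "remaining" marker columns (standing in for the 50k chip markers), and made-up variance components. It shows how each marker set yields its own GRM and how both GRMs enter the phenotypic covariance V = G_sel·σ²_sel + G_rest·σ²_rest + I·σ²_e, each with its own variance component. The weighting of reference individuals by their genetic correlation with the validation population is not modelled here.

```python
# Minimal sketch (not the published MPMG implementation): two GRMs built with
# VanRaden (2008) method 1 from different marker subsets, combined into the
# phenotypic covariance matrix of a two-GRM model. All data are hypothetical.


def vanraden_grm(genotypes, markers):
    """GRM from the marker columns listed in `markers` (genotypes coded 0/1/2)."""
    sub = [[row[j] for j in markers] for row in genotypes]
    n = len(sub)
    # Frequency of the counted allele at each selected marker.
    p = [sum(row[j] for row in sub) / (2 * n) for j in range(len(markers))]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all selected markers are monomorphic")
    # Centre genotypes by twice the allele frequency: Z = M - 2p.
    z = [[g - 2 * pj for g, pj in zip(row, p)] for row in sub]
    # G = Z Z' / (2 * sum_j p_j (1 - p_j)).
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


def two_grm_covariance(g_sel, g_rest, var_sel, var_rest, var_e):
    """V = G_sel*var_sel + G_rest*var_rest + I*var_e (one variance per GRM)."""
    n = len(g_sel)
    return [[g_sel[i][k] * var_sel + g_rest[i][k] * var_rest
             + (var_e if i == k else 0.0) for k in range(n)]
            for i in range(n)]


# Hypothetical toy data: 4 individuals x 5 markers.
genotypes = [
    [0, 1, 2, 1, 0],
    [1, 1, 2, 0, 1],
    [2, 0, 1, 1, 2],
    [1, 2, 0, 2, 1],
]
selected = [0, 1]      # hypothetical "pre-selected" markers
remaining = [2, 3, 4]  # hypothetical stand-in for the 50k chip markers

g_sel = vanraden_grm(genotypes, selected)
g_rest = vanraden_grm(genotypes, remaining)
# Hypothetical variance components for the two GRMs and the residual.
V = two_grm_covariance(g_sel, g_rest, var_sel=0.2, var_rest=0.3, var_e=0.5)

for row in V:
    print(" ".join(f"{v:7.3f}" for v in row))
```

Giving each GRM its own variance component is what allows the few pre-selected markers to be weighted differently from the many chip markers; in a real analysis those components would be estimated (for example by REML) rather than fixed by hand.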

Theoretical underpinning of the performance of the multiple genomic relationship matrix model

Having observed the superior performance of the MPMG model, we wanted to understand the reasons for it theoretically. To do so, we used selection index theory to derive and validate a prediction equation for the accuracy of the MPMG model. The derived equation identified a key parameter, Me, which reflects the effective number of genomic segments whose effects are estimated. In general, the lower the value of Me, the higher the prediction accuracy. We showed that an important advantage of the MPMG model is that it benefits from the low value of Me associated with the few pre-selected markers. However, this advantage holds only if marker pre-selection is accurate, that is, if the pre-selected markers explain some of the genetic variance of the trait of interest. Furthermore, the markers in the second GRM act as a backup: they capture any residual genetic variance of the trait that the pre-selected markers could not capture. These properties make the MPMG model more accurate than genomic prediction models that weight all markers equally in a single GRM. Besides serving as a tool for gaining theoretical insight into the performance of biology-informed genomic prediction models, the derived equation can be used by breeding companies to estimate, a priori, the accuracy they could expect if they implemented the MPMG model for genomic prediction.
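The role of Me can be illustrated with the well-known deterministic equation of Daetwyler et al. (2008) for a single population and a single GRM, r = sqrt(N·h² / (N·h² + Me)), where N is the number of reference individuals and h² the heritability. This is a simpler, swapped-in formula, not the MPMG prediction equation derived in this work, and the parameter values below are hypothetical. It only shows the two trends discussed above: accuracy rises as the reference population grows and as Me shrinks.

```python
import math


def daetwyler_accuracy(n_ref, h2, me):
    """Expected accuracy sqrt(N*h2 / (N*h2 + Me)) (Daetwyler et al., 2008).

    Single population, single GRM; NOT the MPMG equation from the source.
    n_ref: reference size, h2: heritability, me: effective number of segments.
    """
    if n_ref <= 0 or me <= 0 or not 0 < h2 <= 1:
        raise ValueError("need n_ref > 0, me > 0 and 0 < h2 <= 1")
    x = n_ref * h2
    return math.sqrt(x / (x + me))


# Hypothetical values: h2 = 0.3; vary reference size and Me.
for n_ref in (500, 1000, 5000):
    for me in (100, 1000, 10000):
        r = daetwyler_accuracy(n_ref, 0.3, me)
        print(f"N={n_ref:5d}  Me={me:6d}  accuracy={r:.3f}")
```

With N = 1000 and h² = 0.3, for example, lowering Me from 1000 to 100 raises the predicted accuracy from about 0.48 to about 0.87, which mirrors the intuition behind fitting a few pre-selected markers in their own GRM.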


Results from this research are published (open access) in the following papers: 

Raymond B., Bouwman A.C., Wientjes Y.C., Schrooten C., Houwing-Duistermaat J., Veerkamp R.F., Genomic prediction for numerically small breeds, using models with pre-selected and differentially weighted markers, Genetics Selection Evolution. 50 (2018) 49.

Raymond B., Bouwman A.C., Wientjes Y.C., Schrooten C., Veerkamp R.F., A deterministic equation to predict the accuracy of multi-population genomic prediction with multiple genomic relationship matrices, Genetics Selection Evolution. 52 (2020) 22.


Contact Person: Biaty Raymond