We can write the model more generally as

<img src="images/model2.png" width="100%"/>

where we assume a separable variance structure for genotype-by-environment effects:

<center>
<img src="images/ugxe.png" width="50%"/>
</center>

]][.content[

* **G**<sub>e</sub> may be assumed to be a general matrix, such as an unstructured matrix, that is usually estimated from the data.
* **G**<sub>g</sub> is a known genotype relationship matrix.

]]

---
class: split-60 with-border

.column[.content[

# Unstructured Covariance Model

<br>

<iframe src="html_extras/table5.html" width="100%" height="500!important" frameBorder="0">
</iframe>

]][.content[

* The number of parameters to be estimated grows .yellow[quadratically] with the number of trials, so there quickly become too many parameters to estimate.
* Recall that covariance matrices are symmetric, so there is no need to estimate the parameters in the upper (or lower) triangle.

<center>
<img src="images/usmodel.png" width="60%"/>
</center>

]]

---
class: split-60 with-border

.column[.content[

# Factor Analytic Model

For some .indigo[order *k*], you can replace the unstructured covariance with the factor analytic form:

<br>

<center>
<img src="images/famodel.png" width="70%"/>
</center>

]][.content[

* For identifiability, some constraints are applied to the loading matrix.
* `asreml` constrains the loading matrix so that its upper triangle is zero.

<center>
<img src="images/faconstraint.png" width="80%"/>
</center>

* So in effect there are <br> *(k + 1) t - k (k - 1) / 2* <br> parameters to estimate.

]]

---
class: split-70 with-border

.column[.content[

# The number of variance parameters to estimate

<iframe src="html_extras/table7.html" width="100%" height="500!important" frameBorder="0">
</iframe>

]][.content[

* The number of variance parameters to estimate for the FA model grows .yellow[linearly] with the number of trials.
* The FA model can be considered a .yellow[lower order approximation] to the US model.
* As the FA model is meant to offer a simpler model, it does not make sense for it to have more parameters to estimate than the US model.

]]

---
class: split-70 with-border

.column[.content[

# Condition to use FA model over US model

<center>
<img src="images/numpar.png" width="70%"/>
</center>

]][.content[

* *Trial*, *site* and *environment* are used synonymously.
* We expect that *t > k*.

]]

---
class: split-70 with-border

.column[.content[

# Latent Variable Model

<br>

<center>
<img src="images/famodelassum.png" width="80%"/>
</center>

]][.content[

* The FA model is a special case of a latent variable model when the responses are conditionally normally distributed.
* Note: our FA model is different to the standard FA model due to the separable structure of **G**<sub>ge</sub>.

]]

---
class: split-70 with-border

.column[.content[

# Latent Variable Model

<br>

<center>
<img src="images/famodelassum2.png" width="80%"/>
</center>

]][.content[

* Notice that this is like a linear regression model, except that the covariates are estimated from the data.
* The loadings represent some latent environmental covariate.
* The common factor represents how the genotype responds to that covariate.
* The specific factor represents an effect specific to that environment.

]]

---
class: split-70 with-border

.column[.content[

# How to choose the order, *k*, of FA model?

1. Pragmatically, you can use some threshold for the overall percentage of genetic variance explained by the *k* factors:

<center>
<img src="images/percfa.png" width="45%"/>
</center>

2. You can use a hypothesis testing approach or an information criterion.
3. You can use the OFAL penalty proposed in Hui et al. (2018).

.bottom_abs.width100.font_small[
Hui et al. (2018) Order Selection and Sparsity in Latent Variable Models via the Ordered Factor LASSO. *Biometrics*
]

]][.content[

* The first approach is akin to using the coefficient of determination *R<sup>2</sup>* in linear regression.
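
As a rough illustration of the first approach, the sketch below computes the percentage of genetic variance explained. It assumes the usual definition, 100 × tr(ΛΛ') / tr(ΛΛ' + Ψ), where Λ is the *t* × *k* loading matrix and Ψ holds the specific variances. The function name and the toy numbers are hypothetical, and it is written in plain Python rather than with `asreml`, so treat it as a sketch only.

```python
def percent_variance_explained(loadings, specific_variances):
    """Percentage of genetic variance explained by the k factors.

    loadings: list of t rows, each a list of k estimated loadings (Lambda).
    specific_variances: list of t estimated specific variances (diagonal of Psi).
    Assumes the definition 100 * tr(Lambda Lambda') / tr(Lambda Lambda' + Psi).
    """
    if len(loadings) != len(specific_variances):
        raise ValueError("need one specific variance per environment")
    # tr(Lambda Lambda') equals the sum of all squared loadings.
    common = sum(value ** 2 for row in loadings for value in row)
    total = common + sum(specific_variances)
    return 100.0 * common / total


# Toy (made-up) estimates for t = 3 environments and k = 2 factors.
toy_loadings = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
toy_specific = [0.5, 0.5, 1.5]
print(percent_variance_explained(toy_loadings, toy_specific))  # 50.0
```

In practice you would refit the FA model for each candidate *k* and stop at the smallest order that passes your chosen threshold.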
]]

---
class: split-70 with-border

.column[.content[

# Non-uniqueness of factor loadings

* Suppose that we have an orthogonal matrix **Q** of size *k*.
* If we do not impose some constraint as per slide 5, then there are many possible solutions for the factor loadings.
* In fact, the rotated loadings <img src="images/QLam.png" height="24pt"/> and factor <img src="images/Qf.png" height="36pt"/> are also solutions. So what to do?
* There are many solutions. In the practical session we use an approach similar to principal components, in that:
  - the first rotated factor accounts for the maximum amount of estimated genetic covariance,
  - the second accounts for the next largest amount of estimated genetic covariance, and so on.

]][.content[

* A square matrix **Q** is orthogonal if

<center>
<img src="images/orthogonal.png" width="50%"/>
</center>

* If *k=1*, then there is no constraint applied and no rotation is necessary.

Steps:

1. Constrain the upper triangle of the loading matrix to zero.
2. Rotate the estimated loading matrix.

]]

---
class: split-90 with-border

.column[.content[
.split-30[
.row[.content[

# How is this different from principal component analysis?

Good question! In fact, the two are the same in a certain case (can you tell when?).

]]
.row[.content[
.split-two[
.column[.content[

* Principal component analysis (PCA) is a transformation of the data.
* PCA transforms the variables to principal components.

]]
.column[.content[

* The factor analytic model assumes that the data come from a well-defined model where the underlying factors satisfy the assumptions mentioned before.
* The emphasis in factor analysis is that the factors map to the variables, but the specific factors are explicitly assumed to be "noise".
]]
]
]]
]
]][.content[

]]

---
class: split-70 with-border

.column[.content[

# Between environment genetic correlation matrix

<center>
<img src="images/cov2cor.png" width="70%"/>
</center>

]][.content[

* A negative genetic correlation estimate indicates cross-over interaction.
* A positive genetic correlation estimate indicates non-cross-over interaction.
* Estimation of the between environment genetic covariance depends on the number of varieties in common between trials (which we refer to as *connectivity*).

]]

---
class: split-70 with-border

.column[.content[

# Notes on FA model

* If trials are completely disconnected, then the between environment genetic covariance cannot be reliably estimated.
* The genetic regression residuals represent non-repeatable variety effects for the given model and set of environments.

### What set of trials for MET analysis?

* You would want a representative sample of environments, both in a geographic and a seasonal sense, a relevant set of varieties, and reasonable connectivity between pairs of trials. More could be done (e.g. discussing with trial managers and experts) to address this point before proceeding with the FA model.
* For simplicity, we shall assume there is no concern and proceed.

I recommend you write a function for repetitive tasks such as this.

(2015) Factor analytic mixed models for the provision of grower information from national crop variety testing programs. *Theoretical and Applied Genetics* **128**(1) 55-72
* Cullis et al. (2014) Factor analytic and reduced animal models for the investigation of additive genotype-by-environment interaction in outcrossing plant species with application to a *Pinus radiata* breeding programme. *Theoretical and Applied Genetics* **127**(10) 2193-2210
* Mardia et al. (1995) Multivariate Analysis.