Estimate of fixed effects and prediction of random effects is given by:
Source: Morota (2015) The Origin of BLUP and MME.
First appearance of MME and BLUP:
The formal proof that solutions in Henderson (1949, 1950) are BLUE and BLUP:
asreml
R package by Butler et al. (2009) wraps the core algorithm.*That is, unrealistic example only used to illustrate the basic idea.
We show how to fit this model with different variance structures.
*This is not to say this is the best model!
asreml
asreml(fixed=response ~ fixed formula, random= ~ random formula, rcov= ~ rcov formula, data=data.frame)
head(data1)
# A tibble: 6 x 5 Row Col Site Geno Yield <fct> <fct> <fct> <fct> <dbl> 1 1 1 A G1 5.14 2 2 1 A G3 7.15 3 1 2 A G2 3.80 4 2 2 A G2 6.41 5 1 3 A G1 4.30 6 2 3 A G3 6.91
To fit Model 1 in asreml
:
fit1 <- asreml(Yield ~ Site, random= ~ Site:Geno, data=data1)
asreml
outputsummary(fit1)$varcomp # or use lucid::vc(fit1)
gamma component std.error z.ratio constraint Site:Geno!Site.var 1.093095 1.0663211 0.9262377 1.151239 Positive R!variance 1.000000 0.9755063 0.4598581 2.121320 Positive
coef(fit1)$fixed
effect Site_A 0.0000000 Site_B -1.0396488 Site_C 0.1556465 (Intercept) 5.6176863
coef(fit1)$random
effect Site_A:Geno_G1 -0.6172425 Site_A:Geno_G2 -0.3519268 Site_A:Geno_G3 0.9691694 Site_B:Geno_G1 0.2607835 Site_B:Geno_G2 -0.9825827 Site_B:Geno_G3 0.7217992 Site_C:Geno_G1 -0.6441000 Site_C:Geno_G2 -0.2980348 Site_C:Geno_G3 0.9421349
fit1
is equivalent to
fit1a <- asreml(Yield ~ Site, data=data1 random= ~ idv(Site):id(Geno))fit1b <- asreml(Yield ~ Site, data=data1, random= ~ id(Site):idv(Geno))
where the variance structures of random effects are made explicit. Additionally we can define the residual error variance structure explicit:
fit1c <- asreml(Yield ~ Site, data=data1, random= ~ Site:Geno, rcov= ~ idv(units))
where units
is a special reserve word in asreml
equivalent to factor(1:nrow(data1))
.
By default, asreml
has an idv
structure if the variance structure is not defined. E.g. random=~Site
means
If rcov
is not defined, the default is idv(units)
.
For random interaction effects (say Site:Geno
), the default is
idv(Site):id(Geno)
(or equivalently id(Site):idv(Geno)
)
id()
is a correlation structure \(\boldsymbol{I}_n\)idv()
is a variance structure with common (homogeneous) variance \(\sigma^2\boldsymbol{I}_n\)idh()
(which is the same as diag()
) is a variance structure with a separate (heterogeneous) variance for each level of factor \(\text{diag}(\sigma^2_1, ..., \sigma^2_n)\).Appending the name of the correlation function with:
v
results in adding a homogeneous variance or h
results in heterogeneous variance. Time Series Type | Metric Based | General |
---|---|---|
ar1 |
exp |
cor |
ar2 |
gau |
corb |
ar3 |
iexp |
corg |
sar |
igau |
id |
sar2 |
ieuc |
us |
ma1 |
sph |
chol |
ma2 |
cir |
cholc |
arma |
aexp |
ante |
agau |
fa |
|
mtrn |
rr |
Given a random term A:B
where A
and B
are factors then the generic form is f(A):g(B)
where
f(A)
generates a variance matrix GA say, and g(B)
generates a variance matrix GB say.Then the corresponding random effects, , has properties:
Note assumption of separability
Given general matrices:
and B. Then
Note: gAi,j means the (i,j)-th entry of matrix GA. Similar definition for gBk,l.
Site:Geno
prediction.fit2 <- asreml(Yield ~ Site, random=~ us(Site):id(Geno), data=data1)
summary(fit2, nice=T)$nice
$`Site:Geno` A B C A 0.1500000 0.2209702 0.663163760 B 0.2209702 0.4615557 0.155530075 C 0.6631638 0.1555301 0.009486833
summary(fit2, all=T) %>% str()
List of 9 $ call : language asreml(fixed = Yield ~ Site, random = ~us(Site):id(Geno), data = data1) $ loglik : num -13.4 $ nedf : int 15 $ sigma : num 1.18 $ varcomp :'data.frame': 7 obs. of 5 variables: ..$ gamma : num [1:7] 0.15 0.221 0.462 0.663 0.156 ... ..$ component : num [1:7] 0.209 0.308 0.644 0.925 0.217 ... ..$ std.error : num [1:7] 0.118 0.593 0.733 0.846 0.592 ... ..$ z.ratio : num [1:7] 1.774 0.52 0.878 1.093 0.366 ... ..$ constraint: Factor w/ 3 levels "?","Positive",..: 3 2 2 2 2 1 2 .. ..- attr(*, "names")= chr [1:7] "Site:Geno!Site.A:A" "Site:Geno!Site.B:A" "Site:Geno!Site.B:B" "Site:Geno!Site.C:A" ... $ distribution: chr "gaussian" $ link : chr "identity" $ coef.fixed : num [1:4, 1:3] 0 -1.04 0.156 5.618 NA ... ..- attr(*, "dimnames")=List of 2 .. ..$ : chr [1:4] "Site_A" "Site_B" "Site_C" "(Intercept)" .. ..$ : chr [1:3] "solution" "std error" "z ratio" $ coef.random : num [1:9, 1:3] -0.238 -0.301 0.539 -0.121 -0.385 ... ..- attr(*, "dimnames")=List of 2 .. ..$ : chr [1:9] "Site_A:Geno_G1" "Site_A:Geno_G2" "Site_A:Geno_G3" "Site_B:Geno_G1" ... .. ..$ : chr [1:3] "solution" "std error" "z ratio" - attr(*, "class")= chr "summary.asreml"
summary(fit2, all=T)
$call asreml(fixed = Yield ~ Site, random = ~us(Site):id(Geno), data = data1) $loglik [1] -13.40281 $nedf [1] 15 $sigma [1] 1.180882 $varcomp gamma component std.error z.ratio constraint Site:Geno!Site.A:A 0.150000000 0.20917239 0.1178934 1.77424976 Singular Site:Geno!Site.B:A 0.220970186 0.30813908 0.5925541 0.52001846 Positive Site:Geno!Site.B:B 0.461555711 0.64363140 0.7334553 0.87753330 Positive Site:Geno!Site.C:A 0.663163760 0.92477032 0.8461568 1.09290650 Positive Site:Geno!Site.C:B 0.155530075 0.21688398 0.5919423 0.36639377 Positive Site:Geno!Site.C:C 0.009486833 0.01322922 1.1282792 0.01172513 ? R!variance 1.000000000 1.39448259 0.7859562 1.77424976 Positive $distribution [1] "gaussian" $link [1] "identity" $coef.fixed solution std error z ratio Site_A 0.0000000 NA NA Site_B -1.0396488 0.7150596 -1.4539330 Site_C 0.1556465 0.7150596 0.2176692 (Intercept) 5.6176863 0.5496707 10.2200938 $coef.random solution std error z ratio Site_A:Geno_G1 -0.2377406 0.4021136 -0.5912274 Site_A:Geno_G2 -0.3010980 0.4021136 -0.7487883 Site_A:Geno_G3 0.5388385 0.4021136 1.3400157 Site_B:Geno_G1 -0.1214088 0.4021136 -0.3019265 Site_B:Geno_G2 -0.3846551 0.4021136 -0.9565832 Site_B:Geno_G3 0.5060639 0.4021136 1.2585097 Site_C:Geno_G1 -0.2412990 0.4021136 -0.6000766 Site_C:Geno_G2 -0.2939577 0.4021136 -0.7310314 Site_C:Geno_G3 0.5352567 0.4021136 1.3311081 attr(,"class") [1] "summary.asreml"
Genotypes are often related with its relatedness inferred from pedigree or genetic markers.
This known structure (denoted \(\textbf{G}_g\) here) is fitted in the variance structure as follows:
Say,
fit3 <- asreml(Yield ~ Site, data=data1, random=~ us(Site):giv(Geno), ginverse=list(Geno=Ginv))
where Ginv
is:
Ginv
Row Column Value 1 1 1 1.5 2 2 1 0.5 3 2 2 1.5 4 3 1 -1.0 5 3 2 -1.0 6 3 3 2.0
attr(Ginv, "rowNames")
[1] "G1" "G2" "G3"
at(f,l)
conditions on level l
of factor f
. If l
is not specified, conditions on each level of f
. fit4 <- asreml(Yield ~ Site, data=data1, random=~ us(Site):giv(Geno), ginverse=list(Geno=Ginv), rcov=~at(Site):units)
Rules for residual error term:
rcov
must be equal to the number of obeservational units included in the analysis.rcov
, each combination of levels of the simple model terms comprising this term must uniquely identify one unit of the data.rcov
specified.fit4A1 <- asreml(Yield ~ Site, random=~Site:Geno, data=data1, subset=Site=="A")
fit4A2 <- asreml(Yield ~ 1, random=~ Geno, data=data1, subset=Site=="A")
dataA <- subset(data1, Site=="A")fit4A3 <- asreml(Yield ~ Site, random=~Site:Geno, data=dataA)
resid(fit4A1)
A quick check to see if the (deviance) residual are the same:
all.equal(resid(fit4A1), resid(fit4A2))
[1] TRUE
all.equal(resid(fit4A2), resid(fit4A3))
[1] TRUE
all.equal(resid(fit4A1), resid(fit4A3))
[1] TRUE
fit4A <- asreml(Yield ~ 1, random=~ Geno, data=data1, subset=Site=="A")fit4B <- asreml(Yield ~ 1, random=~ Geno, data=data1, subset=Site=="B")fit4C <- asreml(Yield ~ 1, random=~ Geno, data=data1, subset=Site=="C")fit4 <- asreml(Yield ~ Site, data=data1, random=~ diag(Site):Geno, # idh(Site):Geno rcov=~ at(Site):units)
lucid::vc(fit4)
effect component std.error z.ratio constr Site:Geno!Site.A.var 0.9034 1.618 0.56 P Site:Geno!Site.B.var 1.349 1.669 0.81 P Site:Geno!Site.C.var 0.9461 1.54 0.61 P Site_A!variance 1.261 1.029 1.2 P Site_B!variance 0.6029 0.4922 1.2 P Site_C!variance 1.063 0.868 1.2 P
lucid::vc(fit4A)
effect component std.error z.ratio constr Geno!Geno.var 0.9034 1.618 0.56 P R!variance 1.261 1.029 1.2 P
lucid::vc(fit4B)
effect component std.error z.ratio constr Geno!Geno.var 1.349 1.669 0.81 P R!variance 0.6029 0.4922 1.2 P
lucid::vc(fit4C)
effect component std.error z.ratio constr Geno!Geno.var 0.9461 1.54 0.61 P R!variance 1.063 0.868 1.2 P
More explanations on these models later today.
fit5 <- asreml(Yield ~ Site, data=data1, random=~ us(Site):giv(Geno), ginverse=list(Geno=Ginv), rcov=~ at(Site):ar1v(Row):ar1(Col))
fit6 <- asreml(Yield ~ Site, data=data1, random=~ fa(Site, 3):giv(Geno), ginverse=list(Geno=Ginv), rcov=~ at(Site):ar1v(Row):ar1(Col))
giv
s. Here the model is fitting additive marker effects, residual additive effects (via pedigree), and non-additive effects (capturing interaction effects).asreml
has a reserve word trait
which is a factor with levels corresponding to each trait. fit7 <- asreml(Yield ~ Site, data=data1, random=~ fa(Site, 3):giv(Geno) + fa(Site, 2):giv(GenoDup) + fa(Site, 2):ide(Geno), ginverse=list(Geno=Kinv, GenoDup=Ainv), rcov=~ at(Site):ar1v(Row):ar1(Col))
fit8 <- asreml(c(y1,y2) ~ trait, data=data1, random=~ us(trait):giv(Geno), ginverse=list(Geno=Ginv), rcov=~ units:us(trait))
Yates, F. (1935) Complex experiments. Journal of the Royal Statistical Society Suppl. 2 181–247.
data(oats, package="asreml") # load dataoats <- oats %>% mutate(Row=as.numeric(Wplots), Col=as.numeric(Subplots))desplot(Nitrogen ~ Row*Col|Blocks, data=oats, text=Variety, cex=1.5, out1=Wplots, gg=TRUE) + labs(title="", x="Whole Plot", y="Subplot") + theme(axis.title=element_text(size=20), legend.text=element_text(size=16), legend.title=element_text(size=24), strip.text=element_text(size=24))
fit <- asreml(yield ~ Variety*Nitrogen, random= ~ Blocks/Wplots, data=oats)wald(fit) # same as anova(fit)
Wald tests for fixed effects Response: yield Terms added sequentially; adjusted for those above Df Sum of Sq Wald statistic Pr(Chisq) (Intercept) 1 43410 245.141 <2e-16 *** Variety 2 526 2.971 0.2264 Nitrogen 3 20020 113.057 <2e-16 *** Variety:Nitrogen 6 322 1.817 0.9357 residual (MS) 177 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Wald (1943) Tests of Statistical Hypotheses Concerning Several Parameters When the Number of Observations is Large. Transactions of the American Mathematical Society 54(3) 426-482
asreml
uses a partial implementation of Kenward-Roger approach. More specifically, it uses an unadjusted variance matrix with variance parameters fixed at the REML estimates of the maximal model.Kenward, M.G. and Roger, J.H. (1997). Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood. Biometrics 53 983-997.
To compute the denominator degrees of freedom in asreml
:
denDF="default"
denDF="numeric"
denDF="algebraic"
Default is denDF="none"
.
If denominator df is unavailable, you may use residual df (fit$nedf
). Note this will be anti-conservative.
asreml
show the incremental
tests where the sum of the squares are adjusted for the terms listed above the rows. conditional
tests by using the argument ssType="conditional"
.then Nelder (1977, 1994) argues that meaningful and interesting test for terms can only be conducted for those that respect marginality relations.
y ~ 1 + A + B + A:B
then A
cannot be adjusted for A:B
.
asreml
out_wald <- wald(fit, ssType="conditional", denDF="numeric")
out_wald$Wald %>% lucid::lucid()
Df denDF F.inc F.con Margin Pr (Intercept) 1 5 245 245 0.0000193 Variety 2 10 1.48 1.48 A 0.272 Nitrogen 3 45 37.7 37.7 A 0 Variety:Nitrogen 6 45 0.303 0.303 B 0.932
Variety:Nitrogen
is adjusted for other terms;Variety
is adjusted for Nitrogen
and (Intercept)
; andNitrogen
is adjusted for Variety
and (Intercept)
.data("burgueno.rowcol", package="agridat")fit1 <- asreml(yield ~ gen, random=~rep, trace=F, data=burgueno.rowcol)fit2 <- asreml(yield ~ gen, trace=F, data=burgueno.rowcol)lrt(fit2, fit1)
Likelihood ratio test(s) assuming nested random models. Chisq probability adjusted using Stram & Lee, 1994. Df LR statistic Pr(Chisq) fit1/fit2 1 21.141 2.133e-06 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-2 * (fit2$loglik - fit1$loglik)
[1] 21.1414
rep
, \(\sigma^2_r\). fit <- asreml(Yield ~ Site + Geno, data=data1, random=~ Site:Geno + at(Site, "A"):Row)predict(fit, classify="Geno")$predictions$pval
Notes: - The predictions are obtained by averaging across the hypertable calculated from model terms constructed solely from factors in the averaging and classify sets. - Use "average" to move ignored factors into the averaging set. - The SIMPLE averaging set: Site - The ignored set: Row Geno predicted.value standard.error est.status 1 G1 5.049309 0.4423711 Estimable 2 G2 4.529920 0.4253100 Estimable 3 G3 6.389828 0.4423711 Estimable
classify
).average
) which by default is all fixed effects not in the classify set. use
, except
, only
) in prediction. average
).Geno
Site
using uniform weightsRow
Geno predicted.value standard.error est.status 1 G1 5.049309 0.4423711 Estimable 2 G2 4.529920 0.4253100 Estimable 3 G3 6.389828 0.4423711 Estimable
0.000 + 5.344 + (0.002 + 0.138 - 0.141)/3 + (0.000 - 1.040 + 0.156)/3
[1] 5.049
-0.519 + 5.344 + (0.059 - 0.135 + 0.076)/3 + (0.000 - 1.040 + 0.156)/3
[1] 4.530333
1.341 + 5.344 + (-0.062 - 0.003 + 0.065)/3 + (0.000 - 1.040 + 0.156)/3
[1] 6.390333
Of course precision is lost here beyond 2 decimal places due to rounding of 3 decimal places.
predict(fit, classify="Geno:Site")$predictions$pval
Site Geno predicted.value standard.error est.status 1 A G1 5.346355 0.8971775 Estimable 2 A G2 4.883918 0.8803011 Estimable 3 A G3 6.622785 0.8971775 Estimable 4 B G1 4.442765 0.4949729 Estimable 5 B G2 3.649645 0.4855925 Estimable 6 B G3 5.641703 0.4949729 Estimable 7 C G1 5.358807 0.4949729 Estimable 8 C G2 5.056196 0.4855925 Estimable 9 C G3 6.904995 0.4949729 Estimable
0.002 + 5.344 + 0.000 + 0.000
[1] 5.346
0.059 + 5.344 - 0.519 + 0.000
[1] 4.884
-0.062 + 5.344 + 1.341 + 0.000
[1] 6.623
0.138 + 5.344 + 0.000 - 1.040
[1] 4.442
predict
?predict.asreml
And references:
asreml
NA
.na.method.X="fail"
.na.method.X="omit"
then corresponding records with missing explanatory variables are dicarded. na.method.Y="include"
. In this case a factor mv
is created with one level for each missing response. na.method.Y="omit"
then missing values in the response are discarded. na.method.X="include"
, missing values in explanatory variables are replaced with zeros. Thus it is essential that you can scale and center covariates with missing values.Missing value treatment argument:
na.method.X
na.method.Y
each with three options:
"omit"
"include"
"fail"
asreml.Ainverse
uses Meuwissen and Luo (1992) to compute the inverse of the numerator relationship matrix (A-1) directly from the pedigree that can be supplied in ginverse
in conjuction with giv
or ped
. ped
me mum dad fgen 1 I1 0 0 0 2 I2 0 0 0 3 I3 0 0 0 4 I4 0 0 0 5 I5 I1 I2 0 6 I6 I1 I2 0 7 I7 I1 I3 0 8 I8 I2 I3 0 9 I9 I3 I4 0 10 I10 I5 I6 0 11 I11 I7 I8 0 12 I13 I9 I9 2
asreml.Ainverse(ped)$ginv
asreml.Ainverse(ped)$ginv %>% # for the format for asreml asreml.sparse2mat() # to get A inverse matrix
asreml.Ainverse(ped)$ginv %>% # for the format for asreml asreml.sparse2mat() %>% # to get A inverse matrix solve() # to get the A matrix
LogLikelihood not converged
By default maximum iteration maxiter
is 13 for the AI-REML algorithm. Convergence is determined by REML. You can run a few more iterations by:
asreml_object <- update(asreml_object)
or supplying the argument maxiter
with a higher number, say 20
asreml(Yield ~ Site, random=~Geno, maxiter=20)
Hint:
while(!asreml_object$converge) asreml_object <- update(asreml_object)
Note: this can get stuck in a loop if it never converges!
So what if the model does not seem to converge at all?
Consider decreasing the step size from default of 0.1:
asreml(Yield ~ Site, random=~Geno, stepsize=0.01)
or change to "better" initial values...
asreml
uses Quasi Newton-Raphson Algorithm to solve for REML estimates.Initial value = 1
Converges to the solution x=0.
Initial value = 1.5
Does not converge.
asreml
First get a table of initial values:
sv <- asreml(Yield ~ Site, data=data1, random= ~Site:Geno, start.values=TRUE)sv$gammas.table
Gamma Value Constraint 1 Site:Geno!Site.var 0.1 P 2 R!variance 1.0 P
Modify as you see fit:
sv$gammas.table$Value <- c(0.5, 1)fit <- asreml(Yield ~ Site, data=data1, random= ~Site:Geno, G.param=sv$gammas.table, R.param=sv$gammas.table)
The default initial values are 0.1 for both variance ratios and correlations with some exceptions, e.g. for us
models it is given by
matrix(0.1, nrow=n, ncol=n) + diag(0.05, nrow=n)
When you have some better idea of what the parameter values should be, e.g. from past study or estimates from more simpler model going to a more complex model.
fit <- asreml(Yield ~ Site, data=data1, random=~ Geno + Site:Geno)lucid::vc(fit)
effect component std.error z.ratio constr Geno!Geno.var 1.092 1.252 0.87 P Site:Geno!Site.var 0.0000004 0.0000002 2.5 B R!variance 0.9553 0.3747 2.5 P
sv <- asreml(Yield ~ Site, data=data1, random= ~Geno + Site:Geno, start.values=TRUE) sv$gammas.table
Gamma Value Constraint 1 Geno!Geno.var 0.1 P 2 Site:Geno!Site.var 0.1 P 3 R!variance 1.0 P
"U"
sv$gammas.table[,3] <- c("P", "U", "P")
The proposed model is the same as:
fit <- asreml(Yield ~ Site, data=data1, random= ~Geno + Site:Geno, G.param=sv$gammas.table, R.param=sv$gammas.table) vc(fit)
effect component std.error z.ratio constr Geno!Geno.var 1.099 1.255 0.88 P Site:Geno!Site.var -0.03282 0.3954 -0.083 U R!variance 0.9755 0.4599 2.1 P
If you used the code "F" instead, it will fix the parameter to the specified initial value.
For technical help, the best place is to post at the VSN forum.
For sofware installation and license issues, please contact support@vsni.co.uk.
Cross Validated with the tag as asreml
. The questions are very sparse but I roam around Stack Overflow and Cross Validated and answer occasionally.
These slides were made using the R package xaringan
with the ninja-themes
and is available at bit.ly/UT-WS-asreml-R
.
Download
For workshop participants, contact Emi for the tutorials.
day3-session01-intro-to-asreml.Rmd
here, open in RStudio, push the button "Run Document" on the top tab and work through the exercises.
Estimate of fixed effects and prediction of random effects is given by:
Source: Morota (2015) The Origin of BLUP and MME.
First appearance of MME and BLUP:
The formal proof that solutions in Henderson (1949, 1950) are BLUE and BLUP:
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |