Software design, selection and estimation for latent variable models
Presented by Emi Tanaka
School of Mathematics and Statistics
dr.emi.tanaka@gmail.com
@statsgen
29th Nov 2019 @ WOMBAT2019 | Melbourne, Australia
1 Marketing: Product testing survey
Person vs product attributes
2 Psychometrics: Human intelligence test
Student vs intellect test
3 Bioinformatics: Gene expression data
Sample vs genes
4 Ecology: Multi-abundance species data
Site vs species
5 Agriculture: Multi-environmental field trial
Variety vs site
Latent variable models (or factor analytic models) can model this reasonably well.
Non-Gaussian Distribution
g(\mu_{ij}) = \eta_{ij} = \boldsymbol{x}_i^\top\boldsymbol{\beta}_j + \boldsymbol{u}_i^\top\color{blue}{\boldsymbol{\lambda}_j}
Gaussian Distribution
y_{ij} = \boldsymbol{x}_i^\top\boldsymbol{\beta}_j + \boldsymbol{u}_i^\top \color{blue}{ \boldsymbol{\lambda}_j} + \epsilon_{ij}
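The two models share the linear predictor \eta_{ij}, which can be computed for every row i and column j at once as a matrix. The NumPy sketch below assumes n rows (e.g. sites), p columns (e.g. species), q covariates and k latent variables with \boldsymbol{u}_i \sim N(\mathbf{0}, \mathbf{I}_k). The array names (X, B, U, Lam, psi) and the choice of a Poisson response with log link for the non-Gaussian case are illustrative assumptions, not taken from any particular package.

```python
import numpy as np

rng = np.random.default_rng(2019)
n, p, q, k = 100, 6, 2, 2  # rows, columns, covariates, latent variables (illustrative)

X = rng.normal(size=(n, q))               # row i holds the covariates x_i
B = rng.normal(scale=0.5, size=(q, p))    # column j holds beta_j
U = rng.normal(size=(n, k))               # row i holds the latent variables u_i ~ N(0, I_k)
Lam = rng.normal(scale=0.5, size=(p, k))  # row j holds the loadings lambda_j

# eta_ij = x_i^T beta_j + u_i^T lambda_j for all (i, j) at once
eta = X @ B + U @ Lam.T                   # shape (n, p)

# Gaussian case: y_ij = eta_ij + eps_ij with column-specific error variances psi_j
psi = np.full(p, 0.3)
y_gauss = eta + rng.normal(size=(n, p)) * np.sqrt(psi)

# Non-Gaussian case (assumed Poisson counts with log link): g(mu_ij) = log(mu_ij) = eta_ij
mu = np.exp(eta)
y_pois = rng.poisson(mu)
```

Only the link function g and the response distribution change between the two cases; the latent part \boldsymbol{u}_i^\top\boldsymbol{\lambda}_j is identical.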
Main idea: select the order of the latent variable model
(in other words, what should k, the number of latent variables, be?)
Cov(\boldsymbol{\eta}_i|~\boldsymbol{x}_i) = \mathbf{\Lambda}\mathbf{\Lambda}^\top\quad\text{or}\quad Cov(\boldsymbol{y}_i|~\boldsymbol{x}_i) = \mathbf{\Lambda}\mathbf{\Lambda}^\top + \mathbf{\Psi}
So the entries of \mathbf{\Lambda} are variance parameters.
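A quick numerical check of the Gaussian covariance above, as a sketch under illustrative assumptions: the covariate part is dropped (it only shifts the mean), the loadings and the diagonal of \mathbf{\Psi} are made-up values, and the sample size is large so the sample covariance should be close to \mathbf{\Lambda}\mathbf{\Lambda}^\top + \mathbf{\Psi}. It also shows that rotating the loadings by an orthogonal matrix leaves the covariance unchanged, i.e. \mathbf{\Lambda} enters only through \mathbf{\Lambda}\mathbf{\Lambda}^\top.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, n = 4, 2, 200_000
Lam = rng.normal(size=(p, k))          # factor loadings (illustrative values)
psi = np.array([0.5, 0.2, 0.3, 0.4])   # diagonal of Psi (illustrative values)

# Model-implied covariance of y_i given x_i
Sigma = Lam @ Lam.T + np.diag(psi)

# Monte Carlo check: y_i = Lam u_i + eps_i, u_i ~ N(0, I_k), eps_i ~ N(0, Psi)
U = rng.normal(size=(n, k))
E = rng.normal(size=(n, p)) * np.sqrt(psi)
Y = U @ Lam.T + E                      # row i is y_i
S = np.cov(Y, rowvar=False)            # sample covariance, p x p

# Any orthogonal Q gives the same covariance: (Lam Q)(Lam Q)^T = Lam Lam^T
Q, _ = np.linalg.qr(rng.normal(size=(k, k)))
Sigma_rot = (Lam @ Q) @ (Lam @ Q).T + np.diag(psi)
```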
Let \boldsymbol{\theta} denote the vector of all variance parameters in the model; then
\DeclareMathOperator*{\argmax}{arg\,max}\hat{\boldsymbol{\theta}}_{\text{ML/REML}} = \argmax_{\boldsymbol{\theta}}~ \ell(\boldsymbol{\theta}| ~\cdot)
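For the Gaussian case, one standard way to compute the ML estimate is the EM algorithm for factor analysis. The sketch below is that textbook algorithm, not necessarily what the authors use: it assumes centred data with no covariates (so the intercept-only mean is profiled out), covers ML only (not REML), and uses hypothetical function names `loglik` and `fa_em`. EM never decreases the log-likelihood, which the recorded history lets you check.

```python
import numpy as np

def loglik(Lam, psi, S, n):
    """Gaussian log-likelihood of n centred observations with ML sample covariance S."""
    p = S.shape[0]
    Sigma = Lam @ Lam.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * n * (p * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(Sigma, S)))

def fa_em(Y, k, n_iter=200, seed=0):
    """ML estimates of (Lam, psi) by EM for y_i ~ N(0, Lam Lam^T + diag(psi))."""
    n, p = Y.shape
    S = Y.T @ Y / n                           # ML covariance estimate (data already centred)
    rng = np.random.default_rng(seed)
    Lam = rng.normal(scale=0.1, size=(p, k))  # small random starting loadings
    psi = np.diag(S).copy()
    history = [loglik(Lam, psi, S, n)]
    for _ in range(n_iter):
        # E-step: E[u_i | y_i] = beta y_i with beta = Lam^T Sigma^{-1} (k x p)
        Sigma = Lam @ Lam.T + np.diag(psi)
        beta = np.linalg.solve(Sigma, Lam).T
        # Average of E[u_i u_i^T | y_i] over i
        Euu = np.eye(k) - beta @ Lam + beta @ S @ beta.T
        # M-step: closed-form updates for the loadings and the error variances
        Lam = S @ beta.T @ np.linalg.inv(Euu)
        psi = np.diag(S - Lam @ beta @ S).copy()
        history.append(loglik(Lam, psi, S, n))
    return Lam, psi, history

# Usage on simulated data (illustrative values)
rng = np.random.default_rng(42)
n, p, k = 500, 6, 2
Lam_true = rng.normal(size=(p, k))
Y = rng.normal(size=(n, k)) @ Lam_true.T + 0.5 * rng.normal(size=(n, p))
Y = Y - Y.mean(axis=0)
Lam_hat, psi_hat, hist = fa_em(Y, k)
```

Note that this fixes k in advance; choosing k is exactly the question the penalised approach below addresses.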
Assume that \mathbf{\Lambda}_0 is a p\times d pseudo factor loading matrix, where k \leq d \leq p, so d is an upper bound on the number of latent variables k.
OFAL estimate: our approach via penalised likelihood
\displaystyle\hat{\boldsymbol{\theta}}_{\text{OFAL}} = \argmax_{\boldsymbol{\theta}} \left\{\ell(\boldsymbol{\theta}) - s \sum _{l=1}^d \omega_{g,l} \sqrt{ \sum_{i=1}^p\sum_{j=l}^d \lambda_{0,ij}^2} - s \sum_{i=1}^p\sum_{j=1}^d \omega_{e,ij}|\lambda_{0,ij}|\right\}
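where s > 0 is a tuning parameter controlling the overall strength of the penalty, and \omega_{g,l} and \omega_{e,ij} are the group-wise and element-wise weights.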
Hui, Tanaka & Warton (2018) Order Selection and Sparsity in Latent Variable Models via the Ordered Factor LASSO. Biometrics
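The penalty in the OFAL objective can be evaluated directly from \mathbf{\Lambda}_0. In this sketch the function name `ofal_penalty` is hypothetical and the weights are passed in by the caller; how the weights are chosen in the paper is not reproduced here. The key feature is the ordered, nested grouping: group l contains columns l, l+1, \ldots, d, so later columns appear in more groups and are penalised more heavily.

```python
import numpy as np

def ofal_penalty(Lam0, s, w_group, w_elem):
    """Penalty subtracted from the log-likelihood in the OFAL objective.

    Lam0: p x d pseudo loading matrix; w_group: length-d weights omega_{g,l};
    w_elem: p x d weights omega_{e,ij}; s: tuning parameter.
    """
    p, d = Lam0.shape
    # Ordered groups (0-based l here): group l holds columns l, ..., d-1 of Lam0
    group_norms = np.array([np.sqrt(np.sum(Lam0[:, l:] ** 2)) for l in range(d)])
    group_term = s * np.sum(w_group * group_norms)
    elem_term = s * np.sum(w_elem * np.abs(Lam0))
    return group_term + elem_term

# Usage with unit weights (an illustrative choice): p = 2, d = 3
Lam0 = np.array([[1.0, 0.5, 0.0],
                 [2.0, 0.0, 0.0]])
pen = ofal_penalty(Lam0, s=1.0, w_group=np.ones(3), w_elem=np.ones((2, 3)))
```

The penalised objective is then the log-likelihood minus this value; a column of zeros at the end of \mathbf{\Lambda}_0 contributes nothing to its own group term.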
Element-wise penalty on \mathbf{\Lambda}_0: say s\omega_{e,15} \rightarrow \infty; then you would expect |\lambda_{0,15}| \rightarrow 0.
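Why does a large element-wise weight push a loading to exactly zero rather than just making it small? A simplified illustration, assuming the negative log-likelihood is replaced by a quadratic \tfrac{1}{2}(\lambda - z)^2 around an unpenalised value z (a stand-in chosen only for this sketch): minimising \tfrac{1}{2}(\lambda - z)^2 + t|\lambda| with t = s\omega_{e,15} gives the soft-thresholding rule, which returns exactly 0 once t \geq |z|.

```python
import numpy as np

def soft_threshold(z, t):
    """argmin_x 0.5 * (x - z)**2 + t * |x|, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

z = 0.8  # unpenalised estimate of one loading (illustrative)
for t in [0.0, 0.3, 0.6, 0.8, 2.0]:
    # The estimate shrinks linearly in t and is exactly zero once t >= |z|
    print(f"t = {t}: {soft_threshold(z, t)}")
```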
Group-wise penalty on \mathbf{\Lambda}_0 (here d = 6, so group 5 contains columns 5 and 6): say s\omega_{g,5} \rightarrow \infty; then you would expect \left(\sum_{l=1}^p(\lambda_{0,l5}^2 + \lambda^2_{0,l6})\right)^{1/2} \rightarrow 0.
A sum of squares is zero only if every term is zero, so \lambda_{0,l5}\rightarrow 0 and \lambda_{0,l6}\rightarrow 0 for l=1, \ldots, p, which removes the 5th and 6th latent variables.
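The group penalty acts on a whole block of columns at once. As a sketch under the same quadratic stand-in for the log-likelihood (an illustrative assumption), applying only the group-5 penalty with strength t gives group soft-thresholding: the block of columns 5 to d is either shrunk together or set entirely to zero. The helper names `group_soft_threshold`, `shrink_from_column` and `selected_order` are hypothetical, and the full OFAL fit combines all groups and element-wise terms, which this sketch does not attempt.

```python
import numpy as np

def group_soft_threshold(V, t):
    """argmin_W 0.5 * ||W - V||_F^2 + t * ||W||_F: the whole block shrinks together."""
    norm = np.sqrt(np.sum(V ** 2))
    if norm <= t:
        return np.zeros_like(V)
    return (1.0 - t / norm) * V

def shrink_from_column(Lam0, l, t):
    """Apply the group-l penalty alone (1-based l): group l holds columns l, ..., d."""
    out = Lam0.copy()
    out[:, l - 1:] = group_soft_threshold(Lam0[:, l - 1:], t)
    return out

def selected_order(Lam, tol=1e-12):
    """1-based index of the last column that is not entirely zero (the implied k)."""
    nonzero = np.any(np.abs(Lam) > tol, axis=0)
    return int(np.max(np.nonzero(nonzero)[0]) + 1) if nonzero.any() else 0

rng = np.random.default_rng(0)
p, d = 8, 6
Lam0 = rng.normal(scale=0.3, size=(p, d))   # unpenalised pseudo loadings (illustrative)

L_small = shrink_from_column(Lam0, 5, 0.01)  # weak penalty: columns 5-6 shrink but survive
L_big = shrink_from_column(Lam0, 5, 100.0)   # strong penalty: columns 5-6 become exactly zero
```

Because the groups are nested and ordered, zeroed columns are always trailing ones, so reading off the last non-zero column gives the selected number of latent variables.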
Well, now how do you get people to use your method?
Non-Gaussian Distribution
\color{green}{g}(\mu_{ij}) = \eta_{ij} = \color{purple}{\boldsymbol{x}_i}^\top\boldsymbol{\beta}_j + \boldsymbol{u}_i^\top\boldsymbol{\lambda}_j
Gaussian Distribution
y_{ij} = \color{purple}{\boldsymbol{x}_i}^\top\boldsymbol{\beta}_j + \boldsymbol{u}_i^\top \boldsymbol{\lambda}_j + \epsilon_{ij}
What if some cells contain more than one observation, or none?
In other words, what if the multivariate data are not rectangular?
Users: "Let's make it rectangular [to fit the software]"
Bottom line: software design encourages certain user behaviour, whether good or bad.
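To make the "let's make it rectangular" step concrete, here is a small sketch assuming long-format records of (row unit, column unit, response). The records are made up, the helper `to_rectangular` is hypothetical, and averaging replicates is just one common way users force a rectangle; the point is that information is discarded before any model sees the data.

```python
from collections import defaultdict
from statistics import mean

# Long-format records (row unit, column unit, response); values are made up.
records = [
    ("site1", "sppA", 3.0), ("site1", "sppA", 5.0),  # two observations in one cell
    ("site1", "sppB", 1.0),
    ("site2", "sppA", 2.0),                          # site2 x sppB has no observation
]

def to_rectangular(records):
    """Force long data into a rows x columns table, as rectangular-only software requires.

    Replicates within a cell are averaged and empty cells become None.
    """
    cells = defaultdict(list)
    for r, c, y in records:
        cells[(r, c)].append(y)
    rows = sorted({r for r, _, _ in records})
    cols = sorted({c for _, c, _ in records})
    table = {}
    for r in rows:
        table[r] = {}
        for c in cols:
            vals = cells.get((r, c))
            table[r][c] = mean(vals) if vals else None
    return table, cells

table, cells = to_rectangular(records)
n_obs = sum(len(v) for v in cells.values())                          # observations in
n_filled = sum(v is not None for row in table.values() for v in row.values())  # cells out
```

Four observations become three filled cells plus one empty cell: the replicate spread in site1 x sppA is lost, and the missing cell must be handled separately.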
Tanaka and Hui (2019) Symbolic formulae for linear mixed models. https://arxiv.org/abs/1911.08628
These slides were made using the xaringan R package and can be found at emitanaka.org/slides/WOMBAT2019
─ Session info ─
setting  value
version  R version 3.6.0 (2019-04-26)
os       macOS Mojave 10.14.6
system   x86_64, darwin15.6.0
ui       X11
language (EN)
collate  en_AU.UTF-8
ctype    en_AU.UTF-8
tz       Australia/Melbourne
date     2019-11-29

─ Packages ─
package     * version    date       lib source
anicon        0.1.0      2019-05-28 [1] Github (emitanaka/anicon@377aece)
assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)
backports     1.1.4      2019-04-10 [1] CRAN (R 3.6.0)
broom         0.5.2      2019-04-07 [1] CRAN (R 3.6.0)
callr         3.3.1      2019-07-18 [1] CRAN (R 3.6.0)
cellranger    1.1.0      2016-07-27 [1] CRAN (R 3.6.0)
cli           1.1.0      2019-03-19 [1] CRAN (R 3.6.0)
colorspace    1.4-1      2019-03-18 [1] CRAN (R 3.6.0)
crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.0)
curl          4.2        2019-09-24 [1] CRAN (R 3.6.0)
desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.0)
devtools      2.0.2      2019-04-08 [1] CRAN (R 3.6.0)
digest        0.6.22     2019-10-21 [1] CRAN (R 3.6.0)
dplyr       * 0.8.3      2019-07-04 [1] CRAN (R 3.6.0)
emo           0.0.0.9000 2019-06-03 [1] Github (hadley/emo@02a5206)
evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)
forcats     * 0.4.0      2019-02-17 [1] CRAN (R 3.6.0)
fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.0)
generics      0.0.2      2018-11-29 [1] CRAN (R 3.6.0)
ggplot2     * 3.2.1      2019-08-10 [1] CRAN (R 3.6.0)
glue          1.3.1.9000 2019-10-24 [1] Github (tidyverse/glue@71eeddf)
gtable        0.3.0      2019-03-25 [1] CRAN (R 3.6.0)
haven         2.1.0      2019-02-19 [1] CRAN (R 3.6.0)
hms           0.5.1      2019-08-23 [1] CRAN (R 3.6.0)
htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.0)
httr          1.4.1      2019-08-05 [1] CRAN (R 3.6.0)
icon          0.1.0      2019-05-28 [1] Github (ropenscilabs/icon@a510f88)
jsonlite      1.6        2018-12-07 [1] CRAN (R 3.6.0)
knitr         1.25       2019-09-18 [1] CRAN (R 3.6.0)
lattice       0.20-38    2018-11-04 [1] CRAN (R 3.6.0)
lazyeval      0.2.2      2019-03-15 [1] CRAN (R 3.6.0)
lifecycle     0.1.0      2019-08-01 [1] CRAN (R 3.6.0)
lubridate     1.7.4      2018-04-11 [1] CRAN (R 3.6.0)
magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.0)
memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)
modelr        0.1.4      2019-02-18 [1] CRAN (R 3.6.0)
munsell       0.5.0      2018-06-12 [1] CRAN (R 3.6.0)
nlme          3.1-140    2019-05-12 [1] CRAN (R 3.6.0)
pillar        1.4.2      2019-06-29 [1] CRAN (R 3.6.0)
pkgbuild      1.0.3      2019-03-20 [1] CRAN (R 3.6.0)
pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.6.0)
pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)
prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.6.0)
processx      3.4.1      2019-07-18 [1] CRAN (R 3.6.0)
ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)
purrr       * 0.3.2      2019-03-15 [1] CRAN (R 3.6.0)
quoter        0.1.0      2019-07-28 [1] local
R6            2.4.0      2019-02-14 [1] CRAN (R 3.6.0)
Rcpp          1.0.2      2019-07-25 [1] CRAN (R 3.6.0)
readr       * 1.3.1      2018-12-21 [1] CRAN (R 3.6.0)
readxl        1.3.1      2019-03-13 [1] CRAN (R 3.6.0)
remotes       2.0.4      2019-04-10 [1] CRAN (R 3.6.0)
rlang         0.4.0.9000 2019-08-03 [1] Github (r-lib/rlang@b0905db)
rmarkdown     1.16       2019-10-01 [1] CRAN (R 3.6.0)
rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.0)
rstudioapi    0.10       2019-03-19 [1] CRAN (R 3.6.0)
rvest         0.3.4      2019-05-15 [1] CRAN (R 3.6.0)
scales        1.0.0      2018-08-09 [1] CRAN (R 3.6.0)
selectr       0.4-1      2018-04-06 [1] CRAN (R 3.6.0)
sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)
stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)
stringr     * 1.4.0      2019-02-10 [1] CRAN (R 3.6.0)
testthat      2.2.1      2019-07-25 [1] CRAN (R 3.6.0)
tibble      * 2.1.3      2019-06-06 [1] CRAN (R 3.6.0)
tidyr       * 1.0.0      2019-09-11 [1] CRAN (R 3.6.0)
tidyselect    0.2.5      2018-10-11 [1] CRAN (R 3.6.0)
tidyverse   * 1.2.1      2017-11-14 [1] CRAN (R 3.6.0)
usethis       1.5.0      2019-04-07 [1] CRAN (R 3.6.0)
vctrs         0.2.0.9000 2019-08-03 [1] Github (r-lib/vctrs@11c34ae)
withr         2.1.2      2018-03-15 [1] CRAN (R 3.6.0)
xaringan      0.9        2019-03-06 [1] CRAN (R 3.6.0)
xfun          0.10       2019-10-01 [1] CRAN (R 3.6.0)
xml2          1.2.0      2018-01-24 [1] CRAN (R 3.6.0)
yaml          2.2.0      2018-07-25 [1] CRAN (R 3.6.0)
zeallot       0.1.0      2018-01-28 [1] CRAN (R 3.6.0)

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
These slides are licensed under