STAT1003 – Statistical Techniques
Dr. Emi Tanaka
Australian National University
An experiment was conducted to investigate the effect of advertising media on sales. The predictors are the advertising budgets (in thousands of dollars) for youtube, facebook and newspaper, and the response variable is sales (in thousands of units).
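The general multiple linear regression model is: \[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i\] where \(y_i\) is the response for the \(i\)-th observation, \(x_{ij}\) is the value of the \(j\)-th predictor for the \(i\)-th observation, \(\beta_0\) is the intercept, \(\beta_1, \ldots, \beta_k\) are the regression coefficients, and \(\varepsilon_i\) is the random error.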
The multiple linear regression model can be expressed in matrix form as:
\[\boldsymbol{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\]
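where \(\boldsymbol{y}\) is the \(n \times 1\) vector of responses, \(\mathbf{X}\) is the \(n \times (k+1)\) design matrix (with a column of ones for the intercept), \(\boldsymbol{\beta}\) is the \((k+1) \times 1\) vector of coefficients, and \(\boldsymbol{\varepsilon}\) is the \(n \times 1\) vector of errors.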
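Written out for \(n\) observations and \(k\) predictors, using the indexing of the scalar model above, the pieces look as follows. The explicit layout is a sketch inferred from that model; the first column of ones in \(\mathbf{X}\) carries the intercept.

\[\boldsymbol{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}\]

Row \(i\) of \(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\) then reproduces \(y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i\).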
The least squares estimator of the coefficients \(\boldsymbol{\beta}\) is given by:
\[\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{y}.\]
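The variance of the estimator, assuming uncorrelated errors with constant variance \(\sigma^2\), is:
\[\text{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}.\]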
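The two matrix formulas can be checked numerically. The following is a minimal sketch on simulated data; the variable names, sample size and true coefficients are all made up for illustration. It compares the formulas with what lm() reports, with \(\sigma^2\) replaced by the usual estimate, residual SS divided by \(n\) minus the number of coefficients.

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)     # simulated response (illustrative values)

X <- cbind(1, x1, x2)                      # design matrix with a column of ones

# Least squares estimate: solve (X'X) b = X'y rather than inverting explicitly
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

fit <- lm(y ~ x1 + x2)
all.equal(as.numeric(beta_hat), unname(coef(fit)))

# Estimated variance of beta_hat: sigma^2 is replaced by its estimate
sigma2_hat <- sum(residuals(fit)^2) / (n - ncol(X))
var_hat    <- sigma2_hat * solve(t(X) %*% X)
all.equal(var_hat, vcov(fit), check.attributes = FALSE)
```

The square roots of the diagonal of var_hat are the standard errors shown in summary(fit).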
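To test whether the \(j\)-th predictor contributes to the model given the other predictors, we test \(H_0: \beta_j = 0\) vs \(H_A: \beta_j \neq 0\) using the test statistic
\[t^* = \frac{\hat{\beta}_j - 0}{\text{SE}(\hat{\beta}_j)} = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)},\] where \(\beta_j\) has been set to its hypothesised value of 0.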
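The test statistic can be recomputed from the coefficient table that R produces. This is a sketch on simulated data in which x2 has no true effect by construction (all names and values are illustrative). The two-sided p-value uses a t distribution with the residual degrees of freedom, which is what summary() reports for lm fits.

```r
set.seed(2)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 + 1.5 * x1 + rnorm(n)              # x2 is unrelated to y by construction

fit <- lm(y ~ x1 + x2)
tab <- summary(fit)$coefficients           # Estimate, Std. Error, t value, Pr(>|t|)

# t* = estimate / standard error, since the hypothesised value is 0
t_star <- tab[, "Estimate"] / tab[, "Std. Error"]

# Two-sided p-value from the t distribution with residual degrees of freedom
p_val <- 2 * pt(abs(t_star), df = df.residual(fit), lower.tail = FALSE)

all.equal(t_star, tab[, "t value"])
all.equal(p_val, tab[, "Pr(>|t|)"])
```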
The model formula in R provides a convenient way to specify the structure of the regression model, including main effects and interaction effects.
Intercept included by default:
y ~ 1 + x1 + x2 is equivalent to y ~ x1 + x2.
Removing intercept:
y ~ 0 + x1 + x2 and y ~ -1 + x1 + x2 both remove the intercept term in the model.
Main and interaction effects:
y ~ x1 * x2 is equivalent to y ~ x1 + x2 + x1:x2.
y ~ x1 * x2 * x3 is equivalent to
y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3 + x1:x2:x3.
Main effects and two-way interaction effects only:
y ~ (x1 + x2 + x3)^2 and y ~ x1 * x2 * x3 - x1:x2:x3 are equivalent to
y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3.
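These equivalences can be inspected without fitting anything, because terms() expands a formula symbolically. The sketch below uses two small helpers, term_labels() and has_intercept(), which are hypothetical names introduced here for illustration.

```r
# Helpers (hypothetical names): inspect how R expands a model formula
term_labels   <- function(f) attr(terms(f), "term.labels")
has_intercept <- function(f) attr(terms(f), "intercept") == 1

# Main and interaction effects
term_labels(y ~ x1 * x2)
term_labels(y ~ x1 * x2 * x3)

# Main effects and two-way interactions only: both spellings give the same terms
two_way_a <- term_labels(y ~ (x1 + x2 + x3)^2)
two_way_b <- term_labels(y ~ x1 * x2 * x3 - x1:x2:x3)
setequal(two_way_a, two_way_b)

# Intercept handling
has_intercept(y ~ x1 + x2)        # intercept included by default
has_intercept(y ~ 0 + x1 + x2)    # removed
has_intercept(y ~ -1 + x1 + x2)   # removed
```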
Assuming that x1 and x2 are numerical variables, the model formula log(y) ~ log(x1) * x2 corresponds to
\[\log(y_i) = \beta_0 + \beta_1 \log(x_{i1}) + \beta_2 x_{i2} + \beta_3 \log(x_{i1}) \times x_{i2} + \varepsilon_i.\]
In the formula, log(youtube) * facebook allows the effect of (log) youtube advertising on sales to depend on the level of facebook advertising, and vice versa. The fitted model is:
\[\widehat{\log(\texttt{sales})} \approx 1.14 + 0.27\times \log(\texttt{youtube}) - 0.00025\times \texttt{facebook} + 0.0024 \times \log(\texttt{youtube})\times \texttt{facebook}\]
Taking the exponential of both sides gives:
\[\widehat{\texttt{sales}} \approx \texttt{youtube}^{0.27 + 0.0024 \times \texttt{facebook}}\exp\left(1.14 - 0.00025\times \texttt{facebook}\right)\]
For example, for youtube = 50 and facebook = 30 (budgets in thousands of dollars):
\[\widehat{\log(\texttt{sales})} \approx 1.14 + 0.27\times \log(50) - 0.00025\times 30 + 0.0024 \times \log(50)\times 30 \approx 2.47\]
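The prediction is on the log scale, so it has to be exponentiated to return to the sales scale. A small sketch of that arithmetic, using the rounded coefficients quoted above (so the result is approximate) and evaluating both the log-scale and the exponentiated forms:

```r
# Rounded coefficients from the fitted model quoted above
b <- c(intercept = 1.14, log_youtube = 0.27, facebook = -0.00025, interaction = 0.0024)

youtube  <- 50   # budget in thousands of dollars
facebook <- 30

# Prediction on the log scale
log_sales <- b[["intercept"]] + b[["log_youtube"]] * log(youtube) +
  b[["facebook"]] * facebook + b[["interaction"]] * log(youtube) * facebook

# Back-transform, two equivalent ways
sales_a <- exp(log_sales)
sales_b <- youtube^(b[["log_youtube"]] + b[["interaction"]] * facebook) *
  exp(b[["intercept"]] + b[["facebook"]] * facebook)

round(log_sales, 2)   # 2.47
round(sales_a, 1)     # 11.8
all.equal(sales_a, sales_b)
```

With sales measured in thousands of units, the back-transformed prediction is roughly 11.8 thousand units.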
\[\begin{align*} \text{Total SS} &= \text{Regression SS} + \text{Residual SS}\\ \sum_{i=1}^n (y_i - \bar{y})^2 &= \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n (y_i - \hat{y}_i)^2 \end{align*}\]
The coefficient of determination \(R^2\) measures the proportion of variability in the response explained by the model:
\[R^2 = 1 - \frac{\text{Residual SS}}{\text{Total SS}}\]
If the linear regression includes an intercept term, then \(R^2\) can be calculated as the square of the correlation between the observed and fitted values of the response variable: \[R^2 = r_{y, \hat{y}}^2 = \frac{\left(\sum_{i=1}^n (y_i - \bar{y})(\hat{y}_i - \bar{y})\right)^2}{\sum_{i=1}^n (y_i - \bar{y})^2\sum_{i=1}^n (\hat{y}_i - \bar{y})^2} = \dfrac{\sum_{i=1}^n (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^n (y_i - \bar{y})^2}\]
In a simple linear regression, \(R^2\) is also equal to \(r^2_{xy}\) (the square of the sample correlation between \(y\) and \(x\)).
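These identities are easy to confirm numerically. The sketch below uses simulated data (names and values are illustrative) and fits models that include an intercept, which the correlation-based identities require.

```r
set.seed(3)
n  <- 80
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + x1 + 0.5 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

total_ss      <- sum((y - mean(y))^2)
regression_ss <- sum((fitted(fit) - mean(y))^2)
residual_ss   <- sum(residuals(fit)^2)

# Total SS = Regression SS + Residual SS
all.equal(total_ss, regression_ss + residual_ss)

# Three ways to get R^2 in a model with an intercept
r2 <- 1 - residual_ss / total_ss
all.equal(r2, regression_ss / total_ss)
all.equal(r2, cor(y, fitted(fit))^2)
all.equal(r2, summary(fit)$r.squared)

# Simple linear regression: R^2 equals the squared correlation of x and y
fit_simple <- lm(y ~ x1)
all.equal(summary(fit_simple)$r.squared, cor(x1, y)^2)
```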
Adding predictors will never decrease \(R^2\), even if the predictors are not useful.
Adjusted \(R^2\) penalizes the inclusion of unnecessary predictors:
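\[ R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - p} \] where \(n\) is the number of observations and \(p\) is the number of estimated coefficients (including the intercept).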
This measure may decrease if a new predictor does not improve the model sufficiently.
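The behaviour of the two measures can be compared by adding a predictor that is pure noise. This is a sketch on simulated data; adj_r2() is a hypothetical helper that applies the formula above, taking \(p\) to be the number of estimated coefficients including the intercept, which is the reading under which it matches summary().

```r
set.seed(4)
n     <- 60
x1    <- rnorm(n)
noise <- rnorm(n)                       # unrelated to y by construction
y     <- 1 + 2 * x1 + rnorm(n)

fit_small <- lm(y ~ x1)
fit_big   <- lm(y ~ x1 + noise)

# Hypothetical helper: adjusted R^2 from the formula, p = number of coefficients
adj_r2 <- function(fit) {
  r2 <- summary(fit)$r.squared
  n  <- nobs(fit)
  p  <- length(coef(fit))
  1 - (1 - r2) * (n - 1) / (n - p)
}

# R^2 does not go down when a predictor is added
summary(fit_big)$r.squared >= summary(fit_small)$r.squared

# The formula agrees with what summary() reports
all.equal(adj_r2(fit_big), summary(fit_big)$adj.r.squared)

# Compare adjusted R^2 with and without the noise predictor
c(small = adj_r2(fit_small), big = adj_r2(fit_big))
```

Whether the adjusted value actually drops depends on the simulated sample, so the last line is left for inspection rather than asserted.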
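Multicollinearity occurs when predictors are highly correlated with each other. This can lead to inflated standard errors for the coefficient estimates, and to estimates that are unstable and difficult to interpret individually.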
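One way to see the effect is through \(\text{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}\): when two columns of \(\mathbf{X}\) are nearly the same, \(\mathbf{X}^\top \mathbf{X}\) is close to singular and its inverse has large entries. The sketch below, on simulated data with illustrative names, compares the standard error of the x1 coefficient when the second predictor is independent of x1 and when it is almost a copy of x1.

```r
set.seed(5)
n            <- 100
x1           <- rnorm(n)
x2_indep     <- rnorm(n)                   # unrelated to x1
x2_collinear <- x1 + rnorm(n, sd = 0.05)   # nearly a copy of x1
y            <- 1 + 2 * x1 + rnorm(n)

# Hypothetical helper: standard error of the x1 coefficient
se_x1 <- function(fit) summary(fit)$coefficients["x1", "Std. Error"]

se_indep     <- se_x1(lm(y ~ x1 + x2_indep))
se_collinear <- se_x1(lm(y ~ x1 + x2_collinear))

cor(x1, x2_collinear)                      # close to 1
c(independent = se_indep, collinear = se_collinear)
```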
\[\texttt{weight}_i = \begin{cases} \mu_F + \varepsilon_i & \text{if female} \\ \mu_M + \varepsilon_i & \text{if male}\end{cases}\]
\[\texttt{weight}_i = \begin{cases} \gamma_0 + \varepsilon_i & \text{if female}\\ \gamma_0 + \gamma_1 + \varepsilon_i & \text{if male} \end{cases}\]
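\[\texttt{weight}_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i\] where \(x_{1i} = 1\) if the \(i\)-th observation is female and \(0\) otherwise, and \(x_{2i} = 1\) if the \(i\)-th observation is male and \(0\) otherwise.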
Equivalence: \(\mu_F = \gamma_0 = \beta_1\) and \(\mu_M = \gamma_0 + \gamma_1 = \beta_2\).
-1 in the formula removes the intercept.
I() allows us to include an expression as a predictor.
The above model is equivalent to: \[\texttt{weight}_i = \gamma_0 + \gamma_1 x_i + \varepsilon_i\] where \(x_i = 1\) if the \(i\)-th observation is male and \(0\) otherwise.
Recall \(\gamma_1 = \mu_M - \mu_F\) is the difference in mean weight between males and females, and \(\gamma_0 = \mu_F\) is the mean weight for females.
So the average weight for males is \(\hat{\gamma}_0 + \hat{\gamma}_1\) and the average weight for females is \(\hat{\gamma}_0\).
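The link between the parameterisations can be checked by fitting them to the same data. This is a sketch with simulated weights; the group means, spread and sample sizes are made up, and the indicator variables x1 and x2 are built by hand to match the notation above.

```r
set.seed(6)
sex    <- rep(c("F", "M"), each = 25)
weight <- ifelse(sex == "F", 62, 78) + rnorm(50, sd = 8)   # hypothetical weights

x1 <- as.numeric(sex == "F")   # 1 if female, 0 otherwise
x2 <- as.numeric(sex == "M")   # 1 if male, 0 otherwise

# No intercept, one indicator per group: beta_1 = mu_F, beta_2 = mu_M
fit_means <- lm(weight ~ -1 + x1 + x2)

# Intercept plus male indicator: gamma_0 = mu_F, gamma_1 = mu_M - mu_F
fit_ref <- lm(weight ~ x2)

mean_F <- mean(weight[sex == "F"])
mean_M <- mean(weight[sex == "M"])

all.equal(unname(coef(fit_means)), c(mean_F, mean_M))
all.equal(unname(coef(fit_ref)), c(mean_F, mean_M - mean_F))

# Both parameterisations give the same fitted values
all.equal(fitted(fit_means), fitted(fit_ref))
```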
Do you notice something between the two approaches?
