Inference for linear regression

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University

Making inferences

The standard deviation of an estimator is referred to as the standard error.
In making inferences, you may like to know:
- what is the standard error of \(\hat{\beta}_1\)?
- is \(\beta_1 \neq 0\)?
- what is the confidence interval of the average \(y\) for a given \(x\)?
- what is the prediction interval of \(y\) for a given \(x\)?
It’s important to note that making inferences requires the assumptions of the linear regression model to be satisfied.
So check model assumptions first!

Hypothesis testing for regression parameters

Hypothesis: Suppose we want to test if the \(j\)-th regression parameter is significant: \[H_0: \beta_j = 0 \quad \text{vs} \quad H_A: \beta_j \neq 0\] where \(j \in \{0, \ldots, p - 1\}\) and \(p\) is the number of regression parameters. Note that for simple linear regression \(p = 2\) (the intercept \(\beta_0\) and the slope \(\beta_1\)).
Assumption: suppose the errors are independent and identically normally distributed with mean \(0\) and constant variance \(\sigma^2\).
Test statistic: The test statistic and its distribution under \(H_0\) are \[t = \dfrac{\hat{\beta}_j - 0}{\text{SE}(\hat{\beta}_j)} = \dfrac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)} \sim t_{n-p},\] where \(\text{SE}(\hat{\beta}_j)\) is the standard error of \(\hat{\beta}_j\) and \(n\) is the number of observations.
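As a worked illustration of the test statistic for the slope in simple linear regression, here is a minimal sketch in plain Python. The data and the helper name `fit_simple_lm` are made up for illustration and do not come from the slides; the standard error formulas are the usual closed forms for simple linear regression.

```python
import math

def fit_simple_lm(x, y):
    """Least-squares fit of y = b0 + b1 * x, with standard errors."""
    n = len(x)
    p = 2  # number of regression parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope estimate
    b0 = y_bar - b1 * x_bar        # intercept estimate
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - p)     # estimate of the error variance
    se_b0 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / sxx))
    se_b1 = math.sqrt(sigma2_hat / sxx)
    return {"b0": b0, "b1": b1, "se_b0": se_b0, "se_b1": se_b1,
            "sigma2_hat": sigma2_hat, "df": n - p}

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

fit = fit_simple_lm(x, y)
# Test statistic for H0: beta_1 = 0, to be compared with a t distribution on n - p df.
t_stat = fit["b1"] / fit["se_b1"]
print(fit["b1"], fit["se_b1"], t_stat, fit["df"])
```

A large \(|t|\) relative to the \(t_{n-p}\) distribution is evidence against \(H_0\).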

Hypothesis testing for regression parameters: P-value

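P-value: \(P(|t_{n - p}| > |t|)\), where \(t_{n-p}\) denotes a random variable following the \(t\) distribution with \(n - p\) degrees of freedom and \(t\) is the observed test statistic.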
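To make the two-sided P-value concrete, the sketch below integrates the \(t\) density numerically using only the Python standard library. The function names `t_pdf` and `two_sided_p_value` and the inputs (an observed statistic of 2.1213 on 3 degrees of freedom) are illustrative assumptions; in practice statistical software computes this for you.

```python
import math

def t_pdf(u, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_obs, df, n_steps=2000):
    """P(|T| > |t_obs|) for T ~ t_df.

    Uses composite Simpson's rule on [0, |t_obs|]; n_steps must be even.
    """
    a = abs(t_obs)
    if a == 0:
        return 1.0
    h = a / n_steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n_steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    inner = total * h / 3      # approximates P(0 < T < |t_obs|)
    return 1 - 2 * inner       # the density is symmetric about 0

# Hypothetical observed statistic and degrees of freedom.
p_value = two_sided_p_value(2.1213, df=3)
print(p_value)
```

A P-value below the chosen significance level (for example 0.05) leads to rejecting \(H_0\).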

Model objects to tidy data

So how do I extract these summary values from the model object?

Confidence interval for regression parameters

The \(100(1-\alpha)\%\) confidence interval for the \(j\)-th regression parameter is

\[\hat{\beta}_j \pm t_{n-p, \alpha/2} \times \text{SE}(\hat{\beta}_j).\]

For \(\alpha = 0.05\), this gives the 95% confidence intervals for the regression parameters.
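The interval formula can be sketched in a few lines of Python. The estimate, standard error and degrees of freedom below are hypothetical, the helper name `confidence_interval` is made up for this example, and the critical value is taken from a \(t\) table rather than computed.

```python
def confidence_interval(estimate, se, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * se."""
    margin = t_crit * se
    return (estimate - margin, estimate + margin)

# Hypothetical slope estimate and standard error with n - p = 3.
slope_hat = 0.6
se_slope = 0.2828
t_crit = 3.182446  # t_{3, 0.025} read from a t table (assumed, not computed here)

lower, upper = confidence_interval(slope_hat, se_slope, t_crit)
print(lower, upper)
```

With these numbers the interval contains 0, which matches not rejecting \(H_0: \beta_1 = 0\) at the 5% level.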

Confidence interval for the response

The 95% confidence interval for the mean response at \(x = 5\) is \[\hat{y} \pm t_{n-p, \alpha/2} \times \text{SE}(\hat{y})\] where \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \times 5\).

Prediction interval for the response

The prediction interval also accounts for the variability of a new observation's error term, so it is always wider than the corresponding confidence interval for the mean response.
The 95% prediction interval for the response at \(x = 5\) is \[\hat{y} \pm t_{n-p, \alpha/2} \times \sqrt{\text{SE}(\hat{y})^2 + \hat{\sigma}^2}\] where \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \times 5\).
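The sketch below computes both the confidence interval for the mean response and the prediction interval at \(x = 5\) in plain Python. The data are made up for illustration, the critical value comes from a \(t\) table, and \(\text{SE}(\hat{y})\) uses the standard closed form for simple linear regression, \(\hat{\sigma}\sqrt{1/n + (x - \bar{x})^2 / \sum_i (x_i - \bar{x})^2}\), which is an assumption not written out in the slides.

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_new = 5
t_crit = 3.182446  # t_{n-p, 0.025} with n - p = 3, from a t table

n, p = len(x), 2
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sigma2_hat = rss / (n - p)  # estimate of the error variance

y_hat = b0 + b1 * x_new
# Standard error of the estimated mean response at x_new.
se_mean = math.sqrt(sigma2_hat * (1 / n + (x_new - x_bar) ** 2 / sxx))
# The prediction standard error adds the error variance.
se_pred = math.sqrt(se_mean ** 2 + sigma2_hat)

conf_int = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pred_int = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
print("confidence interval:", conf_int)
print("prediction interval:", pred_int)
```

Both intervals are centred at the same \(\hat{y}\); only the standard error differs.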

Standard error for the response

\[\text{SE}(\hat{y}) = \sqrt{\text{Var}(\hat{\beta}_0 + \hat{\beta}_1 x)} = \sqrt{\text{Var}(\hat{\beta}_0) + x^2 \text{Var}(\hat{\beta}_1) + 2x \text{Cov}(\hat{\beta}_0, \hat{\beta}_1)}.\]
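
For simple linear regression, the variances and covariance on the right-hand side have standard closed forms. Writing \(S_{xx} = \sum_{i=1}^n (x_i - \bar{x})^2\) (notation introduced here for convenience, not taken from the slides),

\[\text{Var}(\hat{\beta}_0) = \sigma^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right), \quad \text{Var}(\hat{\beta}_1) = \dfrac{\sigma^2}{S_{xx}}, \quad \text{Cov}(\hat{\beta}_0, \hat{\beta}_1) = -\dfrac{\sigma^2 \bar{x}}{S_{xx}}.\]

Substituting these, collecting \(\bar{x}^2 - 2x\bar{x} + x^2 = (x - \bar{x})^2\), and replacing \(\sigma\) by its estimate \(\hat{\sigma}\) gives

\[\text{SE}(\hat{y}) = \hat{\sigma}\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{S_{xx}}},\]

so the standard error is smallest at \(x = \bar{x}\) and grows as \(x\) moves away from the mean of the observed \(x\) values.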

Summary

Source: xkcd

Fitting a linear model:

Getting the model summary:

Getting just the regression coefficient estimates:

Getting the regression coefficient table in the model summary as a tibble:

Getting the fitted values \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\):

Getting the residuals \((y_i - \hat{y}_i)\):

Augment the data with fitted values, residuals, etc:

Getting the deviance (residual sum of squares):

The estimate of the error standard deviation \(\hat{\sigma}\):

Getting the influence measures such as Cook’s distance and leverage values:

Selecting \(\lambda\) for the Box-Cox transformation:

Quick diagnostic plots:

Confidence interval for regression parameters:

Prediction for mean response:

Confidence interval for mean response:

Prediction interval for response:

Standard error for prediction of mean response: