
STAT1003 – Statistical Techniques
Dr. Emi Tanaka
Australian National University
We seek to model the relationship between a response variable \(y\) and a predictor variable \(x\) using a straight line:
\[y = f(x) = \beta_0 + \beta_1 x.\]
\[y_i = \underbrace{\beta_0}_{\text{intercept}} + \underbrace{\beta_1}_{\text{slope}} x_i + \underbrace{\epsilon_i}_{\text{error}}\]
for the \(i\)-th observation.
\[\text{RSS}(\beta_0, \beta_1) = \sum_{i=1}^n \left(\underbrace{y_i - (\beta_0 + \beta_1 x_i)}_{\text{residual}}\right)^2\]
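To make the residual sum of squares concrete, here is a minimal Python sketch that evaluates it for two candidate lines. The five data points and the two candidate coefficient pairs are made up for illustration and do not come from the slides.

```python
# Hypothetical toy data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def rss(beta0, beta1, x, y):
    """Residual sum of squares for the line y = beta0 + beta1 * x."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))


# Two candidate lines; the one with the smaller RSS fits these points better.
rss_a = rss(2.0, 0.5, x, y)
rss_b = rss(2.2, 0.6, x, y)
print(rss_a, rss_b)
```

Least squares chooses the pair \((\beta_0, \beta_1)\) that makes this quantity as small as possible.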
\[\begin{align*} \frac{\partial \text{RSS}}{\partial \beta_0} &= -2\sum_{i=1}^n \left(y_i - (\beta_0 + \beta_1 x_i)\right) = 0\\ \frac{\partial \text{RSS}}{\partial \beta_1} &= -2\sum_{i=1}^n x_i \left(y_i - (\beta_0 + \beta_1 x_i)\right) = 0 \end{align*}\]
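As a sketch of the algebra behind the next step, dividing each equation by \(-2\) and expanding the sums gives the so-called normal equations:
\[\begin{align*} n\beta_0 + \beta_1 \sum_{i=1}^n x_i &= \sum_{i=1}^n y_i\\ \beta_0 \sum_{i=1}^n x_i + \beta_1 \sum_{i=1}^n x_i^2 &= \sum_{i=1}^n x_i y_i \end{align*}\]
Dividing the first equation by \(n\) gives \(\beta_0 = \bar{y} - \beta_1 \bar{x}\). Substituting this into the second equation gives
\[\beta_1 \left(\sum_{i=1}^n x_i^2 - n\bar{x}^2\right) = \sum_{i=1}^n x_i y_i - n\bar{x}\bar{y},\]
and the two sides can be rewritten using \(\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2\) and \(\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n x_i y_i - n\bar{x}\bar{y}\).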
Solving the above equations gives the least squares estimates:
\[\begin{align*} \hat{\beta}_1 &= \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{s_{xy}}{s_x^2}\\ \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \end{align*}\]
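where \(\bar{x}\) and \(\bar{y}\) are the sample means of \(x\) and \(y\), \(s_{xy}\) is the sample covariance between \(x\) and \(y\), and \(s_x^2\) is the sample variance of \(x\).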
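The closed-form estimates can be computed directly from their definitions. The Python sketch below uses made-up toy data and a hypothetical helper name, `least_squares`; neither comes from the slides. The \(n - 1\) divisors in \(s_{xy}\) and \(s_x^2\) cancel in the ratio, so the code works with the raw sums.

```python
def least_squares(x, y):
    """Return (beta0_hat, beta1_hat) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means.
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sum_xy / sum_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Hypothetical toy data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

beta0_hat, beta1_hat = least_squares(x, y)
print(beta0_hat, beta1_hat)
```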
\[\texttt{Weight}_i=\beta_0 + \beta_1\texttt{Length}_i + e_i\]
What does fit even contain?
Information can be extracted from the fitted model with coef(), fitted(), predict(), residuals(), and sigma().
\[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i.\]

\[\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x.\]

\[\hat{\epsilon}_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i).\]
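The three quantities above (fitted values, a prediction at a new \(x\), and residuals) can be sketched in Python as follows. The toy data and the helper name `predict` are assumptions for illustration. The code only loosely mirrors what fitted(), predict(), and residuals() report in R.

```python
# Hypothetical toy data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Least squares estimates from the closed-form formulas.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
beta1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
beta0_hat = y_bar - beta1_hat * x_bar


def predict(x_new):
    """Predicted response at a (possibly unobserved) value of x."""
    return beta0_hat + beta1_hat * x_new


# Fitted values are predictions at the observed x values.
fitted = [predict(xi) for xi in x]

# Residuals are observed minus fitted values.
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(fitted)
print(residuals)
print(predict(6.0))
```

Because the estimates solve the two first-order conditions, the residuals sum to zero and are orthogonal to \(x\), up to floating-point error.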
geom_smooth() makes it easy to add the model to a scatter plot.
ggpubr::stat_regline_equation() adds the regression line equation to the plot.
Recall that the correlation coefficient
\[r = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}} = \frac{s_{xy}}{s_x s_y}\]
is a measure of the strength and direction of the linear relationship between two variables.
Relationship between the slope and the correlation coefficient:
\[\hat{\beta}_1 = \frac{s_{xy}}{s_{x}^2} = \frac{s_{xy}}{s_x s_y}\frac{s_y}{s_x} = r \frac{s_y}{s_x}.\]
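The identity can be checked numerically. The Python sketch below uses made-up toy data and computes the slope both ways, once as \(s_{xy}/s_x^2\) and once as \(r\, s_y/s_x\).

```python
import math

# Hypothetical toy data (not from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and standard deviations (divisor n - 1).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

r = s_xy / (s_x * s_y)

slope_from_covariance = s_xy / s_x ** 2
slope_from_correlation = r * s_y / s_x

print(slope_from_covariance, slope_from_correlation)
```

The slope therefore has the same sign as \(r\), and the two coincide when \(x\) and \(y\) have the same standard deviation.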
viewof nsample = Inputs.number([20, 1000], {step: 20, value: 200, label: "n"})
viewof intercept = Inputs.range([-10, 10], {step: 0.05, value: 0.8, label: "β₀"})
viewof slope = Inputs.range([-10,10], {step: 0.05, value: 0.8, label: "β₁"})
viewof sigma = Inputs.range([0.1, 20], {step: 0.1, value: 3, label: "σ"})
\[y_i = \beta_0 + \beta_1 x_i + \epsilon_i\]
where \(\epsilon_i \stackrel{iid}{\sim} N(0, \sigma^2)\) for \(i = 1, 2, \ldots, n\).
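A minimal Python sketch of simulating from this model and refitting it by least squares is given below. The defaults reuse the slider starting values (\(n = 200\), \(\beta_0 = 0.8\), \(\beta_1 = 0.8\), \(\sigma = 3\)). How \(x\) is generated is not stated, so drawing it uniformly on \([0, 10]\) is an assumption, as are the seed and the helper names `simulate` and `least_squares`.

```python
import random


def simulate(n=200, beta0=0.8, beta1=0.8, sigma=3.0, seed=1003):
    """Draw (x, y) from y_i = beta0 + beta1 * x_i + eps_i, eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # Assumption: x is uniform on [0, 10]; the slides do not specify this.
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y


def least_squares(x, y):
    """Closed-form least squares estimates (beta0_hat, beta1_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
                 / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - beta1_hat * x_bar, beta1_hat


x, y = simulate()
beta0_hat, beta1_hat = least_squares(x, y)
print(beta0_hat, beta1_hat)
```

Rerunning with a larger `sigma` or a smaller `n` should make the estimates vary more around the true values from one seed to the next.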
