ETC3250/5250

**Introduction to Machine Learning**

Lecturer: *Emi Tanaka*

Department of Econometrics and Business Statistics

- Regression models propose y_i = f(x_{i1}, x_{i2}, ..., x_{ip}) + e_i where the goal is to estimate the function f.

- Methods for estimation:
  - **parametric**: assumes the model takes a specific functional form
  - **non-parametric**: makes no, or less specific, assumptions about the functional form
  - **semi-parametric**: (not covered in this unit) combines parametric and non-parametric methods

- In a parametric regression, a specific functional form (and often some type of distribution) is assumed in advance.
- Therefore, if the assumed form is wrong, a fitted parametric regression model can produce a smooth curve that misrepresents the data.
- Non-parametric regression works well for fitting a curve to a scatter plot where noisy data values, sparse data points or weak inter-relationships interfere with your ability to see a line of best fit.
- A drawback of non-parametric regression is that it does not produce an explicit functional form for the fitted model.

- Some methods:
  - **local regression**: sliding window with a regression fitted to each subset of the data
  - **step functions**: cut the range of a predictor into distinct regions
  - **regression splines**: combine polynomials and step functions, fitted over different subsets of a predictor's range

- These methods offer a lot of flexibility, while maintaining the ease and interpretability of linear models.

The `economics` dataset used in the examples is available from http://research.stlouisfed.org/fred2.

- The curve below is a fit of the polynomial model of order 27:

\color{#006DAE}{y_i = \beta_0 + \beta_1x_{i1}+ \beta_2x_{i1}^2 + \cdots + \beta_{27}x_{i1}^{27} + e_i}

- LOESS (LOcal regrESSion) and LOWESS (LOcally WEighted Scatterplot Smoothing) are **non-parametric regression** methods (LOESS is a generalisation of LOWESS).
- **LOESS fits a weighted low order polynomial model to a subset of neighbouring data**.
- A user specified “bandwidth”, “smoothing parameter” or “span” \alpha determines how much of the data is used to fit each local polynomial model.
- A large \alpha produces a smoother fit.
- A small \alpha overfits the data, with the fitted regression capturing the random error in the data.
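The local fitting idea can be sketched by hand in R. This is a minimal illustration and not how `loess` is implemented internally: the helper `local_fit`, the tricube weight function and the simulated data are all assumptions made for this example.

```r
set.seed(1)
n <- 100
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Hypothetical helper: fitted value at a single target point x0
local_fit <- function(x0, x, y, span = 0.75, degree = 2) {
  k <- ceiling(span * length(x))             # number of neighbours in the window
  d <- abs(x - x0)                           # distance of each point from x0
  h <- sort(d)[k]                            # distance to the k-th nearest neighbour
  w <- ifelse(d <= h, (1 - (d / h)^3)^3, 0)  # tricube weights, zero outside the window
  fit <- lm(y ~ poly(x - x0, degree, raw = TRUE), weights = w)
  unname(coef(fit)[1])                       # intercept = fitted value at x0
}

# Slide the window along a grid of target points
grid <- seq(0.05, 0.95, by = 0.05)
fitted_curve <- sapply(grid, local_fit, x = x, y = y, span = 0.3)
```

Smaller values of `span` use fewer neighbours in each local fit, so the resulting curve follows the data more closely.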

The model can be fitted using the `loess` function where

- the *default span* is 0.75 and
- the *default local polynomial degree* is 2.
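As a rough illustration of these defaults and of the effect of the span, the sketch below fits `loess` to simulated data and compares the equivalent number of parameters (`enp`) of each fit. The data and object names are made up for this example.

```r
set.seed(2)
n <- 200
dat <- data.frame(x = seq(0, 1, length.out = n))
dat$y <- sin(4 * pi * dat$x) + rnorm(n, sd = 0.3)

fit_default <- loess(y ~ x, data = dat)              # span = 0.75, degree = 2
fit_wiggly  <- loess(y ~ x, data = dat, span = 0.1)  # small span: follows the noise
fit_smooth  <- loess(y ~ x, data = dat, span = 0.9)  # large span: smoother fit

# The settings actually used are stored in the fitted object
fit_default$pars$span
fit_default$pars$degree

# A smaller span uses more equivalent parameters, i.e. a more flexible fit
c(wiggly = fit_wiggly$enp, default = fit_default$enp, smooth = fit_smooth$enp)
```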

In `ggplot`, you can add the loess fit using `geom_smooth` with `method = loess`, with method arguments passed as a list to `method.args`.
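A minimal sketch of this, assuming the `economics` data that ships with `ggplot2` (the same data used in the spline code later in these notes). The span of 0.2 and degree of 1 are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(economics, aes(date, uempmed)) +
  geom_point() +
  geom_smooth(
    method = loess,
    formula = y ~ x,
    method.args = list(span = 0.2, degree = 1),  # passed on to loess()
    se = FALSE
  )
p
```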

- The idea of a **step function** is to cut up the range of a predictor into distinct regions.
- This essentially converts a continuous predictor into an ordered categorical variable.
- We don’t normally use this idea alone!
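The step function idea can be sketched with `cut` and `lm`. The simulated data and the break points below are arbitrary assumptions for illustration. Regressing on the binned predictor fits a constant (the bin mean) within each region.

```r
set.seed(3)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)

# Convert the continuous predictor into an ordered categorical variable
x_cut <- cut(x, breaks = c(0, 2.5, 5, 7.5, 10),
             include.lowest = TRUE, ordered_result = TRUE)

# Fitted values are constant within each region
fit_step <- lm(y ~ x_cut)
bin_means <- tapply(y, x_cut, mean)
bin_means
```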

- In everyday usage, a spline can be a wooden or metal strip that fits into another part of a machine to make it rotate.

- It can also be a long, thin strip of wood or metal used to draw smooth curves.

- A **spline** is a *piecewise polynomial function* where
  - each piece corresponds to a disjoint subinterval that makes up the range of the variable, and
  - adjacent pieces take the same value at the subinterval boundaries.

- The boundaries of the subintervals are called **knots**.
- The *smoothness* of a spline is based on adjacent polynomial pieces sharing common derivative values up to a certain order.
- The simplest spline consists of step functions (but step functions are not necessarily splines).

- Any spline of a given degree on a given set of knots can be written as a linear combination of B-splines (basis splines).

- A natural cubic spline is a spline of degree 3 whose second derivative is zero at the boundaries (i.e. it is a linear function at the boundaries).
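As a small sketch of these two points, the `splines` package provides `bs` (B-spline basis) and `ns` (natural cubic spline basis), and a regression spline is then an ordinary linear model on the basis columns. The simulated data and the knot locations are assumptions chosen for illustration.

```r
library(splines)

set.seed(4)
dat <- data.frame(x = seq(0, 10, length.out = 200))
dat$y <- sin(dat$x) + rnorm(200, sd = 0.3)

knots <- c(2.5, 5, 7.5)  # three interior knots (arbitrary choice)

# Cubic B-spline basis: 3 knots + degree 3 = 6 basis functions (plus intercept)
fit_bs <- lm(y ~ bs(x, knots = knots, degree = 3), data = dat)

# Natural cubic spline basis: 3 knots + 1 = 4 basis functions (plus intercept)
fit_ns <- lm(y ~ ns(x, knots = knots), data = dat)

# Beyond the boundary knots the natural spline is linear, so the second
# differences of equally spaced predictions there are (numerically) zero
pred_out <- predict(fit_ns, newdata = data.frame(x = c(11, 12, 13)))
diff(pred_out, differences = 2)
```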

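```
library(tidyverse)  # map_dfr, mutate, fct_inorder, ggplot and the economics data

# Stack one copy of the data per knot count so that each facet gets its own fit
map_dfr(c(0, 2, 3, 8, 14, 48),
        ~ economics %>%
          mutate(
            nknot = .x,
            nknot_label = paste(nknot, "knots") %>% fct_inorder()
          )) %>%
  # nknot is mapped as an extra aesthetic so that the formula below can see it
  ggplot(aes(date, uempmed, nknot = nknot)) +
  geom_point() +
  geom_smooth(
    method = lm,
    # ns() with df = k + 1 places k interior knots; nknot is constant within
    # a facet, so nknot[1] gives ns() the single value it expects
    formula = y ~ splines::ns(x, df = nknot[1] + 1),
    se = FALSE,
    colour = "orangered3"
  ) +
  facet_wrap(~ nknot_label) +
  ggtitle("Natural cubic splines")
```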

- Non-parametric regression is useful in data exploration and analysis, although tuning parameters (such as the span or the number of knots) must be chosen carefully to avoid overfitting the data.

ETC3250/5250 Week 2