## ETC3250/5250

Introduction to Machine Learning

### Neural network I

Lecturer: Emi Tanaka

Department of Econometrics and Business Statistics

## Supervised learning

• So far you have learnt about:
• linear, non-linear & logistic regression
• linear & quadratic discriminant analysis
• decision trees
• tree-ensemble methods (bagging, boosting, random forest)
• k-nearest neighbours
• support vector machine methods
• But these methods still don’t perform well for some tasks, e.g. image recognition.

## Human brain

• Our brains can dissect and process features of images, e.g. the shape, object, lighting, etc.
• The human brain is made of billions of neurons that communicate via electrochemical signals.
• So how do we mimic this in a program?

# Artificial neuron

## Biological neuron model

• An artificial neural network, often referred to simply as a neural network, was inspired by biological neural networks.
• In a biological neural network, a collection of neurons interconnected by synapses carries out a specific function when activated.
• The dendrites receive synaptic inputs and propagate electrochemical stimulation to the cell body; if stimulated enough, the neuron fires an action potential, which in turn provides synaptic input to other neurons.

## Artificial neuron: dendrites

• The artificial neuron is the elementary unit of an artificial neural network.
• An artificial neuron receives predictors \boldsymbol{x}_i = (1, x_{i1}, \dots, x_{ip})^\top, which are typically combined as a weighted sum:

z_i = \beta_0 + \sum_{j=1}^p\beta_jx_{ij} = \boldsymbol{\beta}^\top\boldsymbol{x}_i, \quad\text{where }\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_p)^\top.
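To make the weighted sum concrete, here is a minimal base-R sketch using the coefficients \boldsymbol{\beta} = (1, 0.5, -3)^\top from the worked example later in this deck:

```r
# Weighted sum z_i = beta^T x_i for a single artificial neuron.
beta <- c(1, 0.5, -3)   # (beta_0, beta_1, beta_2)
x    <- c(1, 1, 3)      # (1, x_1, x_2); the leading 1 matches beta_0
z    <- sum(beta * x)   # beta^T x
z
#> [1] -7.5
```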

## Artificial neuron: action potential

• The weighted sum z_i then gets passed into an activation function, h(z_i).
• In the forms below, a is a location parameter and w is a scale parameter.
• Some common choices include:
• Heaviside step: h(z_i) = \mathbb{I}(z_i > 0),
• Linear: h(z_i) = z_i,
• Sigmoid: h(z_i|a,w) = a + w(1+e^{-z_i})^{-1},
• Tanh: h(z_i|a,w) = a + w \left(\frac{2}{1+e^{-2z_i}} - 1 \right), and
• ReLU: h(z_i|a,w) = a + w \times \max(0, z_i).
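For reference, a minimal sketch of these activation functions in base R (the function names are illustrative):

```r
heaviside <- function(z) as.numeric(z > 0)
linear    <- function(z) z
sigmoid   <- function(z, a = 0, w = 1) a + w / (1 + exp(-z))
tanh_act  <- function(z, a = 0, w = 1) a + w * (2 / (1 + exp(-2 * z)) - 1)
relu      <- function(z, a = 0, w = 1) a + w * pmax(0, z)

sigmoid(0)            # 0.5: the sigmoid is centred at z = 0
heaviside(c(-1, 2))   # 0 1
```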

## Visualising an artificial neuron

• When x_1 = 1 and x_2 = 3, then \begin{align*}z &= \beta_0 + \beta_1x_1 + \beta_2 x_2\\ &= 1 + 0.5 \times 1 - 3\times 3 = -7.5.\end{align*}

• Using ReLU with a = 1, w = 2, the prediction is 1 + 2 \times \max(0, z) = 1.
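We can verify this computation with a short base-R sketch:

```r
# z = beta^T x with beta = (1, 0.5, -3) and x = (1, 1, 3)
z <- sum(c(1, 0.5, -3) * c(1, 1, 3))   # -7.5
# ReLU with location a = 1 and scale w = 2
1 + 2 * max(0, z)
#> [1] 1
```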

## Predicting from an artificial neuron

• What is the prediction when x_1 = 10? Answer: 3.

• What is the prediction when x_1 = -5? Answer: -11.

# Activation function

## Heaviside step

h(z_i) = \mathbb{I}(z_i > 0)

• This is also known as the perceptron and is used for classification.
• For example: h(1 + 2x_i) and h(-12 + 4x_i).

## Linear

h(z_i) = z_i

• This is a regression model!
• For example: h(1 + 2x_i) and h(-12 + 4x_i).

## Sigmoid (Logistic)

h(z_i|a,w) = a + w(1+e^{-z_i})^{-1}

• When a = 0 and w = 1, then 0 < h(z_i) < 1 for finite z_i.
• E.g. h(1 + 2x_i|a = 0, w = 1) and h(-12 + 4x_i|a = -1, w = 2).

## Hyperbolic tangent (Tanh)

h(z_i|a,w) = a + w \left(\frac{2}{1+e^{-2z_i}} - 1 \right)

• Similar in shape to the Sigmoid, but when a = 0 and w = 1 the output lies in (-1, 1) and is centred at zero.
• E.g. h(1 + 2x_i|a = 0, w = 1) and h(-12 + 4x_i|a = -1, w = 2).

## Rectified linear unit (ReLU)

h(z_i|a,w) = a + w \times \max(0, z_i)

• In modern practice, the Sigmoid and Tanh activations have largely been replaced by the rectified linear unit (ReLU).
• E.g. h(1 + 2x_i|a = 0, w = 1) and h(-12 + 4x_i|a = -1, w = 2).
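To see the shapes side by side, a quick base-R plotting sketch (with a = 0 and w = 1 throughout):

```r
z <- seq(-4, 4, by = 0.01)
plot(z, pmax(0, z), type = "l", ylab = "h(z)", ylim = c(-1, 2))  # ReLU
lines(z, 1 / (1 + exp(-z)), lty = 2)                             # Sigmoid
lines(z, tanh(z), lty = 3)                                       # Tanh
legend("topleft", legend = c("ReLU", "Sigmoid", "Tanh"), lty = 1:3)
```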

# Neural network

## Limitations of a single artificial neuron

• Biological neurons are interconnected in complex networks that allow the brain to perform a wide range of functions.
• A single artificial neuron has a limited ability to model complex relationships, so, much like its biological counterpart, we can connect artificial neurons together to model complex relationships.
• An artificial neural network, an interconnection of artificial neurons, began by mimicking the architecture of the brain.
• But neural networks are no longer true to their biological counterpart; their development is now motivated by empirical results.

## Multiple artificial neurons

• When combining artificial neurons, we always set a = 0 and w = 1 for Sigmoid, Tanh, and ReLU.
(Diagram: two artificial neurons whose outputs are combined into a single prediction.)

## Combining artificial neurons for regression

• In general, we combine K neurons as f(\boldsymbol{x}_i) = b + \sum_{k=1}^K w_k h(\boldsymbol{\beta}_k^\top\boldsymbol{x}_i), where
• \boldsymbol{\beta}_k is the coefficient vector of the predictors in the k-th artificial neuron,
• b is called the bias, and
• w_k is the weight corresponding to the k-th neuron.
• The same activation function h is used for every neuron (see the sketch below).
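As a sketch of this formula (assuming ReLU activation; `B`, `w`, and `b` are illustrative names):

```r
relu <- function(z) pmax(0, z)

# f(x) = b + sum_k w_k * h(beta_k^T x), with beta_k stored as columns of B
f <- function(x, B, w, b) {
  z <- drop(crossprod(B, x))   # z_k = beta_k^T x for k = 1, ..., K
  b + sum(w * relu(z))
}
```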

## Example of combining artificial neurons

f(\boldsymbol{x}_i) = b + w_1 h(\boldsymbol{\beta}_1^\top\boldsymbol{x}_i) + w_2 h(\boldsymbol{\beta}_2^\top\boldsymbol{x}_i)

This represents a neural network with

• 2 nodes in the input layer,
• 2 nodes in the middle layer, and
• 1 node in the output layer with parameters:
• b = 0,
• w_1 = 0.5,
• w_2 = 0.9,
• \boldsymbol{\beta}_1 = (3, -5)^\top,
• \boldsymbol{\beta}_2 = (1, 0.5)^\top, and
• h is the ReLU activation function with a = 0 and w = 1.
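Evaluating this network at an illustrative input, say x = (1, 2), using the sketch from the previous slide:

```r
relu <- function(z) pmax(0, z)
B <- cbind(c(3, -5), c(1, 0.5))   # beta_1 and beta_2 as columns
w <- c(0.5, 0.9); b <- 0
x <- c(1, 2)                      # illustrative input, not from the slides
b + sum(w * relu(drop(crossprod(B, x))))
#> [1] 1.8   (neuron 1 is inactive since 3 - 5 * 2 < 0; neuron 2 outputs 2)
```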

## Combining artificial neurons for classification

• The output of the previous example is only applicable to regression problems.
• We can easily modify it for classification by changing the output layer to, say, the Sigmoid function, which gives a numerical value between 0 and 1 and can be thought of as a propensity score:

P(y_i = 1 | \boldsymbol{x}_i) = \frac{1}{1 + \exp\left(-\left(b + \sum_{k=1}^K w_k h(\boldsymbol{\beta}_k^\top\boldsymbol{x}_i)\right)\right)}.
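A minimal sketch of this classification output, reusing the regression sketch from earlier:

```r
relu <- function(z) pmax(0, z)

# P(y = 1 | x): pass the combined neuron output through the sigmoid
p_class1 <- function(x, B, w, b) {
  f <- b + sum(w * relu(drop(crossprod(B, x))))
  1 / (1 + exp(-f))
}
```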

## Regression vs classification

• In a neural network, initial layers can be identical for regression or classification.
• It is the output layer that determines if it can be used for regression or classification!

# Multi-class classification

## Multi-class classification

• The Sigmoid function allows for the computation of propensity scores for binary outcomes.
• If you have more than two classes in your response, then you need to convert it to dummy variables, otherwise referred to as one-hot encoding.
• So for a categorical variable with m levels, y_i \in \{\text{Class } 1, \dots, \text{Class } m\}, we convert it as: y_{ik} = \begin{cases}1 & \text{if } y_i = \text{Class } k\\0 & \text{if } y_i \neq \text{Class } k\end{cases}

## From categorical variable to dummy variables

```r
library(tibble)   # for tibble()
dat <- tibble(pet = c("cat", "dog", "cat", "fish"))
model.matrix(~ pet - 1, data = dat)
```

```
  petcat petdog petfish
1      1      0       0
2      0      1       0
3      1      0       0
4      0      0       1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$pet
[1] "contr.treatment"
```

Alternatively:

```r
class_levels <- as.numeric(factor(dat$pet)) - 1  # cat = 0, dog = 1, fish = 2
keras::to_categorical(class_levels, num_classes = 3)
```

```
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    1    0    0
[4,]    0    0    1
```

## Softmax activation function

• The Sigmoid function only works for m = 2.

• For m > 2, we can use the Softmax activation function instead: P(y_{ij} = 1 | \boldsymbol{x}_i) = \frac{\exp(\boldsymbol{\beta}_j^\top\boldsymbol{x}_i)}{\sum_{k=1}^m\exp\left(\boldsymbol{\beta}_k^\top\boldsymbol{x}_i\right)}.

• The number of neurons for the Softmax layer must be m.

• Note that \sum_{j=1}^m P(y_{ij} = 1 | \boldsymbol{x}_i) = 1.
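A sketch of the Softmax in base R; subtracting max(z) before exponentiating leaves the result unchanged but avoids numerical overflow:

```r
softmax <- function(z) {
  ez <- exp(z - max(z))
  ez / sum(ez)
}
# The hidden-layer outputs from the worked example on the next slide:
softmax(c(32.97, 19.07, 14.03))
#> approx: 1.0e+00 9.2e-07 5.9e-09
```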

## An illustration of a Softmax layer

• Suppose we have the income of a customer in thousands of dollars.
• We want to predict whether the customer will buy a “cheap”, “average” or “expensive” brand of clothing.
• What is the probability that a customer with an income of $45K will buy a cheap brand based on this trained neural network?

## Solution

Layer 2

• ReLU: \max(0, 49.62 - 0.37 \times 45) = 32.97
• ReLU: \max(0, 27.62 - 0.19 \times 45) = 19.07
• ReLU: \max(0, -1.72 + 0.35 \times 45) = 14.03

Output layer

• Cheap: \frac{\exp(32.97)}{\exp(32.97) + \exp(19.07) + \exp(14.03)} = 0.9999991
• Average: \frac{\exp(19.07)}{\exp(32.97) + \exp(19.07) + \exp(14.03)} = 0.00000092
• Expensive: \frac{\exp(14.03)}{\exp(32.97) + \exp(19.07) + \exp(14.03)} = 0.0000000059

Prediction: The customer will buy the cheap brand.

# Building a neural network structure with R

## Installing keras

• Keras is an open-source software library that uses the TensorFlow library to fit artificial neural networks.
• We can use the Keras library through the keras package in R.
• To install keras, run the following commands:

```r
install.packages("keras")
library(keras)
install_keras(method = c("conda"), conda = "auto",
              tensorflow = "default",
              extra_packages = "tensorflow-hub")
```

• Be warned that the installation often poses issues, typically due to keras looking in the wrong location for the Keras library.

## Building a neural network structure in R

```r
library(keras)
model <- keras_model_sequential() %>%
  layer_dense(units = 3, input_shape = 2, activation = "relu") %>%
  layer_dense(units = 3, activation = "softmax")
```

• keras_model_sequential() must be used first to initialise the architecture.
• layer_dense() indicates a new layer with:
• units indicating the number of neurons in that layer,
• input_shape indicating the number of predictors (only needed for the first layer_dense()),
• activation specifying the activation function for that layer.

## Examining the weights and biases

• You can extract the weights and biases using get_weights():

```r
get_weights(model)
```

```
[[1]]
           [,1]       [,2]       [,3]
[1,]  0.4419321 -0.3501425 -0.0361855
[2,] -0.9320850 -0.9673537  0.7750421

[[2]]
[1] 0 0 0

[[3]]
          [,1]       [,2]      [,3]
[1,] 0.8876681 -0.5565016 0.4099801
[2,] 0.8715661  0.2542915 0.3897784
[3,] 0.9583607  0.3416290 0.6392198

[[4]]
[1] 0 0 0
```

• Every even entry is the bias of the nodes (here all 0s).
• The weights are given in every odd entry in the order of the layers.

## Manually setting the weights

• Let’s manually set the weights to those from the previous example:

```r
w <- get_weights(model)
w[[1]] <- matrix(c(49.62, -0.37, 27.62, -0.19, -1.72, 0.35), nrow = 2)
w[[3]] <- diag(3)
set_weights(model, w)
get_weights(model)
```

```
[[1]]
      [,1]  [,2]  [,3]
[1,] 49.62 27.62 -1.72
[2,] -0.37 -0.19  0.35

[[2]]
[1] 0 0 0

[[3]]
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

[[4]]
[1] 0 0 0
```

## Prediction from neural network model

• Normally we need to train the model, but this will be covered next week.
• Suppose the manually set weights are the result of a trained model.
• You can predict the probability that a customer with an income of $45K will buy a cheap, average or expensive brand with:
```r
predict(model, cbind(1, 45))
```

```
         [,1]        [,2]         [,3]
[1,] 0.999999 9.18979e-07 5.949233e-09
```
• Note that you need to have the new data in a matrix format with the intercept!
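For intuition, the keras prediction can be reproduced by hand with the manually set weights (a base-R sketch; the output weights are the identity matrix, so the Softmax is applied directly to the hidden-layer outputs):

```r
x <- c(1, 45)   # intercept column and income (in $K)
B <- matrix(c(49.62, -0.37, 27.62, -0.19, -1.72, 0.35), nrow = 2)
z <- pmax(0, drop(x %*% B))   # hidden layer: ReLU(beta_k^T x)
ez <- exp(z - max(z))         # Softmax output layer (stable form)
ez / sum(ez)
#> approx: 1.0e+00 9.2e-07 5.9e-09
```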

# Takeaways

• Neural networks are flexible models that can be used for both regression and classification problems.
• The activation function in the output layer determines whether the neural network is used for regression or classification.
• More on neural networks next week!