Introduction to Machine Learning
Lecturer: Emi Tanaka
Department of Econometrics and Business Statistics
This lecture benefited from the lecture notes by Dr. Ruben Loaiza-Maya.
Here, the addition of 2 neurons increases the number of parameters by 10.
[Diagram: a feed-forward neural network built up layer by layer: the input layer $x = (1, x_1)^\top$, hidden layers 2 and 3, and the output layer.]
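The parameter count can be verified directly: a dense layer with $n_{\text{in}}$ inputs and $n_{\text{out}}$ units has $n_{\text{out}}(n_{\text{in}} + 1)$ parameters (weights plus biases), so each neuron added to a hidden layer contributes its own incoming weights and bias, plus one extra weight for every unit in the following layer. Below is a minimal sketch, assuming the `keras` package; the toy layer widths are illustrative and smaller than the network in the diagram.

```r
library(keras)

# two hidden units on one predictor, then a single output unit
small <- keras_model_sequential() %>%
  layer_dense(units = 2, activation = "relu", input_shape = 1) %>%
  layer_dense(units = 1)
count_params(small)   # 2*(1+1) + 1*(2+1) = 7 parameters

# widen the hidden layer by 2 neurons
larger <- keras_model_sequential() %>%
  layer_dense(units = 4, activation = "relu", input_shape = 1) %>%
  layer_dense(units = 1)
count_params(larger)  # 4*(1+1) + 1*(4+1) = 13, i.e. 6 extra parameters here
```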
$$\text{MSE}(\theta) = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - f(x_i \mid \theta)\right)^2$$

$$\text{BCE}(\theta) = -\frac{1}{n}\sum_{i=1}^{n} \left\{ y_i \log\left(P(y_i = 1 \mid x_i, \theta)\right) + (1 - y_i)\log\left(1 - P(y_i = 1 \mid x_i, \theta)\right) \right\}$$

$$\text{CE}(\theta) = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij} \log\left(P(y_{ij} = 1 \mid x_i, \theta)\right)$$
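Each of these losses is simple to compute directly from the observed responses and the model outputs; a minimal sketch in base R (function and argument names are illustrative):

```r
# mean squared error: y and fhat are length-n numeric vectors
mse <- function(y, fhat) mean((y - fhat)^2)

# binary cross-entropy: y in {0, 1} and p = P(y = 1 | x, theta)
bce <- function(y, p) -mean(y * log(p) + (1 - y) * log(1 - p))

# multi-class cross-entropy: Y is an n x m one-hot matrix and
# P the n x m matrix of predicted class probabilities
ce <- function(Y, P) -mean(rowSums(Y * log(P)))
```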
Gradient descent updates the parameters step by step with learning rate $r$:

$$\theta^{[s+1]} = \theta^{[s]} - r \left. \frac{\partial\, \text{MSE}(\theta)}{\partial \theta} \right|_{\theta = \theta^{[s]}}$$

The update is applied repeatedly, $\cdots$, over successive passes through the training data: Epoch 1, Epoch 2, and so on.
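As a concrete illustration, here is a minimal sketch of these updates for a toy one-parameter model $f(x \mid \theta) = \theta x$; the data, learning rate, and number of steps are all illustrative choices.

```r
set.seed(1)
x <- runif(50)
y <- 2 * x + rnorm(50, sd = 0.1)  # true slope is 2

theta <- 0    # starting value theta^[0]
r     <- 0.1  # learning rate
for (s in 1:200) {
  grad  <- -2 * mean((y - theta * x) * x)  # d MSE(theta) / d theta
  theta <- theta - r * grad                # theta^[s+1] = theta^[s] - r * grad
}
theta  # approaches the true slope 2
```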
In evaluating $\nabla \widehat{\text{MSE}}(\theta)$, we must evaluate $\nabla (y_i - f(x_i \mid \theta))^2$.
Now we have:

$$\nabla (y_i - f(x_i \mid \theta))^2 = -2\left(y_i - f(x_i \mid \theta)\right) \times \underbrace{\frac{\partial f(x_i \mid \theta)}{\partial a_i^{(L-1)}}}_{\text{Layer } L-1} \times \underbrace{\frac{\partial a_i^{(L-1)}}{\partial a_i^{(L-2)}}}_{\text{Layer } L-2} \times \underbrace{\frac{\partial a_i^{(L-2)}}{\partial a_i^{(L-3)}}}_{\text{Layer } L-3} \times \cdots \times \underbrace{\frac{\partial a_i^{(2)}}{\partial \theta}}_{\text{Layer } 1}.$$
The gradients of the later layers are calculated first, which is why this procedure is known as backpropagation.
For a parameter $\theta$, we can show

$$\frac{\partial (y_i - f(x_i \mid \theta))^2}{\partial \theta} = \text{constant} \times \underbrace{\frac{\partial h_L(a_i^{(L-1)})}{\partial a_i^{(L-1)}}}_{\text{Layer } L-1} \times \underbrace{\frac{\partial h_{L-1}(a_i^{(L-2)})}{\partial a_i^{(L-2)}}}_{\text{Layer } L-2} \times \underbrace{\frac{\partial h_{L-2}(a_i^{(L-3)})}{\partial a_i^{(L-3)}}}_{\text{Layer } L-3} \times \cdots \times \underbrace{\frac{\partial h_2(x_i \mid \theta)}{\partial \theta}}_{\text{Layer } 1}.$$
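To make the chain of factors concrete, here is a minimal sketch for a tiny network $f(x \mid \theta) = h_3(w_2\, h_2(w_1 x))$ with sigmoid activations; the weights $w_1, w_2$ and all values are illustrative. The factors are accumulated from the output layer backwards, matching the order above.

```r
h  <- function(a) 1 / (1 + exp(-a))  # sigmoid activation
dh <- function(a) h(a) * (1 - h(a))  # its derivative

grad_one_obs <- function(x, y, w1, w2) {
  a2 <- w1 * x;  z2 <- h(a2)  # layer 2 (hidden)
  a3 <- w2 * z2; f  <- h(a3)  # layer 3 (output), f(x | theta)
  d_f  <- -2 * (y - f)        # from the squared error
  d_a3 <- d_f * dh(a3)        # output layer first ...
  d_w2 <- d_a3 * z2
  d_a2 <- d_a3 * w2 * dh(a2)  # ... then propagated back one layer
  d_w1 <- d_a2 * x
  c(d_w1 = d_w1, d_w2 = d_w2)
}
grad_one_obs(x = 0.5, y = 1, w1 = 0.1, w2 = 0.2)
```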
| Label | Description |
|---|---|
| 0 | T-shirt/top |
| 1 | Trouser |
| 2 | Pullover |
| 3 | Dress |
| 4 | Coat |
| 5 | Sandal |
| 6 | Shirt |
| 7 | Sneaker |
| 8 | Bag |
| 9 | Ankle boot |
There are 784 predictors (`pixel1`, …, `pixel784`) and 1 response variable (`label`) labelled from 0-9 (see the `label` descriptions above!). We fit a neural network using the `keras` package.
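A minimal sketch of preparing the data, assuming they are the Fashion-MNIST images shipped with keras; the object names `x_train`, `y_train`, etc. are illustrative. Each 28 x 28 image is flattened into the 784 pixel predictors and the labels are one-hot encoded for the softmax output.

```r
library(keras)
fashion <- dataset_fashion_mnist()

# flatten 28 x 28 images into 784 pixel columns and rescale to [0, 1]
x_train <- array_reshape(fashion$train$x, c(nrow(fashion$train$x), 784)) / 255
x_test  <- array_reshape(fashion$test$x,  c(nrow(fashion$test$x),  784)) / 255

# one-hot encode the 0-9 labels
y_train <- to_categorical(fashion$train$y, num_classes = 10)
y_test  <- to_categorical(fashion$test$y,  num_classes = 10)
```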
```r
library(keras)
NN <- keras_model_sequential() %>%
  # hidden layer
  layer_dense(units = 128,            # number of nodes in the hidden layer
              activation = "relu",
              input_shape = 784) %>%  # number of predictors
  # output layer
  layer_dense(units = 10,             # the number of classes
              # we need to use softmax for multi-class classification
              activation = "softmax")
NN
```
```
Model: "sequential"
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #
================================================================================
dense_1 (Dense)                     (None, 128)                     100480
dense (Dense)                       (None, 10)                      1290
================================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
________________________________________________________________________________
```
For regression, use the loss `mean_squared_error`; for binary classification, `binary_crossentropy`; for multi-class classification, `categorical_crossentropy`. See `help("loss-functions")` for the other available loss functions.
This model takes long to fit when `epochs` is a large number! You can use the `plot` function on the training history `NNlearn`. Note that we predict using `NN` and not `NNlearn` here! The predicted class probabilities for the first three test images are:

```
             [,1]         [,2]         [,3]         [,4]         [,5]
[1,] 9.959991e-01 4.470216e-15 2.498414e-06 1.280446e-06 6.907984e-09
[2,] 2.188681e-21 1.000000e+00 5.769808e-32 9.360376e-28 1.490198e-26
[3,] 6.637161e-03 6.112183e-16 9.681513e-01 3.647272e-13 6.959190e-06
             [,6]         [,7]         [,8]         [,9]        [,10]
[1,] 2.304835e-11 3.900097e-03 9.305310e-19 9.703544e-05 4.587546e-13
[2,] 0.000000e+00 8.628763e-31 0.000000e+00 4.123140e-29 1.746240e-37
[3,] 8.466854e-11 2.520457e-02 8.891557e-21 6.381754e-13 2.705846e-17
```
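The probabilities shown above can be obtained along these lines (a sketch, assuming `x_test` holds the flattened test images):

```r
probs <- NN %>% predict(x_test)   # one row per image, one column per class
head(probs, 3)                    # softmax probabilities as printed above
pred_class <- max.col(probs) - 1  # most probable class, labelled 0-9
```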
ETC3250/5250 Week 12