
Introduction to Large Language Models for Statisticians
15th October 2024
[Diagram: LLM training data — a raw corpus and a labelled corpus.]
Input: Where there’s a will, there’s a

| Token | Where | there | ’s | a | will | , | there | ’s | a |
|---|---|---|---|---|---|---|---|---|---|
| Token ID | 11977 | 1354 | 802 | 261 | 738 | 11 | 1354 | 802 | 261 |

The LLM predicts the next token: 2006 = way
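The mapping from tokens to IDs can be sketched with a toy lookup table (the IDs mirror the example above; a real tokenizer learns a subword vocabulary from data rather than using a fixed word list):

```python
# Toy tokenizer: each token maps to an integer ID (IDs taken from the example above).
vocab = {"Where": 11977, "there": 1354, "'s": 802, "a": 261, "will": 738, ",": 11, "way": 2006}

def encode(tokens):
    """Look up the ID of each token in the vocabulary."""
    return [vocab[t] for t in tokens]

tokens = ["Where", "there", "'s", "a", "will", ",", "there", "'s", "a"]
print(encode(tokens))  # [11977, 1354, 802, 261, 738, 11, 1354, 802, 261]
```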
The artificial neuron is the elementary unit of an artificial neural network.
The artificial neuron receives inputs \boldsymbol{x} = (x_{1}, \dots,x_{p})^\top that are typically combined as a weighted sum: z = b + \sum_{j=1}^pw_jx_{j} = b + \boldsymbol{w}^\top\boldsymbol{x}, where \boldsymbol{w} = (w_1, \dots, w_p)^\top are the weights and b is referred to as the bias.
The value z is then passed into an activation function h(z), e.g. the ReLU, h(z) = \max(0, z).
[Diagram: a single neuron with input layer nodes 1 (bias), x_1, and x_2, and an output layer node h(z) with activation h = ReLU.]
With b = 1, w_1 = 0.5, and w_2 = -3, when x_1 = 1 and x_2 = 3, then \begin{align*}z &= b + w_1x_1 + w_2 x_2\\ &= 1 + 0.5 \times 1 - 3\times 3 = -7.5.\end{align*}
Using ReLU, the prediction is \max(0, z) = 0.
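The worked example can be checked directly in code (a minimal sketch of a single neuron, using the weights b = 1, w = (0.5, -3) from the example):

```python
def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)

def neuron(x, w, b):
    """Weighted sum z = b + w·x, followed by the ReLU activation."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return relu(z)

# Example from the slide: b = 1, w = (0.5, -3), inputs x = (1, 3)
print(neuron([1, 3], [0.5, -3], 1))  # z = -7.5, so ReLU gives 0.0
```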
[Diagram: a network with an input layer (1, x_1, x_2), a hidden layer with activation h_1 = ReLU producing h_1(z_{11}) and h_1(z_{12}) plus a bias node 1, and an output layer with linear activation h_2 producing h_2(z_{21}).]
[Diagram: the same network, but with a softmax output layer h_2 producing h_2(\boldsymbol{z}, 1), h_2(\boldsymbol{z}, 2), and h_2(\boldsymbol{z}, 3).]
Assume classification into K classes.
Softmax:
h(\boldsymbol{z}, i) = \dfrac{\exp(z_i)}{\sum_{j=1}^K\exp(z_j)}
Output node i contains the “probability score” associated with class i.
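The softmax formula above is straightforward to implement; the only practical wrinkle is subtracting the maximum before exponentiating, which avoids overflow without changing the result:

```python
import math

def softmax(z):
    """Softmax: h(z, i) = exp(z_i) / sum_j exp(z_j)."""
    m = max(z)                                 # subtract max for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099] -- sums to 1
```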
Training data: Statisticians analyse data
Training mode: Continuous Bag Of Words (CBOW)
Vocabulary size: 5
Window size: 3
Aim: predict the middle word
Input (one-hot encodings):

| Word | One-hot encoding |
|---|---|
| Statisticians | (1, 0, 0, 0, 0) |
| data | (0, 0, 1, 0, 0) |

Output (predicted probabilities): (0.01, 0.89, 0.04, 0.01, 0.01)

Actual: analyse = (0, 1, 0, 0, 0)
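The CBOW forward pass can be sketched as follows: average the embeddings of the context words, project back to the vocabulary, and apply a softmax. The weights here are random (untrained), so the probabilities are arbitrary — training would push mass onto "analyse", as in the 0.89 shown above:

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 5, 3                        # vocabulary size 5 (as above); embedding dimension 3 is illustrative
W_in = rng.normal(size=(V, D))     # input (embedding) weights
W_out = rng.normal(size=(D, V))    # output weights

def cbow_forward(context_ids):
    """Average the context word embeddings, then score every word in the vocabulary."""
    h = W_in[context_ids].mean(axis=0)   # hidden layer: mean of context embeddings
    z = h @ W_out                        # one score per vocabulary word
    p = np.exp(z - z.max())
    return p / p.sum()                   # softmax over the vocabulary

# Context: "Statisticians" (ID 0) and "data" (ID 2); the target is "analyse" (ID 1)
probs = cbow_forward([0, 2])
print(probs)  # five probabilities summing to 1; untrained, so not yet peaked at ID 1
```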
Input The paired t-test p-value is 0.049, so it is statistically significant
Token The paired t -test p -value is 0 . 049 , so it is statistically significant
llama3.1:8b
[Diagram: self-attention over the context "The paired t-test p-value is 0.049". The query vector of the current position is multiplied with the vector for each token; the results are passed through a softmax: Result \rightarrow Softmax(Result). Each token's vector is then scaled by its softmax weight. Summing the above gives the new token embedding that incorporates relevant information from other tokens.]
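These steps are the standard scaled dot-product attention, Softmax(QKᵀ/√d)V; a minimal sketch (the matrix sizes are illustrative, not from the slide):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                              # query-key similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)                      # softmax over positions
    return w @ V                                               # weighted sum of value vectors

rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 8))   # 4 tokens, dimension 8 (illustrative)
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one updated embedding per token
```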
Input:
All models are wrong, but some are
Updated input:
All models are wrong, but some are useful
Updated input:
All models are wrong, but some are useful.
Tokenizer → Embedding layer → Transformer blocks (artificial neural network) → Un-embedding layer
Token | Token ID |
---|---|
All | 2594 |
models | 7015 |
are | 553 |
wrong | 8201 |
, | 11 |
but | 889 |
Token ID | Probability |
---|---|
1236 | 0.844 |
1991 | 0.004 |
12698 | 0.001 |
... | ... |
8316 = useful
13 = .
XXXX = <|end|>
Output: useful.
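The loop above can be sketched in code: the model repeatedly predicts one token, appends it, and stops at the end-of-sequence token. Here `next_token` and the `CANNED` table are hypothetical stand-ins for a real model's tokenize → embed → transformer → un-embed → sample pipeline:

```python
# Hypothetical stand-in for an LLM's next-token prediction,
# hard-coded to reproduce the example from the slide.
CANNED = {
    "All models are wrong, but some are": " useful",
    "All models are wrong, but some are useful": ".",
    "All models are wrong, but some are useful.": "<|end|>",
}

def next_token(text):
    """Predict the next token for the given text (toy lookup)."""
    return CANNED[text]

def generate(prompt):
    """Autoregressive generation: append predicted tokens until <|end|>."""
    text = prompt
    while True:
        tok = next_token(text)
        if tok == "<|end|>":
            return text
        text += tok

print(generate("All models are wrong, but some are"))
# All models are wrong, but some are useful.
```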
Which three systems of the human body function together to move and control body parts?
Correct answer:
Emily took her final stroll in the park last night, forever, when her life was snuffed out under the mask of night. The cause of death was a single fatal shot from a pistol. Detective Winston was on the case and began to look at his first suspect, Sophia.
Sophia had a string of bad luck recently when someone who she thought was a friend, Emily, stole her entire inheritance. Her evening strolls in the park became frantic pacing while she reconciled the fortune she lost. Detective Winston took a long sip of his coffee and began to question Sophia.
‘Quite the marksman I see’ - pointing to a picture of her holding up a recently shot buck.
‘Yeah, my dad loved taking me shooting’ - Sophia replied sheepishly.
Identify the killer. Killers have a motive, means, and opportunity …
Example 1
You will be given a string of words separated by commas or spaces. Your task is to split the string into words and return an array of the words.
For example:
words_string("Hi, my name is John") == ["Hi", "my", "name", "is", "John"]
words_string("One, two, three, four, five, six") == ["One", "two", "three", "four", "five", "six"]
Example 2
def is_prime(n):
"""Return true if a given number is prime, and false otherwise.
>>> is_prime(6)
False
>>> is_prime(101)
True
>>> is_prime(11)
True
>>> is_prime(13441)
True
>>> is_prime(61)
True
>>> is_prime(4)
False
>>> is_prime(1)
False
"""
emitanaka.org/workshop-LLM-2024/