Introduction to Large Language Models for Statisticians

Author

Emi Tanaka

Published

October 15, 2024

👋 Welcome

This workshop serves as an introduction to Large Language Models (LLMs), specifically tailored for statisticians. The concepts behind LLMs are distilled and presented in a way that is accessible and relevant to those with a background in statistics. The workshop will help participants understand how LLMs can be integrated into existing workflows. Practical applications will be demonstrated primarily through the R programming language. Participants will receive all R code used in the demonstrations, enabling them to replicate the analyses and continue exploring LLMs on their own.

Optimal viewing experience

Please note that this website is best viewed on a desktop or laptop computer using Google Chrome.

🎯 Learning objectives

  • Understand the fundamental concepts of Large Language Models (LLMs) and Generative Artificial Intelligence (genAI).
  • Explore the role of LLMs in modern data analysis and decision-making.
  • Gain insight into practical applications of LLMs in various domains.

🔧 Software requirements

Please ensure that you download and install

  • the latest version of R,
  • an integrated development environment (IDE) like RStudio Desktop or Positron, and
  • the following packages: open RStudio Desktop or Positron, copy and paste the commands below into the Console, and press Enter.
install.packages(c('pak', 'usethis', 'tidyverse'))
pak::pak("emitanaka/elmer") # originally: pak::pak("hadley/elmer")
pak::pak("emitanaka/SAI")
pak::pak("AlbertRapp/tidychatmodels") # alternative to elmer (will not be used)
  • For Windows users, you may also need to install Rtools in order to install R packages.
  • the latest version of ollama
  • an LLM from ollama, by running a command like the one below in a terminal (not within R). The command below downloads the LLM called llama3.1 with 8B parameters. The model is several gigabytes in size and may take a while to download, depending on your internet speed.
ollama pull llama3.1:8b
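
Once everything is installed, a quick check along the following lines may help confirm the setup before the workshop. This is a minimal sketch rather than part of the official workshop materials: it assumes the two GitHub repositories install packages named elmer and SAI, and that the ollama executable is on your PATH. It can be run in the R Console.

```r
# Packages named in the installation commands above.
# Assumption: the GitHub repositories install as "elmer" and "SAI".
pkgs <- c("pak", "usethis", "tidyverse", "elmer", "SAI")

# requireNamespace() returns TRUE if a package can be loaded,
# without attaching it to the search path.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Sys.which() returns an empty string when the executable is not found.
ollama_path <- Sys.which("ollama")
if (nzchar(ollama_path)) {
  # `ollama list` prints the models that have been downloaded locally.
  system2(ollama_path, "list")
} else {
  message("ollama was not found on the PATH; install it or restart your session.")
}
```

Any package reported as FALSE can be reinstalled with the commands above, and llama3.1:8b should appear in the model list once the download has finished.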

🕜 Schedule

Time (AEDT)        Topic
1.30PM - 1.35PM    Introductions
1.35PM - 3.00PM    Landscape of Large Language Models; R Demo #1 and #2
3.00PM - 3.15PM    Break
3.15PM - 4.25PM    Architectures of Large Language Models; R Demo #3
4.25PM - 4.45PM    Discussion & Conclusion

📚 Slides

Demos

License

These workshop materials by Emi Tanaka are licensed under CC BY-NC-ND 4.0.