
Notes on “Intro to Large Language Models”

Earlier this month I quit my job and started a mini-sabbatical of sorts, with the goal of taking a step back and deciding what I'd like to focus on next. My intent was to dig deeper into a few trends and build some internal conviction on them. Of course, when looking at the latest trends, you can't go very far these days without bumping into some hype about ✨ AI ✨.

The AI hype train is certainly going full steam up the hype mountain at the moment... but just like the crypto bubble that burst so spectacularly last year, I believe there is usually something of substance underneath all the hype. In the case of this most recent wave of interest in AI, that substance is large language models (LLMs).

It just so happened that a few weeks ago, Andrej Karpathy posted an excellent "Intro to LLMs" talk on YouTube. In the spirit of tempering my skepticism of the hype with genuine curiosity about what's driving it underneath, I watched the talk a few times and produced the notes below.

The exercise was really instructive - while I was already familiar with many of the concepts, the act of taking notes on the talk really helped me organize & solidify the ideas in my head. I might attempt a similar exercise on more content like this in the future.

Quick aside before we get into the notes: I think the reason this exercise was so helpful is that Andrej is an excellent teacher. That has always been the case in the blog posts and videos of his I've come across before, and I want to give props to Andrej for putting this out there 👏. All of the content below comes from Andrej and the research he references, and I hope my notes help folks revisit or follow along with the concepts in his video.


Large Language Models & LLM Inference

Video timestamp: 0:20


LLM Training

Video timestamp: 4:17


LLM Dreams

Video timestamp: 8:58

Dreams, they feel real while we're in them, right? It's only when we wake up that we realize something was actually strange.

~Leonardo DiCaprio as Cobb in Inception (📹)


How are next words produced?

Video timestamp: 11:22


Assistant model

Video timestamp: 14:14


Flow of how to train

Video timestamp: 17:52

Andrej's slide here very effectively describes this section: [image captured from Andrej's slide deck, summarizing the overall training process]

More on fine-tuning

Video timestamp: 21:05


Scaling LLMs

Video timestamp: 25:43


Demos


Future developments

Video timestamp: 35:00


LLM Operating System

Video timestamp: 42:15


Jailbreaking

Video timestamp: 46:14


Prompt Injection

Video timestamp: 51:30


Data Poisoning

Video timestamp: 56:23