Timothy B. Lee and Sean Trott in Ars Technica:
The goal of this article is to make a lot of this knowledge accessible to a broad audience. We’ll aim to explain what’s known about the inner workings of these models without resorting to technical jargon or advanced math.
We’ll start by explaining word vectors, the surprising way language models represent and reason about language. Then we’ll dive deep into the transformer, the basic building block for systems like ChatGPT. Finally, we’ll explain how these models are trained and explore why good performance requires such phenomenally large quantities of data.
To understand how language models work, you first need to understand how they represent words. Humans represent English words with a sequence of letters, like C-A-T for “cat.” Language models use a long list of numbers called a “word vector.”
More here.
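
To make the word-vector idea concrete, here's a minimal sketch in Python. The numbers are invented for illustration (real models use vectors with hundreds or thousands of dimensions), and the point is just that words with similar meanings get assigned vectors that point in similar directions:

```python
import numpy as np

# Toy 4-dimensional "word vectors". Real language models learn vectors
# with hundreds or thousands of dimensions; these values are made up.
vectors = {
    "cat": np.array([0.8, 0.1, 0.9, 0.2]),
    "dog": np.array([0.7, 0.2, 0.8, 0.3]),
    "car": np.array([0.1, 0.9, 0.2, 0.8]),
}

def cosine_similarity(a, b):
    """Similarity of two vectors: close to 1.0 means similar direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically related words end up with similar vectors,
# so "cat" scores closer to "dog" than to "car".
print(cosine_similarity(vectors["cat"], vectors["dog"]))  # high
print(cosine_similarity(vectors["cat"], vectors["car"]))  # low
```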