Cloud Native Heidelberg

Building a GPT-2 Model from Scratch – Nanoschnack

in-person
Event date
Feb 26, 2026
06:00 PM - 08:30 PM CET
Location
BioQuant Hörsaal 041, Heidelberg
About this event

Join us for an engaging evening focused on the exciting world of AI model building. In this session, you'll learn about the fundamental steps involved in creating your own Generative Pre-trained Transformer (GPT) model (with a German perspective :-)) and about the internals of agentic systems and AI products.

📍 Coordinates: BioQuant, Grosser Hörsaal (Raum 041)

🗓 Agenda:
🕒 18:00 – Check-in
🕒 18:15 – Welcome – Cloud Native Heidelberg Organizers
📢 18:30 – Building a GPT-2 Model from Scratch – Nanoschnack, Stefan Schimanski
📢 19:30 – Beyond Models and into Agentic Systems and AI Products, Adel Zaalouk
🎨 20:15 – Networking. Use the opportunity to get to know somebody new!

Talk 1 Speaker: Stefan Schimanski

Abstract:

Everybody is talking about AI and LLMs — attention, transformers, tokens, embeddings, context windows, system prompts, temperature, backpropagation, PyTorch, KV caching, vLLM, llm-d, pre-training, post-training, reinforcement learning. So many terms, and it's easy to get imposter syndrome in a conversation like that.

Last Christmas holidays, I decided to change that. Inspired by Andrej Karpathy's NanoChat project — and following his recommendation to not just run his code but to write everything from the ground up myself — I set out to build a GPT-2 class model. Think of it as "GPT-2 the Hard Way," in the spirit of Kelsey Hightower's Kubernetes the Hard Way.

I started by studying the "Attention Is All You Need" paper, watching many hours of Karpathy's YouTube lectures, and gradually building an intuition for how everything fits together. Then I wrote my own PyTorch implementation of the GPT-2 architecture and put a data pipeline in front of it using freely available datasets from Hugging Face.
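To give a flavour of what "writing the architecture yourself" means, here is a minimal sketch of one GPT-2-style causal self-attention block in PyTorch. The class and parameter names are illustrative, not taken from the talk, and the dimensions are toy-sized:

```python
# A minimal causal self-attention block, GPT-2 style (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd=64, n_head=4, block_size=32):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)  # joint Q, K, V projection
        self.proj = nn.Linear(n_embd, n_embd)     # output projection
        # Causal mask: position t may only attend to positions <= t.
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape to (batch, heads, tokens, head_dim).
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # Scaled dot-product attention with the causal mask applied.
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

x = torch.randn(2, 16, 64)  # (batch, tokens, embedding)
y = CausalSelfAttention()(x)
print(y.shape)
```

A full GPT-2 stacks blocks like this one, each followed by a feed-forward layer, with residual connections and layer norms around both.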

As a twist, my model would be trained exclusively on German data: German internet archive datasets, Wikipedia, Goethe, and transcripts from German YouTube channels.

My bar: only use components I understand well enough to confidently explain them to others — which is what I'll attempt in this talk.

Talk 2 Speaker: Adel Zaalouk

Abstract:

I pulled apart six agent runtimes to see what’s actually inside them. One is 500 lines of TypeScript. Another is 430,000. Both do the same job: connect an LLM to your messaging apps and let it run. The LLM calls are about 50 lines each. The rest handles context windows about to overflow, credentials the model shouldn’t see, messaging channels that need persistent connections, and memory that has to survive between sessions. None of that is machine learning; it’s good ol’ systems engineering.

In this talk we map the agent space onto five layers, then show where each sits on the adoption curve, what’s already commoditizing, and where there’s still room to build something that lasts. The model still matters, but it is a fraction of the code in any agent that actually ships. The rest is plumbing. And the plumbing is where the most interesting problems linger.
