# Dusting off my blog, and Machine Learning

It's been a while, and I'm digging into machine learning.

I was watching the [excellent video by Andrej Karpathy](https://youtu.be/kCc8FmEb1nY) about how to write a GPT (the family of models that includes GPT-3) from scratch, based on the paper "Attention Is All You Need".

I implemented it from scratch while watching the video, and also refactored and added some features to make it better.

My code is located [here](https://github.com/ehartford/gpt).

What does it do?

* Generates fake Shakespearean text (could be trained on any text by replacing the training data)
    

```plaintext
FRIAR THUM:
Northum, my father: if detering
Come with my replies, tribunes. Come, title,
It by please now.

TYBALT:
Why, the should I do you gow not in this state,
For esignation to fear any oils?
My hand must we this likely demest for unvy's houses,
Wars 'govern all our erested vains of restre:
Let's tender out and content thousand for me:
That I will wish through the that I meet,
Unless the kingly country, perfection
And those grim our stone. Here any mist noble gague.

CLARENCE:
A sail what
```
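The model works one character at a time: the vocabulary is just the set of distinct characters in the training file, and each character maps to an integer ID. Here's a minimal sketch of that mapping, in the spirit of the video. The names `stoi` and `itos` and the tiny sample string are assumptions for illustration, not necessarily what the repo uses.

```python
# Stand-in for the contents of input.txt (hypothetical sample text).
text = "hello world"

# Vocabulary: every distinct character, in sorted order.
chars = sorted(set(text))

# Lookup tables between characters and integer IDs.
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}


def encode(s):
    """Turn a string into a list of integer token IDs."""
    return [stoi[c] for c in s]


def decode(ids):
    """Turn a list of integer token IDs back into a string."""
    return "".join(itos[i] for i in ids)


print(encode("hello"))          # [3, 2, 4, 4, 5]
print(decode(encode("hello")))  # hello
```

Swapping in different training data only changes `chars`, which is why the same code can learn from any text.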

* I added an Anaconda environment so it's reproducible.
    

```bash
conda env create
conda activate gpt
```

* It uses DirectML, so it works on Windows or WSL with any video card (tested on an AMD Radeon). It can easily be adapted to run on Linux using CUDA or ROCm.
    

```python
import torch
import torch_directml

# torch_directml.device() already returns a torch.device for the default adapter
dml = torch_directml.device()
...
data = torch.tensor(encode(text), dtype=torch.long, device=dml)
```
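If you want to adapt this to Linux, one option is to pick the backend at startup instead of hard-coding DirectML. The sketch below is not code from the repo: `pick_backend` and `make_device` are hypothetical helper names, and it assumes that ROCm builds of PyTorch expose the GPU through the same `torch.cuda` API as CUDA builds.

```python
import importlib.util


def pick_backend():
    """Return 'directml', 'cuda', or 'cpu' based on what is installed."""
    # Prefer DirectML when the torch_directml package is present.
    if importlib.util.find_spec("torch_directml") is not None:
        return "directml"
    # CUDA builds (and, by assumption, ROCm builds) report through torch.cuda.
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    return "cpu"


def make_device(backend):
    """Build the torch device object for the chosen backend name."""
    import torch
    if backend == "directml":
        import torch_directml
        return torch_directml.device()
    return torch.device(backend)


print(pick_backend())
```

With PyTorch installed, `device = make_device(pick_backend())` could then stand in for `dml` in the snippet above.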

* It uses `config.ini` for the hyperparameters, so you can modify them without changing the code.
    

```python
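import configparser

# Load the hyperparameters from config.ini and attach them, already converted
# to int/float, as plain attributes on the parser object so the rest of the
# code can read config.batch_size and so on.
config = configparser.ConfigParser()
config.read('config.ini')
default = config['DEFAULT']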
config.batch_size = int(default['batch_size'])
config.block_size = int(default['block_size'])
config.max_iters = int(default['max_iters'])
config.eval_interval = int(default['eval_interval'])
config.learning_rate = float(default['learning_rate'])
config.eval_iters = int(default['eval_iters'])
config.n_embd = int(default['n_embd'])
config.n_head = int(default['n_head'])
config.n_layer = int(default['n_layer'])
config.dropout = float(default['dropout'])

# The vocabulary is every distinct character in the training text.
with open('input.txt', 'r', encoding='utf8') as f:
    config.chars = sorted(set(f.read()))
config.vocab_size = len(config.chars)
```
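An alternative to hanging attributes on the parser is to load the same keys into a small dataclass, which keeps the types in one place. This is only a sketch: the `Hyperparameters` class, the `load_hyperparameters` helper, and the sample values are assumptions for illustration, and the real `config.ini` may use different numbers.

```python
import configparser
from dataclasses import dataclass

# Hypothetical sample config; the repo's actual values may differ.
SAMPLE_INI = """
[DEFAULT]
batch_size = 64
block_size = 256
max_iters = 5000
eval_interval = 500
learning_rate = 3e-4
eval_iters = 200
n_embd = 384
n_head = 6
n_layer = 6
dropout = 0.2
"""


@dataclass
class Hyperparameters:
    batch_size: int
    block_size: int
    max_iters: int
    eval_interval: int
    learning_rate: float
    eval_iters: int
    n_embd: int
    n_head: int
    n_layer: int
    dropout: float


def load_hyperparameters(ini_text):
    """Parse INI text and convert each value to its declared type."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    s = parser["DEFAULT"]
    return Hyperparameters(
        batch_size=s.getint("batch_size"),
        block_size=s.getint("block_size"),
        max_iters=s.getint("max_iters"),
        eval_interval=s.getint("eval_interval"),
        learning_rate=s.getfloat("learning_rate"),
        eval_iters=s.getint("eval_iters"),
        n_embd=s.getint("n_embd"),
        n_head=s.getint("n_head"),
        n_layer=s.getint("n_layer"),
        dropout=s.getfloat("dropout"),
    )


hp = load_hyperparameters(SAMPLE_INI)
# Each attention head gets n_embd / n_head dimensions, so this must divide evenly.
assert hp.n_embd % hp.n_head == 0
```

A missing key then fails loudly at load time rather than somewhere in the middle of training.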

* I split the code into `generate.py` and `train.py`, and abstracted the model so both scripts can use it.

* `train.py` also saves the model to a file when it finishes training, and `generate.py` loads the model from that file.
    

train.py

```python
torch.save(blm.state_dict(), 'model.pt')
```

generate.py

```python
blm = BigramLanguageModel().to(dml)
blm.load_state_dict(torch.load('model.pt', map_location=dml))
```
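To see the save/load round trip in isolation, here is a small self-contained sketch. It uses a plain `nn.Linear` as a hypothetical stand-in for the real model, runs on the CPU, and writes to a temporary directory. Since the model has a dropout hyperparameter, switching to `eval()` before generating is probably what you want, though that call is my assumption and not something shown in the snippets above.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the real model class.
model = nn.Linear(4, 2)

# "train.py" side: save only the weights (the state dict), not the whole object.
path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), path)

# "generate.py" side: rebuild the same architecture, then load the weights into it.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # turns off dropout and similar training-only layers

# Both copies should now produce identical outputs.
x = torch.ones(1, 4)
with torch.no_grad():
    same = torch.equal(model(x), restored(x))
print(same)
```

Because only the state dict is stored, the loading script has to construct the model with the same hyperparameters that were used for training.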

Implementing this from scratch, based on only watching a video and no copy-pasting, is an excellent way to become familiar with the concepts of training a large language model and working with your hardware limitations, as well as hyperparameter tuning. I highly recommend this exercise.