# ML & Econ reading group
## LLMs
### W6: Practical implementation
#### Jeremy Large, 14 Nov 2023
[Huggingface NLP course](https://huggingface.co/learn/nlp-course/)

A snapshot of our group:

 - 85% use python and/or R
 - 50% use python
 - 10-20% use python.pytorch
 - 10-20% use Google Colab
 - 0% use HuggingFace

### Plan for today
- Overviews
  - HuggingFace hub
  - python
  - jupyter notebooks
  - transformers

- Demonstration / try-out
  - [Zephyr 7b](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
  - written in pytorch
  - accessed via the [transformers API](https://github.com/huggingface/transformers)
  - we interact using a jupyter notebook
  - we code in python
  - the notebook is hosted on Google's Colab service
  - Google gives it access to a GPU, using CUDA
  

## [HuggingFace](https://huggingface.co/) overview

This platform is attracting state-of-the-art AI models
- **Open source**: 
  - Largest hub of pre-trained AI models
- **Simplification versus precision**: 
  - Trade-offs tend to favour simplicity and community recommendations
- **Mostly in python**

Pre-trained models for most/all AI tasks

- **Extensive Collection**: Choose from thousands of models
- **Task-Specific Models**: Tailored for classification, translation, summarization, etc.
- **Community Models**: Contributions from researchers and industry professionals

Things people have done with HuggingFace tech

- **Content Creation**: Automating writing and creative content generation
- **Semantic Search**: Enhancing search engines with NLP for better results
- **Education**: Personalized learning assistance and automated grading systems

Recent release on HuggingFace:
- [Zephyr 7B beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
- [Zephyr 7B](https://arxiv.org/abs/2310.16944)
- [Mistral 7B](https://mistral.ai/news/announcing-mistral-7b/)

Zephyr is a fine-tuned version of Mistral 7B

We will run Zephyr later on today

Fine-Tuning with HuggingFace Transformers:
- Select a pre-trained model relevant to your task
- Prepare your dataset in a compatible format
- Use 
  - [HuggingFace's Trainer API](https://brev.dev/blog/fine-tuning-mistral) or 
  - your own training loop
  - or [find code that has done this before](https://github.com/abacaj/fine-tune-mistral)
- Adjust hyperparameters to suit your data and task
- Train the model on your data, monitoring for convergence
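
The steps above might look roughly like this with the `Trainer` API. This is a minimal sketch rather than the recipe from either linked post: the function name `fine_tune`, the small default model `distilgpt2`, the output directory and every hyperparameter value are assumptions chosen for illustration, and it also needs the `datasets` package installed.

```python
def fine_tune(texts, model_name="distilgpt2", output_dir="finetuned-model"):
    """Fine-tune a small causal language model on a list of strings (illustrative sketch)."""
    # Imports sit inside the function so that defining it does not require the libraries.
    from datasets import Dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    # 1. Select a pre-trained model and its matching tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # GPT-2 style tokenizers have no pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # 2. Prepare the dataset: tokenize the raw text and drop the original column.
    dataset = Dataset.from_dict({"text": list(texts)})
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True,
        remove_columns=["text"],
    )

    # 3. Hyperparameters (assumed values; adjust to your data and task).
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,
        per_device_train_batch_size=4,
        learning_rate=5e-5,
        logging_steps=10,  # log the training loss regularly to monitor convergence
    )

    # 4. Train. The collator pads each batch and copies input_ids into labels (mlm=False).
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    return trainer
```

Calling `fine_tune(["some text", "some more text"])` downloads the model and starts training, so it is best run on a machine with a GPU. Fully fine-tuning a 7B model such as Mistral needs far more memory than this sketch assumes.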


# Python

Python - Language of Choice for Machine Learning and AI

- **Why Python?**: Popular for its simplicity and readability
- **Python in AI**: Dominant language with a rich ecosystem of libraries
- **Interpreted Language**: Easy to write and test - good for rapid prototyping

Python Syntax Essentials

- **Indentation**: Python uses indentation, not braces, to define scope
- **Variables**: No need to declare a type - `x = 5`, `name = "Alice"`
- **Functions**: Defined with `def` keyword - `def my_function():`
- **Attributes**: Accessed with a dot, e.g. `car.door.window.switch`
- **Methods**: Actions an object can take, e.g. `car.door.window.switch.open()`

- **Lists**: Ordered and changeable collections - `my_list = [1, 2, 3]`
- **Dictionaries**: Key-value pairs - `my_dict = {'name': 'Alice', 'age': 25}`
- **Tuples**: Ordered and unchangeable collections - `my_tuple = (1, 2, 3)`
- **Sets**: Unordered collection of unique elements - `my_set = {1, 2, 3}`
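
The syntax essentials above can be tried out in a few lines. This is a small illustrative sketch: the `Car`, `Door` and `Window` classes and the `describe` function are invented for the example, and the chain is one level shorter than `car.door.window.switch` to keep it brief.

```python
def describe(name, age=25):
    """A function defined with `def`; `age` has a default value."""
    return f"{name} is {age}"


class Window:
    def __init__(self):
        self.is_open = False  # an attribute

    def open(self):  # a method: an action the object can take
        self.is_open = True


class Door:
    def __init__(self):
        self.window = Window()


class Car:
    def __init__(self):
        self.door = Door()


car = Car()
car.door.window.open()  # dots walk from object to attribute to method

my_list = [1, 2, 3]
my_list.append(4)  # lists are changeable
my_dict = {"name": "Alice", "age": 25}
my_tuple = (1, 2, 3)  # my_tuple[0] = 9 would raise a TypeError
my_set = {1, 2, 2, 3}  # duplicates collapse, leaving three elements

print(describe(my_dict["name"], my_dict["age"]))
print(car.door.window.is_open, my_list, len(my_set))
```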


Control flow
- **If statements**: `if x > 10:`
- **Loops**: `for item in my_list:` or `while x < 10:`
- **Comprehensions**: `[x for x in range(10)]` for concise loops
- **Exception Handling**: `try:` and `except:` blocks to handle errors
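
A short sketch putting the control-flow constructs above together. The function `safe_ratio` and the numbers are made up for illustration.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when dividing by zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None


my_list = [4, 8, 15, 16, 23, 42]

# for loop with an if statement: keep the items larger than 10
large = []
for item in my_list:
    if item > 10:
        large.append(item)

# the same kind of loop written as a comprehension
squares = [x * x for x in range(10)]

# while loop: runs until the condition fails
x = 0
while x < 10:
    x += 3

print(large, squares[:4], x, safe_ratio(1, 0))
```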


Marshalling open-source libraries:

- **Installing Libraries**: Use `pip` or `conda` to install packages
- **Importing**: `import numpy as np`, `from math import sqrt`
- **Common Libraries**: NumPy for numerical computing, Pandas for data analysis

## Jupyter Notebooks


- One way of writing and running python code
- On the face of it, they can seem a bit amateur
- Interactive documents that combine live, executable code with narrative text, equations, and visualizations
- Widespread use among AI researchers

### As this is a jupyter notebook, we can do things like:

In [1]:
def fibonacci(x):
    out = [0, 1]
    for _ in range(x):
        out.append(out[-2] + out[-1])
    return out[-1]

In [2]:
fibonacci(23)

46368

- Why Jupyter Notebooks?
    - **Can be turned into slides**
    - **Interactive Development**: Test out Python code in real-time, perfect for experimenting with Transformers.
    - **Visualization**: Inline graphs and charts to analyze model performance or data distributions.
    - **Documentation**: Combine code with rich text descriptions, making it ideal for educational purposes and reproducible research.
    - **Support for Markdown**: Document your process and findings with Markdown and LaTeX within the same interface.

Markdown within notebooks supports $\LaTeX$

This is an inline equation: $V_{sphere} = \frac{4}{3}\pi r^3$,
followed by a display style equation:

$$V_{sphere} = \frac{4}{3}\pi r^3$$

- Jupyter notebooks and HuggingFace:
    - **Experimentation**: Quickly test different models and parameters.
    - **Tutorials and Examples**: HuggingFace provides many Jupyter notebooks as tutorials for their models.
    - **Community Sharing**: Share your models and findings easily with the HuggingFace community.


Hosted Notebook Services:
- **[Binder](https://mybinder.org/)**: Turn a Git repo into a collection of interactive notebooks.

- **[Deepnote](https://deepnote.com/)**: A new generation of Jupyter compatible data science notebook.

- **[HuggingFace + Notebooks](https://huggingface.co/docs/transformers/notebooks)**: Accessible tutorials, easy model experimentation, and community-driven knowledge sharing

- **[Google Colab](https://colab.research.google.com/)**: Free (err...) access to GPUs and TPUs for machine learning.
- **...**

# Transformers

Getting Started with Transformers

- **Environment**: Python environment or Docker container
- **Installation**: `pip install transformers`
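- **Dependencies**: `pip` installs most of them automatically (e.g. `tokenizers`, `safetensors`); a backend such as PyTorch is installed separately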

The Building Blocks of NLP Applications

- **Tokenizer**: Converts text to a format understandable by the model
    - Handles various text pre-processing steps
    - Ensures compatibility with different model architectures

- **Model**: Pre-trained models with a wide range of NLP capabilities
    - Fine-tuning options for specific tasks
    - Various architectures (BERT, GPT-2, T5, etc.)

- **Pipeline**: Pre-built routines for end-to-end tasks
    - Simplifies common tasks (e.g., question answering, text generation)
    - Custom pipeline creation for specialized tasks
    - Includes a suitable task 'head' and decoding strategy, e.g. beam search

Pseudo-Code Demonstration

1. **Loading a Model**: `model = ModelLoader.load("bert-base-uncased")`
2. **Tokenizing Text**: `tokens = Tokenizer.encode("Hello, world!")`
3. **Running the Model**: `predictions = model(tokens)`
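
In the actual library, the three pseudo-code steps map onto the `Auto` classes roughly as follows. This is a sketch, assuming `transformers` and `torch` are installed and the model can be downloaded; the wrapper function `run_bert_demo` is a name invented here.

```python
def run_bert_demo(text="Hello, world!", model_name="bert-base-uncased"):
    """Tokenize `text`, run it through a pre-trained model, return ids and hidden states."""
    # Imports sit inside the function so that defining it does not require the libraries.
    import torch
    from transformers import AutoModel, AutoTokenizer

    # 1. Load the model (and the tokenizer that was trained with it)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    # 2. Tokenize the text into PyTorch tensors
    inputs = tokenizer(text, return_tensors="pt")

    # 3. Run the model; no_grad() skips gradient bookkeeping we do not need here
    with torch.no_grad():
        outputs = model(**inputs)

    return inputs["input_ids"], outputs.last_hidden_state
```

Calling `run_bert_demo()` returns the token ids and one hidden-state vector per token. A task-specific class such as `AutoModelForSequenceClassification` would add a prediction head on top.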

# Let's set this up on Google Colab

- upload to Google Drive
- open the notebook
- top-right, pick a processor for the code ('runtime')

- this code is computationally demanding, so we need at least:
  - V100 GPU with 'High RAM'
  - which costs some money

# Now we are ready to run some code

In [3]:
import torch  # this better work, or the environment is wrong
import jax # ditto this

In [4]:
torch.cuda.is_available()   # check if there is a GPU which is ready

True

In [5]:
# import transformers  # this won't work yet

In [6]:
# Classic transformers
!pip install huggingface-hub
!pip install transformers
!pip install accelerate

Collecting huggingface-hub
  Downloading huggingface_hub-0.19.0-py3-none-any.whl (311 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 311.2/311.2 kB 6.1 MB/s eta 0:00:00
Installing collected packages: huggingface-hub
Successfully installed huggingface-hub-0.19.0
Collecting transformers
  Downloading transformers-4.35.0-py3-none-any.whl (7.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.9/7.9 MB 30.3 MB/s eta 0:00:00
Collecting tokenizers<0.15,>=0.14 (from transformers)
  Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.8/3.8 MB 76.3 MB/s eta 0:00:00
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [7]:
import transformers

# Authentication

In [8]:
# Tokens
# https://huggingface.co/settings/tokens

access_key = 'hf_...'  # placeholder: paste your own read token here; never share or commit a real token
import huggingface_hub
huggingface_hub.login(access_key)

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
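
One way to keep the token out of the notebook altogether is to read it from the environment and fall back to a hidden prompt. This is a sketch: the helper `get_hf_token` and the default variable name `HF_TOKEN` are choices made for this example.

```python
import os
from getpass import getpass


def get_hf_token(env_var="HF_TOKEN"):
    """Return the access token from the environment, prompting only if it is missing."""
    token = os.environ.get(env_var)
    if not token:
        # getpass hides what is typed, so the token never appears in the notebook output
        token = getpass("HuggingFace access token: ")
    return token
```

The login cell then becomes `huggingface_hub.login(get_hf_token())`, and the notebook can be shared without exposing a credential.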



# Zephyr 7b

In [9]:
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto")

Downloading (…)lve/main/config.json:   0%|          | 0.00/638 [00:00<?, ?B/s]

Downloading (…)fetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/8 [00:00<?, ?it/s]

Downloading (…)of-00008.safetensors:   0%|          | 0.00/1.89G [00:00<?, ?B/s]

Downloading (…)of-00008.safetensors:   0%|          | 0.00/1.95G [00:00<?, ?B/s]

Downloading (…)of-00008.safetensors:   0%|          | 0.00/1.98G [00:00<?, ?B/s]

Downloading (…)of-00008.safetensors:   0%|          | 0.00/1.95G [00:00<?, ?B/s]

Downloading (…)of-00008.safetensors:   0%|          | 0.00/1.98G [00:00<?, ?B/s]

Downloading (…)of-00008.safetensors:   0%|          | 0.00/1.95G [00:00<?, ?B/s]

Downloading (…)of-00008.safetensors:   0%|          | 0.00/1.98G [00:00<?, ?B/s]

Downloading (…)of-00008.safetensors:   0%|          | 0.00/816M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

Downloading (…)neration_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/1.43k [00:00<?, ?B/s]

Downloading tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

Downloading (…)in/added_tokens.json:   0%|          | 0.00/42.0 [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/168 [00:00<?, ?B/s]

In [10]:
# We use the tokenizer's chat template to format each message 
# - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a professor of economics at the University of Oxford",
    },
    {
        "role": "user",
        "content": "Explain double-debiased machine learning for causal inference"},
]

In [11]:
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=512, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
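
To see what the chat template does, here is a hand-written stand-in that reproduces the prompt text printed further down in this notebook. It is only a sketch: `render_zephyr_prompt` is a hypothetical helper, the real template is stored with the tokenizer, and it may differ in details not visible in that printout.

```python
def render_zephyr_prompt(messages, add_generation_prompt=True):
    """Format chat messages the way the printed Zephyr prompt appears in this notebook."""
    parts = []
    for message in messages:
        # each turn: a role tag on its own line, the content, then the end-of-sequence marker
        parts.append(f"<|{message['role']}|>\n{message['content']}</s>\n")
    if add_generation_prompt:
        # an open assistant tag invites the model to write the reply
        parts.append("<|assistant|>\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "You are a professor of economics at the University of Oxford"},
    {"role": "user", "content": "Explain double-debiased machine learning for causal inference"},
]
prompt_text = render_zephyr_prompt(example_messages)
print(prompt_text)
```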

In [11]:
print(outputs[0]["generated_text"])



<|system|>
You are a professor of economics at the University of Oxford</s>
<|user|>
Explain double-debiased machine learning for causal inference</s>
<|assistant|>
Double-debiased machine learning is a statistical technique that combines two debiasing strategies to estimate causal effects more accurately. In causal inference, the goal is to estimate the effect of a treatment on an outcome, while controlling for all other factors that might influence the outcome. Double-debiased machine learning applies two debiasing steps to address sources of estimation error that can arise in complex data settings.

The first debiasing step is called outcome regression debiasing. In this step, the effect of the treatment on the outcome is estimated by regressing the outcome on the treatment and other relevant covariates. However, this estimated effect can be biased due to unobserved confounding variables that are correlated with both the treatment and the outcome. To address this, outcome regression
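
The generation call earlier in this section samples with `temperature=0.7`, `top_k=50` and `top_p=0.95`. The plain-Python sketch below shows what those three knobs do to one vector of next-token scores. It is a simplified illustration, not the library's implementation, and `sample_next_token` is a name invented here.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.95, rng=random):
    """Pick one token id from raw scores using temperature, top-k and top-p filtering."""
    # 1. Temperature: dividing by a value below 1 sharpens the distribution
    scaled = [value / temperature for value in logits]

    # 2. Top-k: keep only the k highest-scoring token ids
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # 3. Softmax over the survivors (subtract the peak for numerical stability)
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # 4. Top-p (nucleus): keep the smallest prefix whose probability mass reaches top_p
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for token_id, p in zip(ranked, probs):
        kept_ids.append(token_id)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # 5. Sample from what is left; choices() renormalises the weights itself
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]


print(sample_next_token([0.1, 2.0, -1.0], top_k=1))  # top_k=1 always returns the argmax
```

Setting `do_sample=False` in the pipeline call would skip the sampling and always take the most likely token.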

# Zephyr 7b in more detail

In [12]:
# if we wanted to clear out memory ...:
# del pipe
# torch.cuda.empty_cache()


In [13]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [13]:
# Load the tokenizer and model
# tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
# model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16)

### Here's a shortcut because we already created `pipe`

In [13]:
tokenizer = pipe.tokenizer
model = pipe.model

In [14]:
model.device

device(type='cuda', index=0)

### Tokenize the prompt

In [15]:
# Tokenize the prompt
inputs = tokenizer(prompt, return_tensors="pt")
print(prompt)
print(inputs)

<|system|>
You are a professor of economics at the University of Oxford</s>
<|user|>
Explain double-debiased machine learning for causal inference</s>
<|assistant|>

{'input_ids': tensor([[    1,   523, 28766,  6574, 28766, 28767,    13,  1976,   460,   264,
         12192,   302, 25426,   438,   272,  2900,   302, 13434,     2, 28705,
            13, 28789, 28766,  1838, 28766, 28767,    13,   966, 19457,  3579,
         28733,   450,  6309,  1293,  5599,  5168,   354,  3599,   282,   297,
          2103,     2, 28705,    13, 28789, 28766,   489, 11143, 28766, 28767,
            13]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1]])}


### Move the input tensors to the same device as the model

In [16]:
inputs = {k: v.to(model.device) for k, v in inputs.items()}
inputs

{'input_ids': tensor([[    1,   523, 28766,  6574, 28766, 28767,    13,  1976,   460,   264,
          12192,   302, 25426,   438,   272,  2900,   302, 13434,     2, 28705,
             13, 28789, 28766,  1838, 28766, 28767,    13,   966, 19457,  3579,
          28733,   450,  6309,  1293,  5599,  5168,   354,  3599,   282,   297,
           2103,     2, 28705,    13, 28789, 28766,   489, 11143, 28766, 28767,
             13]], device='cuda:0'),
 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          1, 1, 1]], device='cuda:0')}

### Run the model on the inputs

In [17]:
# Forward pass, specifying that we want hidden states and attentions
model_output = model(**inputs, output_hidden_states=True, output_attentions=True)

### OK, let's look inside the model

In [18]:
# Access hidden states; note that the first element is the input embeddings,
# so you might want to start from index 1 to get the hidden states after the first layer
initial_embeddings = model_output.hidden_states[0]
hidden_states = model_output.hidden_states[1:]

In [19]:
# Access attentions
attentions = model_output.attentions

In [20]:
len(hidden_states)

32

In [21]:
initial_embeddings.shape, hidden_states[0].shape, hidden_states[31].shape

(torch.Size([1, 51, 4096]),
 torch.Size([1, 51, 4096]),
 torch.Size([1, 51, 4096]))

### The raw initial embeddings:

In [25]:
initial_embeddings[0]

tensor([[-4.1199e-03,  4.1580e-04, -4.4861e-03,  ..., -4.1771e-04,
         -1.0223e-03, -2.0117e-06],
        [-5.4321e-03, -2.9297e-03, -6.9809e-04,  ...,  2.5940e-03,
         -6.8283e-04, -5.4550e-04],
        [ 6.5327e-05, -2.2278e-03,  7.0190e-04,  ..., -1.0300e-04,
          1.0071e-03,  1.4191e-03],
        ...,
        [ 6.5327e-05, -2.2278e-03,  7.0190e-04,  ..., -1.0300e-04,
          1.0071e-03,  1.4191e-03],
        [ 3.5286e-04,  2.6855e-03,  3.0365e-03,  ..., -1.5721e-06,
          3.6621e-04, -4.6921e-04],
        [-8.8882e-04, -1.0910e-03,  8.0109e-04,  ...,  1.1978e-03,
         -7.5340e-05, -6.7902e-04]], device='cuda:0', dtype=torch.bfloat16,
       grad_fn=<SelectBackward0>)

### The 'transformed' embeddings after all the attention processing:

In [26]:
hidden_states[31][0]

tensor([[ -1.0156,  -0.1875,  -2.3594,  ...,   0.0698,  -1.2891,   1.9219],
        [ -1.6094,  -5.6250,  -1.1953,  ...,   0.6250,  -0.6016,  -1.2266],
        [ -3.0938,  -1.8281,  -0.7539,  ...,  -0.7266,  -5.5312,   0.9688],
        ...,
        [  2.5781, -10.5625,   2.6875,  ...,   1.2344,  -2.4688,  -3.5938],
        [  9.6875,   3.6094,   5.5000,  ...,  -6.0312,  -3.6250,  -6.4375],
        [  7.1250,  -1.4062,   5.3125,  ...,  -1.1250,  -6.4688,  -1.6094]],
       device='cuda:0', dtype=torch.bfloat16, grad_fn=<SelectBackward0>)

### The final logits from this run of the model:

In [22]:
# Get logits from the model output
logits = model_output.logits
logits.shape

torch.Size([1, 51, 32000])

### The simplest way to extract the first generated token from the logits:

In [23]:
# Find the most likely next token ID: the logits at the last position score the token that follows the prompt
next_token_id = torch.argmax(logits[:, -1, :], dim=-1).item()  # This gives you an integer

In [24]:
next_token_text = tokenizer.decode([next_token_id], skip_special_tokens=True)

In [25]:
next_token_text

'Double'
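
The cells above take a single greedy step. Repeating that step, each time appending the chosen token to the input, gives greedy decoding. The sketch below shows the loop in plain Python with a toy scoring function standing in for Zephyr. Both `greedy_decode` and `toy_logits` are invented for illustration; with the real model, `next_token_logits` would wrap a forward pass and return `logits[0, -1, :]`.

```python
def greedy_decode(next_token_logits, prompt_ids, max_new_tokens, eos_id=None):
    """Repeatedly append the highest-scoring next token id to the sequence."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)  # scores for the token that follows ids[-1]
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(next_id)
        if next_id == eos_id:  # stop early at the end-of-sequence token
            break
    return ids


def toy_logits(ids):
    """Toy 'model' over a vocabulary of 5 ids: it always favours (last id + 1) mod 5."""
    favourite = (ids[-1] + 1) % 5
    return [1.0 if i == favourite else 0.0 for i in range(5)]


print(greedy_decode(toy_logits, [0], max_new_tokens=4))  # [0, 1, 2, 3, 4]
```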

### A delve into Zephyr's parameters

In [31]:
n_params = 0
import numpy

In [31]:
for i, (name, param) in enumerate(model.named_parameters()):
    n_params += numpy.prod(list(param.size()))  # count every entry of every tensor
    if i >= 10: continue  # keep counting, but only print the first 10 tensors
    print(f"Layer: {name} | Size: {param.size()} | Some of its values : {param[0]} \n")  # param[0] is the first row of the tensor

Layer: model.embed_tokens.weight | Size: torch.Size([32000, 4096]) | Some of its values : tensor([ 3.4124e-06, -1.3888e-05, -1.3411e-05,  ..., -7.0632e-06,
         2.3842e-06,  9.8944e-06], device='cuda:0', dtype=torch.bfloat16,
       grad_fn=<SelectBackward0>) 

Layer: model.layers.0.self_attn.q_proj.weight | Size: torch.Size([4096, 4096]) | Some of its values : tensor([ 7.7248e-05,  9.6893e-04, -3.3379e-05,  ...,  4.1504e-03,
         2.4438e-05, -4.0054e-04], device='cuda:0', dtype=torch.bfloat16,
       grad_fn=<SelectBackward0>) 

Layer: model.layers.0.self_attn.k_proj.weight | Size: torch.Size([1024, 4096]) | Some of its values : tensor([ 3.1292e-07, -3.1891e-03,  1.2875e-04,  ..., -1.6724e-02,
         1.6022e-04, -6.5613e-04], device='cuda:0', dtype=torch.bfloat16,
       grad_fn=<SelectBackward0>) 

Layer: model.layers.0.self_attn.v_proj.weight | Size: torch.Size([1024, 4096]) | Some of its values : tensor([-3.4714e-04, -1.6861e-03, -6.9046e-04,  ...,  4.9133e-03,
         9

In [32]:
# i is the last zero-based index from enumerate, so the number of tensors is i + 1
print(f"These are the first 10 of {i + 1} tensors of weights in the model\n")
print(f"The total number of entries in all the tensors is {n_params}")

These are the first 10 of 291 tensors of weights in the model

The total number of entries in all the tensors is 7241732096
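
The total can be checked by hand from the tensor shapes. The arithmetic below uses the sizes printed above (vocabulary 32000, hidden width 4096, key/value width 1024, 32 layers). The MLP width of 14336 and the list of tensors in each layer are not visible in the truncated output; they are taken as assumptions from the Mistral 7B architecture.

```python
hidden = 4096         # width of every hidden state (from the shapes printed above)
vocab = 32000         # rows of embed_tokens.weight
kv_dim = 1024         # rows of k_proj and v_proj
n_layers = 32         # len(hidden_states)
intermediate = 14336  # ASSUMPTION: Mistral 7B MLP width, not shown in the output above

# ASSUMPTION: each layer holds q/k/v/o projections, three MLP matrices and two norm vectors
attention = 2 * hidden * hidden + 2 * kv_dim * hidden  # q_proj, o_proj, k_proj, v_proj
mlp = 3 * hidden * intermediate                        # gate_proj, up_proj, down_proj
norms = 2 * hidden                                     # two normalisation weight vectors
per_layer = attention + mlp + norms

# input embeddings + all layers + final norm + output projection (lm_head)
total = vocab * hidden + n_layers * per_layer + hidden + vocab * hidden
print(f"{total:,}")  # 7,241,732,096
```

The MLP matrices account for roughly 78% of the parameters, while the two vocabulary-sized matrices contribute under 4%.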


# Many thanks