lollms-webui/notebooks/ggml_quantize.ipynb

The first step consists of compiling llama.cpp and installing the required libraries in our Python environment.

In [ ]:
# Install llama.cpp
# LLAMA_CUBLAS=1 builds with cuBLAS support so inference can be offloaded to a CUDA GPU
!git clone https://github.com/ggerganov/llama.cpp
!cd llama.cpp && git pull && make clean && LLAMA_CUBLAS=1 make
!pip install -r llama.cpp/requirements.txt
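
Before moving on, it is worth checking that the build succeeded. A minimal sanity check (assuming the default Makefile targets, which place the binaries in the repository root) is to confirm that the quantize binary exists:

In [ ]:
# The quantize binary should exist after a successful build
!ls -lh llama.cpp/quantize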

Now we can download our model. We will use the jondurbin/airoboros-m-7b-3.1.2 model.

In [ ]:
MODEL_ID = "jondurbin/airoboros-m-7b-3.1.2"

# Download model
!git lfs install
!git clone https://huggingface.co/{MODEL_ID}
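
Depending on your connection, this can already take a few minutes. As a quick check, you can list the cloned directory (named after the repo) and verify that Git LFS fetched the actual weight files rather than small pointer files:

In [ ]:
# The clone directory is named after the repository
!ls -lh {MODEL_ID.split('/')[-1]}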

This step can take a while. Once it's done, we need to convert our weights to the GGML FP16 format.

In [ ]:
MODEL_NAME = MODEL_ID.split('/')[-1]

# Convert to fp16
fp16 = f"{MODEL_NAME}/{MODEL_NAME.lower()}.fp16.bin"
!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16}
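
At FP16, the weights take roughly two bytes per parameter, so the output file should be around 13–14 GB for a 7B model. A quick size check:

In [ ]:
# Roughly 2 bytes per parameter at FP16 (~13.5 GB for a 7B model)
!ls -lh {fp16}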

Finally, we can quantize the model using one or several methods. In this case, we will use the Q4_K_M and Q5_K_M methods, which offer a good trade-off between file size and quality. Note that the quantization step itself runs on the CPU; the GPU build only matters for accelerated inference afterwards.

In [ ]:
QUANTIZATION_METHODS = ["q4_k_m", "q5_k_m"]

# Quantize the FP16 model once per method
for method in QUANTIZATION_METHODS:
    qtype = f"{MODEL_NAME}/{MODEL_NAME.lower()}.{method.upper()}.gguf"
    !./llama.cpp/quantize {fp16} {qtype} {method}
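
Optionally, we can run a quick test generation to make sure the quantized model works. This is a minimal sketch using llama.cpp's main binary; the prompt and generation length are arbitrary, and qtype still points to the last model produced by the loop (Q5_K_M here):

In [ ]:
# Smoke test: generate 128 tokens with the last quantized model
# Add -ngl <layers> to offload layers to the GPU with the cuBLAS build
!./llama.cpp/main -m {qtype} -n 128 -p "Once upon a time"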

Finally, we can push our quantized model to a new repo on the Hugging Face Hub with the “-GGUF” suffix. First, let's log in and modify the following code block to match your username. You can enter your Hugging Face token (https://huggingface.co/settings/tokens) in Google Colab's “Secrets” tab. We use the allow_patterns parameter to only upload the GGUF models and not the entire directory.

In [ ]:
!pip install -q huggingface_hub
from huggingface_hub import create_repo, HfApi
from google.colab import userdata

# Defined in the secrets tab in Google Colab
hf_token = userdata.get('huggingface')

api = HfApi()
username = "parisneo"

# Create empty repo
create_repo(
    repo_id = f"{username}/{MODEL_NAME}-GGUF",
    repo_type="model",
    exist_ok=True,
    token=hf_token
)

# Upload gguf files
api.upload_folder(
    folder_path=MODEL_NAME,
    repo_id=f"{username}/{MODEL_NAME}-GGUF",
    allow_patterns="*.gguf",  # only upload the quantized GGUF files
    token=hf_token
)
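
Once the upload completes, the quantized models are available in the new repo under your account. Anyone can then fetch a single file with hf_hub_download; here is a minimal sketch (the filename must match one of the uploaded GGUF files):

In [ ]:
from huggingface_hub import hf_hub_download

# Download one of the quantized files back from the Hub
local_path = hf_hub_download(
    repo_id=f"{username}/{MODEL_NAME}-GGUF",
    filename=f"{MODEL_NAME.lower()}.Q4_K_M.gguf",
)
print(local_path)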