{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "The first step consists of compiling llama.cpp and installing the required libraries in our Python environment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install llama.cpp\n", "!git clone https://github.com/ggerganov/llama.cpp\n", "!cd llama.cpp && git pull && make clean && LLAMA_CUBLAS=1 make\n", "!pip install -r llama.cpp/requirements.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can download our model. We will use an jondurbin/airoboros-m-7b-3.1.2 model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MODEL_ID = \"jondurbin/airoboros-m-7b-3.1.2\"\n", "\n", "# Download model\n", "!git lfs install\n", "!git clone https://huggingface.co/{MODEL_ID}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This step can take a while. Once it’s done, we need to convert our weight to GGML FP16 format" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MODEL_NAME = MODEL_ID.split('/')[-1]\n", "\n", "# Convert to fp16\n", "fp16 = f\"{MODEL_NAME}/{MODEL_NAME.lower()}.fp16.bin\"\n", "!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can quantize the model using one or several methods. In this case, we will use the Q4_K_M and Q5_K_M methods. This is the only step that actually requires a GPU." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "QUANTIZATION_METHODS = [\"q4_k_m\", \"q5_k_m\"]\n", "\n", "for method in QUANTIZATION_METHODS:\n", " qtype = f\"{MODEL_NAME}/{MODEL_NAME.lower()}.{method.upper()}.gguf\"\n", " !./llama.cpp/quantize {fp16} {qtype} {method}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can push our quantized model to a new repo on the Hugging Face Hub with the “-GGUF” suffix. First, let’s log in and modify the following code block to match your username. You can enter your Hugging Face token (https://huggingface.co/settings/tokens) in Google Colab’s “Secrets” tab. We use the allow_patterns parameter to only upload GGUF models and not the entirety of the directory." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install -q huggingface_hub\n", "from huggingface_hub import create_repo, HfApi\n", "from google.colab import userdata\n", "\n", "# Defined in the secrets tab in Google Colab\n", "hf_token = userdata.get('huggingface')\n", "\n", "api = HfApi()\n", "username = \"parisneo\"\n", "\n", "# Create empty repo\n", "create_repo(\n", " repo_id = f\"{username}/{MODEL_NAME}-GGUF\",\n", " repo_type=\"model\",\n", " exist_ok=True,\n", " token=hf_token\n", ")\n", "\n", "# Upload gguf files\n", "api.upload_folder(\n", " folder_path=MODEL_NAME,\n", " repo_id=f\"{username}/{MODEL_NAME}-GGUF\",\n", " allow_patterns=f\"*.gguf\",\n", " token=hf_token\n", ")" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }