mirror of
https://github.com/ParisNeo/lollms-webui.git
synced 2024-12-19 04:17:52 +00:00
131 lines
3.5 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first step consists of compiling llama.cpp and installing the required libraries in our Python environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Clone llama.cpp and build it with cuBLAS (CUDA) support.\n",
    "# Assumes a llama.cpp revision that still provides the Makefile build,\n",
    "# the LLAMA_CUBLAS flag, convert.py, and the quantize binary.\n",
    "!git clone https://github.com/ggerganov/llama.cpp\n",
    "!cd llama.cpp && git pull && make clean && LLAMA_CUBLAS=1 make\n",
    "# Install the Python dependencies used by the conversion script\n",
    "!pip install -r llama.cpp/requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can download our model. We will use the `jondurbin/airoboros-m-7b-3.1.2` model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MODEL_ID = \"jondurbin/airoboros-m-7b-3.1.2\"\n",
    "\n",
    "# Download model\n",
    "!git lfs install\n",
    "!git clone https://huggingface.co/{MODEL_ID}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This step can take a while. Once it’s done, we need to convert our weights to the GGML FP16 format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The git clone above created a local directory named after the model\n",
    "MODEL_NAME = MODEL_ID.split('/')[-1]\n",
    "\n",
    "# Convert to fp16; the output file is written inside the model directory\n",
    "fp16 = f\"{MODEL_NAME}/{MODEL_NAME.lower()}.fp16.bin\"\n",
    "!python llama.cpp/convert.py {MODEL_NAME} --outtype f16 --outfile {fp16}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we can quantize the model using one or several methods. In this case, we will use the Q4_K_M and Q5_K_M methods. This is the only step that actually requires a GPU."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "QUANTIZATION_METHODS = [\"q4_k_m\", \"q5_k_m\"]\n",
    "\n",
    "for method in QUANTIZATION_METHODS:\n",
    "    qtype = f\"{MODEL_NAME}/{MODEL_NAME.lower()}.{method.upper()}.gguf\"\n",
    "    !./llama.cpp/quantize {fp16} {qtype} {method}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we can push our quantized model to a new repo on the Hugging Face Hub with the “-GGUF” suffix. First, log in and modify the following code block to match your username. You can enter your Hugging Face token (https://huggingface.co/settings/tokens) in Google Colab’s “Secrets” tab. We use the `allow_patterns` parameter to upload only the GGUF files rather than the entire directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -q huggingface_hub\n",
    "from huggingface_hub import create_repo, HfApi\n",
    "from google.colab import userdata\n",
    "\n",
    "# Defined in the secrets tab in Google Colab\n",
    "hf_token = userdata.get('huggingface')\n",
    "\n",
    "api = HfApi()\n",
    "username = \"parisneo\"\n",
    "\n",
    "# Create empty repo\n",
    "create_repo(\n",
    "    repo_id=f\"{username}/{MODEL_NAME}-GGUF\",\n",
    "    repo_type=\"model\",\n",
    "    exist_ok=True,\n",
    "    token=hf_token\n",
    ")\n",
    "\n",
    "# Upload gguf files\n",
    "api.upload_folder(\n",
    "    folder_path=MODEL_NAME,\n",
    "    repo_id=f\"{username}/{MODEL_NAME}-GGUF\",\n",
    "    allow_patterns=\"*.gguf\",\n",
    "    token=hf_token\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}