Added working GPT-j bindings

This commit is contained in:
ParisNeo 2023-04-24 00:19:15 +02:00
parent 7d7490da2d
commit 42d63d99a8
14 changed files with 153 additions and 75 deletions


@@ -46,19 +46,14 @@ Make sure that your CPU supports `AVX2` instruction set. Without it, this applic
### Automatic install
1. Open the directory on your computer where you want to download/install this application (this will create a new directory `/gpt4all-ui/`; make sure a folder with this name does not already exist in this directory).
2. Press and hold `Shift` on your keyboard and right-click with your mouse inside the folder. Select `Open in Terminal` or `Open PowerShell window here` from the menu (this option can hide under `Show more options` in Windows 11).
3. Copy and paste this command and press Enter:
```
mkdir gpt4all-ui ; curl https://raw.githubusercontent.com/nomic-ai/gpt4all-ui/main/webui.bat -o ./gpt4all-ui/webui.bat ; pushd ./gpt4all-ui ; Invoke-Expression -Command "./webui.bat"
```
> **Note**
>
> This command creates a new directory `/gpt4all-ui/`, downloads the file [webui.bat](https://raw.githubusercontent.com/nomic-ai/gpt4all-ui/main/webui.bat) into it, changes the current working directory to `/gpt4all-ui/`, and executes `webui.bat`, which downloads and installs everything that is needed.
It is advised to have Python 3.10 (the official installer, not the Microsoft Store version) and Git installed. The app should work with any Python from 3.7 up, but 3.10 is recommended for full support: some extensions, such as the planned Stable Diffusion extension, will require 3.10.
4. Follow the instructions on screen until the web UI launches.
5. To relaunch the application, double-click the `webui.bat` file in Windows Explorer as a normal user.
1. [Go to the latest release section](https://github.com/nomic-ai/gpt4all-ui/releases)
2. Download `webui.bat` if you are on Windows, or `webui.sh` if you are on Linux/Mac, and place it in a folder of your choice, for example `gpt4all-ui`.
3. Run the script and wait. It should install everything and start the chatbot.
4. Before running, it may ask you to download a model. Feel free to accept, or download your own models depending on the backends you are using.
Once installed, you should see a new folder called GPT4All. From now on, you can run the app by using `webui.bat` or `webui.sh`. The script will check for updates at every launch.
### Manual Simple install:
1. Download this repository .zip:
@@ -216,18 +211,29 @@ After that, you can open the application in your browser on http://localhost:960
Now you're ready to work!
# Supported backends
Three backends are now supported:
1 - The llama_cpp backend
2 - The GPT-j backend
3 - Hugging Face's Transformers backend (under construction)
All three share a small common interface, sketched below.
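A rough sketch of that interface, reconstructed from the `GPTBackend` class in this commit (the role of `inline` is inferred from how the UI consumes the stream, so treat this as an approximation rather than the exact source):
```
from typing import Callable

class GPTBackend:
    file_extension = '*.bin'  # glob used to discover model files for this backend

    def __init__(self, config: dict, inline: bool) -> None:
        self.config = config
        # inline appears to flag whether the binding streams only newly
        # generated text (True) or echoes the prompt back first (False),
        # in which case the UI must skip the echoed prompt.
        self.inline = inline

    def generate(self, prompt: str,
                 new_text_callback: Callable[[str], None] = None,
                 n_predict: int = 128, verbose: bool = False) -> None:
        raise NotImplementedError
```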
# Supported models
You can also refuse to download the model during the install procedure and download it manually.
**For now, we support ggml models that work "out-of-the-box" (tested on Windows 11 and Ubuntu 22.04.2), such as:**
## Llama_cpp models
- [GPT4ALL 7B](https://huggingface.co/ParisNeo/GPT4All/resolve/main/gpt4all-lora-quantized-ggml.bin) or visit [repository](https://huggingface.co/ParisNeo/GPT4All)
- [GPT4ALL 7B unfiltered](https://huggingface.co/ParisNeo/GPT4All/blob/main/gpt4all-lora-unfiltered-quantized.new.bin) or visit [repository](https://huggingface.co/ParisNeo/GPT4All)
- [Vicuna 7B rev 1](https://huggingface.co/eachadea/legacy-ggml-vicuna-7b-4bit/resolve/main/ggml-vicuna-7b-4bit-rev1.bin) or visit [repository](https://huggingface.co/eachadea/legacy-ggml-vicuna-7b-4bit)
- [Vicuna 13B rev 1](https://huggingface.co/eachadea/ggml-vicuna-13b-4bit/resolve/main/ggml-vicuna-13b-4bit-rev1.bin) or visit [repository](https://huggingface.co/eachadea/ggml-vicuna-13b-4bit)
## GPT-j models
- [GPT-j 7B](https://gpt4all.io/models/ggml-gpt4all-j.bin)
**These models don't work "out-of-the-box" and need to be converted to the right ggml type (see the format-check sketch after this list):**
## Llama_cpp models
- [Vicuna 7B](https://huggingface.co/eachadea/legacy-ggml-vicuna-7b-4bit/resolve/main/ggml-vicuna-7b-4bit.bin) or visit [repository](https://huggingface.co/eachadea/legacy-ggml-vicuna-7b-4bit)
- [Vicuna 13B q4 v0](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-vicuna-13b-1.1-q4_0.bin) or visit [repository](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/)
- [Vicuna 13B q4 v1](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-vicuna-13b-1.1-q4_1.bin) or visit [repository](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/)
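If you are unsure which ggml container format a downloaded file uses, you can inspect its first four bytes. The helper below is a sketch, not part of the repo; the magic values are the standard little-endian ggml/ggmf/ggjt constants used by llama.cpp-family files:
```
import struct

# ggml container magics, stored as little-endian uint32 at file offset 0
MAGICS = {
    0x67676d6c: "ggml (unversioned, oldest format)",
    0x67676d66: "ggmf (versioned)",
    0x67676a74: "ggjt (mmap-able, newest format)",
}

def ggml_format(path: str) -> str:
    with open(path, "rb") as f:
        magic, = struct.unpack("<I", f.read(4))
    return MAGICS.get(magic, f"unknown (0x{magic:08x})")

print(ggml_format("./models/llama_cpp/ggml-vicuna-7b-4bit.bin"))
```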

app.py

@@ -82,8 +82,8 @@ class Gpt4AllWebUI(GPT4AllAPI):
self.add_endpoint(
"/new_discussion", "new_discussion", self.new_discussion, methods=["GET"]
)
self.add_endpoint("/bot", "bot", self.bot, methods=["POST"])
self.add_endpoint("/stop", "stop", self.stop, methods=["POST"])
self.add_endpoint("/generate", "generate", self.generate, methods=["POST"])
self.add_endpoint("/stop_gen", "stop_gen", self.stop_gen, methods=["GET"])
self.add_endpoint("/run_to", "run_to", self.run_to, methods=["POST"])
self.add_endpoint("/rename", "rename", self.rename, methods=["POST"])
@@ -239,7 +239,6 @@ class Gpt4AllWebUI(GPT4AllAPI):
@stream_with_context
def parse_to_prompt_stream(self, message, message_id):
bot_says = ""
self.stop = False
# send the message to the bot
print(f"Received message : {message}")
@@ -264,28 +263,33 @@ class Gpt4AllWebUI(GPT4AllAPI):
self.discussion_messages = self.prepare_query(message_id)
self.prepare_reception()
self.generating = True
app.config['executor'] = ThreadPoolExecutor(max_workers=1)
app.config['executor'].submit(self.generate_message)
while self.generating:
try:
while not self.text_queue.empty():
value = self.text_queue.get(False)
yield value#.replace("\n","<br>")
if self.cancel_gen:
self.generating = False
break
yield value
time.sleep(0)
except :
except Exception as ex:
print(f"Exception {ex}")
time.sleep(0.1)
if self.cancel_gen:
self.cancel_gen = False
app.config['executor'].shutdown(True, True)
self.generating = False
print("## Done ##")
app.config['executor'].shutdown(True, timeout=5)
self.current_discussion.update_message(response_id, self.bot_says)
self.full_message_list.append(self.bot_says)
bot_says = markdown.markdown(self.bot_says)
yield "FINAL:"+bot_says
self.cancel_gen = False
return bot_says
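The pattern above (a worker thread fills a queue while the Flask response generator drains it) can be reduced to the following standalone sketch; names like `generate_message` are stand-ins for the app's methods:
```
from concurrent.futures import ThreadPoolExecutor
from queue import Queue, Empty

text_queue: Queue = Queue()
generating = True

def generate_message():
    """Stand-in for the model worker; the real app pushes tokens into the
    queue from new_text_callback while the binding generates."""
    global generating
    for token in ["Hello", ",", " ", "world", "!"]:
        text_queue.put(token)
    generating = False

def stream():
    """Drain the queue and yield tokens as they arrive, as the response
    generator above does."""
    executor = ThreadPoolExecutor(max_workers=1)
    executor.submit(generate_message)
    while generating or not text_queue.empty():
        try:
            yield text_queue.get(timeout=0.1)
        except Empty:
            pass
    executor.shutdown(wait=True)

print("".join(stream()))
```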
def bot(self):
self.stop = True
def generate(self):
if self.current_discussion is None:
if self.db.does_last_discussion_have_messages():
@@ -308,14 +312,13 @@ class Gpt4AllWebUI(GPT4AllAPI):
), content_type='text/plain; charset=utf-8'
)
def stop(self):
def stop_gen(self):
self.cancel_gen = True
return jsonify({"status": "ok"})
def run_to(self):
data = request.get_json()
message_id = int(data["id"])
self.stop = True
# Segmented (the user receives the output as it comes)
# We will first send a json entry that contains the message id and so on, then the text as it goes
return Response(
@@ -380,6 +383,7 @@ class Gpt4AllWebUI(GPT4AllAPI):
def new_discussion(self):
title = request.args.get("title")
timestamp = self.create_new_discussion(title)
app.config['executor'] = ThreadPoolExecutor(max_workers=1)
app.config['executor'].submit(self.create_chatbot)
# Return a success response
@@ -392,13 +396,15 @@ class Gpt4AllWebUI(GPT4AllAPI):
print("New backend selected")
self.config['backend'] = backend
self.load_backend(self.BACKENDS_LIST[self.config["backend"]])
models_dir = Path('./models')/self.config["backend"] # replace with the actual path to the models folder
models = [f.name for f in models_dir.glob(self.backend.file_extension)]
if len(models)>0:
self.config['model'] = models[0]
self.load_backend(self.BACKENDS_LIST[self.config["backend"]])
self.create_chatbot()
return jsonify({"status": "ok"})
else:
return jsonify({"status": "no_models_found"})
return jsonify({"status": "error"})
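Stripped of the Flask plumbing, the backend-switching logic above amounts to the following sketch (stand-in names; the real code pulls the glob pattern from the backend class's `file_extension`):
```
from pathlib import Path

def set_backend(config: dict, backend: str, file_extension: str = "*.bin") -> dict:
    """Select a backend and pick the first model file available for it."""
    config["backend"] = backend
    # Models live under ./models/<backend>/ and are discovered by glob.
    models_dir = Path("./models") / backend
    models = [f.name for f in models_dir.glob(file_extension)]
    if models:
        config["model"] = models[0]
        return {"status": "ok"}
    return {"status": "no_models_found"}
```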
@@ -577,7 +583,7 @@ if __name__ == "__main__":
personality = load_config(f"personalities/{config['personality_language']}/{config['personality_category']}/{config['personality']}.yaml")
executor = ThreadPoolExecutor(max_workers=6)
executor = ThreadPoolExecutor(max_workers=1)
app.config['executor'] = executor
bot = Gpt4AllWebUI(app, config, personality, config_file_path)


@@ -6,8 +6,8 @@ n_threads: 8
host: localhost
language: en-US
# Supported backends are llama_cpp and gpt_j
backend: llama_cpp
model: gpt4all-lora-quantized-ggml.bin
backend: gpt_j
model: ggml-gpt4all-j.bin
n_predict: 1024
nb_messages_to_remember: 5
personality_language: english
@@ -23,3 +23,4 @@ top_p: 0.95
voice: ""
use_gpu: false # Not active yet
auto_read: false
use_avx2: true # AVX2 is used by default; if your CPU does not support it, set this to false
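The new `use_avx2` flag is consumed by the gpt_j backend further down in this commit. A minimal sketch of that logic, with a stand-in `Model` class in place of the real gpt4all-j binding:
```
class Model:
    """Stand-in for the gpt4all-j binding's Model class (assumption: only
    the constructor arguments shown in the diff are used)."""
    def __init__(self, model: str, instructions: str = "avx2"):
        self.model, self.instructions = model, instructions

def load_gptj_model(config: dict) -> Model:
    path = f"./models/gpt_j/{config['model']}"
    if not config.get("use_avx2", True):
        # Fall back to plain AVX kernels on CPUs without AVX2 support.
        return Model(model=path, instructions="avx")
    return Model(model=path)

model = load_gptj_model({"model": "ggml-gpt4all-j.bin", "use_avx2": False})
```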


@@ -1,6 +1,6 @@
# GPT4All Chatbot conditionning file
# Author : @ParisNeo
# Version : 1.0
# Version : 1.1
# Description :
# An NLP model needs conditioning to instruct it to be whatever we want it to be.
# This file is used by the GPT4All web ui to condition the personality of the model you are
@@ -24,19 +24,19 @@ personality_description: |
# The conditioning instructions sent to the model at the start of the discussion
personality_conditionning: |
GPT4All is a smart and helpful AI chat bot built by Nomic-AI. It can generate stories on demand.
GPT4All is a smart and helpful assistant built by Nomic-AI. It can chat with humans and assist them.
#Welcome message to be sent to the user when a new discussion is started
welcome_message: "Welcome! I am GPT4All A free and open discussion AI. What can I do for you today?"
welcome_message: "Welcome! I am GPT4All A free and open assistant. What can I do for you today?"
# This prefix is added at the beginning of any message input by the user
user_message_prefix: "user: "
user_message_prefix: "### Human:"
# A text to put between user and chatbot messages
link_text: "\n"
# This prefix is added at the beginning of any message output by the ai
ai_message_prefix: "gpt4all: "
ai_message_prefix: "### Assistant:"
# Here is the list of extensions this personality requires
dependencies: []


@@ -25,6 +25,7 @@ class GPT4AllAPI():
self.config = config
self.personality = personality
self.config_file_path = config_file_path
self.cancel_gen = False
# This is the queue used to stream text to the ui as the bot spits out its response
self.text_queue = Queue(0)
@@ -57,7 +58,7 @@
"""
self.prepare_reception()
self.discussion_messages = "Instruction: Act as gpt4all. A kind and helpful AI bot built to help users solve problems.\nuser: how to build a water rocket?\ngpt4all:"
self.chatbot_bindings.generate(
text = self.chatbot_bindings.generate(
self.discussion_messages,
new_text_callback=self.new_text_callback,
n_predict=372,
@@ -68,9 +69,10 @@
repeat_last_n = self.config['repeat_last_n'],
#seed=self.config['seed'],
n_threads=self.config['n_threads']
)
)
print(text)
"""
# generation status
self.generating=False
@@ -87,14 +89,10 @@
loader = importlib.machinery.SourceFileLoader(module_name, str(absolute_path))
backend_module = loader.load_module()
backend_class = getattr(backend_module, backend_module.backend_name)
self.backend = backend_class(self.config)
self.backend = backend_class
def create_chatbot(self):
try:
return self.backend(self.config)
except Exception as ex:
print(f"Exception {ex}")
return None
return self.backend(self.config)
def condition_chatbot(self, conditionning_message):
if self.current_discussion is None:
@@ -175,23 +173,73 @@ class GPT4AllAPI():
return discussion_messages # Removes the last return
def remove_text_from_string(self, string, text_to_find):
"""
Removes everything from the first occurrence of the specified text in the string (case-insensitive).
Parameters:
string (str): The original string.
text_to_find (str): The text to find in the string.
Returns:
str: The updated string.
"""
index = string.lower().find(text_to_find.lower())
if index != -1:
string = string[:index]
return string
def new_text_callback(self, text: str):
if self.cancel_gen:
return False
print(text, end="")
sys.stdout.flush()
self.full_text += text
if self.is_bot_text_started:
if self.chatbot_bindings.inline:
self.bot_says += text
self.text_queue.put(text)
#if self.current_message in self.full_text:
if len(self.discussion_messages) < len(self.full_text):
self.is_bot_text_started = True
if not self.personality["user_message_prefix"].lower() in self.bot_says.lower():
self.text_queue.put(text)
if self.cancel_gen:
print("Generation canceled")
return False
else:
return True
else:
self.bot_says = self.remove_text_from_string(self.bot_says, self.personality["user_message_prefix"].lower())
print("The model is halucinating")
return False
else:
self.full_text += text
if self.is_bot_text_started:
self.bot_says += text
if not self.personality["user_message_prefix"].lower() in self.bot_says.lower():
self.text_queue.put(text)
if self.cancel_gen:
print("Generation canceled")
return False
else:
return True
else:
self.bot_says = self.remove_text_from_string(self.bot_says, self.personality["user_message_prefix"].lower())
print("The model is halucinating")
self.cancel_gen=True
return False
#if self.current_message in self.full_text:
if len(self.discussion_messages) < len(self.full_text):
self.is_bot_text_started = True
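The contract of `new_text_callback` is worth spelling out: the binding invokes it for every generated token, and returning `False` aborts generation. A compact sketch of the hallucination guard above, as a hypothetical standalone helper:
```
from queue import Queue

def make_callback(user_prefix: str, queue: Queue):
    bot_says = ""
    def on_token(token: str) -> bool:
        nonlocal bot_says
        bot_says += token
        if user_prefix.lower() in bot_says.lower():
            # The model started speaking for the user ("hallucinating"):
            # stop generation here instead of streaming the fake turn.
            return False
        queue.put(token)  # stream the token to the UI
        return True
    return on_token

q = Queue()
cb = make_callback("### Human:", q)
```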
def generate_message(self):
self.generating=True
self.text_queue=Queue()
gc.collect()
total_n_predict = len(self.discussion_messages)+self.config['n_predict']
total_n_predict = self.config['n_predict']
print(f"Generating {total_n_predict} outputs... ")
print(f"Input text : {self.discussion_messages}")
self.chatbot_bindings.generate(
self.discussion_messages,
new_text_callback=self.new_text_callback,


@@ -18,8 +18,10 @@ __license__ = "Apache 2.0"
class GPTBackend:
file_extension='*.bin'
def __init__(self, config:dict) -> None:
def __init__(self, config:dict, inline:bool) -> None:
self.config = config
self.inline = inline
def generate(self,
prompt:str,


@@ -28,12 +28,21 @@ class GPT_J(GPTBackend):
Args:
config (dict): The configuration file
"""
super().__init__(config)
super().__init__(config, True)
self.config = config
self.model = Model(
ggml_model=f"./models/gptj/{self.config['model']}"
)
if "use_avx2" in self.config and not self.config["use_avx2"]:
self.model = Model(
model=f"./models/gpt_j/{self.config['model']}", instructions='avx'
)
else:
self.model = Model(
model=f"./models/gpt_j/{self.config['model']}"
)
def get_num_tokens(self, prompt):
return self.model.num_tokens(prompt)
def generate(self,
prompt:str,
@@ -49,16 +58,19 @@ class GPT_J(GPTBackend):
new_text_callback (Callable[[str], None], optional): A callback function that is called every time a new text element is generated. Defaults to None.
verbose (bool, optional): If true, the code will print detailed information about the generation process. Defaults to False.
"""
num_tokens = self.get_num_tokens(prompt)
print(f"Prompt has {num_tokens} tokens")
self.model.generate(
prompt,
#new_text_callback=new_text_callback,
n_predict=n_predict,
callback=new_text_callback,
n_predict=num_tokens + n_predict,
seed=self.config['seed'] if self.config['seed']>0 else -1,
temp=self.config['temp'],
top_k=self.config['top_k'],
top_p=self.config['top_p'],
repeat_penalty=self.config['repeat_penalty'],
repeat_last_n = self.config['repeat_last_n'],
# repeat_penalty=self.config['repeat_penalty'],
# repeat_last_n = self.config['repeat_last_n'],
n_threads=self.config['n_threads'],
verbose=verbose
#verbose=verbose
)
new_text_callback()
#new_text_callback()
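For reference, here is how this backend might be driven directly, using keyword names and config keys that appear in this commit (the values are illustrative, and the callback returns `True` to keep generating):
```
def print_token(text: str) -> bool:
    print(text, end="", flush=True)
    return True  # returning False would abort generation

config = {
    "model": "ggml-gpt4all-j.bin", "use_avx2": True,
    "seed": -1, "temp": 0.9, "top_k": 40, "top_p": 0.95,
    "n_threads": 8,
}
backend = GPT_J(config)  # loads ./models/gpt_j/ggml-gpt4all-j.bin
backend.generate("### Human: Hello\n### Assistant:",
                 new_text_callback=print_token, n_predict=128)
```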


@@ -27,7 +27,7 @@ class LLAMACPP(GPTBackend):
Args:
config (dict): The configuration file
"""
super().__init__(config)
super().__init__(config, False)
self.model = Model(
ggml_model=f"./models/llama_cpp/{self.config['model']}",


@@ -29,7 +29,7 @@ class Transformers(GPTBackend):
Args:
config (dict): The configuration file
"""
super().__init__(config)
super().__init__(config, True)
self.config = config
self.tokenizer = tokenizer = AutoTokenizer.from_pretrained(f"./models/transformers/{self.config['model']}/tokenizer.json", local_files_only=True)
self.model = AutoModelForCausalLM.from_pretrained(f"./models/transformers/{self.config['model']}/model.bin", local_files_only=True)


@@ -4,5 +4,5 @@ pytest
pyyaml
markdown
pyllamacpp==1.0.6
gpt4all-j
gpt4all-j==0.2.1
transformers


@@ -29,7 +29,7 @@ function addMessage(sender, message, id, rank = 0, can_edit = false) {
senderElement.innerHTML = sender;
const messageTextElement = document.createElement('div');
messageTextElement.classList.add('font-medium', 'text-md');
messageTextElement.classList.add('font-medium', 'text-md', 'whitespace-pre-wrap');
messageTextElement.innerHTML = message;
// Create a hidden div element needed to buffer responses before committing them to the visible message
const hiddenElement = document.createElement('div');
@@ -115,8 +115,6 @@ function addMessage(sender, message, id, rank = 0, can_edit = false) {
sendbtn.style.display = "block";
waitAnimation.style.display = "none";
stopGeneration.style.display = "none";
hiddenElement_.innerHTML = txt
messageTextElement_.innerHTML = txt
controller.close();
return;
}
@@ -153,6 +151,7 @@ function addMessage(sender, message, id, rank = 0, can_edit = false) {
entry_counter++;
prefix = "FINAL:";
if(text.startsWith(prefix)){
console.log("Final text found")
text = text.substring(prefix.length);
hiddenElement.innerHTML = text
messageTextElement.innerHTML = text


@@ -2,12 +2,12 @@ function update_main(){
const chatWindow = document.getElementById('chat-window');
const chatForm = document.getElementById('chat-form');
const userInput = document.getElementById('user-input');
const stopGeneration = document.querySelector("#stop-generation")
const stopGeneration = document.getElementById("stop-generation")
stopGeneration.addEventListener('click', (event) =>{
event.preventDefault();
console.log("Stop clicked");
fetch('/stop')
fetch('/stop_gen')
.then(response => response.json())
.then(data => {
console.log(data);
@@ -43,7 +43,7 @@ function update_main(){
user_msg = addMessage('',message, 0, 0, can_edit=true);
bot_msg = addMessage('', '', 0, 0, can_edit=true);
fetch('/bot', {
fetch('/generate', {
method: 'POST',
headers: {
'Content-Type': 'application/json'


@@ -87,9 +87,13 @@ fetch('/settings')
.then((response) => response.json())
.then((data) => {
console.log(data);
populate_settings();
alert("Backend set successfully")
if(data["status"]==="no_models_found"){
alert("No models found for this backend. Make sure you select a backend that you have models for or download models from links in our repository")
}
else{
populate_settings();
alert("Backend set successfully")
}
})
.catch((error) => {
console.error('Error:', error);


@@ -37,7 +37,7 @@
<form id="chat-form" class="w-full flex-row text-white mr-15 p-15">
<input type="text" id="user-input" placeholder="Type your message..." class="dark:bg-gray-600 pl-1 pr-1 py-1 my-1 mr-2 text-black dark:text-white outline-none drop-shadow-sm w-full rounded-md flex-1">
<div id="wait-animation" style="display: none;" class="lds-facebook bg-secondary my-1 mx-1 outline-none drop-shadow-sm w-full rounded-md p-2"><div></div><div></div><div></div></div>
<button id="stop-generation" style="display: none; " class="my-1 mx-1 outline-none px-4 dark:bg-gray-900 text-black dark:text-white rounded-md hover:bg-[#7ba0ea] active:bg-[#3d73e1] transition-colors ease-in-out">Stop generating</button>
<button id="stop-generation" style="display: none;" class="my-1 mx-1 outline-none px-4 dark:bg-gray-900 text-black dark:text-white rounded-md hover:bg-[#7ba0ea] active:bg-[#3d73e1] transition-colors ease-in-out">Stop generating</button>
<input type="submit" value="Send" id="submit-input" class="my-1 mx-1 outline-none px-4 dark:bg-gray-900 text-black dark:text-white rounded-md hover:bg-[#7ba0ea] active:bg-[#3d73e1] transition-colors ease-in-out">
</form>
</div>