Added working GPT-j bindings

This commit is contained in:
ParisNeo 2023-04-24 00:19:15 +02:00
parent 7d7490da2d
commit 42d63d99a8
14 changed files with 153 additions and 75 deletions


@@ -46,19 +46,14 @@ Make sure that your CPU supports `AVX2` instruction set. Without it, this applic
### Automatic install
1. Open the directory on your computer where you want to download/install this application (this will create a new directory `/gpt4all-ui/`; make sure a folder with this name does not already exist in this directory).
2. Press and hold `Shift` on your keyboard and right-click with your mouse inside the folder. Select `Open in Terminal` or `Open PowerShell window here` from the menu (this option can hide under `Show more options` in Windows 11).
3. Copy and paste this command and press Enter:
```
mkdir gpt4all-ui ; curl https://raw.githubusercontent.com/nomic-ai/gpt4all-ui/main/webui.bat -o ./gpt4all-ui/webui.bat ; pushd ./gpt4all-ui ; Invoke-Expression -Command "./webui.bat"
```
> **Note**
>
> This command creates a new directory `/gpt4all-ui/`, downloads the file [webui.bat](https://raw.githubusercontent.com/nomic-ai/gpt4all-ui/main/webui.bat) into it, changes the current working directory to `/gpt4all-ui/`, and executes `webui.bat`, which downloads and installs everything that is needed.
It is advised to have Python 3.10 (the official installer, not the Microsoft Store version) and Git installed. The app should work with any Python from 3.7 up, but 3.10 is recommended for full support: some extensions, such as the planned Stable Diffusion extension, will require 3.10.
4. Follow the instructions on screen until the web UI launches.
5. To relaunch the application, double-click the `webui.bat` file in Windows Explorer as a normal user.
1. [Go to the latest release section](https://github.com/nomic-ai/gpt4all-ui/releases)
2. Download `webui.bat` if you are on Windows, or `webui.sh` if you are on Linux/Mac, and place it in a folder of your choice, for example `gpt4all-ui`.
3. Run the script and wait. It should install everything and start the chatbot.
4. Before running, it may ask you to download a model. Feel free to accept, or download your own models depending on the backends you are using.
Once installed, you should see a new folder called GPT4All. From now on, you can run the app by using `webui.bat` or `webui.sh`. The script will check for updates at every launch.
### Manual Simple install:
1. Download this repository .zip:
@@ -216,18 +211,29 @@ After that, you can open the application in your browser on http://localhost:960
Now you're ready to work!
# Supported backends
Three backends are now supported:
1 - The llama_cpp backend
2 - The GPT-j backend
3 - Hugging Face's Transformers backend (under construction)
All three share a small common interface, sketched below.
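A rough sketch of that interface, reconstructed from the `GPTBackend` class in this commit (the role of `inline` is inferred from how the UI consumes the stream, so treat this as an approximation rather than the exact source):
```
from typing import Callable

class GPTBackend:
    file_extension = '*.bin'  # glob used to discover model files for this backend

    def __init__(self, config: dict, inline: bool) -> None:
        self.config = config
        # inline appears to flag whether the binding streams only newly
        # generated text (True) or echoes the prompt back first (False),
        # in which case the UI must skip the echoed prompt.
        self.inline = inline

    def generate(self, prompt: str,
                 new_text_callback: Callable[[str], None] = None,
                 n_predict: int = 128, verbose: bool = False) -> None:
        raise NotImplementedError
```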
# Supported models
You can also refuse to download the model during the install procedure and download it manually.
**For now, we support ggml models that work "out-of-the-box" (tested on Windows 11 and Ubuntu 22.04.2), such as:**
## Llama_cpp models
- [GPT4ALL 7B](https://huggingface.co/ParisNeo/GPT4All/resolve/main/gpt4all-lora-quantized-ggml.bin) or visit [repository](https://huggingface.co/ParisNeo/GPT4All)
- [GPT4ALL 7B unfiltered](https://huggingface.co/ParisNeo/GPT4All/blob/main/gpt4all-lora-unfiltered-quantized.new.bin) or visit [repository](https://huggingface.co/ParisNeo/GPT4All)
- [Vicuna 7B rev 1](https://huggingface.co/eachadea/legacy-ggml-vicuna-7b-4bit/resolve/main/ggml-vicuna-7b-4bit-rev1.bin) or visit [repository](https://huggingface.co/eachadea/legacy-ggml-vicuna-7b-4bit)
- [Vicuna 13B rev 1](https://huggingface.co/eachadea/ggml-vicuna-13b-4bit/resolve/main/ggml-vicuna-13b-4bit-rev1.bin) or visit [repository](https://huggingface.co/eachadea/ggml-vicuna-13b-4bit)
## GPT-j models
- [GPT-j 7B](https://gpt4all.io/models/ggml-gpt4all-j.bin)
**These models don't work "out-of-the-box" and need to be converted to the right ggml type (see the format-check sketch after this list):**
## Llama_cpp models
- [Vicuna 7B](https://huggingface.co/eachadea/legacy-ggml-vicuna-7b-4bit/resolve/main/ggml-vicuna-7b-4bit.bin) or visit [repository](https://huggingface.co/eachadea/legacy-ggml-vicuna-7b-4bit)
- [Vicuna 13B q4 v0](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-vicuna-13b-1.1-q4_0.bin) or visit [repository](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/)
- [Vicuna 13B q4 v1](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/resolve/main/ggml-vicuna-13b-1.1-q4_1.bin) or visit [repository](https://huggingface.co/eachadea/ggml-vicuna-13b-1.1/)
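If you are unsure which ggml container format a downloaded file uses, you can inspect its first four bytes. The helper below is a sketch, not part of the repo; the magic values are the standard little-endian ggml/ggmf/ggjt constants used by llama.cpp-family files:
```
import struct

# ggml container magics, stored as little-endian uint32 at file offset 0
MAGICS = {
    0x67676d6c: "ggml (unversioned, oldest format)",
    0x67676d66: "ggmf (versioned)",
    0x67676a74: "ggjt (mmap-able, newest format)",
}

def ggml_format(path: str) -> str:
    with open(path, "rb") as f:
        magic, = struct.unpack("<I", f.read(4))
    return MAGICS.get(magic, f"unknown (0x{magic:08x})")

print(ggml_format("./models/llama_cpp/ggml-vicuna-7b-4bit.bin"))
```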

app.py

@@ -82,8 +82,8 @@ class Gpt4AllWebUI(GPT4AllAPI):
self.add_endpoint(
"/new_discussion", "new_discussion", self.new_discussion, methods=["GET"]
)
self.add_endpoint("/bot", "bot", self.bot, methods=["POST"])
self.add_endpoint("/stop", "stop", self.stop, methods=["POST"])
self.add_endpoint("/generate", "generate", self.generate, methods=["POST"])
self.add_endpoint("/stop_gen", "stop_gen", self.stop_gen, methods=["GET"])
self.add_endpoint("/run_to", "run_to", self.run_to, methods=["POST"])
self.add_endpoint("/rename", "rename", self.rename, methods=["POST"])
@@ -239,7 +239,6 @@ class Gpt4AllWebUI(GPT4AllAPI):
@stream_with_context
def parse_to_prompt_stream(self, message, message_id):
bot_says = ""
self.stop = False
# send the message to the bot
print(f"Received message : {message}")
@@ -264,28 +263,33 @@ class Gpt4AllWebUI(GPT4AllAPI):
self.discussion_messages = self.prepare_query(message_id)
self.prepare_reception()
self.generating = True
app.config['executor'] = ThreadPoolExecutor(max_workers=1)
app.config['executor'].submit(self.generate_message)
while self.generating:
try:
while not self.text_queue.empty():
value = self.text_queue.get(False)
yield value#.replace("\n","<br>")
if self.cancel_gen:
self.generating = False
break
yield value
time.sleep(0)
except :
except Exception as ex:
print(f"Exception {ex}")
time.sleep(0.1)
if self.cancel_gen:
self.cancel_gen = False
app.config['executor'].shutdown(True, True)
self.generating = False
print("## Done ##")
app.config['executor'].shutdown(True, timeout=5)
self.current_discussion.update_message(response_id, self.bot_says)
self.full_message_list.append(self.bot_says)
bot_says = markdown.markdown(self.bot_says)
yield "FINAL:"+bot_says
self.cancel_gen = False
return bot_says
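The pattern above (a worker thread fills a queue while the Flask response generator drains it) can be reduced to the following standalone sketch; names like `generate_message` are stand-ins for the app's methods:
```
from concurrent.futures import ThreadPoolExecutor
from queue import Queue, Empty

text_queue: Queue = Queue()
generating = True

def generate_message():
    """Stand-in for the model worker; the real app pushes tokens into the
    queue from new_text_callback while the binding generates."""
    global generating
    for token in ["Hello", ",", " ", "world", "!"]:
        text_queue.put(token)
    generating = False

def stream():
    """Drain the queue and yield tokens as they arrive, as the response
    generator above does."""
    executor = ThreadPoolExecutor(max_workers=1)
    executor.submit(generate_message)
    while generating or not text_queue.empty():
        try:
            yield text_queue.get(timeout=0.1)
        except Empty:
            pass
    executor.shutdown(wait=True)

print("".join(stream()))
```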
def bot(self):
self.stop = True
def generate(self):
if self.current_discussion is None:
if self.db.does_last_discussion_have_messages():
@@ -308,14 +312,13 @@ class Gpt4AllWebUI(GPT4AllAPI):
), content_type='text/plain; charset=utf-8'
)
def stop(self):
def stop_gen(self):
self.cancel_gen = True
return jsonify({"status": "ok"})
def run_to(self):
data = request.get_json()
message_id = int(data["id"])
self.stop = True
# Segmented (the user receives the output as it comes)
# We will first send a json entry that contains the message id and so on, then the text as it goes
return Response(
@@ -380,6 +383,7 @@ class Gpt4AllWebUI(GPT4AllAPI):
def new_discussion(self):
title = request.args.get("title")
timestamp = self.create_new_discussion(title)
app.config['executor'] = ThreadPoolExecutor(max_workers=1)
app.config['executor'].submit(self.create_chatbot)
# Return a success response
@@ -392,13 +396,15 @@ class Gpt4AllWebUI(GPT4AllAPI):
print("New backend selected")
self.config['backend'] = backend
self.load_backend(self.BACKENDS_LIST[self.config["backend"]])
models_dir = Path('./models')/self.config["backend"] # replace with the actual path to the models folder
models = [f.name for f in models_dir.glob(self.backend.file_extension)]
if len(models)>0:
self.config['model'] = models[0]
self.load_backend(self.BACKENDS_LIST[self.config["backend"]])
self.create_chatbot()
return jsonify({"status": "ok"})
else:
return jsonify({"status": "no_models_found"})
return jsonify({"status": "error"})
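Stripped of the Flask plumbing, the backend-switching logic above amounts to the following sketch (stand-in names; the real code pulls the glob pattern from the backend class's `file_extension`):
```
from pathlib import Path

def set_backend(config: dict, backend: str, file_extension: str = "*.bin") -> dict:
    """Select a backend and pick the first model file available for it."""
    config["backend"] = backend
    # Models live under ./models/<backend>/ and are discovered by glob.
    models_dir = Path("./models") / backend
    models = [f.name for f in models_dir.glob(file_extension)]
    if models:
        config["model"] = models[0]
        return {"status": "ok"}
    return {"status": "no_models_found"}
```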
@@ -577,7 +583,7 @@ if __name__ == "__main__":
personality = load_config(f"personalities/{config['personality_language']}/{config['personality_category']}/{config['personality']}.yaml")
executor = ThreadPoolExecutor(max_workers=6)
executor = ThreadPoolExecutor(max_workers=1)
app.config['executor'] = executor
bot = Gpt4AllWebUI(app, config, personality, config_file_path)


@@ -6,8 +6,8 @@ n_threads: 8
host: localhost
language: en-US
# Supported backends are llama_cpp and gpt_j
backend: llama_cpp
model: gpt4all-lora-quantized-ggml.bin
backend: gpt_j
model: ggml-gpt4all-j.bin
n_predict: 1024
nb_messages_to_remember: 5
personality_language: english
@@ -23,3 +23,4 @@ top_p: 0.95
voice: ""
use_gpu: false # Not active yet
auto_read: false
use_avx2: true # AVX2 is used by default; if your CPU does not support it, set this to false
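The new `use_avx2` flag is consumed by the gpt_j backend further down in this commit. A minimal sketch of that logic, with a stand-in `Model` class in place of the real gpt4all-j binding:
```
class Model:
    """Stand-in for the gpt4all-j binding's Model class (assumption: only
    the constructor arguments shown in the diff are used)."""
    def __init__(self, model: str, instructions: str = "avx2"):
        self.model, self.instructions = model, instructions

def load_gptj_model(config: dict) -> Model:
    path = f"./models/gpt_j/{config['model']}"
    if not config.get("use_avx2", True):
        # Fall back to plain AVX kernels on CPUs without AVX2 support.
        return Model(model=path, instructions="avx")
    return Model(model=path)

model = load_gptj_model({"model": "ggml-gpt4all-j.bin", "use_avx2": False})
```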


@@ -1,6 +1,6 @@
# GPT4All Chatbot conditionning file
# Author : @ParisNeo
# Version : 1.0
# Version : 1.1
# Description :
# An NLP model needs conditioning to instruct it to be whatever we want it to be.
# This file is used by the GPT4All web ui to condition the personality of the model you are
@@ -24,19 +24,19 @@ personality_description: |
# The conditioning instructions sent to the model at the start of the discussion
personality_conditionning: |
GPT4All is a smart and helpful AI chat bot built by Nomic-AI. It can generate stories on demand.
GPT4All is a smart and helpful assistant built by Nomic-AI. It can chat with humans and assist them.
#Welcome message to be sent to the user when a new discussion is started
welcome_message: "Welcome! I am GPT4All A free and open discussion AI. What can I do for you today?"
welcome_message: "Welcome! I am GPT4All A free and open assistant. What can I do for you today?"
# This prefix is added at the beginning of any message input by the user
user_message_prefix: "user: "
user_message_prefix: "### Human:"
# A text to put between user and chatbot messages
link_text: "\n"
# This prefix is added at the beginning of any message output by the ai
ai_message_prefix: "gpt4all: "
ai_message_prefix: "### Assistant:"
# Here is the list of extensions this personality requires
dependencies: []


@@ -25,6 +25,7 @@ class GPT4AllAPI():
self.config = config
self.personality = personality
self.config_file_path = config_file_path
self.cancel_gen = False
# This is the queue used to stream text to the ui as the bot spits out its response
self.text_queue = Queue(0)
@@ -57,7 +58,7 @@
"""
self.prepare_reception()
self.discussion_messages = "Instruction: Act as gpt4all. A kind and helpful AI bot built to help users solve problems.\nuser: how to build a water rocket?\ngpt4all:"
self.chatbot_bindings.generate(
text = self.chatbot_bindings.generate(
self.discussion_messages,
new_text_callback=self.new_text_callback,
n_predict=372,
@@ -68,9 +69,10 @@
repeat_last_n = self.config['repeat_last_n'],
#seed=self.config['seed'],
n_threads=self.config['n_threads']
)
)
print(text)
"""
# generation status
self.generating=False
@@ -87,14 +89,10 @@
loader = importlib.machinery.SourceFileLoader(module_name, str(absolute_path))
backend_module = loader.load_module()
backend_class = getattr(backend_module, backend_module.backend_name)
self.backend = backend_class(self.config)
self.backend = backend_class
def create_chatbot(self):
try:
return self.backend(self.config)
except Exception as ex:
print(f"Exception {ex}")
return None
return self.backend(self.config)
def condition_chatbot(self, conditionning_message):
if self.current_discussion is None:
@@ -175,23 +173,73 @@ class GPT4AllAPI():
return discussion_messages # Removes the last return
def remove_text_from_string(self, string, text_to_find):
"""
Removes everything from the first occurrence of the specified text in the string (case-insensitive).
Parameters:
string (str): The original string.
text_to_find (str): The text to find in the string.
Returns:
str: The updated string.
"""
index = string.lower().find(text_to_find.lower())
if index != -1:
string = string[:index]
return string
def new_text_callback(self, text: str):
if self.cancel_gen:
return False
print(text, end="")
sys.stdout.flush()
self.full_text += text
if self.is_bot_text_started:
if self.chatbot_bindings.inline:
self.bot_says += text
self.text_queue.put(text)
#if self.current_message in self.full_text:
if len(self.discussion_messages) < len(self.full_text):
self.is_bot_text_started = True
if not self.personality["user_message_prefix"].lower() in self.bot_says.lower():
self.text_queue.put(text)
if self.cancel_gen:
print("Generation canceled")
return False
else:
return True
else:
self.bot_says = self.remove_text_from_string(self.bot_says, self.personality["user_message_prefix"].lower())
print("The model is halucinating")
return False
else:
self.full_text += text
if self.is_bot_text_started:
self.bot_says += text
if not self.personality["user_message_prefix"].lower() in self.bot_says.lower():
self.text_queue.put(text)
if self.cancel_gen:
print("Generation canceled")
return False
else:
return True
else:
self.bot_says = self.remove_text_from_string(self.bot_says, self.personality["user_message_prefix"].lower())
print("The model is halucinating")
self.cancel_gen=True
return False
#if self.current_message in self.full_text:
if len(self.discussion_messages) < len(self.full_text):
self.is_bot_text_started = True
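The contract of `new_text_callback` is worth spelling out: the binding invokes it for every generated token, and returning `False` aborts generation. A compact sketch of the hallucination guard above, as a hypothetical standalone helper:
```
from queue import Queue

def make_callback(user_prefix: str, queue: Queue):
    bot_says = ""
    def on_token(token: str) -> bool:
        nonlocal bot_says
        bot_says += token
        if user_prefix.lower() in bot_says.lower():
            # The model started speaking for the user ("hallucinating"):
            # stop generation here instead of streaming the fake turn.
            return False
        queue.put(token)  # stream the token to the UI
        return True
    return on_token

q = Queue()
cb = make_callback("### Human:", q)
```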
def generate_message(self):
self.generating=True
self.text_queue=Queue()
gc.collect()
total_n_predict = len(self.discussion_messages)+self.config['n_predict']
total_n_predict = self.config['n_predict']
print(f"Generating {total_n_predict} outputs... ")
print(f"Input text : {self.discussion_messages}")
self.chatbot_bindings.generate(
self.discussion_messages,
new_text_callback=self.new_text_callback,


@@ -18,8 +18,10 @@ __license__ = "Apache 2.0"
class GPTBackend:
file_extension='*.bin'
def __init__(self, config:dict) -> None:
def __init__(self, config:dict, inline:bool) -> None:
self.config = config
self.inline = inline
def generate(self,
prompt:str,


@@ -28,12 +28,21 @@ class GPT_J(GPTBackend):
Args:
config (dict): The configuration file
"""
super().__init__(config)
super().__init__(config, True)
self.config = config
self.model = Model(
ggml_model=f"./models/gptj/{self.config['model']}"
)
if "use_avx2" in self.config and not self.config["use_avx2"]:
self.model = Model(
model=f"./models/gpt_j/{self.config['model']}", instructions='avx'
)
else:
self.model = Model(
model=f"./models/gpt_j/{self.config['model']}"
)
def get_num_tokens(self, prompt):
return self.model.num_tokens(prompt)
def generate(self,
prompt:str,
@@ -49,16 +58,19 @@ class GPT_J(GPTBackend):
new_text_callback (Callable[[str], None], optional): A callback function that is called every time a new text element is generated. Defaults to None.
verbose (bool, optional): If true, the code will print detailed information about the generation process. Defaults to False.
"""
num_tokens = self.get_num_tokens(prompt)
print(f"Prompt has {num_tokens} tokens")
self.model.generate(
prompt,
#new_text_callback=new_text_callback,
n_predict=n_predict,
callback=new_text_callback,
n_predict=num_tokens + n_predict,
seed=self.config['seed'] if self.config['seed']>0 else -1,
temp=self.config['temp'],
top_k=self.config['top_k'],
top_p=self.config['top_p'],
repeat_penalty=self.config['repeat_penalty'],
repeat_last_n = self.config['repeat_last_n'],
# repeat_penalty=self.config['repeat_penalty'],
# repeat_last_n = self.config['repeat_last_n'],
n_threads=self.config['n_threads'],
verbose=verbose
#verbose=verbose
)
new_text_callback()
#new_text_callback()
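For reference, here is how this backend might be driven directly, using keyword names and config keys that appear in this commit (the values are illustrative, and the callback returns `True` to keep generating):
```
def print_token(text: str) -> bool:
    print(text, end="", flush=True)
    return True  # returning False would abort generation

config = {
    "model": "ggml-gpt4all-j.bin", "use_avx2": True,
    "seed": -1, "temp": 0.9, "top_k": 40, "top_p": 0.95,
    "n_threads": 8,
}
backend = GPT_J(config)  # loads ./models/gpt_j/ggml-gpt4all-j.bin
backend.generate("### Human: Hello\n### Assistant:",
                 new_text_callback=print_token, n_predict=128)
```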


@@ -27,7 +27,7 @@ class LLAMACPP(GPTBackend):
Args:
config (dict): The configuration file
"""
super().__init__(config)
super().__init__(config, False)
self.model = Model(
ggml_model=f"./models/llama_cpp/{self.config['model']}",


@@ -29,7 +29,7 @@ class Transformers(GPTBackend):
Args:
config (dict): The configuration file
"""
super().__init__(config)
super().__init__(config, True)
self.config = config
self.tokenizer = tokenizer = AutoTokenizer.from_pretrained(f"./models/transformers/{self.config['model']}/tokenizer.json", local_files_only=True)
self.model = AutoModelForCausalLM.from_pretrained(f"./models/transformers/{self.config['model']}/model.bin", local_files_only=True)


@@ -4,5 +4,5 @@ pytest
pyyaml
markdown
pyllamacpp==1.0.6
gpt4all-j
gpt4all-j==0.2.1
transformers


@@ -29,7 +29,7 @@ function addMessage(sender, message, id, rank = 0, can_edit = false) {
senderElement.innerHTML = sender;
const messageTextElement = document.createElement('div');
messageTextElement.classList.add('font-medium', 'text-md');
messageTextElement.classList.add('font-medium', 'text-md', 'whitespace-pre-wrap');
messageTextElement.innerHTML = message;
// Create a hidden div element needed to buffer responses before committing them to the visible message
const hiddenElement = document.createElement('div');
@@ -115,8 +115,6 @@ function addMessage(sender, message, id, rank = 0, can_edit = false) {
sendbtn.style.display = "block";
waitAnimation.style.display = "none";
stopGeneration.style.display = "none";
hiddenElement_.innerHTML = txt
messageTextElement_.innerHTML = txt
controller.close();
return;
}
@@ -153,6 +151,7 @@ function addMessage(sender, message, id, rank = 0, can_edit = false) {
entry_counter++;
prefix = "FINAL:";
if(text.startsWith(prefix)){
console.log("Final text found")
text = text.substring(prefix.length);
hiddenElement.innerHTML = text
messageTextElement.innerHTML = text


@@ -2,12 +2,12 @@ function update_main(){
const chatWindow = document.getElementById('chat-window');
const chatForm = document.getElementById('chat-form');
const userInput = document.getElementById('user-input');
const stopGeneration = document.querySelector("#stop-generation")
const stopGeneration = document.getElementById("stop-generation")
stopGeneration.addEventListener('click', (event) =>{
event.preventDefault();
console.log("Stop clicked");
fetch('/stop')
fetch('/stop_gen')
.then(response => response.json())
.then(data => {
console.log(data);
@@ -43,7 +43,7 @@ function update_main(){
user_msg = addMessage('',message, 0, 0, can_edit=true);
bot_msg = addMessage('', '', 0, 0, can_edit=true);
fetch('/bot', {
fetch('/generate', {
method: 'POST',
headers: {
'Content-Type': 'application/json'


@@ -87,9 +87,13 @@ fetch('/settings')
.then((response) => response.json())
.then((data) => {
console.log(data);
populate_settings();
alert("Backend set successfully")
if(data["status"]==="no_models_found"){
alert("No models found for this backend. Make sure you select a backend that you have models for or download models from links in our repository")
}
else{
populate_settings();
alert("Backend set successfully")
}
})
.catch((error) => {
console.error('Error:', error);


@@ -37,7 +37,7 @@
<form id="chat-form" class="w-full flex-row text-white mr-15 p-15">
<input type="text" id="user-input" placeholder="Type your message..." class="dark:bg-gray-600 pl-1 pr-1 py-1 my-1 mr-2 text-black dark:text-white outline-none drop-shadow-sm w-full rounded-md flex-1">
<div id="wait-animation" style="display: none;" class="lds-facebook bg-secondary my-1 mx-1 outline-none drop-shadow-sm w-full rounded-md p-2"><div></div><div></div><div></div></div>
<button id="stop-generation" style="display: none; " class="my-1 mx-1 outline-none px-4 dark:bg-gray-900 text-black dark:text-white rounded-md hover:bg-[#7ba0ea] active:bg-[#3d73e1] transition-colors ease-in-out">Stop generating</button>
<button id="stop-generation" style="display: none;" class="my-1 mx-1 outline-none px-4 dark:bg-gray-900 text-black dark:text-white rounded-md hover:bg-[#7ba0ea] active:bg-[#3d73e1] transition-colors ease-in-out">Stop generating</button>
<input type="submit" value="Send" id="submit-input" class="my-1 mx-1 outline-none px-4 dark:bg-gray-900 text-black dark:text-white rounded-md hover:bg-[#7ba0ea] active:bg-[#3d73e1] transition-colors ease-in-out">
</form>
</div>