consolidating my llm related repos. big cleanup coming.

# Selecting a Local Large Language Model
## Step 1: Determine Your Requirements
1. **Purpose**:
- What will you use the model for? (e.g., general-purpose assistant, code completion, specific tasks)
All of the above.
2. **Hardware Specifications**:
- **GPU**: Does your laptop have a dedicated GPU (e.g., NVIDIA with CUDA support)?
Yes, it does. It's an older-generation NVIDIA GPU.
- **RAM**: How much system memory is available?
32 GB of RAM
- **Disk Space**: How much storage can you allocate for the model and its dependencies?
1 terabyte or more
3. **Ease of Use**:
- Are you comfortable setting up frameworks like Docker or Python environments?
Yes. Please keep everything in Docker, including Python.
4. **Language Model Size**:
- Do you need a lightweight model (<10 GB) for responsiveness or a large model for complex tasks?
How about a mixture that I can switch between based on the task? (A model-routing sketch follows this list.)
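Since you want to mix models and switch by task, here is a minimal routing sketch against Ollama's local HTTP API. Port 11434 is Ollama's default; the task-to-model mapping, the function name, and the `requests` dependency are illustrative assumptions, not a fixed recommendation.

```python
import requests

# Illustrative task-to-model mapping; adjust to whatever you have pulled.
MODELS = {
    "code": "codellama:7b-instruct",
    "chat": "maryasov/llama3.1-cline:latest",
    "completion": "deepseek-coder:6.7b-base",
}

def generate(task: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Route a prompt to the model registered for the given task."""
    model = MODELS.get(task, MODELS["chat"])
    resp = requests.post(
        f"{host}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,  # local generation can be slow on an older GPU
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(generate("code", "Write a Python function that reverses a string."))
```

Because Ollama loads models on demand and unloads them after an idle period, switching is just a matter of naming a different model in the request.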
## Step 2: Evaluate Available Models (Optional for Editing)
Here are some options to consider:
- **GPT-Based Models**: GPT-J, GPT-NeoX
- **LLaMA (Meta)**: LLaMA 2 (7B, 13B, 70B)
- **Falcon**: Falcon 7B / 40B
- **Mistral**: Lightweight and efficient 7B
- **Vicuna**: Fine-tuned for conversational tasks
I have Ollama installed and I've downloaded a few models. Here are the Ollama models I have so far:
| NAME | ID | SIZE | MODIFIED |
| --- | --- | --- | --- |
| codellama:7b-code | 8df0a30bb1e6 | 3.8 GB | 44 hours ago |
| codellama:7b-instruct | 8fdf8f752f6e | 3.8 GB | 44 hours ago |
| maryasov/llama3.1-cline:latest | 2ecd12227dcf | 4.7 GB | 45 hours ago |
| hhao/qwen2.5-coder-tools:latest | 1fbf62f22cd4 | 4.7 GB | 45 hours ago |
| deepseek-coder:6.7b-base | 585a5cb3b219 | 3.8 GB | 3 weeks ago |
| qwen2.5-coder:7b-base | bd8755145f1c | 4.7 GB | 3 weeks ago |
| starcoder:7b | 53fdbc3a2006 | 4.3 GB | 3 weeks ago |
| deepseek-coder:base | 3b417b786925 | 776 MB | 3 weeks ago |
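If you ever want this same inventory programmatically (the equivalent of `ollama list`), Ollama serves it at `GET /api/tags`. A minimal sketch, assuming the `requests` package and a locally running Ollama:

```python
import requests

def list_models(host: str = "http://localhost:11434") -> list[str]:
    """Return the names of models installed on an Ollama host."""
    resp = requests.get(f"{host}/api/tags", timeout=10)
    resp.raise_for_status()
    return [m["name"] for m in resp.json()["models"]]

if __name__ == "__main__":
    print(list_models())  # e.g. ['codellama:7b-code', 'codellama:7b-instruct', ...]
```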
## Step 3: Tools for Running Models Locally
1. **Text-Generation-WebUI**: A user-friendly web interface for loading and running models.
2. **Hugging Face Transformers**: A Python-based framework for downloading and managing models.
3. **GGML / GPTQ (Quantized Models)**: Quantized formats that trade some precision for smaller size and faster inference.
OK. I have both my local laptop and a big, beefy server running Cloudron, with Open-WebUI deployed on it. I would like to use both the local laptop and the remote server.
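One way to use both machines is to point the same client code at either endpoint. The sketch below assumes the Cloudron server also exposes an Ollama instance; the remote hostname is a placeholder, and a real deployment may sit behind HTTPS or authentication that you would need to handle.

```python
import requests

# Placeholder hostnames: substitute your real laptop/server Ollama endpoints.
HOSTS = {
    "laptop": "http://localhost:11434",
    "server": "http://ollama.example-cloudron-host:11434",
}

def ask(where: str, model: str, prompt: str) -> str:
    """Send a one-shot prompt to the Ollama instance on the chosen host."""
    resp = requests.post(
        f"{HOSTS[where]}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# Quick completions on the laptop's GPU...
print(ask("laptop", "codellama:7b-code", "def fib(n):"))
# ...heavier models on the beefier server.
print(ask("server", "qwen2.5-coder:7b-base", "Summarize what a vector database does."))
```

Open-WebUI can also be pointed at more than one Ollama backend, which would let you do the same switching from its interface.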
---
Edit this document with your answers or add any additional details you'd like me to consider. Upload the updated file back, and I'll guide you further!