consolidating my llm related repos. big cleanup coming.

# Selecting a Local Large Language Model
## Step 1: Determine Your Requirements
1. **Purpose**:
- What will you use the model for? (e.g., general-purpose assistant, code completion, specific tasks)
All of the above.
2. **Hardware Specifications**:
- **GPU**: Does your laptop have a dedicated GPU (e.g., NVIDIA with CUDA support)?
Yes, it does. It's an older-generation NVIDIA GPU.
- **RAM**: How much system memory is available?
32 GB of RAM
- **Disk Space**: How much storage can you allocate for the model and its dependencies?
1 terabyte or more
3. **Ease of Use**:
- Are you comfortable setting up frameworks like Docker or Python environments?
Yes. Please keep everything in Docker, including Python.
4. **Language Model Size**:
- Do you need a lightweight model (<10 GB) for responsiveness or a large model for complex tasks?
How about a mixture that I can switch between based on the task? (A model-routing sketch follows this list.)
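Since you want to mix models and switch by task, here is a minimal routing sketch against Ollama's local HTTP API. Port 11434 is Ollama's default; the task-to-model mapping, the function name, and the `requests` dependency are illustrative assumptions, not a fixed recommendation.

```python
import requests

# Illustrative task-to-model mapping; adjust to whatever you have pulled.
MODELS = {
    "code": "codellama:7b-instruct",
    "chat": "maryasov/llama3.1-cline:latest",
    "completion": "deepseek-coder:6.7b-base",
}

def generate(task: str, prompt: str, host: str = "http://localhost:11434") -> str:
    """Route a prompt to the model registered for the given task."""
    model = MODELS.get(task, MODELS["chat"])
    resp = requests.post(
        f"{host}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,  # local generation can be slow on an older GPU
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(generate("code", "Write a Python function that reverses a string."))
```

Because Ollama loads models on demand and unloads them after an idle period, switching is just a matter of naming a different model in the request.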
## Step 2: Evaluate Available Models (Optional for Editing)
Here are some options to consider:
- **GPT-Based Models**: GPT-J, GPT-NeoX
- **LLaMA (Meta)**: LLaMA 2 (7B, 13B, 70B)
- **Falcon**: Falcon 7B / 40B
- **Mistral**: Lightweight and efficient 7B
- **Vicuna**: Fine-tuned for conversational tasks
I have Ollama installed and I've downloaded a few models. Here are the Ollama models I have so far:
| NAME | ID | SIZE | MODIFIED |
| --- | --- | --- | --- |
| codellama:7b-code | 8df0a30bb1e6 | 3.8 GB | 44 hours ago |
| codellama:7b-instruct | 8fdf8f752f6e | 3.8 GB | 44 hours ago |
| maryasov/llama3.1-cline:latest | 2ecd12227dcf | 4.7 GB | 45 hours ago |
| hhao/qwen2.5-coder-tools:latest | 1fbf62f22cd4 | 4.7 GB | 45 hours ago |
| deepseek-coder:6.7b-base | 585a5cb3b219 | 3.8 GB | 3 weeks ago |
| qwen2.5-coder:7b-base | bd8755145f1c | 4.7 GB | 3 weeks ago |
| starcoder:7b | 53fdbc3a2006 | 4.3 GB | 3 weeks ago |
| deepseek-coder:base | 3b417b786925 | 776 MB | 3 weeks ago |
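If you ever want this same inventory programmatically (the equivalent of `ollama list`), Ollama serves it at `GET /api/tags`. A minimal sketch, assuming the `requests` package and a locally running Ollama:

```python
import requests

def list_models(host: str = "http://localhost:11434") -> list[str]:
    """Return the names of models installed on an Ollama host."""
    resp = requests.get(f"{host}/api/tags", timeout=10)
    resp.raise_for_status()
    return [m["name"] for m in resp.json()["models"]]

if __name__ == "__main__":
    print(list_models())  # e.g. ['codellama:7b-code', 'codellama:7b-instruct', ...]
```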
## Step 3: Tools for Running Models Locally
1. **Text-Generation-WebUI**: A user-friendly web interface for loading and running models.
2. **Hugging Face Transformers**: A Python-based framework for downloading and managing models.
3. **GGML / GPTQ (Quantized Models)**: Quantized formats that trade some precision for smaller size and faster inference.
OK. I have both my local laptop and a big, beefy server running Cloudron, with Open-WebUI deployed on it. I would like to use both the local laptop and the remote server.
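One way to use both machines is to point the same client code at either endpoint. The sketch below assumes the Cloudron server also exposes an Ollama instance; the remote hostname is a placeholder, and a real deployment may sit behind HTTPS or authentication that you would need to handle.

```python
import requests

# Placeholder hostnames: substitute your real laptop/server Ollama endpoints.
HOSTS = {
    "laptop": "http://localhost:11434",
    "server": "http://ollama.example-cloudron-host:11434",
}

def ask(where: str, model: str, prompt: str) -> str:
    """Send a one-shot prompt to the Ollama instance on the chosen host."""
    resp = requests.post(
        f"{HOSTS[where]}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# Quick completions on the laptop's GPU...
print(ask("laptop", "codellama:7b-code", "def fib(n):"))
# ...heavier models on the beefier server.
print(ask("server", "qwen2.5-coder:7b-base", "Summarize what a vector database does."))
```

Open-WebUI can also be pointed at more than one Ollama backend, which would let you do the same switching from its interface.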
---
Edit this document with your answers or add any additional details you'd like me to consider. Upload the updated file back, and I'll guide you further!