mirror of https://github.com/mudler/LocalAI.git (synced 2025-05-07 19:18:33 +00:00)
chore(model gallery): add qwen3-30b-a1.5b-high-speed (#5311)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
parent 01bbb31fb3
commit c0a206bc7a
@@ -442,6 +442,36 @@
     - filename: Smoothie-Qwen3-8B.Q4_K_M.gguf
       sha256: 36fc6df285c35beb8f1fdb46b3854bc4f420d3600afa397bf6a89e2ce5480112
       uri: huggingface://mradermacher/Smoothie-Qwen3-8B-GGUF/Smoothie-Qwen3-8B.Q4_K_M.gguf
+- !!merge <<: *qwen3
+  name: "qwen3-30b-a1.5b-high-speed"
+  icon: https://huggingface.co/DavidAU/Qwen3-30B-A1.5B-High-Speed/resolve/main/star-wars-hans-solo.gif
+  urls:
+    - https://huggingface.co/DavidAU/Qwen3-30B-A1.5B-High-Speed
+    - https://huggingface.co/mradermacher/Qwen3-30B-A1.5B-High-Speed-GGUF
+  description: |
+    This repo contains the full precision source code, in "safe tensors" format to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly.
+
+    This is a simple "finetune" of the Qwen's "Qwen 30B-A3B" (MOE) model, setting the experts in use from 8 to 4 (out of 128 experts).
+
+    This method close to doubles the speed of the model and uses 1.5B (of 30B) parameters instead of 3B (of 30B) parameters. Depending on the application you may want to use the regular model ("30B-A3B"), and use this model for simpler use case(s) although I did not notice any loss of function during routine (but not extensive) testing.
+
+    Example generation (Q4KS, CPU) at the bottom of this page using 4 experts / this model.
+
+    More complex use cases may benefit from using the normal version.
+
+    For reference:
+
+    Cpu only operation Q4KS (windows 11) jumps from 12 t/s to 23 t/s.
+    GPU performance IQ3S jumps from 75 t/s to over 125 t/s. (low to mid level card)
+
+    Context size: 32K + 8K for output (40k total)
+  overrides:
+    parameters:
+      model: Qwen3-30B-A1.5B-High-Speed.Q4_K_M.gguf
+  files:
+    - filename: Qwen3-30B-A1.5B-High-Speed.Q4_K_M.gguf
+      sha256: 2fca25524abe237483de64599bab54eba8fb22088fc21e30ba45ea8fb04dd1e0
+      uri: huggingface://mradermacher/Qwen3-30B-A1.5B-High-Speed-GGUF/Qwen3-30B-A1.5B-High-Speed.Q4_K_M.gguf
 - &gemma3
   url: "github:mudler/LocalAI/gallery/gemma.yaml@master"
   name: "gemma-3-27b-it"
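
The new gallery entry pins the GGUF file to a `sha256` value. As a minimal sketch of how a manually downloaded copy could be checked against that value, the Python below hashes a local file in chunks and compares the digest. The helper names `sha256_of` and `matches_gallery_checksum` are hypothetical and not part of LocalAI; only the checksum string comes from this commit.

```python
import hashlib
from pathlib import Path

# Checksum copied from the `sha256:` field of the entry added in this commit.
EXPECTED_SHA256 = "2fca25524abe237483de64599bab54eba8fb22088fc21e30ba45ea8fb04dd1e0"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_gallery_checksum(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals `expected` (compared case-insensitively)."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_gallery_checksum("Qwen3-30B-A1.5B-High-Speed.Q4_K_M.gguf")` on a local copy of the file would return `True` only if the download matches the pinned checksum. Chunked reading is used because quantized model files of this size generally should not be loaded into memory in one piece.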