Update README

commit e14e1b0a77
parent bffaf2aa42

README.md · 13 changed lines
@@ -36,7 +36,8 @@ llama-cli --model <model_path> --instruction <instruction> [--input <input>] [--
 | temperature | TEMPERATURE | 0.95 | Sampling temperature for model output. |
 | top_p | TOP_P | 0.85 | The cumulative probability for top-p sampling. |
 | top_k | TOP_K | 20 | The number of top-k tokens to consider for text generation. |
+| context-size | CONTEXT_SIZE | 512 | Default token context size. |
+| alpaca | ALPACA | true | Set to true for alpaca models. |
 
 Here's an example of using `llama-cli`:
 
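For reference, the rows added above follow the same parameter/environment-variable pattern as the existing ones. Below is a minimal invocation sketch that exercises them; only `--model`, `--instruction`, and `--input` are confirmed by the usage line in the hunk header, so the remaining flag names are assumptions taken from the parameter column of the table.

```
# Sketch only: flags other than --model/--instruction/--input are assumed to
# mirror the parameter names in the table above.
llama-cli \
  --model ./model.bin \
  --instruction "Summarize the following text" \
  --input "LocalAI runs language models locally." \
  --temperature 0.95 \
  --top_p 0.85 \
  --top_k 20 \
  --context-size 512 \
  --alpaca

# Roughly equivalent configuration via the environment variables listed in the table:
# TEMPERATURE=0.95 TOP_P=0.85 TOP_K=20 CONTEXT_SIZE=512 ALPACA=true llama-cli ...
```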
@@ -81,6 +82,8 @@ The API takes takes the following:
 | model | MODEL_PATH | | The path to the pre-trained GPT-based model. |
 | threads | THREADS | CPU cores | The number of threads to use for text generation. |
 | address | ADDRESS | :8080 | The address and port to listen on. |
+| context-size | CONTEXT_SIZE | 512 | Default token context size. |
+| alpaca | ALPACA | true | Set to true for alpaca models. |
 
 Once the server is running, you can make requests to it using HTTP. For example, to generate text based on an instruction, you can send a POST request to the `/predict` endpoint with the instruction as the request body:
 
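The curl example referenced in the next hunk header is truncated on this page. A rough sketch of such a request follows; the `Content-Type` header completion and the JSON field names are assumptions, not taken from the diff, so consult the full README for the exact body expected by `/predict`.

```
# Sketch only: the JSON body shape below is an assumption.
curl --location --request POST 'http://localhost:8080/predict' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "text": "What is an alpaca?",
    "temperature": 0.95,
    "topP": 0.85,
    "topK": 20
  }'
```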
@@ -97,26 +100,30 @@ curl --location --request POST 'http://localhost:8080/predict' --header 'Content
 
 ## Using other models
 
+You can use the lite images ( for example `quay.io/go-skynet/llama-cli:v0.2-lite`) that don't ship any model, and specify a model binary to be used for inference with `--model`.
+
 13B and 30B models are known to work:
 
 ### 13B
 
 ```
 # Download the model image, extract the model
 docker run --name model --entrypoint /models quay.io/go-skynet/models:ggml2-alpaca-13b-v0.2
 docker cp model:/models/model.bin ./
 
 # Use the model with llama-cli
-docker run -v $PWD:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.2 api --model /models/model.bin
+docker run -v $PWD:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.2-lite api --model /models/model.bin
 ```
 
 ### 30B
 
 ```
 # Download the model image, extract the model
 docker run --name model --entrypoint /models quay.io/go-skynet/models:ggml2-alpaca-30b-v0.2
 docker cp model:/models/model.bin ./
 
 # Use the model with llama-cli
-docker run -v $PWD:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.2 api --model /models/model.bin
+docker run -v $PWD:/models -p 8080:8080 -ti --rm quay.io/go-skynet/llama-cli:v0.2-lite api --model /models/model.bin
 ```
 
 ### Golang client API
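The changed lines in this hunk swap only the image tag (`v0.2` to `v0.2-lite`); the rest of the docker invocation stays the same. As a sketch, the lite image can also be combined with the API options from the earlier table, assuming the environment variables listed there are honored by the `api` subcommand.

```
# Sketch only: the -e variables come from the API options table; whether the
# api subcommand reads all of them is an assumption on this page.
docker run -v $PWD:/models -p 8080:8080 -ti --rm \
  -e THREADS=4 -e CONTEXT_SIZE=512 -e ALPACA=true \
  quay.io/go-skynet/llama-cli:v0.2-lite api --model /models/model.bin
```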