# whisper.cpp/examples/server

Simple HTTP server. WAV files are passed to the inference model via HTTP requests.

https://github.com/ggerganov/whisper.cpp/assets/1991296/e983ee53-8741-4eb5-9048-afe5e4594b8f

## Usage

```
./build/bin/whisper-server -h

usage: ./build/bin/whisper-server [options]

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d N,      --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [2      ] number of best candidates to keep
  -bs N,     --beam-size N       [-1     ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -debug,    --debug-mode        [false  ] enable debug mode (eg. dump log_mel)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pr,       --print-realtime    [false  ] print output in realtime
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
  --prompt PROMPT                [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -oved D,   --ov-e-device DNAME [CPU    ] the OpenVINO device used for encode inference
  --host HOST,                   [127.0.0.1] hostname/IP address for the server
  --port PORT,                   [8080   ] port number for the server
  --convert,                     [false  ] convert audio to WAV, requires ffmpeg on the server
```


> [!WARNING]
> **Do not run the server example with administrative privileges, and run it only in a sandboxed environment: it performs risky operations such as accepting user file uploads and invoking ffmpeg for format conversion. Always validate and sanitize inputs to guard against potential security threats.**

## Request examples

**/inference**

```
curl 127.0.0.1:8080/inference \
-H "Content-Type: multipart/form-data" \
-F file="@<file-path>" \
-F temperature="0.0" \
-F temperature_inc="0.2" \
-F response_format="json"
```
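
For clients without curl, the same call can be reproduced with a small stdlib-only Python sketch. The `build_inference_body` helper is hypothetical (not part of this repo); the field names mirror the curl example above, and the dummy bytes stand in for real WAV contents.

```python
# Hypothetical stdlib-only equivalent of the curl /inference call above.
# Replace the dummy bytes with the contents of a real WAV file.
import urllib.request
import uuid

def build_inference_body(wav_bytes: bytes, filename: str, fields: dict) -> tuple[bytes, str]:
    """Encode a WAV payload plus text fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: audio/wav\r\n\r\n".encode() + wav_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

wav = b"..."  # e.g. open("samples/jfk.wav", "rb").read()
body, content_type = build_inference_body(
    wav, "jfk.wav",
    {"temperature": "0.0", "temperature_inc": "0.2", "response_format": "json"},
)
req = urllib.request.Request(
    "http://127.0.0.1:8080/inference",
    data=body,
    headers={"Content-Type": content_type},
    method="POST",
)
# with urllib.request.urlopen(req) as resp:  # uncomment with a running server
#     print(resp.read().decode())
```
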

**/load**

```
curl 127.0.0.1:8080/load \
-H "Content-Type: multipart/form-data" \
-F model="<path-to-model-file>"
```
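
Switching models works the same way from Python; here is a hypothetical stdlib-only sketch (the model path shown is just the server's default from the options above, used as an example):

```python
# Hypothetical stdlib-only equivalent of the curl /load call above.
import urllib.request
import uuid

def build_load_body(model_path: str) -> tuple[bytes, str]:
    """Encode the 'model' text field as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model_path}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"

body, content_type = build_load_body("models/ggml-base.en.bin")
req = urllib.request.Request(
    "http://127.0.0.1:8080/load",
    data=body,
    headers={"Content-Type": content_type},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment with a running server
```
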
## Load testing with k6

> **Note:** Install [k6](https://k6.io/docs/get-started/installation/) before running the benchmark script.

You can benchmark the Whisper server using the provided `bench.js` script with [k6](https://k6.io/). This script sends concurrent multipart requests to the `/inference` endpoint and is fully configurable via environment variables.

**Example usage:**

```
k6 run bench.js \
--env FILE_PATH=/absolute/path/to/samples/jfk.wav \
--env BASE_URL=http://127.0.0.1:8080 \
--env ENDPOINT=/inference \
--env CONCURRENCY=4 \
--env TEMPERATURE=0.0 \
--env TEMPERATURE_INC=0.2 \
--env RESPONSE_FORMAT=json
```

**Environment variables:**

- `FILE_PATH`: Path to the audio file to send (must be absolute or relative to the k6 working directory)
- `BASE_URL`: Server base URL (default: `http://127.0.0.1:8080`)
- `ENDPOINT`: API endpoint (default: `/inference`)
- `CONCURRENCY`: Number of concurrent requests (default: 4)
- `TEMPERATURE`: Decoding temperature (default: 0.0)
- `TEMPERATURE_INC`: Temperature increment (default: 0.2)
- `RESPONSE_FORMAT`: Response format (default: `json`)

**Note:**

- The server must be running and accessible at the specified `BASE_URL` and `ENDPOINT`.
- The script is located in the same directory as this README: `bench.js`.
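
If installing k6 is not an option, the fan-out pattern the benchmark uses (N concurrent POSTs to one endpoint, then tally the results) can be approximated with a stdlib-only Python harness. This is a hypothetical sketch, not part of this repository:

```python
# Rough stdlib-only alternative to a k6 load test: fire `concurrency` POSTs
# at the same URL in parallel and count the HTTP status codes returned.
import collections
import concurrent.futures
import urllib.request

def run_load_test(url: str, body: bytes, content_type: str, concurrency: int) -> collections.Counter:
    """POST the same body `concurrency` times in parallel; tally status codes."""
    def one_request(_: int) -> int:
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": content_type}, method="POST"
        )
        with urllib.request.urlopen(req) as resp:
            return resp.status

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        return collections.Counter(pool.map(one_request, range(concurrency)))

# Example (requires a running server and a prepared multipart body):
# counts = run_load_test("http://127.0.0.1:8080/inference", body, content_type, 4)
# print(counts)
```

Unlike k6, this reports only status-code counts; it is meant as a quick smoke test rather than a replacement for the `bench.js` metrics.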