Compare commits


74 Commits

Author SHA1 Message Date
88112c8afb release : v1.5.2 2023-12-14 17:56:39 +02:00
375585c07c wchess : update readme 2023-12-14 17:51:14 +02:00
fd99ece8e3 wchess : whisper assisted chess (#1595)
* wchess: whisper assisted chess

* wchess: fix allowed moves in check

* wchess: touchstart, touchend events

* wchess: css, disabled button

* wchess : html touches

* wchess : minor fixes and code style

* wchess : bump encoder context to 1280

* wchess : index.html

* wchess : fix CI warnings

* wchess : add array header

* wchess : build static library

* wchess : display grammar

* wchess : update UX

* wchess : add comment

* wchess : add README

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-14 15:58:26 +02:00
8171e621fc sync : ggml (Metal fixes, new ops, tests) (#1633)
* sync : ggml (Metal fixes, new ops, tests)

* cuda : fix bin bcast when src1 and dst have different types
2023-12-13 21:55:03 +02:00
ec03661b20 cmake : target Windows 8 or above for prefetchVirtualMemory in talk-llama (#1617)
Since we use prefetchVirtualMemory, we specify that we target Windows 8 or above; otherwise other compilers will refuse to use the prefetchVirtualMemory API. (I understand you are loading it dynamically, but the header definition has this limitation.)
2023-12-12 11:35:00 +00:00
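As a minimal illustration of the constraint described in the commit above (this is not the PR's actual CMake change), the Windows headers only declare the prefetch API when the target version is Windows 8 (`0x0602`) or newer:

```cpp
// illustrative only: memoryapi.h guards PrefetchVirtualMemory behind
// _WIN32_WINNT >= 0x0602, so the build must target Windows 8 or newer
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0602 // Windows 8
#endif
#include <windows.h>
```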
6335933a5b cmake : Fix bug in httplib.h for mingw (#1615)
Fix bug in httplib.h for MinGW, please see https://github.com/yhirose/cpp-httplib/issues/1669
2023-12-10 17:47:52 +00:00
885b5563d0 metal : fix ggml_metal_log vargs (#1606) 2023-12-08 13:50:50 +02:00
9521ba6801 whisper.objc : disable timestamps for real-time transcription 2023-12-08 13:43:37 +02:00
29511d33c7 whisper : more debug messages + fix fallback logic 2023-12-08 13:43:12 +02:00
7bc4d22337 metal : fix soft_max kernel src1 argument (#1602) 2023-12-08 13:39:32 +02:00
afce6fa113 sync : ggml (new ops, new backend, etc) (#1602)
* sync : ggml (new ops, new backend, etc)

* whisper : remove obsolete broadcasting code

* ggml : remove backend self-registers + fix ggml_concat + n_task logic

* metal : fix assert

* metal : print resource path

* whisper : fix bug if metal init fails
2023-12-07 22:27:19 +02:00
3163090d89 server : pass max-len argument to the server (#1574)
This commit fixes the missing parameter binding for max-len between the input
arguments and wparams.
2023-12-05 23:01:45 +02:00
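A hedged sketch of the kind of binding this commit adds, using the cpp-httplib multipart API the server example is built on; the `"max-len"` field name and the `whisper_params` struct are assumptions based on the other form fields shown in this log:

```cpp
#include <string>
#include "httplib.h"
#include "whisper.h"

// hypothetical helper: bind the multipart "max-len" field to the parameters
void bind_max_len(const httplib::Request & req, whisper_params & params,
                  whisper_full_params & wparams) {
    if (req.has_file("max-len")) {
        params.max_len = std::stoi(req.get_file_value("max-len").content);
    }
    wparams.max_len = params.max_len; // the binding this commit fixes
}
```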
f0efd0202d ios : Remove #if arch(arm) check for using Metal (#1561) 2023-12-05 01:14:26 +00:00
3c28d1a571 ggml : Fix 32-bit compiler warning (#1575)
Warning about %lu on 32-bit targets. Updated to %zu.
2023-12-03 14:15:28 +00:00
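For reference, a self-contained example of the portable format specifier (not code from the commit itself):

```cpp
#include <cstdio>

int main() {
    std::size_t n = 1024;
    // %zu matches size_t on all targets; %lu assumes unsigned long and
    // triggers format warnings on 32-bit builds where size_t is unsigned int
    std::printf("n = %zu\n", n);
    return 0;
}
```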
e369243ebd ggml : re-enable blas for src0 != F32 (#1583) 2023-12-01 23:57:52 +02:00
a0ec3fac54 Server : Add support for .vtt format to Whisper server (#1578)
- The code comes from examples/main
- The output mimetype is set to text/vtt

Example usage:
```shell
curl 127.0.0.1:8080/inference \
-H "Content-Type: multipart/form-data" \
-F file="@samples/jfk.wav" \
-F temperature="0.2" \
-F response-format="vtt"
```
2023-11-30 23:44:26 +00:00
6559b538e5 server : backport .srt output format (#1565)
This commit adds a support of .srt format to Whisper server. The code is
effectively backported from examples/main. The output mimetype is set to
application/x-subrip as per https://en.wikipedia.org/wiki/SubRip.

Example usage:

  curl 127.0.0.1:8080/inference \
    -H "Content-Type: multipart/form-data" \
    -F file="@<file-path>" \
    -F temperature="0.2" \
    -F response-format="srt"
2023-11-28 15:42:58 +02:00
73d5005880 cmake : install required ggml.h header (#1568) 2023-11-28 15:41:49 +02:00
6b094b6dfe server : set default CORS headers to allow all (#1567) 2023-11-28 11:55:20 +02:00
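A sketch of what permissive CORS defaults look like with cpp-httplib's `Server`; the exact header values here are an assumption, not the PR's code:

```cpp
#include "httplib.h" // bundled with the server example

int main() {
    httplib::Server svr;

    // permissive defaults: any origin may call the API
    svr.set_default_headers({
        {"Access-Control-Allow-Origin",  "*"},
        {"Access-Control-Allow-Methods", "GET, POST, OPTIONS"},
        {"Access-Control-Allow-Headers", "Content-Type"},
    });
    // ... register handlers and call svr.listen(...) as usual
    return 0;
}
```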
641f2f4282 readme : update help (#1560) 2023-11-27 12:04:08 +02:00
bfacd9f8ce CI : Add CUDA 11.8.0 support (#1554)
* try to fix cublas build in CI

* add multiple cuda-toolkit version

* Update build.yml

* Disable CUDA-toolkit 10.2.89
2023-11-27 12:03:16 +02:00
f52e74d4dc CI : Rectify the Clang-Related workflow issues (#1551)
* fix bugs in workflow

* fix missing clang in workflow

* Update build.yml
2023-11-27 11:35:37 +02:00
23c21e92eb server : automatically convert audio on the server (#1539)
* server : automatically convert audio on the server

* server : remove redundant comments

* server : automatic conversion refactor

* server : update server readme

* server : remove unnecessary comments and tabs

* server : put back remove calling

* server : apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server : check ffmpeg before the server launch

* server : fix indentation

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server : fix typo in function call

* server : fix typo in function call

* server : add warning in readme

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-27 11:28:34 +02:00
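A rough sketch of the conversion step this commit adds (assumes `ffmpeg` is on `PATH`; the helper name and file names are hypothetical):

```cpp
#include <cstdlib>
#include <string>

// transcode whatever the client uploaded into the 16 kHz mono
// 16-bit PCM WAV that whisper expects
bool convert_to_wav(const std::string & in, const std::string & out) {
    const std::string cmd =
        "ffmpeg -y -i \"" + in + "\" -ar 16000 -ac 1 -c:a pcm_s16le \"" + out + "\"";
    return std::system(cmd.c_str()) == 0;
}
```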
447d49530c whisper : remove trailing whitespaces 2023-11-24 13:13:21 +02:00
9d6ebd877c release : v1.5.1 2023-11-24 12:41:55 +02:00
0ba365f958 metal : add backend function to check device family support (#1547) 2023-11-24 12:37:08 +02:00
010c8ec3ab cuda : sync some minor stuff from llama.cpp (#1548) 2023-11-24 12:36:21 +02:00
ffdb5c4735 whisper : fix typo 2023-11-24 09:45:10 +02:00
a5881d619c server : add --print-realtime param (#1541)
* server : add --print-realtime param

* Fix duplicate realtime output
2023-11-24 09:35:02 +02:00
34f70b3a56 whisper : add whisper_lang_str_full (#1546)
* Update whisper.h

add whisper_lang_fullstr to retrieve the full language name

* Update whisper.cpp

add whisper_lang_fullstr to return the full language name

* fullstr -> str_full

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-24 09:33:13 +02:00
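A short sketch of the new call next to the existing one (both declared in whisper.h):

```cpp
#include <cstdio>
#include "whisper.h"

int main() {
    const int id = whisper_lang_id("en");
    // whisper_lang_str() returns the short code; the new
    // whisper_lang_str_full() returns the full language name
    std::printf("%s -> %s\n", whisper_lang_str(id), whisper_lang_str_full(id));
    return 0;
}
```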
8328d1900f fix(server): typo in temperature parameter (#1545)
Also fixed another typo in comments.
2023-11-23 20:59:36 +02:00
d2bd5f0bdc metal : fix build (#1544) 2023-11-23 20:20:53 +02:00
34209a37a2 readme : add server example 2023-11-23 17:20:33 +02:00
180e062eda go : fixed Makefile for macOS ARM64 (#1530)
* Fixed Makefile for macOS ARM64 based on https://github.com/ggerganov/whisper.cpp/issues/1344 + proper ggml-metal env var setting

* conditional to fix broken non-macos compilation

* spaces -> tab

* make : fix whitespaces

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-22 18:08:11 +02:00
5c7be85fdc Change temp file name for server application (#1535)
Avoids removing an existing file in the current working directory when the
temporary file has the same name
2023-11-22 09:23:36 +01:00
146169ec38 bench : pass memcpy threads from cli 2023-11-21 22:27:22 +02:00
9befab5ab9 bench : multi-thread memcpy (#1534) 2023-11-21 22:07:30 +02:00
9ac88f2b57 Close file after writing in server application (#1533)
Fixes a mistake that left the file open while reading it back as WAV
2023-11-21 20:36:10 +01:00
46f5b6cb08 server : add video to readme 2023-11-21 17:30:43 +02:00
eff3570f78 server : add a REST Whisper server example with OAI-like API (#1380)
* Add first draft of server

* Added json support and base funcs for server.cpp

* Add more user input via api-request

also some clean up

* Add request params and load post function

Also some general clean up

* Remove unused function

* Add readme

* Add exception handlers

* Update examples/server/server.cpp

* make : add server target

* Add magic curl syntax

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-20 21:40:24 +02:00
fa19bc4195 whisper : update example in whisper.h (#1529)
Update the example in the header; the previous examples used deprecated functions.
2023-11-20 20:52:27 +02:00
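A hedged sketch of the non-deprecated flow the updated header example points to (`pcmf32` stands in for 16 kHz mono float samples):

```cpp
#include <cstdio>
#include <vector>
#include "whisper.h"

void transcribe(const std::vector<float> & pcmf32) {
    struct whisper_context * ctx = whisper_init_from_file_with_params(
            "models/ggml-base.en.bin", whisper_context_default_params());

    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    if (whisper_full(ctx, wparams, pcmf32.data(), (int) pcmf32.size()) != 0) {
        std::fprintf(stderr, "failed to process audio\n");
    }
    whisper_free(ctx);
}
```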
a01b2e0971 sdl : fix audio callback (#1523) 2023-11-20 13:16:38 +02:00
8159a9ab99 whisper : reuse whisper_decode_with_state (#1521) 2023-11-20 13:16:11 +02:00
7516d9c16d ci : redistribute CUDA DLLs (#1522)
see https://docs.nvidia.com/cuda/eula/index.html#attachment-a
2023-11-19 12:43:22 +02:00
46cc26d1b9 whisper : fix with_state methods to use the correct state (#1519)
Co-authored-by: Sandro Hanea <sandrohanea@microsoft.com>
2023-11-19 11:25:30 +02:00
f784f9fa12 whisper : fix overriding the audio context 2023-11-19 10:32:32 +02:00
ca23f8ee6d cuda : assert ggml_add sources to be contiguous 2023-11-19 10:32:08 +02:00
e2f0eba2d4 ios : sync submodule 2023-11-17 10:42:04 +02:00
d4353e48f7 sync : ggml (ggml-alloc + linker + gguf fixes) (#1501) 2023-11-17 10:00:07 +02:00
bebf0da983 quantize : add support for K-quant types 2023-11-16 16:18:24 +02:00
848e54f3ad bench : fix memcpy bench size 2023-11-16 10:59:32 +02:00
7883d1cae4 talk-llama : improve quote and backtick handling (#1364)
* ISSUE-1329: replace " with ' so it doesn't try to execute code in backticks.

* Typo

* Update to keep possessives in the output

Closes the single-quoted string, inserts an escaped ', then reopens the quote, which escapes the ' characters.
2023-11-16 10:34:05 +02:00
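A self-contained sketch of the `'\''` idiom described above (the helper name is hypothetical):

```cpp
#include <string>

// to keep a literal ' inside a single-quoted shell string:
// close the quote, emit an escaped ', then reopen the quote
std::string escape_single_quotes(const std::string & s) {
    std::string out;
    for (const char c : s) {
        if (c == '\'') out += "'\\''";
        else           out += c;
    }
    return out;
}
```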
ccc85b4ff8 talk-llama : enable GPU by default 2023-11-15 21:33:00 +02:00
c7606b47df models : add info about distilled models 2023-11-15 21:10:13 +02:00
d38af151a1 release : v1.5.0 2023-11-15 21:02:52 +02:00
94267df08e bench-all : add distil models 2023-11-15 20:49:12 +02:00
8713c67133 js : latest whisper.js 2023-11-15 20:10:16 +02:00
57a60639bb bench-all : indentations 2023-11-15 20:01:15 +02:00
bfbaa4dce5 whisper : make large version explicit + fix data size units (#1493) 2023-11-15 19:42:25 +02:00
1d79e78402 java : fix test (#1492) 2023-11-15 17:42:53 +02:00
b6c5f49b78 whisper : add batched decoding (#1486)
* whisper : add whisper_batch

* whisper : move kv_self to whisper_state

* whisper : full batched decoding support

* whisper : fix memory leak in whisper_batch

* whisper : fix mem leak again + remove obsolete function

* whisper : clear kv cache when using whisper_decode API

* whisper : speed-up sampling

* whisper : fix decoders initializer

* bench : add batch size 5 bench

* whisper : add comment about the KV cache size

* whisper : add check for max number of decoders

* whisper : avoid starting sampling threads with bs=1

* whisper : enable beam-search by default

* cuda : sync llama.cpp fixes
2023-11-15 16:12:52 +02:00
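A sketch of what the new default means for callers; the value 5 matches the README and Java test updates in the diffs below:

```cpp
// beam search is now enabled by default with beam_size = 5;
// greedy best_of also defaults to 5
whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH);
// wparams.beam_search.beam_size == 5
// wparams.greedy.best_of        == 5
```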
d4231649e6 java : use tiny.en for tests (#1484)
* java : use tiny.en for tests

* java : try to fix full params struct
2023-11-13 16:53:55 +02:00
3e5c7feeff whisper : add grammar-based sampling (#1229)
* whisper : add grammar-based sampling

* build : fix after master merge

* command : fix exception when recognizing the command

* whisper : fine-tuning grammar functionality

* command : grammar-related improvements

- option to read grammar from file
- add sample grammars for colors and chess moves
- fine-tune the performance further

* grammars : add assistant + update comments

* command : enable beam-search, add "no_timestamps", add "context", add p

* whisper : remove comment

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-13 10:51:34 +02:00
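A hypothetical color grammar run through the parser this PR adds (`grammar_parser::parse`, `parse_state`, and `print_grammar` all appear in the examples/command diff below):

```cpp
#include <cstdio>
#include "grammar-parser.h" // added to examples/ by this PR

int main() {
    const char * colors_gbnf = "root ::= \"red\" | \"green\" | \"blue\"";

    grammar_parser::parse_state grammar = grammar_parser::parse(colors_gbnf);
    // the rule set stays empty (the default) if parsing failed
    grammar_parser::print_grammar(stderr, grammar);
    return 0;
}
```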
c23598e4ca talk-llama : add n_gpu_layers parameter (#1475) 2023-11-13 10:04:16 +02:00
54a08bde29 examples : add whisper.android.java for compatibility with older Android versions using Java (#1382)
* save the recorded audio to a file

* Alignment -help

* Save the correct audio

* change to a consistent coding style

* Correct typo

* Update examples/stream/stream.cpp

* Update examples/stream/stream.cpp

* Correct variable misuse

* Update examples/stream/stream.cpp

* Update examples/stream/stream.cpp

* Update examples/stream/stream.cpp

* Update examples/stream/stream.cpp

* add *.bin .cxx/ .gradle/ cmake-build-debug/ to gitignore

* add whisper.android.java

* Added support for older versions of Android using Java

* add examples for android java

* add README.md for android java

* add fullTranscribeWithTime

* Add toString() method and tests

* change return type to void

* update to v1.4.1

* add WhisperService

* change to whisper_full_get_segment_t1

* add method transcribeDataWithTime

* modified toString
```
return "[" + start + " --> " + end + "]:" + sentence;
```

* Optimize code logic

* update text view on handle

* set max lines

* change Chinese to English

* Update bindings/java/build.gradle

* Update .gitignore

* add android.java to github action

* change android.java to android_java in build.yml

* remove gradle

* change jdk to temurin in android_java of CI

* change jdk to temurin 11 in android_java of CI

* add execute permission (+x) to gradlew

* set api-level for android_java of CI

* Update examples/whisper.android.java/app/src/main/jni/whisper/CMakeLists.txt

* add ndk version in build.gradle

* remove local.properties

* add testFullTranscribeWithTime

---------

Co-authored-by: litongmacos <litongjava@qq.com>
Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>
2023-11-12 18:31:58 +02:00
9f8bbd3fee readme : update comment about source code 2023-11-12 17:47:37 +02:00
3172006a24 ggml : fix some compile warnings 2023-11-12 16:36:20 +02:00
684bc8bd70 readme : update GPU / CUDA 2023-11-12 15:40:37 +02:00
b0502836b8 whisper : add full CUDA and Metal offloading (#1472)
* whisper : migrate to ggml-backend

* whisper : fix logit reading

* whisper : fix tensor allocation during load

* whisper : fix beam-search with CUDA

* whisper : free backends + fix compile warning

* whisper : print when CUDA is enabled

* whisper : fix CoreML

* make : clean-up

* talk : fix compile warning

* whisper : support ggml_conv with CUDA and Metal (#1473)

* ggml : add CUDA support for ggml_conv

* whisper : remove ggml_repeat for conv bias + single backend

* cuda : fix im2col kernel

* metal : add im2col support + mul mat-vec f16 x f16

* bench-all : add q4 models

* whisper : clean-up

* quantize-all : fix

* ggml : im2col opts

* whisper : avoid whisper_model_data wrapper

* whisper : add note that ggml_mul_mat_pad does not work with CUDA

* whisper : factor out graph compute in common function

* whisper : fixes

* whisper : fix UB with measure buffers

* whisper : try to fix the parallel whisper_state functionality (#1479)

* whisper : try to fix the parallel whisper_state functionality

* whisper : fix multi-state Metal

* whisper : free backend instances in whisper_state
2023-11-12 15:31:08 +02:00
ec7a6f04f9 whisper : return with error from whisper_encode_internal and whisper_decode_internal when abort callback is true (#1456)
Co-authored-by: Ben Nortier <ben@bjnortier.com>
2023-11-10 13:51:16 +02:00
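A hedged sketch of the behavior this fixes: when the abort callback returns true, whisper_full() now returns a non-zero code instead of silently producing partial output (`g_abort` is a hypothetical application flag; `wparams`, `ctx`, and `pcmf32` as in the example above):

```cpp
#include <atomic>

static std::atomic<bool> g_abort{false}; // hypothetical, set e.g. from a signal handler

// inside the transcription code path:
wparams.abort_callback = [](void * /*user_data*/) -> bool {
    return g_abort.load();
};
wparams.abort_callback_user_data = nullptr;

if (whisper_full(ctx, wparams, pcmf32.data(), (int) pcmf32.size()) != 0) {
    // with this fix: aborted (or failed) rather than partial results
}
```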
37947203e6 talk-llama : add language auto detect (#1467)
* Add '-l auto' to talk-llama example

* Update examples/talk-llama/talk-llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-09 19:21:44 +02:00
953419c69a openvino : update convert-whisper-to-openvino.py to support v3 (#1459) 2023-11-09 12:42:39 +02:00
0de8582f65 coreml : use the correct n_mel value (#1458) 2023-11-08 20:01:41 +00:00
baeb733691 whisper : reset mel time when resetting timings (#1452)
Co-authored-by: Ben Nortier <ben@bjnortier.com>
2023-11-08 15:52:23 +02:00
161 changed files with 51426 additions and 3892 deletions

.github/workflows/build.yml

@ -25,6 +25,7 @@ jobs:
docker run --platform ${{ matrix.arch }} --rm \
-v ${{ github.workspace }}:/workspace \
-w /workspace ${{ env.ubuntu_image }} /bin/sh -c '
set -e
apt update
apt install -y build-essential libsdl2-dev
make
@ -86,6 +87,7 @@ jobs:
docker run --platform ${{ matrix.arch }} --rm \
-v ${{ github.workspace }}:/workspace \
-w /workspace ${{ env.ubuntu_image }} /bin/sh -c '
set -e
apt update
apt install -y build-essential cmake libsdl2-dev
cmake . -DWHISPER_SDL2=ON -DCMAKE_BUILD_TYPE=${{ matrix.build }}
@ -113,8 +115,10 @@ jobs:
docker run --platform ${{ matrix.arch }} --rm \
-v ${{ github.workspace }}:/workspace \
-w /workspace ${{ env.ubuntu_image }} /bin/sh -c '
set -e
apt update
apt install -y build-essential cmake libsdl2-dev
apt install -y clang
apt install -y clang build-essential cmake libsdl2-dev
cmake . -DWHISPER_SDL2=ON -DCMAKE_BUILD_TYPE=${{ matrix.build }} -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang
make
ctest -L gh --output-on-failure'
@ -140,6 +144,7 @@ jobs:
docker run --platform ${{ matrix.arch }} --rm \
-v ${{ github.workspace }}:/workspace \
-w /workspace ${{ env.ubuntu_image }} /bin/sh -c '
set -e
apt update
apt install -y build-essential cmake
cmake . -DCMAKE_BUILD_TYPE=Debug -DWHISPER_SANITIZE_${{ matrix.sanitizer }}=ON
@ -217,10 +222,10 @@ jobs:
sdl2: [ON]
include:
- arch: Win32
obzip: https://github.com/OpenMathLib/OpenBLAS/releases/download/v0.3.24/OpenBLAS-0.3.24-x86.zip
obzip: https://github.com/OpenMathLib/OpenBLAS/releases/download/v0.3.25/OpenBLAS-0.3.25-x86.zip
s2arc: x86
- arch: x64
obzip: https://github.com/OpenMathLib/OpenBLAS/releases/download/v0.3.24/OpenBLAS-0.3.24-x64.zip
obzip: https://github.com/OpenMathLib/OpenBLAS/releases/download/v0.3.25/OpenBLAS-0.3.25-x64.zip
s2arc: x64
- sdl2: ON
s2ver: 2.26.0
@ -285,6 +290,7 @@ jobs:
arch: [x64]
cublas: [ON]
sdl2: [ON]
cuda-toolkit: [12.2.0, 11.8.0]
include:
- arch: x64
s2arc: x64
@ -300,7 +306,9 @@ jobs:
- name: Install CUDA Toolkit
id: cuda-toolkit
uses: Jimver/cuda-toolkit@v0.2.10
uses: Jimver/cuda-toolkit@v0.2.11
with:
cuda: '${{ matrix.cuda-toolkit }}'
- name: Fetch SDL2 and set SDL2_DIR
if: matrix.sdl2 == 'ON'
@ -315,10 +323,17 @@ jobs:
-DCMAKE_BUILD_TYPE=${{ matrix.build }}
-DWHISPER_CUBLAS=1
- name: Build
- name: Build ${{ matrix.cuda-toolkit }}
run: |
cd ./build
msbuild ALL_BUILD.vcxproj -t:build -p:configuration=${{ matrix.build }} -p:platform=${{ matrix.arch }}
cmake --build . --config ${{ matrix.build }}
- name: Copy CUDA DLLs
run: >
Copy-Item -PassThru
-Path "${{ steps.cuda-toolkit.outputs.CUDA_PATH }}/bin/*.dll"
-Include cudart64_*,cublas64_*,cublasLt64_*
-Destination build/bin/${{ matrix.build }}
- name: Copy SDL2.dll
if: matrix.sdl2 == 'ON'
@ -328,7 +343,7 @@ jobs:
if: matrix.sdl2 == 'ON'
uses: actions/upload-artifact@v1
with:
name: whisper-cublas-bin-${{ matrix.arch }}
name: whisper-cublas-${{ matrix.cuda-toolkit }}-bin-${{ matrix.arch }}
path: build/bin/${{ matrix.build }}
emscripten:
@ -396,6 +411,32 @@ jobs:
cd examples/whisper.android
./gradlew assembleRelease --no-daemon
android_java:
runs-on: ubuntu-latest
steps:
- name: Clone
uses: actions/checkout@v3
- name: set up JDK 11
uses: actions/setup-java@v3
with:
java-version: '11'
distribution: 'temurin'
cache: gradle
- name: Setup Android SDK
uses: android-actions/setup-android@v2
with:
api-level: 30
build-tools-version: 30.0.3
- name: Build
run: |
cd examples/whisper.android.java
chmod +x ./gradlew
./gradlew assembleRelease
java:
needs: [ 'windows' ]
runs-on: windows-latest

.gitignore

@ -8,6 +8,7 @@
.DS_Store
build/
build-coreml/
build-em/
build-debug/
build-release/
@ -30,6 +31,7 @@ build-sanitize-thread/
/talk-llama
/bench
/quantize
/server
/lsp
arm_neon.h
@ -53,3 +55,7 @@ bindings/java/.idea/
.idea/
benchmark_results.csv
cmake-build-debug/
.cxx/
.gradle/
local.properties

CMakeLists.txt

@ -1,6 +1,6 @@
cmake_minimum_required (VERSION 3.5)
project(whisper.cpp VERSION 1.4.3)
project(whisper.cpp VERSION 1.5.2)
# Add path to modules
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/")
@ -533,7 +533,7 @@ target_compile_definitions(${TARGET} PUBLIC
${WHISPER_EXTRA_FLAGS}
)
set_target_properties(${TARGET} PROPERTIES PUBLIC_HEADER "whisper.h")
set_target_properties(${TARGET} PROPERTIES PUBLIC_HEADER "ggml.h;whisper.h")
include(GNUInstallDirs)

Makefile

@ -1,4 +1,4 @@
default: main bench quantize
default: main bench quantize server
ifndef UNAME_S
UNAME_S := $(shell uname -s)
@ -307,7 +307,7 @@ ggml-backend.o: ggml-backend.c ggml.h ggml-backend.h
ggml-quants.o: ggml-quants.c ggml.h ggml-quants.h
$(CC) $(CFLAGS) -c $< -o $@
WHISPER_OBJ += ggml-alloc.o ggml-backend.o ggml-quants.o
WHISPER_OBJ += ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o
whisper.o: whisper.cpp whisper.h ggml.h ggml-cuda.h
$(CXX) $(CXXFLAGS) -c $< -o $@
@ -331,14 +331,14 @@ ggml-metal.o: ggml-metal.m ggml-metal.h
WHISPER_OBJ += ggml-metal.o
endif
libwhisper.a: ggml.o $(WHISPER_OBJ)
$(AR) rcs libwhisper.a ggml.o $(WHISPER_OBJ)
libwhisper.a: $(WHISPER_OBJ)
$(AR) rcs libwhisper.a $(WHISPER_OBJ)
libwhisper.so: ggml.o $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) -shared -o libwhisper.so ggml.o $(WHISPER_OBJ) $(LDFLAGS)
libwhisper.so: $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) -shared -o libwhisper.so $(WHISPER_OBJ) $(LDFLAGS)
clean:
rm -f *.o main stream command talk talk-llama bench quantize lsp libwhisper.a libwhisper.so
rm -f *.o main stream command talk talk-llama bench quantize server lsp libwhisper.a libwhisper.so
#
# Examples
@ -349,30 +349,33 @@ CC_SDL=`sdl2-config --cflags --libs`
SRC_COMMON = examples/common.cpp examples/common-ggml.cpp
SRC_COMMON_SDL = examples/common-sdl.cpp
main: examples/main/main.cpp $(SRC_COMMON) ggml.o $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/main/main.cpp $(SRC_COMMON) ggml.o $(WHISPER_OBJ) -o main $(LDFLAGS)
main: examples/main/main.cpp $(SRC_COMMON) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/main/main.cpp $(SRC_COMMON) $(WHISPER_OBJ) -o main $(LDFLAGS)
./main -h
bench: examples/bench/bench.cpp ggml.o $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/bench/bench.cpp ggml.o $(WHISPER_OBJ) -o bench $(LDFLAGS)
bench: examples/bench/bench.cpp $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/bench/bench.cpp $(WHISPER_OBJ) -o bench $(LDFLAGS)
quantize: examples/quantize/quantize.cpp ggml.o $(WHISPER_OBJ) $(SRC_COMMON)
$(CXX) $(CXXFLAGS) examples/quantize/quantize.cpp $(SRC_COMMON) ggml.o $(WHISPER_OBJ) -o quantize $(LDFLAGS)
quantize: examples/quantize/quantize.cpp $(WHISPER_OBJ) $(SRC_COMMON)
$(CXX) $(CXXFLAGS) examples/quantize/quantize.cpp $(SRC_COMMON) $(WHISPER_OBJ) -o quantize $(LDFLAGS)
stream: examples/stream/stream.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/stream/stream.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ) -o stream $(CC_SDL) $(LDFLAGS)
server: examples/server/server.cpp $(SRC_COMMON) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/server/server.cpp $(SRC_COMMON) $(WHISPER_OBJ) -o server $(LDFLAGS)
command: examples/command/command.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/command/command.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ) -o command $(CC_SDL) $(LDFLAGS)
stream: examples/stream/stream.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/stream/stream.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ) -o stream $(CC_SDL) $(LDFLAGS)
lsp: examples/lsp/lsp.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/lsp/lsp.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ) -o lsp $(CC_SDL) $(LDFLAGS)
command: examples/command/command.cpp examples/grammar-parser.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/command/command.cpp examples/grammar-parser.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ) -o command $(CC_SDL) $(LDFLAGS)
talk: examples/talk/talk.cpp examples/talk/gpt-2.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/talk/talk.cpp examples/talk/gpt-2.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ) -o talk $(CC_SDL) $(LDFLAGS)
lsp: examples/lsp/lsp.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/lsp/lsp.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ) -o lsp $(CC_SDL) $(LDFLAGS)
talk-llama: examples/talk-llama/talk-llama.cpp examples/talk-llama/llama.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/talk-llama/talk-llama.cpp examples/talk-llama/llama.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ) -o talk-llama $(CC_SDL) $(LDFLAGS)
talk: examples/talk/talk.cpp examples/talk/gpt-2.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/talk/talk.cpp examples/talk/gpt-2.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ) -o talk $(CC_SDL) $(LDFLAGS)
talk-llama: examples/talk-llama/talk-llama.cpp examples/talk-llama/llama.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ)
$(CXX) $(CXXFLAGS) examples/talk-llama/talk-llama.cpp examples/talk-llama/llama.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) $(WHISPER_OBJ) -o talk-llama $(CC_SDL) $(LDFLAGS)
#
# Audio samples
@ -418,9 +421,9 @@ samples:
.PHONY: medium
.PHONY: large-v1
.PHONY: large-v2
.PHONY: large
.PHONY: large-v3
tiny.en tiny base.en base small.en small medium.en medium large-v1 large-v2 large: main
tiny.en tiny base.en base small.en small medium.en medium large-v1 large-v2 large-v3: main
bash ./models/download-ggml-model.sh $@
@echo ""
@echo "==============================================="

Package.swift

@ -2,33 +2,14 @@
import PackageDescription
#if arch(arm) || arch(arm64)
let platforms: [SupportedPlatform]? = [
.macOS(.v12),
.iOS(.v14),
.watchOS(.v4),
.tvOS(.v14)
]
let exclude: [String] = []
let resources: [Resource] = [
.process("ggml-metal.metal")
]
let additionalSources: [String] = ["ggml-metal.m"]
let additionalSettings: [CSetting] = [
.unsafeFlags(["-fno-objc-arc"]),
.define("GGML_USE_METAL")
]
#else
let platforms: [SupportedPlatform]? = nil
let exclude: [String] = ["ggml-metal.metal"]
let resources: [Resource] = []
let additionalSources: [String] = []
let additionalSettings: [CSetting] = []
#endif
let package = Package(
name: "whisper",
platforms: platforms,
platforms: [
.macOS(.v12),
.iOS(.v14),
.watchOS(.v4),
.tvOS(.v14)
],
products: [
.library(name: "whisper", targets: ["whisper"]),
],
@ -36,7 +17,7 @@ let package = Package(
.target(
name: "whisper",
path: ".",
exclude: exclude + [
exclude: [
"bindings",
"cmake",
"coreml",
@ -55,19 +36,22 @@ let package = Package(
"whisper.cpp",
"ggml-alloc.c",
"ggml-backend.c",
"ggml-quants.c"
] + additionalSources,
resources: resources,
"ggml-quants.c",
"ggml-metal.m"
],
resources: [.process("ggml-metal.metal")],
publicHeadersPath: "spm-headers",
cSettings: [
.unsafeFlags(["-Wno-shorten-64-to-32", "-O3", "-DNDEBUG"]),
.define("GGML_USE_ACCELERATE")
.define("GGML_USE_ACCELERATE"),
.unsafeFlags(["-fno-objc-arc"]),
.define("GGML_USE_METAL")
// NOTE: NEW_LAPACK will required iOS version 16.4+
// We should consider add this in the future when we drop support for iOS 14
// (ref: ref: https://developer.apple.com/documentation/accelerate/1513264-cblas_sgemm?language=objc)
// .define("ACCELERATE_NEW_LAPACK"),
// .define("ACCELERATE_LAPACK_ILP64")
] + additionalSettings,
],
linkerSettings: [
.linkedFramework("Accelerate")
]

README.md

@ -6,7 +6,7 @@
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![npm](https://img.shields.io/npm/v/whisper.cpp.svg)](https://www.npmjs.com/package/whisper.cpp/)
Beta: [v1.4.3](https://github.com/ggerganov/whisper.cpp/releases/tag/v1.4.3) / Stable: [v1.2.1](https://github.com/ggerganov/whisper.cpp/releases/tag/v1.2.1) / [Roadmap | F.A.Q.](https://github.com/ggerganov/whisper.cpp/discussions/126)
Stable: [v1.5.2](https://github.com/ggerganov/whisper.cpp/releases/tag/v1.5.2) / [Roadmap | F.A.Q.](https://github.com/ggerganov/whisper.cpp/discussions/126)
High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisper) automatic speech recognition (ASR) model:
@ -16,12 +16,10 @@ High-performance inference of [OpenAI's Whisper](https://github.com/openai/whisp
- VSX intrinsics support for POWER architectures
- Mixed F16 / F32 precision
- [4-bit and 5-bit integer quantization support](https://github.com/ggerganov/whisper.cpp#quantization)
- Low memory usage (Flash Attention)
- Zero memory allocations at runtime
- Support for CPU-only inference
- [Partial GPU support for NVIDIA via cuBLAS](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
- [Efficient GPU support for NVIDIA](https://github.com/ggerganov/whisper.cpp#nvidia-gpu-support-via-cublas)
- [Partial OpenCL GPU support via CLBlast](https://github.com/ggerganov/whisper.cpp#opencl-gpu-support-via-clblast)
- [BLAS CPU support via OpenBLAS](https://github.com/ggerganov/whisper.cpp#blas-cpu-support-via-openblas)
- [OpenVINO Support](https://github.com/ggerganov/whisper.cpp#openvino-support)
- [C-style API](https://github.com/ggerganov/whisper.cpp/blob/master/whisper.h)
@ -36,10 +34,8 @@ Supported platforms:
- [x] Windows ([MSVC](https://github.com/ggerganov/whisper.cpp/blob/master/.github/workflows/build.yml#L117-L144) and [MinGW](https://github.com/ggerganov/whisper.cpp/issues/168)]
- [x] [Raspberry Pi](https://github.com/ggerganov/whisper.cpp/discussions/166)
The entire implementation of the model is contained in 2 source files:
- Tensor operations: [ggml.h](ggml.h) / [ggml.c](ggml.c)
- Transformer inference: [whisper.h](whisper.h) / [whisper.cpp](whisper.cpp)
The entire high-level implementation of the model is contained in [whisper.h](whisper.h) and [whisper.cpp](whisper.cpp).
The rest of the code is part of the [ggml](https://github.com/ggerganov/ggml) machine learning library.
Having such a lightweight implementation of the model allows to easily integrate it in different platforms and applications.
As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: [whisper.objc](examples/whisper.objc)
@ -114,8 +110,8 @@ options:
-mc N, --max-context N [-1 ] maximum number of text context tokens to store
-ml N, --max-len N [0 ] maximum segment length in characters
-sow, --split-on-word [false ] split on word rather than on token
-bo N, --best-of N [2 ] number of best candidates to keep
-bs N, --beam-size N [-1 ] beam size for beam search
-bo N, --best-of N [5 ] number of best candidates to keep
-bs N, --beam-size N [5 ] beam size for beam search
-wt N, --word-thold N [0.01 ] word timestamp probability threshold
-et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
-lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
@ -132,6 +128,7 @@ options:
-fp, --font-path [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
-ocsv, --output-csv [false ] output result in a CSV file
-oj, --output-json [false ] output result in a JSON file
-ojf, --output-json-full [false ] include more information in the JSON file
-of FNAME, --output-file FNAME [ ] output file path (without file extension)
-ps, --print-special [false ] print special tokens
-pc, --print-colors [false ] print colors
@ -143,7 +140,8 @@ options:
-m FNAME, --model FNAME [models/ggml-base.en.bin] model path
-f FNAME, --file FNAME [ ] input WAV file path
-oved D, --ov-e-device DNAME [CPU ] the OpenVINO device used for encode inference
-ls, --log-score [false ] log best decoder scores of token
-ls, --log-score [false ] log best decoder scores of tokens
-ng, --no-gpu [false ] disable GPU
bash ./models/download-ggml-model.sh base.en
@ -235,18 +233,18 @@ make medium.en
make medium
make large-v1
make large-v2
make large
make large-v3
```
## Memory usage
| Model | Disk | Mem | SHA |
| --- | --- | --- | --- |
| tiny | 75 MB | ~125 MB | `bd577a113a864445d4c299885e0cb97d4ba92b5f` |
| base | 142 MB | ~210 MB | `465707469ff3a37a2b9b8d8f89f2f99de7299dac` |
| small | 466 MB | ~600 MB | `55356645c2b361a969dfd0ef2c5a50d530afd8d5` |
| medium | 1.5 GB | ~1.7 GB | `fd9727b6e1217c2f614f9b698455c4ffd82463b4` |
| large | 2.9 GB | ~3.3 GB | `ad82bf6a9043ceed055076d0fd39f5f186ff8062` |
| Model | Disk | Mem |
| --- | --- | --- |
| tiny | 75 MiB | ~273 MB |
| base | 142 MiB | ~388 MB |
| small | 466 MiB | ~852 MB |
| medium | 1.5 GiB | ~2.1 GB |
| large | 2.9 GiB | ~3.9 GB |
## Quantization
@ -400,12 +398,12 @@ This can result in significant speedup in encoder performance. Here are the inst
The first time run on an OpenVINO device is slow, since the OpenVINO framework will compile the IR (Intermediate Representation) model to a device-specific 'blob'. This device-specific blob will get
cached for the next run.
For more information about the Core ML implementation please refer to PR [#1037](https://github.com/ggerganov/whisper.cpp/pull/1037).
## NVIDIA GPU support via cuBLAS
## NVIDIA GPU support
With NVIDIA cards the Encoder processing can to a large extent be offloaded to the GPU through cuBLAS.
With NVIDIA cards the processing of the models is done efficiently on the GPU via cuBLAS and custom CUDA kernels.
First, make sure you have installed `cuda`: https://developer.nvidia.com/cuda-downloads
Now build `whisper.cpp` with cuBLAS support:
@ -772,6 +770,7 @@ Some of the examples are even ported to run in the browser using WebAssembly. Ch
| [bench](examples/bench) | [bench.wasm](examples/bench.wasm) | Benchmark the performance of Whisper on your machine |
| [stream](examples/stream) | [stream.wasm](examples/stream.wasm) | Real-time transcription of raw microphone capture |
| [command](examples/command) | [command.wasm](examples/command.wasm) | Basic voice assistant example for receiving voice commands from the mic |
| [wchess](examples/wchess) | [wchess.wasm](examples/wchess) | Voice-controlled chess |
| [talk](examples/talk) | [talk.wasm](examples/talk.wasm) | Talk with a GPT-2 bot |
| [talk-llama](examples/talk-llama) | | Talk with a LLaMA bot |
| [whisper.objc](examples/whisper.objc) | | iOS mobile application using whisper.cpp |
@ -781,6 +780,7 @@ Some of the examples are even ported to run in the browser using WebAssembly. Ch
| [generate-karaoke.sh](examples/generate-karaoke.sh) | | Helper script to easily [generate a karaoke video](https://youtu.be/uj7hVta4blM) of raw audio capture |
| [livestream.sh](examples/livestream.sh) | | [Livestream audio transcription](https://github.com/ggerganov/whisper.cpp/issues/185) |
| [yt-wsp.sh](examples/yt-wsp.sh) | | Download + transcribe and/or translate any VOD [(original)](https://gist.github.com/DaniruKun/96f763ec1a037cc92fe1a059b643b818) |
| [server](examples/server) | | HTTP transcription server with OAI-like API |
## [Discussions](https://github.com/ggerganov/whisper.cpp/discussions)

bindings/go/Makefile

@ -1,9 +1,26 @@
ifndef UNAME_S
UNAME_S := $(shell uname -s)
endif
ifndef UNAME_P
UNAME_P := $(shell uname -p)
endif
ifndef UNAME_M
UNAME_M := $(shell uname -m)
endif
GGML_METAL_PATH_RESOURCES := $(abspath ../..)
BUILD_DIR := build
MODELS_DIR := models
EXAMPLES_DIR := $(wildcard examples/*)
INCLUDE_PATH := $(abspath ../..)
LIBRARY_PATH := $(abspath ../..)
ifeq ($(UNAME_S),Darwin)
EXT_LDFLAGS := -framework Foundation -framework Metal -framework MetalKit
endif
all: clean whisper examples
whisper: mkdir
@ -11,8 +28,13 @@ whisper: mkdir
@${MAKE} -C ../.. libwhisper.a
test: model-small whisper modtidy
ifeq ($(UNAME_S),Darwin)
@C_INCLUDE_PATH=${INCLUDE_PATH} LIBRARY_PATH=${LIBRARY_PATH} GGML_METAL_PATH_RESOURCES=${GGML_METAL_PATH_RESOURCES} go test -ldflags "-extldflags '$(EXT_LDFLAGS)'" -v .
@C_INCLUDE_PATH=${INCLUDE_PATH} LIBRARY_PATH=${LIBRARY_PATH} GGML_METAL_PATH_RESOURCES=${GGML_METAL_PATH_RESOURCES} go test -ldflags "-extldflags '$(EXT_LDFLAGS)'" -v ./pkg/whisper/...
else
@C_INCLUDE_PATH=${INCLUDE_PATH} LIBRARY_PATH=${LIBRARY_PATH} go test -v .
@C_INCLUDE_PATH=${INCLUDE_PATH} LIBRARY_PATH=${LIBRARY_PATH} go test -v ./pkg/whisper/...
endif
examples: $(EXAMPLES_DIR)
@ -21,7 +43,11 @@ model-small: mkdir examples/go-model-download
$(EXAMPLES_DIR): mkdir whisper modtidy
@echo Build example $(notdir $@)
ifeq ($(UNAME_S),Darwin)
@C_INCLUDE_PATH=${INCLUDE_PATH} LIBRARY_PATH=${LIBRARY_PATH} GGML_METAL_PATH_RESOURCES=${GGML_METAL_PATH_RESOURCES} go build ${BUILD_FLAGS} -ldflags "-extldflags '$(EXT_LDFLAGS)'" -o ${BUILD_DIR}/$(notdir $@) ./$@
else
@C_INCLUDE_PATH=${INCLUDE_PATH} LIBRARY_PATH=${LIBRARY_PATH} go build ${BUILD_FLAGS} -o ${BUILD_DIR}/$(notdir $@) ./$@
endif
mkdir:
@echo Mkdir ${BUILD_DIR}

bindings/go/examples/go-model-download/main.go

@ -24,7 +24,7 @@ const (
var (
// The models which will be downloaded, if no model is specified as an argument
modelNames = []string{"ggml-tiny.en", "ggml-tiny", "ggml-base.en", "ggml-base", "ggml-small.en", "ggml-small", "ggml-medium.en", "ggml-medium", "ggml-large-v1", "ggml-large-v2", "ggml-large"}
modelNames = []string{"ggml-tiny.en", "ggml-tiny", "ggml-base.en", "ggml-base", "ggml-small.en", "ggml-small", "ggml-medium.en", "ggml-medium", "ggml-large-v1", "ggml-large-v2", "ggml-large-v3"}
)
var (

bindings/java/build.gradle

@ -9,6 +9,7 @@ archivesBaseName = 'whispercpp'
group = 'io.github.ggerganov'
version = '1.4.0'
sourceCompatibility = 1.8
targetCompatibility = 1.8

bindings/java/src/main/java/io/github/ggerganov/whispercpp/WhisperCpp.java

@ -2,6 +2,7 @@ package io.github.ggerganov.whispercpp;
import com.sun.jna.Native;
import com.sun.jna.Pointer;
import io.github.ggerganov.whispercpp.bean.WhisperSegment;
import io.github.ggerganov.whispercpp.params.WhisperContextParams;
import io.github.ggerganov.whispercpp.params.WhisperFullParams;
import io.github.ggerganov.whispercpp.params.WhisperSamplingStrategy;
@ -9,6 +10,8 @@ import io.github.ggerganov.whispercpp.params.WhisperSamplingStrategy;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
/**
* Before calling most methods, you must call `initContext(modelPath)` to initialise the `ctx` Pointer.
@ -160,6 +163,28 @@ public class WhisperCpp implements AutoCloseable {
return str.toString().trim();
}
public List<WhisperSegment> fullTranscribeWithTime(WhisperFullParams whisperParams, float[] audioData) throws IOException {
if (ctx == null) {
throw new IllegalStateException("Model not initialised");
}
if (lib.whisper_full(ctx, whisperParams, audioData, audioData.length) != 0) {
throw new IOException("Failed to process audio");
}
int nSegments = lib.whisper_full_n_segments(ctx);
List<WhisperSegment> segments= new ArrayList<>(nSegments);
for (int i = 0; i < nSegments; i++) {
long t0 = lib.whisper_full_get_segment_t0(ctx, i);
String text = lib.whisper_full_get_segment_text(ctx, i);
long t1 = lib.whisper_full_get_segment_t1(ctx, i);
segments.add(new WhisperSegment(t0,t1,text));
}
return segments;
}
// public int getTextSegmentCount(Pointer ctx) {
// return lib.whisper_full_n_segments(ctx);

bindings/java/src/main/java/io/github/ggerganov/whispercpp/bean/WhisperSegment.java

@ -0,0 +1,47 @@
package io.github.ggerganov.whispercpp.bean;
/**
* Created by litonglinux@qq.com on 10/21/2023_7:48 AM
*/
public class WhisperSegment {
private long start, end;
private String sentence;
public WhisperSegment() {
}
public WhisperSegment(long start, long end, String sentence) {
this.start = start;
this.end = end;
this.sentence = sentence;
}
public long getStart() {
return start;
}
public long getEnd() {
return end;
}
public String getSentence() {
return sentence;
}
public void setStart(long start) {
this.start = start;
}
public void setEnd(long end) {
this.end = end;
}
public void setSentence(String sentence) {
this.sentence = sentence;
}
@Override
public String toString() {
return "[" + start + " --> " + end + "]:" + sentence;
}
}

bindings/java/src/main/java/io/github/ggerganov/whispercpp/params/WhisperFullParams.java

@ -58,6 +58,9 @@ public class WhisperFullParams extends Structure {
no_context = enable ? CBool.FALSE : CBool.TRUE;
}
/** Generate timestamps or not? */
public CBool no_timestamps;
/** Flag to force single segment output (useful for streaming). (default = false) */
public CBool single_segment;
@ -304,10 +307,16 @@ public class WhisperFullParams extends Structure {
logits_filter_callback = CallbackReference.getFunctionPointer(callback);
}
/** Grammar stuff */
public Pointer grammar_rules;
public long n_grammar_rules;
public long i_start_rule;
public float grammar_penalty;
@Override
protected List<String> getFieldOrder() {
return Arrays.asList("strategy", "n_threads", "n_max_text_ctx", "offset_ms", "duration_ms", "translate",
"no_context", "single_segment",
"no_context", "single_segment", "no_timestamps",
"print_special", "print_progress", "print_realtime", "print_timestamps", "token_timestamps",
"thold_pt", "thold_ptsum", "max_len", "split_on_word", "max_tokens", "speed_up", "audio_ctx",
"tdrz_enable", "initial_prompt", "prompt_tokens", "prompt_n_tokens", "language", "detect_language",
@ -316,6 +325,7 @@ public class WhisperFullParams extends Structure {
"new_segment_callback", "new_segment_callback_user_data",
"progress_callback", "progress_callback_user_data",
"encoder_begin_callback", "encoder_begin_callback_user_data",
"logits_filter_callback", "logits_filter_callback_user_data");
"logits_filter_callback", "logits_filter_callback_user_data",
"grammar_rules", "n_grammar_rules", "i_start_rule", "grammar_penalty");
}
}

bindings/java/src/test/java/io/github/ggerganov/whispercpp/WhisperCppTest.java

@ -2,6 +2,7 @@ package io.github.ggerganov.whispercpp;
import static org.junit.jupiter.api.Assertions.*;
import io.github.ggerganov.whispercpp.bean.WhisperSegment;
import io.github.ggerganov.whispercpp.params.CBool;
import io.github.ggerganov.whispercpp.params.WhisperFullParams;
import io.github.ggerganov.whispercpp.params.WhisperSamplingStrategy;
@ -11,6 +12,7 @@ import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.List;
class WhisperCppTest {
private static WhisperCpp whisper = new WhisperCpp();
@ -20,11 +22,12 @@ class WhisperCppTest {
static void init() throws FileNotFoundException {
// By default, models are loaded from ~/.cache/whisper/ and are usually named "ggml-${name}.bin"
// or you can provide the absolute path to the model file.
//String modelName = "../../models/ggml-tiny.bin";
String modelName = "../../models/ggml-tiny.en.bin";
try {
whisper.initContext(modelName);
// whisper.getFullDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_GREEDY);
// whisper.getJavaDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_BEAM_SEARCH);
//whisper.getFullDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_GREEDY);
//whisper.getJavaDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_BEAM_SEARCH);
modelInitialised = true;
} catch (FileNotFoundException ex) {
System.out.println("Model " + modelName + " not found");
@ -42,7 +45,7 @@ class WhisperCppTest {
assertEquals(16384, params.n_max_text_ctx);
assertFalse(params.translate);
assertEquals(0.01f, params.thold_pt);
assertEquals(2, params.beam_search.beam_size);
assertEquals(5, params.beam_search.beam_size);
assertEquals(-1.0f, params.beam_search.patience);
}
@ -55,7 +58,7 @@ class WhisperCppTest {
assertEquals(WhisperSamplingStrategy.WHISPER_SAMPLING_GREEDY.ordinal(), params.strategy);
assertNotEquals(0, params.n_threads);
assertEquals(16384, params.n_max_text_ctx);
assertEquals(2, params.greedy.best_of);
assertEquals(5, params.greedy.best_of);
}
@Test
@ -72,11 +75,11 @@ class WhisperCppTest {
byte[] b = new byte[audioInputStream.available()];
float[] floats = new float[b.length / 2];
// WhisperFullParams params = whisper.getFullDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_GREEDY);
//WhisperFullParams params = whisper.getFullDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_GREEDY);
WhisperFullParams params = whisper.getFullDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_BEAM_SEARCH);
params.setProgressCallback((ctx, state, progress, user_data) -> System.out.println("progress: " + progress));
params.print_progress = CBool.FALSE;
// params.initial_prompt = "and so my fellow Americans um, like";
//params.initial_prompt = "and so my fellow Americans um, like";
try {
@ -99,4 +102,43 @@ class WhisperCppTest {
audioInputStream.close();
}
}
@Test
void testFullTranscribeWithTime() throws Exception {
if (!modelInitialised) {
System.out.println("Model not initialised, skipping test");
return;
}
// Given
File file = new File(System.getProperty("user.dir"), "../../samples/jfk.wav");
AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(file);
byte[] b = new byte[audioInputStream.available()];
float[] floats = new float[b.length / 2];
//WhisperFullParams params = whisper.getFullDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_GREEDY);
WhisperFullParams params = whisper.getFullDefaultParams(WhisperSamplingStrategy.WHISPER_SAMPLING_BEAM_SEARCH);
params.setProgressCallback((ctx, state, progress, user_data) -> System.out.println("progress: " + progress));
params.print_progress = CBool.FALSE;
//params.initial_prompt = "and so my fellow Americans um, like";
try {
audioInputStream.read(b);
for (int i = 0, j = 0; i < b.length; i += 2, j++) {
int intSample = (int) (b[i + 1]) << 8 | (int) (b[i]) & 0xFF;
floats[j] = intSample / 32767.0f;
}
List<WhisperSegment> segments = whisper.fullTranscribeWithTime(params, floats);
assertTrue(segments.size() > 0, "The size of segments should be greater than 0");
for (WhisperSegment segment : segments) {
System.out.println(segment);
}
} finally {
audioInputStream.close();
}
}
}

bindings/javascript/package.json

@ -1,6 +1,6 @@
{
"name": "whisper.cpp",
"version": "1.4.3",
"version": "1.5.2",
"description": "Whisper speech recognition",
"main": "whisper.js",
"scripts": {

File diff suppressed because one or more lines are too long

coreml/whisper-encoder-impl.h

@ -123,7 +123,7 @@ API_AVAILABLE(macos(12.0), ios(15.0), watchos(8.0), tvos(15.0)) __attribute__((v
/**
Make a prediction using the convenience interface
@param logmel_data as 1 × 80 × 3000 3-dimensional array of floats:
@param logmel_data as 1 × n_mel × 3000 3-dimensional array of floats:
@param error If an error occurs, upon return contains an NSError object that describes the problem. If you are not interested in possible errors, pass in NULL.
@return the prediction as whisper_encoder_implOutput
*/

coreml/whisper-encoder.h

@ -3,6 +3,8 @@
// Code is derived from the work of Github user @wangchou
// ref: https://github.com/wangchou/callCoreMLFromCpp
#include <stdint.h>
#if __cplusplus
extern "C" {
#endif
@ -14,6 +16,8 @@ void whisper_coreml_free(struct whisper_coreml_context * ctx);
void whisper_coreml_encode(
const whisper_coreml_context * ctx,
int64_t n_ctx,
int64_t n_mel,
float * mel,
float * out);

coreml/whisper-encoder.mm

@ -48,13 +48,15 @@ void whisper_coreml_free(struct whisper_coreml_context * ctx) {
void whisper_coreml_encode(
const whisper_coreml_context * ctx,
int64_t n_ctx,
int64_t n_mel,
float * mel,
float * out) {
MLMultiArray * inMultiArray = [
[MLMultiArray alloc] initWithDataPointer: mel
shape: @[@1, @80, @3000]
shape: @[@1, @(n_mel), @(n_ctx)]
dataType: MLMultiArrayDataTypeFloat32
strides: @[@(240000), @(3000), @1]
strides: @[@(n_ctx*n_mel), @(n_ctx), @1]
deallocator: nil
error: nil
];

examples/CMakeLists.txt

@ -23,6 +23,7 @@ add_library(${TARGET} STATIC
common.cpp
common-ggml.h
common-ggml.cpp
grammar-parser.cpp
)
include(DefaultTargetOptions)
@ -64,6 +65,7 @@ elseif(CMAKE_JS_VERSION)
else()
add_subdirectory(main)
add_subdirectory(stream)
add_subdirectory(server)
add_subdirectory(command)
add_subdirectory(bench)
add_subdirectory(quantize)
@ -71,3 +73,5 @@ else()
add_subdirectory(talk-llama)
add_subdirectory(lsp)
endif()
add_subdirectory(wchess)

examples/bench/bench.cpp

@ -81,7 +81,7 @@ int whisper_bench_full(const whisper_params & params) {
}
// heat encoder
if (int ret = whisper_encode(ctx, 0, params.n_threads) != 0) {
fprintf(stderr, "error: failed to encode model: %d\n", ret);
fprintf(stderr, "error: failed to encode: %d\n", ret);
return 4;
}
@ -90,13 +90,13 @@ int whisper_bench_full(const whisper_params & params) {
// prompt heat
if (int ret = whisper_decode(ctx, tokens, 256, 0, params.n_threads) != 0) {
fprintf(stderr, "error: failed to encode model: %d\n", ret);
fprintf(stderr, "error: failed to decode: %d\n", ret);
return 4;
}
// text-generation heat
if (int ret = whisper_decode(ctx, tokens, 1, 256, params.n_threads) != 0) {
fprintf(stderr, "error: failed to encode model: %d\n", ret);
fprintf(stderr, "error: failed to decode: %d\n", ret);
return 4;
}
@ -104,20 +104,30 @@ int whisper_bench_full(const whisper_params & params) {
// actual run
if (int ret = whisper_encode(ctx, 0, params.n_threads) != 0) {
fprintf(stderr, "error: failed to encode model: %d\n", ret);
fprintf(stderr, "error: failed to encode: %d\n", ret);
return 4;
}
for (int i = 0; i < 16; i++) {
if (int ret = whisper_decode(ctx, tokens, 256, 0, params.n_threads) != 0) {
fprintf(stderr, "error: failed to encode model: %d\n", ret);
// text-generation
for (int i = 0; i < 256; i++) {
if (int ret = whisper_decode(ctx, tokens, 1, i, params.n_threads) != 0) {
fprintf(stderr, "error: failed to decode: %d\n", ret);
return 4;
}
}
for (int i = 0; i < 256; i++) {
if (int ret = whisper_decode(ctx, tokens, 1, i, params.n_threads) != 0) {
fprintf(stderr, "error: failed to encode model: %d\n", ret);
// batched decoding
for (int i = 0; i < 64; i++) {
if (int ret = whisper_decode(ctx, tokens, 5, 0, params.n_threads) != 0) {
fprintf(stderr, "error: failed to decode: %d\n", ret);
return 4;
}
}
// prompt processing
for (int i = 0; i < 16; i++) {
if (int ret = whisper_decode(ctx, tokens, 256, 0, params.n_threads) != 0) {
fprintf(stderr, "error: failed to decode: %d\n", ret);
return 4;
}
}

examples/command/command.cpp

@ -9,6 +9,7 @@
#include "common-sdl.h"
#include "common.h"
#include "whisper.h"
#include "grammar-parser.h"
#include <sstream>
#include <cassert>
@ -21,6 +22,11 @@
#include <vector>
#include <map>
bool file_exists(const std::string & fname) {
std::ifstream f(fname.c_str());
return f.good();
}
// command-line parameters
struct whisper_params {
int32_t n_threads = std::min(4, (int32_t) std::thread::hardware_concurrency());
@ -30,8 +36,12 @@ struct whisper_params {
int32_t max_tokens = 32;
int32_t audio_ctx = 0;
float vad_thold = 0.6f;
float freq_thold = 100.0f;
float vad_thold = 0.6f;
float freq_thold = 100.0f;
float grammar_penalty = 100.0f;
grammar_parser::parse_state grammar_parsed;
bool speed_up = false;
bool translate = false;
@ -45,6 +55,8 @@ struct whisper_params {
std::string fname_out;
std::string commands;
std::string prompt;
std::string context;
std::string grammar;
};
void whisper_print_usage(int argc, char ** argv, const whisper_params & params);
@ -75,6 +87,9 @@ bool whisper_params_parse(int argc, char ** argv, whisper_params & params) {
else if (arg == "-f" || arg == "--file") { params.fname_out = argv[++i]; }
else if (arg == "-cmd" || arg == "--commands") { params.commands = argv[++i]; }
else if (arg == "-p" || arg == "--prompt") { params.prompt = argv[++i]; }
else if (arg == "-ctx" || arg == "--context") { params.context = argv[++i]; }
else if ( arg == "--grammar") { params.grammar = argv[++i]; }
else if ( arg == "--grammar-penalty") { params.grammar_penalty = std::stof(argv[++i]); }
else {
fprintf(stderr, "error: unknown argument: %s\n", arg.c_str());
whisper_print_usage(argc, argv, params);
@ -109,16 +124,30 @@ void whisper_print_usage(int /*argc*/, char ** argv, const whisper_params & para
fprintf(stderr, " -f FNAME, --file FNAME [%-7s] text output file name\n", params.fname_out.c_str());
fprintf(stderr, " -cmd FNAME, --commands FNAME [%-7s] text file with allowed commands\n", params.commands.c_str());
fprintf(stderr, " -p, --prompt [%-7s] the required activation prompt\n", params.prompt.c_str());
fprintf(stderr, " -ctx, --context [%-7s] sample text to help the transcription\n", params.context.c_str());
fprintf(stderr, " --grammar GRAMMAR [%-7s] GBNF grammar to guide decoding\n", params.grammar.c_str());
fprintf(stderr, " --grammar-penalty N [%-7.1f] scales down logits of nongrammar tokens\n", params.grammar_penalty);
fprintf(stderr, "\n");
}
std::string transcribe(whisper_context * ctx, const whisper_params & params, const std::vector<float> & pcmf32, float & prob, int64_t & t_ms) {
std::string transcribe(
whisper_context * ctx,
const whisper_params & params,
const std::vector<float> & pcmf32,
const std::string & grammar_rule,
float & logprob_min,
float & logprob_sum,
int & n_tokens,
int64_t & t_ms) {
const auto t_start = std::chrono::high_resolution_clock::now();
prob = 0.0f;
logprob_min = 0.0f;
logprob_sum = 0.0f;
n_tokens = 0;
t_ms = 0;
whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
//whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH);
wparams.print_progress = false;
wparams.print_special = params.print_special;
@ -126,19 +155,41 @@ std::string transcribe(whisper_context * ctx, const whisper_params & params, con
wparams.print_timestamps = !params.no_timestamps;
wparams.translate = params.translate;
wparams.no_context = true;
wparams.no_timestamps = params.no_timestamps;
wparams.single_segment = true;
wparams.max_tokens = params.max_tokens;
wparams.language = params.language.c_str();
wparams.n_threads = params.n_threads;
wparams.audio_ctx = params.audio_ctx;
wparams.speed_up = params.speed_up;
wparams.audio_ctx = params.audio_ctx;
wparams.speed_up = params.speed_up;
wparams.temperature = 0.4f;
wparams.temperature_inc = 1.0f;
wparams.greedy.best_of = 5;
wparams.beam_search.beam_size = 5;
wparams.initial_prompt = params.context.data();
const auto & grammar_parsed = params.grammar_parsed;
auto grammar_rules = grammar_parsed.c_rules();
if (!params.grammar_parsed.rules.empty() && !grammar_rule.empty()) {
if (grammar_parsed.symbol_ids.find(grammar_rule) == grammar_parsed.symbol_ids.end()) {
fprintf(stderr, "%s: warning: grammar rule '%s' not found - skipping grammar sampling\n", __func__, grammar_rule.c_str());
} else {
wparams.grammar_rules = grammar_rules.data();
wparams.n_grammar_rules = grammar_rules.size();
wparams.i_start_rule = grammar_parsed.symbol_ids.at(grammar_rule);
wparams.grammar_penalty = params.grammar_penalty;
}
}
if (whisper_full(ctx, wparams, pcmf32.data(), pcmf32.size()) != 0) {
return "";
}
int prob_n = 0;
std::string result;
const int n_segments = whisper_full_n_segments(ctx);
@ -147,19 +198,17 @@ std::string transcribe(whisper_context * ctx, const whisper_params & params, con
result += text;
const int n_tokens = whisper_full_n_tokens(ctx, i);
for (int j = 0; j < n_tokens; ++j) {
const int n = whisper_full_n_tokens(ctx, i);
for (int j = 0; j < n; ++j) {
const auto token = whisper_full_get_token_data(ctx, i, j);
prob += token.p;
++prob_n;
if(token.plog > 0.0f) exit(0);
logprob_min = std::min(logprob_min, token.plog);
logprob_sum += token.plog;
++n_tokens;
}
}
if (prob_n > 0) {
prob /= prob_n;
}
const auto t_end = std::chrono::high_resolution_clock::now();
t_ms = std::chrono::duration_cast<std::chrono::milliseconds>(t_end - t_start).count();
@ -250,7 +299,7 @@ int process_command_list(struct whisper_context * ctx, audio_async &audio, const
fprintf(stderr, " ]\n");
}
std::string k_prompt = "select one from the available words: ";
std::string k_prompt = "select one from the available words: ";
for (int i = 0; i < (int) allowed_commands.size(); ++i) {
if (i > 0) {
k_prompt += ", ";
@ -418,7 +467,9 @@ int always_prompt_transcription(struct whisper_context * ctx, audio_async & audi
bool is_running = true;
bool ask_prompt = true;
float prob = 0.0f;
float logprob_min = 0.0f;
float logprob_sum = 0.0f;
int n_tokens = 0;
std::vector<float> pcmf32_cur;
@ -456,7 +507,7 @@ int always_prompt_transcription(struct whisper_context * ctx, audio_async & audi
// detect the commands
audio.get(params.command_ms, pcmf32_cur);
const auto txt = ::trim(::transcribe(ctx, params, pcmf32_cur, prob, t_ms));
const auto txt = ::trim(::transcribe(ctx, params, pcmf32_cur, "", logprob_min, logprob_sum, n_tokens, t_ms));
const auto words = get_words(txt);
@ -492,18 +543,27 @@ int always_prompt_transcription(struct whisper_context * ctx, audio_async & audi
// general-purpose mode
// freely transcribe the voice into text
int process_general_transcription(struct whisper_context * ctx, audio_async &audio, const whisper_params &params) {
int process_general_transcription(struct whisper_context * ctx, audio_async & audio, const whisper_params & params) {
bool is_running = true;
bool have_prompt = false;
bool ask_prompt = true;
float prob0 = 0.0f;
float prob = 0.0f;
float logprob_min0 = 0.0f;
float logprob_min = 0.0f;
float logprob_sum0 = 0.0f;
float logprob_sum = 0.0f;
int n_tokens0 = 0;
int n_tokens = 0;
std::vector<float> pcmf32_cur;
std::vector<float> pcmf32_prompt;
const std::string k_prompt = "Ok Whisper, start listening for commands.";
std::string k_prompt = "Ok Whisper, start listening for commands.";
if (!params.prompt.empty()) {
k_prompt = params.prompt;
}
fprintf(stderr, "\n");
fprintf(stderr, "%s: general-purpose mode\n", __func__);
@ -536,9 +596,11 @@ int process_general_transcription(struct whisper_context * ctx, audio_async &aud
// wait for activation phrase
audio.get(params.prompt_ms, pcmf32_cur);
const auto txt = ::trim(::transcribe(ctx, params, pcmf32_cur, prob0, t_ms));
const auto txt = ::trim(::transcribe(ctx, params, pcmf32_cur, "prompt", logprob_min0, logprob_sum0, n_tokens0, t_ms));
fprintf(stdout, "%s: Heard '%s%s%s', (t = %d ms)\n", __func__, "\033[1m", txt.c_str(), "\033[0m", (int) t_ms);
const float p = 100.0f * std::exp(logprob_min0);
fprintf(stdout, "%s: Heard '%s%s%s', (t = %d ms, p = %.2f%%)\n", __func__, "\033[1m", txt.c_str(), "\033[0m", (int) t_ms, p);
const float sim = similarity(txt, k_prompt);
@ -559,19 +621,30 @@ int process_general_transcription(struct whisper_context * ctx, audio_async &aud
// we have heard the activation phrase, now detect the commands
audio.get(params.command_ms, pcmf32_cur);
//printf("len prompt: %.4f\n", pcmf32_prompt.size() / (float) WHISPER_SAMPLE_RATE);
//printf("len command: %.4f\n", pcmf32_cur.size() / (float) WHISPER_SAMPLE_RATE);
// prepend 3 seconds of silence
pcmf32_cur.insert(pcmf32_cur.begin(), 3.0f*WHISPER_SAMPLE_RATE, 0.0f);
// prepend the prompt audio
pcmf32_cur.insert(pcmf32_cur.begin(), pcmf32_prompt.begin(), pcmf32_prompt.end());
const auto txt = ::trim(::transcribe(ctx, params, pcmf32_cur, prob, t_ms));
const auto txt = ::trim(::transcribe(ctx, params, pcmf32_cur, "root", logprob_min, logprob_sum, n_tokens, t_ms));
prob = 100.0f*(prob - prob0);
//const float p = 100.0f * std::exp((logprob - logprob0) / (n_tokens - n_tokens0));
const float p = 100.0f * std::exp(logprob_min);
//fprintf(stdout, "%s: heard '%s'\n", __func__, txt.c_str());
// find the prompt in the text
float best_sim = 0.0f;
size_t best_len = 0;
for (int n = 0.8*k_prompt.size(); n <= 1.2*k_prompt.size(); ++n) {
for (size_t n = 0.8*k_prompt.size(); n <= 1.2*k_prompt.size(); ++n) {
if (n >= txt.size()) {
break;
}
const auto prompt = txt.substr(0, n);
const float sim = similarity(prompt, k_prompt);
@@ -584,9 +657,16 @@ int process_general_transcription(struct whisper_context * ctx, audio_async &aud
}
}
const std::string command = ::trim(txt.substr(best_len));
fprintf(stdout, "%s: DEBUG: txt = '%s', prob = %.2f%%\n", __func__, txt.c_str(), p);
if (best_len == 0) {
fprintf(stdout, "%s: WARNING: command not recognized, try again\n", __func__);
} else {
// cut the prompt from the decoded text
const std::string command = ::trim(txt.substr(best_len));
fprintf(stdout, "%s: Command '%s%s%s', (t = %d ms)\n", __func__, "\033[1m", command.c_str(), "\033[0m", (int) t_ms);
}
fprintf(stdout, "%s: Command '%s%s%s', (t = %d ms)\n", __func__, "\033[1m", command.c_str(), "\033[0m", (int) t_ms);
fprintf(stdout, "\n");
}
@@ -654,12 +734,36 @@ int main(int argc, char ** argv) {
int ret_val = 0;
if (!params.commands.empty()) {
ret_val = process_command_list(ctx, audio, params);
} else if (!params.prompt.empty()) {
ret_val = always_prompt_transcription(ctx, audio, params);
} else {
ret_val = process_general_transcription(ctx, audio, params);
if (!params.grammar.empty()) {
auto & grammar = params.grammar_parsed;
if (file_exists(params.grammar.c_str())) {
// read grammar from file
std::ifstream ifs(params.grammar.c_str());
const std::string txt = std::string((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
grammar = grammar_parser::parse(txt.c_str());
} else {
// read grammar from string
grammar = grammar_parser::parse(params.grammar.c_str());
}
// will be empty (default) if there are parse errors
if (grammar.rules.empty()) {
ret_val = 1;
} else {
fprintf(stderr, "%s: grammar:\n", __func__);
grammar_parser::print_grammar(stderr, grammar);
fprintf(stderr, "\n");
}
}
if (ret_val == 0) {
if (!params.commands.empty()) {
ret_val = process_command_list(ctx, audio, params);
} else if (!params.prompt.empty() && params.grammar_parsed.rules.empty()) {
ret_val = always_prompt_transcription(ctx, audio, params);
} else {
ret_val = process_general_transcription(ctx, audio, params);
}
}
audio.pause();
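The grammar plumbing above accepts `--grammar` either as a path (read via `file_exists`) or as an inline grammar string, and a failed parse leaves `grammar_parsed.rules` empty, which turns into `ret_val = 1` before any mode runs. A hypothetical grammar in the GBNF notation documented in `examples/grammar-parser.h` further below (the rule contents are illustrative, not taken from this diff):

```
root  ::= " " color
color ::= "red" | "green" | "blue"
```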


@@ -9,6 +9,11 @@ static const std::map<std::string, enum ggml_ftype> GGML_FTYPE_MAP = {
{"q5_0", GGML_FTYPE_MOSTLY_Q5_0},
{"q5_1", GGML_FTYPE_MOSTLY_Q5_1},
{"q8_0", GGML_FTYPE_MOSTLY_Q8_0},
{"q2_k", GGML_FTYPE_MOSTLY_Q2_K},
{"q3_k", GGML_FTYPE_MOSTLY_Q3_K},
{"q4_k", GGML_FTYPE_MOSTLY_Q4_K},
{"q5_k", GGML_FTYPE_MOSTLY_Q5_K},
{"q6_k", GGML_FTYPE_MOSTLY_Q6_K},
};
void ggml_print_ftypes(FILE * fp) {
@@ -48,15 +53,15 @@ bool ggml_common_quantize_0(
case GGML_FTYPE_MOSTLY_Q5_0: qtype = GGML_TYPE_Q5_0; break;
case GGML_FTYPE_MOSTLY_Q5_1: qtype = GGML_TYPE_Q5_1; break;
case GGML_FTYPE_MOSTLY_Q8_0: qtype = GGML_TYPE_Q8_0; break;
case GGML_FTYPE_MOSTLY_Q2_K: qtype = GGML_TYPE_Q2_K; break;
case GGML_FTYPE_MOSTLY_Q3_K: qtype = GGML_TYPE_Q3_K; break;
case GGML_FTYPE_MOSTLY_Q4_K: qtype = GGML_TYPE_Q4_K; break;
case GGML_FTYPE_MOSTLY_Q5_K: qtype = GGML_TYPE_Q5_K; break;
case GGML_FTYPE_MOSTLY_Q6_K: qtype = GGML_TYPE_Q6_K; break;
case GGML_FTYPE_UNKNOWN:
case GGML_FTYPE_ALL_F32:
case GGML_FTYPE_MOSTLY_F16:
case GGML_FTYPE_MOSTLY_Q4_1_SOME_F16:
case GGML_FTYPE_MOSTLY_Q2_K:
case GGML_FTYPE_MOSTLY_Q3_K:
case GGML_FTYPE_MOSTLY_Q4_K:
case GGML_FTYPE_MOSTLY_Q5_K:
case GGML_FTYPE_MOSTLY_Q6_K:
{
fprintf(stderr, "%s: invalid model type %d\n", __func__, ftype);
return false;
@@ -167,24 +172,17 @@ bool ggml_common_quantize_0(
switch ((ggml_type) ttype) {
case GGML_TYPE_Q4_0:
{
cur_size = ggml_quantize_q4_0(data_f32.data(), work.data(), nelements, ne[0], hist_cur.data());
} break;
case GGML_TYPE_Q4_1:
{
cur_size = ggml_quantize_q4_1(data_f32.data(), work.data(), nelements, ne[0], hist_cur.data());
} break;
case GGML_TYPE_Q5_0:
{
cur_size = ggml_quantize_q5_0(data_f32.data(), work.data(), nelements, ne[0], hist_cur.data());
} break;
case GGML_TYPE_Q5_1:
{
cur_size = ggml_quantize_q5_1(data_f32.data(), work.data(), nelements, ne[0], hist_cur.data());
} break;
case GGML_TYPE_Q8_0:
case GGML_TYPE_Q2_K:
case GGML_TYPE_Q3_K:
case GGML_TYPE_Q4_K:
case GGML_TYPE_Q5_K:
case GGML_TYPE_Q6_K:
{
cur_size = ggml_quantize_q8_0(data_f32.data(), work.data(), nelements, ne[0], hist_cur.data());
cur_size = ggml_quantize_chunk((ggml_type) ttype, data_f32.data(), work.data(), 0, nelements, hist_cur.data());
} break;
case GGML_TYPE_F32:
case GGML_TYPE_F16:
@@ -192,11 +190,6 @@ bool ggml_common_quantize_0(
case GGML_TYPE_I16:
case GGML_TYPE_I32:
case GGML_TYPE_Q8_1:
case GGML_TYPE_Q2_K:
case GGML_TYPE_Q3_K:
case GGML_TYPE_Q4_K:
case GGML_TYPE_Q5_K:
case GGML_TYPE_Q6_K:
case GGML_TYPE_Q8_K:
case GGML_TYPE_COUNT:
{

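The k-quant cases above now funnel through `ggml_quantize_chunk`, which dispatches on the tensor type instead of requiring one `ggml_quantize_q*` call per format. An annotated form of the consolidated call, with argument roles inferred from the surrounding code:

```cpp
// data_f32, work and hist_cur are the existing buffers in ggml_common_quantize_0
cur_size = ggml_quantize_chunk(
    (ggml_type) ttype,  // any supported quantized type, incl. Q2_K..Q6_K
    data_f32.data(),    // input F32 weights
    work.data(),        // output buffer for the quantized blocks
    0,                  // start offset, in elements
    nelements,          // number of elements to quantize
    hist_cur.data());   // histogram accumulator
```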

@@ -139,10 +139,13 @@ void audio_async::callback(uint8_t * stream, int len) {
return;
}
const size_t n_samples = len / sizeof(float);
size_t n_samples = len / sizeof(float);
m_audio_new.resize(n_samples);
memcpy(m_audio_new.data(), stream, n_samples * sizeof(float));
if (n_samples > m_audio.size()) {
n_samples = m_audio.size();
stream += (len - (n_samples * sizeof(float)));
}
//fprintf(stderr, "%s: %zu samples, pos %zu, len %zu\n", __func__, n_samples, m_audio_pos, m_audio_len);
@@ -153,7 +156,7 @@ void audio_async::callback(uint8_t * stream, int len) {
const size_t n0 = m_audio.size() - m_audio_pos;
memcpy(&m_audio[m_audio_pos], stream, n0 * sizeof(float));
memcpy(&m_audio[0], &stream[n0], (n_samples - n0) * sizeof(float));
memcpy(&m_audio[0], stream + n0 * sizeof(float), (n_samples - n0) * sizeof(float));
m_audio_pos = (m_audio_pos + n_samples) % m_audio.size();
m_audio_len = m_audio.size();
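Two fixes land in `audio_async::callback` above: chunks larger than the ring buffer are clamped so only the newest samples are kept, and the wrap-around copy now advances `stream` by bytes rather than by float indices; since `stream` is a `uint8_t *`, the old `&stream[n0]` skipped `n0` bytes where `n0 * sizeof(float)` bytes were needed. A condensed sketch of the corrected logic (member names mirror the diff):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Condensed sketch of the fixed ring-buffer write; m_audio and m_audio_pos
// mirror the audio_async members. stream points at len bytes of F32 PCM.
static void ring_write(std::vector<float> & m_audio, size_t & m_audio_pos,
                       const uint8_t * stream, int len) {
    size_t n_samples = len / sizeof(float);

    if (n_samples > m_audio.size()) {
        // chunk larger than the buffer: keep only the newest samples
        n_samples = m_audio.size();
        stream += len - n_samples * sizeof(float);
    }

    if (m_audio_pos + n_samples > m_audio.size()) {
        const size_t n0 = m_audio.size() - m_audio_pos;

        memcpy(&m_audio[m_audio_pos], stream, n0 * sizeof(float));
        // advance by n0 floats' worth of bytes; &stream[n0] would be wrong
        memcpy(&m_audio[0], stream + n0 * sizeof(float), (n_samples - n0) * sizeof(float));
    } else {
        memcpy(&m_audio[m_audio_pos], stream, n_samples * sizeof(float));
    }

    m_audio_pos = (m_audio_pos + n_samples) % m_audio.size();
}
```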


@@ -41,7 +41,6 @@ private:
std::mutex m_mutex;
std::vector<float> m_audio;
std::vector<float> m_audio_new;
size_t m_audio_pos = 0;
size_t m_audio_len = 0;
};


@@ -181,7 +181,7 @@ private:
// It is assumed that PCM data is normalized to a range from -1 to 1
bool write_audio(const float * data, size_t length) {
for (size_t i = 0; i < length; ++i) {
const auto intSample = static_cast<const int16_t>(data[i] * 32767);
const int16_t intSample = data[i] * 32767;
file.write(reinterpret_cast<const char *>(&intSample), sizeof(int16_t));
dataSize += sizeof(int16_t);
}

examples/grammar-parser.cpp (new file, 423 lines)

@@ -0,0 +1,423 @@
#include "grammar-parser.h"
#include <cstdint>
#include <cwchar>
#include <string>
#include <utility>
#include <stdexcept>
#include <exception>
namespace grammar_parser {
// NOTE: assumes valid utf8 (but checks for overrun)
// copied from whisper.cpp
std::pair<uint32_t, const char *> decode_utf8(const char * src) {
static const int lookup[] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 4 };
uint8_t first_byte = static_cast<uint8_t>(*src);
uint8_t highbits = first_byte >> 4;
int len = lookup[highbits];
uint8_t mask = (1 << (8 - len)) - 1;
uint32_t value = first_byte & mask;
const char * end = src + len; // may overrun!
const char * pos = src + 1;
for ( ; pos < end && *pos; pos++) {
value = (value << 6) + (static_cast<uint8_t>(*pos) & 0x3F);
}
return std::make_pair(value, pos);
}
uint32_t get_symbol_id(parse_state & state, const char * src, size_t len) {
uint32_t next_id = static_cast<uint32_t>(state.symbol_ids.size());
auto result = state.symbol_ids.insert(std::make_pair(std::string(src, len), next_id));
return result.first->second;
}
uint32_t generate_symbol_id(parse_state & state, const std::string & base_name) {
uint32_t next_id = static_cast<uint32_t>(state.symbol_ids.size());
state.symbol_ids[base_name + '_' + std::to_string(next_id)] = next_id;
return next_id;
}
void add_rule(
parse_state & state,
uint32_t rule_id,
const std::vector<whisper_grammar_element> & rule) {
if (state.rules.size() <= rule_id) {
state.rules.resize(rule_id + 1);
}
state.rules[rule_id] = rule;
}
bool is_word_char(char c) {
return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || c == '-' || ('0' <= c && c <= '9');
}
std::pair<uint32_t, const char *> parse_hex(const char * src, int size) {
const char * pos = src;
const char * end = src + size;
uint32_t value = 0;
for ( ; pos < end && *pos; pos++) {
value <<= 4;
char c = *pos;
if ('a' <= c && c <= 'f') {
value += c - 'a' + 10;
} else if ('A' <= c && c <= 'F') {
value += c - 'A' + 10;
} else if ('0' <= c && c <= '9') {
value += c - '0';
} else {
break;
}
}
if (pos != end) {
throw std::runtime_error("expecting " + std::to_string(size) + " hex chars at " + src);
}
return std::make_pair(value, pos);
}
const char * parse_space(const char * src, bool newline_ok) {
const char * pos = src;
while (*pos == ' ' || *pos == '\t' || *pos == '#' ||
(newline_ok && (*pos == '\r' || *pos == '\n'))) {
if (*pos == '#') {
while (*pos && *pos != '\r' && *pos != '\n') {
pos++;
}
} else {
pos++;
}
}
return pos;
}
const char * parse_name(const char * src) {
const char * pos = src;
while (is_word_char(*pos)) {
pos++;
}
if (pos == src) {
throw std::runtime_error(std::string("expecting name at ") + src);
}
return pos;
}
std::pair<uint32_t, const char *> parse_char(const char * src) {
if (*src == '\\') {
switch (src[1]) {
case 'x': return parse_hex(src + 2, 2);
case 'u': return parse_hex(src + 2, 4);
case 'U': return parse_hex(src + 2, 8);
case 't': return std::make_pair('\t', src + 2);
case 'r': return std::make_pair('\r', src + 2);
case 'n': return std::make_pair('\n', src + 2);
case '\\':
case '"':
case '[':
case ']':
return std::make_pair(src[1], src + 2);
default:
throw std::runtime_error(std::string("unknown escape at ") + src);
}
} else if (*src) {
return decode_utf8(src);
}
throw std::runtime_error("unexpected end of input");
}
const char * parse_alternates(
parse_state & state,
const char * src,
const std::string & rule_name,
uint32_t rule_id,
bool is_nested);
const char * parse_sequence(
parse_state & state,
const char * src,
const std::string & rule_name,
std::vector<whisper_grammar_element> & out_elements,
bool is_nested) {
size_t last_sym_start = out_elements.size();
const char * pos = src;
while (*pos) {
if (*pos == '"') { // literal string
pos++;
last_sym_start = out_elements.size();
while (*pos != '"') {
auto char_pair = parse_char(pos);
pos = char_pair.second;
out_elements.push_back({WHISPER_GRETYPE_CHAR, char_pair.first});
}
pos = parse_space(pos + 1, is_nested);
} else if (*pos == '[') { // char range(s)
pos++;
enum whisper_gretype start_type = WHISPER_GRETYPE_CHAR;
if (*pos == '^') {
pos++;
start_type = WHISPER_GRETYPE_CHAR_NOT;
}
last_sym_start = out_elements.size();
while (*pos != ']') {
auto char_pair = parse_char(pos);
pos = char_pair.second;
enum whisper_gretype type = last_sym_start < out_elements.size()
? WHISPER_GRETYPE_CHAR_ALT
: start_type;
out_elements.push_back({type, char_pair.first});
if (pos[0] == '-' && pos[1] != ']') {
auto endchar_pair = parse_char(pos + 1);
pos = endchar_pair.second;
out_elements.push_back({WHISPER_GRETYPE_CHAR_RNG_UPPER, endchar_pair.first});
}
}
pos = parse_space(pos + 1, is_nested);
} else if (is_word_char(*pos)) { // rule reference
const char * name_end = parse_name(pos);
uint32_t ref_rule_id = get_symbol_id(state, pos, name_end - pos);
pos = parse_space(name_end, is_nested);
last_sym_start = out_elements.size();
out_elements.push_back({WHISPER_GRETYPE_RULE_REF, ref_rule_id});
} else if (*pos == '(') { // grouping
// parse nested alternates into synthesized rule
pos = parse_space(pos + 1, true);
uint32_t sub_rule_id = generate_symbol_id(state, rule_name);
pos = parse_alternates(state, pos, rule_name, sub_rule_id, true);
last_sym_start = out_elements.size();
// output reference to synthesized rule
out_elements.push_back({WHISPER_GRETYPE_RULE_REF, sub_rule_id});
if (*pos != ')') {
throw std::runtime_error(std::string("expecting ')' at ") + pos);
}
pos = parse_space(pos + 1, is_nested);
} else if (*pos == '*' || *pos == '+' || *pos == '?') { // repetition operator
if (last_sym_start == out_elements.size()) {
throw std::runtime_error(std::string("expecting preceeding item to */+/? at ") + pos);
}
// apply transformation to previous symbol (last_sym_start to end) according to
// rewrite rules:
// S* --> S' ::= S S' |
// S+ --> S' ::= S S' | S
// S? --> S' ::= S |
uint32_t sub_rule_id = generate_symbol_id(state, rule_name);
std::vector<whisper_grammar_element> sub_rule;
// add preceding symbol to generated rule
sub_rule.insert(
sub_rule.end(), out_elements.begin() + last_sym_start, out_elements.end());
if (*pos == '*' || *pos == '+') {
// cause generated rule to recurse
sub_rule.push_back({WHISPER_GRETYPE_RULE_REF, sub_rule_id});
}
// mark start of alternate def
sub_rule.push_back({WHISPER_GRETYPE_ALT, 0});
if (*pos == '+') {
// add preceding symbol as alternate only for '+' (otherwise empty)
sub_rule.insert(
sub_rule.end(), out_elements.begin() + last_sym_start, out_elements.end());
}
sub_rule.push_back({WHISPER_GRETYPE_END, 0});
add_rule(state, sub_rule_id, sub_rule);
// in original rule, replace previous symbol with reference to generated rule
out_elements.resize(last_sym_start);
out_elements.push_back({WHISPER_GRETYPE_RULE_REF, sub_rule_id});
pos = parse_space(pos + 1, is_nested);
} else {
break;
}
}
return pos;
}
const char * parse_alternates(
parse_state & state,
const char * src,
const std::string & rule_name,
uint32_t rule_id,
bool is_nested) {
std::vector<whisper_grammar_element> rule;
const char * pos = parse_sequence(state, src, rule_name, rule, is_nested);
while (*pos == '|') {
rule.push_back({WHISPER_GRETYPE_ALT, 0});
pos = parse_space(pos + 1, true);
pos = parse_sequence(state, pos, rule_name, rule, is_nested);
}
rule.push_back({WHISPER_GRETYPE_END, 0});
add_rule(state, rule_id, rule);
return pos;
}
const char * parse_rule(parse_state & state, const char * src) {
const char * name_end = parse_name(src);
const char * pos = parse_space(name_end, false);
size_t name_len = name_end - src;
uint32_t rule_id = get_symbol_id(state, src, name_len);
const std::string name(src, name_len);
if (!(pos[0] == ':' && pos[1] == ':' && pos[2] == '=')) {
throw std::runtime_error(std::string("expecting ::= at ") + pos);
}
pos = parse_space(pos + 3, true);
pos = parse_alternates(state, pos, name, rule_id, false);
if (*pos == '\r') {
pos += pos[1] == '\n' ? 2 : 1;
} else if (*pos == '\n') {
pos++;
} else if (*pos) {
throw std::runtime_error(std::string("expecting newline or end at ") + pos);
}
return parse_space(pos, true);
}
parse_state parse(const char * src) {
try {
parse_state state;
const char * pos = parse_space(src, true);
while (*pos) {
pos = parse_rule(state, pos);
}
return state;
} catch (const std::exception & err) {
fprintf(stderr, "%s: error parsing grammar: %s\n", __func__, err.what());
return parse_state();
}
}
void print_grammar_char(FILE * file, uint32_t c) {
if (0x20 <= c && c <= 0x7f) {
fprintf(file, "%c", static_cast<char>(c));
} else {
// cop out of encoding UTF-8
fprintf(file, "<U+%04X>", c);
}
}
bool is_char_element(whisper_grammar_element elem) {
switch (elem.type) {
case WHISPER_GRETYPE_CHAR: return true;
case WHISPER_GRETYPE_CHAR_NOT: return true;
case WHISPER_GRETYPE_CHAR_ALT: return true;
case WHISPER_GRETYPE_CHAR_RNG_UPPER: return true;
default: return false;
}
}
void print_rule_binary(FILE * file, const std::vector<whisper_grammar_element> & rule) {
for (auto elem : rule) {
switch (elem.type) {
case WHISPER_GRETYPE_END: fprintf(file, "END"); break;
case WHISPER_GRETYPE_ALT: fprintf(file, "ALT"); break;
case WHISPER_GRETYPE_RULE_REF: fprintf(file, "RULE_REF"); break;
case WHISPER_GRETYPE_CHAR: fprintf(file, "CHAR"); break;
case WHISPER_GRETYPE_CHAR_NOT: fprintf(file, "CHAR_NOT"); break;
case WHISPER_GRETYPE_CHAR_RNG_UPPER: fprintf(file, "CHAR_RNG_UPPER"); break;
case WHISPER_GRETYPE_CHAR_ALT: fprintf(file, "CHAR_ALT"); break;
}
switch (elem.type) {
case WHISPER_GRETYPE_END:
case WHISPER_GRETYPE_ALT:
case WHISPER_GRETYPE_RULE_REF:
fprintf(file, "(%u) ", elem.value);
break;
case WHISPER_GRETYPE_CHAR:
case WHISPER_GRETYPE_CHAR_NOT:
case WHISPER_GRETYPE_CHAR_RNG_UPPER:
case WHISPER_GRETYPE_CHAR_ALT:
fprintf(file, "(\"");
print_grammar_char(file, elem.value);
fprintf(file, "\") ");
break;
}
}
fprintf(file, "\n");
}
void print_rule(
FILE * file,
uint32_t rule_id,
const std::vector<whisper_grammar_element> & rule,
const std::map<uint32_t, std::string> & symbol_id_names) {
if (rule.empty() || rule.back().type != WHISPER_GRETYPE_END) {
throw std::runtime_error(
"malformed rule, does not end with WHISPER_GRETYPE_END: " + std::to_string(rule_id));
}
fprintf(file, "%s ::= ", symbol_id_names.at(rule_id).c_str());
for (size_t i = 0, end = rule.size() - 1; i < end; i++) {
whisper_grammar_element elem = rule[i];
switch (elem.type) {
case WHISPER_GRETYPE_END:
throw std::runtime_error(
"unexpected end of rule: " + std::to_string(rule_id) + "," +
std::to_string(i));
case WHISPER_GRETYPE_ALT:
fprintf(file, "| ");
break;
case WHISPER_GRETYPE_RULE_REF:
fprintf(file, "%s ", symbol_id_names.at(elem.value).c_str());
break;
case WHISPER_GRETYPE_CHAR:
fprintf(file, "[");
print_grammar_char(file, elem.value);
break;
case WHISPER_GRETYPE_CHAR_NOT:
fprintf(file, "[^");
print_grammar_char(file, elem.value);
break;
case WHISPER_GRETYPE_CHAR_RNG_UPPER:
if (i == 0 || !is_char_element(rule[i - 1])) {
throw std::runtime_error(
"WHISPER_GRETYPE_CHAR_RNG_UPPER without preceding char: " +
std::to_string(rule_id) + "," + std::to_string(i));
}
fprintf(file, "-");
print_grammar_char(file, elem.value);
break;
case WHISPER_GRETYPE_CHAR_ALT:
if (i == 0 || !is_char_element(rule[i - 1])) {
throw std::runtime_error(
"WHISPER_GRETYPE_CHAR_ALT without preceding char: " +
std::to_string(rule_id) + "," + std::to_string(i));
}
print_grammar_char(file, elem.value);
break;
}
if (is_char_element(elem)) {
switch (rule[i + 1].type) {
case WHISPER_GRETYPE_CHAR_ALT:
case WHISPER_GRETYPE_CHAR_RNG_UPPER:
break;
default:
fprintf(file, "] ");
}
}
}
fprintf(file, "\n");
}
void print_grammar(FILE * file, const parse_state & state) {
try {
std::map<uint32_t, std::string> symbol_id_names;
for (auto kv : state.symbol_ids) {
symbol_id_names[kv.second] = kv.first;
}
for (size_t i = 0, end = state.rules.size(); i < end; i++) {
// fprintf(file, "%zu: ", i);
// print_rule_binary(file, state.rules[i]);
print_rule(file, uint32_t(i), state.rules[i], symbol_id_names);
// fprintf(file, "\n");
}
} catch (const std::exception & err) {
fprintf(stderr, "\n%s: error printing grammar: %s\n", __func__, err.what());
}
}
std::vector<const whisper_grammar_element *> parse_state::c_rules() const{
std::vector<const whisper_grammar_element *> ret;
for (const auto & rule : rules) {
ret.push_back(rule.data());
}
return ret;
}
}
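The repetition operators in `parse_sequence` are rewritten into synthesized recursive rules, per the `S*`/`S+`/`S?` comment above. Applied to the arithmetic grammar from the header below, the `([-+*/] term)*` group in `expr` would expand roughly as follows (the `expr_1`/`expr_2` names are illustrative; real names come from `generate_symbol_id`, and the trailing `|` marks the empty alternate):

```
expr   ::= term expr_2
expr_1 ::= [-+*/] term
expr_2 ::= expr_1 expr_2 |
```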

examples/grammar-parser.h (new file, 29 lines)

@@ -0,0 +1,29 @@
// Implements a parser for an extended Backus-Naur form (BNF), producing the
// binary context-free grammar format specified by whisper.h. Supports character
// ranges, grouping, and repetition operators. As an example, a grammar for
// arithmetic might look like:
//
// root ::= expr
// expr ::= term ([-+*/] term)*
// term ::= num | "(" space expr ")" space
// num ::= [0-9]+ space
// space ::= [ \t\n]*
#pragma once
#include "whisper.h"
#include <vector>
#include <map>
#include <cstdint>
#include <string>
namespace grammar_parser {
struct parse_state {
std::map<std::string, uint32_t> symbol_ids;
std::vector<std::vector<whisper_grammar_element>> rules;
std::vector<const whisper_grammar_element *> c_rules() const;
};
parse_state parse(const char * src);
void print_grammar(FILE * file, const parse_state & state);
}
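A minimal sketch of driving the parser with the arithmetic grammar from the comment above; `parse` and `print_grammar` are the entry points declared here, and `parse` returns an empty state on error rather than throwing:

```cpp
#include "grammar-parser.h"

#include <cstdio>

int main() {
    const char * src =
        "root ::= expr\n"
        "expr ::= term ([-+*/] term)*\n"
        "term ::= num | \"(\" space expr \")\" space\n"
        "num ::= [0-9]+ space\n"
        "space ::= [ \\t\\n]*\n";

    grammar_parser::parse_state state = grammar_parser::parse(src);
    if (state.rules.empty()) {
        fprintf(stderr, "parse failed\n");
        return 1;
    }

    // dump the expanded rules, including the synthesized *_N helpers
    grammar_parser::print_grammar(stderr, state);

    // c_rules() yields the pointer array expected by the whisper grammar API
    const auto rules = state.c_rules();
    (void) rules;

    return 0;
}
```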


@@ -22,6 +22,7 @@ var printTextarea = (function() {
async function clearCache() {
if (confirm('Are you sure you want to clear the cache?\nAll the models will be downloaded again.')) {
indexedDB.deleteDatabase(dbName);
location.reload();
}
}


@@ -48,7 +48,7 @@ if [ -n "$3" ]; then
fi
# Whisper models
models=( "tiny.en" "tiny" "base.en" "base" "small.en" "small" "medium.en" "medium" "large-v1" "large-v2" "large" )
models=( "tiny.en" "tiny" "base.en" "base" "small.en" "small" "medium.en" "medium" "large-v1" "large-v2" "large-v3" )
# list available models
function list_models {


@@ -17,28 +17,37 @@ options:
-d N, --duration N [0 ] duration of audio to process in milliseconds
-mc N, --max-context N [-1 ] maximum number of text context tokens to store
-ml N, --max-len N [0 ] maximum segment length in characters
-sow, --split-on-word [false ] split on word rather than on token
-bo N, --best-of N [5 ] number of best candidates to keep
-bs N, --beam-size N [-1 ] beam size for beam search
-bs N, --beam-size N [5 ] beam size for beam search
-wt N, --word-thold N [0.01 ] word timestamp probability threshold
-et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
-lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
-su, --speed-up [false ] speed up audio by x2 (reduced accuracy)
-debug, --debug-mode [false ] enable debug mode (eg. dump log_mel)
-tr, --translate [false ] translate from source language to english
-di, --diarize [false ] stereo audio diarization
-tdrz, --tinydiarize [false ] enable tinydiarize (requires a tdrz model)
-nf, --no-fallback [false ] do not use temperature fallback while decoding
-otxt, --output-txt [false ] output result in a text file
-ovtt, --output-vtt [false ] output result in a vtt file
-osrt, --output-srt [false ] output result in a srt file
-olrc, --output-lrc [false ] output result in a lrc file
-owts, --output-words [false ] output script for generating karaoke video
-fp, --font-path [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
-ocsv, --output-csv [false ] output result in a CSV file
-oj, --output-json [false ] output result in a JSON file
-ojf, --output-json-full [false ] include more information in the JSON file
-of FNAME, --output-file FNAME [ ] output file path (without file extension)
-ps, --print-special [false ] print special tokens
-pc, --print-colors [false ] print colors
-pp, --print-progress [false ] print progress
-nt, --no-timestamps [true ] do not print timestamps
-nt, --no-timestamps [false ] do not print timestamps
-l LANG, --language LANG [en ] spoken language ('auto' for auto-detect)
-dl, --detect-language [false ] exit after automatically detecting language
--prompt PROMPT [ ] initial prompt
-m FNAME, --model FNAME [models/ggml-base.en.bin] model path
-f FNAME, --file FNAME [ ] input WAV file path
-oved D, --ov-e-device DNAME [CPU ] the OpenVINO device used for encode inference
-ls, --log-score [false ] log best decoder scores of tokens
-ng, --no-gpu [false ] disable GPU
```


@@ -62,8 +62,8 @@ struct whisper_params {
int32_t progress_step = 5;
int32_t max_context = -1;
int32_t max_len = 0;
int32_t best_of = 2;
int32_t beam_size = -1;
int32_t best_of = whisper_full_default_params(WHISPER_SAMPLING_GREEDY).greedy.best_of;
int32_t beam_size = whisper_full_default_params(WHISPER_SAMPLING_BEAM_SEARCH).beam_search.beam_size;
float word_thold = 0.01f;
float entropy_thold = 2.40f;
@@ -165,8 +165,8 @@ bool whisper_params_parse(int argc, char ** argv, whisper_params & params) {
else if (arg == "-m" || arg == "--model") { params.model = argv[++i]; }
else if (arg == "-f" || arg == "--file") { params.fname_inp.emplace_back(argv[++i]); }
else if (arg == "-oved" || arg == "--ov-e-device") { params.openvino_encode_device = argv[++i]; }
else if (arg == "-ls" || arg == "--log-score") { params.log_score = true; }
else if (arg == "-ng" || arg == "--no-gpu") { params.use_gpu = false; }
else if (arg == "-ls" || arg == "--log-score") { params.log_score = true; }
else if (arg == "-ng" || arg == "--no-gpu") { params.use_gpu = false; }
else {
fprintf(stderr, "error: unknown argument: %s\n", arg.c_str());
whisper_print_usage(argc, argv, params);
@@ -925,9 +925,9 @@ int main(int argc, char ** argv) {
if (params.detect_language) {
params.language = "auto";
}
fprintf(stderr, "%s: processing '%s' (%d samples, %.1f sec), %d threads, %d processors, lang = %s, task = %s, %stimestamps = %d ...\n",
fprintf(stderr, "%s: processing '%s' (%d samples, %.1f sec), %d threads, %d processors, %d beams + best of %d, lang = %s, task = %s, %stimestamps = %d ...\n",
__func__, fname_inp.c_str(), int(pcmf32.size()), float(pcmf32.size())/WHISPER_SAMPLE_RATE,
params.n_threads, params.n_processors,
params.n_threads, params.n_processors, params.beam_size, params.best_of,
params.language.c_str(),
params.translate ? "translate" : "transcribe",
params.tinydiarize ? "tdrz = 1, " : "",


@@ -0,0 +1,12 @@
set(TARGET server)
add_executable(${TARGET} server.cpp httplib.h json.hpp)
include(DefaultTargetOptions)
target_link_libraries(${TARGET} PRIVATE common whisper ${CMAKE_THREAD_LIBS_INIT})
# Check if the compiler is MinGW
if(MINGW)
# Link the necessary libraries for SSL and Winsock
target_link_libraries(${TARGET} PRIVATE -lcrypt32 -lssl -lcrypto -lws2_32)
endif()

examples/server/README.md (new file, 68 lines)

@@ -0,0 +1,68 @@
# whisper.cpp http server
Simple HTTP server. WAV files are passed to the inference model via HTTP requests.
https://github.com/ggerganov/whisper.cpp/assets/1991296/e983ee53-8741-4eb5-9048-afe5e4594b8f
## Usage
```
./server -h
usage: ./bin/server [options]
options:
-h, --help [default] show this help message and exit
-t N, --threads N [4 ] number of threads to use during computation
-p N, --processors N [1 ] number of processors to use during computation
-ot N, --offset-t N [0 ] time offset in milliseconds
-on N, --offset-n N [0 ] segment index offset
-d N, --duration N [0 ] duration of audio to process in milliseconds
-mc N, --max-context N [-1 ] maximum number of text context tokens to store
-ml N, --max-len N [0 ] maximum segment length in characters
-sow, --split-on-word [false ] split on word rather than on token
-bo N, --best-of N [2 ] number of best candidates to keep
-bs N, --beam-size N [-1 ] beam size for beam search
-wt N, --word-thold N [0.01 ] word timestamp probability threshold
-et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
-lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
-debug, --debug-mode [false ] enable debug mode (eg. dump log_mel)
-tr, --translate [false ] translate from source language to english
-di, --diarize [false ] stereo audio diarization
-tdrz, --tinydiarize [false ] enable tinydiarize (requires a tdrz model)
-nf, --no-fallback [false ] do not use temperature fallback while decoding
-ps, --print-special [false ] print special tokens
-pc, --print-colors [false ] print colors
-pr, --print-realtime [false ] print output in realtime
-pp, --print-progress [false ] print progress
-nt, --no-timestamps [false ] do not print timestamps
-l LANG, --language LANG [en ] spoken language ('auto' for auto-detect)
-dl, --detect-language [false ] exit after automatically detecting language
--prompt PROMPT [ ] initial prompt
-m FNAME, --model FNAME [models/ggml-base.en.bin] model path
-oved D, --ov-e-device DNAME [CPU ] the OpenVINO device used for encode inference
--host HOST, [127.0.0.1] Hostname/ip-address for the server
--port PORT, [8080 ] Port number for the server
--convert, [false ] Convert audio to WAV, requires ffmpeg on the server
```
> [!WARNING]
> **Do not run the server example with administrative privileges, and operate it in a sandboxed environment: it accepts user file uploads and shells out to ffmpeg for format conversion. Always validate and sanitize inputs to guard against security threats.**
## request examples
**/inference**
```
curl 127.0.0.1:8080/inference \
-H "Content-Type: multipart/form-data" \
-F file="@<file-path>" \
-F temperature="0.2" \
-F response-format="json"
```
**/load**
```
curl 127.0.0.1:8080/load \
-H "Content-Type: multipart/form-data" \
-F model="<path-to-model-file>"
```
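For programmatic access, a hedged sketch of calling `/inference` from C++ with the bundled `httplib.h`; the multipart field names match the curl examples above, while the host, port, and file path are placeholders:

```cpp
#include "httplib.h"

#include <fstream>
#include <iostream>
#include <sstream>

int main() {
    // read the WAV file into memory
    std::ifstream f("samples/jfk.wav", std::ios::binary);
    std::stringstream ss;
    ss << f.rdbuf();

    httplib::Client cli("127.0.0.1", 8080);

    httplib::MultipartFormDataItems items = {
        // name              content   filename   content type
        { "file",            ss.str(), "jfk.wav", "audio/wav" },
        { "temperature",     "0.2",    "",        ""          },
        { "response-format", "json",   "",        ""          },
    };

    if (auto res = cli.Post("/inference", items)) {
        std::cout << res->body << std::endl; // {"text":"..."}
    }

    return 0;
}
```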

examples/server/httplib.h (new file, 9262 lines; diff suppressed because it is too large)

examples/server/json.hpp (new file, 24596 lines; diff suppressed because it is too large)

examples/server/server.cpp (new file, 811 lines)

@@ -0,0 +1,811 @@
#include "common.h"
#include "whisper.h"
#include "httplib.h"
#include "json.hpp"
#include <cmath>
#include <fstream>
#include <cstdio>
#include <string>
#include <thread>
#include <vector>
#include <cstring>
#include <sstream>
#if defined(_MSC_VER)
#pragma warning(disable: 4244 4267) // possible loss of data
#endif
using namespace httplib;
using json = nlohmann::json;
namespace {
// Terminal color map. 10 colors grouped in ranges [0.0, 0.1, ..., 0.9]
// Lowest is red, middle is yellow, highest is green.
const std::vector<std::string> k_colors = {
"\033[38;5;196m", "\033[38;5;202m", "\033[38;5;208m", "\033[38;5;214m", "\033[38;5;220m",
"\033[38;5;226m", "\033[38;5;190m", "\033[38;5;154m", "\033[38;5;118m", "\033[38;5;82m",
};
// output formats
const std::string json_format = "json";
const std::string text_format = "text";
const std::string srt_format = "srt";
const std::string vjson_format = "verbose_json";
const std::string vtt_format = "vtt";
struct server_params
{
std::string hostname = "127.0.0.1";
std::string public_path = "examples/server/public";
int32_t port = 8080;
int32_t read_timeout = 600;
int32_t write_timeout = 600;
bool ffmpeg_converter = false;
};
struct whisper_params {
int32_t n_threads = std::min(4, (int32_t) std::thread::hardware_concurrency());
int32_t n_processors = 1;
int32_t offset_t_ms = 0;
int32_t offset_n = 0;
int32_t duration_ms = 0;
int32_t progress_step = 5;
int32_t max_context = -1;
int32_t max_len = 0;
int32_t best_of = 2;
int32_t beam_size = -1;
float word_thold = 0.01f;
float entropy_thold = 2.40f;
float logprob_thold = -1.00f;
float userdef_temp = 0.20f;
bool speed_up = false;
bool debug_mode = false;
bool translate = false;
bool detect_language = false;
bool diarize = false;
bool tinydiarize = false;
bool split_on_word = false;
bool no_fallback = false;
bool print_special = false;
bool print_colors = false;
bool print_realtime = false;
bool print_progress = false;
bool no_timestamps = false;
bool use_gpu = true;
std::string language = "en";
std::string prompt = "";
std::string font_path = "/System/Library/Fonts/Supplemental/Courier New Bold.ttf";
std::string model = "models/ggml-base.en.bin";
std::string response_format = json_format;
// [TDRZ] speaker turn string
std::string tdrz_speaker_turn = " [SPEAKER_TURN]"; // TODO: set from command line
std::string openvino_encode_device = "CPU";
};
// 500 -> 00:05.000
// 6000 -> 01:00.000
std::string to_timestamp(int64_t t, bool comma = false) {
int64_t msec = t * 10;
int64_t hr = msec / (1000 * 60 * 60);
msec = msec - hr * (1000 * 60 * 60);
int64_t min = msec / (1000 * 60);
msec = msec - min * (1000 * 60);
int64_t sec = msec / 1000;
msec = msec - sec * 1000;
char buf[32];
snprintf(buf, sizeof(buf), "%02d:%02d:%02d%s%03d", (int) hr, (int) min, (int) sec, comma ? "," : ".", (int) msec);
return std::string(buf);
}
int timestamp_to_sample(int64_t t, int n_samples) {
return std::max(0, std::min((int) n_samples - 1, (int) ((t*WHISPER_SAMPLE_RATE)/100)));
}
bool is_file_exist(const char *fileName)
{
std::ifstream infile(fileName);
return infile.good();
}
void whisper_print_usage(int /*argc*/, char ** argv, const whisper_params & params,
const server_params& sparams) {
fprintf(stderr, "\n");
fprintf(stderr, "usage: %s [options] \n", argv[0]);
fprintf(stderr, "\n");
fprintf(stderr, "options:\n");
fprintf(stderr, " -h, --help [default] show this help message and exit\n");
fprintf(stderr, " -t N, --threads N [%-7d] number of threads to use during computation\n", params.n_threads);
fprintf(stderr, " -p N, --processors N [%-7d] number of processors to use during computation\n", params.n_processors);
fprintf(stderr, " -ot N, --offset-t N [%-7d] time offset in milliseconds\n", params.offset_t_ms);
fprintf(stderr, " -on N, --offset-n N [%-7d] segment index offset\n", params.offset_n);
fprintf(stderr, " -d N, --duration N [%-7d] duration of audio to process in milliseconds\n", params.duration_ms);
fprintf(stderr, " -mc N, --max-context N [%-7d] maximum number of text context tokens to store\n", params.max_context);
fprintf(stderr, " -ml N, --max-len N [%-7d] maximum segment length in characters\n", params.max_len);
fprintf(stderr, " -sow, --split-on-word [%-7s] split on word rather than on token\n", params.split_on_word ? "true" : "false");
fprintf(stderr, " -bo N, --best-of N [%-7d] number of best candidates to keep\n", params.best_of);
fprintf(stderr, " -bs N, --beam-size N [%-7d] beam size for beam search\n", params.beam_size);
fprintf(stderr, " -wt N, --word-thold N [%-7.2f] word timestamp probability threshold\n", params.word_thold);
fprintf(stderr, " -et N, --entropy-thold N [%-7.2f] entropy threshold for decoder fail\n", params.entropy_thold);
fprintf(stderr, " -lpt N, --logprob-thold N [%-7.2f] log probability threshold for decoder fail\n", params.logprob_thold);
// fprintf(stderr, " -su, --speed-up [%-7s] speed up audio by x2 (reduced accuracy)\n", params.speed_up ? "true" : "false");
fprintf(stderr, " -debug, --debug-mode [%-7s] enable debug mode (eg. dump log_mel)\n", params.debug_mode ? "true" : "false");
fprintf(stderr, " -tr, --translate [%-7s] translate from source language to english\n", params.translate ? "true" : "false");
fprintf(stderr, " -di, --diarize [%-7s] stereo audio diarization\n", params.diarize ? "true" : "false");
fprintf(stderr, " -tdrz, --tinydiarize [%-7s] enable tinydiarize (requires a tdrz model)\n", params.tinydiarize ? "true" : "false");
fprintf(stderr, " -nf, --no-fallback [%-7s] do not use temperature fallback while decoding\n", params.no_fallback ? "true" : "false");
fprintf(stderr, " -ps, --print-special [%-7s] print special tokens\n", params.print_special ? "true" : "false");
fprintf(stderr, " -pc, --print-colors [%-7s] print colors\n", params.print_colors ? "true" : "false");
fprintf(stderr, " -pr, --print-realtime [%-7s] print output in realtime\n", params.print_realtime ? "true" : "false");
fprintf(stderr, " -pp, --print-progress [%-7s] print progress\n", params.print_progress ? "true" : "false");
fprintf(stderr, " -nt, --no-timestamps [%-7s] do not print timestamps\n", params.no_timestamps ? "true" : "false");
fprintf(stderr, " -l LANG, --language LANG [%-7s] spoken language ('auto' for auto-detect)\n", params.language.c_str());
fprintf(stderr, " -dl, --detect-language [%-7s] exit after automatically detecting language\n", params.detect_language ? "true" : "false");
fprintf(stderr, " --prompt PROMPT [%-7s] initial prompt\n", params.prompt.c_str());
fprintf(stderr, " -m FNAME, --model FNAME [%-7s] model path\n", params.model.c_str());
fprintf(stderr, " -oved D, --ov-e-device DNAME [%-7s] the OpenVINO device used for encode inference\n", params.openvino_encode_device.c_str());
// server params
fprintf(stderr, " --host HOST, [%-7s] Hostname/ip-adress for the server\n", sparams.hostname.c_str());
fprintf(stderr, " --port PORT, [%-7d] Port number for the server\n", sparams.port);
fprintf(stderr, " --public PATH, [%-7s] Path to the public folder\n", sparams.public_path.c_str());
fprintf(stderr, " --convert, [%-7s] Convert audio to WAV, requires ffmpeg on the server", sparams.ffmpeg_converter ? "true" : "false");
fprintf(stderr, "\n");
}
bool whisper_params_parse(int argc, char ** argv, whisper_params & params, server_params & sparams) {
for (int i = 1; i < argc; i++) {
std::string arg = argv[i];
if (arg == "-h" || arg == "--help") {
whisper_print_usage(argc, argv, params, sparams);
exit(0);
}
else if (arg == "-t" || arg == "--threads") { params.n_threads = std::stoi(argv[++i]); }
else if (arg == "-p" || arg == "--processors") { params.n_processors = std::stoi(argv[++i]); }
else if (arg == "-ot" || arg == "--offset-t") { params.offset_t_ms = std::stoi(argv[++i]); }
else if (arg == "-on" || arg == "--offset-n") { params.offset_n = std::stoi(argv[++i]); }
else if (arg == "-d" || arg == "--duration") { params.duration_ms = std::stoi(argv[++i]); }
else if (arg == "-mc" || arg == "--max-context") { params.max_context = std::stoi(argv[++i]); }
else if (arg == "-ml" || arg == "--max-len") { params.max_len = std::stoi(argv[++i]); }
else if (arg == "-bo" || arg == "--best-of") { params.best_of = std::stoi(argv[++i]); }
else if (arg == "-bs" || arg == "--beam-size") { params.beam_size = std::stoi(argv[++i]); }
else if (arg == "-wt" || arg == "--word-thold") { params.word_thold = std::stof(argv[++i]); }
else if (arg == "-et" || arg == "--entropy-thold") { params.entropy_thold = std::stof(argv[++i]); }
else if (arg == "-lpt" || arg == "--logprob-thold") { params.logprob_thold = std::stof(argv[++i]); }
// else if (arg == "-su" || arg == "--speed-up") { params.speed_up = true; }
else if (arg == "-debug"|| arg == "--debug-mode") { params.debug_mode = true; }
else if (arg == "-tr" || arg == "--translate") { params.translate = true; }
else if (arg == "-di" || arg == "--diarize") { params.diarize = true; }
else if (arg == "-tdrz" || arg == "--tinydiarize") { params.tinydiarize = true; }
else if (arg == "-sow" || arg == "--split-on-word") { params.split_on_word = true; }
else if (arg == "-nf" || arg == "--no-fallback") { params.no_fallback = true; }
else if (arg == "-fp" || arg == "--font-path") { params.font_path = argv[++i]; }
else if (arg == "-ps" || arg == "--print-special") { params.print_special = true; }
else if (arg == "-pc" || arg == "--print-colors") { params.print_colors = true; }
else if (arg == "-pr" || arg == "--print-realtime") { params.print_realtime = true; }
else if (arg == "-pp" || arg == "--print-progress") { params.print_progress = true; }
else if (arg == "-nt" || arg == "--no-timestamps") { params.no_timestamps = true; }
else if (arg == "-l" || arg == "--language") { params.language = argv[++i]; }
else if (arg == "-dl" || arg == "--detect-language") { params.detect_language = true; }
else if ( arg == "--prompt") { params.prompt = argv[++i]; }
else if (arg == "-m" || arg == "--model") { params.model = argv[++i]; }
else if (arg == "-oved" || arg == "--ov-e-device") { params.openvino_encode_device = argv[++i]; }
else if (arg == "-ng" || arg == "--no-gpu") { params.use_gpu = false; }
// server params
else if ( arg == "--port") { sparams.port = std::stoi(argv[++i]); }
else if ( arg == "--host") { sparams.hostname = argv[++i]; }
else if ( arg == "--public") { sparams.public_path = argv[++i]; }
else if ( arg == "--convert") { sparams.ffmpeg_converter = true; }
else {
fprintf(stderr, "error: unknown argument: %s\n", arg.c_str());
whisper_print_usage(argc, argv, params, sparams);
exit(0);
}
}
return true;
}
struct whisper_print_user_data {
const whisper_params * params;
const std::vector<std::vector<float>> * pcmf32s;
int progress_prev;
};
void check_ffmpeg_availibility() {
int result = system("ffmpeg -version");
if (result == 0) {
std::cout << "ffmpeg is available." << std::endl;
} else {
// ffmpeg is not available
std::cout << "ffmpeg is not found. Please ensure that ffmpeg is installed ";
std::cout << "and that its executable is included in your system's PATH. ";
exit(0);
}
}
bool convert_to_wav(const std::string & temp_filename, std::string & error_resp) {
std::ostringstream cmd_stream;
std::string converted_filename_temp = temp_filename + "_temp.wav";
cmd_stream << "ffmpeg -i \"" << temp_filename << "\" -ar 16000 -ac 1 -c:a pcm_s16le \"" << converted_filename_temp << "\" 2>&1";
std::string cmd = cmd_stream.str();
int status = std::system(cmd.c_str());
if (status != 0) {
error_resp = "{\"error\":\"FFmpeg conversion failed.\"}";
return false;
}
// Remove the original file
if (remove(temp_filename.c_str()) != 0) {
error_resp = "{\"error\":\"Failed to remove the original file.\"}";
return false;
}
// Rename the temporary file to match the original filename
if (rename(converted_filename_temp.c_str(), temp_filename.c_str()) != 0) {
error_resp = "{\"error\":\"Failed to rename the temporary file.\"}";
return false;
}
return true;
}
std::string estimate_diarization_speaker(std::vector<std::vector<float>> pcmf32s, int64_t t0, int64_t t1, bool id_only = false) {
std::string speaker = "";
const int64_t n_samples = pcmf32s[0].size();
const int64_t is0 = timestamp_to_sample(t0, n_samples);
const int64_t is1 = timestamp_to_sample(t1, n_samples);
double energy0 = 0.0f;
double energy1 = 0.0f;
for (int64_t j = is0; j < is1; j++) {
energy0 += fabs(pcmf32s[0][j]);
energy1 += fabs(pcmf32s[1][j]);
}
if (energy0 > 1.1*energy1) {
speaker = "0";
} else if (energy1 > 1.1*energy0) {
speaker = "1";
} else {
speaker = "?";
}
//printf("is0 = %lld, is1 = %lld, energy0 = %f, energy1 = %f, speaker = %s\n", is0, is1, energy0, energy1, speaker.c_str());
if (!id_only) {
speaker.insert(0, "(speaker ");
speaker.append(")");
}
return speaker;
}
void whisper_print_progress_callback(struct whisper_context * /*ctx*/, struct whisper_state * /*state*/, int progress, void * user_data) {
int progress_step = ((whisper_print_user_data *) user_data)->params->progress_step;
int * progress_prev = &(((whisper_print_user_data *) user_data)->progress_prev);
if (progress >= *progress_prev + progress_step) {
*progress_prev += progress_step;
fprintf(stderr, "%s: progress = %3d%%\n", __func__, progress);
}
}
void whisper_print_segment_callback(struct whisper_context * ctx, struct whisper_state * /*state*/, int n_new, void * user_data) {
const auto & params = *((whisper_print_user_data *) user_data)->params;
const auto & pcmf32s = *((whisper_print_user_data *) user_data)->pcmf32s;
const int n_segments = whisper_full_n_segments(ctx);
std::string speaker = "";
int64_t t0 = 0;
int64_t t1 = 0;
// print the last n_new segments
const int s0 = n_segments - n_new;
if (s0 == 0) {
printf("\n");
}
for (int i = s0; i < n_segments; i++) {
if (!params.no_timestamps || params.diarize) {
t0 = whisper_full_get_segment_t0(ctx, i);
t1 = whisper_full_get_segment_t1(ctx, i);
}
if (!params.no_timestamps) {
printf("[%s --> %s] ", to_timestamp(t0).c_str(), to_timestamp(t1).c_str());
}
if (params.diarize && pcmf32s.size() == 2) {
speaker = estimate_diarization_speaker(pcmf32s, t0, t1);
}
if (params.print_colors) {
for (int j = 0; j < whisper_full_n_tokens(ctx, i); ++j) {
if (params.print_special == false) {
const whisper_token id = whisper_full_get_token_id(ctx, i, j);
if (id >= whisper_token_eot(ctx)) {
continue;
}
}
const char * text = whisper_full_get_token_text(ctx, i, j);
const float p = whisper_full_get_token_p (ctx, i, j);
const int col = std::max(0, std::min((int) k_colors.size() - 1, (int) (std::pow(p, 3)*float(k_colors.size()))));
printf("%s%s%s%s", speaker.c_str(), k_colors[col].c_str(), text, "\033[0m");
}
} else {
const char * text = whisper_full_get_segment_text(ctx, i);
printf("%s%s", speaker.c_str(), text);
}
if (params.tinydiarize) {
if (whisper_full_get_segment_speaker_turn_next(ctx, i)) {
printf("%s", params.tdrz_speaker_turn.c_str());
}
}
// with timestamps or speakers: each segment on new line
if (!params.no_timestamps || params.diarize) {
printf("\n");
}
fflush(stdout);
}
}
std::string output_str(struct whisper_context * ctx, const whisper_params & params, std::vector<std::vector<float>> pcmf32s) {
std::stringstream result;
const int n_segments = whisper_full_n_segments(ctx);
for (int i = 0; i < n_segments; ++i) {
const char * text = whisper_full_get_segment_text(ctx, i);
std::string speaker = "";
if (params.diarize && pcmf32s.size() == 2)
{
const int64_t t0 = whisper_full_get_segment_t0(ctx, i);
const int64_t t1 = whisper_full_get_segment_t1(ctx, i);
speaker = estimate_diarization_speaker(pcmf32s, t0, t1);
}
result << speaker << text << "\n";
}
return result.str();
}
void get_req_parameters(const Request & req, whisper_params & params)
{
// user model configuration
if (req.has_file("offset-t"))
{
params.offset_t_ms = std::stoi(req.get_file_value("offset-t").content);
}
if (req.has_file("offset-n"))
{
params.offset_n = std::stoi(req.get_file_value("offset-n").content);
}
if (req.has_file("duration"))
{
params.duration_ms = std::stoi(req.get_file_value("duration").content);
}
if (req.has_file("max-context"))
{
params.max_context = std::stoi(req.get_file_value("max-context").content);
}
if (req.has_file("prompt"))
{
params.prompt = req.get_file_value("prompt").content;
}
if (req.has_file("response-format"))
{
params.response_format = req.get_file_value("response-format").content;
}
if (req.has_file("temperature"))
{
params.userdef_temp = std::stof(req.get_file_value("temperature").content);
}
}
} // namespace
int main(int argc, char ** argv) {
whisper_params params;
server_params sparams;
std::mutex whisper_mutex;
if (whisper_params_parse(argc, argv, params, sparams) == false) {
whisper_print_usage(argc, argv, params, sparams);
return 1;
}
if (params.language != "auto" && whisper_lang_id(params.language.c_str()) == -1) {
fprintf(stderr, "error: unknown language '%s'\n", params.language.c_str());
whisper_print_usage(argc, argv, params, sparams);
exit(0);
}
if (params.diarize && params.tinydiarize) {
fprintf(stderr, "error: cannot use both --diarize and --tinydiarize\n");
whisper_print_usage(argc, argv, params, sparams);
exit(0);
}
if (sparams.ffmpeg_converter) {
check_ffmpeg_availibility();
}
// whisper init
struct whisper_context_params cparams;
cparams.use_gpu = params.use_gpu;
struct whisper_context * ctx = whisper_init_from_file_with_params(params.model.c_str(), cparams);
if (ctx == nullptr) {
fprintf(stderr, "error: failed to initialize whisper context\n");
return 3;
}
// initialize openvino encoder. this has no effect on whisper.cpp builds that don't have OpenVINO configured
whisper_ctx_init_openvino_encoder(ctx, nullptr, params.openvino_encode_device.c_str(), nullptr);
Server svr;
svr.set_default_headers({{"Server", "whisper.cpp"},
{"Access-Control-Allow-Origin", "*"},
{"Access-Control-Allow-Headers", "content-type"}});
std::string const default_content = "<html>hello</html>";
// this is only called if no index.html is found in the public path (--public)
svr.Get("/", [&default_content](const Request &, Response &res){
res.set_content(default_content, "text/html");
return false;
});
svr.Post("/inference", [&](const Request &req, Response &res){
// acquire whisper model mutex lock
whisper_mutex.lock();
// first check user requested fields of the request
if (!req.has_file("file"))
{
fprintf(stderr, "error: no 'file' field in the request\n");
const std::string error_resp = "{\"error\":\"no 'file' field in the request\"}";
res.set_content(error_resp, "application/json");
whisper_mutex.unlock();
return;
}
auto audio_file = req.get_file_value("file");
// check non-required fields
get_req_parameters(req, params);
std::string filename{audio_file.filename};
printf("Received request: %s\n", filename.c_str());
// audio arrays
std::vector<float> pcmf32; // mono-channel F32 PCM
std::vector<std::vector<float>> pcmf32s; // stereo-channel F32 PCM
// write to temporary file
const std::string temp_filename = "whisper_server_temp_file.wav";
std::ofstream temp_file{temp_filename, std::ios::binary};
temp_file << audio_file.content;
temp_file.close();
// if file is not wav, convert to wav
if (sparams.ffmpeg_converter) {
std::string error_resp = "{\"error\":\"Failed to execute ffmpeg command.\"}";
const bool is_converted = convert_to_wav(temp_filename, error_resp);
if (!is_converted) {
res.set_content(error_resp, "application/json");
whisper_mutex.unlock();
return;
}
}
// read wav content into pcmf32
if (!::read_wav(temp_filename, pcmf32, pcmf32s, params.diarize)) {
fprintf(stderr, "error: failed to read WAV file '%s'\n", temp_filename.c_str());
const std::string error_resp = "{\"error\":\"failed to read WAV file\"}";
res.set_content(error_resp, "application/json");
std::remove(temp_filename.c_str());
whisper_mutex.unlock();
return;
}
// remove temp file
std::remove(temp_filename.c_str());
printf("Successfully loaded %s\n", filename.c_str());
// print system information
{
fprintf(stderr, "\n");
fprintf(stderr, "system_info: n_threads = %d / %d | %s\n",
params.n_threads*params.n_processors, std::thread::hardware_concurrency(), whisper_print_system_info());
}
// print some info about the processing
{
fprintf(stderr, "\n");
if (!whisper_is_multilingual(ctx)) {
if (params.language != "en" || params.translate) {
params.language = "en";
params.translate = false;
fprintf(stderr, "%s: WARNING: model is not multilingual, ignoring language and translation options\n", __func__);
}
}
if (params.detect_language) {
params.language = "auto";
}
fprintf(stderr, "%s: processing '%s' (%d samples, %.1f sec), %d threads, %d processors, lang = %s, task = %s, %stimestamps = %d ...\n",
__func__, filename.c_str(), int(pcmf32.size()), float(pcmf32.size())/WHISPER_SAMPLE_RATE,
params.n_threads, params.n_processors,
params.language.c_str(),
params.translate ? "translate" : "transcribe",
params.tinydiarize ? "tdrz = 1, " : "",
params.no_timestamps ? 0 : 1);
fprintf(stderr, "\n");
}
// run the inference
{
printf("Running whisper.cpp inference on %s\n", filename.c_str());
whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
wparams.strategy = params.beam_size > 1 ? WHISPER_SAMPLING_BEAM_SEARCH : WHISPER_SAMPLING_GREEDY;
wparams.print_realtime = false;
wparams.print_progress = params.print_progress;
wparams.print_timestamps = !params.no_timestamps;
wparams.print_special = params.print_special;
wparams.translate = params.translate;
wparams.language = params.language.c_str();
wparams.detect_language = params.detect_language;
wparams.n_threads = params.n_threads;
wparams.n_max_text_ctx = params.max_context >= 0 ? params.max_context : wparams.n_max_text_ctx;
wparams.offset_ms = params.offset_t_ms;
wparams.duration_ms = params.duration_ms;
wparams.thold_pt = params.word_thold;
wparams.max_len = params.max_len == 0 ? 60 : params.max_len;
wparams.split_on_word = params.split_on_word;
wparams.speed_up = params.speed_up;
wparams.debug_mode = params.debug_mode;
wparams.tdrz_enable = params.tinydiarize; // [TDRZ]
wparams.initial_prompt = params.prompt.c_str();
wparams.greedy.best_of = params.best_of;
wparams.beam_search.beam_size = params.beam_size;
wparams.temperature_inc = params.userdef_temp;
wparams.entropy_thold = params.entropy_thold;
wparams.logprob_thold = params.logprob_thold;
whisper_print_user_data user_data = { &params, &pcmf32s, 0 };
// this callback is called on each new segment
if (params.print_realtime) {
wparams.new_segment_callback = whisper_print_segment_callback;
wparams.new_segment_callback_user_data = &user_data;
}
if (wparams.print_progress) {
wparams.progress_callback = whisper_print_progress_callback;
wparams.progress_callback_user_data = &user_data;
}
// examples for abort mechanism
// in examples below, we do not abort the processing, but we could if the flag is set to true
// the callback is called before every encoder run - if it returns false, the processing is aborted
{
static bool is_aborted = false; // NOTE: this should be atomic to avoid data race
wparams.encoder_begin_callback = [](struct whisper_context * /*ctx*/, struct whisper_state * /*state*/, void * user_data) {
bool is_aborted = *(bool*)user_data;
return !is_aborted;
};
wparams.encoder_begin_callback_user_data = &is_aborted;
}
// the callback is called before every computation - if it returns true, the computation is aborted
{
static bool is_aborted = false; // NOTE: this should be atomic to avoid data race
wparams.abort_callback = [](void * user_data) {
bool is_aborted = *(bool*)user_data;
return is_aborted;
};
wparams.abort_callback_user_data = &is_aborted;
}
if (whisper_full_parallel(ctx, wparams, pcmf32.data(), pcmf32.size(), params.n_processors) != 0) {
fprintf(stderr, "%s: failed to process audio\n", argv[0]);
const std::string error_resp = "{\"error\":\"failed to process audio\"}";
res.set_content(error_resp, "application/json");
whisper_mutex.unlock();
return;
}
}
// return results to user
if (params.response_format == text_format)
{
std::string results = output_str(ctx, params, pcmf32s);
res.set_content(results.c_str(), "text/html");
}
else if (params.response_format == srt_format)
{
std::stringstream ss;
const int n_segments = whisper_full_n_segments(ctx);
for (int i = 0; i < n_segments; ++i) {
const char * text = whisper_full_get_segment_text(ctx, i);
const int64_t t0 = whisper_full_get_segment_t0(ctx, i);
const int64_t t1 = whisper_full_get_segment_t1(ctx, i);
std::string speaker = "";
if (params.diarize && pcmf32s.size() == 2)
{
speaker = estimate_diarization_speaker(pcmf32s, t0, t1);
}
ss << i + 1 + params.offset_n << "\n";
ss << to_timestamp(t0, true) << " --> " << to_timestamp(t1, true) << "\n";
ss << speaker << text << "\n\n";
}
res.set_content(ss.str(), "application/x-subrip");
} else if (params.response_format == vtt_format) {
std::stringstream ss;
ss << "WEBVTT\n\n";
const int n_segments = whisper_full_n_segments(ctx);
for (int i = 0; i < n_segments; ++i) {
const char * text = whisper_full_get_segment_text(ctx, i);
const int64_t t0 = whisper_full_get_segment_t0(ctx, i);
const int64_t t1 = whisper_full_get_segment_t1(ctx, i);
std::string speaker = "";
if (params.diarize && pcmf32s.size() == 2)
{
speaker = estimate_diarization_speaker(pcmf32s, t0, t1, true);
speaker.insert(0, "<v Speaker");
speaker.append(">");
}
ss << to_timestamp(t0) << " --> " << to_timestamp(t1) << "\n";
ss << speaker << text << "\n\n";
}
res.set_content(ss.str(), "text/vtt");
}
// TODO add more output formats
else
{
std::string results = output_str(ctx, params, pcmf32s);
json jres = json{
{"text", results}
};
res.set_content(jres.dump(-1, ' ', false, json::error_handler_t::replace),
"application/json");
}
// return whisper model mutex lock
whisper_mutex.unlock();
});
svr.Post("/load", [&](const Request &req, Response &res){
whisper_mutex.lock();
if (!req.has_file("model"))
{
fprintf(stderr, "error: no 'model' field in the request\n");
const std::string error_resp = "{\"error\":\"no 'model' field in the request\"}";
res.set_content(error_resp, "application/json");
whisper_mutex.unlock();
return;
}
std::string model = req.get_file_value("model").content;
if (!is_file_exist(model.c_str()))
{
fprintf(stderr, "error: 'model': %s not found!\n", model.c_str());
const std::string error_resp = "{\"error\":\"model not found!\"}";
res.set_content(error_resp, "application/json");
whisper_mutex.unlock();
return;
}
// clean up
whisper_free(ctx);
// whisper init
ctx = whisper_init_from_file_with_params(model.c_str(), cparams);
// TODO perhaps load prior model here instead of exit
if (ctx == nullptr) {
fprintf(stderr, "error: model init failed, no model loaded must exit\n");
exit(1);
}
// initialize openvino encoder. this has no effect on whisper.cpp builds that don't have OpenVINO configured
whisper_ctx_init_openvino_encoder(ctx, nullptr, params.openvino_encode_device.c_str(), nullptr);
const std::string success = "Load was successful!";
res.set_content(success, "application/text");
// check if the model is in the file system
whisper_mutex.unlock();
});
svr.set_exception_handler([](const Request &, Response &res, std::exception_ptr ep) {
const char fmt[] = "500 Internal Server Error\n%s";
char buf[BUFSIZ];
try {
std::rethrow_exception(std::move(ep));
} catch (std::exception &e) {
snprintf(buf, sizeof(buf), fmt, e.what());
} catch (...) {
snprintf(buf, sizeof(buf), fmt, "Unknown Exception");
}
res.set_content(buf, "text/plain");
res.status = 500;
});
svr.set_error_handler([](const Request &, Response &res) {
if (res.status == 400) {
res.set_content("Invalid request", "text/plain");
} else if (res.status != 500) {
res.set_content("File Not Found", "text/plain");
res.status = 404;
}
});
// set timeouts and change hostname and port
svr.set_read_timeout(sparams.read_timeout);
svr.set_write_timeout(sparams.write_timeout);
if (!svr.bind_to_port(sparams.hostname, sparams.port))
{
fprintf(stderr, "\ncouldn't bind to server socket: hostname=%s port=%d\n\n",
sparams.hostname.c_str(), sparams.port);
return 1;
}
// Set the base directory for serving static files
svr.set_base_dir(sparams.public_path);
// to make it ctrl+clickable:
printf("\nwhisper server listening at http://%s:%d\n\n", sparams.hostname.c_str(), sparams.port);
if (!svr.listen_after_bind())
{
return 1;
}
whisper_print_timings(ctx);
whisper_free(ctx);
return 0;
}


@@ -18,6 +18,11 @@ if (WHISPER_SDL2)
../../ggml-quants.c
../../whisper.cpp)
if(WIN32)
# It requires Windows 8 or later for PrefetchVirtualMemory
target_compile_definitions(${TARGET} PRIVATE -D_WIN32_WINNT=0x0602)
endif()
target_include_directories(${TARGET} PRIVATE ${SDL2_INCLUDE_DIRS} ../../)
target_link_libraries(${TARGET} PRIVATE ${SDL2_LIBRARIES} ${CMAKE_THREAD_LIBS_INIT})


@@ -53,6 +53,7 @@ struct whisper_params {
int32_t capture_id = -1;
int32_t max_tokens = 32;
int32_t audio_ctx = 0;
int32_t n_gpu_layers = 999;
float vad_thold = 0.6f;
float freq_thold = 100.0f;
@@ -90,6 +91,7 @@ bool whisper_params_parse(int argc, char ** argv, whisper_params & params) {
else if (arg == "-c" || arg == "--capture") { params.capture_id = std::stoi(argv[++i]); }
else if (arg == "-mt" || arg == "--max-tokens") { params.max_tokens = std::stoi(argv[++i]); }
else if (arg == "-ac" || arg == "--audio-ctx") { params.audio_ctx = std::stoi(argv[++i]); }
else if (arg == "-ngl" || arg == "--n-gpu-layers") { params.n_gpu_layers = std::stoi(argv[++i]); }
else if (arg == "-vth" || arg == "--vad-thold") { params.vad_thold = std::stof(argv[++i]); }
else if (arg == "-fth" || arg == "--freq-thold") { params.freq_thold = std::stof(argv[++i]); }
else if (arg == "-su" || arg == "--speed-up") { params.speed_up = true; }
@@ -134,6 +136,7 @@ void whisper_print_usage(int /*argc*/, char ** argv, const whisper_params & para
fprintf(stderr, " -c ID, --capture ID [%-7d] capture device ID\n", params.capture_id);
fprintf(stderr, " -mt N, --max-tokens N [%-7d] maximum number of tokens per audio chunk\n", params.max_tokens);
fprintf(stderr, " -ac N, --audio-ctx N [%-7d] audio context size (0 - all)\n", params.audio_ctx);
fprintf(stderr, " -ngl N, --n-gpu-layers N [%-7d] number of layers to store in VRAM\n", params.n_gpu_layers);
fprintf(stderr, " -vth N, --vad-thold N [%-7.2f] voice activity detection threshold\n", params.vad_thold);
fprintf(stderr, " -fth N, --freq-thold N [%-7.2f] high-pass frequency cutoff\n", params.freq_thold);
fprintf(stderr, " -su, --speed-up [%-7s] speed up audio by x2 (reduced accuracy)\n", params.speed_up ? "true" : "false");
@@ -248,7 +251,7 @@ int main(int argc, char ** argv) {
return 1;
}
if (whisper_lang_id(params.language.c_str()) == -1) {
if (params.language != "auto" && whisper_lang_id(params.language.c_str()) == -1) {
fprintf(stderr, "error: unknown language '%s'\n", params.language.c_str());
whisper_print_usage(argc, argv, params);
exit(0);
@@ -268,6 +271,8 @@ int main(int argc, char ** argv) {
auto lmparams = llama_model_default_params();
if (!params.use_gpu) {
lmparams.n_gpu_layers = 0;
} else {
lmparams.n_gpu_layers = params.n_gpu_layers;
}
struct llama_model * model_llama = llama_load_model_from_file(params.model_llama.c_str(), lmparams);
@@ -681,8 +686,8 @@
}
}
text_to_speak = ::replace(text_to_speak, "\"", "");
int ret = system((params.speak + " " + std::to_string(voice_id) + " \"" + text_to_speak + "\"").c_str());
text_to_speak = ::replace(text_to_speak, "'", "'\"'\"'");
int ret = system((params.speak + " " + std::to_string(voice_id) + " '" + text_to_speak + "'").c_str());
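// the replacement wraps the text in single quotes and escapes any embedded
// single quote as '"'"', so the shell does not interpret special characters
// in the transcribed text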
if (ret != 0) {
fprintf(stderr, "%s: failed to speak\n", __func__);
}


@@ -121,13 +121,13 @@ bool gpt2_model_load(const std::string & fname, gpt2_model & model, gpt_vocab &
return false;
}
std::string word;
char word[129];
for (int i = 0; i < n_vocab; i++) {
uint32_t len;
fin.read((char *) &len, sizeof(len));
word.resize(len);
fin.read((char *) word.data(), len);
word[len] = '\0';
fin.read((char *) word, len);
vocab.token_to_id[word] = i;
vocab.id_to_token[i] = word;
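// NOTE: the fixed buffer assumes token strings fit in 128 bytes; a len above
// 128 would overflow word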


@@ -21,7 +21,7 @@ help()
echo "Usage: ./twitch.sh -s [step] -m [model] -t [threads] [url]"
echo "options:"
echo "-s Step in seconds (default is $step)."
echo "-m Choose model, options are: 'tiny.en' 'tiny' 'base.en' 'base' 'small.en' 'small' 'medium.en' 'medium' 'large-v1' 'large-v2' 'large' (default is '$model')."
echo "-m Choose model, options are: 'tiny.en' 'tiny' 'base.en' 'base' 'small.en' 'small' 'medium.en' 'medium' 'large-v1' 'large-v2' 'large-v3' (default is '$model')."
echo "-t Number of threads to use."
echo "-h Print this help page."
echo


@@ -0,0 +1,9 @@
set(CMAKE_CXX_STANDARD 11)
add_subdirectory(libwchess)
if (EMSCRIPTEN)
add_subdirectory(wchess.wasm)
else()
add_subdirectory(wchess.cmd)
endif()

examples/wchess/README.md

@@ -0,0 +1,40 @@
# wchess
Voice-controlled chess using Whisper
Online demo: https://whisper.ggerganov.com/wchess/
https://github.com/ggerganov/whisper.cpp/assets/1991296/c2b2f03c-9684-49f3-8106-357d2d4e67fa
## Command-line tool
```bash
mkdir build && cd build
cmake -DWHISPER_SDL2=1 ..
make -j
./bin/wchess -m ../models/ggml-base.en.bin
Move: start
a b c d e f g h
r n b q k b n r 8
p p p p p p p p 7
. * . * . * . * 6
* . * . * . * . 5
. * . * . * . * 4
* . * . * . * . 3
P P P P P P P P 2
R N B Q K B N R 1
White's turn
[(l)isten/(p)ause/(q)uit]:
```
## TODO
- Improve web-browser audio capture - sometimes it does not record the voice properly
- Add support for more languages by making the generated grammar string multi-lingual
- Fix bugs in the chess moves logic
PRs welcome!


@@ -0,0 +1,19 @@
add_library(wchess-core STATIC
WChess.cpp
WChess.h
Chessboard.cpp
Chessboard.h
)
target_link_libraries(wchess-core
PUBLIC
whisper
common
)
target_include_directories(wchess-core
PUBLIC
"$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}>"
)
# add_executable(test-chessboard test-chessboard.cpp Chessboard.cpp)


@@ -0,0 +1,803 @@
#include "Chessboard.h"
#include <array>
#include <vector>
#include <algorithm>
#include <cstring>
#include <set>
#include <list>
#include <chrono>
namespace {
constexpr std::array<const char*, 64> positions = {
"a1", "b1", "c1", "d1", "e1", "f1", "g1", "h1",
"a2", "b2", "c2", "d2", "e2", "f2", "g2", "h2",
"a3", "b3", "c3", "d3", "e3", "f3", "g3", "h3",
"a4", "b4", "c4", "d4", "e4", "f4", "g4", "h4",
"a5", "b5", "c5", "d5", "e5", "f5", "g5", "h5",
"a6", "b6", "c6", "d6", "e6", "f6", "g6", "h6",
"a7", "b7", "c7", "d7", "e7", "f7", "g7", "h7",
"a8", "b8", "c8", "d8", "e8", "f8", "g8", "h8",
};
constexpr char INVALID_POS = positions.size();
constexpr int R = 0; // index of the letter in a coordinate like "a1" (the file, a-h)
constexpr int F = 1; // index of the digit (the rank, 1-8)
#define FILE (c[F] - '1')
#define RANK (c[R] - 'a')
constexpr char operator ""_P(const char * c, size_t size) {
return size < 2 || RANK < 0 || RANK > 7 ||
FILE < 0 || FILE > 7 ? INVALID_POS : FILE * 8 + RANK;
}
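// e.g. "e4"_P == 28, the index of "e4" in the positions array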
#undef FILE
#undef RANK
struct sview {
const char * ptr = nullptr;
size_t size = 0;
sview() = default;
sview(const char * p, size_t s) : ptr(p), size(s) {}
sview(const std::string& s) : ptr(s.data()), size(s.size()) {}
size_t find(char del, size_t pos) {
while (pos < size && ptr[pos] != del) ++pos;
return pos < size ? pos : std::string::npos;
}
};
std::vector<sview> split(sview str, char del) {
std::vector<sview> res;
size_t cur = 0;
size_t last = 0;
while (cur != std::string::npos) {
if (str.ptr[last] == ' ') {
++last;
continue;
}
cur = str.find(del, last);
size_t len = cur == std::string::npos ? str.size - last : cur - last;
res.emplace_back(str.ptr + last, len);
last = cur + 1;
}
return res;
}
char strToPos(sview str) {
return operator ""_P(str.ptr, str.size);
}
constexpr std::array<const char*, 6> pieceNames = {
"pawn", "knight", "bishop", "rook", "queen", "king",
};
static constexpr std::array<char, 6> blackShort = {
'p', 'n', 'b', 'r', 'q', 'k',
};
static constexpr std::array<char, 6> whiteShort = {
'P', 'N', 'B', 'R', 'Q', 'K',
};
char strToType(sview str) {
auto it = std::find_if(pieceNames.begin(), pieceNames.end(), [str] (const char* name) { return strncmp(name, str.ptr, str.size) == 0; });
return it != pieceNames.end() ? it - pieceNames.begin() : pieceNames.size();
}
// directions
using Direction = std::array<char, 2>;
constexpr Direction N = {(char) 0, (char) 1};
constexpr Direction NNE = {(char) 1, (char) 2};
constexpr Direction NE = {(char) 1, (char) 1};
constexpr Direction ENE = {(char) 2, (char) 1};
constexpr Direction E = {(char) 1, (char) 0};
constexpr Direction ESE = {(char) 2, (char) -1};
constexpr Direction SE = {(char) 1, (char) -1};
constexpr Direction SSE = {(char) 1, (char) -2};
constexpr Direction S = {(char) 0, (char) -1};
constexpr Direction SSW = {(char) -1, (char) -2};
constexpr Direction SW = {(char) -1, (char) -1};
constexpr Direction WSW = {(char) -2, (char) -1};
constexpr Direction W = {(char) -1, (char) 0};
constexpr Direction WNW = {(char) -2, (char) 1};
constexpr Direction NW = {(char) -1, (char) 1};
constexpr Direction NNW = {(char) -1, (char) 2};
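// single step from pos along d; returns INVALID_POS if the step leaves the board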
char makeStep(char pos, const Direction& d) {
char next[2] = { char(positions[pos][R] + d[R]) , char(positions[pos][F] + d[F]) };
return strToPos(sview{next, sizeof(next)});
}
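// walk up to count steps from pos along d, stopping early when a step leaves
// the board or the modifier m(pos) returns true; returns the last position reached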
template<class Modifier>
char traverse(char pos, const Direction& d, const Modifier& m, int count = 8) {
while (--count >= 0) {
pos = makeStep(pos, d);
if (pos == INVALID_POS || m(pos)) break;
}
return pos;
}
Direction normalize(const Direction& distance) {
//return {char((distance[R] > 0) - (distance[R] < 0)), char((distance[F] > 0) - (distance[F] < 0))};
const int drp = distance[R] > 0 ? 1 : 0;
const int drn = distance[R] < 0 ? 1 : 0;
const int dfp = distance[F] > 0 ? 1 : 0;
const int dfn = distance[F] < 0 ? 1 : 0;
return {char(drp - drn), char(dfp - dfn)};
}
struct Pin {
Direction d;
Piece* pinner;
Piece* pinned;
};
using Pins = std::list<Pin>;
using Board = std::array<Piece*, 64>;
std::vector<Direction> filter(const Direction& pin, std::initializer_list<Direction> directions) {
if (pin[R] == 0 && pin[F] == 0) return directions;
std::vector<Direction> result;
for (auto& d : directions) {
if ((d[R] == pin[R] || d[R] == -pin[R]) && (d[F] == pin[F] || d[F] == -pin[F])) result.push_back(d);
}
return result;
}
}
class Piece {
public:
enum Types : char {
Pawn,
Knight,
Bishop,
Rook,
Queen,
King,
//
NUM_PIECES
};
enum Colors : char {
White,
Black,
};
const char* name() const;
char initial() const;
Types type() const { return m_type; }
Colors color() const { return m_color; }
char pos() const { return m_pos; }
void setPos(char pos) {
m_pos = pos;
invalidate();
}
const char* coord() const;
const std::set<char>& allowed() const { return m_allowed; }
bool canReach(char pos) const;
virtual bool movePattern(char pos) const = 0;
void take();
virtual void reinit(const State& state) = 0;
void invalidate();
protected:
Piece(Types type, Colors color, char pos, std::set<char> allowed)
: m_type(type), m_color(color), m_pos(pos), m_allowed(std::move(allowed)) {}
Piece(const Piece&) = delete;
~Piece() = default;
const Types m_type;
const Colors m_color;
char m_pos;
std::set<char> m_allowed;
bool m_update = false;
};
struct Pawn : public Piece {
Pawn(Colors color, char pos, std::set<char> next) : Piece(Types::Pawn, color, pos, std::move(next)) {}
bool is_first_move() const {
return m_color ? coord()[F] == '7' : coord()[F] == '2';
}
virtual bool movePattern(char pos) const override {
if (m_pos == INVALID_POS) return false;
auto cur = coord();
auto next = positions[pos];
Direction distance = {char(next[R] - cur[R]), char(next[F] - cur[F])};
char forward = m_color ? -1 : 1;
return (forward == distance[F] && distance[R] * distance[R] <= 1)
|| (is_first_move() && 2 * forward == distance[F] && distance[R] == 0);
}
virtual void reinit(const State& state) override;
};
struct Knight : public Piece {
Knight(Colors color, char pos, std::set<char> next) : Piece(Types::Knight, color, pos, std::move(next)) {}
virtual bool movePattern(char pos) const override {
if (m_pos == INVALID_POS) return false;
auto cur = coord();
auto next = positions[pos];
Direction diff = {char(next[R] - cur[R]), char(next[F] - cur[F])};
return diff[R]*diff[R] + diff[F]*diff[F] == 5;
}
virtual void reinit(const State& state) override;
};
struct Bishop : public Piece {
Bishop(Colors color, char pos) : Piece(Types::Bishop, color, pos, {}) {}
virtual bool movePattern(char pos) const override {
if (m_pos == INVALID_POS) return false;
auto cur = coord();
auto next = positions[pos];
return cur[R] - cur[F] == next[R] - next[F] || cur[R] + cur[F] == next[R] + next[F];
}
virtual void reinit(const State& state) override;
};
struct Rook : public Piece {
Rook(Colors color, char pos) : Piece(Types::Rook, color, pos, {}) {}
virtual bool movePattern(char pos) const override {
if (m_pos == INVALID_POS) return false;
auto cur = coord();
auto next = positions[pos];
return cur[R] == next[R] || cur[F] == next[F];
}
virtual void reinit(const State& state) override;
};
struct Queen : public Piece {
Queen(Colors color, char pos) : Piece(Types::Queen, color, pos, {}) {}
virtual bool movePattern(char pos) const override {
if (m_pos == INVALID_POS) return false;
auto cur = coord();
auto next = positions[pos];
return cur[R] == next[R] || cur[F] == next[F] || cur[R] - cur[F] == next[R] - next[F] || cur[R] + cur[F] == next[R] + next[F];
}
virtual void reinit(const State& state) override;
};
struct King : public Piece {
King(Colors color, char pos) : Piece(Types::King, color, pos, {}) {}
virtual bool movePattern(char pos) const override {
if (m_pos == INVALID_POS) return false;
auto cur = coord();
auto next = positions[pos];
Direction diff = {char(next[R] - cur[R]), char(next[F] - cur[F])};
return diff[R]*diff[R] + diff[F]*diff[F] <= 2;
}
virtual void reinit(const State& state) override;
};
struct PieceSet {
Piece* begin() { return &p1; }
Piece* end() { return &r2 + 1; }
const Piece* begin() const { return &p1; }
const Piece* end() const { return &r2 + 1; }
Piece& operator[](int i) { return *(begin() + i); }
const Piece& operator[](int i) const { return *(begin() + i); }
Pawn p1;
Pawn p2;
Pawn p3;
Pawn p4;
Pawn p5;
Pawn p6;
Pawn p7;
Pawn p8;
Rook r1;
Knight n1;
Bishop b1;
Queen q;
King k;
Bishop b2;
Knight n2;
Rook r2;
};
struct State {
State();
PieceSet blacks;
PieceSet whites;
Board board;
Pins blackPins;
Pins whitePins;
};
Direction findPin(const Piece& piece, const State& state) {
auto& pins = piece.color() ? state.blackPins : state.whitePins;
auto it = std::find_if(pins.begin(), pins.end(), [&] (const Pin& pin) { return pin.pinned == &piece; });
if (it != pins.end()) return it->d;
return {0, 0};
}
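// traversal modifier: stop at the first occupied square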
struct Find {
Find(const Board& board) : m_board(board) {}
bool operator() (char pos) const { return m_board[pos]; }
const Board& m_board;
};
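// traversal modifier: collect squares a piece of the given color can move to,
// stopping at the first occupied square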
struct Add {
Add(const Board& board, std::set<char>& moves, Piece::Colors color) : m_board(board), m_moves(moves), m_color(color) {}
bool operator() (char pos) const {
if (!m_board[pos] || m_board[pos]->color() != m_color) m_moves.insert(pos);
return m_board[pos];
}
const Board& m_board;
std::set<char>& m_moves;
Piece::Colors m_color;
};
void Pawn::reinit(const State& state) {
if (m_pos == INVALID_POS) return;
if (!m_update) return;
m_update = false;
m_allowed.clear();
auto pin = findPin(*this, state);
auto & left = m_color ? SW : NW;
auto & right = m_color ? SE : NE;
for (auto& direction : filter(pin, { left, right })) {
auto pos = makeStep(m_pos, direction);
if (pos != INVALID_POS && state.board[pos] && state.board[pos]->color() != m_color) m_allowed.insert(pos);
}
auto & forward = m_color ? S : N;
if (!filter(pin, {forward}).empty()) {
traverse(m_pos, forward, [&] (char pos) {
if (!state.board[pos]) m_allowed.insert(pos);
return state.board[pos] || !is_first_move();
}, 2);
}
}
void Knight::reinit(const State& state) {
if (m_pos == INVALID_POS) return;
if (!m_update) return;
m_update = false;
m_allowed.clear();
auto pin = findPin(*this, state);
if (pin[R] != 0 || pin[F] != 0) return;
for (auto& direction : { NNE, ENE, ESE, SSE, SSW, WSW, WNW, NNW }) {
auto pos = makeStep(m_pos, direction);
if (pos != INVALID_POS && (!state.board[pos] || state.board[pos]->color() != m_color)) m_allowed.insert(pos);
}
}
void Bishop::reinit(const State& state) {
if (m_pos == INVALID_POS) return;
if (!m_update) return;
m_update = false;
m_allowed.clear();
auto pin = findPin(*this, state);
for (auto& direction : filter(pin, { NE, SE, SW, NW })) {
traverse(m_pos, direction, Add(state.board, m_allowed, m_color));
}
}
void Rook::reinit(const State& state) {
if (m_pos == INVALID_POS) return;
if (!m_update) return;
m_update = false;
m_allowed.clear();
auto pin = findPin(*this, state);
for (auto& direction : filter(pin, { N, E, S, W })) {
traverse(m_pos, direction, Add(state.board, m_allowed, m_color));
}
}
void Queen::reinit(const State& state) {
if (m_pos == INVALID_POS) return;
if (!m_update) return;
m_update = false;
m_allowed.clear();
auto pin = findPin(*this, state);
for (auto& direction : filter(pin, { N, NE, E, SE, S, SW, W, NW })) {
traverse(m_pos, direction, Add(state.board, m_allowed, m_color));
}
}
void King::reinit(const State& state) {
if (m_pos == INVALID_POS) return;
if (!m_update) return;
m_update = false;
m_allowed.clear();
auto& enemyPieces = m_color ? state.whites : state.blacks;
auto& pawnAttackLeft = m_color ? SW : NW;
auto& pawnAttackRight = m_color ? SE : NE;
for (auto& direction : { N, NE, E, SE, S, SW, W, NW }) {
auto pos = makeStep(m_pos, direction);
bool accept = pos != INVALID_POS && !(state.board[pos] && state.board[pos]->color() == m_color);
if (accept) {
for (auto& p : enemyPieces) {
if (!p.movePattern(pos)) continue;
if (p.type() == Piece::Knight || p.type() == Piece::King) {
accept = false;
break;
}
else if (p.type() == Piece::Pawn) {
auto from = positions[pos];
auto to = p.coord();
Direction d {char(to[R] - from[R]), char(to[F] - from[F])};
if (d == pawnAttackLeft || d == pawnAttackRight) {
accept = false;
break;
}
}
else {
auto from = positions[pos];
auto to = p.coord();
Direction d = normalize({char(to[R] - from[R]), char(to[F] - from[F])});
auto reached = traverse(pos, d, Find(state.board));
if (p.pos() == reached) {
accept = false;
break;
}
}
}
}
if (accept) m_allowed.insert(pos);
}
}
const char* Piece::name() const {
static_assert(pieceNames.size() == Piece::NUM_PIECES, "Mismatch between piece names and types");
return pieceNames[m_type];
}
char Piece::initial() const {
static_assert(blackShort.size() == Piece::NUM_PIECES, "Mismatch between piece names and types");
static_assert(whiteShort.size() == Piece::NUM_PIECES, "Mismatch between piece names and types");
return m_color ? blackShort[m_type] : whiteShort[m_type];
}
void Piece::invalidate() {
m_update = true;
}
const char* Piece::coord() const {
if (m_pos == INVALID_POS) return "";
return positions[m_pos];
}
bool Piece::canReach(char pos) const {
return movePattern(pos) && m_allowed.count(pos);
}
void Piece::take() {
m_pos = INVALID_POS;
m_allowed = {};
}
State::State()
: blacks {
{Piece::Black, "a7"_P, {"a5"_P, "a6"_P} },
{Piece::Black, "b7"_P, {"b5"_P, "b6"_P} },
{Piece::Black, "c7"_P, {"c5"_P, "c6"_P} },
{Piece::Black, "d7"_P, {"d5"_P, "d6"_P} },
{Piece::Black, "e7"_P, {"e5"_P, "e6"_P} },
{Piece::Black, "f7"_P, {"f5"_P, "f6"_P} },
{Piece::Black, "g7"_P, {"g5"_P, "g6"_P} },
{Piece::Black, "h7"_P, {"h5"_P, "h6"_P} },
{Piece::Black, "a8"_P},
{Piece::Black, "b8"_P, {"a6"_P, "c6"_P} },
{Piece::Black, "c8"_P},
{Piece::Black, "d8"_P},
{Piece::Black, "e8"_P},
{Piece::Black, "f8"_P},
{Piece::Black, "g8"_P, {"f6"_P, "h6"_P} },
{Piece::Black, "h8"_P},
}
, whites {
{Piece::White, "a2"_P, {"a3"_P, "a4"_P} },
{Piece::White, "b2"_P, {"b3"_P, "b4"_P} },
{Piece::White, "c2"_P, {"c3"_P, "c4"_P} },
{Piece::White, "d2"_P, {"d3"_P, "d4"_P} },
{Piece::White, "e2"_P, {"e3"_P, "e4"_P} },
{Piece::White, "f2"_P, {"f3"_P, "f4"_P} },
{Piece::White, "g2"_P, {"g3"_P, "g4"_P} },
{Piece::White, "h2"_P, {"h3"_P, "h4"_P} },
{Piece::White, "a1"_P},
{Piece::White, "b1"_P, {"a3"_P, "c3"_P} },
{Piece::White, "c1"_P},
{Piece::White, "d1"_P},
{Piece::White, "e1"_P},
{Piece::White, "f1"_P},
{Piece::White, "g1"_P, {"f3"_P, "h3"_P} },
{Piece::White, "h1"_P},
}
, board {{
&whites[ 8], &whites[ 9], &whites[10], &whites[11], &whites[12], &whites[13], &whites[14], &whites[15],
&whites[ 0], &whites[ 1], &whites[ 2], &whites[ 3], &whites[ 4], &whites[ 5], &whites[ 6], &whites[ 7],
nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr,
nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr,
nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr,
nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr,
&blacks[ 0], &blacks[ 1], &blacks[ 2], &blacks[ 3], &blacks[ 4], &blacks[ 5], &blacks[ 6], &blacks[ 7],
&blacks[ 8], &blacks[ 9], &blacks[10], &blacks[11], &blacks[12], &blacks[13], &blacks[14], &blacks[15],
}}
{}
Chessboard::Chessboard()
: m_state(new State())
{
setGrammar();
}
Chessboard::~Chessboard() = default;
void Chessboard::setPrompt(const std::string& prompt) {
m_prompt = prompt;
setGrammar();
}
void Chessboard::setGrammar() {
m_grammar.clear();
std::string result;
if (m_prompt.empty()) {
result += "move ::= \" \" ((piece | frompos) \" \" \"to \"?)? topos\n";
//result += "move ::= \" \" frompos \" \" \"to \"? topos\n";
}
else {
// result += "move ::= prompt \" \" ((piece | frompos) \" \" \"to \"?)? topos\n"
result += "move ::= prompt \" \" frompos \" \" \"to \"? topos\n"
"prompt ::= \" " + m_prompt + "\"\n";
}
std::set<Piece::Types> pieceTypes;
std::set<char> from_pos;
std::set<char> to_pos;
auto& pieces = m_moveCounter % 2 ? m_state->blacks : m_state->whites;
std::set<size_t> flags;
for (auto& p : pieces) {
if (p.allowed().empty()) continue;
bool addPiece = false;
if (!m_inCheck || p.type() == Piece::King) {
to_pos.insert(p.allowed().begin(), p.allowed().end());
addPiece = !p.allowed().empty();
}
else {
for (auto move : p.allowed()) {
if (m_allowedInCheck.count(move)) {
to_pos.insert(move);
addPiece = true;
}
}
}
if (addPiece) {
pieceTypes.insert(p.type());
from_pos.insert(p.pos());
}
}
if (pieceTypes.empty()) return;
result += "piece ::= (";
for (auto& p : pieceTypes) result += " \"" + std::string(pieceNames[p]) + "\" |";
result.pop_back();
result += ")\n\n";
result += "frompos ::= (";
for (auto& p : from_pos) result += " \"" + std::string(positions[p]) + "\" |";
result.pop_back();
result += ")\n";
result += "topos ::= (";
for (auto& p : to_pos) result += " \"" + std::string(positions[p]) + "\" |";
result.pop_back();
result += ")\n";
m_grammar = std::move(result);
}
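// For illustration, at the starting position (with an empty prompt) the method
// above generates a GBNF grammar roughly of this shape (positions abbreviated):
//
//   move ::= " " ((piece | frompos) " " "to "?)? topos
//
//   piece ::= ( "pawn" | "knight")
//
//   frompos ::= ( "b1" | "g1" | "a2" | "b2" | ... | "h2")
//   topos ::= ( "a3" | "b3" | ... | "h3" | "a4" | ... | "h4")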
std::string Chessboard::stringifyBoard() {
std::string result;
result.reserve(16 + 2 * 64 + 16);
for (char file = 'a'; file <= 'h'; ++file) {
result.push_back(file);
result.push_back(' ');
}
result.back() = '\n';
for (int i = 7; i >= 0; --i) {
for (int j = 0; j < 8; ++j) {
auto p = m_state->board[i * 8 + j];
if (p) result.push_back(p->initial());
else result.push_back((i + j) % 2 ? '.' : '*');
result.push_back(' ');
}
result.push_back('0' + i + 1);
result.push_back('\n');
}
return result;
}
std::string Chessboard::process(const std::string& command) {
const auto t_start = std::chrono::high_resolution_clock::now();
auto color = Piece::Colors(m_moveCounter % 2);
Piece* piece = nullptr;
auto pos_to = INVALID_POS;
if (!parseCommand(command, piece, pos_to)) return "";
auto pos_from = piece->pos();
if (!move(*piece, pos_to)) return "";
flagUpdates(pos_from, pos_to);
detectChecks();
auto& enemyPieces = color ? m_state->whites : m_state->blacks;
for (auto& p : enemyPieces) p.reinit(*m_state); // only enemy moves needed next
std::string result = {positions[pos_from][R], positions[pos_from][F], '-', positions[pos_to][R], positions[pos_to][F]};
++m_moveCounter;
setGrammar();
const auto t_end = std::chrono::high_resolution_clock::now();
auto t_ms = std::chrono::duration_cast<std::chrono::milliseconds>(t_end - t_start).count();
fprintf(stdout, "%s: Move '%s%s%s', (t = %d ms)\n", __func__, "\033[1m", result.data(), "\033[0m", (int) t_ms);
if (m_grammar.empty()) result.push_back('#');
return result;
}
bool Chessboard::parseCommand(const std::string& command, Piece*& piece, char& pos_to) {
auto color = Piece::Colors(m_moveCounter % 2);
fprintf(stdout, "%s: Command to %s: '%s%.*s%s'\n", __func__, (color ? "Black" : "White"), "\033[1m", int(command.size()), command.data(), "\033[0m");
if (command.empty()) return false;
auto tokens = split(command, ' ');
auto pos_from = INVALID_POS;
auto type = Piece::Types::NUM_PIECES;
if (tokens.size() == 1) {
type = Piece::Types::Pawn;
pos_to = strToPos(tokens.front());
}
else {
pos_from = strToPos(tokens.front());
if (pos_from == INVALID_POS) type = Piece::Types(strToType(tokens.front()));
pos_to = strToPos(tokens.back());
}
if (pos_to == INVALID_POS) return false;
if (pos_from == INVALID_POS) {
if (type == Piece::Types::NUM_PIECES) return false;
auto& pieces = color ? m_state->blacks : m_state->whites;
for (auto& p : pieces) {
if (p.type() == type && p.canReach(pos_to)) {
pos_from = p.pos();
break;
}
}
}
if (pos_from == INVALID_POS) return false;
if (m_state->board[pos_from] == nullptr) return false;
piece = m_state->board[pos_from];
if (piece->color() != color) return false;
return true;
}
void Chessboard::flagUpdates(char pos_from, char pos_to) {
auto color = Piece::Colors(m_moveCounter % 2);
auto& enemyPieces = color ? m_state->whites : m_state->blacks;
auto& ownPieces = color ? m_state->blacks : m_state->whites;
for (auto& p : enemyPieces) {
if (p.movePattern(pos_to) || p.movePattern(pos_from)) {
updatePins(p);
p.invalidate();
}
}
for (auto& p : ownPieces) {
if (p.movePattern(pos_to) || p.movePattern(pos_from)) {
updatePins(p);
p.invalidate();
}
}
}
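// maintain the pin bookkeeping for a sliding piece: drop any pin it previously
// caused, then re-scan toward the enemy king to flag a check or a fresh pin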
void Chessboard::updatePins(Piece& piece) {
if (piece.type() == Piece::Pawn || piece.type() == Piece::Knight || piece.type() == Piece::King) return;
auto& enemyPieces = piece.color() ? m_state->whites : m_state->blacks;
auto& enemyPins = piece.color() ? m_state->whitePins : m_state->blackPins;
auto& king = enemyPieces.k;
auto it = std::find_if(enemyPins.begin(), enemyPins.end(), [&] (const Pin& pin) { return pin.pinner == &piece; });
if (it != enemyPins.end()) {
it->pinned->invalidate();
enemyPins.erase(it);
}
if (piece.movePattern(king.pos())) {
auto to = positions[king.pos()];
auto from = piece.coord();
Direction d = normalize({char(to[R] - from[R]), char(to[F] - from[F])});
auto reached = traverse(piece.pos(), d, Find(m_state->board));
auto foundPiece = m_state->board[reached];
if (&king == foundPiece) {
// check
king.invalidate();
}
else if (foundPiece && foundPiece->color() != piece.color()) {
reached = traverse(reached, d, Find(m_state->board));
if (&king == m_state->board[reached]) {
enemyPins.push_back({d, &piece, foundPiece});
foundPiece->invalidate();
}
}
}
}
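// record the squares that can block or capture a checking piece; on double
// check the set is cleared so that only king moves remain legal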
void Chessboard::detectChecks() {
auto color = Piece::Colors(m_moveCounter % 2);
auto& enemyPieces = color ? m_state->whites : m_state->blacks;
auto& ownPieces = color ? m_state->blacks : m_state->whites;
auto& king = enemyPieces.k;
auto& pawnAttackLeft = color ? SW : NW;
auto& pawnAttackRight = color ? SE : NE;
for (auto& p : ownPieces) {
if (!p.movePattern(king.pos())) continue;
auto to = positions[king.pos()];
auto from = p.coord();
if (p.type() == Piece::Knight) {
if (!m_inCheck) {
m_allowedInCheck = { p.pos() };
}
else {
m_allowedInCheck.clear();
}
m_inCheck = true;
}
else if (p.type() == Piece::Pawn) {
Direction d {char(to[R] - from[R]), char(to[F] - from[F])};
if (d == pawnAttackLeft || d == pawnAttackRight) {
if (!m_inCheck) {
m_allowedInCheck = { p.pos() };
}
else {
m_allowedInCheck.clear();
}
m_inCheck = true;
}
}
else {
Direction d = normalize({char(to[R] - from[R]), char(to[F] - from[F])});
std::set<char> tmp;
auto pos = traverse(p.pos(), d, Add(m_state->board, tmp, king.color()));
if (pos == king.pos()) {
tmp.insert(p.pos());
if (!m_inCheck) {
m_allowedInCheck = std::move(tmp);
}
else {
m_allowedInCheck.clear();
}
m_inCheck = true;
}
}
}
}
bool Chessboard::move(Piece& piece, char pos_to) {
auto& allowed = piece.allowed();
if (allowed.count(pos_to) == 0 || (m_inCheck && piece.type() != Piece::King && m_allowedInCheck.count(pos_to) == 0)) return false;
if (m_state->board[pos_to] && m_state->board[pos_to]->color() == piece.color()) return false;
if (m_state->board[pos_to]) m_state->board[pos_to]->take();
m_state->board[piece.pos()] = nullptr;
m_state->board[pos_to] = &piece;
piece.setPos(pos_to);
m_inCheck = false;
m_allowedInCheck.clear();
return true;
}


@@ -0,0 +1,33 @@
#pragma once
#include <string>
#include <set>
#include <memory>
// just basic validation
// fixme: missing en passant, castling, promotion, etc.
struct State;
class Piece;
class Chessboard {
public:
Chessboard();
~Chessboard();
std::string process(const std::string& command);
std::string stringifyBoard();
const std::string& grammar() { return m_grammar; }
const std::string& prompt() { return m_prompt; }
void setPrompt(const std::string& prompt);
private:
bool parseCommand(const std::string& command, Piece*& piece, char& pos_to);
bool move(Piece& piece, char pos);
void flagUpdates(char pos_from, char pos_to);
void updatePins(Piece& piece);
void detectChecks();
void setGrammar();
std::unique_ptr<State> m_state;
std::set<char> m_allowedInCheck;
bool m_inCheck = false;
int m_moveCounter = 0;
std::string m_grammar;
std::string m_prompt;
};


@@ -0,0 +1,193 @@
#include "WChess.h"
#include "Chessboard.h"
#include "grammar-parser.h"
#include "common.h"
#include <thread>
WChess::WChess(whisper_context * ctx,
const whisper_full_params & wparams,
callbacks cb,
settings s)
: m_ctx(ctx)
, m_wparams(wparams)
, m_cb(cb)
, m_settings(s)
, m_board(new Chessboard())
{}
WChess::~WChess() = default;
void WChess::set_move(const std::string& moves, float prob) const {
if (m_cb.set_move) (*m_cb.set_move)(moves, prob);
}
void WChess::set_grammar(const std::string& grammar) const {
if (m_cb.set_grammar) (*m_cb.set_grammar)(grammar);
}
bool WChess::get_audio(std::vector<float>& pcmf32) const {
if (m_cb.get_audio) return (*m_cb.get_audio)(pcmf32);
return false;
}
std::string WChess::stringify_board() const {
return m_board->stringifyBoard();
}
std::string WChess::get_grammar() const {
return m_board->grammar();
}
void WChess::run() {
bool have_prompt = true;
bool ask_prompt = !have_prompt;
float logprob_min = 0.0f;
float logprob_sum = 0.0f;
int n_tokens = 0;
std::vector<float> pcmf32_cur;
std::vector<float> pcmf32_prompt;
const std::string k_prompt = have_prompt ? "" : "rook to d4, f3";
int64_t t_ms = 0;
if (ask_prompt) {
fprintf(stdout, "\n");
fprintf(stdout, "%s: Say the following phrase: '%s%s%s'\n", __func__, "\033[1m", k_prompt.c_str(), "\033[0m");
fprintf(stdout, "\n");
ask_prompt = false;
}
while (get_audio(pcmf32_cur)) {
if (!pcmf32_cur.empty()) {
// fprintf(stdout, "%s: Processing ...\n", __func__);
if (!have_prompt) {
const auto txt = ::trim(transcribe(pcmf32_cur, logprob_min, logprob_sum, n_tokens, t_ms));
fprintf(stdout, "%s: Heard '%s%s%s', (t = %d ms)\n", __func__, "\033[1m", txt.c_str(), "\033[0m", (int) t_ms);
const float sim = similarity(txt, k_prompt);
if (txt.length() < 0.8*k_prompt.length() || txt.length() > 1.2*k_prompt.length() || sim < 0.8f) {
fprintf(stdout, "%s: WARNING: prompt not recognized, try again\n", __func__);
ask_prompt = true;
} else {
fprintf(stdout, "\n");
fprintf(stdout, "%s: The prompt has been recognized!\n", __func__);
fprintf(stdout, "%s: Waiting for voice commands ...\n", __func__);
fprintf(stdout, "\n");
// save the audio for the prompt
pcmf32_prompt = pcmf32_cur;
have_prompt = true;
m_board->setPrompt(k_prompt);
}
} else {
if (!pcmf32_prompt.empty()) pcmf32_cur.insert(pcmf32_cur.begin(), pcmf32_prompt.begin(), pcmf32_prompt.end());
constexpr size_t MIN_SIZE = 1.2 * WHISPER_SAMPLE_RATE;
if (MIN_SIZE > pcmf32_cur.size()) pcmf32_cur.insert(pcmf32_cur.begin(), MIN_SIZE - pcmf32_cur.size(), 0.0f);
// fprintf(stdout, "%s: grammar rules:\n'%s'\n", __func__, m_board->grammar().c_str());
auto grammar_parsed = grammar_parser::parse(m_board->grammar().c_str());
auto grammar_rules = grammar_parsed.c_rules();
m_wparams.grammar_rules = grammar_rules.data();
m_wparams.n_grammar_rules = grammar_rules.size();
m_wparams.i_start_rule = grammar_parsed.symbol_ids.at("move");
auto txt = ::trim(transcribe(pcmf32_cur, logprob_min, logprob_sum, n_tokens, t_ms));
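// rough confidence estimate: the probability of the least likely token seen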
const float p = 100.0f * std::exp(logprob_min);
fprintf(stdout, "%s: heard '%s'\n", __func__, txt.c_str());
// find the prompt in the text
float best_sim = 0.0f;
size_t best_len = 0;
for (int n = 0.8*k_prompt.size(); n <= 1.2*k_prompt.size(); ++n) {
const auto prompt = txt.substr(0, n);
const float sim = similarity(prompt, k_prompt);
//fprintf(stderr, "%s: prompt = '%s', sim = %f\n", __func__, prompt.c_str(), sim);
if (sim > best_sim) {
best_sim = sim;
best_len = n;
}
}
fprintf(stdout, "%s: DEBUG: txt = '%s', prob = %.2f%%\n", __func__, txt.c_str(), p);
std::string command = ::trim(txt.substr(best_len));
fprintf(stdout, "%s: Command '%s%s%s', (t = %d ms)\n", __func__, "\033[1m", command.c_str(), "\033[0m", (int) t_ms);
fprintf(stdout, "\n");
if (!command.empty()) {
set_move(m_board->process(command), p);
set_grammar(m_board->grammar());
}
if (m_board->grammar().empty()) {
fprintf(stdout, "%s: No more moves possible\n", __func__);
break;
}
}
}
if (ask_prompt) {
fprintf(stdout, "\n");
fprintf(stdout, "%s: Say the following phrase: '%s%s%s'\n", __func__, "\033[1m", k_prompt.c_str(), "\033[0m");
fprintf(stdout, "\n");
ask_prompt = false;
}
}
}
std::string WChess::transcribe(
const std::vector<float> & pcmf32,
float & logprob_min,
float & logprob_sum,
int & n_tokens,
int64_t & t_ms) {
const auto t_start = std::chrono::high_resolution_clock::now();
logprob_min = 0.0f;
logprob_sum = 0.0f;
n_tokens = 0;
t_ms = 0;
if (whisper_full(m_ctx, m_wparams, pcmf32.data(), pcmf32.size()) != 0) {
return {};
}
std::string result;
const int n_segments = whisper_full_n_segments(m_ctx);
for (int i = 0; i < n_segments; ++i) {
const char * text = whisper_full_get_segment_text(m_ctx, i);
result += text;
const int n = whisper_full_n_tokens(m_ctx, i);
for (int j = 0; j < n; ++j) {
const auto token = whisper_full_get_token_data(m_ctx, i, j);
if (token.plog > 0.0f) return {};
logprob_min = std::min(logprob_min, token.plog);
logprob_sum += token.plog;
++n_tokens;
}
}
const auto t_end = std::chrono::high_resolution_clock::now();
t_ms = std::chrono::duration_cast<std::chrono::milliseconds>(t_end - t_start).count();
return result;
}


@@ -0,0 +1,63 @@
#pragma once
#include "whisper.h"
#include <string>
#include <vector>
#include <memory>
class Chessboard;
class WChess {
public:
using CheckRunningCb = bool (*)();
using GetAudioCb = bool (*)(std::vector<float> &);
using SetMovesCb = void (*)(const std::string &, float);
using SetGrammarCb = void (*)(const std::string &);
using ClearAudioCb = void (*)();
struct callbacks {
GetAudioCb get_audio = nullptr;
SetMovesCb set_move = nullptr;
SetGrammarCb set_grammar = nullptr;
};
struct settings {
int32_t vad_ms = 2000;
int32_t prompt_ms = 5000;
int32_t command_ms = 4000;
float vad_thold = 0.2f;
float freq_thold = 100.0f;
bool print_energy = false;
};
WChess(
whisper_context * ctx,
const whisper_full_params & wparams,
callbacks cb,
settings s
);
~WChess();
void run();
std::string stringify_board() const;
std::string get_grammar() const;
private:
bool get_audio(std::vector<float>& pcmf32) const;
void set_move(const std::string& moves, float prob) const;
void set_grammar(const std::string& grammar) const;
std::string transcribe(
const std::vector<float> & pcmf32,
float & logprob_min,
float & logprob_sum,
int & n_tokens,
int64_t & t_ms);
whisper_context * m_ctx;
whisper_full_params m_wparams;
const callbacks m_cb;
const settings m_settings;
std::unique_ptr<Chessboard> m_board;
};


@@ -0,0 +1,117 @@
#include "Chessboard.h"
#define ASSERT(x) \
do { \
if (!(x)) { \
fprintf(stderr, "ASSERT: %s:%d: %s\n", __FILE__, __LINE__, #x); \
fflush(stderr); \
exit(1); \
} \
} while (0)
int main() {
{
Chessboard chess;
ASSERT(chess.process("pawn to d4") == "d2-d4");
ASSERT(chess.process("e5") == "e7-e5");
ASSERT(chess.process("c1 h6") == "c1-h6");
ASSERT(chess.process("queen h4") == "d8-h4");
ASSERT(chess.process("bishop to g5") == "h6-g5");
ASSERT(chess.process("bishop to b4") == "f8-b4");
ASSERT(chess.process("c4") == "");
ASSERT(chess.process("knight c3") == "b1-c3");
ASSERT(chess.process("knight c6") == "b8-c6");
ASSERT(chess.process("f3") == "");
}
{
Chessboard chess;
ASSERT(chess.process("d4") == "d2-d4");
ASSERT(chess.process("e5") == "e7-e5");
ASSERT(chess.process("e4") == "e2-e4");
ASSERT(chess.process("queen h4") == "d8-h4");
ASSERT(chess.process("queen h5") == "d1-h5");
ASSERT(chess.process("f5") == "");
ASSERT(chess.process("g6") == "g7-g6");
ASSERT(chess.process("knight e2") == "g1-e2");
ASSERT(chess.process("f5") == "f7-f5");
ASSERT(chess.process("knight g3") == "e2-g3");
ASSERT(chess.process("g5") == "");
ASSERT(chess.process("king e7") == "e8-e7");
ASSERT(chess.process("f4") == "f2-f4");
ASSERT(chess.process("g5") == "g6-g5");
}
{
Chessboard chess;
ASSERT(chess.process("e4") == "e2-e4");
ASSERT(chess.process("c5") == "c7-c5");
ASSERT(chess.process("e5") == "e4-e5");
ASSERT(chess.process("c4") == "c5-c4");
ASSERT(chess.process("e6") == "e5-e6");
ASSERT(chess.process("c3") == "c4-c3");
ASSERT(chess.process("e7") == "");
ASSERT(chess.process("f7") == "e6-f7");
ASSERT(chess.process("d2") == "");
ASSERT(chess.process("king to f7") == "e8-f7");
ASSERT(chess.process("f4") == "f2-f4");
ASSERT(chess.process("d2") == "c3-d2");
ASSERT(chess.process("f5") == "");
ASSERT(chess.process("king to e2") == "e1-e2");
ASSERT(chess.process("king to g6") == "f7-g6");
ASSERT(chess.process("f5") == "f4-f5");
ASSERT(chess.process("e6") == "");
ASSERT(chess.process("king to h5") == "g6-h5");
ASSERT(chess.process("g4") == "g2-g4");
ASSERT(chess.process("king to g5") == "h5-g5");
ASSERT(chess.process("h4") == "h2-h4");
ASSERT(chess.process("king to h5") == "");
ASSERT(chess.process("king to g6") == "");
ASSERT(chess.process("king to h6") == "g5-h6");
ASSERT(chess.process("bishop to d2") == "c1-d2");
ASSERT(chess.process("king to g5") == "");
ASSERT(chess.process("g5") == "g7-g5");
}
{
Chessboard chess;
ASSERT(chess.process("f4") == "f2-f4");
ASSERT(chess.process("e5") == "e7-e5");
ASSERT(chess.process("g4") == "g2-g4");
ASSERT(chess.process("queen to h4") == "d8-h4#");
ASSERT(chess.process("knight f3") == "");
ASSERT(chess.grammar().empty());
}
{
Chessboard chess;
ASSERT(chess.process("f4") == "f2-f4");
ASSERT(chess.process("e5") == "e7-e5");
ASSERT(chess.process("g4") == "g2-g4");
ASSERT(chess.process("d5") == "d7-d5");
ASSERT(chess.process("g1 f3") == "g1-f3");
ASSERT(chess.process("queen to h4") == "d8-h4");
ASSERT(!chess.grammar().empty());
}
{
Chessboard chess;
ASSERT(chess.process("knight c3") == "b1-c3");
ASSERT(chess.process("knight c6") == "b8-c6");
ASSERT(chess.process("knight b5") == "c3-b5");
ASSERT(chess.process("knight f6") == "g8-f6");
ASSERT(chess.process("knight d6") == "b5-d6");
ASSERT(chess.process("knight d4") == "");
ASSERT(chess.process("d6") == "c7-d6");
ASSERT(chess.process("e4") == "e2-e4");
ASSERT(chess.process("knight d4") == "c6-d4");
ASSERT(chess.process("d3") == "d2-d3");
ASSERT(chess.process("knight e4") == "f6-e4");
ASSERT(chess.process("king to e2") == "");
ASSERT(chess.process("king to d2") == "");
}
}


@@ -0,0 +1,8 @@
if (WHISPER_SDL2)
set(TARGET wchess)
add_executable(${TARGET} wchess.cmd.cpp)
include(DefaultTargetOptions)
target_link_libraries(${TARGET} PRIVATE wchess-core common-sdl ${CMAKE_THREAD_LIBS_INIT})
endif ()


@@ -0,0 +1,247 @@
// Command-line voice-assisted chess
//
// Speak chess move commands to the microphone.
// The moves will be translated to chessboard positions.
//
#include "WChess.h"
#include "common-sdl.h"
#include <iostream>
#include <memory>
#include <thread>
// command-line parameters
struct whisper_params {
int32_t n_threads = std::min(4, (int32_t) std::thread::hardware_concurrency());
int32_t prompt_ms = 5000;
int32_t command_ms = 8000;
int32_t capture_id = -1;
int32_t max_tokens = 32;
int32_t audio_ctx = 0;
float vad_thold = 0.6f;
float freq_thold = 100.0f;
float grammar_penalty = 100.0f;
bool speed_up = false;
bool translate = false;
bool print_special = false;
bool print_energy = false;
bool no_timestamps = true;
bool use_gpu = true;
std::string language = "en";
std::string model = "models/ggml-base.en.bin";
std::string fname_out;
std::string commands;
std::string prompt;
std::string context;
std::string grammar;
};
void whisper_print_usage(int /*argc*/, char ** argv, const whisper_params & params) {
fprintf(stderr, "\n");
fprintf(stderr, "usage: %s [options]\n", argv[0]);
fprintf(stderr, "\n");
fprintf(stderr, "options:\n");
fprintf(stderr, " -h, --help [default] show this help message and exit\n");
fprintf(stderr, " -t N, --threads N [%-7d] number of threads to use during computation\n", params.n_threads);
fprintf(stderr, " -pms N, --prompt-ms N [%-7d] prompt duration in milliseconds\n", params.prompt_ms);
fprintf(stderr, " -cms N, --command-ms N [%-7d] command duration in milliseconds\n", params.command_ms);
fprintf(stderr, " -c ID, --capture ID [%-7d] capture device ID\n", params.capture_id);
fprintf(stderr, " -mt N, --max-tokens N [%-7d] maximum number of tokens per audio chunk\n", params.max_tokens);
fprintf(stderr, " -ac N, --audio-ctx N [%-7d] audio context size (0 - all)\n", params.audio_ctx);
fprintf(stderr, " -vth N, --vad-thold N [%-7.2f] voice activity detection threshold\n", params.vad_thold);
fprintf(stderr, " -fth N, --freq-thold N [%-7.2f] high-pass frequency cutoff\n", params.freq_thold);
fprintf(stderr, " -su, --speed-up [%-7s] speed up audio by x2 (reduced accuracy)\n", params.speed_up ? "true" : "false");
fprintf(stderr, " -tr, --translate [%-7s] translate from source language to english\n", params.translate ? "true" : "false");
fprintf(stderr, " -ps, --print-special [%-7s] print special tokens\n", params.print_special ? "true" : "false");
fprintf(stderr, " -pe, --print-energy [%-7s] print sound energy (for debugging)\n", params.print_energy ? "true" : "false");
fprintf(stderr, " -ng, --no-gpu [%-7s] disable GPU\n", params.use_gpu ? "false" : "true");
fprintf(stderr, " -l LANG, --language LANG [%-7s] spoken language\n", params.language.c_str());
fprintf(stderr, " -m FNAME, --model FNAME [%-7s] model path\n", params.model.c_str());
fprintf(stderr, " -f FNAME, --file FNAME [%-7s] text output file name\n", params.fname_out.c_str());
fprintf(stderr, " -cmd FNAME, --commands FNAME [%-7s] text file with allowed commands\n", params.commands.c_str());
fprintf(stderr, " -p, --prompt [%-7s] the required activation prompt\n", params.prompt.c_str());
fprintf(stderr, " -ctx, --context [%-7s] sample text to help the transcription\n", params.context.c_str());
fprintf(stderr, " --grammar-penalty N [%-7.1f] scales down logits of nongrammar tokens\n", params.grammar_penalty);
fprintf(stderr, "\n");
}
bool whisper_params_parse(int argc, char ** argv, whisper_params & params) {
for (int i = 1; i < argc; i++) {
std::string arg = argv[i];
if (arg == "-h" || arg == "--help") {
whisper_print_usage(argc, argv, params);
exit(0);
}
else if (arg == "-t" || arg == "--threads") { params.n_threads = std::stoi(argv[++i]); }
else if (arg == "-pms" || arg == "--prompt-ms") { params.prompt_ms = std::stoi(argv[++i]); }
else if (arg == "-cms" || arg == "--command-ms") { params.command_ms = std::stoi(argv[++i]); }
else if (arg == "-c" || arg == "--capture") { params.capture_id = std::stoi(argv[++i]); }
else if (arg == "-mt" || arg == "--max-tokens") { params.max_tokens = std::stoi(argv[++i]); }
else if (arg == "-ac" || arg == "--audio-ctx") { params.audio_ctx = std::stoi(argv[++i]); }
else if (arg == "-vth" || arg == "--vad-thold") { params.vad_thold = std::stof(argv[++i]); }
else if (arg == "-fth" || arg == "--freq-thold") { params.freq_thold = std::stof(argv[++i]); }
else if (arg == "-su" || arg == "--speed-up") { params.speed_up = true; }
else if (arg == "-tr" || arg == "--translate") { params.translate = true; }
else if (arg == "-ps" || arg == "--print-special") { params.print_special = true; }
else if (arg == "-pe" || arg == "--print-energy") { params.print_energy = true; }
else if (arg == "-ng" || arg == "--no-gpu") { params.use_gpu = false; }
else if (arg == "-l" || arg == "--language") { params.language = argv[++i]; }
else if (arg == "-m" || arg == "--model") { params.model = argv[++i]; }
else if (arg == "-f" || arg == "--file") { params.fname_out = argv[++i]; }
else if (arg == "-cmd" || arg == "--commands") { params.commands = argv[++i]; }
else if (arg == "-p" || arg == "--prompt") { params.prompt = argv[++i]; }
else if (arg == "-ctx" || arg == "--context") { params.context = argv[++i]; }
else if ( arg == "--grammar-penalty") { params.grammar_penalty = std::stof(argv[++i]); }
else {
fprintf(stderr, "error: unknown argument: %s\n", arg.c_str());
whisper_print_usage(argc, argv, params);
exit(0);
}
}
return true;
}
std::unique_ptr<WChess> g_wchess;
int g_moveCount = 0;
void set_move(const std::string & move, float) {
if (!move.empty()) {
g_moveCount++;
fprintf(stdout, "Move: %s\n\n", move.c_str());
}
else fprintf(stdout, "Move rejected\n\n");
fprintf(stdout, "%s\n", g_wchess->stringify_board().c_str());
fprintf(stdout, "%s\n", g_moveCount ? "White's turn" : "Black's turn");
}
audio_async g_audio(30*1000);
bool g_listening = false;
std::vector<float> g_pcmf32;
bool read_input() {
std::string input;
while (true) {
fprintf(stdout, "[(l)isten/(p)ause/(q)uit]: ");
std::cin >> input;
fprintf(stdout, "\n");
if (input[0] == 'q') {
fprintf(stdout, "Quitting\n");
return false;
}
if (input[0] == 'l') {
if (!g_listening) {
fprintf(stdout, "Listening\n");
g_listening = true;
g_pcmf32.clear();
g_audio.resume();
g_audio.clear();
}
else fprintf(stdout, "Still listening\n");
return true;
}
else {
if (g_listening) {
g_listening = false;
g_audio.get(0, g_pcmf32);
g_audio.pause();
fprintf(stdout, "Processing\n");
}
else fprintf(stdout, "Not listening\n");
return true;
}
}
return true;
}
bool get_audio(std::vector<float> & pcmf32_cur) {
if (!read_input()) return false;
if (!g_pcmf32.empty()) pcmf32_cur = std::move(g_pcmf32);
else pcmf32_cur.clear();
return true;
}
int main(int argc, char ** argv) {
whisper_params params;
if (whisper_params_parse(argc, argv, params) == false) {
return 1;
}
if (whisper_lang_id(params.language.c_str()) == -1) {
fprintf(stderr, "error: unknown language '%s'\n", params.language.c_str());
whisper_print_usage(argc, argv, params);
exit(0);
}
// whisper init
struct whisper_context_params cparams = whisper_context_default_params();
cparams.use_gpu = params.use_gpu;
struct whisper_context * ctx = whisper_init_from_file_with_params(params.model.c_str(), cparams);
if (!ctx) {
fprintf(stderr, "%s: whisper_init_from_file_with_params() failed!\n", __func__);
return 1;
}
// init audio
if (!g_audio.init(params.capture_id, WHISPER_SAMPLE_RATE)) {
fprintf(stderr, "%s: audio.init() failed!\n", __func__);
return 1;
}
struct whisper_full_params wparams = whisper_full_default_params(whisper_sampling_strategy::WHISPER_SAMPLING_GREEDY);
wparams.offset_ms = 0;
wparams.translate = false;
wparams.no_context = true;
wparams.single_segment = true;
wparams.print_realtime = false;
wparams.print_progress = false;
wparams.print_timestamps = true;
wparams.print_special = false;
wparams.no_timestamps = true;
wparams.max_tokens = 32;
wparams.audio_ctx = 768; // partial encoder context for better performance
wparams.temperature = 0.0f;
wparams.temperature_inc = 2.0f;
wparams.greedy.best_of = 1;
wparams.beam_search.beam_size = 1;
wparams.language = "en";
wparams.grammar_penalty = params.grammar_penalty;
wparams.initial_prompt = params.context.data();
WChess::callbacks cb;
cb.get_audio = get_audio;
cb.set_move = set_move;
WChess::settings s;
s.vad_ms = 2000;
s.prompt_ms = params.prompt_ms;
s.command_ms = params.command_ms;
s.vad_thold = params.vad_thold;
s.freq_thold = params.freq_thold;
s.print_energy = params.print_energy;
g_wchess.reset(new WChess(ctx, wparams, cb, s));
set_move("start", 0);
g_wchess->run();
whisper_print_timings(ctx);
whisper_free(ctx);
return 0;
}


@@ -0,0 +1,51 @@
set(TARGET wchess.wasm)
add_executable(${TARGET}
wchess.wasm.cpp
)
include(DefaultTargetOptions)
target_link_libraries(${TARGET} PRIVATE
common
wchess-core
)
unset(EXTRA_FLAGS)
if (WHISPER_WASM_SINGLE_FILE)
set(EXTRA_FLAGS "-s SINGLE_FILE=1")
message(STATUS "Embedding WASM inside chess.js")
add_custom_command(
TARGET ${TARGET} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy
${CMAKE_BINARY_DIR}/bin/${TARGET}.js
${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/${TARGET}/js/chess.js
)
endif()
set_target_properties(${TARGET} PROPERTIES LINK_FLAGS " \
--bind \
-s USE_PTHREADS=1 \
-s PTHREAD_POOL_SIZE=8 \
-s INITIAL_MEMORY=1024MB \
-s TOTAL_MEMORY=1024MB \
-s FORCE_FILESYSTEM=1 \
-s EXPORTED_RUNTIME_METHODS=\"['print', 'printErr', 'ccall', 'cwrap']\" \
${EXTRA_FLAGS} \
")
add_custom_command(
TARGET ${TARGET} POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_directory
${CMAKE_CURRENT_SOURCE_DIR}/chessboardjs-1.0.0
${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/${TARGET}/
COMMAND ${CMAKE_COMMAND} -E copy
${CMAKE_CURRENT_SOURCE_DIR}/jquery-3.7.1.min.js
${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/${TARGET}/js/
)
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/index-tmpl.html ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/${TARGET}/index.html @ONLY)
configure_file(${CMAKE_SOURCE_DIR}/examples/helpers.js ${CMAKE_RUNTIME_OUTPUT_DIRECTORY}/${TARGET}/js/helpers.js @ONLY)


@@ -0,0 +1,54 @@
/*! chessboard.js v1.0.0 | (c) 2019 Chris Oakman | MIT License chessboardjs.com/license */
.clearfix-7da63 {
clear: both;
}
.board-b72b1 {
border: 2px solid #404040;
box-sizing: content-box;
}
.square-55d63 {
float: left;
position: relative;
/* disable any native browser highlighting */
-webkit-touch-callout: none;
-webkit-user-select: none;
-khtml-user-select: none;
-moz-user-select: none;
-ms-user-select: none;
user-select: none;
}
.white-1e1d7 {
background-color: #f0d9b5;
color: #b58863;
}
.black-3c85d {
background-color: #b58863;
color: #f0d9b5;
}
.highlight1-32417, .highlight2-9c5d2 {
box-shadow: inset 0 0 3px 3px yellow;
}
.notation-322f9 {
cursor: default;
font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
font-size: 14px;
position: absolute;
}
.alpha-d2270 {
bottom: 1px;
right: 3px;
}
.numeric-fc462 {
top: 2px;
left: 2px;
}


@@ -0,0 +1,2 @@
/*! chessboard.js v1.0.0 | (c) 2019 Chris Oakman | MIT License chessboardjs.com/license */
.clearfix-7da63{clear:both}.board-b72b1{border:2px solid #404040;box-sizing:content-box}.square-55d63{float:left;position:relative;-webkit-touch-callout:none;-webkit-user-select:none;-khtml-user-select:none;-moz-user-select:none;-ms-user-select:none;user-select:none}.white-1e1d7{background-color:#f0d9b5;color:#b58863}.black-3c85d{background-color:#b58863;color:#f0d9b5}.highlight1-32417,.highlight2-9c5d2{box-shadow:inset 0 0 3px 3px #ff0}.notation-322f9{cursor:default;font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:14px;position:absolute}.alpha-d2270{bottom:1px;right:3px}.numeric-fc462{top:2px;left:2px}

Binary files not shown: 12 chess piece images (748 B to 3.7 KiB each).

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long


@@ -0,0 +1,32 @@
# chessboard.js Change Log
All notable changes to this project will be documented in this file.
## [1.0.0] - 2019-06-11
- Orientation methods now return current orientation. [Issue #64]
- Drop support for IE8
- Do not check for `window.JSON` (Error #1004)
- Rename `ChessBoard` to `Chessboard` (`ChessBoard` is still supported, however)
- id query selectors are now supported as the first argument to `Chessboard()`
- Remove Error #1002
- Format code according to [StandardJS]
- Bump minimum jQuery version to 1.8.3
- Throttle piece drag functions
## [0.3.0] - 2013-08-10
- Added `appearSpeed` animation config property
- Added `onSnapbackEnd` event
- Added `onMoveEnd` event
## [0.2.0] - 2013-08-05
- Added `onMouseoverSquare` and `onMouseoutSquare` events
- Added `onSnapEnd` event
- Added square code as CSS class on the squares
- Added [chess.js] integration examples
## [0.1.0] - 2013-05-21
- Initial release
[chess.js]:https://github.com/jhlywa/chess.js
[Issue #64]:https://github.com/oakmac/chessboardjs/issues/64
[StandardJS]:https://standardjs.com/


@@ -0,0 +1,20 @@
Copyright 2019 Chris Oakman
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


@@ -0,0 +1,82 @@
# chessboard.js
chessboard.js is a JavaScript chessboard component. It depends on [jQuery].
Please see [chessboardjs.com] for documentation and examples.
## What is chessboard.js?
chessboard.js is a standalone JavaScript chessboard component. It is designed to be
"just a board" and expose a powerful API so that it can be used in different ways.
Here's a non-exhaustive list of things you can do with chessboard.js:
- Use chessboard.js to show game positions alongside your expert commentary.
- Use chessboard.js to have a tactics website where users have to guess the best
move.
- Integrate chessboard.js and [chess.js] with a PGN database and allow people to
search and playback games (see [Example 5000])
- Build a chess server and have users play their games out using the
chessboard.js board.
chessboard.js is flexible enough to handle any of these situations with relative
ease.
## What can chessboard.js **not** do?
The scope of chessboard.js is limited to "just a board." This is intentional and
makes chessboard.js flexible for handling a multitude of chess-related problems.
This is a common source of confusion for new users.
Specifically, chessboard.js does not understand anything about how the game of
chess is played: how a knight moves, whose turn it is, whether White is in check, etc.
Fortunately, the powerful [chess.js] library deals with exactly this sort of
problem domain and plays nicely with chessboard.js's flexible API. Some examples
of chessboard.js combined with chess.js: 5000, 5001, 5002.
Here is a list of things that chessboard.js is **not**:
- A chess engine
- A legal move validator
- A PGN parser
chessboard.js is designed to work well with any of those things, but the idea
behind chessboard.js is that the logic that controls the board should be
independent of those other problems.
## Docs and Examples
- Docs - <http://chessboardjs.com/docs>
- Examples - <http://chessboardjs.com/examples>
## Developer Tools
```sh
# create a build in the build/ directory
npm run build
# re-build the website
npm run website
```
## License
[MIT License](LICENSE.md)
[jQuery]:https://jquery.com/
[chessboardjs.com]:http://chessboardjs.com
[chess.js]:https://github.com/jhlywa/chess.js
[Example 5000]:http://chessboardjs.com/examples#5000


@@ -0,0 +1,29 @@
{
"author": "Chris Oakman <chris@oakmac.com> (http://chrisoakman.com/)",
"name": "@chrisoakman/chessboardjs",
"description": "JavaScript chessboard widget",
"homepage": "https://chessboardjs.com",
"license": "MIT",
"version": "1.0.0",
"repository": {
"type": "git",
"url": "git://github.com/oakmac/chessboardjs.git"
},
"files": ["dist/"],
"dependencies": {
"jquery": ">=3.4.1"
},
"devDependencies": {
"csso": "3.5.1",
"fs-plus": "3.1.1",
"kidif": "1.1.0",
"mustache": "2.3.0",
"standard": "10.0.2",
"uglify-js": "3.6.0"
},
"scripts": {
"build": "standard lib/chessboard.js && node scripts/build.js",
"standard": "standard --fix lib/*.js website/js/*.js",
"website": "node scripts/website.js"
}
}

View File

@ -0,0 +1,499 @@
<!doctype html>
<html lang="en-us">
<head>
<title>wchess : voice-controlled chess using Whisper + WebAssembly</title>
<script src="https://cdnjs.cloudflare.com/ajax/libs/iframe-resizer/4.3.1/iframeResizer.contentWindow.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=0.7, maximum-scale=1, minimum-scale=0.7, user-scalable=no"/>
<meta name="apple-mobile-web-app-capable" content="yes" />
<style>
#output {
width: 100%;
height: 100%;
margin: 0 auto;
margin-top: 10px;
border-left: 0px;
border-right: 0px;
padding-left: 0px;
padding-right: 0px;
display: block;
background-color: black;
color: white;
font-size: 10px;
font-family: 'Lucida Console', Monaco, monospace;
outline: none;
white-space: pre;
overflow-wrap: normal;
overflow-x: scroll;
}
.button {
background-color: #000000;
color: #FFFFFF;
padding: 20px;
border-radius: 10px;
-moz-border-radius: 10px;
-webkit-border-radius: 10px;
margin:10px;
width: 100px;
height: 50px;
-webkit-touch-callout: none; /* Safari */
-webkit-user-select: none; /* Chrome */
-moz-user-select: none; /* Firefox */
-ms-user-select: none; /* Internet Explorer/Edge */
user-select: none;
}
button[disabled]{
background-color: #cccccc;
color: #666666;
padding: 20px;
border-radius: 10px;
-moz-border-radius: 10px;
-webkit-border-radius: 10px;
margin:10px;
width: 100px;
}
.center {
display: flex;
justify-content: center;
align-items: center;
width: 500px;
}
#description {
width: 500px;
}
</style>
<link rel="stylesheet" href="css/chessboard-1.0.0.min.css" integrity="sha384-q94+BZtLrkL1/ohfjR8c6L+A6qzNH9R2hBLwyoAfu3i/WCvQjzL2RQJ3uNHDISdU" crossorigin="anonymous">
</head>
<body>
<div id="main-container">
<div id="description">
<b>wchess : voice-controlled chess using Whisper + WebAssembly</b>
<br><br>
This is a demonstration of using Whisper to recognize voice commands in the browser.
<br><br>
Usage:<br>
<ul>
<li>Select a Whisper model</li>
<li>Accept the microphone permission request if prompted</li>
<li>Hold the button and say a chess move (e.g. "Knight to c3")</li>
<li>Release the button and wait for the move to be recognized</li>
<li>Repeat</li>
</ul>
Examples:<br>
<ul>
<li><b>"d4"</b></li>
<li><b>"e2 e4"</b></li>
<li><b>"Knight f3"</b></li>
<li><b>"Bishop to b5"</b></li>
</ul>
Features:<br>
<ul>
<li>Model quantization for reduced memory footprint (~42MB)</li>
<li><a href="https://github.com/ggerganov/whisper.cpp/pull/1229">Grammar-based sampling</a> for improved recognition accuracy</li>
</ul>
<b>
Note that not all chess moves are supported. For example, castling and pawn promotion
currently do not work, though they could easily be added. There may also be bugs in
the move-handling logic in general. This is deliberate, to keep the implementation
simple: the assumption is that a real application would already have proper move
validation in place.<br><br>
The main purpose of this example is to demonstrate the capabilities of whisper.cpp
for voice recognition running locally in the browser, on your own device.
</b>
<br><br>
You can find more about this project on <a href="https://github.com/ggerganov/whisper.cpp/tree/master/examples/wchess">GitHub</a>.
<br><br>
<b>More examples:</b>
<a href="https://whisper.ggerganov.com/">main</a> |
<a href="https://whisper.ggerganov.com/bench">bench</a> |
<a href="https://whisper.ggerganov.com/stream">stream</a> |
<a href="https://whisper.ggerganov.com/command">command</a> |
<a href="https://whisper.ggerganov.com/talk">talk</a> |
<br><br>
</div>
<hr>
<div id="model-whisper">
Whisper model: <span id="model-whisper-status"></span>
<button id="fetch-whisper-tiny-en" onclick="loadWhisper()">tiny.en (Q8_0, 42 MB)</button>
<span id="fetch-whisper-progress"></span>
<br><br>
<button id="clear" onclick="clearCache()">Clear browser cache</button>
<!--
<input type="file" id="file" name="file" onchange="loadFile(event, 'whisper.bin')" />
-->
</div>
<div id="game">
<br>
<div id="chessboard" style="width: 500px"></div>
<script src="js/jquery-3.7.1.min.js"></script>
<script src="js/chessboard-1.0.0.min.js"></script>
<script>
var board = Chessboard('chessboard', 'start')
var move_count = 0;
</script>
<br>
<div id="state">
Status: <b><span id="state-status">select model</span></b>
<div id="input" class="center">
<button id="toggler" class="button" onselectstart="return false" style="display: none">Hold</button>
</div>
<pre id="state-grammar">[The grammar will be displayed here]</pre>
<pre id="state-moves">[The moves will be displayed here]</pre>
</div>
</div>
<hr>
Debug output:
<textarea id="output" rows="20"></textarea>
<br>
<b>Troubleshooting</b>
<br><br>
The page does some heavy computations, so make sure:
<ul>
<li>You use a modern web browser (e.g. Chrome, Firefox)</li>
<li>Your browser supports WASM <a href="https://webassembly.org/roadmap/">fixed-width SIMD</a></li>
</ul>
<div class="cell-version">
<span>
|
Build time: <span class="nav-link">@GIT_DATE@</span> |
Commit hash: <a class="nav-link" href="https://github.com/ggerganov/whisper.cpp/commit/@GIT_SHA1@">@GIT_SHA1@</a> |
Commit subject: <span class="nav-link">@GIT_COMMIT_SUBJECT@</span> |
<a class="nav-link" href="https://github.com/ggerganov/whisper.cpp/tree/master/examples/command.wasm">Source Code</a> |
</span>
</div>
</div>
<script type="text/javascript" src="js/helpers.js"></script>
<script type='text/javascript'>
// web audio context
var context = null;
// the command instance
var instance = null;
// model name
var model_whisper = null;
var model_file = null;
var module_ready = null;
var Module = {
print: printTextarea,
printErr: printTextarea,
setStatus: function(text) {
printTextarea('js: ' + text);
},
monitorRunDependencies: function(left) {
},
preRun: function() {
printTextarea('js: Preparing ...');
},
postRun: function() {
printTextarea('js: Module initialized successfully!');
module_ready = true;
initInstance();
}
};
function initInstance() {
if (!module_ready || !model_file || instance) return
instance = Module.init(model_file);
if (instance) {
setStatus('Ready');
printTextarea("js: whisper initialized, instance: " + instance);
}
else {
printTextarea("js: failed to initialize whisper");
}
}
function setStatus(text) {
document.getElementById('state-status').innerHTML = text;
}
//
// fetch models
//
let dbVersion = 1
let dbName = 'whisper.ggerganov.com';
let indexedDB = window.indexedDB || window.mozIndexedDB || window.webkitIndexedDB || window.msIndexedDB
function storeFS(fname, buf) {
// write to WASM file using FS_createDataFile
// if the file exists, delete it
try {
Module.FS_unlink(fname);
} catch (e) {
// ignore
}
Module.FS_createDataFile("/", fname, buf, true, true);
printTextarea('storeFS: stored model: ' + fname + ' size: ' + buf.length);
document.getElementById('model-whisper-status').innerHTML = 'loaded "' + model_whisper + '"!';
model_file = fname;
initInstance();
}
function loadWhisper() {
setStatus('Loading')
//let url = 'https://whisper.ggerganov.com/ggml-model-whisper-tiny.en-q8_0.bin';
let url = 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.en-q8_0.bin';
let dst = 'whisper.bin';
let size_mb = 42;
model_whisper = 'tiny.en-q8_0';
document.getElementById('model-whisper-status').innerHTML = 'loading "' + model_whisper + '" ... ';
document.getElementById('fetch-whisper-tiny-en').style.display = 'none';
let cbProgress = function(p) {
let el = document.getElementById('fetch-whisper-progress');
el.innerHTML = Math.round(100*p) + '%';
};
let cbCancel = function() {
var el;
el = document.getElementById('model-whisper-status'); if (el) el.innerHTML = '';
};
loadRemote(url, dst, size_mb, cbProgress, storeFS, cbCancel, printTextarea);
// init audio capture so that the user receives a permission request
{
let context = new AudioContext({
sampleRate: 16000,
channelCount: 1,
echoCancellation: false,
autoGainControl: true,
noiseSuppression: true,
});
navigator.mediaDevices.getUserMedia({audio: true, video: false})
.then(function(s) {
s.getTracks().forEach(function(track) {
track.stop();
});
})
.catch(function(err) {
printTextarea('js: error getting audio stream: ' + err);
});
context.close();
}
document.getElementById('toggler').style.display = 'block';
}
//
// microphone
//
const kSampleRate = 16000;
const kRestartRecording_s = 120;
const kIntervalAudio_ms = 250; // pass the recorded audio to the C++ instance at this rate
var mediaRecorder = null;
var doRecording = false;
var startTime = 0;
window.AudioContext = window.AudioContext || window.webkitAudioContext;
window.OfflineAudioContext = window.OfflineAudioContext || window.webkitOfflineAudioContext;
function stopRecording() {
if (mediaRecorder) {
mediaRecorder.stop();
}
}
function startRecording() {
if (!context) {
context = new AudioContext({
sampleRate: kSampleRate,
channelCount: 1,
echoCancellation: false,
autoGainControl: true,
noiseSuppression: true,
});
}
startTime = Date.now();
var chunks = [];
var stream = null;
navigator.mediaDevices.getUserMedia({audio: true, video: false})
.then(function(s) {
stream = s;
mediaRecorder = new MediaRecorder(stream);
mediaRecorder.ondataavailable = function(e) {
chunks.push(e.data);
var blob = new Blob(chunks, { 'type' : 'audio/ogg; codecs=opus' });
var reader = new FileReader();
reader.onload = function(event) {
var buf = new Uint8Array(reader.result);
context.decodeAudioData(buf.buffer, function(audioBuffer) {
var offlineContext = new OfflineAudioContext(audioBuffer.numberOfChannels, audioBuffer.length, audioBuffer.sampleRate);
var source = offlineContext.createBufferSource();
source.buffer = audioBuffer;
source.connect(offlineContext.destination);
source.start(0);
offlineContext.startRendering().then(function(renderedBuffer) {
let audio = renderedBuffer.getChannelData(0);
printTextarea('js: number of samples: ' + audio.length);
Module.set_audio(instance, audio);
});
mediaRecorder = null;
context = null;
});
}
reader.readAsArrayBuffer(blob);
};
mediaRecorder.onstop = function(e) {
stream.getTracks().forEach(function(track) {
track.stop();
});
};
mediaRecorder.start();
})
.catch(function(err) {
printTextarea('js: error getting audio stream: ' + err);
});
}
//
// main
//
var nLines = 0;
var movesAll = '';
// document.body.addEventListener('keydown', function(event) {
// if (event.keyCode === 32) {
// document.getElementById('toggler').innerText = "";
// onStart();
// }
// }, true);
// document.body.addEventListener('keyup', function(event) {
// if (event.keyCode === 32) {
// document.getElementById('toggler').innerText = "Hold";
// onStop();
// }
// }, true);
document.getElementById('toggler').addEventListener("touchstart", function(event){
this.innerText = "";
onStart();
}, true);
document.getElementById('toggler').addEventListener("touchend", function(event){
this.innerText = "Hold";
onStop();
}, true);
document.getElementById('toggler').addEventListener('mousedown', function(event) {
this.innerText = "";
onStart();
}, true);
document.getElementById('toggler').addEventListener('mouseup', function(event) {
this.innerText = "Hold";
onStop();
}, true);
function onStart() {
if (!instance) return;
setStatus('Listening');
startRecording();
}
function onStop() {
setStatus('Processing');
printTextarea('js: stopping recording ...');
stopRecording();
}
function setMove(move, prob) {
if (move != null && move.length > 1) {
let gameOver = move[move.length - 1] === '#';
if (gameOver) {
move = move.substring(0, move.length - 1);
document.getElementById('toggler').disabled = true;
}
board.move(move);
movesAll += move + ', prob = ' + prob.toFixed(2) + '% <br>';
nLines++;
// if more than 10 lines, remove the first line
if (nLines > 10) {
var i = movesAll.indexOf('<br>');
if (i > 0) {
movesAll = movesAll.substring(i + 4);
nLines--;
}
}
++move_count;
setStatus(gameOver ? 'Done' : move_count % 2 ? 'Black\'s turn' : 'White\'s turn');
document.getElementById('state-moves').innerHTML = movesAll;
}
else {
setStatus('Failed. ' + (move_count % 2 ? 'Black\'s turn' : 'White\'s turn'));
}
}
function setGrammar(grammar) {
document.getElementById('state-grammar').innerHTML = grammar;
}
</script>
<script type="text/javascript" src="js/chess.js"></script>
</body>
</html>
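Stripped of the UI plumbing, the round trip described in the Usage section above comes down to three touch points between the page and the WASM module. A condensed sketch, using the `Module.init` / `Module.set_audio` bindings and the `setMove` callback from the code above; `audioSamples` is a placeholder for the recorded 16 kHz mono `Float32Array`:

```js
// 1. once the model file is stored in the WASM filesystem, create an instance
var instance = Module.init('whisper.bin');

// 2. on button release, hand the recorded samples to the C++ worker thread,
//    which runs Whisper constrained by the chess grammar
Module.set_audio(instance, audioSamples);

// 3. the C++ side calls back into JS with the recognized move and its
//    probability; setMove() applies it to the chessboard.js board
function setMove(move, prob) {
  board.move(move); // e.g. "Nc3"
}
```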

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,141 @@
#include <WChess.h>
#include <emscripten.h>
#include <emscripten/bind.h>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <vector>
#include <string>
constexpr int N_THREAD = 8;
std::vector<struct whisper_context *> g_contexts(4, nullptr);
std::mutex g_mutex;
std::thread g_worker;
std::condition_variable g_cv;
bool g_running(false);
std::vector<float> g_pcmf32;
void set_move(const std::string & move, float prob) {
MAIN_THREAD_EM_ASM({
setMove(UTF8ToString($0), $1)
}, move.c_str(), prob);
}
void set_grammar(const std::string & grammar) {
MAIN_THREAD_EM_ASM({
setGrammar(UTF8ToString($0))
}, grammar.c_str());
}
bool get_audio(std::vector<float> & audio) {
std::unique_lock<std::mutex> lock(g_mutex);
g_cv.wait(lock, [] { return !g_running || !g_pcmf32.empty(); });
if (!g_running) return false;
audio = std::move(g_pcmf32);
return true;
}
void wchess_main(size_t i) {
struct whisper_full_params wparams = whisper_full_default_params(whisper_sampling_strategy::WHISPER_SAMPLING_GREEDY);
wparams.n_threads = std::min(N_THREAD, (int) std::thread::hardware_concurrency());
wparams.offset_ms = 0;
wparams.translate = false;
wparams.no_context = true;
wparams.single_segment = true;
wparams.print_realtime = false;
wparams.print_progress = false;
wparams.print_timestamps = true;
wparams.print_special = false;
wparams.no_timestamps = true;
wparams.max_tokens = 32;
wparams.audio_ctx = 1280; // partial encoder context for better performance
wparams.temperature = 0.0f;
wparams.temperature_inc = 2.0f;
wparams.greedy.best_of = 1;
wparams.beam_search.beam_size = 1;
wparams.language = "en";
wparams.grammar_penalty = 100.0;
wparams.initial_prompt = "bishop to c3, rook to d4, knight to e5, d4 d5, knight to c3, c3, queen to d4, king b1, pawn to a1, bishop to b2, knight to c3,";
printf("command: using %d threads\n", wparams.n_threads);
WChess::callbacks cb;
cb.get_audio = get_audio;
cb.set_move = set_move;
cb.set_grammar = set_grammar;
WChess(g_contexts[i], wparams, cb, {}).run();
if (i < g_contexts.size()) {
whisper_free(g_contexts[i]);
g_contexts[i] = nullptr;
}
}
EMSCRIPTEN_BINDINGS(command) {
emscripten::function("init", emscripten::optional_override([](const std::string & path_model) {
for (size_t i = 0; i < g_contexts.size(); ++i) {
if (g_contexts[i] == nullptr) {
g_contexts[i] = whisper_init_from_file_with_params(path_model.c_str(), whisper_context_default_params());
if (g_contexts[i] != nullptr) {
g_running = true;
if (g_worker.joinable()) {
g_worker.join();
}
g_worker = std::thread([i]() {
wchess_main(i);
});
return i + 1;
} else {
return (size_t) 0;
}
}
}
return (size_t) 0;
}));
emscripten::function("free", emscripten::optional_override([](size_t /* index */) {
{
std::unique_lock<std::mutex> lock(g_mutex);
g_running = false;
}
g_cv.notify_one();
}));
emscripten::function("set_audio", emscripten::optional_override([](size_t index, const emscripten::val & audio) {
--index;
if (index >= g_contexts.size()) {
return -1;
}
if (g_contexts[index] == nullptr) {
return -2;
}
{
std::lock_guard<std::mutex> lock(g_mutex);
const int n = audio["length"].as<int>();
emscripten::val heap = emscripten::val::module_property("HEAPU8");
emscripten::val memory = heap["buffer"];
g_pcmf32.resize(n);
emscripten::val memoryView = audio["constructor"].new_(memory, reinterpret_cast<uintptr_t>(g_pcmf32.data()), n);
memoryView.call<void>("set", audio);
}
g_cv.notify_one();
return 0;
}));
}

View File

@ -0,0 +1,15 @@
*.iml
.gradle
/local.properties
/.idea/caches
/.idea/libraries
/.idea/modules.xml
/.idea/workspace.xml
/.idea/navEditor.xml
/.idea/assetWizardSettings.xml
.DS_Store
/build
/captures
.externalNativeBuild
.cxx
local.properties

View File

@ -0,0 +1,20 @@
A sample Android app using Java and [whisper.cpp](https://github.com/ggerganov/whisper.cpp/) to do voice-to-text transcription.
To use:
1. Select a model from the [whisper.cpp repository](https://github.com/ggerganov/whisper.cpp/tree/master/models).[^1]
2. Copy the model to the "app/src/main/assets/models" folder.
3. Select a sample audio file (for example, [jfk.wav](https://github.com/ggerganov/whisper.cpp/raw/master/samples/jfk.wav)).
4. Copy the sample to the "app/src/main/assets/samples" folder.
5. Modify the modelFilePath in WhisperService.java.
6. Modify the sampleFilePath in WhisperService.java.
7. Select the "release" active build variant, and use Android Studio to run and deploy to your device.
[^1]: I recommend the tiny or base models for running on an Android device.
PS:
1. Do not move this Android project folder out of the repository on its own; it depends on files from the whole whisper.cpp project.
2. The C++ code is compiled during the build process.
3. If you want to import a pre-compiled C++ project into your own Android project, please refer to https://github.com/litongjava/whisper.cpp.android.java.demo
![](README_files/1.jpg)

Binary file not shown (image, 67 KiB)

View File

@ -0,0 +1 @@
/build

View File

@ -0,0 +1,58 @@
plugins {
id 'com.android.application'
}
android {
compileSdkVersion 30
buildToolsVersion '30.0.3'
defaultConfig {
applicationId "com.litongjava.whisper.android.java"
minSdkVersion 21
targetSdkVersion 30
versionCode 1
versionName "1.0"
testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
externalNativeBuild {
cmake {
cppFlags ""
}
}
ndk {
abiFilters 'arm64-v8a', 'armeabi-v7a', 'x86', 'x86_64'
}
}
buildTypes {
release {
signingConfig signingConfigs.debug
minifyEnabled true
proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
}
}
externalNativeBuild {
cmake {
path "src/main/jni/whisper/CMakeLists.txt"
}
}
ndkVersion "25.2.9519653"
compileOptions {
sourceCompatibility JavaVersion.VERSION_1_8
targetCompatibility JavaVersion.VERSION_1_8
}
}
dependencies {
implementation 'androidx.appcompat:appcompat:1.1.0'
implementation 'com.google.android.material:material:1.1.0'
implementation 'androidx.constraintlayout:constraintlayout:1.1.3'
testImplementation 'junit:junit:4.+'
androidTestImplementation 'androidx.test.ext:junit:1.1.5'
androidTestImplementation 'androidx.test.espresso:espresso-core:3.5.1'
//litongjava
implementation 'com.litongjava:android-view-inject:1.0'
implementation 'com.litongjava:jfinal-aop:1.0.1'
implementation 'com.litongjava:litongjava-android-utils:1.0.0'
}

View File

@ -0,0 +1,21 @@
# Add project specific ProGuard rules here.
# You can control the set of applied configuration files using the
# proguardFiles setting in build.gradle.
#
# For more details, see
# http://developer.android.com/guide/developing/tools/proguard.html
# If your project uses WebView with JS, uncomment the following
# and specify the fully qualified class name to the JavaScript interface
# class:
#-keepclassmembers class fqcn.of.javascript.interface.for.webview {
# public *;
#}
# Uncomment this to preserve the line number information for
# debugging stack traces.
#-keepattributes SourceFile,LineNumberTable
# If you keep the line number information, uncomment this to
# hide the original source file name.
#-renamesourcefileattribute SourceFile

View File

@ -0,0 +1,26 @@
package com.litongjava.whisper.android.java;
import android.content.Context;
import androidx.test.platform.app.InstrumentationRegistry;
import androidx.test.ext.junit.runners.AndroidJUnit4;
import org.junit.Test;
import org.junit.runner.RunWith;
import static org.junit.Assert.*;
/**
* Instrumented test, which will execute on an Android device.
*
* @see <a href="http://d.android.com/tools/testing">Testing documentation</a>
*/
@RunWith(AndroidJUnit4.class)
public class ExampleInstrumentedTest {
@Test
public void useAppContext() {
// Context of the app under test.
Context appContext = InstrumentationRegistry.getInstrumentation().getTargetContext();
assertEquals("com.litongjava.whisper.android.java", appContext.getPackageName());
}
}

View File

@ -0,0 +1,22 @@
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
package="com.litongjava.whisper.android.java">
<application
android:allowBackup="true"
android:name=".app.App"
android:icon="@mipmap/ic_launcher"
android:label="@string/app_name"
android:roundIcon="@mipmap/ic_launcher_round"
android:supportsRtl="true"
android:theme="@style/Theme.Whisperandroidjava">
<activity android:name=".MainActivity">
<intent-filter>
<action android:name="android.intent.action.MAIN" />
<category android:name="android.intent.category.LAUNCHER" />
</intent-filter>
</activity>
</application>
</manifest>

View File

@ -0,0 +1,40 @@
<?xml version="1.0" encoding="UTF-8" ?>
<configuration debug="false" xmlns="http://ch.qos.logback/xml/ns/logback"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://ch.qos.logback/xml/ns/logback https://raw.githubusercontent.com/enricopulatzo/logback-XSD/master/src/main/xsd/logback.xsd
http://ch.qos.logback/xml/ns/logback ">
<!--Define the storage address of the log file Do not use relative paths in the LogBack configuration. -->
<property name="LOG_HOME" value="logs" />
<!--Formatted output: %d is the date, %-6level pads the log level to 6 characters (left-aligned), %m is the log message, %n is a newline -->
<property name="CONSOLE_LOG_PATTERN"
value="%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-6level%logger{0}.%M:%L - %m%n" />
<!-- console output -->
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
<pattern>${CONSOLE_LOG_PATTERN}</pattern>
</encoder>
</appender>
<!-- Generate log files on a daily basis -->
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
<encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
<pattern>${CONSOLE_LOG_PATTERN}</pattern>
</encoder>
<rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
<!--File name for log file output -->
<fileNamePattern>${LOG_HOME}/project-name-%d{yyyy-MM-dd}.log</fileNamePattern>
<!--Maximum number of days of log history to keep -->
<maxHistory>180</maxHistory>
</rollingPolicy>
<!--Maximum size of a single log file -->
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>10MB</maxFileSize>
</triggeringPolicy>
</appender>
<!-- Log output level and source-->
<root level="info">
<appender-ref ref="STDOUT" />
<appender-ref ref="FILE" />
</root>
</configuration>

View File

@ -0,0 +1,107 @@
package com.litongjava.whisper.android.java;
import androidx.annotation.RequiresApi;
import androidx.appcompat.app.AppCompatActivity;
import android.content.Context;
import android.os.Build;
import android.os.Bundle;
import android.os.Handler;
import android.os.Looper;
import android.view.View;
import android.widget.TextView;
import com.blankj.utilcode.util.ThreadUtils;
import com.litongjava.android.view.inject.annotation.FindViewById;
import com.litongjava.android.view.inject.annotation.FindViewByIdLayout;
import com.litongjava.android.view.inject.annotation.OnClick;
import com.litongjava.android.view.inject.utils.ViewInjectUtils;
import com.litongjava.jfinal.aop.Aop;
import com.litongjava.jfinal.aop.AopManager;
import com.litongjava.whisper.android.java.services.WhisperService;
import com.litongjava.whisper.android.java.task.LoadModelTask;
import com.litongjava.whisper.android.java.task.TranscriptionTask;
import com.litongjava.whisper.android.java.utils.AssetUtils;
import com.whispercpp.java.whisper.WhisperLib;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.File;
@FindViewByIdLayout(R.layout.activity_main)
public class MainActivity extends AppCompatActivity {
@FindViewById(R.id.sample_text)
private TextView tv;
Logger log = LoggerFactory.getLogger(this.getClass());
private WhisperService whisperService = Aop.get(WhisperService.class);
@RequiresApi(api = Build.VERSION_CODES.O)
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
//setContentView(R.layout.activity_main);
ViewInjectUtils.injectActivity(this, this);
initAopBean();
showSystemInfo();
}
private void initAopBean() {
Handler mainHandler = new Handler(Looper.getMainLooper());
AopManager.me().addSingletonObject(mainHandler);
}
@RequiresApi(api = Build.VERSION_CODES.O)
@OnClick(R.id.loadModelBtn)
public void loadModelBtn_OnClick(View v) {
Context context = getBaseContext();
ThreadUtils.executeByIo(new LoadModelTask(tv));
}
@OnClick(R.id.transcriptSampleBtn)
public void transcriptSampleBtn_OnClick(View v) {
Context context = getBaseContext();
long start = System.currentTimeMillis();
String sampleFilePath = "samples/jfk.wav";
File filesDir = context.getFilesDir();
File sampleFile = AssetUtils.copyFileIfNotExists(context, filesDir, sampleFilePath);
long end = System.currentTimeMillis();
String msg = "copy file:" + (end - start) + "ms";
outputMsg(tv, msg);
ThreadUtils.executeByIo(new TranscriptionTask(tv, sampleFile));
}
private void outputMsg(TextView tv, String msg) {
tv.append(msg + "\n");
log.info(msg);
}
@RequiresApi(api = Build.VERSION_CODES.O)
@OnClick(R.id.systemInfoBtn)
public void systemInfoBtn_OnClick(View v) {
showSystemInfo();
}
@RequiresApi(api = Build.VERSION_CODES.O)
public void showSystemInfo() {
String systemInfo = WhisperLib.getSystemInfo();
tv.append(systemInfo + "\n");
}
@OnClick(R.id.clearBtn)
public void clearBtn_OnClick(View v) {
tv.setText("");
}
@RequiresApi(api = Build.VERSION_CODES.O)
@Override
protected void onDestroy() {
super.onDestroy();
whisperService.release();
}
}

View File

@ -0,0 +1,13 @@
package com.litongjava.whisper.android.java.app;
import android.app.Application;
import com.blankj.utilcode.util.Utils;
public class App extends Application {
@Override
public void onCreate() {
super.onCreate();
Utils.init(this);
}
}

View File

@ -0,0 +1,47 @@
package com.litongjava.whisper.android.java.bean;
/**
* Created by litonglinux@qq.com on 10/21/2023_7:48 AM
*/
public class WhisperSegment {
private long start, end;
private String sentence;
public WhisperSegment() {
}
public WhisperSegment(long start, long end, String sentence) {
this.start = start;
this.end = end;
this.sentence = sentence;
}
public long getStart() {
return start;
}
public long getEnd() {
return end;
}
public String getSentence() {
return sentence;
}
public void setStart(long start) {
this.start = start;
}
public void setEnd(long end) {
this.end = end;
}
public void setSentence(String sentence) {
this.sentence = sentence;
}
@Override
public String toString() {
return "["+start+" --> "+end+"]:"+sentence;
}
}

View File

@ -0,0 +1,101 @@
package com.litongjava.whisper.android.java.services;
import android.content.Context;
import android.os.Build;
import android.os.Handler;
import android.widget.TextView;
import android.widget.Toast;
import androidx.annotation.RequiresApi;
import com.blankj.utilcode.util.ToastUtils;
import com.blankj.utilcode.util.Utils;
import com.litongjava.android.utils.dialog.AlertDialogUtils;
import com.litongjava.jfinal.aop.Aop;
import com.litongjava.whisper.android.java.bean.WhisperSegment;
import com.litongjava.whisper.android.java.single.LocalWhisper;
import com.litongjava.whisper.android.java.utils.WaveEncoder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutionException;
public class WhisperService {
private Logger log = LoggerFactory.getLogger(this.getClass());
private final Object lock = new Object();
@RequiresApi(api = Build.VERSION_CODES.O)
public void loadModel(TextView tv) {
String modelFilePath = LocalWhisper.modelFilePath;
String msg = "load model from :" + modelFilePath + "\n";
outputMsg(tv, msg);
long start = System.currentTimeMillis();
LocalWhisper.INSTANCE.init();
long end = System.currentTimeMillis();
msg = "model load successful:" + (end - start) + "ms";
outputMsg(tv, msg);
ToastUtils.showLong(msg);
}
@RequiresApi(api = Build.VERSION_CODES.O)
public void transcribeSample(TextView tv, File sampleFile) {
String msg = "";
msg = "transcribe file from :" + sampleFile.getAbsolutePath();
outputMsg(tv, msg);
long start = System.currentTimeMillis();
float[] audioData = new float[0]; // read the audio samples
try {
audioData = WaveEncoder.decodeWaveFile(sampleFile);
} catch (IOException e) {
e.printStackTrace();
return;
}
long end = System.currentTimeMillis();
msg = "decode wave file:" + (end - start) + "ms";
outputMsg(tv, msg);
start = System.currentTimeMillis();
List<WhisperSegment> transcription = null;
try {
//transcription = LocalWhisper.INSTANCE.transcribeData(audioData);
transcription = LocalWhisper.INSTANCE.transcribeDataWithTime(audioData);
} catch (ExecutionException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
end = System.currentTimeMillis();
if (transcription != null) {
ToastUtils.showLong(transcription.toString());
msg = "Transcription successful: " + (end - start) + "ms";
outputMsg(tv, msg);
outputMsg(tv, transcription.toString());
} else {
msg = "Transcription failed: " + (end - start) + "ms";
outputMsg(tv, msg);
}
}
private void outputMsg(TextView tv, String msg) {
log.info(msg);
if(tv!=null){
Aop.get(Handler.class).post(()->{ tv.append(msg + "\n");});
}
}
@RequiresApi(api = Build.VERSION_CODES.O)
public void release() {
// nothing to do
}
}

View File

@ -0,0 +1,66 @@
package com.litongjava.whisper.android.java.single;
import android.app.Application;
import android.os.Build;
import android.os.Handler;
import androidx.annotation.RequiresApi;
import com.blankj.utilcode.util.ToastUtils;
import com.blankj.utilcode.util.Utils;
import com.litongjava.jfinal.aop.Aop;
import com.litongjava.whisper.android.java.bean.WhisperSegment;
import com.litongjava.whisper.android.java.utils.AssetUtils;
import com.whispercpp.java.whisper.WhisperContext;
import java.io.File;
import java.util.List;
import java.util.concurrent.ExecutionException;
@RequiresApi(api = Build.VERSION_CODES.O)
public enum LocalWhisper {
INSTANCE;
public static final String modelFilePath = "models/ggml-tiny.bin";
private WhisperContext whisperContext;
@RequiresApi(api = Build.VERSION_CODES.O)
LocalWhisper() {
Application context = Utils.getApp();
File filesDir = context.getFilesDir();
File modelFile = AssetUtils.copyFileIfNotExists(context, filesDir, modelFilePath);
String realModelFilePath = modelFile.getAbsolutePath();
whisperContext = WhisperContext.createContextFromFile(realModelFilePath);
}
public synchronized String transcribeData(float[] data) throws ExecutionException, InterruptedException {
if(whisperContext==null){
toastModelLoading();
return null;
}else{
return whisperContext.transcribeData(data);
}
}
private static void toastModelLoading() {
Aop.get(Handler.class).post(()->{
ToastUtils.showShort("please wait for model loading");
});
}
public List<WhisperSegment> transcribeDataWithTime(float[] audioData) throws ExecutionException, InterruptedException {
if(whisperContext==null){
toastModelLoading();
return null;
}else{
return whisperContext.transcribeDataWithTime(audioData);
}
}
public void init() {
// nothing to do here; referencing INSTANCE triggers the enum constructor, which loads the model
}
}

View File

@ -0,0 +1,44 @@
package com.litongjava.whisper.android.java.task;
import android.content.Context;
import android.os.Build;
import android.os.Handler;
import android.widget.TextView;
import com.blankj.utilcode.util.ThreadUtils;
import com.litongjava.jfinal.aop.Aop;
import com.litongjava.whisper.android.java.services.WhisperService;
import java.io.File;
public class LoadModelTask extends ThreadUtils.Task<Object> {
private final TextView tv;
public LoadModelTask(TextView tv) {
this.tv = tv;
}
@Override
public Object doInBackground() {
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
Aop.get(WhisperService.class).loadModel(tv);
} else {
Aop.get(Handler.class).post(() -> {
tv.append("Not supported on this Android version");
});
}
return null;
}
@Override
public void onSuccess(Object result) {
}
@Override
public void onCancel() {
}
@Override
public void onFail(Throwable t) {
}
}

View File

@ -0,0 +1,44 @@
package com.litongjava.whisper.android.java.task;
import android.content.Context;
import android.os.Build;
import android.os.Handler;
import android.widget.TextView;
import com.blankj.utilcode.util.ThreadUtils;
import com.litongjava.jfinal.aop.Aop;
import com.litongjava.whisper.android.java.services.WhisperService;
import java.io.File;
public class TranscriptionTask extends ThreadUtils.Task<Object> {
private final TextView tv;
private final File sampleFile;
public TranscriptionTask(TextView tv, File sampleFile) {
this.tv = tv;
this.sampleFile = sampleFile;
}
@Override
public Object doInBackground() {
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
Aop.get(WhisperService.class).transcribeSample(tv, sampleFile);
} else {
// the TextView must be updated on the main thread
Aop.get(Handler.class).post(() -> tv.append("Not supported on this Android version"));
}
return null;
}
@Override
public void onSuccess(Object result) {
}
@Override
public void onCancel() {
}
@Override
public void onFail(Throwable t) {
}
}

View File

@ -0,0 +1,91 @@
package com.litongjava.whisper.android.java.utils;
import android.content.Context;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
public class AssetUtils {
private static Logger log = LoggerFactory.getLogger(AssetUtils.class);
public static File copyFileIfNotExists(Context context, File distDir, String filename) {
File dstFile = new File(distDir, filename);
if (dstFile.exists()) {
return dstFile;
} else {
File parentFile = dstFile.getParentFile();
log.info("parentFile:{}", parentFile);
if (!parentFile.exists()) {
parentFile.mkdirs();
}
AssetUtils.copyFileFromAssets(context, filename, dstFile);
}
return dstFile;
}
public static void copyDirectoryFromAssets(Context appCtx, String srcDir, String dstDir) {
if (srcDir.isEmpty() || dstDir.isEmpty()) {
return;
}
try {
if (!new File(dstDir).exists()) {
new File(dstDir).mkdirs();
}
for (String fileName : appCtx.getAssets().list(srcDir)) {
String srcSubPath = srcDir + File.separator + fileName;
String dstSubPath = dstDir + File.separator + fileName;
if (new File(srcSubPath).isDirectory()) {
copyDirectoryFromAssets(appCtx, srcSubPath, dstSubPath);
} else {
copyFileFromAssets(appCtx, srcSubPath, dstSubPath);
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
public static void copyFileFromAssets(Context appCtx, String srcPath, String dstPath) {
File dstFile = new File(dstPath);
copyFileFromAssets(appCtx, srcPath, dstFile);
}
public static void copyFileFromAssets(Context appCtx, String srcPath, File dstFile) {
if (srcPath.isEmpty()) {
return;
}
InputStream is = null;
OutputStream os = null;
try {
is = new BufferedInputStream(appCtx.getAssets().open(srcPath));
os = new BufferedOutputStream(new FileOutputStream(dstFile));
byte[] buffer = new byte[1024];
int length = 0;
while ((length = is.read(buffer)) != -1) {
os.write(buffer, 0, length);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
// close quietly; either stream may be null if opening failed
try {
if (os != null) os.close();
if (is != null) is.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}

View File

@ -0,0 +1,105 @@
package com.litongjava.whisper.android.java.utils;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;
public class WaveEncoder {
public static float[] decodeWaveFile(File file) throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (FileInputStream fis = new FileInputStream(file)) {
byte[] buffer = new byte[1024];
int bytesRead;
while ((bytesRead = fis.read(buffer)) != -1) {
baos.write(buffer, 0, bytesRead);
}
}
ByteBuffer byteBuffer = ByteBuffer.wrap(baos.toByteArray());
byteBuffer.order(ByteOrder.LITTLE_ENDIAN);
// assumes a canonical 44-byte WAV header: channel count at byte 22, 16-bit PCM data from byte 44
int channel = byteBuffer.getShort(22);
byteBuffer.position(44);
ShortBuffer shortBuffer = byteBuffer.asShortBuffer();
short[] shortArray = new short[shortBuffer.limit()];
shortBuffer.get(shortArray);
float[] output = new float[shortArray.length / channel];
for (int index = 0; index < output.length; index++) {
if (channel == 1) {
output[index] = Math.max(-1f, Math.min(1f, shortArray[index] / 32767.0f));
} else {
output[index] = Math.max(-1f, Math.min(1f, (shortArray[2 * index] + shortArray[2 * index + 1]) / 32767.0f / 2.0f));
}
}
return output;
}
public static void encodeWaveFile(File file, short[] data) throws IOException {
try (FileOutputStream fos = new FileOutputStream(file)) {
fos.write(headerBytes(data.length * 2 + 44)); // total file size = 44-byte header + 16-bit payload
ByteBuffer buffer = ByteBuffer.allocate(data.length * 2);
buffer.order(ByteOrder.LITTLE_ENDIAN);
buffer.asShortBuffer().put(data);
byte[] bytes = new byte[buffer.limit()];
buffer.get(bytes);
fos.write(bytes);
}
}
// builds a canonical 44-byte WAV header for 16-bit mono PCM at 16 kHz
private static byte[] headerBytes(int totalLength) {
if (totalLength < 44)
throw new IllegalArgumentException("Total length must be at least 44 bytes");
ByteBuffer buffer = ByteBuffer.allocate(44);
buffer.order(ByteOrder.LITTLE_ENDIAN);
buffer.put((byte) 'R');
buffer.put((byte) 'I');
buffer.put((byte) 'F');
buffer.put((byte) 'F');
buffer.putInt(totalLength - 8);
buffer.put((byte) 'W');
buffer.put((byte) 'A');
buffer.put((byte) 'V');
buffer.put((byte) 'E');
buffer.put((byte) 'f');
buffer.put((byte) 'm');
buffer.put((byte) 't');
buffer.put((byte) ' ');
buffer.putInt(16);
buffer.putShort((short) 1);
buffer.putShort((short) 1);
buffer.putInt(16000);
buffer.putInt(32000);
buffer.putShort((short) 2);
buffer.putShort((short) 16);
buffer.put((byte) 'd');
buffer.put((byte) 'a');
buffer.put((byte) 't');
buffer.put((byte) 'a');
buffer.putInt(totalLength - 44);
buffer.position(0);
byte[] bytes = new byte[buffer.limit()];
buffer.get(bytes);
return bytes;
}
}

View File

@ -0,0 +1,121 @@
package com.whispercpp.java.whisper;
import android.os.Build;
import android.util.Log;
import androidx.annotation.RequiresApi;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class CpuInfo {
private static final String LOG_TAG = "WhisperCpuConfig";
private List<String> lines;
public CpuInfo(List<String> lines) {
this.lines = lines;
}
@RequiresApi(api = Build.VERSION_CODES.N)
public int getHighPerfCpuCount0() {
try {
return getHighPerfCpuCountByFrequencies();
} catch (Exception e) {
Log.d(LOG_TAG, "Couldn't read CPU frequencies", e);
return getHighPerfCpuCountByVariant();
}
}
@RequiresApi(api = Build.VERSION_CODES.N)
private int getHighPerfCpuCountByFrequencies() {
List<Integer> frequencies = getCpuValues("processor", line -> {
try {
return getMaxCpuFrequency(Integer.parseInt(line.trim()));
} catch (IOException e) {
e.printStackTrace();
}
return 0;
}
);
Log.d(LOG_TAG, "Binned cpu frequencies (frequency, count): " + binnedValues(frequencies));
return countDroppingMin(frequencies);
}
@RequiresApi(api = Build.VERSION_CODES.N)
private int getHighPerfCpuCountByVariant() {
List<Integer> variants = getCpuValues("CPU variant", line -> Integer.parseInt(line.trim().substring(line.indexOf("0x") + 2), 16));
Log.d(LOG_TAG, "Binned cpu variants (variant, count): " + binnedValues(variants));
return countKeepingMin(variants);
}
@RequiresApi(api = Build.VERSION_CODES.N)
private Map<Integer, Integer> binnedValues(List<Integer> values) {
Map<Integer, Integer> countMap = new HashMap<>();
for (int value : values) {
countMap.put(value, countMap.getOrDefault(value, 0) + 1);
}
return countMap;
}
@RequiresApi(api = Build.VERSION_CODES.N)
private List<Integer> getCpuValues(String property, Mapper mapper) {
List<Integer> values = new ArrayList<>();
for (String line : lines) {
if (line.startsWith(property)) {
values.add(mapper.map(line.substring(line.indexOf(':') + 1)));
}
}
values.sort(Integer::compareTo);
return values;
}
@RequiresApi(api = Build.VERSION_CODES.N)
private int countDroppingMin(List<Integer> values) {
int min = values.stream().mapToInt(i -> i).min().orElse(Integer.MAX_VALUE);
return (int) values.stream().filter(value -> value > min).count();
}
@RequiresApi(api = Build.VERSION_CODES.N)
private int countKeepingMin(List<Integer> values) {
int min = values.stream().mapToInt(i -> i).min().orElse(Integer.MAX_VALUE);
return (int) values.stream().filter(value -> value.equals(min)).count();
}
@RequiresApi(api = Build.VERSION_CODES.N)
public static int getHighPerfCpuCount() {
try {
return readCpuInfo().getHighPerfCpuCount0();
} catch (Exception e) {
Log.d(LOG_TAG, "Couldn't read CPU info", e);
return Math.max(Runtime.getRuntime().availableProcessors() - 4, 0);
}
}
private static CpuInfo readCpuInfo() throws IOException {
try (BufferedReader reader = new BufferedReader(new FileReader("/proc/cpuinfo"))) {
List<String> lines = new ArrayList<>();
String line;
while ((line = reader.readLine()) != null) {
lines.add(line);
}
return new CpuInfo(lines);
}
}
private static int getMaxCpuFrequency(int cpuIndex) throws IOException {
String path = "/sys/devices/system/cpu/cpu" + cpuIndex + "/cpufreq/cpuinfo_max_freq";
try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
return Integer.parseInt(reader.readLine());
}
}
private interface Mapper {
int map(String line);
}
}

View File

@ -0,0 +1,138 @@
package com.whispercpp.java.whisper;
import android.content.res.AssetManager;
import android.os.Build;
import android.util.Log;
import androidx.annotation.RequiresApi;
import com.litongjava.whisper.android.java.bean.WhisperSegment;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class WhisperContext {
private static final String LOG_TAG = "LibWhisper";
private long ptr;
private final ExecutorService executorService;
private WhisperContext(long ptr) {
this.ptr = ptr;
this.executorService = Executors.newSingleThreadExecutor();
}
public String transcribeData(float[] data) throws ExecutionException, InterruptedException {
return executorService.submit(new Callable<String>() {
@RequiresApi(api = Build.VERSION_CODES.O)
@Override
public String call() throws Exception {
if (ptr == 0L) {
throw new IllegalStateException();
}
int numThreads = WhisperCpuConfig.getPreferredThreadCount();
Log.d(LOG_TAG, "Selecting " + numThreads + " threads");
StringBuilder result = new StringBuilder();
synchronized (this) {
WhisperLib.fullTranscribe(ptr, numThreads, data);
int textCount = WhisperLib.getTextSegmentCount(ptr);
for (int i = 0; i < textCount; i++) {
String sentence = WhisperLib.getTextSegment(ptr, i);
result.append(sentence);
}
}
return result.toString();
}
}).get();
}
public List<WhisperSegment> transcribeDataWithTime(float[] data) throws ExecutionException, InterruptedException {
return executorService.submit(new Callable<List<WhisperSegment>>() {
@RequiresApi(api = Build.VERSION_CODES.O)
@Override
public List<WhisperSegment> call() throws Exception {
if (ptr == 0L) {
throw new IllegalStateException();
}
int numThreads = WhisperCpuConfig.getPreferredThreadCount();
Log.d(LOG_TAG, "Selecting " + numThreads + " threads");
List<WhisperSegment> segments = new ArrayList<>();
synchronized (this) {
WhisperLib.fullTranscribe(ptr, numThreads, data);
int textCount = WhisperLib.getTextSegmentCount(ptr);
for (int i = 0; i < textCount; i++) {
long start = WhisperLib.getTextSegmentT0(ptr, i);
String sentence = WhisperLib.getTextSegment(ptr, i);
long end = WhisperLib.getTextSegmentT1(ptr, i);
segments.add(new WhisperSegment(start, end, sentence));
}
}
return segments;
}
}).get();
}
@RequiresApi(api = Build.VERSION_CODES.O)
public String benchMemory(int nthreads) throws ExecutionException, InterruptedException {
return executorService.submit(() -> WhisperLib.benchMemcpy(nthreads)).get();
}
@RequiresApi(api = Build.VERSION_CODES.O)
public String benchGgmlMulMat(int nthreads) throws ExecutionException, InterruptedException {
return executorService.submit(() -> WhisperLib.benchGgmlMulMat(nthreads)).get();
}
@RequiresApi(api = Build.VERSION_CODES.O)
public void release() throws ExecutionException, InterruptedException {
executorService.submit(() -> {
if (ptr != 0L) {
WhisperLib.freeContext(ptr);
ptr = 0;
}
}).get();
}
@RequiresApi(api = Build.VERSION_CODES.O)
public static WhisperContext createContextFromFile(String filePath) {
long ptr = WhisperLib.initContext(filePath);
if (ptr == 0L) {
throw new RuntimeException("Couldn't create context with path " + filePath);
}
return new WhisperContext(ptr);
}
@RequiresApi(api = Build.VERSION_CODES.O)
public static WhisperContext createContextFromInputStream(InputStream stream) {
long ptr = WhisperLib.initContextFromInputStream(stream);
if (ptr == 0L) {
throw new RuntimeException("Couldn't create context from input stream");
}
return new WhisperContext(ptr);
}
@RequiresApi(api = Build.VERSION_CODES.O)
public static WhisperContext createContextFromAsset(AssetManager assetManager, String assetPath) {
long ptr = WhisperLib.initContextFromAsset(assetManager, assetPath);
if (ptr == 0L) {
throw new RuntimeException("Couldn't create context from asset " + assetPath);
}
return new WhisperContext(ptr);
}
@RequiresApi(api = Build.VERSION_CODES.O)
public static String getSystemInfo() {
return WhisperLib.getSystemInfo();
}
}

View File

@ -0,0 +1,12 @@
package com.whispercpp.java.whisper;
import android.os.Build;
import androidx.annotation.RequiresApi;
public class WhisperCpuConfig {
@RequiresApi(api = Build.VERSION_CODES.N)
public static int getPreferredThreadCount() {
return Math.max(CpuInfo.getHighPerfCpuCount(), 2);
}
}

View File

@ -0,0 +1,75 @@
package com.whispercpp.java.whisper;
import android.content.res.AssetManager;
import android.os.Build;
import android.util.Log;
import androidx.annotation.RequiresApi;
import java.io.InputStream;
@RequiresApi(api = Build.VERSION_CODES.O)
public class WhisperLib {
private static final String LOG_TAG = "LibWhisper";
static {
Log.d(LOG_TAG, "Primary ABI: " + Build.SUPPORTED_ABIS[0]);
boolean loadVfpv4 = false;
boolean loadV8fp16 = false;
if (WhisperUtils.isArmEabiV7a()) {
String cpuInfo = WhisperUtils.cpuInfo();
if (cpuInfo != null) {
Log.d(LOG_TAG, "CPU info: " + cpuInfo);
if (cpuInfo.contains("vfpv4")) {
Log.d(LOG_TAG, "CPU supports vfpv4");
loadVfpv4 = true;
}
}
} else if (WhisperUtils.isArmEabiV8a()) {
String cpuInfo = WhisperUtils.cpuInfo();
if (cpuInfo != null) {
Log.d(LOG_TAG, "CPU info: " + cpuInfo);
if (cpuInfo.contains("fphp")) {
Log.d(LOG_TAG, "CPU supports fp16 arithmetic");
loadV8fp16 = true;
}
}
}
if (loadVfpv4) {
Log.d(LOG_TAG, "Loading libwhisper_vfpv4.so");
System.loadLibrary("whisper_vfpv4");
} else if (loadV8fp16) {
Log.d(LOG_TAG, "Loading libwhisper_v8fp16_va.so");
System.loadLibrary("whisper_v8fp16_va");
} else {
Log.d(LOG_TAG, "Loading libwhisper.so");
System.loadLibrary("whisper");
}
}
public static native long initContextFromInputStream(InputStream inputStream);
public static native long initContextFromAsset(AssetManager assetManager, String assetPath);
public static native long initContext(String modelPath);
public static native void freeContext(long contextPtr);
public static native void fullTranscribe(long contextPtr, int numThreads, float[] audioData);
public static native int getTextSegmentCount(long contextPtr);
public static native String getTextSegment(long contextPtr, int index);
public static native long getTextSegmentT0(long contextPtr, int index);
public static native long getTextSegmentT1(long contextPtr, int index);
public static native String getSystemInfo();
public static native String benchMemcpy(int nthread);
public static native String benchGgmlMulMat(int nthread);
}

View File

@ -0,0 +1,34 @@
package com.whispercpp.java.whisper;
import android.os.Build;
import android.util.Log;
import androidx.annotation.RequiresApi;
import java.io.File;
import java.nio.file.Path;
public class WhisperUtils {
private static final String LOG_TAG = "LibWhisper";
public static boolean isArmEabiV7a() {
return Build.SUPPORTED_ABIS[0].equals("armeabi-v7a");
}
public static boolean isArmEabiV8a() {
return Build.SUPPORTED_ABIS[0].equals("arm64-v8a");
}
@RequiresApi(api = Build.VERSION_CODES.O)
public static String cpuInfo() {
try {
Path path = new File("/proc/cpuinfo").toPath();
return new String(java.nio.file.Files.readAllBytes(path));
} catch (Exception e) {
Log.w(LOG_TAG, "Couldn't read /proc/cpuinfo", e);
return null;
}
}
}

View File

@ -0,0 +1,56 @@
cmake_minimum_required(VERSION 3.10)
project(whisper.cpp)
set(CMAKE_CXX_STANDARD 11)
set(WHISPER_LIB_DIR ${CMAKE_SOURCE_DIR}/../../../../../../../)
set(
SOURCE_FILES
${WHISPER_LIB_DIR}/ggml.c
${WHISPER_LIB_DIR}/ggml-alloc.c
${WHISPER_LIB_DIR}/ggml-backend.c
${WHISPER_LIB_DIR}/ggml-quants.c
${WHISPER_LIB_DIR}/whisper.cpp
${CMAKE_SOURCE_DIR}/jni.c
)
find_library(LOG_LIB log)
function(build_library target_name)
add_library(
${target_name}
SHARED
${SOURCE_FILES}
)
target_link_libraries(${target_name} ${LOG_LIB} android)
if (${target_name} STREQUAL "whisper_v8fp16_va")
target_compile_options(${target_name} PRIVATE -march=armv8.2-a+fp16)
elseif (${target_name} STREQUAL "whisper_vfpv4")
target_compile_options(${target_name} PRIVATE -mfpu=neon-vfpv4)
endif ()
if (NOT ${CMAKE_BUILD_TYPE} STREQUAL "Debug")
target_compile_options(${target_name} PRIVATE -O3)
target_compile_options(${target_name} PRIVATE -fvisibility=hidden -fvisibility-inlines-hidden)
target_compile_options(${target_name} PRIVATE -ffunction-sections -fdata-sections)
#target_link_options(${target_name} PRIVATE -Wl,--gc-sections)
#target_link_options(${target_name} PRIVATE -Wl,--exclude-libs,ALL)
#target_link_options(${target_name} PRIVATE -flto)
endif ()
endfunction()
build_library("whisper") # Default target
if (${ANDROID_ABI} STREQUAL "arm64-v8a")
build_library("whisper_v8fp16_va")
elseif (${ANDROID_ABI} STREQUAL "armeabi-v7a")
build_library("whisper_vfpv4")
endif ()
include_directories(${WHISPER_LIB_DIR})

View File

@ -0,0 +1,257 @@
#include <jni.h>
#include <android/asset_manager.h>
#include <android/asset_manager_jni.h>
#include <android/log.h>
#include <stdlib.h>
#include <sys/sysinfo.h>
#include <string.h>
#include "whisper.h"
#include "ggml.h"
#define UNUSED(x) (void)(x)
#define TAG "JNI"
#define LOGI(...) __android_log_print(ANDROID_LOG_INFO, TAG, __VA_ARGS__)
#define LOGW(...) __android_log_print(ANDROID_LOG_WARN, TAG, __VA_ARGS__)
static inline int min(int a, int b) {
return (a < b) ? a : b;
}
static inline int max(int a, int b) {
return (a > b) ? a : b;
}
struct input_stream_context {
size_t offset;
JNIEnv * env;
jobject thiz;
jobject input_stream;
jmethodID mid_available;
jmethodID mid_read;
};
size_t inputStreamRead(void * ctx, void * output, size_t read_size) {
struct input_stream_context* is = (struct input_stream_context*)ctx;
jint avail_size = (*is->env)->CallIntMethod(is->env, is->input_stream, is->mid_available);
jint size_to_copy = read_size < avail_size ? (jint)read_size : avail_size;
jbyteArray byte_array = (*is->env)->NewByteArray(is->env, size_to_copy);
jint n_read = (*is->env)->CallIntMethod(is->env, is->input_stream, is->mid_read, byte_array, 0, size_to_copy);
if (size_to_copy != read_size || size_to_copy != n_read) {
LOGI("Insufficient Read: Req=%zu, ToCopy=%d, Available=%d", read_size, size_to_copy, n_read);
}
jbyte* byte_array_elements = (*is->env)->GetByteArrayElements(is->env, byte_array, NULL);
memcpy(output, byte_array_elements, size_to_copy);
(*is->env)->ReleaseByteArrayElements(is->env, byte_array, byte_array_elements, JNI_ABORT);
(*is->env)->DeleteLocalRef(is->env, byte_array);
is->offset += size_to_copy;
return size_to_copy;
}
bool inputStreamEof(void * ctx) {
struct input_stream_context* is = (struct input_stream_context*)ctx;
jint result = (*is->env)->CallIntMethod(is->env, is->input_stream, is->mid_available);
return result <= 0;
}
void inputStreamClose(void * ctx) {
}
JNIEXPORT jlong JNICALL
Java_com_whispercpp_java_whisper_WhisperLib_initContextFromInputStream(
JNIEnv *env, jobject thiz, jobject input_stream) {
UNUSED(thiz);
struct whisper_context *context = NULL;
struct whisper_model_loader loader = {};
struct input_stream_context inp_ctx = {};
inp_ctx.offset = 0;
inp_ctx.env = env;
inp_ctx.thiz = thiz;
inp_ctx.input_stream = input_stream;
jclass cls = (*env)->GetObjectClass(env, input_stream);
inp_ctx.mid_available = (*env)->GetMethodID(env, cls, "available", "()I");
inp_ctx.mid_read = (*env)->GetMethodID(env, cls, "read", "([BII)I");
loader.context = &inp_ctx;
loader.read = inputStreamRead;
loader.eof = inputStreamEof;
loader.close = inputStreamClose;
loader.eof(loader.context);
context = whisper_init(&loader);
return (jlong) context;
}
static size_t asset_read(void *ctx, void *output, size_t read_size) {
return AAsset_read((AAsset *) ctx, output, read_size);
}
static bool asset_is_eof(void *ctx) {
return AAsset_getRemainingLength64((AAsset *) ctx) <= 0;
}
static void asset_close(void *ctx) {
AAsset_close((AAsset *) ctx);
}
static struct whisper_context *whisper_init_from_asset(
JNIEnv *env,
jobject assetManager,
const char *asset_path
) {
LOGI("Loading model from asset '%s'\n", asset_path);
AAssetManager *asset_manager = AAssetManager_fromJava(env, assetManager);
AAsset *asset = AAssetManager_open(asset_manager, asset_path, AASSET_MODE_STREAMING);
if (!asset) {
LOGW("Failed to open '%s'\n", asset_path);
return NULL;
}
whisper_model_loader loader = {
.context = asset,
.read = &asset_read,
.eof = &asset_is_eof,
.close = &asset_close
};
return whisper_init(&loader);
}
JNIEXPORT jlong JNICALL
Java_com_whispercpp_java_whisper_WhisperLib_initContextFromAsset(
JNIEnv *env, jobject thiz, jobject assetManager, jstring asset_path_str) {
UNUSED(thiz);
struct whisper_context *context = NULL;
const char *asset_path_chars = (*env)->GetStringUTFChars(env, asset_path_str, NULL);
context = whisper_init_from_asset(env, assetManager, asset_path_chars);
(*env)->ReleaseStringUTFChars(env, asset_path_str, asset_path_chars);
return (jlong) context;
}
JNIEXPORT jlong JNICALL
Java_com_whispercpp_java_whisper_WhisperLib_initContext(
JNIEnv *env, jobject thiz, jstring model_path_str) {
UNUSED(thiz);
struct whisper_context *context = NULL;
const char *model_path_chars = (*env)->GetStringUTFChars(env, model_path_str, NULL);
context = whisper_init_from_file(model_path_chars);
(*env)->ReleaseStringUTFChars(env, model_path_str, model_path_chars);
return (jlong) context;
}
JNIEXPORT void JNICALL
Java_com_whispercpp_java_whisper_WhisperLib_freeContext(
JNIEnv *env, jobject thiz, jlong context_ptr) {
UNUSED(env);
UNUSED(thiz);
struct whisper_context *context = (struct whisper_context *) context_ptr;
whisper_free(context);
}
JNIEXPORT void JNICALL
Java_com_whispercpp_java_whisper_WhisperLib_fullTranscribe(
JNIEnv *env, jobject thiz, jlong context_ptr, jint num_threads, jfloatArray audio_data) {
UNUSED(thiz);
struct whisper_context *context = (struct whisper_context *) context_ptr;
jfloat *audio_data_arr = (*env)->GetFloatArrayElements(env, audio_data, NULL);
const jsize audio_data_length = (*env)->GetArrayLength(env, audio_data);
// the parameters below are adapted from the Objective-C iOS sample
struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
params.print_realtime = true;
params.print_progress = false;
params.print_timestamps = true;
params.print_special = false;
params.translate = false;
params.language = "en";
params.n_threads = num_threads;
params.offset_ms = 0;
params.no_context = true;
params.single_segment = false;
whisper_reset_timings(context);
LOGI("About to run whisper_full");
if (whisper_full(context, params, audio_data_arr, audio_data_length) != 0) {
LOGI("Failed to run the model");
} else {
whisper_print_timings(context);
}
(*env)->ReleaseFloatArrayElements(env, audio_data, audio_data_arr, JNI_ABORT);
}
JNIEXPORT jint JNICALL
Java_com_whispercpp_java_whisper_WhisperLib_getTextSegmentCount(
JNIEnv *env, jobject thiz, jlong context_ptr) {
UNUSED(env);
UNUSED(thiz);
struct whisper_context *context = (struct whisper_context *) context_ptr;
return whisper_full_n_segments(context);
}
JNIEXPORT jstring JNICALL
Java_com_whispercpp_java_whisper_WhisperLib_getTextSegment(
JNIEnv *env, jobject thiz, jlong context_ptr, jint index) {
UNUSED(thiz);
struct whisper_context *context = (struct whisper_context *) context_ptr;
const char *text = whisper_full_get_segment_text(context, index);
jstring string = (*env)->NewStringUTF(env, text);
return string;
}
JNIEXPORT jlong JNICALL
Java_com_whispercpp_java_whisper_WhisperLib_getTextSegmentT0(JNIEnv *env, jobject thiz,jlong context_ptr, jint index) {
UNUSED(thiz);
struct whisper_context *context = (struct whisper_context *) context_ptr;
const int64_t t0 = whisper_full_get_segment_t0(context, index);
return (jlong)t0;
}
JNIEXPORT jlong JNICALL
Java_com_whispercpp_java_whisper_WhisperLib_getTextSegmentT1(JNIEnv *env, jobject thiz,jlong context_ptr, jint index) {
UNUSED(thiz);
struct whisper_context *context = (struct whisper_context *) context_ptr;
const int64_t t1 = whisper_full_get_segment_t1(context, index);
return (jlong)t1;
}
JNIEXPORT jstring JNICALL
Java_com_whispercpp_java_whisper_WhisperLib_getSystemInfo(
JNIEnv *env, jobject thiz
) {
UNUSED(thiz);
const char *sysinfo = whisper_print_system_info();
jstring string = (*env)->NewStringUTF(env, sysinfo);
return string;
}
JNIEXPORT jstring JNICALL
Java_com_whispercpp_java_whisper_WhisperLib_benchMemcpy(JNIEnv *env, jobject thiz,
jint n_threads) {
UNUSED(thiz);
const char *bench_ggml_memcpy = whisper_bench_memcpy_str(n_threads);
jstring string = (*env)->NewStringUTF(env, bench_ggml_memcpy);
return string; // was missing; the function is declared to return a jstring
}
JNIEXPORT jstring JNICALL
Java_com_whispercpp_java_whisper_WhisperLib_benchGgmlMulMat(JNIEnv *env, jobject thiz,
jint n_threads) {
UNUSED(thiz);
const char *bench_ggml_mul_mat = whisper_bench_ggml_mul_mat_str(n_threads);
jstring string = (*env)->NewStringUTF(env, bench_ggml_mul_mat);
return string; // was missing; the function is declared to return a jstring
}

Some files were not shown because too many files have changed in this diff.