hydai
b138ff2be3
cuda: fix vmm oom issue on NVIDIA AGX Orin (llama/4687)
...
Signed-off-by: hydai <hydai@secondstate.io>
2024-01-03 14:43:51 +02:00
Guillaume Wenzek
cf6f1e4181
ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639)
...
* add more int ops
* ggml_compute_forward_dup_bytes
* add tests
* PR comments
* tests : minor indentations
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-01-03 14:43:51 +02:00
Georgi Gerganov
620a223814
scripts : fix sync order + metal sed
2024-01-03 14:43:51 +02:00
Andreu Huguet
f39f9690ec
examples : fix WASM Stack Overflow ( #1713 )
...
Fix for problem:
"""
RuntimeError: Aborted(Stack overflow! Stack cookie has been overwritten at 0x12be2b10, expected hex dwords 0x89BACDFE and 0x2135467, but received 0x00000000 0x00000000)
"""
That appears when executing the WASM example with the newer versions.
2024-01-02 16:50:04 +00:00
bobqianic
f9ca90256b
docker : fix the publishing of the CUDA Docker image ( #1704 )
2023-12-30 23:12:31 +02:00
Georgi Gerganov
2623640cd6
scripts : do not sync commits from this repo
2023-12-29 15:03:08 +02:00
Tamotsu Takahashi
d87de61ae6
ci : build with CLBlast + ggml-opencl use GGML_API ( #1576 )
...
* Build with CLBlast
* Declare GGML_API
After rebasing, examples/talk-llama failed:
"D:\a\whisper.cpp\whisper.cpp\build\ALL_BUILD.vcxproj" (build target) (1) ->
"D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj" (default target) (14) ->
(Link target) ->
llama.obj : error LNK2019: unresolved external symbol ggml_cl_free_data referenced in function "public: __cdecl llama_model::~llama_model(void)" (??1llama_model@@QEAA@XZ) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
llama.obj : error LNK2019: unresolved external symbol ggml_cl_transform_tensor referenced in function "public: void __cdecl llama_model_loader::load_all_data(struct ggml_context *,void (__cdecl*)(float,void *),void *,struct llama_mlock *)" (?load_all_data@llama_model_loader@@QEAAXPEAUggml_context@@P6AXMPEAX@Z1PEAUllama_mlock@@@Z) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
D:\a\whisper.cpp\whisper.cpp\build\bin\Release\talk-llama.exe : fatal error LNK1120: 2 unresolved externals [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
2023-12-29 12:23:27 +02:00
bobqianic
f5f485f899
whisper : replace tensor->n_dims with ggml_n_dims(tensor) ( #1694 )
2023-12-29 11:38:35 +02:00
Georgi Gerganov
e77b27c331
sync : ggml (VMM, sync-ggml-am, dotprod ARM fixes, CUDA fixes) ( #1691 )
...
* scripts : add sync-ggml-am.sh
* sync : ggml (VMM, ARM dot prod fix, etc.)
* build : fix CUDA build
* ggml : fix some mul mat cases + add tests for src1 F16
dbd02958fa
2023-12-29 11:30:47 +02:00
Dimo
a5cc3dc8a2
download : fix large q5 model name ( #1695 )
...
fixed typo in large-v3-q5-0 model name to match HF link
2023-12-29 11:14:32 +02:00
bobqianic
37a709f655
whisper : Replace WHISPER_PRINT_DEBUG with WHISPER_LOG_DEBUG ( #1681 )
2023-12-23 12:02:58 +00:00
Georgi Gerganov
3a5302108d
sync : ggml (ggml_scale, ggml_row_size, etc.) ( #1677 )
...
* sync : ggml
* sync : llama.cpp
* talk-llama : fix obsolete param
* ggml-alloc : fix ggml_tallocr_is_own
* talk.wasm : update to new ggml
* ggml : fix type punning in ggml_scale
* ggml : cuda jetson + arm quants warnings
2023-12-22 17:53:39 +02:00
Chaoqun
d2ee117a0a
docker : Dockerize whisper.cpp ( #1674 )
...
* build: add dockerfile for ci
* ci: add action to build/push docker image
* fix: lowercase repository to fix ci
* ci: update cuBLAS flag
* build: install curl and ffmpeg in image
* docs: add docker section
* fix: improve args check when download model
2023-12-22 11:16:02 +00:00
bobqianic
db8ccdb850
CI : Add coverage for talk-llama when WHISPER_CUBLAS=1 ( #1672 )
2023-12-21 22:39:46 +00:00
bobqianic
d2419030b0
examples : Revert CMakeLists.txt for talk-llama ( #1669 )
2023-12-21 22:48:52 +02:00
bobqianic
8986690c2a
cmake : set default CUDA architectures ( #1667 )
2023-12-21 15:44:04 +02:00
Alfredo Montesinos
9286d3f584
bench.py : add different large models ( #1655 )
...
Add the different large models (v1, v2, v3) to the benchmark.
2023-12-19 12:40:14 +02:00
Georgi Gerganov
940de9dbe9
wchess : update README.md
2023-12-14 22:00:47 +02:00
Georgi Gerganov
88112c8afb
release : v1.5.2
2023-12-14 17:56:39 +02:00
Georgi Gerganov
375585c07c
wchess : update readme
2023-12-14 17:51:14 +02:00
fraxy-v
fd99ece8e3
wchess : whisper assisted chess ( #1595 )
...
* wchess: whisper assisted chess
* wchess: fix allowed moves in check
* wchess: touchstart, touchend events
* wchess: css, disabled button
* wchess : html touches
* wchess : minor fixes and code style
* wchess : bump encoder context to 1280
* wchess : index.html
* wchess : fix CI warnings
* wchess : add array header
* wchess : build static library
* wchess : display grammar
* wchess : update UX
* wchess : add comment
* wchess : add README
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-14 15:58:26 +02:00
Georgi Gerganov
8171e621fc
sync : ggml (Metal fixes, new ops, tests) ( #1633 )
...
* sync : ggml (Metal fixes, new ops, tests)
* cuda : fix bin bcast when src1 and dst have different types
2023-12-13 21:55:03 +02:00
Kreijstal
ec03661b20
cmake : target windows 8 or above for prefetchVirtualMemory in llama-talk ( #1617 )
...
Since we use prefetchVirtualMemory, we specify that we target Windows 8 or above; otherwise other compilers will refuse to use the prefetchVirtualMemory API. (I understand you are loading it dynamically, but the header definition has this limitation.)
2023-12-12 11:35:00 +00:00
Kreijstal
6335933a5b
cmake : Fix bug in httplib.h for mingw ( #1615 )
...
Fix bug in httplib.h for mingw; please see https://github.com/yhirose/cpp-httplib/issues/1669
2023-12-10 17:47:52 +00:00
Finn Voorhees
885b5563d0
metal : fix ggml_metal_log vargs ( #1606 )
2023-12-08 13:50:50 +02:00
Georgi Gerganov
9521ba6801
whisper.objc : disable timestamps for real-time transcription
2023-12-08 13:43:37 +02:00
Georgi Gerganov
29511d33c7
whisper : more debug messages + fix fallback logic
2023-12-08 13:43:12 +02:00
Georgi Gerganov
7bc4d22337
metal : fix soft_max kernel src1 argument ( #1602 )
2023-12-08 13:39:32 +02:00
Georgi Gerganov
afce6fa113
sync : ggml (new ops, new backend, etc) ( #1602 )
...
* sync : ggml (new ops, new backend, etc)
* whisper : remove obsolete broadcasting code
* ggml : remove backend self-registers + fix ggml_concat + n_task logic
* metal : fix assert
* metal : print resource path
* whisper : fix bug if metal init fails
2023-12-07 22:27:19 +02:00
Oleg Sidorov
3163090d89
server : pass max-len argument to the server ( #1574 )
...
This commit fixes the missing parameter binding for max-len between the input
arguments and wparams.
2023-12-05 23:01:45 +02:00
Finn Voorhees
f0efd0202d
ios : Remove #if arch(arm) check for using Metal ( #1561 )
2023-12-05 01:14:26 +00:00
Digipom
3c28d1a571
ggml : Fix 32-bit compiler warning ( #1575 )
...
Warning about %lu on 32-bit targets. Updated to %zu.
2023-12-03 14:15:28 +00:00
Georgi Gerganov
e369243ebd
ggml : re-enable blas for src0 != F32 ( #1583 )
2023-12-01 23:57:52 +02:00
Aleksander Andrzejewski
a0ec3fac54
Server : Add support for .vtt format to Whisper server ( #1578 )
...
- The code comes from examples/main
- The output mimetype is set to text/vtt
Example usage:
```shell
curl 127.0.0.1:8080/inference \
-H "Content-Type: multipart/form-data" \
-F file="@samples/jfk.wav" \
-F temperature="0.2" \
-F response-format="vtt"
```
2023-11-30 23:44:26 +00:00
Oleg Sidorov
6559b538e5
server : backport .srt output format ( #1565 )
...
This commit adds support for the .srt format to the Whisper server. The code is
effectively backported from examples/main. The output mimetype is set to
application/x-subrip as per https://en.wikipedia.org/wiki/SubRip .
Example usage:
curl 127.0.0.1:8080/inference \
-H "Content-Type: multipart/form-data" \
-F file="@<file-path>" \
-F temperature="0.2" \
-F response-format="srt"
2023-11-28 15:42:58 +02:00
Gregor Jasny
73d5005880
cmake : install required ggml.h header ( #1568 )
2023-11-28 15:41:49 +02:00
Kasumi
6b094b6dfe
server : set default CORS headers to allow all ( #1567 )
2023-11-28 11:55:20 +02:00
Hang
641f2f4282
readme : update help ( #1560 )
2023-11-27 12:04:08 +02:00
bobqianic
bfacd9f8ce
CI : Add CUDA 11.8.0 support ( #1554 )
...
* try to fix cublas build in CI
* add multiple cuda-toolkit version
* Update build.yml
* Disable CUDA-toolkit 10.2.89
2023-11-27 12:03:16 +02:00
bobqianic
f52e74d4dc
CI : Rectify the Clang-Related workflow issues ( #1551 )
...
* fix bugs in workflow
* fix missing clang in workflow
* Update build.yml
2023-11-27 11:35:37 +02:00
Ismatulla Mansurov
23c21e92eb
server : automatically convert audio on the server ( #1539 )
...
* server : automatically convert audio on the server
* server : remove redundant comments
* server : automatic conversion refactor
* server : update server readme
* server : remove unnecessary comments and tabs
* server : put back remove calling
* server : apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : check ffmpeg before the server launch
* server : fix indentation
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : fix function typo calling
* server : fix function typo calling
* server : add warning in readme
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-27 11:28:34 +02:00
Georgi Gerganov
447d49530c
whisper : remove trailing whitespaces
2023-11-24 13:13:21 +02:00
Georgi Gerganov
9d6ebd877c
release : v1.5.1
2023-11-24 12:41:55 +02:00
Georgi Gerganov
0ba365f958
metal : add backend function to check device family support ( #1547 )
2023-11-24 12:37:08 +02:00
Georgi Gerganov
010c8ec3ab
cuda : sync some minor stuff from llama.cpp ( #1548 )
2023-11-24 12:36:21 +02:00
Georgi Gerganov
ffdb5c4735
whisper : fix typo
2023-11-24 09:45:10 +02:00
ecneladis
a5881d619c
server : add --print-realtime param ( #1541 )
...
* server : add --print-realtime param
* Fix duplicate realtime output
2023-11-24 09:35:02 +02:00
bradmit
34f70b3a56
whisper : add whisper_lang_str_full ( #1546 )
...
* Update whisper.h
add whisper_lang_fullstr to retrieve the full language name
* Update whisper.cpp
add whisper_lang_fullstr to return the full language name
* fullstr -> str_full
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-24 09:33:13 +02:00
Okabintaro
8328d1900f
fix(server): typo in temperature parameter ( #1545 )
...
Also fixed another typo in comments.
2023-11-23 20:59:36 +02:00
sandrohanea
d2bd5f0bdc
metal : fix build ( #1544 )
2023-11-23 20:20:53 +02:00