Commit Graph

52 Commits

Author SHA1 Message Date
1f50a7d29f sync : llama.cpp 2024-01-17 21:23:33 +02:00
f6614155e4 talk-llama : optional wake-up command and audio confirmation (#1765)
* talk-llama: add optional wake-word detection from command

* talk-llama: add optional audio confirmation before generating answer

* talk-llama: fix small formatting issue in output

* talk-llama.cpp: fix Windows build
2024-01-16 15:52:01 +02:00
6ebba525f1 talk-llama : sync llama.cpp 2024-01-14 18:08:20 +02:00
2a5874441d talk-llama : llama.cpp 2024-01-14 11:06:28 +02:00
f001a3b7b6 talk-llama : sync llama.cpp 2024-01-14 00:13:17 +02:00
db078a9ba8 talk-llama : add optional CLI arg to set the bot name (#1764) 2024-01-13 20:51:35 +02:00
40ae0962f4 talk-llama : sync llama.cpp 2024-01-12 22:04:51 +02:00
00b7a4be02 talk-llama : sync llama.cpp 2024-01-11 22:10:10 +02:00
bcc1658cd0 talk-llama : add optional Piper TTS support (#1749)
Add commented-out command to optionally use Piper (https://github.com/rhasspy/piper) as text-to-speech solution for the talk-llama example. Piper voices sound almost like real people which is a big improvement (e.g.) from something like espeak.
2024-01-10 16:15:28 +02:00
3b8c2dff57 talk-llama : sync latest llama.cpp 2024-01-06 17:22:57 +02:00
d87de61ae6 ci : build with CLBlast + ggml-opencl use GGML_API (#1576)
* Build with CLBlast

* Declare GGML_API

After rebasing, examples/talk-llama failed:

"D:\a\whisper.cpp\whisper.cpp\build\ALL_BUILD.vcxproj" (build target) (1) ->
"D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj" (default target) (14) ->
(Link target) ->
  llama.obj : error LNK2019: unresolved external symbol ggml_cl_free_data referenced in function "public: __cdecl llama_model::~llama_model(void)" (??1llama_model@@QEAA@XZ) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
  llama.obj : error LNK2019: unresolved external symbol ggml_cl_transform_tensor referenced in function "public: void __cdecl llama_model_loader::load_all_data(struct ggml_context *,void (__cdecl*)(float,void *),void *,struct llama_mlock *)" (?load_all_data@llama_model_loader@@QEAAXPEAUggml_context@@P6AXMPEAX@Z1PEAUllama_mlock@@@Z) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
  D:\a\whisper.cpp\whisper.cpp\build\bin\Release\talk-llama.exe : fatal error LNK1120: 2 unresolved externals [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]
2023-12-29 12:23:27 +02:00
3a5302108d sync : ggml (ggml_scale, ggml_row_size, etc.) (#1677)
* sync : ggml

* sync : llama.cpp

* talk-llama : fix obsolete param

* ggml-alloc : fix ggml_tallocr_is_own

* talk.wasm : update to new ggml

* ggml : fix type punning in ggml_scale

* ggml : cuda jetson + arm quants warnings
2023-12-22 17:53:39 +02:00
d2419030b0 examples : Revert CMakeLists.txt for talk-llama (#1669) 2023-12-21 22:48:52 +02:00
ec03661b20 cmake : target windows 8 or above for prefetchVirtualMemory in llama-talk (#1617)
Since we use prefetchVirtualMemory we specify we target win 8 or above, otherwise other compilers will refuse to use the prefetchVirtualMemory api, (I understand you are loading it dynamically but the header definition has this limitation)
2023-12-12 11:35:00 +00:00
7883d1cae4 talk-llama : improve quote and backtick handling (#1364)
* ISSUE-1329: replace " with ' so it doesn't try to execute code in backticks.

* Typo

* Update to keep possessives in the output

Closes the ' then puts a ' in quotes then reopens the ' to escape the ' characters.
2023-11-16 10:34:05 +02:00
ccc85b4ff8 talk-llama : enable GPU by default 2023-11-15 21:33:00 +02:00
c23598e4ca talk-llama : add n_gpu_layers parameter (#1475) 2023-11-13 10:04:16 +02:00
37947203e6 talk-llama : add language auto detect (#1467)
* Add '-l auto' to talk-llama example

* Update examples/talk-llama/talk-llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-09 19:21:44 +02:00
b629d2d4fe cmake : fix talk-llama build 2023-11-07 11:03:21 +02:00
75dc800d21 talk-llama : fix n_gpu_layers usage again (#1442) 2023-11-07 10:51:27 +02:00
3989b29a9b examples : fix n_gpu_layers usage in talk-llama (#1441) 2023-11-07 01:36:23 +00:00
0463028bc2 whisper : add context param to disable gpu (#1293)
* whisper : check state->ctx_metal not null

* whisper : add whisper_context_params { use_gpu }

* whisper : new API with params & deprecate old API

* examples : use no-gpu param && whisper_init_from_file_with_params

* whisper.objc : enable metal & disable on simulator

* whisper.swiftui, metal : enable metal & support load default.metallib

* whisper.android : use new API

* bindings : use new API

* addon.node : fix build & test

* bindings : updata java binding

* bindings : add missing whisper_context_default_params_by_ref WHISPER_API for java

* metal : use SWIFTPM_MODULE_BUNDLE for GGML_SWIFT and reuse library load

* metal : move bundle var into block

* metal : use SWIFT_PACKAGE instead of GGML_SWIFT

* style : minor updates

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-11-06 11:04:24 +02:00
f96e1c5b78 sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.) (#1422)
* sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.)

* metal : allow env metal variable to override resource path (#1415)

* Allow env variable to override resource path

* Update ggml-metal.m

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* sync : restore common / main from `master`

* sync : restore whisper from `master`

* talk-llama : update to latest llama.cpp

* ruby : fix build

* ggml : fix 32-bit ARM build

* ggml : fix MIN / MAX macro collisions + update ios bindings

* ggml : fix ifdefs and MIN / MAX again

* exampels : fix Obj-C and Swift examples

* ggml : fix 32-bit ARM compatibility

* ggml : one more attempt to fix 32-bit ARM compat

* whisper : fix support for larger graphs

---------

Co-authored-by: Chris Raethke <codesoda@users.noreply.github.com>
2023-11-03 21:35:05 +02:00
d445098c8f talk-llama : move up-to-date demo to top (#1417) 2023-11-02 18:50:13 +02:00
74de25158e talk-llama : add an up-to-date demo video 2023-11-02 15:28:48 +02:00
1ca4041b86 talk-llama : update to latest llama.cpp 2023-09-15 20:06:31 +03:00
93935980f8 whisper : Metal and ggml-alloc support (#1270)
* metal : init

* whisper : factor out graph builds

* whisper : allocate encoder and decoder using ggml-alloc

* whisper : ggml-alloc is now supported

* whisper : CoreML support ggml-alloc

* build : fix ggml-alloc

* ios : update submodule

* extra : update sync-ggml.sh script to also sync ggml-alloc

* ci : see if this is causing the crash

* whisper : refactor ggml-alloc init

* whisper.android : try to fix build

* whisper : initial Metal version

* ci : try to debug vmem issue

* metal : decoder works on GPU!

* metal : add multi-decoder support

* ggml : fix ggml_nbytes (probably temp solution)

* metal : run "cross" step on the GPU

* whisper : remove ggml_repeat in the encoder

* whisper : offload the Encoder to Metal

* ggml : use simpler ggml_bytes() implementation

* ggml-alloc : try to make CI happy by reducing vram to 128GB

* whisper : add whisper_allocr to wrap ggml_allocr

* whisper : factor out alloc init in a function

* cmake : update to support Metal build

* whisper : add <functional> header

* objc : fix build (no Metal yet)

* ios : add Metal support

* swiftui : fix build

* metal : speed-up KQ multiplication

* metal : sync latest llama.cpp kernels

* readme : add Metal info

* ios : update submodule

* coreml : add code to toggle Core ML config (CPU, ANE, GPU)

* bench : fix timings by running a pre-heat

* bench : start benching the decoder

* whisper : add ggml_mul_mat_pad

* bench : fix uninitialized vars

* whisper : add comment for disabling mul-mat padding

* whisper : add description of ggml_mul_mat_pad

* whisper : clean-up ggml_mul_mat_pad

* metal : remove the "concurrent" flag

* bench : variable n_past

* ios : update SPM package
2023-09-15 12:18:18 +03:00
b55b505690 build : do not use _GNU_SOURCE gratuitously (#1129)
* Do not use _GNU_SOURCE gratuitously.

What is needed to build whisper.cpp and examples is availability of
stuff defined in The Open Group Base Specifications Issue 6
(https://pubs.opengroup.org/onlinepubs/009695399/) known also as
Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions,
plus some stuff from BSD that is not specified in POSIX.1.

Well, that was true until NUMA support was added recently in ggml,
so enable GNU libc extensions for Linux builds to cover that.

There is no need to penalize musl libc which simply follows standards.

Not having feature test macros in source code gives greater flexibility
to those wanting to reuse it in 3rd party app, as they can build it with
minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs.

It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2.

* examples : include SDL headers before other headers

Avoid macOS build error when _DARWIN_C_SOURCE is not defined, brought by
SDL2 relying on Darwin extension memset_pattern4/8/16 (from string.h).

* make : enable BSD extensions for DragonFlyBSD to expose RLIMIT_MEMLOCK

* make : use BSD-specific FTMs to enable alloca on BSDs

* make : fix OpenBSD build by exposing newer POSIX definitions

* cmake : follow recent FTM improvements from Makefile
2023-09-07 12:36:14 +03:00
2818de21ff examples : fix build + compile warnings (close #1256) 2023-09-07 12:33:12 +03:00
fdf58a6668 talk-llama : fix new rope interface 2023-07-03 19:24:01 +03:00
8ba42095c5 Revert "ggml : do not use _GNU_SOURCE gratuitously (#1027)"
This reverts commit 3f7a03ebe3.
2023-07-02 21:53:52 +03:00
85ed71aaec talk-llama : fix build on macOS (#1062)
* talk-llama : use posix_madvise() instead of madvise() derived from BSD

sed -i 's,\<madvise\>,posix_&,g;s,\<MADV_,POSIX_&,g' examples/talk-llama/llama-util.h

* make : enable Darwin extensions for macOS builds

This is an attempt at fixing macOS build error coming from the fact that
RLIMIT_MEMLOCK define is not available there without Darwin extensions.
2023-06-28 22:34:50 +03:00
3f7a03ebe3 ggml : do not use _GNU_SOURCE gratuitously (#1027)
* Do not use _GNU_SOURCE gratuitously.

What is needed to build whisper.cpp and examples is availability of
stuff defined in The Open Group Base Specifications Issue 6
(https://pubs.opengroup.org/onlinepubs/009695399/) known also as
Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions.

There is no need to penalize musl libc which simply follows standards.

Not having feature test macros in source code gives greater flexibility
to those wanting to reuse it in 3rd party app, as they can build it with
minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs.

It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2.

* examples : include SDL headers before other headers

This is an attempt at fixing macOS build error coming from SDL2 relying
on Darwin extension memset_pattern4/8/16 coming from Apple's string.h.
2023-06-25 16:34:30 +03:00
62642bb61c talk-llama : fix build after ggml sync (#1049)
sed -i 's,GGML_BACKEND_CUDA,GGML_BACKEND_GPU,g' examples/talk-llama/llama.cpp
2023-06-25 16:13:50 +03:00
5b9e59bc07 speak scripts for Windows 2023-06-01 22:45:00 +10:00
5e2b3407ef examples : update elevenlabs scripts to use official python API (#837)
* Update elevenlabs example to use ufficial python API

* Update elevenlabs example to use official python API
2023-05-24 21:11:01 +03:00
77eab3fbfe talk-llama : sync latest llama.cpp (close #922, close #954) 2023-05-23 14:04:39 +03:00
0cb820e0f9 talk-llama : fix build + sync latest llama.cpp 2023-05-14 18:46:42 +03:00
4e4d00c67a talk-llama : only copy used KV cache in get / set state (#890)
---------

Co-authored-by: ejones <evan.q.jones@gmail.com>
2023-05-08 20:59:21 +03:00
0bf680fea2 talk-llama : fix session prompt load (#854) 2023-05-02 20:05:27 +03:00
be5911a9f3 talk-llama : add --session support (#845)
* feat: adding session support

* readme: adding --session info in examples/talk-llama

* llama: adding session fixes

* readme: updating session doc

* talk-llama: update the value of need_to_save_session to true in order to save the session in the subsequent interaction

* talk-llama: adding missing function which updates session_tokens
2023-05-01 20:18:10 +03:00
794b162a46 whisper : add integer quantization support (#540)
* whisper : add integer quantization support

* examples : add common-ggml + prepare to add "quantize" tool

* whisper : quantization tool ready

* whisper : fix F32 support

* whisper : try to fix shared lib linkage

* wasm : update quantized models to Q5

* bench.wasm : remove "medium" button

* bench.wasm : fix custom model button

* ggml : add Q5_0 and Q5_1 WASM SIMD

* wasm : add quantized models to all WASM examples

* wasm : bump DB version number to 2

* talk-llama : update example to latest llama.cpp

* node : increase test timeout to 10s

* readme : add information for model quantization

* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
5fd1bdd7fc whisper : add GPU support via cuBLAS (#834)
* make : add WHISPER_CUBLAS

* make : fix CUBLAS build

* whisper : disable Flash Attention + adjust memory buffers

* whisper : remove old commented code

* readme : add cuBLAS instructions

* cmake : add WHISPER_CUBLAS option

* gitignore : ignore build-cublas
2023-04-30 12:14:33 +03:00
001083a769 talk, talk-llama : add basic example script for eleven-labs tts (#728) 2023-04-14 19:53:58 +03:00
78548dc03f talk-llama : correct default speak.sh path (#720)
There is `speak.sh` file in `./examples/talk-llama` as described in README.
However `./examples/talk/speak.sh` is used in `talk-llama.cpp`, this commit corrects that.
2023-04-14 19:36:09 +03:00
114df388fe talk-llama : increase context to 2048 2023-04-10 23:09:15 +03:00
ea36831459 talk-llama : update to latest llama.cpp (improved performance) 2023-04-10 22:59:13 +03:00
5e6e2187a3 talk-llama : fixing usage message for talk-llama (#687)
"-ml" instead of "-mg" for specifying the llama file
2023-03-30 00:10:20 +03:00
a47e812a54 talk-llama : add alpaca support (#668) 2023-03-29 23:01:14 +03:00
e5c197d8aa talk-llama : add discussion link 2023-03-28 10:11:34 +03:00