Commit Graph

396 Commits

Author SHA1 Message Date
Georgi Gerganov
da9809f243 talk-llama : sync llama.cpp 2024-08-28 13:22:20 +03:00
Justine Tunney
7f78675008
examples : use colorblind friendly TTY color scheme (#2360)
This change updates the -pc flag, so that a new xterm256 color scheme is
used. This color scheme is believed to be better for three reasons:

1. It should be friendlier to the colorblind. The scheme was designed by
   Paul Tol (see: https://personal.sron.nl/~pault/). TensorBoard has used
   it since 2017, so it's already popular in the machine learning community.

2. It should appear to be the same colors as before to people who aren't
   colorblind, i.e. it's still a red-green spectrum like before, but
   lightly modified.

3. It is readable in both white and black background terminals. The neon
   colors before were probably a bit too intense for white backgrounds.
2024-08-20 10:49:10 +03:00
Georgi Gerganov
58323bf8ed build : fix aarch64 (#0) 2024-08-08 22:48:46 +03:00
Georgi Gerganov
22058f2dbc talk-llama : sync llama.cpp 2024-08-08 22:48:46 +03:00
Georgi Gerganov
c7ea4fd235 common : handle new quant types (ggml/0) 2024-08-08 22:48:46 +03:00
Georgi Gerganov
dbf9c15e30 talk-llama : sync llama.cpp 2024-07-08 14:53:55 +03:00
Georgi Gerganov
d3f6c34976 examples : fix compile warnings [no ci] (#0) 2024-07-08 14:53:55 +03:00
Emmanuel Schmidbauer
bec9836849
server : add inference path to make OAI API compatible (#2270) 2024-07-08 14:24:58 +03:00
Georgi Gerganov
4a62efbb95
cmake : minor fixes 2024-06-26 21:42:39 +03:00
Georgi Gerganov
dc8cc2dd6f
whisper : disable CUDA mel + fix FFMPEG 2024-06-26 20:11:38 +03:00
Georgi Gerganov
e30c679928
whisper : reorganize source code + improve CMake (#2256)
* scripts : update sync [no ci]

* files : reorganize [no ci]

* sync : llama.cpp

* cmake : link math library

* cmake : build normal ggml library

* files : move headers to include

* objc : fix path to ggml-metal.h

* ci : fix WHISPER_CUDA -> GGML_CUDA

* scripts : sync LICENSE [no ci]
2024-06-26 19:34:09 +03:00
Georgi Gerganov
e293f17d34
talk-llama : sync llama.cpp 2024-06-18 09:45:37 +03:00
slaren
de29b193f6 move BLAS to a separate backend (cont) (llama/6210)
ggml-ci
2024-06-18 09:39:40 +03:00
Georgi Gerganov
3b1ac03828 ggml : remove OpenCL (#0) 2024-06-16 18:19:48 +03:00
Georgi Gerganov
061eeb9f61 talk-llama : sync llama.cpp 2024-06-16 18:19:48 +03:00
Borislav Stanimirov
af5833e298
whisper : remove speed_up and phase_vocoder* functions (#2198)
* whisper : fix cast warning

* whisper : remove phase_vocoder functions, ref #2195

* whisper : remove speed_up from whisper_full_params, closes #2195
2024-05-31 11:37:29 +03:00
Daniel Valdivia
a7dc2aab16
server : fix typo (#2181)
A simple comment typo, PR can be dismissed
2024-05-25 10:46:22 +03:00
William Tambellini
1b51fdf170
examples : add support for decoding input with ffmpeg (Linux) (#2133)
- search for ffmpeg libs/headers at cmake time
- added ffmpeg-transcode.cpp into libcommon if ffmpeg on
- hooked ffmpeg transcoding in common read_wav(...)
- passed test:
./main -m ggml-base.en.bin -f samples/jfk.mp3
2024-05-21 18:31:41 +03:00
Pedro Probst
adee3f9c1f
node : add flash_attn param (#2170) 2024-05-20 09:08:48 +03:00
Georgi Gerganov
7094ea5e75
whisper : use flash attention (#2152)
* whisper : use flash attention in the encoder

* whisper : add kv_pad

* whisper : remove extra backend instance (huh?)

* whisper : use FA for cross-attention

* whisper : use FA for self-attention

* whisper : simplify encoder FA

* whisper : add flash_attn runtime parameter

* scripts : add bench log

* scripts : add M1 Pro bench log
2024-05-15 09:38:19 +03:00
petterreinholdtsen
9d5771ae43
talk-llama : reject runs without required arguments (#2153)
* Extended talk-llama example to reject runs without required arguments.

Print warning and exit if models are not specified on the command line.

* Update examples/talk-llama/talk-llama.cpp

* Update examples/talk-llama/talk-llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-14 21:32:41 +03:00
Georgi Gerganov
4ef8d9f44e
server : return utf-8 (#2138) 2024-05-13 15:33:46 +03:00
Pedro Probst
3928dbd206
node : add audio_ctx and audio buffer params (#2123)
* node : add audio_ctx param

* node : support passing audio buffer directly

* node : parse audio_ctx in index.js

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-13 15:22:23 +03:00
valVk
30f73109b8
node : add additional params (#2000)
* Add additional params to addon.node

* Add comma_in_time as parameter

* Fix tests
2024-05-13 15:15:43 +03:00
Mark Karpelès
17fa62d3d3
js : remove un-needed request header from fetchRemote (#2119) 2024-05-13 15:13:19 +03:00
Daniel Ziegenberg
0bb05b113d
main : don't print timings with --no-prints (#2108)
Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at>
2024-05-13 15:00:19 +03:00
Daniel Ziegenberg
f141b2b938
main : add options for temperature control (#2088)
Add two options:

```
-tp,       --temperature N     [0.00   ] The sampling temperature, between 0 and 1
-tpi,      --temperature-inc N [0.20   ] The increment of temperature, between 0 and 1
```

The sampling temperature, between 0 and 1. Higher values like 0.8 will
make the output more random, while lower values like 0.2 will make it
more focused and deterministic. If set to 0, the model will use log
probability to automatically increase the temperature until certain
thresholds are hit.

Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at>
2024-05-13 14:59:44 +03:00
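
The effect of the two temperature options described in f141b2b938 can be illustrated with a minimal sketch. The helper names `softmax_with_temperature` and `temperature_schedule` are hypothetical and are not taken from the whisper.cpp sources. The schedule is only one plausible reading of how a start value and an increment could be combined, under the assumption that retries stop once the temperature would exceed 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not whisper.cpp API): turn logits into probabilities
// after dividing them by a sampling temperature. Lower temperatures make the
// distribution sharper, higher temperatures make it flatter.
std::vector<float> softmax_with_temperature(const std::vector<float> & logits, float temperature) {
    assert(!logits.empty() && temperature > 0.0f);

    // Subtract the largest logit for numerical stability.
    const float max_logit = *std::max_element(logits.begin(), logits.end());

    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (float & p : probs) {
        p /= sum;
    }
    return probs;
}

// Hypothetical helper (an assumption, not the actual fallback logic): list the
// temperatures start, start + inc, start + 2*inc, ... that do not exceed 1.
std::vector<float> temperature_schedule(float start, float inc) {
    std::vector<float> temps = { start };
    if (inc <= 0.0f) {
        return temps; // no increment: a single attempt only
    }
    for (int i = 1; ; ++i) {
        const float t = start + static_cast<float>(i) * inc;
        if (t > 1.0f + 1e-6f) {
            break;
        }
        temps.push_back(t);
    }
    return temps;
}
```

With the defaults shown in the option list (0.00 and 0.20), this sketch would yield six candidate temperatures from 0 up to 1. The softmax helper requires a strictly positive temperature, so a value of 0 would have to be handled separately, for example as greedy selection.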
zhangjixiong
e93081f83f
whisper.android : update example, add field to print timestamp (#2072) 2024-05-13 14:30:03 +03:00
Xingchen Song(宋星辰)
b6bbce4ae9
cmake : fix json INTERFACE library (#2069) 2024-05-13 14:29:39 +03:00
mashizora
7705dc52da
main : fix double quote escaping in csv output (#2090) 2024-05-13 11:55:32 +03:00
Georgi Gerganov
3fa7d29876 talk-llama : sync llama.cpp 2024-05-13 11:02:26 +03:00
Georgi Gerganov
accada542a ggml : resolve merge (ggml/0)
ggml-ci
2024-05-13 11:02:26 +03:00
Pedro Probst
58210d6a76
examples : fix node compilation (#2115)
* node : fix compilation and update examples

* node : fix readme

* Update addon.node test
2024-05-02 22:52:55 +01:00
Georgi Gerganov
b0c3cbf2e8
main : pass nullptr when regex is empty (#2070) 2024-04-17 12:23:47 +03:00
Emmanuel Schmidbauer
9fab28135c
server : add dtw (#2044)
* server.cpp: add dtw

* Update examples/server/server.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-15 22:16:58 +03:00
Pedro Probst
1b5439a6c2
node : support no timestamps (#2048)
* fix: node: do not compute timestamps if you do not need them

* feat: add no_timestamps parameter to node addon
2024-04-15 20:03:34 +03:00
Kendrick Taylor
5c554c04ff
whisper.nvim : fix missing reference to "model" variable (#2049) 2024-04-15 19:41:28 +03:00
Ikko Eltociear Ashimine
c383f091a1
whisper : update grammar-parser.cpp (#2058)
preceeding -> preceding
2024-04-15 19:40:27 +03:00
ulatekh
c15b4cda7d
common : fix file-handle leak in read_wav() (#2026)
Now it cleans up in case of error.
2024-04-09 18:34:34 +03:00
Rotem Dan
d3cfb6ca2b
main : set stdin to binary mode on Windows (#2025) 2024-04-09 18:33:32 +03:00
ulatekh
671b4bde6c
main : allow a response-file as the sole parameter (#2019)
* The "main" example now allows a response-file as the sole parameter.

A response-file is a text file with command-line parameters, one per line.
Prefix the name of the response-file with "@" to identify it as such.
It's used under MS Windows to work around command-line length limits.
It may be useful under other platforms to simplify character-escaping.

* minor : style

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-09 18:31:16 +03:00
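
The response-file convention described in 671b4bde6c (an "@" prefix, one parameter per line) can be sketched as follows. Both function names are hypothetical, and the handling of blank lines and Windows line endings is an assumption rather than documented behavior of the "main" example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: an argument names a response-file when it starts
// with "@" and has at least one character after the prefix.
bool is_response_file_arg(const std::string & arg) {
    return arg.size() > 1 && arg[0] == '@';
}

// Hypothetical helper: split the text of a response-file into parameters,
// one per line. Blank lines are skipped and a trailing '\r' is dropped
// (both are assumptions made for this sketch).
std::vector<std::string> parse_response_text(const std::string & text) {
    std::vector<std::string> args;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        if (!line.empty()) {
            args.push_back(line);
        }
    }
    return args;
}
```

Because each line becomes exactly one parameter, values that contain spaces need no shell quoting, which matches the note about simplified character-escaping.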
ulatekh
c8eeb93a6a
whisper : suppress tokens with a regex (#1997)
* Allow a regular expression to describe tokens to suppress.

Example: --suppress-tokens-re "[,\.]|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens.

Technique inspired by https://github.com/openai/whisper/discussions/1041

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Blind change to fix Java test.

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-09 18:27:28 +03:00
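
The regex-based suppression described in c8eeb93a6a can be sketched as below. The function `suppress_matching_tokens` is a hypothetical name, and the use of a full match against each token's text is an assumption; the sketch only shows the general idea of masking logits so that matching tokens cannot be sampled.

```cpp
#include <cmath>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not whisper.cpp API): set the logit of every token
// whose text fully matches the pattern to -infinity, so it can never be
// chosen. token_text[i] is assumed to be the text of token id i.
void suppress_matching_tokens(const std::vector<std::string> & token_text,
                              const std::string & pattern,
                              std::vector<float> & logits) {
    const std::regex re(pattern);
    for (size_t i = 0; i < token_text.size() && i < logits.size(); ++i) {
        if (std::regex_match(token_text[i], re)) {
            logits[i] = -INFINITY;
        }
    }
}
```

With the pattern from the commit message, written in C++ source as `"[,\\.]|[ ]?[0-9]+"`, tokens such as "," or " 42" would be masked under this sketch, while " hello" would be left untouched.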
ulatekh
319fe5146e
cmake : create solution folders (#2004)
* Create solution folders in the CMake build.

* Fixed non-SDL2 build.

* Fixed emscripten build.
2024-04-09 18:23:33 +03:00
Georgi Gerganov
81a3c41aa0
talk-llama : sync llama.cpp 2024-04-07 16:21:08 +03:00
ulatekh
fc366b807a
main : add command-style grammar (#1998)
* Implemented command-style grammar in the main example.

Mostly just copied the relevant parts from the command example.

* main : code style

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-28 12:02:10 +02:00
Georgi Gerganov
9fb308d90f
make : add grammar parser to common objects 2024-03-28 11:59:48 +02:00
Georgi Gerganov
2948c740a2
sync : ggml (#2001)
* sync : update scripts

* sync : ggml

* talk-llama : sync llama.cpp

* make : WHISPER_CUBLAS -> WHISPER_CUDA

* ci : try to fix sycl build

* talk-llama : fix make build
2024-03-27 18:55:10 +02:00
Georgi Gerganov
1558ec5a16
whisper : improve handling of prompts (#1981)
* whisper : improve handling of prompts

* whisper : add whisper_token_count helper
2024-03-25 14:48:19 +02:00
Mohammadreza Hendiani
04e48094e4
readme : add Fedora dependencies (#1970)
* README.md

fix documentation and add Fedora Linux dependencies for stream build

* fix documentation and add Fedora Linux dependencies for command build

* fix documentation and add Fedora Linux dependencies for talk build

* fix documentation and add Fedora Linux dependencies for talk-llama build

* revert mistakenly removed macOS documentation
2024-03-20 18:42:11 +02:00
denersc
741abb162c
whisper : token-level timestamps with DTW (#1485)
* whisper.cpp: impl dtw algo

* WIP: producing and placing DTW timestamps on tokens

* Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false.

* Fix mistake causing incorrect alignment of dtw timestamps

* implement N_TOP_MOST and CUSTOM alignment heads setting

* whisper: fix typo on alignment heads enum

* Fix issues related to changes in whisper.cpp

* Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function

* decoder: save cross QKs only if requested

* Calling median filter with ggml_map_custom1

* Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads

* Copying cross QKs from decoder backend correctly

* dtw: cleanup

* Fix incorrect n_frames passed to dtw when near end of audio

* Fix aheads_masks_init for backend != CPU

* whisper : minor style

* main : add dtw (wip)

* whisper: fix invalid memory access in aheads_masks_init

* main : add dtw (cont)

* whisper : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-20 18:25:26 +02:00