Borislav Stanimirov
af5833e298
whisper : remove speed_up and phase_vocoder* functions ( #2198 )
* whisper : fix cast warning
* whisper : remove phase_vocoder functions, ref #2195
* whisper : remove speed_up from whisper_full_params, closes #2195
2024-05-31 11:37:29 +03:00
Martin Delille
b87494bb8f
readme : add conan badge ( #2196 )
* Add conan badge
* Fix markdown formatting
2024-05-30 15:43:28 +03:00
Carlos Zoido
ad130431aa
readme : add install instructions for Conan ( #2189 )
2024-05-30 15:06:15 +03:00
Borislav Stanimirov
e130b66642
whisper: use global cache for sin/cos vals and Hann window ( #2194 )
- also rename Hanning to Hann as it's named after Julius von Hann
as per Wikipedia
2024-05-29 19:09:21 +03:00
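The caching idea in the commit above (compute trig tables and the Hann window once, then reuse them across frames) can be sketched as follows. This is a hedged illustration: the periodic form of the window is an assumption, since the commit does not show whisper.cpp's exact variant, and the caching mechanism here is generic Python rather than the project's global arrays.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def hann_window(n):
    # w[i] = 0.5 * (1 - cos(2*pi*i / n)), the periodic Hann window.
    # lru_cache means the table is computed once per window length
    # and reused on every subsequent call.
    return tuple(0.5 * (1.0 - math.cos(2.0 * math.pi * i / n)) for i in range(n))

w = hann_window(400)  # computed once; later calls return the cached tuple
```

The same idea applies to the sin/cos tables used by the FFT: paying the trigonometry cost once per window length instead of once per frame.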
Georgi Gerganov
c7b6988678
release : v1.6.2
2024-05-27 10:35:09 +03:00
Georgi Gerganov
05042a782d
Revert "whisper : remove extra backend instance (huh?)" ( #2182 )
This reverts commit 4caa64b73e.
2024-05-27 10:20:25 +03:00
Daniel Valdivia
a7dc2aab16
server : fix typo ( #2181 )
A simple comment typo, PR can be dismissed
2024-05-25 10:46:22 +03:00
Todd
22d46b7ba4
ruby : update bindings ( #2154 )
* update library files
* update whispercpp
* not needed for gem
2024-05-22 23:02:52 +03:00
Georgi Gerganov
c10db6ea28
release : v1.6.1
2024-05-21 18:44:37 +03:00
William Tambellini
1b51fdf170
examples : add support for decoding input with ffmpeg (Linux) ( #2133 )
- search for ffmpeg libs/headers at cmake time
- added ffmpeg-transcode.cpp into libcommon if ffmpeg on
- hooked ffmpeg transcoding in common read_wav(...)
- passed test:
./main -m ggml-base.en.bin -f samples/jfk.mp3
2024-05-21 18:31:41 +03:00
Pedro Probst
adee3f9c1f
node : add flash_attn param ( #2170 )
2024-05-20 09:08:48 +03:00
Tamotsu Takahashi
4798be1f9a
ci: Update build.yml to suppress warnings about node.js versions ( #2166 )
* Update actions to suppress warnings about old node.js
https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/
* Update actions/upload-artifact, specify android cmdline-tools-version
* Use java 20
gradle 8.1 complains about java 21
https://docs.gradle.org/current/userguide/compatibility.html
2024-05-19 11:49:26 +03:00
Georgi Gerganov
08981d1bac
release : v1.6.0
2024-05-15 09:59:48 +03:00
Georgi Gerganov
7094ea5e75
whisper : use flash attention ( #2152 )
* whisper : use flash attention in the encoder
* whisper : add kv_pad
* whisper : remove extra backend instance (huh?)
* whisper : use FA for cross-attention
* whisper : use FA for self-attention
* whisper : simplify encoder FA
* whisper : add flash_attn runtime parameter
* scripts : add bench log
* scripts : add M1 Pro bench log
2024-05-15 09:38:19 +03:00
petterreinholdtsen
9d5771ae43
talk-llama : reject runs without required arguments ( #2153 )
* Extended talk-llama example to reject runs without required arguments.
Print warning and exit if models are not specified on the command line.
* Update examples/talk-llama/talk-llama.cpp
* Update examples/talk-llama/talk-llama.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-14 21:32:41 +03:00
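The argument check described in the commit above (warn and exit when required models are not specified) can be sketched like this. The flag names and the Python rendering are illustrative only, since talk-llama itself is a C++ example.

```python
import sys

def check_required_args(args):
    """Return True if both model paths are present; warn otherwise.

    `args` is a dict mapping argument names to values; both the names
    ("whisper_model", "llama_model") and the dict shape are hypothetical.
    """
    missing = [name for name in ("whisper_model", "llama_model") if not args.get(name)]
    if missing:
        # Mirror the described behavior: print a warning, signal the caller to exit.
        print(f"warning: missing required argument(s): {', '.join(missing)}",
              file=sys.stderr)
        return False
    return True

ok = check_required_args({"whisper_model": "ggml-base.en.bin", "llama_model": None})
```

In the real example the caller would exit with a nonzero status when the check fails, rather than continuing with a missing model.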
Georgi Gerganov
f56b8305c4
sync : ggml
2024-05-14 19:16:32 +03:00
Georgi Gerganov
1056ad762c
metal : support FA without mask + add asserts (llama/7278)
* ggml : fa without mask + add asserts
ggml-ci
* metal : support non-contiguous KV
ggml-ci
2024-05-14 19:16:29 +03:00
Radoslav Gerganov
c451080c8b
ggml : add RPC backend (llama/6829)
* ggml : add RPC backend
The RPC backend proxies all operations to a remote server which runs a
regular backend (CPU, CUDA, Metal, etc).
* set TCP_NODELAY
* add CI workflows
* Address review comments
* fix warning
* implement llama_max_devices() for RPC
* Address review comments
* Address review comments
* wrap sockfd into a struct
* implement get_alignment and get_max_size
* add get_device_memory
* fix warning
* win32 support
* add README
* readme : trim trailing whitespace
* Address review comments
* win32 fix
* Address review comments
* fix compile warnings on macos
2024-05-14 19:16:29 +03:00
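The proxying idea described in the RPC commit above can be sketched with a toy client/server pair: the client forwards each operation to a remote server, which executes it on its local "backend" and returns the result. Everything here is invented for illustration; ggml's RPC backend uses its own binary protocol over TCP (with TCP_NODELAY), not JSON, and proxies tensor operations rather than arithmetic.

```python
import json
import socket
import threading

def serve(sock):
    """Server side: accept one connection, run the requested op locally, reply."""
    conn, _ = sock.accept()
    with conn:
        req = json.loads(conn.recv(4096).decode())
        ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}  # stand-in backend
        result = ops[req["op"]](*req["args"])
        conn.sendall(json.dumps({"result": result}).encode())

def remote_call(addr, op, args):
    """Client side: proxy the operation to the server instead of computing it here."""
    with socket.create_connection(addr) as c:
        c.sendall(json.dumps({"op": op, "args": args}).encode())
        return json.loads(c.recv(4096).decode())["result"]

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
threading.Thread(target=serve, args=(srv,), daemon=True).start()
out = remote_call(srv.getsockname(), "mul", [6, 7])
```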
Neo Zhang
8e7c22fbdb
rm wait() (llama/7233)
2024-05-14 19:16:29 +03:00
Johannes Gäßler
e57e95eb0d
CUDA: add FP32 FlashAttention vector kernel (llama/7188)
* CUDA: add FP32 FlashAttention vector kernel
* fixup! CUDA: add FP32 FlashAttention vector kernel
* fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
* fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
2024-05-14 19:16:29 +03:00
Georgi Gerganov
130f43e4b8
scripts : sync ggml-rpc
2024-05-14 19:15:35 +03:00
thewh1teagle
d8356a1cc2
whisper : fix model path encoding in windows ( #2086 )
* fix: model path encoding in windows
* fix: convert model path to wide string only for MSVC compiler
2024-05-14 09:43:41 +03:00
Georgi Gerganov
4ef8d9f44e
server : return utf-8 ( #2138 )
2024-05-13 15:33:46 +03:00
Pedro Probst
3928dbd206
node : add audio_ctx and audio buffer params ( #2123 )
* node : add audio_ctx param
* node : support passing audio buffer directly
* node : parse audio_ctx in index.js
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-13 15:22:23 +03:00
aldorof
2ced6f0742
cmake : fix HIP/ROCm build ( #2102 )
2024-05-13 15:18:43 +03:00
valVk
30f73109b8
node : add additional params ( #2000 )
* Add additional params to addon.node
* Add comma_in_time as parameter
* Fix tests
2024-05-13 15:15:43 +03:00
Mark Karpelès
17fa62d3d3
js : remove un-needed request header from fetchRemote ( #2119 )
2024-05-13 15:13:19 +03:00
Georgi Gerganov
1da5edcde0
cmake : fix metal embed sources path ( #2110 )
2024-05-13 15:09:59 +03:00
Daniel Ziegenberg
0bb05b113d
main : don't print timings with --no-prints ( #2108 )
Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at>
2024-05-13 15:00:19 +03:00
Daniel Ziegenberg
f141b2b938
main : add options for temperature control ( #2088 )
Add two options:
```
-tp, --temperature N [0.00 ] The sampling temperature, between 0 and 1
-tpi, --temperature-inc N [0.20 ] The increment of temperature, between 0 and 1
```
The sampling temperature, between 0 and 1. Higher values like 0.8 will
make the output more random, while lower values like 0.2 will make it
more focused and deterministic. If set to 0, the model will use log
probability to automatically increase the temperature until certain
thresholds are hit.
Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at>
2024-05-13 14:59:44 +03:00
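The temperature behavior described in the commit above can be sketched as logit scaling before softmax: dividing logits by a low temperature sharpens the distribution toward the argmax, while a high temperature flattens it. This is a generic illustration of the sampling formula, not whisper.cpp's sampler code.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    if temperature <= 0.0:
        # Degenerate case: temperature 0 behaves like greedy (argmax) selection.
        probs = [0.0] * len(logits)
        probs[max(range(len(logits)), key=lambda i: logits[i])] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.2)   # sharper: mass concentrates on argmax
high = softmax_with_temperature(logits, 0.8)  # flatter: output more random
```

The fallback described above (automatically increasing the temperature when decoding fails certain thresholds) is what the `--temperature-inc` step controls.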
Georgi Gerganov
2b434c449e
whisper : switch back to F32 mask ( #0 )
2024-05-13 14:43:43 +03:00
zhangjixiong
e93081f83f
whisper.android : update example, add field to print timestamp ( #2072 )
2024-05-13 14:30:03 +03:00
Xingchen Song(宋星辰)
b6bbce4ae9
cmake : fix json INTERFACE library ( #2069 )
2024-05-13 14:29:39 +03:00
mashizora
7705dc52da
main : fix double quote escaping in csv output ( #2090 )
2024-05-13 11:55:32 +03:00
Georgi Gerganov
e6acaf9d91
metal : tune soft_max number of threads ( #0 )
2024-05-13 11:02:26 +03:00
Georgi Gerganov
2c81e6fd51
whisper : remove old flash attn code ( #0 )
2024-05-13 11:02:26 +03:00
Georgi Gerganov
9506267ce5
ggml : try fix ppc64 ( #0 )
2024-05-13 11:02:26 +03:00
Georgi Gerganov
fbeb80b5f0
ggml : remove obsolete alibi code (skipme) ( #0 )
2024-05-13 11:02:26 +03:00
Georgi Gerganov
3fa7d29876
talk-llama : sync llama.cpp
2024-05-13 11:02:26 +03:00
Georgi Gerganov
fe179ae0cc
sync : ggml
2024-05-13 11:02:26 +03:00
Hong Bo PENG
40aeeeecc4
ggml : optimize for ppc64le using VSX intrinsics (ggml/784)
* optimize for ppc64le using VSX intrinsics
* 1. code clean up by removing comments about overflow concern.
2. fix typo in suffix of scaling.
* Continue to fix typo in suffix of scaling for QK_K <> 256
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-13 11:02:26 +03:00
Georgi Gerganov
5a863fbe18
metal : fix indent (ggml/0)
2024-05-13 11:02:26 +03:00
Georgi Gerganov
91c646c61d
ggml : restore sigmoid decl order (ggml/0)
2024-05-13 11:02:26 +03:00
Georgi Gerganov
accada542a
ggml : resolve merge (ggml/0)
ggml-ci
2024-05-13 11:02:26 +03:00
Georgi Gerganov
e54329da7b
ggml : full ALiBi support (llama/7192)
* ggml : full ALiBi support
* ggml : update ggml_soft_max_ext() CUDA, SYCL
* ggml : ggml_flash_attn_ext() support ALiBi (CPU)
* ggml : ggml_flash_attn_ext() support ALiBi (Metal)
* ggml : fix warning
* ggml : ggml_flash_attn_ext() support ALiBi (CUDA)
ggml-ci
* ggml : fix assert message
* vulkan : add dev notes
* ggml : require mask when using ALiBi
ggml-ci
* convert : fix convert for refact models
2024-05-13 11:02:26 +03:00
Georgi Gerganov
284fac39fb
metal : fix flash attention kernel requirements (llama/7169)
* metal : fix flash attention kernel requirements
ggml-ci
* metal : fix ggml_metal_supports_op
ggml-ci
2024-05-13 11:02:26 +03:00
Ouadie EL FAROUKI
fe454b8d9e
Minor arithmetic improvement to mmvq wrapper kernel (llama/7172)
2024-05-13 11:02:26 +03:00
0cc4m
c114b75aee
Vulkan Bugfixes and Improvements (llama/7084)
* Modify mat mat mul shader for mul_mat_id, modify mat vec mul shaders for single call batch operation
* Further work towards MoE, disabled for now
* Disable MoE code (not ready yet), fix a number of bugs in shaders and Vulkan code
* Add softmax with f16 mask and pos buffer support
* Disable mul_mat_id shaders for now
* Fix flake8
* Fix validation errors caused by empty buffers on larger batch sizes
2024-05-13 11:02:26 +03:00
Johannes Gäßler
4be936b88b
CUDA: generalize FP16 fattn vec kernel (llama/7061)
* CUDA: generalize FP16 fattn vec kernel
* disable unsupported head sizes for AMD in test
* try AMD fix
* fix batch size 2-8
* partially revert changes
2024-05-13 11:02:26 +03:00
Albert Jin
26c550f772
opencl : alignment size converted from bits to bytes (llama/7090)
* opencl alignment size should be converted from bits to bytes
Reference: https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#CL_DEVICE_MEM_BASE_ADDR_ALIGN
> Alignment requirement (in bits) for sub-buffer offsets.
* Update ggml-opencl.cpp for readability using division instead of shift
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
2024-05-13 11:02:26 +03:00
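The fix described in the OpenCL commit above comes down to a unit conversion: CL_DEVICE_MEM_BASE_ADDR_ALIGN reports the alignment requirement in *bits*, so code that treats the value as a byte count over-aligns by a factor of 8. A minimal sketch of the conversion, with an illustrative function name rather than ggml's:

```python
def alignment_in_bytes(base_addr_align_bits):
    # CL_DEVICE_MEM_BASE_ADDR_ALIGN is specified in bits; the OpenCL spec
    # guarantees it is a multiple of 8, so integer division is exact.
    return base_addr_align_bits // 8

# A device reporting 1024 bits actually requires 128-byte sub-buffer alignment.
alignment = alignment_in_bytes(1024)
```

The commit's follow-up change uses division instead of a shift for exactly this readability reason.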