Georgi Gerganov
c3f319d7c2
ggml : sync latest llama.cpp (view_src + alloc improvements) ( #1247 )
...
* ggml : sync latest llama.cpp (view_src + alloc improvements)
* ggml : fix build
2023-09-05 20:57:27 +03:00
Georgi Gerganov
59a3d0cb57
ggml : sync (ggml-alloc, GPU, eps, etc.) ( #1220 )
...
* ggml : sync (ggml-alloc, GPU, eps, etc.)
* ggml : fix build
* wasm : fix build
2023-09-05 13:54:40 +03:00
ChangSeok Oh
8e30bf3c02
ggml : fix compilation errors incurred by -Werror ( #1227 )
...
The -Werror warning option turns all warnings into errors. This PR makes
the compiler happy to build ggml.c and whisper.cpp with the stricter option.
2023-08-30 22:09:15 +03:00
Przemysław Pawełczyk
25466aa1c3
ggml : fix compiling when SSE3 is available but not SSSE3 ( #1210 )
...
It got broken in commit 3998465721
.
2023-08-27 21:37:31 +03:00
Przemysław Pawełczyk
601c2d2181
ggml : detect SSSE3 ( #1211 )
...
* ggml : add ggml_cpu_has_ssse3
* whisper : show SSSE3 in system info
* make : detect SSSE3 via cpuinfo
2023-08-27 21:36:41 +03:00
alonfaraj
3998465721
ci : more platforms coverage ( #1101 )
...
* add multi platform
* add image name
* fix
* fix /bin/sh path
* add missing \
* add all platforms for check
* remove platforms
* remove s390x
* - add arm v6
- format run cmd
* remove arm v6
* - bump checkout to v3
- use setup emsdk action
- add arch to all ubuntu jobs
* mymindstorm/setup-emsdk to v12
* add missing QEMU step
* add fail-fast: false for debug
* add freebsd
* remark all jobs except freebsd for test
* add sudo
* enable all tests again
* format
* check __AVX__ support before include immintrin.h
* try auto detect flag by cmake
* fix check for immintrin.h
* fix include check for immintrin.h
* Remove all platforms for sanitizer build except amd64
We have no clue why they failed.
---------
Co-authored-by: Alon Faraj <alon.faraj@mapcore.com>
2023-07-16 23:00:34 +03:00
Georgi Gerganov
8ba42095c5
Revert "ggml : do not use _GNU_SOURCE gratuitously ( #1027 )"
...
This reverts commit 3f7a03ebe3
.
2023-07-02 21:53:52 +03:00
Georgi Gerganov
d6509bf78d
ggml : sync latest repo (mostly refactoring changes)
2023-07-02 21:46:09 +03:00
Przemysław Pawełczyk
3f7a03ebe3
ggml : do not use _GNU_SOURCE gratuitously ( #1027 )
...
* Do not use _GNU_SOURCE gratuitously.
What is needed to build whisper.cpp and examples is availability of
stuff defined in The Open Group Base Specifications Issue 6
(https://pubs.opengroup.org/onlinepubs/009695399/ ) known also as
Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions.
There is no need to penalize musl libc which simply follows standards.
Not having feature test macros in source code gives greater flexibility
to those wanting to reuse it in 3rd party app, as they can build it with
minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs.
It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2.
* examples : include SDL headers before other headers
This is an attempt at fixing macOS build error coming from SDL2 relying
on Darwin extension memset_pattern4/8/16 coming from Apple's string.h.
2023-06-25 16:34:30 +03:00
Georgi Gerganov
5feb0dffba
ggml : sync latest ggml lib
2023-06-25 14:30:44 +03:00
Georgi Gerganov
429b9785c0
ggml : update WASM SIMD
2023-05-20 20:00:06 +03:00
Georgi Gerganov
e410cfc3ce
ggml : sync latest ggml repo
...
- new Q4 and Q8 quantization
- updated CUDA
2023-05-20 18:56:30 +03:00
Georgi Gerganov
aaf0d41c7c
ggml : add AVX dot products
2023-05-14 18:56:46 +03:00
Georgi Gerganov
e693074aa6
ggml : sync latest ggml
...
- New Q4 and Q5 formats
- Various improvements
2023-05-14 18:04:23 +03:00
Georgi Gerganov
5974c8facd
ggml : fix 32-bit ARM build + quantization
2023-05-02 21:52:26 +03:00
Georgi Gerganov
0bcb64b184
ggml : sync ggml (clBLAST + tensor names)
2023-05-02 21:24:18 +03:00
Georgi Gerganov
feac80dd3f
ggml : fix UB (int << 31)
2023-04-30 22:27:30 +03:00
Georgi Gerganov
794b162a46
whisper : add integer quantization support ( #540 )
...
* whisper : add integer quantization support
* examples : add common-ggml + prepare to add "quantize" tool
* whisper : quantization tool ready
* whisper : fix F32 support
* whisper : try to fix shared lib linkage
* wasm : update quantized models to Q5
* bench.wasm : remove "medium" button
* bench.wasm : fix custom model button
* ggml : add Q5_0 and Q5_1 WASM SIMD
* wasm : add quantized models to all WASM examples
* wasm : bump DB version number to 2
* talk-llama : update example to latest llama.cpp
* node : increase test timeout to 10s
* readme : add information for model quantization
* wasm : add links to other examples
2023-04-30 18:51:57 +03:00
Georgi Gerganov
0ccd6746c9
ggml : fix WASM build
2023-04-29 21:37:23 +03:00
Georgi Gerganov
d9b550c0a1
ggml : fix 32-bit ARM NEON ( #836 )
...
* ggml : add support for 32-bit ARM
* ggml : fix
* ggml : fix
2023-04-29 21:33:33 +03:00
Georgi Gerganov
e9b091c92a
ggml : use vzip instead of vuzp for consistency
2023-04-29 21:14:09 +03:00
Georgi Gerganov
1f30b99208
ggml : fix WASM build
2023-04-29 20:21:25 +03:00
Georgi Gerganov
05c3ea3bc8
ggml : sync with ggml repo (warning fixes + asserts)
2023-04-29 19:33:28 +03:00
Georgi Gerganov
acec73ab6e
ggml : sync latest ggml + llama.cpp updates (quantization)
2023-04-29 12:32:28 +03:00
Jhen-Jie Hong
ea1f8a50d4
ggml, ci : fix build on whisper.android (ARM_NEON) + add CI ( #764 )
...
* ggml : fix undefined symbol by remove inline handle
* ggml : make own ggml_aligned_malloc function
* ci: add ios/android build
2023-04-15 14:21:58 +03:00
Georgi Gerganov
677ad754a0
ggml : sync latest ggml
2023-04-14 19:20:39 +03:00
novag
463e46338c
ggml : fix q4_1 dot product types ( #759 )
...
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-14 13:34:20 +03:00
Georgi Gerganov
2f889132c6
ggml : sync latest changes from ggml and llama.cpp
2023-04-13 18:53:44 +03:00
Georgi Gerganov
ebef1e8620
ggml : fix WASM build
2023-04-10 23:18:29 +03:00
Georgi Gerganov
69b8503935
ggml : backport llama.cpp updates ( close #709 )
...
- About x2 overall performance improvement on Apple Silicon
- Results should now be the same for different number of threads (not
tested)
2023-04-10 22:28:54 +03:00
Georgi Gerganov
4a0deb8b1e
talk-llama : add new example + sync ggml from llama.cpp ( #664 )
...
* talk-llama : talk with LLaMA AI
* talk.llama : disable EOS token
* talk-llama : add README instructions
* ggml : fix build in debug
2023-03-27 21:00:32 +03:00
Georgi Gerganov
f3ee4a9673
whisper : reduce memory usage during inference ( #431 )
...
* ggml : add "scratch" buffer support
* ggml : support for scratch ring-buffer
* ggml : bug fix in ggml_repeat()
* ggml : error on scratch buffer overflow
* whisper : use scratch buffers during inference (base model only)
* whisper : update memory usage for all models
* whisper : fix encoder memory usage
* whisper : use whisper_context functions instead of macros
* whisper : fix FF + remove it from README
* ggml : reuse ggml_new_i32
* ggml : refactor the scratch buffer storage
* whisper : reorder scratch buffers in the decoder
* main : add option to disable temp fallback
* Update README.md
2023-02-04 09:45:52 +02:00
fitzsim
ae16c21e9c
whisper : PPC64 big-endian support ( #398 )
...
* ggml : set cache line size to 128 on POWER9
* whisper : add PPC64 big endian support
2023-01-23 20:48:10 +02:00
Georgi Gerganov
1290fc6457
bench : add memcpy and ggml_mul_mat benchmarks
2023-01-18 20:31:46 +02:00
Georgi Gerganov
4ef3398e8f
ggml : remove obsolete zeroing + comment fixes ( #390 )
2023-01-08 20:21:03 +02:00
Abitofevrything
8d7b29cedd
ggml : correct behaviour of ggml_vec_sum_f32 ( #390 )
2023-01-08 20:06:09 +02:00
Georgi Gerganov
52a3e0c92a
ggml : improve vec_dot_f16 unrolling in flash_attn_f16
2023-01-08 11:41:18 +02:00
Georgi Gerganov
f30b5d322c
ggml : fix bug in new soft max computation
2023-01-07 21:00:07 +02:00
Georgi Gerganov
d347a59a5f
ggml : when using BLAS start only 1 CPU thread
2023-01-07 19:48:56 +02:00
Georgi Gerganov
6394c906af
ggml : fix running tasks with variable number of threads
2023-01-07 19:20:18 +02:00
Georgi Gerganov
74ffa14e1d
ggml : unroll ggml_vec_dot_f16 in ggml_compute_forward_flash_attn_f16
2023-01-07 19:19:40 +02:00
Georgi Gerganov
65fdcbbbbb
whisper : revert accidental MB change
2023-01-07 16:18:21 +02:00
Georgi Gerganov
d61d55cd4b
ggml : speed-up soft max via Accelerate + unroll
2023-01-07 16:16:42 +02:00
Georgi Gerganov
d51fc3ee0a
ggml : use vDSP_sve and vDSP_maxv from Accelerate
2023-01-07 16:10:16 +02:00
Georgi Gerganov
f82a7dd019
ggml : make gcc happy (minor)
2023-01-07 09:34:39 +02:00
Abitofevrything
a62170c656
ggml : add SSE3 and fp16 conversion lookup table ( #368 )
...
* Improves WASM performance:
On MacBook M1 Pro, I observe 25% faster using Firefox and 35% faster using Chrome
* Add support for SSE3 SIMD
* Add SSE3 to system information
* Add Imath support for fp16-fp32 conversions
* Add Imath to system information
* Wrap Imath calls to avoid static function warnings
* Drop Imath; Add lookup table for f16 -> f32 conversions
* Remove TODO comments
* Update SSE3 to new macro arguments
* Correct updated macro definitions
* Prefer static inline where possible
* ggml : static inlines + add public f16 <-> f32 conversions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-01-06 18:45:59 +02:00
Thomas Fitzsimmons
1944e7c33e
whisper : document POWER VSX support
2023-01-05 23:53:00 +02:00
Thomas Fitzsimmons
49a8dd6732
ggml : reorganize POWER9 ppc64le SIMD code
2023-01-05 23:53:00 +02:00
Thomas Fitzsimmons
8c7f642286
ggml : change f16 load and store macro arguments
2023-01-05 23:53:00 +02:00
Georgi Gerganov
0a0cfa7985
ggml : add void to argument-less functions
2023-01-05 21:40:38 +02:00
Georgi Gerganov
d51c5eb906
ggml : define MIN / MAX only if not defined (minor)
2023-01-05 21:16:52 +02:00
Thomas Fitzsimmons
424c410c42
ggml : improve f16 acceleration for POWER9 ppc64le
2022-12-31 10:02:19 +02:00
Georgi Gerganov
4e0b2069e7
ggml : barrier refactor + static functions
2022-12-28 19:00:53 +02:00
Georgi Gerganov
ac521a566e
ggml : simplify the SIMD code ( #324 )
...
* ggml : simplify the SIMD code
* ggml : generic reduce for all register sizes + comments
2022-12-24 10:22:28 +02:00
Georgi Gerganov
7282e2109e
ggml : use vaddvq_f32 for slightly more efficient reduce
2022-12-23 13:48:19 +02:00
Thomas Fitzsimmons
466ceebb78
ggml : add f16 acceleration for POWER9 ppc64le
2022-12-23 13:23:58 +02:00
Andy Maloney
493d94130d
ggml : make consts static ( #317 )
...
These shouldn't be able to be referenced outside the compilation unit.
2022-12-23 11:05:27 +02:00
Andy Maloney
fa463313ad
minor : small code cleanups ( #302 )
...
* Small code cleanups
- fix indentation
- remove extra semicolons
- remove extra break after returns in case statements
- remove unnecessary call to .data() on string
- use empty() instead of checking size()
- no need to check for nullptr before free
- remove unnecessary initialization of string to ""
* minor : switch case always break
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2022-12-22 17:06:19 +02:00
Kevin Brothaler
e1432dd91a
Check for both __ARM_NEON and __ARM_FEATURE_FMA so that the project can be compiled for armv7a.
...
Android armeabi-v7a's NEON support doesn't support FMA unless configured with `-mfpu=neon-fp-armv8`, which would need runtime checks.
* Also removed ABI filter from Android project.
2022-12-22 16:47:54 +02:00
katsu560
419b8a6402
Add AVX,AVX2 support for ggml_vec_scale_f32
2022-12-17 19:40:10 +02:00
Georgi Gerganov
a7047b2a28
ggml : implement ggml_compute_forward_dup_f16() special cases
2022-12-16 21:50:41 +02:00
Georgi Gerganov
0f11759406
ggml : make more compatible with c99 ( #262 )
2022-12-16 18:00:12 +02:00
Georgi Gerganov
f66ac6dc4f
ggml : fix indentation
2022-12-13 23:09:21 +02:00
Georgi Gerganov
9955fa4ed7
ggml : make compatible with c99 ( #262 )
2022-12-13 23:07:49 +02:00
Roland Rabien
e70d47baab
Remove C++20 requirement ( #257 )
...
* Remove C++20 requirement
* Roll back C features not supported in VS2017
2022-12-11 20:03:07 +02:00
Georgi Gerganov
3b1aacbe6d
talk : talk with AI in the terminal
2022-12-10 16:51:58 +02:00
Georgi Gerganov
50a061b313
ggml : add alternative cblas_sgemm call
2022-12-08 23:48:04 +02:00
Al Hoang
04a16bbf11
fix compilation on haiku
2022-12-08 09:20:57 +02:00
Georgi Gerganov
b6597539f9
ggml : fix typo in previous commit
2022-12-06 22:12:57 +02:00
Georgi Gerganov
9a4b7a916e
ggml : use macros to inline FP16 <-> FP32 conversions
2022-12-06 22:09:26 +02:00
Georgi Gerganov
f8ec718b76
ggml : add F16C CPU flag check
2022-12-06 21:56:56 +02:00
katsu560
35b40a93b9
add fp16/fp32 convert intrinsics
2022-12-06 21:44:24 +02:00
Georgi Gerganov
061fc81bd6
ggml : remove inline specifier from fp16 <-> fp32 converters
2022-12-01 22:15:12 +02:00
Georgi Gerganov
388e9f79ad
ggml : fix the fix
2022-11-23 22:40:06 +02:00
Georgi Gerganov
35cd29ce1f
ggml : fix cross-compile Linux -> Window with mingw ( #168 )
2022-11-23 22:28:41 +02:00
katsu560
804f36aa2c
ggml: change inline ggml_fp16_to_fp32, ggml_fp16_t ggml_fp32_to_fp16
2022-11-23 22:16:33 +02:00
katsu560
83456076f0
add AVX support
2022-11-23 22:16:33 +02:00
Georgi Gerganov
2065572a11
ggml : fix Windows build
2022-11-20 22:47:03 +02:00
boolemancer
0bfe728b84
Fix the Windows pthread_create shim
...
The current implementation doesn't actually set the out parameter,
and it returns 0 on failure instead of on success.
2022-11-08 15:02:32 +02:00
Georgi Gerganov
75171c2b79
ggml : multi-thread the ggml_add operator
2022-11-03 20:53:44 +02:00
Georgi Gerganov
137321915f
ggml : fix the check for NEON support ( #7 )
...
Was using the wrong preprocessor macro
2022-11-02 17:52:24 +02:00
Syed Jafri
24cd12f647
Cross compilation ( #121 )
...
* Cross compile windows
* set env properly
* rm log
* fix review
* Add back space
2022-11-02 08:46:49 +02:00
Mikhail Grigorev
8dac3c6e10
Fixed sched_yield
2022-10-30 21:38:18 +02:00
Mikhail Grigorev
6417e59aad
Implemenated sched_yield function for Windows
2022-10-30 21:38:18 +02:00
Georgi Gerganov
e5044f87d9
ggml : fix barrier
2022-10-29 19:37:19 +03:00
Georgi Gerganov
a272f10b2e
ggml : fix thread-safety of ggml_init and ggml_free
2022-10-29 19:37:19 +03:00
Georgi Gerganov
fbd513b813
Add OpenBLAS support
...
Supported via CMake - just add:
cmake .. -DWHISPER_SUPPORT_OPENBLAS=ON
On Ubuntu, you have to install the library like this:
apt install libopenblas-dev
Unfortunately, I don't observe any benefit compared to the
original AVX2 + FP16 implementation. Maybe I'm missing something
2022-10-27 18:31:49 +03:00
Georgi Gerganov
34bb3ab0cf
ggml : add system info functions
2022-10-25 20:53:48 +03:00
Georgi Gerganov
c6710efde2
refactoring : move main + stream in examples + other stuff
2022-10-25 20:53:48 +03:00
Georgi Gerganov
db460b78ff
wip : WASM 128-bit SIMD support
2022-10-22 18:54:01 +03:00
Georgi Gerganov
e905c6f827
wip : initial WASM port
...
Works but it is very slow because no SIMD is used.
For example, jfk.wav is processed in ~23 seconds using "tiny.en" model
2022-10-22 18:54:01 +03:00
Georgi Gerganov
19817711b4
Add reference to FP16 repo
2022-10-18 19:48:34 +03:00
Georgi Gerganov
e36aabe00d
Correct implementation of FP16 GELU
...
Can toggle it via the GGML_GELU_FP16 macro
2022-10-18 18:42:08 +03:00
Georgi Gerganov
91632eb6ea
Revert GELU change
...
Seems it does not work on x86 for some reason
2022-10-18 00:45:08 +03:00
Georgi Gerganov
72d967bce4
Use Accelerate framework on Apple silicon
...
Huge performance improvement in the Encode (almost x2 on MacBook M1 Pro)
Also various extra optimizations:
- Multi-threaded NORM operator
- Faster GELU via F16 cast
2022-10-18 00:12:51 +03:00
Georgi Gerganov
0e858f080d
close #56 : build on FreeBSD
...
Thanks to @abelbabel for the contribution
2022-10-17 18:10:16 +03:00
Borislav Stanimirov
0b45d25151
Building with MSVC
2022-10-11 21:40:46 +03:00
lnyan
4bbb8a587b
Add MinGW support
2022-10-09 22:26:37 +08:00
Georgi Gerganov
e29a5dacc6
ref #11 , #18 , #26 : fix CACHE_LINE_SIZE constant
2022-10-07 21:56:44 +03:00
Georgi Gerganov
167324584b
wip : rpi4 support
2022-10-05 23:03:46 +03:00