* cmake : make WHISPER_NO_AVX512=ON disable all subsets of AVX-512
Previously it happened only for MSVC, but it makes sense to have the
same behavior for other compilers too.
* make : reorder x86 ISA extensions in chronological order
And update compiler flags at the end to ease modifying conditions.
* make : support WHISPER_NO_AVX512=1 for disabling all AVX-512 subsets.
That way you do not have to override each AVX-512 subset setting
individually if it has been turned on during autodetection.
* make : add AVX512 detection to Makefile and CMakeLists.txt
* make : autodetect more AVX512 instruction subsets
* cmake : do not default to AVX512, must be enabled explicitly
* cmake : enable a set of AVX512 subsets, when AVX512 is turned on
* make : consolidate AVX512 subsets, add AVX512 VBMI
* cmake : revert to NO AVX512 setting, add settings for AVX512 VNNI and VBMI
* make : re-introduce AVX512VNNI back
* cmake : remove superfluous comment line
* make : use pkg-config for finding CFLAGS & LDFLAGS needed by OpenBLAS
That way building on *nix like environments (including MSYS2 on Windows)
with WHISPER_OPENBLAS=1 works out of the box.
Fix handling of WHISPER_OPENBLAS, so that empty value or 0 won't be
misinterpreted by make as enabled. Mind that it's not intended to
detect CMake false constants (OFF NO FALSE N). make is not CMake.
By default OpenBLAS with 64-bit interface is used, but that can be
changed with `WHISPER_OPENBLAS_INTERFACE64=0` if 32-bit one is desired.
If OpenBLAS headers and library are respectively in include/ and lib/
subdirectories of given path, then you can specify it, e.g.
`OPENBLAS_PATH=/usr/local/openblas`, and this will take precedence over
any pkg-config file.
If there is no pkg-config file (.pc) for OpenBLAS and OPENBLAS_PATH is
empty, then headers are assumed to be in /usr/include/openblas and
library as assumed to be called 'openblas64' (or 'openblas' if
`WHISPER_OPENBLAS_INTERFACE64=0`). If different headers location should
be used, then it can be done, e.g.
`WHISPER_BLAS_CFLAGS=-I/usr/local/include/openblas`.
If different library should be used, it can be specified, e.g.
`WHISPER_BLAS_LIB=openblasp64` (pthreads version as seen on Fedora), or
you can provide LDFLAGS needed to link with OpenBLAS directly:
`WHISPER_BLAS_LDFLAGS="-L/usr/local/lib/openblas -lopenblas64"`.
Current solution is flexible enough to handle most cases out there
without needlessly hardcoding possible OpenBLAS installation details.
* cmake : fix how pkg-config is used for finding include dirs and libraries needed by OpenBLAS
That way building on *nix like environments (including MSYS2 on Windows)
with -DWHISPER_OPENBLAS=ON should work out of the box as long as you
have CMake 3.25 or newer.
Make OPENBLAS_PATH environment variable supported not only on Windows.
It sets OpenBLAS include dir to ${OPENBLAS_PATH}/include and library to
${WHISPER_BLAS_LIB} (name without prefixes and suffixes) in
${OPENBLAS_PATH}/lib and avoids further package finding.
By default OpenBLAS with 64-bit interface is used (equivalent to setting
`-DWHISPER_BLAS_LIB=openblas64`), but that can be changed with
`-DWHISPER_OPENBLAS_INTERFACE64=OFF` (equivalent to setting
`-DWHISPER_BLAS_LIB=openblas`) if 32-bit one is desired.
Turn on BLA_STATIC for FindBLAS only when WHISPER_STATIC is enabled.
BLA_STATIC may not work as expected for pkg-config based operation.
Get rid of supporting BLAS_HOME environment variable. If OPENBLAS_PATH
is insufficient in your case, there is no pkg-config file to rely on,
then you can manually specify include dir, e.g.
`-DBLAS_INCLUDE_DIRS=/usr/local/include/openblas`, and library, e.g.
`-DBLAS_LIBRARIES=/usr/local/lib/libopenblas.so`.
* make / cmake : use OpenBLAS with 32-bit interface by default.
OpenBLAS w/o INTERFACE64=1 vel USE_64BITINT=1 seems to be more common.
* cmake : hardcode "lib" prefix for OpenBLAS lib filename (even on Windows)
* cmake : hardcode OpenBLAS library name when building in MSVC (Windows)
Most *nix like environments (including MSYS2 on Windows) have OpenBLAS
packages that allow coexistence of OpenBLAS builds with 32-bit and
64-bit interface (w/o and w/ OPENBLAS_USE64BITINT defined) and they
differ by not having or having "64" suffix in their library filenames.
That's not the case for OpenBLAS prebuilt libraries for Windows.
* ggml : embed Metal library source (ggml-metal.metal) into binary
enable by setting WHISPER_EMBED_METAL_LIBRARY
* rename the build option
* rename the preprocessor directive
* generate Metal library embedding assembly on-fly during build process
I just upgraded the R wrapper at https://github.com/bnosac/audio.whisper to use whisper.cpp 1.5.4
I'm working on Windows and noticed while doing that that it did not pick up the relevant CFLAGS/CXXFLAGS as my system showed
```
I whisper.cpp build info:
I UNAME_S: MSYS_NT-10.0-19045
I UNAME_P: unknown
I UNAME_M: x86_64
```
Many thanks for all the tremendous hard work on maintaining whisper.cpp!
* make : fix server example building on MSYS2 environments (Windows)
It was not working since commit eff3570f78742dfd56024328ed93d4f442434280
when server was introduced.
* cmake : simplify server example lib deps on Windows
server uses httplib::Server, not httplib::SSLServer, so there is no need
to mention cryptographic libraries in target_link_libraries.
Winsock (ws2_32) suffices here.
Also use plain library names like we use in other places.
* Add first draft of server
* Added json support and base funcs for server.cpp
* Add more user input via api-request
also some clean up
* Add reqest params and load post function
Also some general clean up
* Remove unused function
* Add readme
* Add exception handlers
* Update examples/server/server.cpp
* make : add server target
* Add magic curl syntax
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* whisper : migrate to ggml-backend
* whisper : fix logit reading
* whisper : fix tensor allocation during load
* whisper : fix beam-search with CUDA
* whisper : free backends + fix compile warning
* whisper : print when CUDA is enabled
* whisper : fix CoreML
* make : clean-up
* talk : fix compile warning
* whisper : support ggml_conv with CUDA and Metal (#1473)
* ggml : add CUDA support for ggml_conv
* whisper : remove ggml_repeat for conv bias + single backend
* cuda : fix im2col kernel
* metal : add im2col support + mul mat-vec f16 x f16
* bench-all : add q4 models
* whisper : clean-up
* quantize-all : fix
* ggml : im2col opts
* whisper : avoid whisper_model_data wrapper
* whisper : add note that ggml_mul_mat_pad does not work with CUDA
* whisper : factor out graph compute in common function
* whisper : fixes
* whisper : fix UB with measure buffers
* whisper : try to fix the parallel whisper_state functionality (#1479)
* whisper : try to fix the parallel whisper_state functionality
* whisper : fix multi-state Metal
* whisper : free backend instances in whisper_state
* sync : ggml (backend v2, k-quants, CUDA opts, Metal opts, etc.)
* metal : allow env metal variable to override resource path (#1415)
* Allow env variable to override resource path
* Update ggml-metal.m
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* sync : restore common / main from `master`
* sync : restore whisper from `master`
* talk-llama : update to latest llama.cpp
* ruby : fix build
* ggml : fix 32-bit ARM build
* ggml : fix MIN / MAX macro collisions + update ios bindings
* ggml : fix ifdefs and MIN / MAX again
* exampels : fix Obj-C and Swift examples
* ggml : fix 32-bit ARM compatibility
* ggml : one more attempt to fix 32-bit ARM compat
* whisper : fix support for larger graphs
---------
Co-authored-by: Chris Raethke <codesoda@users.noreply.github.com>
* metal : init
* whisper : factor out graph builds
* whisper : allocate encoder and decoder using ggml-alloc
* whisper : ggml-alloc is now supported
* whisper : CoreML support ggml-alloc
* build : fix ggml-alloc
* ios : update submodule
* extra : update sync-ggml.sh script to also sync ggml-alloc
* ci : see if this is causing the crash
* whisper : refactor ggml-alloc init
* whisper.android : try to fix build
* whisper : initial Metal version
* ci : try to debug vmem issue
* metal : decoder works on GPU!
* metal : add multi-decoder support
* ggml : fix ggml_nbytes (probably temp solution)
* metal : run "cross" step on the GPU
* whisper : remove ggml_repeat in the encoder
* whisper : offload the Encoder to Metal
* ggml : use simpler ggml_bytes() implementation
* ggml-alloc : try to make CI happy by reducing vram to 128GB
* whisper : add whisper_allocr to wrap ggml_allocr
* whisper : factor out alloc init in a function
* cmake : update to support Metal build
* whisper : add <functional> header
* objc : fix build (no Metal yet)
* ios : add Metal support
* swiftui : fix build
* metal : speed-up KQ multiplication
* metal : sync latest llama.cpp kernels
* readme : add Metal info
* ios : update submodule
* coreml : add code to toggle Core ML config (CPU, ANE, GPU)
* bench : fix timings by running a pre-heat
* bench : start benching the decoder
* whisper : add ggml_mul_mat_pad
* bench : fix uninitialized vars
* whisper : add comment for disabling mul-mat padding
* whisper : add description of ggml_mul_mat_pad
* whisper : clean-up ggml_mul_mat_pad
* metal : remove the "concurrent" flag
* bench : variable n_past
* ios : update SPM package