* ci: add visionOS build workflow
Add a new GitHub Actions workflow for building on visionOS with CMake and Xcode.
* ggml: Define _DARWIN_C_SOURCE for visionOS to fix missing u_xxx typedefs
* ci: remove define hacks for u_xxx system types
---------
Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
I've been seeing significantly worse performance for tg with flash attention
enabled vs disabled, and it seems to be related to the submit heuristic.
Change the heuristic to check how many bytes worth of weight matrix are
used and flush every 100MB, and ramp up after the first few submits.
This seems to resolve the issue, and also increases perf for non-FA a bit.
* opencl: more profiling timing
* opencl: generate trace for profiling
* opencl: reduce profiling overhead
* Populate profiling timing info at the end rather than after each
kernel run
* opencl: fix for chrome tracing
* Enable CUDA Graph on CTK < 12.x
`cudaGraphExecUpdate` API was changed on 12.x. For this reason CUDA graph support was disabled on older CUDA toolkit. This change enables CUDA support in CTK version < 12.x by using older API if CTK < 12.x.
* Fix compilation errors with MUSA
* Disable CUDA Graph for MUSA
* cmake: Factor out compiler flag function from ggml
llama.cpps's build requires it, too, and we may want to make use of it
without add_subdirectory(ggml).
* cmake: Enable building against system ggml
This facilitates package maintenance for Linux distributions, where the
libggml library most likely will be shipped as an individual package
upon which a llama.cpp package depends.
When fattn-wmma was ported over to warp64 various bits that also touch fattn-vec where converted to
selectable warp size, however the fattn-vec kernels dont work with 64 wide warps for now, so we need
to avoid launching them with parameters for warp64
refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
This patch nudges the llama.cpp a bit to be supported on PoCL which
doesn't support OpenCL C CL2.0. The issue is solved by querying the
device for the supported OpenCL C versions and using the highest one
available.
This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.
The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.
Nothing installs to it yet, so when attempting to use the cmake package,
set_and_check() triggers an error if the directory doesn't already exist
for other reasons.
This commit updates the Makefile to use cmake instead of make to build
whisper.cpp.
The motivation for this change is that currently the make recipe test
will fail with the following error:
```console
$ make test
Mkdir build
Mkdir models
Build whisper
make[1]: Entering directory '/home/danbev/work/ai/whisper-work'
make[1]: *** No rule to make target 'libwhisper.a'. Stop.
make[1]: Leaving directory '/home/danbev/work/ai/whisper-work'
make: *** [Makefile:33: whisper] Error 2
```
* whisper : add support for ggml_backend_buffer_type
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
* fix compile error when building on Ubuntu
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
* remove copyright header from include file
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
---------
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
* bindings.java : enable copyLibs task [no ci]
This commit adds a dependency on the copyLibs task to the sourcesJar and
jar tasks. This ensures that the libwhisper.so file is copied to the
correct location before the jar is built.
It also sets the executable bit on the gradlew file.
* bindings.java : add copyLibs dep for processResources [no ci]
This will otherwise cause builds to fail after doing an initial build.
* bindings.java : pass structs by value to native code
This commit refactors the code to pass the structs by value to the
native code. This is done by creating a ByValue class for each struct
and using it in the Java code.
The motivation for this change is that without this application crashes
due to what I believe was memory mis-alignement. When the structs were
passed to the native code they would be att different memory locations.
Passing by value overcomes this issue and considering that the structs
hold parementers (context and full params) it might be alright do to
this. These changes allow all the tests to pass.
* bindings.java : fix javadoc warnings [no ci]
* bindings.java : fix libwhisper.dylib path in build.gradle [no ci]
This commit fixes the copyLibwhisperDynlib task in the build.gradle file
to copy the correct libwhisper.dylib file from build/src.
This commit updates the instructions for running the test in the
JavaScript bindings README file.
The motivation for this is for Node.js versions after v16.4.0 the
`--experimental-wasm-threads` and `--experimental-wasm-simd` flags are
no longer required and they generate the following errors:
```console
$ node --experimental-wasm-threads --experimental-wasm-simd ../tests/test-whisper.js
node: bad option: --experimental-wasm-threads
node: bad option: --experimental-wasm-simd
```
This commit add GGML_USE_CPU to built target library to enable CPU
backend.
The motivation for this that without the compile definition the CPU
backend is not enabled and the app will crash when trying to use it.
* whisper.android.java : update build with ggml source changes
This commit updates the whisper.android.java build to include the
new ggml source files and directories. The gradle build configuration is
also updated to include the aliyun maven repository.
* examples : reduce initial memory to 512MB
This commit reduces the initial memory size to 512MB. This is done to
to avoid WebAssembly memory allocation issues on some platforms. It also
adds a flag to allow the memory to grow dynamically (up to the maximum).
The motivation for this change is that currently the initial memory is
set to 2GB which might be to large for some platforms. This will lead to
an error being thrown from the JavaScript code generated by Emscripten
when trying to allocate memory. More details can be found in the
referenced issue below.
* examples : set MAXIMUM_MEMORY instead of TOTAL_MEMORY
This commit sets MAXIMUM_MEMORY instead of TOTAL_MEMORY in the
whisper.wasm example.
The motivation for this is that TOTAL_MEMORY and INITIAL_MEMORY are
actually the same thing. Instead we want to set MAXIMUM_MEMORY to
2GB.
Refs: https://github.com/ggerganov/whisper.cpp/issues/2920
Refs: https://emscripten.org/docs/tools_reference/settings_reference.html#initial-memory
This commit fixes the nthread parsing in the whisper.wasm example when
using the `Threads` slider to change the number of threads to be used.
Currently this results in the following error:
```console
main.js:5597 Uncaught TypeError: Cannot convert "5" to int
at checkAssertions (main.js:5597:21)
at Object.toWireType (main.js:5611:15)
at Object.full_default (eval at new_ (main.js:5292:27), <anonymous>:10:26)
at whisper.wasm/:649:42
```
This commit adds a fix to the server.py file to handle requests for
web worker files when running the local python server to test the wasm
examples.
The motivation for this is that currently the server is serving files
from the build-em/bin directory which is where the .worker.js files
exist. But when examples access these resources they do so with the
application context path, for example /whisper.wasm/libmain.worker.js
but this will not be found as it currently works.
This commit adds debug level logging for the native build options and
variables to ggml/CMakeLists.txt.
The motivation for this is that it can be useful to see the effective
result of `GGML_NATIVE`, `GGML_NATIVE_DEFAULT`, and `INS_ENB` for a
cmake build. I've found myself adding similar logging a few times now,
so I thought it might be a good idea to add this.
Example output, specifying `-DCMAKE_MESSAGE_LOG_LEVEL=DEBUG` when
running cmake produces the following output:
```console
-- GGML_NATIVE : OFF
-- GGML_NATIVE_DEFAULT : OFF
-- INS_ENB : OFF
```
* whisper : improve whisper-cli executable path detection in model download shell scripts
If whisper-cli is found on the path, do not suggest invoking from build directory. This improves flexibility and usability for distribution and packaging scenarios.
* whisper : enhance Windows model download batch script to have comparable functionality and behaviour as shell scripts
* Download models to the current directory if the script is executed from the \bin\ directory (for future distribution scenarios where the script is in the \bin\ subdirectory of a Windows build)
* Add model_path command line argument
* If whisper-cli is found on the path, do not suggest invoking from build directory
* whisper : resolve compiler warning by removing duplicate definition of NOMINMAX in whisper-cli code
This change initializes each decoder's random number generator with a
unique seed.
The motivation for this is that currently all decoders are initialized
with the same seed value, 0. The result of this is that for the same
state (logits, probs, and logprobs) they will produce the same output.
This commit removes the -DCMAKE_CUDA_ARCHITECTURES=all flag from the
windows-cublas job in the build.yml file.
The motivation for this is that building for all architectures is
unnecessary and takes a long time. Without this flag the architectures
will instead be set by ggml-cuda.
Refs: https://github.com/ggerganov/whisper.cpp/pull/2915#issuecomment-2743160743