* ggml : add check for grad_accs
This commit adds a check for grad_accs in ggml_graph_get_grad and
ggml_graph_get_grad_acc functions. This is necessary to avoid segfaults
when grad_accs is not initialized.
The motivation for this change is that I find it nice to be able to
print out a computation graph using ggml_graph_print but this function
segfaults when grad_accs is not initialized:
```console
(gdb) p g1
$2 = (ggml_cgraph *) 0x7ffff66004b0
(gdb) p *g1
$3 = {size = 2048, n_nodes = 1, n_leafs = 2, nodes = 0x7ffff6600500,
grads = 0x0, grad_accs = 0x0, leafs = 0x7ffff6604500,
visited_hash_set = {size = 4099, used = 0x7ffff6610518,
keys = 0x7ffff6608500}, order = GGML_CGRAPH_EVAL_ORDER_LEFT_TO_RIGHT}
(gdb) p ggml_graph_print(g1)
=== GRAPH ===
n_nodes = 1
Program received signal SIGSEGV, Segmentation fault.
0x0000555555579775 in ggml_graph_get_grad
(cgraph=0x7ffff66004b0,node=0x7ffff6600340)
at /ggml/ggml/src/ggml.c:5990
5990 return igrad != GGML_HASHSET_FULL &&
ggml_bitset_get(cgraph->visited_hash_set.used, igrad) ?
cgraph->grads[igrad] : NULL;
```
* squash! ggml : add check for grad_accs
Fix the check in ggml_graph_get_grad. The check was incorrectly using
cgraph->grad_accs instead of cgraph->grads.
* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend.
- clean Q4_0_N_M and IQ4_0_N_M
- remove from "file" tensor type
- allow only with dynamic repack
- extract cpu extra bufts and convert to C++
- hbm
- "aarch64"
- more generic use of extra buffer
- generalise extra_supports_op
- new API for "cpu-accel":
- amx
- aarch64
* clang-format
* Clean Q4_0_N_M ref
Enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* added/corrected control on tensor size for Q4 repacking.
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add debug logs on repacks.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Vulkan: Implement VK_KHR_cooperative_matrix support in the matrix matrix multiplication shader
* Improve performance with better q4_k and q5_k dequant and store unrolling
* Add Vulkan MUL_MAT and MUL_MAT_ID accumulator precision selection
* Rework mulmat shader selection and compilation logic, avoid compiling shaders that won't get used by device
* Vulkan: Implement accumulator switch for specific mul mat mat shaders
* Vulkan: Unroll more loops for more mul mat mat performance
* Vulkan: Add VK_AMD_shader_core_properties2 support to read Compute Unit count for split_k logic
* Disable coopmat support on AMD proprietary driver
* Remove redundant checks
* Add environment variable GGML_VK_DISABLE_COOPMAT to disable VK_KHR_cooperative_matrix support
* Fix rebase typo
* Fix coopmat2 MUL_MAT_ID pipeline selection
* metal : Extend how Llama.cpp locates metal resources (llama/10675)
* It searches the resource file in the directory where the current
binary is located as well.
* Resolves symbolic links.
Rationale:
When we plug this dependency into a Bazel build and run it in the
context of Bazel (e.g. testing):
* the execution directory is often very different from where the files
are located and no direct control over this (Bazel sandboxing),
* the Bazel sandbox often use symbolic links to make files available.
With this patch, we can have the resource file added to the target,
can build and run tests in the context of Bazel.
* Update ggml/src/ggml-metal/ggml-metal.m
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/src/ggml-metal/ggml-metal.m
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Remove Whisper::Model.[]
* Fix Whisper::Model::URI#request
* Make Whisper::Context#initialize accept pre-converted model name
* Use downloading pre-converted model feature for testing
* Update README
* Remove unnecessary task
* Move whisper/model.rb -> whisper/model/uri.rb
* Update document comment of Whisper::Context#initialize
* Don't show download progress when not tty
* Pass String to raise
* Use cache model file if download fails
* Add test for auto download
* Specify required Ruby version
* Fix a typo
* Remove unnecessary flags
* Initialize Whisper::Params#diarize explicitely
* Remove redundant code from README for simplicity
* Add Whisper::Params#no_speech_thold attribute
* Add test for Whisper::Params#no_speech_thold
* Fix hallucinations during silence
When the predicted tokens end with a single timestamp the the entire 30 segment should be considered as done, to avoid hallucinations for the remaining part of segment.
This behaviour is on par with openai's whisper. Refer to logic related to `single_timestamp_ending` in https://github.com/openai/whisper/blob/main/whisper/transcribe.py
* Accept review comments related to formatting.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Adding missing CMakeLists.txt include for ggm-cpu needed by whisper.android
* attempt to re-enable CI for JNI android
---------
Co-authored-by: Your Name <you@example.com>
* Use C++17
* Add test for Pathname of model
* Make Whisper::Context#initialize accept Pathname
* Add shorthand for pre-converted models
* Update documents
* Add headings to API section in README [skip ci]
* Remove unused function
* Don't care about no longer included file
* Cosmetic fix
* Use conditional get when get model files
* [SYCL] Move to Compile Time backend selection on oneMKL Interface for NVIDIA backend
Move to compile time selection to backend to avoid latency at run time.
Add it to all mkl gemm calls and only for NVIDIA backend.
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
* Formatting
* Address PR comments to increase readibility
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
* ggml_pad_reflect_1d defined in header
* implemented on CPU
* called the forward pass
* impl Metal kernel
* added Metal kernel
* added OP_PAD_REFLECT_1D in test-backend-ops.cpp
* add test-pad-reflect-1d test case
* test case support multiple backend