* fix(rpc): Improve input validation and error handling
The `rpc-server` was vulnerable to Denial of Service attacks via
several RPC commands (`SET_TENSOR`, `GRAPH_COMPUTE`, etc.). Malformed
messages could trigger failed assertions (e.g., invalid `ggml_type`)
or out-of-bounds reads/writes leading to `GGML_ABORT` calls,
crashing the server process.
This PR introduces robust input validation and replaces `abort()`
calls with graceful error handling:
- **Type Validation:** `deserialize_tensor` now checks if the
`tensor->type` is within the valid `GGML_TYPE_COUNT` range
*before* calling `ggml_new_tensor_4d`. Returns `nullptr` on an
invalid type (see the sketch after this list).
- **Bounds Checks:** Replaced `GGML_ABORT` in `set_tensor`,
`set_tensor_hash`, and `get_tensor` handlers with error
logging and returning `false` when data/offset parameters
are out of buffer bounds.
- **Size Checks:** Added safe arithmetic checks (for overflow) in
`graph_compute` when calculating required message sizes based
on client-provided `n_nodes` and `n_tensors`. Returns early
if the reported sizes conflict with the actual message size or
would lead to overflow.
- **Error Propagation:**
- `create_node` now checks for `nullptr` return values from
`deserialize_tensor` and its recursive calls, propagating
`nullptr` upwards on failure. Uses `find` instead of `at`
for safer map access.
- `copy_tensor` now checks for `nullptr` from `deserialize_tensor`
and sets the response status to failure if deserialization
or bounds checks fail.
- `graph_compute` now checks for `nullptr` return from
`create_node` and returns failure status correctly. The final
return value now reflects the actual computation status.
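A minimal sketch of the type validation from the first bullet, assuming
a simplified stand-in for the `rpc_tensor` wire struct; the field names
follow this log, not necessarily the exact ggml-rpc definitions:

```cpp
#include <cstdint>

#include "ggml.h"

// illustrative stand-in for the rpc_tensor wire struct (assumption)
struct rpc_tensor_sketch {
    uint32_t type;
    uint64_t ne[4];
};

static ggml_tensor * deserialize_tensor_sketch(ggml_context * ctx, const rpc_tensor_sketch * t) {
    // reject types outside the valid enum range before ggml sees them
    if (t->type >= GGML_TYPE_COUNT) {
        return nullptr; // the caller treats nullptr as a protocol error
    }
    return ggml_new_tensor_4d(ctx, (ggml_type) t->type,
                              (int64_t) t->ne[0], (int64_t) t->ne[1],
                              (int64_t) t->ne[2], (int64_t) t->ne[3]);
}
```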
These changes improve the RPC server's resilience
against malformed client requests, preventing crashes and ensuring
errors are handled more gracefully.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* refactor(rpc): address pr comments
removed comments and unnecessary returns
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* refactor(rpc): ambiguous nullptr from create_node
rpc_server::create_node could previously return nullptr if the input ID
was 0 (valid) or if an internal error (deserialization, recursion
failure) occurred (invalid). This ambiguity made error handling
difficult for the caller (`graph_compute`).
This commit clarifies the meaning of nullptr:
- `graph_compute` now checks if the input 'id' was non-zero when
`create_node` returns nullptr, correctly identifying failures
versus intentional null links.
- `create_node` avoids recursive calls for zero IDs and propagates
nullptr unambiguously on failure during recursion.
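A self-contained sketch of the caller-side disambiguation, with an
`int *` standing in for `ggml_tensor *` and a trivial stand-in for
`rpc_server::create_node` (the real function deserializes tensors
recursively, which is omitted here):

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <unordered_map>

static int * create_node_sketch(uint64_t id, std::unordered_map<uint64_t, int *> & tensor_map) {
    auto it = tensor_map.find(id); // find() instead of at(): no throw on a missing id
    return it == tensor_map.end() ? nullptr : it->second;
}

// the caller-side check as done in graph_compute
static bool build_node(uint64_t id, std::unordered_map<uint64_t, int *> & tensor_map) {
    int * node = create_node_sketch(id, tensor_map);
    if (node == nullptr && id != 0) {
        // id == 0 is an intentional null link; nullptr for a non-zero id
        // means deserialization or recursion failed
        std::fprintf(stderr, "graph_compute: failed to create node %" PRIu64 "\n", id);
        return false;
    }
    return true;
}
```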
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* refactor(rpc): initial zero check in create_node
The caller (`graph_compute`) already checks `id != 0` when handling
a `nullptr` return from `create_node`, correctly distinguishing
intentional null links from actual errors. This makes the initial
`if (id == 0)` check redundant.
Also removes the log message, added earlier in this branch, that was
emitted when a tensor ID is not found in the provided map.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix(rpc): Handle get_alloc_size failure in server
Check the return value of `server.get_alloc_size` in the RPC server
loop. If the call fails, return early to close the connection.
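A hedged sketch of the early-return pattern; the callables stand in for
`server.get_alloc_size` and the reply helper, whose real signatures in
ggml-rpc may differ:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

static void handle_get_alloc_size_sketch(
        const std::function<bool(uint64_t, uint64_t &)> & get_alloc_size, // stand-in for server.get_alloc_size
        const std::function<bool(const void *, size_t)> & send_msg) {     // stand-in for the reply helper
    const uint64_t buft_id = 0; // illustrative request field
    uint64_t alloc_size    = 0;
    if (!get_alloc_size(buft_id, alloc_size)) {
        return; // failure: return early so the connection is closed
    }
    send_msg(&alloc_size, sizeof(alloc_size)); // success: reply as usual
}
```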
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* refactor(rpc): input size validation in graph_compute
Removes detailed, step-by-step size calculations and overflow
checks in favor of simpler direct comparisons, assuming 64-bit
overflow is unlikely.
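A self-contained sketch of the direct comparison, assuming a layout of
a u32 `n_nodes`, `n_nodes` u64 node ids, a u32 `n_tensors`, and then
`n_tensors` fixed-size tensor records; the layout follows this log and
is not necessarily the exact ggml-rpc wire format:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

static bool graph_message_size_ok(const std::vector<uint8_t> & input, uint64_t tensor_record_size) {
    if (input.size() < sizeof(uint32_t)) {
        return false;
    }
    uint32_t n_nodes = 0;
    std::memcpy(&n_nodes, input.data(), sizeof(n_nodes));
    // direct comparisons in 64-bit arithmetic; overflow is not a practical concern here
    const uint64_t nodes_bytes = sizeof(uint32_t) + (uint64_t) n_nodes * sizeof(uint64_t);
    if (input.size() < nodes_bytes + sizeof(uint32_t)) {
        return false; // too small for the node list it claims to carry
    }
    uint32_t n_tensors = 0;
    std::memcpy(&n_tensors, input.data() + nodes_bytes, sizeof(n_tensors));
    return input.size() >= nodes_bytes + sizeof(uint32_t) + (uint64_t) n_tensors * tensor_record_size;
}
```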
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* refactor(rpc): remove extra status code setting
Removes the explicit setting of `response.result = GGML_STATUS_FAILED`
when `create_node` returns `nullptr` within `graph_compute`.
The `false` return value remains the primary signal in case of failure.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* refactor(rpc): remove redundant check for tensor->type
The check breaks CI on ubuntu-cpu-make. The tensor type is a uint32_t,
thus the check is not needed.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
---------
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* SYCL: Add all missing unary kernels
ggml-ci
* decouple kernel launch range from data size using a strided loop (see the sketch below)
* use ceil_div helper for num_blocks
ggml-ci
* clean auto imported header files
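A hedged sketch of the strided-loop pattern referenced above (launch
range decoupled from the data size, with a ceil_div for the number of
blocks); the work-group size, the block cap, and the negation op are
illustrative, not the actual ggml-sycl kernels:

```cpp
#include <sycl/sycl.hpp>

#include <algorithm>
#include <cstddef>

// src/dst are assumed to be USM device (or shared) allocations
static void op_neg_strided(sycl::queue & q, const float * src, float * dst, size_t n) {
    const size_t block_size = 256;  // assumed work-group size
    const size_t max_blocks = 4096; // cap so the launch range no longer tracks n
    const size_t num_blocks = std::min((n + block_size - 1) / block_size, max_blocks); // ceil_div, capped
    q.parallel_for(sycl::nd_range<1>(sycl::range<1>(num_blocks * block_size), sycl::range<1>(block_size)),
                   [=](sycl::nd_item<1> it) {
        // strided loop: each work-item walks the data in global-range steps,
        // so any n is covered regardless of the launch range
        for (size_t i = it.get_global_id(0); i < n; i += it.get_global_range(0)) {
            dst[i] = -src[i];
        }
    }).wait();
}
```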
RPC_CMD_SET_TENSOR always returns an empty response, and we send it 4
times per token. We can improve text-generation (TG) speed if we don't
wait for this empty response.
The performance impact of this change depends on the network latency.
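A hedged sketch of the client side, assuming a simple cmd/size/payload
framing; the command value, framing, and helper are illustrative and
may differ from the actual ggml-rpc client (partial sends are ignored
for brevity):

```cpp
#include <cstdint>
#include <sys/socket.h>
#include <sys/types.h>
#include <vector>

static const uint8_t RPC_CMD_SET_TENSOR_SKETCH = 6; // illustrative command value

static bool send_set_tensor_nowait(int sockfd, const std::vector<uint8_t> & payload) {
    const uint8_t  cmd  = RPC_CMD_SET_TENSOR_SKETCH;
    const uint64_t size = payload.size();
    if (send(sockfd, &cmd,  sizeof(cmd),  0) != (ssize_t) sizeof(cmd))  return false;
    if (send(sockfd, &size, sizeof(size), 0) != (ssize_t) sizeof(size)) return false;
    if (send(sockfd, payload.data(), payload.size(), 0) != (ssize_t) payload.size()) return false;
    // no recv() here: the SET_TENSOR response is always empty, so the client
    // no longer blocks on a round-trip for each of the ~4 sends per token
    return true;
}
```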
This commit adds a check to make sure that the target exists before
trying to add compile options to ignore warnings when using MSVC.
The motivation for this is that the build is currently broken
depending on the cmake options provided. With this fix it should be
possible to build even if the targets are not actually available.
Refs: https://github.com/ggml-org/whisper.cpp/pull/3090#issuecomment-2842760104
This commit adds the command line option `--no-gpu` to the server
example's print usage function.
The motivation for this is that this option is available and can be
set, but it is not displayed in the usage message.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3095
This commit adds a temporary fix to the `test_log_suppress` test in the
Ruby bindings.
The motivation for this change is that I suspect that the recent
migration of the models to HuggingFace Xet has changed the way HTTP
caching works for the models. This is causing the test in question to
fail. This is a temporary fix so that CI is not broken while we
investigate this further.
* whisper: suppress Windows compiler warnings
This commit disables compiler warnings on Windows when using MSVC.
The motivation for these changes is that some compilers, for example
MSVC on Windows, generate warnings for these conversions, and there
are quite a few of them. This makes it a little difficult to spot new
warnings that may be introduced, and it can also be difficult for
users/embedders of ggml to separate these warnings from their own.
* squash! whisper: suppress Windows compiler warnings
Move ggml-related warnings into ggml. This commit also fixes the
indentation and adds missing whitespace to the if statement.
This commit addresses a warning that is present in Release builds:
```console
[ 30%] Building CXX object src/CMakeFiles/whisper.dir/whisper.cpp.o
In file included from /usr/include/c++/13/bits/stl_tree.h:63,
from /usr/include/c++/13/map:62,
from /home/danbev/work/ai/whisper.cpp/src/whisper-arch.h:5,
from /home/danbev/work/ai/whisper.cpp/src/whisper.cpp:2:
In static member function ‘static void std::__copy_move<false, false, std::random_access_iterator_tag>::__assign_one(_Tp*, _Up*) [with _Tp = const whisper_grammar_element*; _Up = const whisper_grammar_element* const]’,
inlined from ‘static _Up* std::__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m(_Tp*, _Tp*, _Up*) [with _Tp = const whisper_grammar_element* const; _Up = const whisper_grammar_element*; bool _IsMove = false]’ at /usr/include/c++/13/bits/stl_algobase.h:440:20,
inlined from ‘_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = const whisper_grammar_element* const*; _OI = const whisper_grammar_element**]’ at /usr/include/c++/13/bits/stl_algobase.h:506:30,
inlined from ‘_OI std::__copy_move_a1(_II, _II, _OI) [with bool _IsMove = false; _II = const whisper_grammar_element* const*; _OI = const whisper_grammar_element**]’ at /usr/include/c++/13/bits/stl_algobase.h:533:42,
...
```
This warning is caused by the fact that the `stack` vector is empty
when it is passed to `new_stacks.push_back(stack);`.
The suggested fix is to use `new_stacks.emplace_back();` instead of
`new_stacks.push_back(stack);`.
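A minimal, self-contained reproduction of the fix, with a stand-in
struct for `whisper_grammar_element`:

```cpp
#include <vector>

struct grammar_element_sketch { int type; int value; }; // stand-in for whisper_grammar_element

using stack_t = std::vector<const grammar_element_sketch *>;

static void add_empty_stack(std::vector<stack_t> & new_stacks) {
    // before: stack_t stack; new_stacks.push_back(stack);  (copies a known-empty vector)
    new_stacks.emplace_back(); // after: construct the empty stack directly in the container
}
```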
This commit removes the empty `.gitmodules` file from the repository.
The motivation for this is that the file is currently empty and the
project does not use any submodules at this time. Removing it mainly
reduces clutter in the repository and avoids confusion when the file
is seen in the repo.
This commit disables the publishing of the Java binding to the Maven
repository.
The motivation for this is that this job was disabled for some time
and was recently re-enabled, but the publishing of the Java binding
caused the build to fail and needs to be investigated further.
Refs: https://github.com/ggml-org/whisper.cpp/issues/3079
* Update PATH for main/main-cuda container
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Add Dockerfile for musa, .dockerignore and update CI
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Add Moore Threads GPU Support in README.md and replace ./main with whisper-cli
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Forward GGML_CUDA/GGML_MUSA to cmake in Makefile
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Minor updates for PATH ENV in Dockerfiles
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Address comments
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Lazy run TestBase.whisper
* Fix indentation
* Remove disused GGML_HIP_UMA from Ruby
* Add encoder_begin_callback
* Comment out existing abort mechanism
* Add test for encoder_begin_callback
* Add signatures for encoder_begin_callback related methods
* Update gem date
* tune matmul for gcn
* this one is more power efficient
* Update ggml/src/ggml-vulkan/ggml-vulkan.cpp
Co-authored-by: 0cc4m <picard12@live.de>
* disable this tune for the proprietary driver
---------
Co-authored-by: 0cc4m <picard12@live.de>
Add RPC_CMD_HELLO for getting the version of the protocol implemented
by the server. Follow the semantic versioning rules at
https://semver.org
Hopefully this brings a better user experience when we make breaking
changes at the protocol level and avoids issues like #12465
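A hedged sketch of a semver handshake check; the message layout and the
major-version rule are assumptions following https://semver.org, not
the exact ggml-rpc implementation:

```cpp
#include <cstdint>

struct rpc_msg_hello_sketch {
    uint8_t major; // breaking protocol changes
    uint8_t minor; // backwards-compatible additions
    uint8_t patch; // fixes with no wire-format impact
};

static bool protocol_compatible(const rpc_msg_hello_sketch & server, uint8_t client_major) {
    // per semver, only a major-version mismatch signals a breaking change
    return server.major == client_major;
}
```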
* graph : make mla compatible with FA
* metal : add exp FA kernels for DeepSeek models
ggml-ci
* llama : minor naming updates
ggml-ci
* ggml : disable FA for DS head sizes
* tests : add FA tests for MLA shapes
ggml-ci
Submit operators using asynchronous threads to improve performance.
Use the environment variable GGML_CANN_ASYNC_MODE to control whether
asynchronous submission is enabled. It is disabled by default.
Testing shows a 10%–20% performance improvement in scenarios with
small parameter sizes, especially in quantized models.
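A minimal sketch of the opt-in toggle; the variable name comes from
this log, while the exact parsing in the CANN backend may differ:

```cpp
#include <cstdlib>
#include <cstring>

static bool cann_async_mode_enabled() {
    const char * env = std::getenv("GGML_CANN_ASYNC_MODE");
    return env != nullptr && std::strcmp(env, "0") != 0; // unset or "0" -> synchronous (default)
}
```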
The grouped query attention optimization doesn't require a power-of-two
ratio; the only thing relying on it was the modulo operation written as
a bitwise &.
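A small illustration of the point above; the names are illustrative:

```cpp
#include <cstdint>

static uint32_t mod_ratio(uint32_t i, uint32_t gqa_ratio) {
    // only correct when gqa_ratio is a power of two:
    //   return i & (gqa_ratio - 1);
    // general form, which removing the power-of-two requirement relies on:
    return i % gqa_ratio;
}
```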
split_k need not depend on gqa_ratio - enable it any time there's only one
workgroup in the X dimension. The shader gets the split index from the x coord,
and multiple workgroups in the X dimension (pre-split) indicates a larger
FA operation that wouldn't need splitting.
Replace compile-time `GGML_HIP_UMA` with environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY`. This unifies the usage on NVIDIA and AMD GPUs, and allows a single binary to be shared between integrated and dedicated GPUs.
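A hedged sketch of the runtime toggle on the CUDA/HIP side; the
surrounding error handling and allocator plumbing in ggml are omitted:

```cpp
#include <cstdlib>

#include <cuda_runtime.h>

static cudaError_t alloc_device_buffer(void ** ptr, size_t size) {
    if (std::getenv("GGML_CUDA_ENABLE_UNIFIED_MEMORY") != nullptr) {
        return cudaMallocManaged(ptr, size); // unified memory shared between host and device
    }
    return cudaMalloc(ptr, size);            // regular device allocation
}
```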