whisper.cpp/ggml/include
compilade 9bf7250bf9 llama : simplify Mamba with advanced batch splits (llama/8526)
* llama : advanced batch splits

This includes equal-sequence-length batch splits which are useful
to simplify recurrent model operators.

* llama : always make recurrent state slots contiguous

* ggml : simplify mamba operators

* llama : fix integer signedness mixing

* llama : logits_all has priority over batch->logits

Otherwise, the server embeddings tests failed.
This was likely an existing problem but was only detected here
because of an additional assertion.

* llama : apply suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* llama : fix t5 segfault

* llama : fix Mamba session save and restore

* llama : minor cosmetic changes

* llama : rename llama_reorder_outputs to llama_output_reorder

Also move it closer to llama_output_reserve.

* llama : fix pooled embeddings when using batches with equal_seqs

* minor : add struct members for clarity

ggml-ci

* llama : fix T5 segfault again

* llama : fix Mamba pooled embeddings with multiple sequences

Until the pooled embeddings are refactored to allow splitting
across ubatches for causal embeddings,
recurrent models can only process a single sequence per ubatch
when calculating pooled embeddings.

* llama : add llama_model_is_recurrent to simplify figuring that out

This will make it easier to support RWKV-v6 and Mamba-2 cleanly.

* llama : fix simple splits when the batch contains embeddings

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-28 13:22:20 +03:00
ggml-alloc.h whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
ggml-backend.h feat: ref. cross entropy, add CUDA, fix grad test (ggml/929) 2024-08-28 13:22:20 +03:00
ggml-blas.h whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
ggml-cann.h ggml : add CANN backend (llama/0) 2024-08-09 09:58:16 +03:00
ggml-cuda.h feat: Support Moore Threads GPU (llama/8383) 2024-08-08 22:48:46 +03:00
ggml-kompute.h whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
ggml-metal.h metal : add abort callback (ggml/905) 2024-08-08 22:48:46 +03:00
ggml-rpc.h whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
ggml-sycl.h whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
ggml-vulkan.h whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
ggml.h llama : simplify Mamba with advanced batch splits (llama/8526) 2024-08-28 13:22:20 +03:00
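
The commit message above says that equal-sequence-length batch splits help simplify recurrent model operators. The idea can be illustrated with a minimal sketch: each micro-batch takes the same number of tokens from every sequence it contains, so an operator can treat it as a dense (tokens per sequence) x (sequences) block. This is not the llama.cpp implementation; the names `equal_ubatch`, `split_equal`, and `n_ubatch` are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: one micro-batch in which every participating
// sequence contributes the same number of tokens.
struct equal_ubatch {
    std::vector<std::size_t> seq_ids;      // indices of the sequences taking part
    std::size_t              n_seq_tokens; // tokens taken from each of them
};

// Split pending tokens into equal-length micro-batches.
// lengths[i] is the number of pending tokens of sequence i.
// n_ubatch is the maximum total number of tokens per micro-batch.
std::vector<equal_ubatch> split_equal(std::vector<std::size_t> lengths, std::size_t n_ubatch) {
    assert(n_ubatch > 0);
    std::vector<equal_ubatch> out;
    for (;;) {
        equal_ubatch ub{{}, 0};
        std::size_t min_len = 0;
        // Collect sequences that still have tokens, at most n_ubatch of them
        // so that each one can contribute at least one token.
        for (std::size_t i = 0; i < lengths.size(); ++i) {
            if (lengths[i] == 0) continue;
            if (ub.seq_ids.size() == n_ubatch) break;
            min_len = ub.seq_ids.empty() ? lengths[i] : std::min(min_len, lengths[i]);
            ub.seq_ids.push_back(i);
        }
        if (ub.seq_ids.empty()) break; // nothing left to process
        // Take the same count from every collected sequence: limited by the
        // shortest one and by the per-micro-batch token budget.
        ub.n_seq_tokens = std::min(min_len, n_ubatch / ub.seq_ids.size());
        for (std::size_t id : ub.seq_ids) lengths[id] -= ub.n_seq_tokens;
        out.push_back(ub);
    }
    return out;
}
```

With pending lengths {3, 5, 2} and a budget of 8 tokens, this sketch yields three micro-batches: 2 tokens from each of sequences 0, 1, 2; then 1 token from each of sequences 0 and 1; then the last 2 tokens of sequence 1. The real splitting logic in llama.cpp may order and group sequences differently.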