Mirror of https://github.com/ggerganov/whisper.cpp.git (synced 2025-03-14 00:06:37 +00:00)
Make the mul_mat_vec shaders support N > 1, with the column count supplied as a specialization constant (NUM_COLS). In this mode the batch_strides are overloaded to hold the row strides. Move the loads from the B matrix into the innermost loop, since that should cache better. Share the code that reduces the result values and writes them to memory by moving it into mul_mat_vec_base.
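
For illustration only, the loop ordering described above can be sketched as a scalar CPU reference in C. This is not the shader code: the function name `mul_mat_vec_cols`, the `MAX_COLS` bound, the row-major layout of A, and the `b_stride`/`d_stride` parameters are all assumptions made for this sketch. The one thing it tries to show is that each A element is loaded once and the loop over columns of B sits innermost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on the column count; in the shader this role
 * would be played by the NUM_COLS specialization constant. */
enum { MAX_COLS = 8 };

/* Sketch under assumptions (not the actual shader):
 *   a: num_rows x k matrix, row-major
 *   b: num_cols vectors of length k, consecutive vectors b_stride elements apart
 *   d: num_cols result vectors of length num_rows, d_stride elements apart */
static void mul_mat_vec_cols(const float *a, const float *b, float *d,
                             size_t num_rows, size_t k, size_t num_cols,
                             size_t b_stride, size_t d_stride)
{
    assert(num_cols <= MAX_COLS);

    for (size_t row = 0; row < num_rows; ++row) {
        float acc[MAX_COLS] = {0.0f};

        for (size_t i = 0; i < k; ++i) {
            /* Load the A element once and reuse it for every column. */
            const float a_val = a[row * k + i];

            /* The B loads are in the innermost loop. */
            for (size_t col = 0; col < num_cols; ++col) {
                acc[col] += a_val * b[col * b_stride + i];
            }
        }

        /* Write one result per column for this row. */
        for (size_t col = 0; col < num_cols; ++col) {
            d[col * d_stride + row] = acc[col];
        }
    }
}
```

With `num_cols == 1` this reduces to an ordinary matrix-vector product, so the single-column case is just one instance of the same loop.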