ruby : Add parallel transcription support (#3222)

* Fix indentation of code sample in document comment

* Make Whisper::Context#transcribe able to run non-parallel

* Add test for Whisper::Context#transcribe with parallel option

* Follow signature API change of Context#transcribe

* Remove useless variable assignment

* Move simple usage up in README

* Add need help section in README

* Add document on Context#transcribe's parallel option in README

* Update date

* Fix signature of Context.new

* Make Context#transcribe accept n_processors option

* Make test follow #transcribe's change

* Make RBS follow #transcribe's change

* Add document for #transcribe's n_processors option

* Rename test directory so that Rake tasks' default setting is used
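
The argument contract that these changes give `Context#transcribe` can be approximated in plain Ruby. This is only a sketch, not the binding's code: `FakeContext` and its `last_n_processors` reader are hypothetical stand-ins that mirror what the diff shows (two required positional arguments, an optional `n_processors` keyword that defaults to 1, an optional block, and `self` as the return value).

```ruby
# Hypothetical stand-in for Whisper::Context; it performs no transcription.
class FakeContext
  attr_reader :last_n_processors

  # Approximates the C-level parsing in the diff: rb_scan_args_kw with "2:&"
  # followed by rb_get_kwargs with a single optional key (:n_processors).
  def transcribe(wave_file_path, params, n_processors: 1, &blk)
    @last_n_processors = n_processors
    # Placeholder text; the real method hands the transcript to the block.
    blk&.call("transcribed: #{wave_file_path}")
    self
  end
end

ctx = FakeContext.new
ctx.transcribe("audio.wav", :params)                   # n_processors falls back to 1
ctx.transcribe("audio.wav", :params, n_processors: 4)  # explicit value

begin
  ctx.transcribe("audio.wav", :params, threads: 4)     # unknown keyword is rejected
rescue ArgumentError => e
  warn e.message
end
```

Because `rb_get_kwargs` is called with a non-negative optional count, keywords other than `n_processors` should raise `ArgumentError` in the binding as well, which is the behaviour the Ruby signature above reproduces.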
KITAITI Makoto
2025-06-04 14:50:18 +09:00
committed by GitHub
parent 82f461eaa4
commit 799eacdde4
20 changed files with 107 additions and 62 deletions
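
The two tests added in this commit use different patterns: the `n_processors: 1` test matches the quote exactly, while the `n_processors: 4` test accepts `,` or `.` before the second clause and ignores case. That fits the README's note that parallel runs can be less accurate, presumably around the boundaries of the audio chunks handed to each processor. A small sketch of the difference follows; the two sample strings are invented for illustration and were not captured from a real run.

```ruby
# Patterns copied from the two tests in this commit.
STRICT  = /ask not what your country can do for you, ask what you can do for your country/
LENIENT = /ask not what your country can do for you[,.] ask what you can do for your country/i

# Hypothetical outputs (assumptions, not real transcripts).
single   = "ask not what your country can do for you, ask what you can do for your country."
parallel = "ask not what your country can do for you. Ask what you can do for your country."

puts STRICT.match?(single)     # true
puts STRICT.match?(parallel)   # false: sentence break and capital "Ask"
puts LENIENT.match?(single)    # true
puts LENIENT.match?(parallel)  # true
```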

View File

@@ -70,17 +70,6 @@ end
 Some models are prepared up-front:
-```ruby
-base_en = Whisper::Model.pre_converted_models["base.en"]
-whisper = Whisper::Context.new(base_en)
-```
-At first time you use a model, it is downloaded automatically. After that, downloaded cached file is used. To clear cache, call `#clear_cache`:
-```ruby
-Whisper::Model.pre_converted_models["base"].clear_cache
-```
 You also can use shorthand for pre-converted models:
 ```ruby
@@ -105,6 +94,19 @@ puts Whisper::Model.pre_converted_models.keys
 # :
 ```
+You can also retrieve each model:
+```ruby
+base_en = Whisper::Model.pre_converted_models["base.en"]
+whisper = Whisper::Context.new(base_en)
+```
+At first time you use a model, it is downloaded automatically. After that, downloaded cached file is used. To clear cache, call `#clear_cache`:
+```ruby
+Whisper::Model.pre_converted_models["base"].clear_cache
+```
 You can also use local model files you prepared:
 ```ruby
@@ -163,6 +165,16 @@ For details on VAD, see [whisper.cpp's README](https://github.com/ggml-org/whisp
 API
 ---
+### Transcription ###
+By default, `Whisper::Context#transcribe` works in a single thread. You can make it work in parallel by passing `n_processors` option:
+```ruby
+whisper.transcribe("path/to/audio.wav", params, n_processors: Etc.nprocessors)
+```
+Note that transcription occasionally might be low accuracy when it works in parallel.
 ### Segments ###
 Once `Whisper::Context#transcribe` called, you can retrieve segments by `#each_segment`:
@@ -297,6 +309,11 @@ First call of `rake test` builds an extension and downloads a model for testing.
 If something seems wrong on build, running `rake clean` solves some cases.
+### Need help ###
+* Windows support
+* Refinement of C/C++ code, especially memory management
 License
 -------

View File

@@ -67,17 +67,15 @@ file LIB_FILE => [SO_FILE, "lib"] do |t|
 end
 CLEAN.include LIB_FILE
-Rake::TestTask.new do |t|
-  t.test_files = FileList["tests/test_*.rb"]
-end
+Rake::TestTask.new
-TEST_MEMORY_VIEW = "tests/jfk_reader/jfk_reader.#{RbConfig::CONFIG['DLEXT']}"
-file TEST_MEMORY_VIEW => "tests/jfk_reader/jfk_reader.c" do |t|
-  chdir "tests/jfk_reader" do
+TEST_MEMORY_VIEW = "test/jfk_reader/jfk_reader.#{RbConfig::CONFIG['DLEXT']}"
+file TEST_MEMORY_VIEW => "test/jfk_reader/jfk_reader.c" do |t|
+  chdir "test/jfk_reader" do
     ruby "extconf.rb"
     sh "make"
   end
 end
-CLEAN.include "tests/jfk_reader/jfk_reader.{o,#{RbConfig::CONFIG['DLEXT']}}"
+CLEAN.include "test/jfk_reader/jfk_reader.{o,#{RbConfig::CONFIG['DLEXT']}}"
 task test: [LIB_FILE, TEST_MEMORY_VIEW]

View File

@@ -24,6 +24,7 @@ ID id_URI;
 ID id_pre_converted_models;
 ID id_coreml_compiled_models;
 ID id_cache;
+ID id_n_processors;
 static bool is_log_callback_finalized = false;
@@ -142,6 +143,7 @@ void Init_whisper() {
   id_pre_converted_models = rb_intern("pre_converted_models");
   id_coreml_compiled_models = rb_intern("coreml_compiled_models");
   id_cache = rb_intern("cache");
+  id_n_processors = rb_intern("n_processors");
   mWhisper = rb_define_module("Whisper");
   mVAD = rb_define_module_under(mWhisper, "VAD");

View File

@@ -13,6 +13,7 @@ extern ID id_URI;
 extern ID id_pre_converted_models;
 extern ID id_coreml_compiled_models;
 extern ID id_cache;
+extern ID id_n_processors;
 extern VALUE cContext;
 extern VALUE eError;
@@ -24,6 +25,8 @@ extern VALUE rb_whisper_model_s_new(VALUE context);
 extern VALUE rb_whisper_segment_s_new(VALUE context, int index);
 extern void prepare_transcription(ruby_whisper_params *rwp, VALUE *context);
+ID transcribe_option_names[1];
 static void
 ruby_whisper_free(ruby_whisper *rw)
 {
@@ -633,6 +636,8 @@ init_ruby_whisper_context(VALUE *mWhisper)
 {
   cContext = rb_define_class_under(*mWhisper, "Context", rb_cObject);
+  transcribe_option_names[0] = id_n_processors;
   rb_define_alloc_func(cContext, ruby_whisper_allocate);
   rb_define_method(cContext, "initialize", ruby_whisper_initialize, -1);

View File

@@ -13,6 +13,7 @@ extern const rb_data_type_t ruby_whisper_params_type;
 extern ID id_to_s;
 extern ID id_call;
+extern ID transcribe_option_names[1];
 extern void
 prepare_transcription(ruby_whisper_params * rwp, VALUE * self);
@@ -34,9 +35,14 @@ VALUE
 ruby_whisper_transcribe(int argc, VALUE *argv, VALUE self) {
   ruby_whisper *rw;
   ruby_whisper_params *rwp;
-  VALUE wave_file_path, blk, params;
+  VALUE wave_file_path, blk, params, kws;
+  VALUE opts[1];
+  rb_scan_args_kw(RB_SCAN_ARGS_LAST_HASH_KEYWORDS, argc, argv, "2:&", &wave_file_path, &params, &kws, &blk);
+  rb_get_kwargs(kws, transcribe_option_names, 0, 1, opts);
+  int n_processors = opts[0] == Qundef ? 1 : NUM2INT(opts[0]);
-  rb_scan_args(argc, argv, "02&", &wave_file_path, &params, &blk);
   TypedData_Get_Struct(self, ruby_whisper, &ruby_whisper_type, rw);
   TypedData_Get_Struct(params, ruby_whisper_params, &ruby_whisper_params_type, rwp);
@@ -66,7 +72,7 @@ ruby_whisper_transcribe(int argc, VALUE *argv, VALUE self) {
   prepare_transcription(rwp, &self);
-  if (whisper_full_parallel(rw->context, rwp->params, pcmf32.data(), pcmf32.size(), 1) != 0) {
+  if (whisper_full_parallel(rw->context, rwp->params, pcmf32.data(), pcmf32.size(), n_processors) != 0) {
     fprintf(stderr, "failed to process audio\n");
     return self;
   }
@@ -76,9 +82,8 @@ ruby_whisper_transcribe(int argc, VALUE *argv, VALUE self) {
     const char * text = whisper_full_get_segment_text(rw->context, i);
     output = rb_str_concat(output, rb_str_new2(text));
   }
-  VALUE idCall = id_call;
   if (blk != Qnil) {
-    rb_funcall(blk, idCall, 1, output);
+    rb_funcall(blk, id_call, 1, output);
   }
   return self;
 }

View File

@@ -25,7 +25,7 @@ module Whisper
   def self.system_info_str: () -> String
   class Context
-    def self.new: (path | ::URI::HTTP) -> instance
+    def self.new: (String | path | ::URI::HTTP) -> instance
     # transcribe a single file
     # can emit to a block results
@@ -36,8 +36,8 @@ module Whisper
     # puts text
     # end
     #
-    def transcribe: (string, Params) -> self
-                  | (string, Params) { (String) -> void } -> self
+    def transcribe: (string, Params, ?n_processors: Integer) -> self
+                  | (string, Params, ?n_processors: Integer) { (String) -> void } -> self
     def model_n_vocab: () -> Integer
     def model_n_audio_ctx: () -> Integer

View File

@@ -20,6 +20,24 @@ class TestWhisper < TestBase
     }
   end
+  def test_transcribe_non_parallel
+    @whisper = Whisper::Context.new("base.en")
+    params = Whisper::Params.new
+    @whisper.transcribe(AUDIO, params, n_processors: 1) {|text|
+      assert_match(/ask not what your country can do for you, ask what you can do for your country/, text)
+    }
+  end
+  def test_transcribe_n_processors
+    @whisper = Whisper::Context.new("base.en")
+    params = Whisper::Params.new
+    @whisper.transcribe(AUDIO, params, n_processors: 4) {|text|
+      assert_match(/ask not what your country can do for you[,.] ask what you can do for your country/i, text)
+    }
+  end
   sub_test_case "After transcription" do
     def test_full_n_segments
       assert_equal 1, whisper.full_n_segments

View File

@@ -4,7 +4,7 @@ Gem::Specification.new do |s|
   s.name = "whispercpp"
   s.authors = ["Georgi Gerganov", "Todd A. Fisher"]
   s.version = '1.3.3'
-  s.date = '2025-06-01'
+  s.date = '2025-06-03'
   s.description = %q{High-performance inference of OpenAI's Whisper automatic speech recognition (ASR) model via Ruby}
   s.email = 'todd.fisher@gmail.com'
   s.extra_rdoc_files = ['LICENSE', 'README.md']
@@ -21,7 +21,7 @@ Gem::Specification.new do |s|
   }
   s.summary = %q{Ruby whisper.cpp bindings}
-  s.test_files = s.files.select {|file| file.start_with? "tests/"}
+  s.test_files = s.files.select {|file| file.start_with? "test/"}
   s.extensions << 'ext/extconf.rb'
   s.required_ruby_version = '>= 3.1.0'