
whispercpp

Ruby bindings for whisper.cpp, an interface to an automatic speech recognition model.

Installation

Install the gem and add it to the application's Gemfile by executing:

$ bundle add whispercpp

If bundler is not being used to manage dependencies, install the gem by executing:

$ gem install whispercpp

You can pass build options for whisper.cpp, for instance:

$ bundle config build.whispercpp --enable-ggml-cuda

or,

$ gem install whispercpp -- --enable-ggml-cuda

See whisper.cpp's README for available options. You need to convert the options shown in the README to Ruby-style options.
For boolean options like GGML_CUDA, the README says -DGGML_CUDA=1. Strip -D, prepend --enable- for 1 or ON (--disable- for 0 or OFF), and convert the name to kebab-case: --enable-ggml-cuda.
For options that take an argument, like CMAKE_CUDA_ARCHITECTURES, the README says -DCMAKE_CUDA_ARCHITECTURES="86". Strip -D, prepend --, convert the name to kebab-case, then append = and the argument: --cmake-cuda-architectures="86".
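The conversion rules above are mechanical, so they can be sketched as a small Ruby function. cmake_to_gem_option is a hypothetical helper, not part of the gem, and it only handles the NAME=VALUE forms described here (1/ON, 0/OFF, or an argument).

```ruby
# Hypothetical helper (not part of the whispercpp gem): converts a CMake-style
# option from whisper.cpp's README into the Ruby-style build option form.
def cmake_to_gem_option(cmake_option)
  # Strip the leading "-D", then split into the option name and its value.
  name, value = cmake_option.delete_prefix("-D").split("=", 2)
  # UPPER_SNAKE_CASE name -> kebab-case name.
  kebab = name.downcase.tr("_", "-")

  case value.to_s.delete('"').upcase
  when "1", "ON"  then "--enable-#{kebab}"   # boolean option switched on
  when "0", "OFF" then "--disable-#{kebab}"  # boolean option switched off
  else "--#{kebab}=#{value}"                 # option that takes an argument
  end
end

puts cmake_to_gem_option("-DGGML_CUDA=1")                   # --enable-ggml-cuda
puts cmake_to_gem_option('-DCMAKE_CUDA_ARCHITECTURES="86"') # --cmake-cuda-architectures="86"
```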

Usage

require "whisper"

whisper = Whisper::Context.new("base")

params = Whisper::Params.new(
  language: "en",
  offset: 10_000,
  duration: 60_000,
  max_text_tokens: 300,
  translate: true,
  print_timestamps: false,
  initial_prompt: "Initial prompt here."
)

whisper.transcribe("path/to/audio.wav", params) do |whole_text|
  puts whole_text
end

Preparing model

Some models are prepared up-front:

base_en = Whisper::Model.pre_converted_models["base.en"]
whisper = Whisper::Context.new(base_en)

The first time you use a model, it is downloaded automatically; after that, the cached file is used. To clear the cache, call #clear_cache:

Whisper::Model.pre_converted_models["base"].clear_cache

You can also use a shorthand for pre-converted models:

whisper = Whisper::Context.new("base.en")

You can list the prepared model names with Whisper::Model.pre_converted_models.keys:

puts Whisper::Model.pre_converted_models.keys
# tiny
# tiny.en
# tiny-q5_1
# tiny.en-q5_1
# tiny-q8_0
# base
# base.en
# base-q5_1
# base.en-q5_1
# base-q8_0
#   :
#   :
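The names follow a pattern: a ".en" suffix marks an English-only model, and a suffix such as "-q5_1" or "-q8_0" marks a quantized variant. A minimal sketch of splitting a name into those parts; ModelName and parse_model_name are hypothetical, not part of the gem. You could apply it to each entry of Whisper::Model.pre_converted_models.keys.

```ruby
# Hypothetical helper (not part of the gem): splits a pre-converted model name
# into size, English-only flag, and quantization, following the pattern of the
# names listed above: <size>[.en][-<quantization>].
ModelName = Struct.new(:size, :english_only, :quantization)

def parse_model_name(name)
  # Split off a trailing quantization suffix such as "-q5_1" or "-q8_0", if any.
  base, quantization = name.split(/-(?=q\d+_\d+\z)/, 2)
  # A ".en" suffix marks an English-only model.
  english_only = base.end_with?(".en")
  ModelName.new(base.delete_suffix(".en"), english_only, quantization)
end

p parse_model_name("tiny.en-q5_1")
p parse_model_name("base")
```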

You can also use local model files you prepared:

whisper = Whisper::Context.new("path/to/your/model.bin")

Or, you can download a model file from a URL:

whisper = Whisper::Context.new("https://example.net/uri/of/your/model.bin")
# Or
whisper = Whisper::Context.new(URI("https://example.net/uri/of/your/model.bin"))

See the models page for details.

Preparing audio file

Currently, whisper.cpp accepts only 16-bit WAV files.
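Because only 16-bit WAV files are accepted, it can help to check a file's header before transcribing it. A minimal sketch that walks the RIFF chunks to find the "fmt " chunk; wav_format and pcm16? are hypothetical helpers, not part of the gem, and the sketch only recognizes plain integer PCM (format tag 1), not WAVE_FORMAT_EXTENSIBLE files.

```ruby
# Hypothetical helpers (not part of the whispercpp gem): read the "fmt " chunk
# of a RIFF/WAVE file and report whether it stores 16-bit integer PCM.
def wav_format(path)
  File.open(path, "rb") do |f|
    riff, _riff_size, wave = f.read(12).to_s.unpack("a4Va4")
    raise ArgumentError, "#{path}: not a RIFF/WAVE file" unless riff == "RIFF" && wave == "WAVE"

    loop do
      chunk_header = f.read(8)
      raise ArgumentError, "#{path}: no fmt chunk" if chunk_header.nil? || chunk_header.bytesize < 8

      id, size = chunk_header.unpack("a4V")
      if id == "fmt "
        body = f.read(16)
        raise ArgumentError, "#{path}: truncated fmt chunk" if body.nil? || body.bytesize < 16

        audio_format, channels, sample_rate, _byte_rate, _block_align, bits = body.unpack("vvVVvv")
        return { audio_format: audio_format, channels: channels,
                 sample_rate: sample_rate, bits_per_sample: bits }
      end

      # Skip any other chunk (e.g. LIST); chunk bodies are padded to an even size.
      f.seek(size + (size & 1), IO::SEEK_CUR)
    end
  end
end

# 16-bit integer PCM: format tag 1 (WAVE_FORMAT_PCM) and 16 bits per sample.
def pcm16?(path)
  fmt = wav_format(path)
  fmt[:audio_format] == 1 && fmt[:bits_per_sample] == 16
end
```

For example, `pcm16?("path/to/audio.wav")` returns true for a standard 16-bit PCM file.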

API

Segments

Once Whisper::Context#transcribe has been called, you can retrieve segments with #each_segment:

def format_time(time_ms)
  sec, decimal_part = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
end

whisper
  .transcribe("path/to/audio.wav", params)
  .each_segment.with_index do |segment, index|
    line = "[%{nth}: %{st} --> %{ed}] %{text}" % {
      nth: index + 1,
      st: format_time(segment.start_time),
      ed: format_time(segment.end_time),
      text: segment.text
    }
    line << " (speaker turned)" if segment.speaker_next_turn?
    puts line
  end

You can also add a hook to params that is called on each new segment:

# Add hook before calling #transcribe
params.on_new_segment do |segment|
  line = "[%{st} --> %{ed}] %{text}" % {
    st: format_time(segment.start_time),
    ed: format_time(segment.end_time),
    text: segment.text
  }
  line << " (speaker turned)" if segment.speaker_next_turn?
  puts line
end

whisper.transcribe("path/to/audio.wav", params)

Models

You can see model information:

whisper = Whisper::Context.new("base")
model = whisper.model

model.n_vocab # => 51865
model.n_audio_ctx # => 1500
model.n_audio_state # => 512
model.n_audio_head # => 8
model.n_audio_layer # => 6
model.n_text_ctx # => 448
model.n_text_state # => 512
model.n_text_head # => 8
model.n_text_layer # => 6
model.n_mels # => 80
model.ftype # => 1
model.type # => "base"

Logging

You can set a log callback:

prefix = "[MyApp] "
log_callback = ->(level, buffer, user_data) {
  case level
  when Whisper::LOG_LEVEL_NONE
    puts "#{user_data}none: #{buffer}"
  when Whisper::LOG_LEVEL_INFO
    puts "#{user_data}info: #{buffer}"
  when Whisper::LOG_LEVEL_WARN
    puts "#{user_data}warn: #{buffer}"
  when Whisper::LOG_LEVEL_ERROR
    puts "#{user_data}error: #{buffer}"
  when Whisper::LOG_LEVEL_DEBUG
    puts "#{user_data}debug: #{buffer}"
  when Whisper::LOG_LEVEL_CONT
    puts "#{user_data}continuation of previous: #{buffer}"
  end
}
Whisper.log_set log_callback, prefix
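Since user_data is handed back to the callback (the prefix example above relies on this), you can also pass a mutable object and collect log messages instead of printing them. A minimal sketch, assuming user_data is passed back unchanged; make_log_collector is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): builds a log callback with the
# same (level, buffer, user_data) signature as above. Instead of printing, it
# appends [level, buffer] pairs to user_data, skipping any level in ignored_levels.
def make_log_collector(ignored_levels = [])
  ->(level, buffer, user_data) {
    user_data << [level, buffer] unless ignored_levels.include?(level)
  }
end

logs = []
# Only register when the gem is loaded; `logs` is passed as user_data.
if defined?(Whisper)
  Whisper.log_set make_log_collector([Whisper::LOG_LEVEL_DEBUG]), logs
end
```

After creating a context, `logs` holds the non-debug messages, which is handy in tests.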

Using this feature, you can also suppress logging:

Whisper.log_set ->(level, buffer, user_data) {
  # do nothing
}, nil
Whisper::Context.new("base")

Low-level API to transcribe

You can also call Whisper::Context#full and #full_parallel with a Ruby array as samples. Although #transcribe with an audio file path is recommended, because it extracts PCM samples in C++ and is fast, #full and #full_parallel give you the flexibility to supply samples prepared in Ruby.

require "whisper"
require "wavefile"

reader = WaveFile::Reader.new("path/to/audio.wav", WaveFile::Format.new(:mono, :float, 16000))
samples = reader.enum_for(:each_buffer).map(&:samples).flatten

whisper = Whisper::Context.new("base")
whisper
  .full(Whisper::Params.new, samples)
  .each_segment do |segment|
    puts segment.text
  end
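If you already have raw 16-bit PCM bytes, you can convert them to Float samples yourself instead of using wavefile. A minimal sketch, assuming little-endian mono samples already at 16 kHz to match the format requested above; pcm16_to_floats is a hypothetical helper, not part of the gem. The result can be passed as the samples argument of #full.

```ruby
# Hypothetical helper (not part of the gem): converts raw 16-bit signed
# little-endian PCM bytes (mono, already 16 kHz) into Float samples in
# [-1.0, 1.0), the same range wavefile's :float format produces.
def pcm16_to_floats(bytes)
  bytes.unpack("s<*").map { |s| s / 32768.0 }
end

raw = [0, 16384, -32768].pack("s<*")  # three example samples
p pcm16_to_floats(raw)                # => [0.0, 0.5, -1.0]
```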

The second argument, samples, may be an array, an object with length and each methods, or a MemoryView. If you can prepare audio data as a C array and export it as a MemoryView, whispercpp accepts it and works with it without copying.
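As an illustration of the "length and each" option, here is a minimal sketch of a samples object that wraps a binary String of packed 32-bit little-endian floats and yields them one at a time, without building an intermediate Array. PackedFloatSamples is hypothetical, and the sketch assumes, as stated above, that #full only needs #length and #each.

```ruby
# Hypothetical class (not part of the gem): exposes packed float32 audio
# through the two methods #full is documented to accept: #length and #each.
class PackedFloatSamples
  def initialize(bytes)
    @bytes = bytes
  end

  # Number of samples (4 bytes per float32).
  def length
    @bytes.bytesize / 4
  end

  # Yields each sample as a Ruby Float; returns an Enumerator without a block.
  def each
    return enum_for(:each) unless block_given?

    length.times { |i| yield @bytes.byteslice(i * 4, 4).unpack1("e") }
    self
  end
end

samples = PackedFloatSamples.new([0.5, -0.25].pack("e*"))
# e.g. whisper.full(Whisper::Params.new, samples) when the gem is loaded
```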

Development

% git clone https://github.com/ggml-org/whisper.cpp.git
% cd whisper.cpp/bindings/ruby
% rake test

The first call of rake test builds the extension and downloads a model for testing. After that, you can add tests in the tests directory and modify ext/ruby_whisper.cpp.

If something goes wrong during the build, running rake clean may fix it.

License

The same as whisper.cpp.