ExternalVendorCode/whisper.cpp

mirror of https://github.com/ggerganov/whisper.cpp.git synced 2025-04-24 04:56:03 +00:00

whispercpp

Ruby bindings for whisper.cpp, an interface to an automatic speech recognition model.

Installation

Install the gem and add it to the application's Gemfile by executing:

$ bundle add whispercpp

If bundler is not being used to manage dependencies, install the gem by executing:

$ gem install whispercpp

Usage

require "whisper"

whisper = Whisper::Context.new("base")

params = Whisper::Params.new(
  language: "en",
  offset: 10_000,
  duration: 60_000,
  max_text_tokens: 300,
  translate: true,
  print_timestamps: false,
  initial_prompt: "Initial prompt here."
)

whisper.transcribe("path/to/audio.wav", params) do |whole_text|
  puts whole_text
end

Preparing model

Some models are prepared up-front:

base_en = Whisper::Model.pre_converted_models["base.en"]
whisper = Whisper::Context.new(base_en)

The first time you use a model, it is downloaded automatically. After that, the cached file is used. To clear the cache, call #clear_cache:

Whisper::Model.pre_converted_models["base"].clear_cache

You can also use a shorthand for pre-converted models:

whisper = Whisper::Context.new("base.en")

You can list the names of the prepared models with Whisper::Model.pre_converted_models.keys:

puts Whisper::Model.pre_converted_models.keys
# tiny
# tiny.en
# tiny-q5_1
# tiny.en-q5_1
# tiny-q8_0
# base
# base.en
# base-q5_1
# base.en-q5_1
# base-q8_0
#   :
#   :

You can also use local model files you prepared:

whisper = Whisper::Context.new("path/to/your/model.bin")

Or you can have a model file downloaded from a URI:

whisper = Whisper::Context.new("https://example.net/uri/of/your/model.bin")
# Or
whisper = Whisper::Context.new(URI("https://example.net/uri/of/your/model.bin"))

See models page for details.

Preparing audio file

Currently, whisper.cpp accepts only 16-bit WAV files.
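
If you are unsure whether a file qualifies, you can inspect its header before transcribing. The following is a minimal sketch using only the Ruby standard library. wav_format and sixteen_bit_pcm? are hypothetical helpers, not part of whispercpp, and the sketch assumes a plain RIFF/WAVE file with a standard "fmt " chunk:

```ruby
require "stringio"

# Hypothetical helper (not part of whispercpp): read the "fmt " chunk of a
# RIFF/WAVE stream and return its basic properties as a Hash.
def wav_format(io)
  riff, _size, wave = io.read(12).to_s.unpack("a4Va4")
  raise ArgumentError, "not a RIFF/WAVE file" unless riff == "RIFF" && wave == "WAVE"

  # Walk the chunks until the "fmt " chunk is found.
  while (header = io.read(8)) && header.bytesize == 8
    id, size = header.unpack("a4V")
    if id == "fmt "
      body = io.read(16).to_s
      raise ArgumentError, "truncated fmt chunk" if body.bytesize < 16

      audio_format, channels, sample_rate, _byte_rate, _block_align, bits =
        body.unpack("vvVVvv")
      return { audio_format: audio_format, channels: channels,
               sample_rate: sample_rate, bits_per_sample: bits }
    end
    io.seek(size + (size & 1), IO::SEEK_CUR) # chunks are padded to an even length
  end
  raise ArgumentError, "no fmt chunk found"
end

# True when the stream holds uncompressed PCM (format tag 1) with 16-bit samples.
def sixteen_bit_pcm?(io)
  fmt = wav_format(io)
  fmt[:audio_format] == 1 && fmt[:bits_per_sample] == 16
end

# Usage with a real file:
#   File.open("path/to/audio.wav", "rb") { |f| puts sixteen_bit_pcm?(f) }
```

The helpers take any IO-like object, so the same check works on a File or a StringIO.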

API

Segments

Once Whisper::Context#transcribe has been called, you can retrieve segments with #each_segment:

def format_time(time_ms)
  sec, decimal_part = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
end

whisper
  .transcribe("path/to/audio.wav", params)
  .each_segment.with_index do |segment, index|
    line = "[%{nth}: %{st} --> %{ed}] %{text}" % {
      nth: index + 1,
      st: format_time(segment.start_time),
      ed: format_time(segment.end_time),
      text: segment.text
    }
    line << " (speaker turned)" if segment.speaker_next_turn?
    puts line
  end
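
The same start_time, end_time and text readers are enough to produce subtitle files. The following is a rough sketch that renders segments as SubRip (SRT) text. srt_time and to_srt are hypothetical helpers, and Segment is only a stand-in Struct so that the sketch runs without a model. It assumes times are in milliseconds, as in format_time above:

```ruby
# Stand-in for the objects yielded by #each_segment (assumption: only the
# three readers used in the example above are modelled here).
Segment = Struct.new(:start_time, :end_time, :text)

# SRT timestamps use a comma before the milliseconds: HH:MM:SS,mmm
def srt_time(time_ms)
  sec, ms = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  format("%02d:%02d:%02d,%03d", hour, min, sec, ms)
end

# Render any enumerable of segment-like objects as SRT text.
def to_srt(segments)
  segments.each_with_index.map do |segment, index|
    "#{index + 1}\n" \
      "#{srt_time(segment.start_time)} --> #{srt_time(segment.end_time)}\n" \
      "#{segment.text.strip}\n"
  end.join("\n")
end

segments = [
  Segment.new(0, 2_500, " Hello."),
  Segment.new(2_500, 3_723_004, " Goodbye.")
]
puts to_srt(segments)
```

Since #each_segment is used above without a block and chained with with_index, passing whisper.each_segment to to_srt after a transcription should work the same way.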

You can also add a hook to params that is called on each new segment:

# Add hook before calling #transcribe
params.on_new_segment do |segment|
  line = "[%{st} --> %{ed}] %{text}" % {
    st: format_time(segment.start_time),
    ed: format_time(segment.end_time),
    text: segment.text
  }
  line << " (speaker turned)" if segment.speaker_next_turn?
  puts line
end

whisper.transcribe("path/to/audio.wav", params)

Models

You can see model information:

whisper = Whisper::Context.new("base")
model = whisper.model

model.n_vocab # => 51864
model.n_audio_ctx # => 1500
model.n_audio_state # => 512
model.n_audio_head # => 8
model.n_audio_layer # => 6
model.n_text_ctx # => 448
model.n_text_state # => 512
model.n_text_head # => 8
model.n_text_layer # => 6
model.n_mels # => 80
model.ftype # => 1
model.type # => "base"

Logging

You can set a log callback:

prefix = "[MyApp] "
log_callback = ->(level, buffer, user_data) {
  case level
  when Whisper::LOG_LEVEL_NONE
    puts "#{user_data}none: #{buffer}"
  when Whisper::LOG_LEVEL_INFO
    puts "#{user_data}info: #{buffer}"
  when Whisper::LOG_LEVEL_WARN
    puts "#{user_data}warn: #{buffer}"
  when Whisper::LOG_LEVEL_ERROR
    puts "#{user_data}error: #{buffer}"
  when Whisper::LOG_LEVEL_DEBUG
    puts "#{user_data}debug: #{buffer}"
  when Whisper::LOG_LEVEL_CONT
    puts "#{user_data}same as previous: #{buffer}"
  end
}
Whisper.log_set log_callback, prefix
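
If your application already uses Ruby's standard Logger, a callback with the same three-argument shape can forward messages to it. This is a sketch under assumptions: logger_callback is a hypothetical helper, the level-to-severity mapping is supplied by the caller, and buffer is assumed to be a String, as the interpolation above suggests:

```ruby
require "logger"

# Hypothetical helper: build a callback with the (level, buffer, user_data)
# shape used above. `severities` maps a log level to a Logger severity.
# Levels missing from the map (for example continuation lines) reuse the
# severity of the previous message.
def logger_callback(logger, severities)
  last = Logger::INFO
  ->(level, buffer, user_data) {
    last = severities.fetch(level, last)
    logger.add(last, "#{user_data}#{buffer.chomp}")
  }
end

# Possible wiring with the constants shown above:
#   callback = logger_callback(
#     Logger.new($stderr),
#     Whisper::LOG_LEVEL_DEBUG => Logger::DEBUG,
#     Whisper::LOG_LEVEL_INFO  => Logger::INFO,
#     Whisper::LOG_LEVEL_WARN  => Logger::WARN,
#     Whisper::LOG_LEVEL_ERROR => Logger::ERROR
#   )
#   Whisper.log_set callback, "[MyApp] "
```

Leaving Whisper::LOG_LEVEL_CONT out of the map makes continuation lines inherit the severity of the message they continue.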

Using this feature, you can also suppress log output:

Whisper.log_set ->(level, buffer, user_data) {
  # do nothing
}, nil
Whisper::Context.new("base")

Low-level API to transcribe

You can also call Whisper::Context#full and #full_parallel with a Ruby array of samples. #transcribe with an audio file path is recommended because it extracts PCM samples in C++ and is fast, but #full and #full_parallel give you more flexibility.

require "whisper"
require "wavefile"

reader = WaveFile::Reader.new("path/to/audio.wav", WaveFile::Format.new(:mono, :float, 16000))
samples = reader.enum_for(:each_buffer).map(&:samples).flatten

whisper = Whisper::Context.new("base")
whisper
  .full(Whisper::Params.new, samples)
  .each_segment do |segment|
    puts segment.text
  end

The second argument, samples, may be an array, an object that responds to length and each, or a MemoryView. If you can prepare audio data as a C array and export it as a MemoryView, whispercpp works on it directly with zero copy.
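
As an illustration of the "object with length and each" case, here is a minimal sketch of a wrapper around raw 16-bit PCM bytes. PCM16Samples is a hypothetical class, not part of whispercpp. It assumes little-endian signed 16-bit mono input and that samples are expected as floats in the range -1.0 to 1.0, as in the :float format used above:

```ruby
# Hypothetical wrapper (not part of whispercpp): exposes #length and #each
# over a String of little-endian signed 16-bit PCM, converting each sample
# to a Float as it is yielded instead of building a large Array first.
class PCM16Samples
  include Enumerable

  BYTES_PER_SAMPLE = 2

  def initialize(pcm_bytes)
    @pcm_bytes = pcm_bytes
  end

  # Number of samples, not bytes.
  def length
    @pcm_bytes.bytesize / BYTES_PER_SAMPLE
  end

  def each
    return enum_for(:each) { length } unless block_given?

    length.times do |i|
      raw = @pcm_bytes.byteslice(i * BYTES_PER_SAMPLE, BYTES_PER_SAMPLE)
      yield raw.unpack1("s<") / 32768.0 # scale to -1.0..1.0 (assumed range)
    end
    self
  end
end

# Possible use with the low-level API:
#   samples = PCM16Samples.new(raw_pcm_bytes)
#   whisper.full(Whisper::Params.new, samples)
```

This keeps memory use close to the size of the original byte string, at the cost of per-sample Ruby calls. The MemoryView path remains the one described as zero copy.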

Development

% git clone https://github.com/ggml-org/whisper.cpp.git
% cd whisper.cpp/bindings/ruby
% rake test

The first call of rake test builds the extension and downloads a model for testing. After that, you can add tests in the tests directory and modify ext/ruby_whisper.cpp.

If the build seems to go wrong, running rake clean fixes it in some cases.

License

The same as whisper.cpp.