whispercpp
==========

![whisper.cpp](https://user-images.githubusercontent.com/1991296/235238348-05d0f6a4-da44-4900-a1de-d0707e75b763.jpeg)

Ruby bindings for [whisper.cpp][], an interface to an automatic speech recognition (ASR) model.

Installation
------------

Install the gem and add it to the application's Gemfile by executing:

    $ bundle add whispercpp

If bundler is not being used to manage dependencies, install the gem by executing:

    $ gem install whispercpp

Usage
-----

```ruby
require "whisper"

whisper = Whisper::Context.new("path/to/model.bin")

params = Whisper::Params.new
params.language = "en"
params.offset = 10_000
params.duration = 60_000
params.max_text_tokens = 300
params.translate = true
params.print_timestamps = false
params.prompt = "Initial prompt here."

whisper.transcribe("path/to/audio.wav", params) do |whole_text|
  puts whole_text
end

```

### Preparing model ###

Use the download script from the whisper.cpp repository to fetch model file(s):

```bash
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
sh ./models/download-ggml-model.sh base.en
```

Several model types are available. See the [models][] page for details.

### Preparing audio file ###

Currently, whisper.cpp accepts only 16-bit WAV files.
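
To catch an unsupported file before transcription, the bit depth can be read from the WAV header. The following is a minimal sketch in plain Ruby. `wav_bits_per_sample` is a hypothetical helper, not part of the whispercpp gem, and it assumes a standard RIFF/WAVE layout with a `fmt ` chunk.

```ruby
require "stringio"

# Hypothetical helper (not part of the whispercpp gem).
# Returns the bits-per-sample value from the "fmt " chunk of a WAV stream,
# or nil if the stream is not RIFF/WAVE or has no usable "fmt " chunk.
def wav_bits_per_sample(io)
  riff, _size, wave = io.read(12)&.unpack("a4Va4")
  return nil unless riff == "RIFF" && wave == "WAVE"

  # Walk the chunks until the "fmt " chunk is found.
  while (header = io.read(8)) && header.bytesize == 8
    id, size = header.unpack("a4V")
    if id == "fmt "
      fmt = io.read(size)
      return nil if fmt.nil? || fmt.bytesize < 16
      # Layout: format(2) channels(2) sample_rate(4) byte_rate(4) block_align(2) bits(2)
      return fmt.unpack("vvVVvv").last
    end
    # Skip other chunks; chunk bodies are padded to an even length.
    io.seek(size + (size.odd? ? 1 : 0), IO::SEEK_CUR)
  end
  nil
end

# Exercise the helper on an in-memory header: PCM, mono, 16 kHz, 16-bit.
sample = ["RIFF", 36, "WAVE", "fmt ", 16, 1, 1, 16_000, 32_000, 2, 16, "data", 0]
         .pack("a4Va4a4VvvVVvva4V")
puts wav_bits_per_sample(StringIO.new(sample)) # prints 16
```

With a real file, the same helper can be called as `File.open("path/to/audio.wav", "rb") { |f| wav_bits_per_sample(f) }` and the result compared against `16` before calling `#transcribe`.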

### API ###

Once `Whisper::Context#transcribe` has been called, you can retrieve segments with `#each_segment`:

```ruby
def format_time(time_ms)
  sec, decimal_part = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
end

whisper.transcribe("path/to/audio.wav", params)

whisper.each_segment.with_index do |segment, index|
  line = "[%{nth}: %{st} --> %{ed}] %{text}" % {
    nth: index + 1,
    st: format_time(segment.start_time),
    ed: format_time(segment.end_time),
    text: segment.text
  }
  line << " (speaker turned)" if segment.speaker_next_turn?
  puts line
end

```
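
The same segment readers can feed other output formats. As a sketch, the code below renders segments as SubRip (SRT) subtitles. `FakeSegment`, `srt_time`, and `segments_to_srt` are hypothetical names introduced for illustration; the stand-in struct only mirrors the `#start_time`, `#end_time`, and `#text` readers used above, and the times are assumed to be in milliseconds, as in `format_time`.

```ruby
# Stand-in for Whisper::Segment so the sketch runs without a model file.
# It only mirrors the #start_time, #end_time and #text readers.
FakeSegment = Struct.new(:start_time, :end_time, :text)

# SRT timestamps use a comma before the milliseconds: HH:MM:SS,mmm
def srt_time(time_ms)
  sec, ms = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  format("%02d:%02d:%02d,%03d", hour, min, sec, ms)
end

# Hypothetical helper: renders any enumerable of segment-like objects as SRT text.
# Each cue is "index, time range, text", and cues are separated by a blank line.
def segments_to_srt(segments)
  cues = segments.each_with_index.map do |segment, index|
    "#{index + 1}\n" \
      "#{srt_time(segment.start_time)} --> #{srt_time(segment.end_time)}\n" \
      "#{segment.text.strip}\n"
  end
  cues.join("\n")
end

segments = [
  FakeSegment.new(0, 2_500, " Hello there."),
  FakeSegment.new(2_500, 5_000, " How are you?")
]
puts segments_to_srt(segments)
```

After a real transcription, `segments_to_srt(whisper.each_segment)` should work the same way, since `#each_segment` without a block is used as an enumerator in the example above.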

You can also add a hook to params that is called each time a new segment is produced:

```ruby
def format_time(time_ms)
  sec, decimal_part = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
end

# Add hook before calling #transcribe
params.on_new_segment do |segment|
  line = "[%{st} --> %{ed}] %{text}" % {
    st: format_time(segment.start_time),
    ed: format_time(segment.end_time),
    text: segment.text
  }
  line << " (speaker turned)" if segment.speaker_next_turn?
  puts line
end

whisper.transcribe("path/to/audio.wav", params)

```

[whisper.cpp]: https://github.com/ggerganov/whisper.cpp
[models]: https://github.com/ggerganov/whisper.cpp/tree/master/models