mirror of
https://github.com/SevaSk/ecoute.git
synced 2025-01-11 23:42:42 +00:00
Merge branch 'main' into main
commit
eadc4cab7c
@@ -14,6 +14,10 @@ class BaseRecorder:
         self.recorder = sr.Recognizer()
         self.recorder.energy_threshold = ENERGY_THRESHOLD
         self.recorder.dynamic_energy_threshold = DYNAMIC_ENERGY_THRESHOLD
+
+        if source is None:
+            raise ValueError("audio source can't be None")
+
         self.source = source
         self.source_name = source_name
 
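Note on this hunk: the added guard makes `BaseRecorder` fail at construction time rather than later, when the recorder first touches the audio source. A minimal sketch of the same fail-fast pattern, using a hypothetical `Recorder` class with no `speech_recognition` dependency (the class name, the `try_build` helper, and the string sources are stand-ins for illustration, not code from this repository):

```python
class Recorder:
    """Hypothetical stand-in for BaseRecorder, without the sr.Recognizer setup."""

    def __init__(self, source, source_name):
        # Reject a missing source up front so the error points at the caller.
        if source is None:
            raise ValueError("audio source can't be None")
        self.source = source
        self.source_name = source_name


def try_build(source, source_name):
    """Return a Recorder, or the error message if the source is missing."""
    try:
        return Recorder(source, source_name)
    except ValueError as exc:
        return str(exc)
```

A caller that cannot find an audio device would then get a clear `ValueError` at start-up instead of an attribute error deep inside the recording loop.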
10
README.md
@@ -16,7 +16,7 @@ Follow these steps to set up and run Ecoute on your local machine.
 ### 📋 Prerequisites
 
 - Python >=3.8.0
-- An OpenAI API key
+- An OpenAI API key that can access OpenAI API (set up a paid account OpenAI account)
 - Windows OS (Not tested on others)
 - FFmpeg
 
@@ -31,7 +31,7 @@ Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManage
 Once Chocolatey is installed, you can install FFmpeg by running the following command in your PowerShell:
 
 ```
-choco install ffmpeg-full
+choco install ffmpeg
 ```
 
 Please ensure that you run these commands in a PowerShell window with administrator privileges. If you face any issues during the installation, you can visit the official Chocolatey and FFmpeg websites for troubleshooting.
@@ -80,7 +80,7 @@ Run the main script:
 python main.py
 ```
 
-For a better and faster version, use:
+For a more better and faster version that also works with most languages, use:
 
 ```
 python main.py --api
@@ -88,7 +88,7 @@ python main.py --api
 
 Upon initiation, Ecoute will begin transcribing your microphone input and speaker output in real-time, generating a suggested response based on the conversation. Please note that it might take a few seconds for the system to warm up before the transcription becomes real-time.
 
-The --api flag significantly enhances transcription speed and accuracy, and it's expected to be the default option in future releases. However, keep in mind that using the Whisper API will consume more OpenAI credits than using the local model. This increased cost is attributed to the advanced features and capabilities that the Whisper API provides. Despite the additional cost, the considerable improvements in speed and transcription accuracy might make it a worthwhile investment for your use case.
+The --api flag will use the whisper api for transcriptions. This significantly enhances transcription speed and accuracy, and it works in most languages (rather than just English without the flag). It's expected to become the default option in future releases. However, keep in mind that using the Whisper API will consume more OpenAI credits than using the local model. This increased cost is attributed to the advanced features and capabilities that the Whisper API provides. Despite the additional expense, the substantial improvements in speed and transcription accuracy may make it a worthwhile investment for your use case.
 
 ### ⚠️ Limitations
 
@@ -98,7 +98,7 @@ While Ecoute provides real-time transcription and response suggestions, there ar
 
 **Whisper Model**: If the --api flag is not used, we utilize the 'tiny' version of the Whisper ASR model, due to its low resource consumption and fast response times. However, this model may not be as accurate as the larger models in transcribing certain types of speech, including accents or uncommon words.
 
-**Language**: The Whisper model used in Ecoute is set to English. As a result, it may not accurately transcribe non-English languages or dialects. We are actively working to add multi-language support to future versions of the program.
+**Language**: If you are not using the --api flag the Whisper model used in Ecoute is set to English. As a result, it may not accurately transcribe non-English languages or dialects. We are actively working to add multi-language support to future versions of the program.
 
 ## 📖 License
 
@@ -32,7 +32,7 @@ class APIWhisperTranscriber:
     def get_transcription(self, wav_file_path):
         try:
             with open(wav_file_path, "rb") as audio_file:
-                result = openai.Audio.translate("whisper-1", audio_file)
+                result = openai.Audio.transcribe("whisper-1", audio_file)
         except Exception as e:
             print(e)
             return ""
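Note on this hunk: in the pre-1.0 `openai` Python package, `openai.Audio.translate` returns English text whatever language is spoken, while `openai.Audio.transcribe` returns text in the spoken language, which is what lets the `--api` path handle non-English speech. The sketch below keeps the same error-handling shape as `get_transcription` but takes the API call as a parameter, so it can be exercised without network access. The name `safe_transcription` and the `transcribe_fn` parameter are assumptions for illustration, and reading a `"text"` field from the result is assumed from the legacy API response, since that part of the method is outside this hunk.

```python
def safe_transcription(wav_file_path, transcribe_fn):
    """Call transcribe_fn("whisper-1", file) and return its text, or "" on failure."""
    try:
        with open(wav_file_path, "rb") as audio_file:
            result = transcribe_fn("whisper-1", audio_file)
    except Exception as e:
        # Same policy as the hunk: print the error and fall back to an empty string.
        print(e)
        return ""
    # Assumption: the result is dict-like with a "text" field.
    return result["text"].strip()
```

With the legacy package installed, passing `openai.Audio.transcribe` as `transcribe_fn` should mirror the behaviour after this commit, and passing `openai.Audio.translate` the behaviour before it.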
6
main.py
@@ -102,6 +102,12 @@ def create_ui_components(root):
 
 
 def main():
+    try:
+        subprocess.run(["ffmpeg", "-version"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
+    except FileNotFoundError:
+        print("ERROR: The ffmpeg library is not installed. Please install ffmpeg and try again.")
+        return
+
     root = ctk.CTk()
     (
         transcript_textbox,
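Note on this hunk: the new check in `main()` relies on `subprocess.run` raising `FileNotFoundError` when no `ffmpeg` executable is on `PATH`. An alternative sketch that performs the lookup without spawning a process, using the standard-library `shutil.which`. The helper name `ffmpeg_available` and its `executable` parameter are assumptions, not part of this commit:

```python
import shutil


def ffmpeg_available(executable="ffmpeg"):
    """Return True if the named executable can be found on PATH."""
    return shutil.which(executable) is not None


def main():
    if not ffmpeg_available():
        print("ERROR: The ffmpeg library is not installed. Please install ffmpeg and try again.")
        return
    # ... the rest of start-up would continue here ...
```

Either approach only confirms that an executable with that name exists; neither verifies that it is a working FFmpeg build.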