audio package

translate module

multimodaltranslation.audio.translate.audio_to_text(audio_bytes: bytes, model: str) → str

Converts an audio file into text.

Parameters:
  • audio_bytes (bytes) – The bytes of the audio file.

  • model (str) – The path to the speech-to-text model, as a string.

Returns:

The transcription of the audio.

Return type:

str

Raises:

RuntimeError – If converting the audio file to WAV fails.
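Because get_model() returns the path to a Vosk model, the transcription step might look roughly like the sketch below. It is an illustration under that assumption, not the package's actual implementation: transcribe_wav and join_results are hypothetical names, and the input is assumed to be mono 16-bit PCM WAV data.

```python
import io
import json
import wave


def join_results(json_results):
    """Join the "text" fields of Vosk-style JSON result strings."""
    parts = [json.loads(result).get("text", "") for result in json_results]
    return " ".join(part for part in parts if part)


def transcribe_wav(wav_bytes: io.BytesIO, model_path: str) -> str:
    """Hypothetical helper: transcribe WAV data with a Vosk model."""
    # Imported lazily so join_results stays usable without Vosk installed.
    from vosk import KaldiRecognizer, Model

    results = []
    with wave.open(wav_bytes, "rb") as wav:
        recognizer = KaldiRecognizer(Model(model_path), wav.getframerate())
        while True:
            chunk = wav.readframes(4000)
            if not chunk:
                break
            # AcceptWaveform returns True once an utterance is complete.
            if recognizer.AcceptWaveform(chunk):
                results.append(recognizer.Result())
        # Flush whatever the recognizer is still holding.
        results.append(recognizer.FinalResult())
    return join_results(results)
```

Collecting the partial results and joining them at the end keeps the JSON handling in one small function that can be checked without a model on disk.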

multimodaltranslation.audio.translate.convert_to_wav_bytes(audio_bytes: bytes) → BytesIO

Converts audio in other formats to WAV (using ffmpeg), the format required by the model.

Parameters:

audio_bytes (bytes) – The audio file in bytes.

Returns:

The converted audio file.

Return type:

io.BytesIO

Raises:

RuntimeError – If the conversion process fails.
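One way to perform such a conversion is to pipe the bytes through the ffmpeg command-line tool. The sketch below is a minimal illustration, not the package's code: to_wav_bytes and build_ffmpeg_command are hypothetical names, and the mono, 16 kHz output settings are assumptions.

```python
import io
import subprocess


def build_ffmpeg_command(binary: str = "ffmpeg") -> list:
    """Build an ffmpeg command that reads from stdin and writes WAV to stdout."""
    return [
        binary,
        "-i", "pipe:0",   # read the input from stdin
        "-f", "wav",      # force WAV output
        "-ac", "1",       # mono (assumed requirement)
        "-ar", "16000",   # 16 kHz sample rate (assumed requirement)
        "pipe:1",         # write the result to stdout
    ]


def to_wav_bytes(audio_bytes: bytes, binary: str = "ffmpeg") -> io.BytesIO:
    """Hypothetical helper: convert audio bytes to WAV, raising RuntimeError on failure."""
    try:
        proc = subprocess.run(
            build_ffmpeg_command(binary),
            input=audio_bytes,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        # Covers both a missing ffmpeg binary and a non-zero exit status.
        raise RuntimeError(f"ffmpeg conversion failed: {exc}") from exc
    return io.BytesIO(proc.stdout)
```

Wrapping both failure modes in RuntimeError matches the documented behaviour, so callers only need to handle one exception type.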

multimodaltranslation.audio.translate.get_model(lang: str) → str

Returns the path to the Vosk model for the given language. Downloads it if not already installed.

Parameters:

lang (str) – The language of the model.

Returns:

Path to the model folder as a string.

Return type:

str

Raises:

Exception – If no model is available for the given language.
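The "return the path, download if missing" behaviour can be sketched as follows. Everything here is hypothetical: the MODEL_NAMES table, the resolve_model name, and the injected download callable stand in for whatever the package actually does.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical mapping from language code to model folder name.
MODEL_NAMES: Dict[str, str] = {
    "en": "example-model-en",
    "fr": "example-model-fr",
}


def resolve_model(
    lang: str,
    models_dir: Path,
    download: Callable[[str, Path], None],
) -> str:
    """Return the model folder for lang, downloading it only if it is absent."""
    if lang not in MODEL_NAMES:
        raise Exception(f"Language model not available: {lang}")
    model_path = models_dir / MODEL_NAMES[lang]
    if not model_path.is_dir():
        # The caller supplies the download step; it must create model_path.
        download(MODEL_NAMES[lang], model_path)
    return str(model_path)
```

Passing the download step in as a callable keeps the lookup logic free of network access, which makes the caching behaviour easy to exercise in isolation.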

multimodaltranslation.audio.translate.translate_audio(audio_bytes: bytes, lang: str, targets: list) → list

Calls audio_to_text() to convert the audio into transcribed text.

Then translates the text into the desired languages using translate_text().

Parameters:
  • audio_bytes (bytes) – The bytes of the audio file.

  • lang (str) – The original language of the audio.

  • targets (list) – A list of languages desired for translation.

Returns:

A list of the translated texts, each with its target language.

Return type:

list
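The transcribe-then-translate flow described above can be sketched with the two steps passed in as callables. This is an illustration only: translate_pipeline is a hypothetical name, the callables' signatures are assumed, and the shape of each result entry (a dict with "lang" and "text" keys) is a guess, since the documented return type is just list.

```python
from typing import Callable, Dict, List


def translate_pipeline(
    audio_bytes: bytes,
    lang: str,
    targets: List[str],
    transcribe: Callable[[bytes, str], str],
    translate: Callable[[str, str, str], str],
) -> List[Dict[str, str]]:
    """Transcribe the audio once, then translate the text into each target language."""
    text = transcribe(audio_bytes, lang)
    return [
        # Assumed result shape: one entry per target language.
        {"lang": target, "text": translate(text, lang, target)}
        for target in targets
    ]
```

Transcribing once and reusing the text for every target avoids repeating the expensive speech-to-text step per language.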