About
=====
What is multimodal translation?
-------------------------------
| Multimodal translation is a form of translation that integrates and interprets information
| from several sources, such as text, images, audio, and video, to convey a message accurately.
| Simply put, it translates content across different types of media.
Why is multimodality important?
-------------------------------
| When information is spread across different formats and media types, it is hard to grasp the context
| and understand the intended meaning.
|
| That is where multimodal translation helps. By combining multiple modalities such as text, audio, and video,
| it captures the context and translates the content more accurately. This matters most in systems that require context awareness.
Types of multimodal translation:
--------------------------------
- **Text-to-text:** The simplest form: text is translated from one language into another.
- **Audio-to-text:** The audio is transcribed, and the transcript is then translated into one or more languages.
- **Audio-to-audio:** May be implemented in the future. It follows the same concept as audio-to-text, but the output is delivered as audio.
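
The audio-to-text flow described above (transcribe once, then translate the transcript into each target language) can be sketched as a small pipeline. This is only an illustration under stated assumptions, not this project's implementation: ``audio_to_text``, ``fake_transcribe``, and ``fake_translate`` are hypothetical names, and the two stub functions stand in for a real speech-recognition model and a real translation model.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases: a transcriber turns raw audio bytes into text,
# a translator turns (text, target_language) into translated text.
Transcriber = Callable[[bytes], str]
Translator = Callable[[str, str], str]


def audio_to_text(
    audio: bytes,
    targets: List[str],
    transcribe: Transcriber,
    translate: Translator,
) -> Dict[str, str]:
    """Transcribe once, then translate the transcript into each target language."""
    transcript = transcribe(audio)
    return {lang: translate(transcript, lang) for lang in targets}


# Stub models so the sketch runs on its own (assumption: real ones wrap trained models).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, target: str) -> str:
    return f"[{target}] {text}"


result = audio_to_text(b"hello", ["de", "fr"], fake_transcribe, fake_translate)
print(result)
```

Passing the transcriber and translator in as arguments keeps the pipeline independent of any particular model, so an audio-to-audio variant could append a speech-synthesis step without changing the earlier stages.
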
Technology used:
----------------
- **Speech recognition:** Recognizes spoken language so that it can be interpreted and translated. The output can then be delivered as text or audio.
Limitations:
------------
- **Language support:** It is hard to support every language, since each language needs its own model, which has to be trained and installed into the application.
- **Maintaining context:** The context may change across different media, so the translation must ensure that it remains correct.
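
The language-support limitation can be made concrete with a small lookup sketch. It assumes, purely for illustration, that installed models are kept in a registry keyed by (source, target) language pair; ``ModelRegistry``, ``UnsupportedLanguageError``, and ``translate`` are hypothetical names and do not describe how this application actually stores its models.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: one installed model per (source, target) language pair.
ModelRegistry = Dict[Tuple[str, str], Callable[[str], str]]


class UnsupportedLanguageError(LookupError):
    """Raised when no model is installed for the requested language pair."""


def translate(text: str, source: str, target: str, models: ModelRegistry) -> str:
    """Translate with the installed model for the pair, or fail explicitly."""
    model = models.get((source, target))
    if model is None:
        raise UnsupportedLanguageError(f"no model installed for {source} -> {target}")
    return model(text)


# Stub model standing in for a trained English-to-German model (assumption).
models: ModelRegistry = {("en", "de"): lambda text: f"[de] {text}"}

print(translate("hello", "en", "de", models))
try:
    translate("hello", "en", "ja", models)
except UnsupportedLanguageError as err:
    print(err)
```

Failing loudly for a missing pair, rather than falling back silently, makes it obvious which languages still need a model to be trained and installed.
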
Improvements:
-------------
* As mentioned above, audio-to-audio translation will be implemented in the future. Other media types, such as video and images, could also be supported.
References:
-----------
* `The Era of Multimodal Translation `_
* `What is Multimodal Translation `_