Language Tokens: A Frustratingly Simple Approach Improves Zero-Shot Performance of Multilingual Translation
- Muhammad ElNokrashy,
- Amr Hendy,
- Mohamed Maher,
- Mohamed Afify,
- Hany Hassan Awadalla
AMTA
This paper proposes a simple yet effective method to improve direct (X-to-Y) translation in both cases: zero-shot and when direct data is available. We modify the input tokens at both the encoder and decoder to include signals for the source and target languages. We show a performance gain when training from scratch or when finetuning a pretrained model with the proposed setup. In our experiments, the method shows a gain of nearly 10.0 BLEU points on in-house datasets, depending on the checkpoint selection criteria.
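The abstract does not spell out how the language signals are attached. The sketch below illustrates the general idea of prefixing both the encoder and decoder token sequences with language tokens; the token names (e.g. `<de>`, `<en>`), their placement, and the helper `add_language_tokens` are illustrative assumptions, not the paper's exact scheme.

```python
# A minimal, hypothetical sketch: prepend language-signal tokens to the
# inputs of both the encoder and the decoder. Exact placement and token
# naming are assumptions; see the paper for the actual setup.

def add_language_tokens(src_tokens, tgt_tokens, src_lang, tgt_lang):
    """Prefix source and target sequences with language tokens.

    src_tokens / tgt_tokens: lists of subword tokens.
    src_lang / tgt_lang: language codes such as "de" or "en".
    """
    src_tag = f"<{src_lang}>"
    tgt_tag = f"<{tgt_lang}>"
    # Encoder input carries signals for both source and target languages.
    encoder_input = [src_tag, tgt_tag] + src_tokens
    # Decoder input is prefixed with the target-language token.
    decoder_input = [tgt_tag] + tgt_tokens
    return encoder_input, decoder_input


# Example: a German-to-English sentence pair.
enc, dec = add_language_tokens(
    ["Guten", "Tag"], ["Good", "day"], src_lang="de", tgt_lang="en"
)
print(enc)  # ['<de>', '<en>', 'Guten', 'Tag']
print(dec)  # ['<en>', 'Good', 'day']
```

Because the tagging happens purely at the input level, a scheme like this can be applied when training from scratch or when finetuning an existing pretrained model, which matches the two settings the abstract reports gains for.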