Despite the fantastic progress we've seen recently in cross-lingual modeling, the best systems still make a lot of factual errors.
To address this, here is our work on 🚨 Evaluating and Modeling Attribution for Cross-Lingual Question Answering 🚨
#1 Attribution Evaluation: Our work is the first to study attribution for cross-lingual QA. We collect attribution data in 5 languages (Bengali, Finnish, Japanese, Russian, and Telugu).
With this data, we find that even state-of-the-art cross-lingual open-retrieval QA systems (e.g. CORA) often produce answers that lack attribution. Additionally, we find that passages retrieved cross-lingually contribute only moderately to the attribution level of the system, calling for progress in this area.
#2 Attribution Detection Modeling: We experiment with a wide range of attribution detection models to address this issue.
We find that NLI models and PaLM 2, fine-tuned on a very small number of attribution examples (~100), reach above 90% accuracy on attribution detection. Using these detectors, we significantly improve the attribution level of CORA.
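To make the idea concrete, here is a minimal sketch of how an attribution detector could be used to keep only answers that a retrieved passage supports. This is an illustration under assumptions, not the pipeline from the paper: `Candidate`, `toy_attribution_score`, `select_attributed`, and the 0.5 threshold are all hypothetical names and values, and the token-overlap scorer is a toy stand-in for a fine-tuned NLI model or PaLM 2.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """One (question, answer, passage) triple produced by a QA system."""
    question: str
    answer: str
    passage: str


def toy_attribution_score(c: Candidate) -> float:
    """Hypothetical stand-in for a learned attribution detector.

    Returns the fraction of answer tokens that also appear in the passage.
    A real detector would be a fine-tuned model, not token overlap.
    """
    answer_tokens = c.answer.lower().split()
    if not answer_tokens:
        return 0.0
    passage_tokens = set(c.passage.lower().split())
    hits = sum(tok in passage_tokens for tok in answer_tokens)
    return hits / len(answer_tokens)


def select_attributed(
    candidates: List[Candidate],
    scorer: Callable[[Candidate], float] = toy_attribution_score,
    threshold: float = 0.5,  # assumed cutoff, chosen only for illustration
) -> Optional[Candidate]:
    """Return the best-scoring candidate at or above the threshold, else None."""
    scored = [(scorer(c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s >= threshold]
    if not kept:
        return None
    return max(kept, key=lambda pair: pair[0])[1]


question = "Where was Marie Curie born?"
passage = "Marie Curie was born in Warsaw in 1867."
candidates = [
    Candidate(question, "Paris", passage),
    Candidate(question, "Warsaw", passage),
]
best = select_attributed(candidates)
```

In this toy setup the unsupported answer is dropped and the supported one is returned; swapping `scorer` for a real detector would leave the selection logic unchanged.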
Attribution is one of the most promising directions to improve trust in NLP systems. Our results show the potential of attribution detection models to improve attribution in cross-lingual question answering.
Work done while interning at Google Research last summer with @johnwieting2 @JonClarkSeattle @seb_ruder @tmkwiat @liviobs @roeeaharoni @jonherzig @cindyxinyiwang
Thanks to @dipanjand, Michael Collins, Vitaly Nikolaev, @jasonriesa, and @pat_verga for supporting the project and to @AkariAsai for fruitful discussions about CORA.
Paper available here: arxiv.org/abs/2305.14332