Analysis of the results suggests three potential improvements:
1) Combine metrics of various strengths
2) Develop metrics that put more weight on the source than on surface-level overlap with the reference
3) Include strategies that explicitly model language-specific information
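As an illustration of point 1, here is a minimal sketch of one simple way to combine metrics. It assumes each metric has already produced a score for the same list of segments. Each metric's scores are z-normalized so that metrics on different scales (e.g. 0 to 1 versus 0 to 100) contribute comparably, and the results are then averaged, with optional weights. The function names `z_normalize` and `combine_metrics` and the metric names in the usage are hypothetical. This is one generic ensembling technique, not the method evaluated here.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Rescale one metric's segment scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant metric carries no ranking information; contribute zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def combine_metrics(metric_scores, weights=None):
    """Combine several metrics' segment-level scores into one score per segment.

    metric_scores: dict mapping a (hypothetical) metric name to a list of
    per-segment scores; all lists must cover the same segments.
    weights: optional dict of per-metric weights; defaults to equal weights.
    """
    names = list(metric_scores)
    lengths = {len(metric_scores[n]) for n in names}
    if len(lengths) != 1:
        raise ValueError("all metrics must score the same segments")
    if weights is None:
        weights = {n: 1.0 for n in names}
    total = sum(weights[n] for n in names)
    normalized = {n: z_normalize(metric_scores[n]) for n in names}
    n_segments = lengths.pop()
    # Weighted average of the normalized scores, segment by segment.
    return [
        sum(weights[n] * normalized[n][i] for n in names) / total
        for i in range(n_segments)
    ]


# Usage with placeholder scores (illustrative values, not results):
combined = combine_metrics({
    "metric_a": [0.2, 0.4, 0.6],     # a metric on a 0-1 scale
    "metric_b": [10.0, 30.0, 20.0],  # a metric on a 0-100 scale
})
```

Z-normalization is used here only because it is the simplest way to put heterogeneous scales on equal footing; learned combinations (e.g. regression against human judgments) are a common alternative.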