Pre-trained language models (PLMs) are known to improve the generalization performance of natural language understanding models by leveraging large amounts of data during the pre-training phase. However, the out-of-distribution (OOD) generalization problem remains a challenge in many NLP tasks, limiting the real-world deployment of these methods. This paper presents the first attempt at creating a unified benchmark named GLUE-X for evaluating OOD robustness in NLP models, highlighting the importance of OOD robustness and providing insights on how to measure the robustness of a model and how to improve it. The benchmark includes 15 publicly available datasets for OOD testing, and evaluations are conducted on 8 classic NLP tasks over 21 popularly used PLMs. Our findings confirm the need for improved OOD accuracy in NLP tasks, as significant performance degradation was observed in all settings compared to in-distribution (ID) accuracy.