Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan Step-level Value Preference Optimization for Mathematical Reasoning https://arxiv.org/abs/2406.10858
Dojun Park, Jiwoo Lee, Seohyun Park, Hyeyun Jeong, Youngeun Koo, Soonha Hwang, Seonwoo Park, Sungeun Lee MultiPragEval: Multilingual Pragmatic Evaluation of Large Language Models https://arxiv.org/abs/2406.07736
Peng Hu, Changjiang Gao, Ruiqi Gao, Jiajun Chen, Shujian Huang Large Language Models are Limited in Out-of-Context Knowledge Reasoning https://arxiv.org/abs/2406.07393
Sander Land, Max Bartolo Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models https://arxiv.org/abs/2405.05417
Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan AlphaMath Almost Zero: Process Supervision without Process https://arxiv.org/abs/2405.03553
Tianhui Zhang, Bei Peng, Danushka Bollegala Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning https://arxiv.org/abs/2404.16807
Farnaz Kohankhaki, D. B. Emerson, Jacob-Junqi Tian, Laleh Seyyed-Kalantari, Faiza Khan Khattak The Impact of Unstated Norms in Bias Analysis of Language Models https://arxiv.org/abs/2404.03471
Moxin Li, Wenjie Wang, Fuli Feng, Fengbin Zhu, Qifan Wang, Tat-Seng Chua Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection https://arxiv.org/abs/2403.09972
Hui Huang, Yingqi Qu, Jing Liu, Muyun Yang, Bing Xu, Tiejun Zhao, Wenpeng Lu Self-Evaluation of Large Language Model based on Glass-box Features https://arxiv.org/abs/2403.04222
Huy Quoc To, Ming Liu, Guangyan Huang, Hung-Nghiep Tran, André Greiner-Petter, Felix Beierle, Akiko Aizawa SKT5SciSumm -- Revisiting Extractive-Generative Approach for Multi-Document Scientific Summarization https://arxiv.org/abs/2402.17311