Liang Qiu

Selected publications

2026 Anthropic
Automated weak-to-strong researcher
J. Wen*, L. Qiu*, J. Benton, J. H. Kirchner, J. Leike
web blog code
2025 NeurIPS
Think-RM: Enabling long-horizon reasoning in generative reward models
I. Hong, C. Yu, L. Qiu, W. Yan, Z. Xu, H. Jiang, Q. Zhang, Q. Lu, X. Liu, C. Zhang, T. Zhao
arxiv code
2025 NeurIPS
Ask a strong LLM judge when your reward model is uncertain
Z. Xu, Q. Lu, Q. Zhang, L. Qiu, I. Hong, C. Yu, W. Yao, Y. Liu, H. Jiang, L. Li, H. Yun, T. Zhao
arxiv code
2025 COLM
Self-rewarding PPO: Aligning large language models with demonstrations only
Q. Zhang, L. Qiu, I. Hong, Z. Xu, T. Liu, S. Li, R. Zhang, Z. Li, L. Li, B. Yin, C. Zhang, J. Chen, H. Jiang, T. Zhao
pdf
2025 EMNLP
WebAgent-R1: Training web agents via end-to-end multi-turn reinforcement learning
Z. Wei, W. Yao, Y. Liu, W. Zhang, Q. Lu, L. Qiu, C. Yu, P. Xu, C. Zhang, B. Yin, H. Yun, L. Li
arxiv code
2025 ICML
Discriminative finetuning of generative large language models without reward models and preference data
S. Guo, I. Hong, V. Balmaseda, C. Yu, L. Qiu, X. Liu, H. Jiang, T. Zhao, T. Yang
arxiv code
2024 ACL
Aligning large language models via fine-grained supervision
D. Xu, L. Qiu, M. Kim, F. Ladhak, J. Do
arxiv
2023 ACL
A survey of deep learning for mathematical reasoning
P. Lu, L. Qiu, W. Yu, S. Welleck, K.-W. Chang
arxiv web
2023 ICLR
Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning
P. Lu, L. Qiu, K.-W. Chang, Y. N. Wu, S.-C. Zhu, T. Rajpurohit, P. Clark, A. Kalyan
arxiv web code
2022 NeurIPS
Learn to explain: Multimodal reasoning via thought chains for science question answering
P. Lu, S. Mishra, T. Xia, L. Qiu, K.-W. Chang, S.-C. Zhu, O. Tafjord, P. Clark, A. Kalyan
pdf web code
2022 SIGDIAL
Towards socially intelligent agents with mental state transition and human utility
L. Qiu*, Y. Zhao*, Y. Liang, P. Lu, W. Shi, Z. Yu, S.-C. Zhu
arxiv
2022 AAAI
ValueNet: A new dataset for human value driven dialogue system
L. Qiu, Y. Zhao, J. Li, P. Lu, B. Peng, J. Gao, S.-C. Zhu
arxiv paper web
2022 AAAI
Learning from the Tangram to solve mini visual tasks
Y. Zhao, L. Qiu, P. Lu, F. Shi, T. Han, S.-C. Zhu
arxiv paper data
2021 NeurIPS D&B
IconQA: A new benchmark for abstract diagram understanding and visual language reasoning
P. Lu, L. Qiu, J. Chen, T. Xia, Y. Zhao, W. Zhang, Z. Yu, X. Liang, S.-C. Zhu
pdf code web
2021 ACL-IJCNLP
SocAoG: Incremental graph parsing for social relation inference in dialogues
L. Qiu, Y. Liang, Y. Zhao, P. Lu, B. Peng, Z. Yu, Y. N. Wu, S.-C. Zhu
arxiv paper
2021 ACL-IJCNLP
Inter-GPS: Interpretable geometry problem solving with formal language and symbolic reasoning
P. Lu, R. Gong, S. Jiang, L. Qiu, S. Huang, X. Liang, S.-C. Zhu
arxiv code web
2020 EMNLP
Structured attention for unsupervised dialogue structure induction
L. Qiu, Y. Zhao, W. Shi, Y. Liang, F. Shi, T. Yuan, Z. Yu, S.-C. Zhu
arxiv paper code

For a full list, see my Google Scholar.

Contact

liangqiu at outlook dot com — general inquiries

liang at anthropic dot com — Anthropic‑related topics