Liang Qiu
Hi! My name is Liang Qiu. I'm a researcher focusing on language model alignment and reinforcement learning. I was a Senior Applied Scientist at Amazon and earned my Ph.D. in Electrical and Computer Engineering from UCLA, advised by Prof. Song-Chun Zhu and Prof. Achuta Kadambi.
My Research
My research interests lie in Natural Language Processing and Conversational AI. My long-term goal is to enhance both the EQ of AI systems (alignment with human values, mental state modeling, and social reasoning) and their IQ (reasoning and decision-making capabilities).
Currently, I am working on automated research for scalable oversight.
Selected Publications
- Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
I. Hong, C. Yu, L. Qiu, W. Yan, Z. Xu, H. Jiang, Q. Zhang, Q. Lu, X. Liu, C. Zhang, T. Zhao.
NeurIPS 2025. [pdf] [code]
- Ask a Strong LLM Judge when Your Reward Model is Uncertain
Z. Xu, Q. Lu, Q. Zhang, L. Qiu, I. Hong, C. Yu, W. Yao, Y. Liu, H. Jiang, L. Li, H. Yun, T. Zhao.
NeurIPS 2025.
- Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only
Q. Zhang, L. Qiu, I. Hong, Z. Xu, T. Liu, S. Li, R. Zhang, Z. Li, L. Li, B. Yin, C. Zhang, J. Chen, H. Jiang, T. Zhao.
COLM 2025. [pdf]
- WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
Z. Wei, W. Yao, Y. Liu, W. Zhang, Q. Lu, L. Qiu, C. Yu, P. Xu, C. Zhang, B. Yin, H. Yun, L. Li.
EMNLP 2025. [pdf] [code]
- DORM: Preference Data Weights Optimization for Reward Modeling in LLM Alignment
R. Zhang, C. Zhang, X. Zhang, L. Qiu, H. Jiang, Y. Zhuang, Q. Zhang, H. Yun, X. Li, B. Yin, T. Zhao, C. Zhang.
EMNLP Findings 2025.
- Can Language Models Follow Multiple Turns of Entangled Instructions?
C. Han, X. Liu, H. Wang, S. Li, J. Yang, H. Jiang, Z. Wang, Q. Yin, L. Qiu, C. Yu, Y. Gao, Z. Li, B. Yin, J. Shang, H. Ji.
EMNLP Findings 2025.
- Discriminative Finetuning of Generative Large Language Models without Reward Models and Preference Data
S. Guo, I. Hong, V. Balmaseda, C. Yu, L. Qiu, X. Liu, H. Jiang, T. Zhao, T. Yang.
ICML 2025. [pdf] [code]
- Aligning Large Language Models via Fine-grained Supervision
D. Xu, L. Qiu, M. Kim, F. Ladhak, J. Do.
ACL 2024. [pdf]
- A Survey of Deep Learning for Mathematical Reasoning
P. Lu, L. Qiu, W. Yu, S. Welleck, K.-W. Chang.
ACL 2023. [pdf] [web]
- Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning
P. Lu, L. Qiu, K.-W. Chang, Y. N. Wu, S.-C. Zhu, T. Rajpurohit, P. Clark, A. Kalyan.
ICLR 2023. [pdf] [web] [code]
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
P. Lu, S. Mishra, T. Xia, L. Qiu, K.-W. Chang, S.-C. Zhu, O. Tafjord, P. Clark, A. Kalyan.
NeurIPS 2022. [pdf] [web] [code]
- Towards Socially Intelligent Agents with Mental State Transition and Human Utility
L. Qiu*, Y. Zhao*, Y. Liang, P. Lu, W. Shi, Z. Yu, S.-C. Zhu.
SIGDIAL 2022. [pdf]
- ValueNet: A New Dataset for Human Value Driven Dialogue System
L. Qiu, Y. Zhao, J. Li, P. Lu, B. Peng, J. Gao, S.-C. Zhu.
AAAI 2022. [pdf] [paper] [web]
- Learning from the Tangram to Solve Mini Visual Tasks
Y. Zhao, L. Qiu, P. Lu, F. Shi, T. Han, S.-C. Zhu.
AAAI 2022. [pdf] [paper] [data]
- IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
P. Lu, L. Qiu, J. Chen, T. Xia, Y. Zhao, W. Zhang, Z. Yu, X. Liang, S.-C. Zhu.
NeurIPS 2021, Datasets and Benchmarks Track. [pdf] [code] [web]
- SocAoG: Incremental Graph Parsing for Social Relation Inference in Dialogues
L. Qiu, Y. Liang, Y. Zhao, P. Lu, B. Peng, Z. Yu, Y. N. Wu, S.-C. Zhu.
ACL-IJCNLP 2021. [pdf] [paper]
- Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning
P. Lu, R. Gong, S. Jiang, L. Qiu, S. Huang, X. Liang, S.-C. Zhu.
ACL-IJCNLP 2021. [pdf] [code] [web]
- Structured Attention for Unsupervised Dialogue Structure Induction
L. Qiu, Y. Zhao, W. Shi, Y. Liang, F. Shi, T. Yuan, Z. Yu, S.-C. Zhu.
EMNLP 2020. [pdf] [paper] [code]
Contact
Feel free to reach out: liangqiu at outlook dot com