Ru Peng (Perry)

Hi 😃! 彭儒, PhD @ Zhejiang University.


"Only love endures the passage of time"

I'm a 4th-year PhD student in the Computer Science Department of Zhejiang University (ZJU), advised by Professors Junbo Zhao and Gang Chen, and affiliated with DiLab-ZJU and the State Key Laboratory of Blockchain and Data Security. I was also a research intern on the Alibaba Qwen Team, working with Dayiheng Liu, Chang Zhou, and Junyang Lin on data management and synthesis for the Qwen series models. Previously, I was fortunate to collaborate with Professors Tianyong Hao, Yi Fang, and Kehai Chen, who ushered me into my research journey.

My research spans multiple AI areas, including LLMs (current emphasis), machine learning, NLP, and multimodality, listed below in reverse chronological order.

I am open to opportunities across academia and industry; feel free to get in touch!

Google Scholar · Twitter · Email · GitHub


🔥 News

Mar 02, 2026 Our paper "W2S: Weak-to-Strong Prompt Correction for Large Language Models" is accepted at Machine Learning 2026!
Jan 26, 2026 Our paper "OptimSyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation" is accepted at ICLR 2026!
Aug 18, 2025 The Ant RL technical report "Reinforcement learning with rubric anchors" (extending RLVR with 10k+ rubric rewards) is now released.
Feb 11, 2025 Our paper "LLM-Enhanced Query Generation and Retrieval Preservation for Task-Oriented Dialogue" is accepted at Findings of ACL 2025!
Feb 11, 2025 Our paper "DataMan: Data Manager for Pre-training Large Language Models" is accepted at ICLR 2025!
Dec 19, 2024 The Qwen2.5 technical report is now released.
Sep 20, 2024 One paper, "Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation", is accepted at Findings of EMNLP 2024, and two papers, "Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model" and "Embedding and Gradient Say Wrong: A White-Box Method for Hallucination Detection", are accepted at EMNLP 2024!
Sep 19, 2024 The Qwen2.5 series foundation models are now released.
Jul 15, 2024 The Qwen2 technical report is now released.
Jul 04, 2024 Released the paper "DotaMath" on mathematical reasoning.
Jun 17, 2024 The Qwen2 series foundation models are now released.
May 16, 2024 Our paper "DORY: Deliberative Prompt Recovery for LLM" is accepted at Findings of ACL 2024!
Feb 04, 2024 The Qwen1.5 series foundation models are now released.
Jan 16, 2024 Our paper "Energy-based Automated Model Evaluation" is accepted at ICLR 2024!
Oct 23, 2023 I started my internship at the Alibaba Qwen Team! Ping me if you want to meet up in Hangzhou :)
Jul 15, 2023 Our paper "CAME: Contrastive Automated Model Evaluation" is accepted at ICCV 2023!
Oct 06, 2022 Our paper "Distill The Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation" is accepted at EMNLP 2022 (Oral)!
Sep 10, 2022 Started my PhD at the College of Computer Science and Technology of Zhejiang University!
Apr 06, 2022 Our paper "HybridVocab: Towards Multi-Modal Machine Translation via Multi-Aspect Alignment" is accepted at ICMR 2022 (Oral)!

๐Ÿ“ Selected Publications

  1. Qwen1.5 Blog
    Introducing Qwen1.5
    Qwen Team
    Online Blog, 2024
  2. Qwen2 Technical Report
    Qwen2 Technical Report
    An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, and 3 more authors
    arXiv preprint arXiv:2407.10671, 2024
  3. Qwen2.5 Technical Report
    Qwen2.5 Technical Report
    An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, and 33 more authors
    arXiv preprint arXiv:2412.15115, 2024
  4. ICLR 2025
    DataMan: Data Manager for Pre-training Large Language Models
    In The Thirteenth International Conference on Learning Representations, 2025
  5. Ant RL Tech Report
    Reinforcement learning with rubric anchors
    Zenan Huang, Yihong Zhuang, Guoshan Lu, Zeyu Qin, Haokai Xu, Tianyu Zhao, Ru Peng, Jiaqi Hu, and 3 more authors
    arXiv preprint arXiv:2508.12790, 2025

💼 Work Experience

Hunyuan LLM Team, Tencent Dec. 2025 - Now

Qingyun Program Research Intern on Large Language Models

Focused on agentic mid-training and RL.

Inclusion AI Team, Ant Group April 2025 - Oct 2025

Plan-A Research Intern on Large Language Models

Mentor: Junbo Zhao

Contributed to reinforcement learning from rubric rewards for both closed-ended and open-ended tasks.

Qwen Pre-training Team, Alibaba Group Oct 2023 - April 2025

Research Intern on Large Language Models

Mentors: Dayiheng Liu, Junyang Lin, and Chang Zhou
  • Contributed to the Qwen 1.5/2/2.5 series base models.
  • Developed Data Manager (DataMan and DataXMan) for data selection and mixing in LLM pre-training, adopted in the Qwen base models and reported by Synced (机器之心).
  • Contributed to data synthesis in open-ended tasks for the Qwen series models.

📚 Academic Services

  • Conference Reviewer: ICLR 2024, 2025; ICML 2023, 2025; NeurIPS 2022, 2023, 2024; CVPR 2025; ICCV 2023, 2025; ECCV 2024; ACL 2024, 2026; AISTATS 2025; COLM 2024.
  • Journal Reviewer: IEEE Transactions on Big Data (TBD), Transactions of Machine Learning Research (TMLR).
  • Publication Chair: International Conference on Natural Language Processing (ICNLP) 2025.

😊 Miscellaneous

I love music 🎵, sports (basketball 🏀, football ⚽, running 🏃‍♂️, etc.), traveling the world 🗺️, hanging out with friends 🍻, and trying anything new. Click on Totoro below ↘️ to hear one of my favorite songs: 失恋ソング沢山聴いて泣いてばかりの私はもう。


Totoro Bottle