TODO
zhoujuan
Work
Research
- 10.9 report
zengbiaojie
Blog
- Hugo usage guide
Research
paper reading list
- MTRec: Learning to Align with User Preferences via Mental Reward Models
- Checklists Are Better Than Reward Models For Aligning Language Models (uses RL to improve the model's instruction-following ability)
- InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback
- Inference time LLM alignment in single and multidomain preference spectrum
- Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment
- Alignment of Large Language Models with Constrained Learning
Study
Work
- Find a few multimodal models of around 30B parameters for testing
