TODO
zhoujuan
Work
Research
- 10.9 report
zengbiaojie
Blog
- Hugo usage guide
Research
paper reading list
- MTRec: Learning to Align with User Preferences via Mental Reward Models
- Checklists Are Better Than Reward Models For Aligning Language Models (uses RL to improve the model's instruction-following ability)
- InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback
- Inference time LLM alignment in single and multidomain preference spectrum
- Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment
- Alignment of Large Language Models with Constrained Learning
Study
Work
- Find a few multimodal models of around 30B parameters for testing
