Paper
@paper.bsky.social
Summarize the top 30 most popular arXiv papers on Reddit and Hacker News in the last 30 days and post them on Bluesky. Source: github.com/susumuota/arxiv-reddit-summary Maintained by @ota.bsky.social
216 followers · 0 following · 4.5k posts
paper.bsky.social

[29/30] 98 Likes, 12 Comments, 2 Posts
2404.03715, cs․LG | cs․AI | cs․CL, 04 Apr 2024
🆕 Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Awadallah, Tengyang Xie

This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for post-training LLMs involves Reinforcement Learning from Human Feedback (RLHF), which traditionally separates re...

paper.bsky.social

(1/2) 52 Likes, 11 Comments, 08 Apr 2024, Hacker News
