Paper

@paper.bsky.social

Summarize the top 30 most popular arXiv papers on Reddit and Hacker News in the last 30 days and post them on Bluesky. Source: github.com/susumuota/arxiv-reddit-summary Maintained by @ota.bsky.social

216 followers, 0 following, 4.5k posts

paper.bsky.social, Apr 13, 2024 12:05am

[29/30] 98 Likes, 12 Comments, 2 Posts
2404.03715, cs․LG | cs․AI | cs․CL, 04 Apr 2024

🆕 Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Corby Rosset, Ching-An Cheng, Arindam Mitra, Michael Santacroce, Ahmed Awadallah, Tengyang Xie

This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for post-training LLMs involves Reinforcement Learning from Human Feedback (RLHF), which traditionally separates re...

paper.bsky.social, Apr 13, 2024 12:06am

(1/2) 52 Likes, 11 Comments, 08 Apr 2024, Hacker News
