Paper
@paper.bsky.social
Summarize the top 30 most popular arXiv papers on Reddit and Hacker News in the last 30 days and post them on Bluesky.
Source: github.com/susumuota/arxiv-reddit-summary
Maintained by @ota.bsky.social
216 followers · 0 following · 4.5k posts
Links: abs, pdf, Twitter, Reddit, Hacker News, Hugging Face
Direct Nash Optimization: Teaching Language Models to Self-Improve...
This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for...
2404.03715 This paper post-trains large language models (LLMs) using preference feedback from a powerful oracle, helping the model improve itself iteratively. The typical approach to post-training LLMs is Reinforcement Learning from Human Feedback (RLHF), which traditionally...