
Paper

@paper.bsky.social

Summarize the top 30 most popular arXiv papers on Reddit and Hacker News in the last 30 days and post them on Bluesky. Source: github.com/susumuota/arxiv-reddit-summary. Maintained by @ota.bsky.social.

216 followers · 0 following · 4.5k posts

paper.bsky.social · Apr 13, 2024 12:06am

Links: abs, pdf, Twitter, Reddit, Hacker News, Hugging Face

Direct Nash Optimization: Teaching Language Models to Self-Improve...

This paper studies post-training large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for...

paper.bsky.social · Apr 13, 2024 12:06am

2404.03715 This paper studies post-training of large language models (LLMs) using preference feedback from a powerful oracle to help a model iteratively improve over itself. The typical approach for post-training LLMs involves Reinforcement Learning from Human Feedback (RLHF), in which reward learning is traditionally...
