John David Pressman
@jdp.extropian.net
LLM developer, alignment-accelerationist, Fedorovist ancestor simulator, Dreamtime enjoyer. All posts public domain under CC0 1.0.
302 followers, 166 following, 481 posts
jdp.extropian.net

I think the value of social media posts will increasingly lie not in training data per se but in sifting through the noise to find the actual information (Reddit has a ton of this, after all, even if the modal post quality is bad) and using it as a retrieval database for grounding and factuality.

jdp.extropian.net

Ultimately, as synthetic data methods get better and corpora are built up for things like storytelling, English language styles, common sense reasoning, etc., what will remain a moving target that requires refreshing is news-like information, skill and trade information, and other factual info.

tedunderwood.me

Got it. The “sifter” there would itself require an interesting kind of intelligence—able to infer, from social/network kinds of evidence, how much to trust a particular source on a previously unseen topic.
