
John David Pressman

@jdp.extropian.net

LLM developer, alignment-accelerationist, Fedorovist ancestor simulator, Dreamtime enjoyer. All posts public domain under CC0 1.0.

302 followers · 166 following · 481 posts


jdp.extropian.net · Sep 8, 2024 12:36pm

I suspect in practice we're going to start seeing a kind of epistemic hardening analogous to the hardening of software supply chains that's been going on with reproducible builds and such over the last decade or so. A Wikipedia citation will be linked into a matrix of automatically validated claims.

jdp.extropian.net · Sep 8, 2024 12:33pm

Any open platform that wants to survive is going to need some kind of strategy for dealing with that, sooner rather than later. I'm particularly worried about Wikipedia, which is touch-and-go when it comes to trolls at the best of times but now faces the prospect of scaled-up robot troll armies.

jdp.extropian.net · Sep 8, 2024 12:32pm

On the other hand, quantity has a quality all its own, and LLMs provide the opportunity for expanded Sybil attacks on the social grading mechanisms themselves. It's simply much easier in 2024 to make up a believable fake person with fake photos, fake social media feeds, and fake opinions.

jdp.extropian.net · Sep 8, 2024 12:30pm

If nothing else, we're about to be inundated with confabulated texts and AI slop. Now, I think this is less of a problem than usually assumed, in that there already exists an infinite procession of human-generated garbage text on the Internet which gets filtered out by various social grading mechanisms.

jdp.extropian.net · Sep 8, 2024 12:28pm

It is probably not surprising to you that the guy with a public domain declaration in his bio is more interested in the open-platform side of this branch, but I think a sustainable open platform will have to look different now that AI training and inference are part of the ecosystem. Reddit 2 won't work.

jdp.extropian.net · Sep 8, 2024 12:22pm

Ultimately, as synthetic data methods get better and corpora get built up for things like storytelling, English language styles, common sense reasoning, etc., what will remain a moving target that requires refreshing is news-like information, skill and trade information, and other factual info.

jdp.extropian.net · Sep 8, 2024 12:20pm

I think increasingly the value of social media posts won't be as training data per se, but in sifting through the noise to find the actual information (Reddit has a ton of this, after all, even if the modal post quality is bad) and using it as a retrieval database for grounding and factuality.

jdp.extropian.net · Sep 8, 2024 12:16pm

For a sense of what this looks like in practice, consider FurAffinity, which has most of its interesting (read: pornographic) content behind a login wall and, I'm given to understand, fairly aggressive anti-scraping measures in place due to a history of harassment and strong community sentiment against AI training.

jdp.extropian.net · Sep 8, 2024 12:14pm

So I expect the actual bifurcation in practice at the platform level to be "kind of site that cares more about being an open platform than getting boycotted by antiscrapers" and "kind of site that doesn't mind being sort of private if it keeps users happy and the deluge of AI scrapers away".

jdp.extropian.net · Sep 8, 2024 12:12pm

I don't think the audience of people that are willing to leave *just* so AI can read their stuff is big enough to really think much about, at least right now. But the anti-scraping and AI paranoia will obviously push platforms that are willing to play along into functionally private websites.
