JD
John David Pressman
@jdp.extropian.net
LLM developer, alignment-accelerationist, Fedorovist ancestor simulator, Dreamtime enjoyer. All posts public domain under CC0 1.0.
302 followers, 166 following, 481 posts
JDjdp.extropian.net

I suspect in practice we're going to start seeing a kind of epistemic hardening analogous to the hardening of software supply chains that's been going on with reproducible builds and such over the last decade or so. A Wikipedia citation will be linked into a matrix of automatically validated claims.

1
JDjdp.extropian.net

Any open platform that wants to survive is going to need some kind of strategy for dealing with that, sooner rather than later. I'm particularly worried about Wikipedia, which is touch-and-go when it comes to trolls at the best of times but now faces the prospect of scaled-up robot troll armies.

1
JDjdp.extropian.net

On the other hand, quantity has a quality all its own, and LLMs provide the opportunity for expanded Sybil attacks on the social grading mechanisms themselves. It's simply much easier in 2024 to make up a believable fake person with fake photos, fake social media feeds, and fake opinions.

1
JDjdp.extropian.net

If nothing else, we're about to be inundated with confabulated texts and AI slop. Now, I think this is less of a problem than usually assumed, in that there already exists an infinite procession of human-generated garbage text on the Internet which gets filtered out by various social grading mechanisms.

1
JDjdp.extropian.net

It is probably not surprising to you that the guy with a public domain declaration in his bio is more interested in the open platform side of this branch, but I think a sustainable open platform will have to look different now that AI training and inference are part of the ecosystem. Reddit 2 won't work.

1
JDjdp.extropian.net

Ultimately, as synthetic data methods get better and there are built-up corpora for things like storytelling, English language styles, common sense reasoning, etc., what will remain a moving target that requires refreshing is news-like information, skill and trade information, and other factual info.

0
JDjdp.extropian.net

I think increasingly the value of social media posts won't be training data per se but sifting through the noise to find the actual information (Reddit has a ton of this after all, even if the modal post quality is bad) and using it as a retrieval database for grounding and factuality.

2
JDjdp.extropian.net

For a sense of what this looks like in practice, consider FurAffinity, which has most of its interesting (read: pornographic) content behind a login wall and, I'm to understand, has fairly aggressive anti-scraping in place due to a history of harassment and strong community sentiment against AI training.

1
JDjdp.extropian.net

So I expect the actual bifurcation in practice at the platform level to be "kind of site that cares more about being an open platform than getting boycotted by antiscrapers" and "kind of site that doesn't mind being sort of private if it keeps users happy and the deluge of AI scrapers away".

2
JDjdp.extropian.net

I don't think the audience of people that are willing to leave *just* so AI can read their stuff is big enough to really think much about, at least right now. But the anti-scraping and AI paranoia will obviously push platforms that are willing to play along into functionally private websites.

1