This is a lie, and it's always been a lie. Something like ChatGPT needs a TON of text in the language you're targeting to train the model. You get it by licensing it, or by paying people to write it for you, or by stealing it. What they're saying is it's impossible to create CHEAPLY.
‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says
Pressure grows on artificial intelligence firms over the content used to train their products.

The developer OpenAI has said it would be impossible to create tools like its groundbreaking chatbot ChatGPT without access to copyrighted material, as pressure grows on artificial intelligence firms over the content used to train their products. Chatbots such as ChatGPT and image generators like Stable Diffusion are “trained” on a vast trove of data taken from the internet, with much of it covered by copyright – a legal protection against someone’s work being used without permission.
I mean, maybe if it's "impossible" for them to do it properly, then maybe it just shouldn't be something that gets made 🤔
Or you can use material in the public domain, and a lot of résumé cover letters will sound like a Dickens character.
I recall when SBF told us his cryptocurrency business model wouldn’t work without all the fraud, and we collectively shrugged and let him get away with it because tech companies are legally entitled to be profitable.
They’re already bleeding money like it’s going out of style. Licensing fees on top of that? Pssshhhh. Once the government forces them to do that, they’ll all topple over one by one like little white techbro bowling pins.
A lot of the time "impossible" just means "more expensive than we can afford"
Any company would make crazy profits if they don't have to pay for their materials.
Isn't that the same thing in practical terms? Licensing the entirety of what's been written or recorded throughout human history seems prohibitive to profitability. Maybe AI should be a societal utility like electricity.
Hah. This is a good take. Alternately, you *can* do it cheaply (as some open-source AI researchers have), but they didn't *already* do that, so all of the training hours they've burned on GPT-4, compressed GPT-4, and the upcoming GPT-5 would be wasted. To catch up would be to fall behind the open models.
Exactly! There are millions of books in the public domain they could have used freely, but they wanted up-to-date parlance and vocabulary.