The Books3 corpus would like you to know that all the data in it is from copyrighted books. It has reportedly been widely used in closed-source LLMs. “Rules for thee, not for me” shit. They’ll break copyright and then copyright what they made from it.
You’re allowed to train on copyrighted works; it isn’t illegal for anybody. This article by Kit Walsh does a good job of breaking it down. She’s a senior staff attorney at the EFF.
This has the same vibe as GitHub (owned by Microsoft) training its AI Copilot on repositories under the GPL license, which specifically forbids making any work based on it proprietary. Literally a blatant disregard for the license, but it’s ok because it’s a mega-corporation doing it.
Unless they start offering on-prem or there are some very high-profile server hacks, I don’t see that being possible. Unlike media and client software, they don’t need to provide the core functionality to end users, just the output.