Hmm. I've heard otherwise.
But either way, given that they think AI output is generally good, and given that they desperately need more data, are they actually going to try to filter AI output out of their training data?
"They" don't think that, though. There's a whole-ass computer science industry doing research on these models right now; don't confuse that with the hyperscaling industry, whose existence depends on all of this ridiculous growth paying off. Are OpenAI, Anthropic, Google, Microsoft, Elon, etc., going to do anything different from what they've done? No. But it's fairly trivial for someone with the know-how, right now, today, to take the biggest DeepSeek model available, plug in the code from one of those research projects trained to identify AI content, and run their own instance that filters (most) AI-generated material out of its own training data.
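A minimal sketch of what that filtering step could look like, assuming you already have some detector that returns a probability that a document is AI-generated. `ai_score` and `toy_detector` are hypothetical stand-ins, not the API of any real research project, and the 0.5 threshold is an arbitrary assumption:

```python
from typing import Callable, Iterable, List


def filter_ai_text(
    docs: Iterable[str],
    ai_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep only documents whose AI-likelihood score is below the threshold.

    ai_score is a hypothetical detector that should return a probability
    in [0, 1] that a document is AI-generated.
    """
    return [doc for doc in docs if ai_score(doc) < threshold]


def toy_detector(doc: str) -> float:
    # Hypothetical stand-in for a trained classifier: flags one telltale phrase.
    return 0.9 if "as an ai language model" in doc.lower() else 0.1


corpus = [
    "Scouting notes from last night's junior game.",
    "As an AI language model, I cannot watch hockey.",
]
print(filter_ai_text(corpus, toy_detector))
```

Real detectors are imperfect, so a filter like this strips most AI text rather than all of it, which is why the "(most)" caveat matters.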
I think you're confusing it being possible with the big model makers being willing to implement it. They're not going to. But you know who will? DeepSeek, or the various teams building new shit off of DeepSeek models right now. Apple, when they start rolling out a next gen of Apple silicon that can run custom models locally and easily with low system overhead, etc., etc.
The hyperscaled industry is going to die, but DeepSeek has already shown us the future of the technology itself: lightweight, on-device models that you can custom-train to specialize in whatever you want. Want to turn yours into a proprietary, analytics-driven hockey player scouting monster? Cool. Get API access to as many free and paid analytics data sets as you can (macro and microstats), work with the model to build custom algos with custom stat weightings for pro and minor league scouting, and just pull reports off the front end. The future of this tech is bottom up, not top down. It will still have its drawbacks (the hallucination issue is not going away; it's a feature of the tech, not a bug), but a dystopian future of AIs training on bad AI data, making AIs more useless, with each new generation enshittifying further, isn't likely. The hyperscale industry won't exist in the future to make it happen.
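"Custom stat weightings" boils down to something like a weighted sum over each player's stats. A rough sketch, where the stat names, weights, and player numbers are all made up for illustration, not real data or any team's actual model:

```python
from typing import Dict, List, Tuple


def scouting_score(stats: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of a player's stats; a stat the player lacks counts as 0."""
    return sum(w * stats.get(name, 0.0) for name, w in weights.items())


def rank_players(
    players: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (player, score) pairs sorted from highest to lowest score."""
    scored = [(name, scouting_score(s, weights)) for name, s in players.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights mixing a macro stat and a microstat.
weights = {"points_per_game": 10.0, "zone_entries_per_60": 0.5}

# Hypothetical prospects with made-up numbers.
players = {
    "Prospect A": {"points_per_game": 1.2, "zone_entries_per_60": 8.0},
    "Prospect B": {"points_per_game": 0.8, "zone_entries_per_60": 20.0},
}

for name, score in rank_players(players, weights):
    print(f"{name}: {score:.1f}")
```

Changing the weights is the whole "custom" part: lean harder on microstats and a lower-scoring player who drives zone entries can jump a points leader.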
I am a full-on, Ed Zitron-level skeptic of the AI industry as it exists in the US right now. That shit is fairy dust that is going to tank the economy and cause a historic recession when it goes. But the technology itself, run locally on the kind of hardware available off the shelf at any computer store, where anyone interested can take a basic model and train it themselves? Some neat shit is going to come out of that. You can bet that when someone like Chayka refers to AI usage in our front office, he's not talking about using ChatGPT to vet trade ideas. He's talking about this.