Will the AI Land Grab Make Us Homeless?
Social media platform Reddit has struck a deal with Google to make its content available for training the search engine giant’s artificial intelligence models. The contract with Alphabet-owned Google is reportedly worth about $60 million per year.
According to recent reports, Automattic, the parent company of Tumblr and WordPress, is selling user data (including blog posts and comments) to AI companies like OpenAI and Midjourney to train their AI models.
But what about the land that is being taken from under us without our being told?
Meta does not allow its users to opt out of having their content scraped for AI.
Google says it doesn’t use data from its free or enterprise Workspace products — that includes Gmail and Docs — to train its generative AI models unless it has user permission, though it does train some Workspace AI features like spellcheck and Smart Compose using anonymized data.
In essence, AI needs more human content. Generative AI systems need as much data as possible to train on. The more they get, the better they can generate approximations of how humans sound, look, talk, and write. The goal of AI is to supplement and assist human beings, but it is this human-created content that makes human processes obsolete once AI is put to work.
Human content creators are fighting back.
Comedian Sarah Silverman is suing OpenAI and Meta as part of a class action lawsuit. She alleges that the two companies trained their models on her written work by using datasets that contained text from her book, The Bedwetter. There are also lawsuits over image rights and the use of open source computer code.
At least 11,500 creative professionals, including Oscar-winning actor Julianne Moore, author James Patterson and Radiohead musician Thom Yorke, have signed an open letter calling for a prohibition on using human art to train artificial intelligence without permission.
So if these lawsuits are decided in favor of humans, what will AI feed on? Itself?
If AI is generating content faster and in larger volumes than humans have over the last 30 years, will the content AI absorbs just be prefab material created by other AI agents? Will the mass-produced AI land grab become a hallucinated world that does not match reality? Or, at the very least, one that is not even human?
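One way to picture that question is with a toy simulation. The sketch below is only an illustration, not how any real model is trained. It assumes a hypothetical world in which every item in the starting corpus is a distinct human idea, and each new "generation" of AI can only reproduce items it saw in the previous generation's output. The function name and its parameters are invented for this example.

```python
import random


def simulate_self_feeding(corpus_size=1000, generations=20, seed=0):
    """Toy model: each generation learns only from the previous one's output.

    Returns the number of distinct ideas left in the corpus at each
    generation, starting with the all-human generation 0.
    """
    rng = random.Random(seed)
    # Generation 0: every item stands for a distinct human-written idea.
    corpus = list(range(corpus_size))
    distinct_counts = [len(set(corpus))]
    for _ in range(generations):
        # Assumption: the next "model" can only repeat what it has seen,
        # so its output is a resample (with replacement) of the old corpus.
        corpus = [rng.choice(corpus) for _ in range(corpus_size)]
        distinct_counts.append(len(set(corpus)))
    return distinct_counts


if __name__ == "__main__":
    for generation, count in enumerate(simulate_self_feeding()):
        print(f"generation {generation}: {count} distinct ideas")
```

In this toy world the count of distinct ideas can never go up, because resampling cannot invent anything new. It can only hold steady or shrink as some ideas fail to be copied forward. Real systems are far more complicated, but the sketch shows why a diet of nothing but AI output raises the concern.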
If AI runs out of internet access, it will likely “feed” on locally stored data, generated synthetic data, or previously downloaded information, depending on the specific AI system. Because it can no longer access new data from the web, its functionality and accuracy may decline. Some AI systems are designed to operate offline using pre-trained models on local devices, while others rely on a combination of local and cloud processing, depending on the application.
Technology companies are going both the black box route and the transparency route.
And with the DeepSeek AI reveal last week that rocked Nvidia’s stock, what did a Chinese company feed its training models?
DeepSeek says that to create the R1 model, it uses reinforcement learning and supervised fine-tuning to improve the “reasoning patterns” of V3. According to rumor, the data fed into those steps includes the output of other AI models. Even though DeepSeek is open source, its backend processing and training data feeds have not been revealed.
Some startups, such as Anthropic, have built retrieval-augmented generation (RAG) directly into their Claude models with a new Citations API, which shows the passages in the supplied source documents that an answer draws on, so that trust comes through verification.
Local startups such as Authentrics.AI are allowing the origin of training data to be validated, and the data to be removed if it violates certain IP or content protection rules, so that you can trust your AI.
The history of land grabs has shown that there is always a winner and always a loser.
Land grabs were often caused by colonization, conflict, or economic interests. But in every scenario, whoever took possession of the land ended up being the victor.
- The European colonization of the Americas: European powers seized vast territories from indigenous populations, leading to displacement, disease, and cultural destruction.
- The Scramble for Africa: In the late 19th century, European nations rapidly colonized the African continent, dividing territories among themselves with little regard for existing ethnic or political boundaries.
- And most recently, the Israeli-Palestinian conflict: The ongoing dispute over land in the region has resulted in displacement and dispossession for Palestinians.
In the 1830s, the United States government and President Andrew Jackson enacted the Indian Removal Act, which led to the forced displacement of thousands of Native Americans from their ancestral lands in the southeastern United States. This tragic event, known as the Trail of Tears, saw members of the Cherokee, Muscogee, Seminole, Chickasaw, and Choctaw nations forcibly removed from their homes and marched westward to designated Indian Territory — a journey fraught with hardship, disease, and death.
Much like the AI training land grab, where vast amounts of data are acquired without regard for the original creators or their rights, the Trail of Tears disregarded the sovereignty and inherent rights of Native American tribes, seizing their lands and causing immense suffering in the name of progress and expansion.
We will be digitally colonized by our own creation.
AI descends upon us to eat every digital crumb we leave behind, so that it can better predict us, interact with us more precisely, and create its own reasoning. Will this not push us to find new ways to communicate? To keep hidden diaries written by hand, to tell stories person to person away from the earshot of a mobile device or computer, or, worse yet, to stop documenting our lives at all for fear of their being copied?
If the dream of AI is to automate processes by learning from us, will who we are as humans be buried under a deluge of manufactured synthetic data because of AI’s insatiable appetite?
Will it prove that our existence is a simulation? Or will we become an algorithm’s hallucination?
Humans with no digital home.
“Sometimes dreams are wiser than waking.” — Black Elk, Native American