AI Is Erasing the Knowledge We Haven't Written Down

A tumor disappeared after herbal treatment, and the son—a tech researcher who'd pushed for surgery based on internet advice—still doesn't believe it worked. That cognitive dissonance might be the most honest starting point for understanding what we're losing as AI becomes humanity's default knowledge keeper.

Writing in The Guardian, Deepak Varuvel Dennison describes how his father's Tamil Nadu vaithiyar (traditional Siddha medicine practitioner) successfully treated a potentially malignant tongue tumor with "thick, pungent, herb-infused oil"—after his family, armed with internet research, had convinced him to schedule surgery. His father secretly took the herbal concoction instead. The tumor shrank and vanished.

Dennison, who studies responsible AI design at Cornell, dismissed it as "a lucky exception" at the time. Now he's asking whether he was too quick to trust "digitally dominant sources" over traditional knowledge. The answer matters more than his father's tumor, because generative AI is systematically erasing entire knowledge systems that were never digitized—and we won't realize what we've lost until we need it.

The Language Gap Is a Knowledge Gap

Here's the data that should terrify anyone paying attention: English makes up 45% of Common Crawl training data despite being spoken by only 19% of the global population. Hindi, the world's third most spoken language with 7.5% of humanity, accounts for 0.2% of the data. Tamil, Dennison's mother tongue, spoken by 86 million people: 0.04%.

According to The Guardian piece, approximately 97% of the world's languages are classified as "low-resource" in computing. That designation is Orwellian—these aren't actually low-resource languages. They have millions of speakers and centuries of linguistic heritage. They're just underrepresented online, which means they're underrepresented in AI training data, which means the knowledge encoded in those languages is functionally invisible to the systems increasingly mediating human access to information.

This isn't abstract. Languages carry knowledge: ecological expertise, architectural techniques, water management systems, agricultural practices, healing traditions. When a language is marginalized in AI systems, the knowledge embedded in it becomes inaccessible to anyone using those systems to learn about the world. And a September 2025 study Dennison cites found that around half of ChatGPT queries are for practical guidance or information-seeking. GenAI isn't supplementing human knowledge anymore; it's becoming the primary interface.

What Disappears When Knowledge Isn't Written

Dennison spoke with Dharan Ashok, chief architect at Thannal, an organization reviving natural building techniques in India. Ashok is trying to recover the lost art of producing biopolymers from local plants—knowledge that's largely undocumented, passed down orally through native languages, and held by aging elders. When they die, the knowledge dies.

Ashok recounted missing the chance to learn how to make a specific limestone-based brick when the last artisan with that knowledge passed away. No written record. No video tutorial. No AI training data. Just gone.

The same pattern repeats across domains. Bengaluru's water crisis—flooding in May, water scarcity in March—stems partly from the loss of traditional water management knowledge. The city once had an interconnected system of cascading lakes managed by the Neeruganti community, who controlled water flow, ensured fair distribution, and advised farmers on water-efficient crops based on rainfall patterns.

Modernization replaced community-led systems with centralized infrastructure and individual bore wells. The Neerugantis were sidelined, their expertise dismissed as pre-modern. The lakes declined, some were built over. Now, as Bengaluru scrambles to address its water crisis, social workers turn to elderly Neeruganti members for advice—but their knowledge exists only in oral form, in their native language, absent from digital spaces and completely invisible to AI systems.


Mode Amplification: How AI Makes It Worse

The technical reality is more insidious than simple data gaps. Large language models don't just reflect the frequency of ideas in training data—they amplify dominant patterns through what researchers call "mode amplification."

Dennison explains: if training data includes 60% references to pizza, 30% to pasta, and 10% to biryani as favorite foods, an LLM asked the same question 100 times won't reproduce that distribution. Pizza appears more than 60 times. Biryani might be omitted altogether. The models are optimized to predict the most probable next token, which means high-likelihood responses get disproportionate emphasis.
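A toy simulation makes the amplification visible. The food shares come from the article's example; the temperature-sharpened distribution is an assumption standing in for a model tuned to favor its most probable answer, not a description of any real system.

```python
import numpy as np

# Hypothetical "favorite food" shares in the training data (the article's example)
foods = ["pizza", "pasta", "biryani"]
data_probs = np.array([0.60, 0.30, 0.10])

def sharpen(probs, temperature=0.5):
    """Exaggerate the most likely option by sampling at a temperature below 1,
    a rough stand-in for a model optimized to predict the most probable token."""
    logits = np.log(probs) / temperature
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

rng = np.random.default_rng(seed=0)
model_probs = sharpen(data_probs)

# Ask the "model" the same question 100 times and count the answers
answers = rng.choice(foods, size=100, p=model_probs)
for food in foods:
    print(food, int((answers == food).sum()))

# Typical run: pizza ~78, pasta ~20, biryani ~2 -- the majority answer is
# over-represented and the minority answer all but disappears.
```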

Then add reinforcement learning from human feedback (RLHF), where models are fine-tuned on the preferences of, let's be honest, primarily Western, English-speaking, institutionally educated evaluators. The result? Systems that excel at quarterly reports and Silicon Valley coding conventions but stumble over cultural contexts that don't translate into corporate hierarchies or earnings calls.

AI researcher Andrew Peterson calls this "knowledge collapse": a gradual narrowing of accessible information alongside declining awareness of alternative viewpoints. As LLMs are trained on data increasingly shaped by previous AI outputs, underrepresented knowledge becomes less visible—not because it lacks merit, but because it's less frequently retrieved. The feedback loop accelerates with each training cycle.
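The same toy model can illustrate that feedback loop. Here each "training cycle" re-learns from the sharpened output of the previous one; treating model output as the next cycle's data is an assumption standing in for text increasingly shaped by prior AI generations, not a claim about how any vendor actually trains.

```python
import numpy as np

foods = ["pizza", "pasta", "biryani"]
probs = np.array([0.60, 0.30, 0.10])  # hypothetical shares in human-written data

for cycle in range(1, 6):
    # Each cycle, the "new training data" is the previous model's sharpened
    # output (temperature 0.5), standing in for text shaped by earlier AI.
    logits = np.log(probs) / 0.5
    weights = np.exp(logits - logits.max())
    probs = weights / weights.sum()
    shares = ", ".join(f"{f}: {p:.3f}" for f, p in zip(foods, probs))
    print(f"cycle {cycle} -> {shares}")

# Within a few cycles the least common answer's share collapses toward zero,
# even though it started at a healthy 10% of the original data.
```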

The Structural Impossibility of Inclusion

Even when developers recognize the problem, structural forces prevent solutions. Dennison describes a conversation with a senior leader developing an AI chatbot serving 8 million farmers across Asia and Africa. The system provides agricultural advice based on government databases and international development organizations, which rely on research literature. Local practices that could be effective are excluded—not because they don't work, but because they're not documented in recognizable institutional formats.

The rationale isn't that research-backed advice is always correct. It's that it's defensible if something goes wrong. Institutional legitimacy trumps practical effectiveness. This is the trap: without validation through institutional channels, Indigenous knowledge can't gain support. Without support, it can't afford validation. The system is designed to exclude what it doesn't already recognize.

Perumal Vivekanandan, founder of Sustainable-agriculture and Environmental Voluntary Action (Seva), has documented more than 8,600 local agricultural practices since 1992, traveling village to village across India. Funders question the knowledge's scientific legitimacy. Universities lack incentives to validate it. Seva can't fund validation studies itself. Catch-22.

The Climate Crisis We're Programming

Here's where this becomes existential rather than academic: glass facade buildings in tropical climates. Originally designed for cold, low-light northern regions, these structures are energy-efficient where they were conceived. In tropical heat, studies show they cause significant indoor overheating and thermal discomfort, demanding more energy for cooling.

Yet glass facades have become synonymous with urban modernity—Jakarta, Lagos, Bengaluru—regardless of climate appropriateness. Dennison wrote his Guardian piece from inside one such building in Bengaluru, listening to the air conditioner hum while early monsoon rains (weeks ahead of schedule, another climate unpredictability signal) fell outside.

This is what knowledge homogenization looks like: solutions optimized for one context imposed globally, creating problems that could have been avoided by attending to local expertise. As the climate crisis accelerates, we're systematically erasing the knowledge systems that evolved to support sustainable living in specific ecologies.

What We Can't Afford to Lose

Karnataka's government partnered with Khan Academy to deploy Khanmigo, an AI learning assistant, in schools and colleges. Dennison asks the obvious question: does Khanmigo hold the insights of elder Neerugantis needed to teach students how to care for their water ecologies? Of course not.

Future generations will learn about water management from AI systems trained on institutional sources that don't include the people who actually managed water sustainably for centuries. They'll learn architecture from systems that don't include Indigenous building techniques adapted to local materials and climates. They'll learn agriculture from databases that exclude practices developed over generations of observation and adaptation.

Wildfire smoke doesn't respect borders. Polluted water doesn't pause at state lines. Climate breakdown is revealing that dominant knowledge paradigms have massive blind spots. But instead of diversifying our knowledge base, we're building AI systems that encode and amplify the very hierarchies creating our crises.

The Honest Uncertainty

Dennison ends with the contradiction: he's arguing for the legitimacy of local knowledge systems while remaining unconvinced about his father's herbal concoctions. That uncertainty—that honest acknowledgment of not knowing—might be the most important epistemological stance we can take.

Maybe the intelligence we most need isn't superintelligence. Maybe it's the capacity to see beyond the hierarchies that determine which knowledge counts, to hold uncertainty without demanding immediate validation through institutional frameworks, to preserve what we don't fully understand because the cost of being wrong is irreversible loss.

We're pouring hundreds of billions into developing AI that will supposedly solve our greatest challenges. But we're doing it while systematically erasing knowledge systems that took generations to develop, that encoded survival expertise in specific ecologies, that offered alternative approaches we might desperately need.

The tumor disappeared. The son still doesn't know why. And that honest not-knowing? That's where wisdom starts.


Building AI strategies that preserve rather than erase knowledge diversity? Winsome Marketing's growth experts help you deploy AI tools without flattening the complexity that makes solutions actually work. Let's talk.
