Joy Youell · Mar 3, 2025 · 5 min read
In 2024, researchers from Princeton University, IIT Delhi, and independent collaborators published a paper titled “GEO: Generative Engine Optimization”, presented at the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24). The paper introduces a framework called Generative Engine Optimization (GEO), positioning it as a new paradigm to help websites improve their visibility inside AI-generated answers.
Because the topic sits at the intersection of SEO and large language models (LLMs), it has been widely interpreted as “SEO for AI search.” That interpretation is understandable — but incomplete.
In this article, I provide a careful, good-faith analysis of what the GEO paper actually demonstrates, the assumptions embedded in its methodology, and how marketers and publishers should realistically interpret its findings.
This is not a dismissal of the research. It is a clarification.
The authors define Generative Engines (GEs) as search systems that retrieve relevant sources for a query, synthesize them with a large language model, and return a single cited answer rather than a list of links.
They argue that traditional SEO — optimizing for ranked blue links — does not fully apply in this new paradigm. Instead, they introduce Generative Engine Optimization (GEO): a set of strategies to increase a website’s “visibility” within AI-generated answers.
In their framing, visibility is no longer about ranking position in a list of results or earning the click; it is about whether, and how prominently, your content is cited inside the generated answer.
That reframing is one of the paper’s most useful contributions.
However, the important nuance is this:
GEO measures citation prominence inside a specific AI answer pipeline — not universal AI “ranking.”
To evaluate GEO, the authors built a controlled generative engine pipeline: a query goes to a search backend, the top results form the candidate source set, and an LLM synthesizes an answer that cites those sources.
This setup resembles many Retrieval-Augmented Generation (RAG) systems, but it is still a specific implementation with specific constraints.
That context matters.
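For intuition, here is a minimal sketch of a pipeline in that style. The function names (`search`, `llm`, `build_prompt`), the prompt wording, and the top-k cutoff are illustrative assumptions, not the authors' code; the point is the shape: retrieve a fixed candidate set, then ask an LLM to answer while citing only those sources.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Source:
    url: str
    text: str

def build_prompt(query: str, sources: List[Source]) -> str:
    """Number the candidate sources and instruct the model to cite them inline."""
    numbered = "\n\n".join(f"[{i + 1}] {s.url}\n{s.text}" for i, s in enumerate(sources))
    return (
        "Answer the question using ONLY the sources below.\n"
        "Cite a source in brackets, e.g. [2], after every sentence.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {query}\nAnswer:"
    )

def generative_engine(
    query: str,
    search: Callable[[str, int], List[Source]],  # retrieval backend (assumed interface)
    llm: Callable[[str], str],                   # any text-completion function (assumed interface)
    k: int = 5,
) -> str:
    """Retrieve a fixed candidate set, then synthesize an answer that cites it.

    A page that does not make it into the top-k retrieved sources
    cannot be cited, no matter how well its content is optimized.
    """
    sources = search(query, k)
    return llm(build_prompt(query, sources))
```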
The paper introduces two primary categories of visibility metrics: an objective impression metric based on position-adjusted word count, and a set of subjective impression scores produced by an LLM evaluator.
The objective metric measures how many words of the generated answer are attributed to your source, weighted by how early in the answer those cited sentences appear.
In simple terms:
If your site contributes more text to the answer — especially near the top — you gain more “impression.”
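Here is a minimal sketch of that position-weighted idea. The exponential decay and the normalization are my own simplifications for illustration, not the paper's exact formula.

```python
import math
from typing import Dict, List, Tuple

def position_adjusted_word_count(
    answer_sentences: List[Tuple[str, int]],  # (sentence text, cited source id)
    decay: float = 1.0,
) -> Dict[int, float]:
    """Approximate a position-adjusted word-count share per cited source.

    Each sentence contributes its word count, down-weighted exponentially
    by its position in the answer, then shares are normalized to sum to 1.
    (Illustrative paraphrase, not the paper's exact impression metric.)
    """
    n = len(answer_sentences)
    raw: Dict[int, float] = {}
    for pos, (sentence, source_id) in enumerate(answer_sentences):
        weight = math.exp(-decay * pos / max(n, 1))  # earlier sentences count more
        raw[source_id] = raw.get(source_id, 0.0) + weight * len(sentence.split())
    total = sum(raw.values()) or 1.0
    return {src: contribution / total for src, contribution in raw.items()}

# Toy answer: source 1 dominates the start, source 2 appears once at the end.
shares = position_adjusted_word_count([
    ("Solar capacity grew 24% last year, according to the IEA.", 1),
    ("Most of that growth came from utility-scale projects.", 1),
    ("Residential installations grew more slowly.", 2),
])
print(shares)  # source 1 gets the larger share (~0.86 vs ~0.14)
```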
They also use an LLM-based evaluator (similar to G-Eval) to score subjective qualities such as how relevant, influential, and unique each cited source appears within the answer.
These scores are normalized to compare methods.
Important clarification:
These are designed measurement proxies, not proof of the ranking algorithm inside real-world AI systems.
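For a sense of how an LLM-based evaluator in this spirit works, here is a minimal judge sketch. The rubric wording and the three criteria are my paraphrase of the general approach, not the paper's actual prompt or its full metric set.

```python
JUDGE_PROMPT = """You are scoring how much a cited source contributes to an AI-generated answer.

Question: {query}

Answer (with citations): {answer}

Source [{source_id}]: {source_text}

Rate the source from 1 to 5 on each criterion:
- Relevance: how relevant is the cited material to the question?
- Influence: how much does the answer depend on this source?
- Uniqueness: does it contribute material no other source provides?

Reply with three integers separated by commas."""

def score_source(llm, query: str, answer: str, source_id: int, source_text: str) -> list:
    """Ask a judge model for rubric scores; parsing is kept deliberately simple."""
    reply = llm(JUDGE_PROMPT.format(
        query=query, answer=answer, source_id=source_id, source_text=source_text
    ))
    return [int(x) for x in reply.split(",")]
```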
The researchers make a couple of assumptions worth pressure-testing, so I checked them against the paper's own methodology and against how search behaves in the real world.
The first question: do all generative engines behave like the pipeline tested here? No, and the paper does not prove that they do.
Generative engines differ across retrieval backends, prompt design, citation policies, and the underlying language models.
The paper evaluates a single, controlled pipeline across a large benchmark of queries.
That demonstrates directional robustness — but not universal behavior across all AI systems.
The second question: did the authors discover how AI systems rank content? Interpreting GEO as "the formula for AI ranking" would be an overreach.
The paper proposes formulas for measuring visibility. That is not the same as discovering the internal scoring function of generative engines.
Their visibility formulas quantify how prominently a source is represented in the generated answer.
But they do not show that AI systems optimize these formulas internally.
They measure outcomes, not internal mechanics.
Beyond those two concerns, several deeper assumptions deserve attention.
Their pipeline retrieves the top search results first.
If your site is not retrieved into that candidate set, GEO does nothing.
This means traditional SEO remains a prerequisite: GEO optimizes for inclusion inside answers, but only after a page is eligible for retrieval.
Their prompt strongly constrains the LLM to cite sources after each sentence.
This creates a predictable bias: sources that are easy to quote, ground, and attribute get pulled into the answer more often.
Some commercial engines enforce citation more loosely.
If citation requirements change, the magnitude of GEO’s gains may shift.
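To see how much the citation policy can vary, compare two illustrative instructions a generative engine might embed in its prompt. Neither wording comes from the paper or any real product; they are assumptions meant to show how strictly, or loosely, attribution can be enforced.

```python
# Strict policy: mirrors the kind of per-sentence citation constraint described above.
STRICT_CITATION_POLICY = (
    "Cite one of the provided sources in brackets after EVERY sentence. "
    "Do not state anything you cannot attribute to a source."
)

# Loose policy: closer to engines that cite only when a claim clearly needs support.
LOOSE_CITATION_POLICY = (
    "Cite sources where they materially support a claim; "
    "uncited sentences are acceptable."
)
```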
The subjective metric is scored by another language model.
This is scalable, but it can introduce bias toward fluent, confident, well-structured writing.
These are qualities models may prefer — not necessarily what users prefer.
The paper normalizes these scores due to calibration limitations, acknowledging measurement complexity.
Some GEO methods involve adding quotations, statistics, and citations to credible sources.
These are not purely stylistic edits.
When done responsibly, they improve credibility.
When done carelessly, they risk incentivizing fabricated statistics, decorative citations, and quotes stripped of their original context.
The study does not measure misinformation risk — only visibility gain.
The paper acknowledges generative engines are evolving rapidly.
Prompt changes, retrieval expansions, anti-spam filters, or citation policy updates could all change optimization outcomes.
GEO is a snapshot in time — not a permanent formula.
The paper reports improvements of up to ~40% on position-adjusted word count and ~28% on subjective impression for top-performing strategies.
Important context: these gains were measured inside the authors' own pipeline and benchmark, against unoptimized versions of the same sources.
This does not mean your site receives 40% more traffic, or that it "ranks" 40% higher anywhere.
It means:
The optimized source contributed more to the generated answer in that pipeline.
That is meaningful — but not equivalent to SEO traffic growth.
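A toy calculation makes the distinction concrete; the shares below are hypothetical numbers chosen for illustration, not figures from the paper.

```python
# Hypothetical visibility shares: the fraction of a generated answer
# attributed to your source, before and after GEO-style edits.
baseline_share = 0.15
optimized_share = 0.21

relative_gain = (optimized_share - baseline_share) / baseline_share
print(f"Relative visibility gain: {relative_gain:.0%}")  # 40%

# Nothing here says anything about clicks, sessions, or revenue:
# the gain lives entirely inside the answer pipeline's attribution math.
```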
Across experiments, the strongest-performing strategies included adding quotations, adding statistics, and citing credible sources.
The most plausible explanation is not that “AI ranks quotes higher,” but that:
LLMs under citation constraints prefer sources that are easy to extract, ground, and attribute.
This is better described as extractability bias.
Content that is clearly structured, specific, and easy to quote and attribute is easier for AI systems to safely reuse.
That insight likely generalizes more broadly than the specific metrics.
A more accurate framing:
| Traditional SEO | GEO (as tested) |
|---|---|
| Optimize for ranking in a list | Optimize for inclusion inside a synthesized answer |
| Compete for position | Compete for citation prominence |
| CTR-driven | Extraction-driven |
But:
GEO does not eliminate foundational SEO.
If I strip away the marketing layer, the actionable takeaway becomes surprisingly straightforward:
Make your content easier for AI systems to confidently reuse and attribute.
That means clear, self-contained claims, cited statistics, quotable statements, named sources, and clean structure.
These are editorial quality improvements — not exploitative hacks.
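If you want to operationalize that checklist, even a crude audit helps. The sketch below counts rough extractability signals; the specific heuristics (numbers, quotation marks, "according to" phrasing) are my own assumptions, not metrics from the paper.

```python
import re

def extractability_signals(text: str) -> dict:
    """Count rough signals that make a passage easy to quote and attribute.

    These are heuristic proxies (assumptions, not the paper's metrics):
    concrete numbers, quotations, and named attributions all make it easier
    for an answer engine to reuse a claim and cite its source.
    """
    return {
        "statistics": len(re.findall(r"\d+(?:\.\d+)?%?", text)),
        "quotations": text.count('"') // 2,
        "attributions": len(re.findall(r"\baccording to\b", text, re.IGNORECASE)),
        "sentences": max(1, len(re.findall(r"[.!?]+", text))),
    }

print(extractability_signals(
    'Adoption rose 12% in 2024, "the fastest growth we have measured," '
    "according to the survey's authors."
))
```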
The GEO paper is a valuable early attempt to formalize how content behaves inside citation-based generative systems. It provides credible evidence that content which is easy to extract, ground, and attribute gets cited more prominently in citation-constrained answer pipelines.
However, it does not reveal the internal ranking logic of commercial AI search systems, and it does not demonstrate downstream traffic or revenue gains.
GEO is best interpreted as:
Optimization for citable inclusion within RAG-style AI answers — not a discovered ranking algorithm for AI search.
Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24). arXiv:2311.09735v3.
Liu, Y., Iter, D., Xu, Y., Wang, S., Xu, R., & Zhu, C. (2023). G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment.
Liu, N. F., Zhang, T., & Liang, P. (2023). Evaluating Verifiability in Generative Search Engines.