GEO: What the KDD Paper Actually Proves — and What It Doesn’t


In 2024, researchers from Princeton University, IIT Delhi, and independent collaborators published a paper titled “GEO: Generative Engine Optimization”, presented at the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24). The paper introduces a framework called Generative Engine Optimization (GEO), positioning it as a new paradigm to help websites improve their visibility inside AI-generated answers.

Because the topic sits at the intersection of SEO and large language models (LLMs), it has been widely interpreted as “SEO for AI search.” That interpretation is understandable — but incomplete.

In this article, I provide a careful, good-faith analysis of what the GEO paper actually demonstrates, the assumptions embedded in its methodology, and how marketers and publishers should realistically interpret its findings.

This is not a dismissal of the research. It is a clarification.


What Is “GEO” According to the Paper?

The authors define Generative Engines (GEs) as search systems that:

  1. Retrieve documents from the web (or a database),
  2. Use large language models to synthesize an answer,
  3. Provide inline citations to supporting sources.

They argue that traditional SEO — optimizing for ranked blue links — does not fully apply in this new paradigm. Instead, they introduce Generative Engine Optimization (GEO): a set of strategies to increase a website’s “visibility” within AI-generated answers.

In their framing, visibility is no longer about ranking #1 in a results list. Instead, it is about:

  • Whether your content is cited
  • How often it is cited
  • Where in the answer it appears
  • How much of the answer relies on it

That reframing is one of the paper’s most useful contributions.

However, the important nuance is this:

GEO measures citation prominence inside a specific AI answer pipeline — not universal AI “ranking.”


How the Study Was Conducted

To evaluate GEO, the authors built a controlled generative engine pipeline:

  • Retrieve the top 5 search results
  • Feed those sources into GPT-3.5-turbo
  • Prompt the model to generate an answer with strict inline citations
  • Measure how much each source is used in the answer

This setup resembles many Retrieval-Augmented Generation (RAG) systems, but it is still a specific implementation with specific constraints.
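To make that setup concrete, here is a minimal Python sketch of how a citation-constrained RAG prompt might be assembled from the retrieved sources. The helper name, example data, and prompt wording are my own illustration, not the paper's actual prompt.

```python
# Minimal sketch of a citation-constrained RAG prompt, assembled from the
# top retrieved sources. The wording is hypothetical, not the paper's prompt.

def build_answer_prompt(query: str, sources: list[dict]) -> str:
    """Number each source and instruct the model to cite after every sentence."""
    numbered = "\n\n".join(
        f"[{i + 1}] {s['title']}\n{s['text']}" for i, s in enumerate(sources)
    )
    return (
        f"Sources:\n{numbered}\n\n"
        f"Question: {query}\n\n"
        "Write a concise answer. After every sentence, cite the supporting "
        "source(s) in brackets, e.g. [1] or [2][3]. Use only the sources above."
    )

sources = [
    {"title": "Example Site A", "text": "GEO was presented at KDD 2024."},
    {"title": "Example Site B", "text": "Generative engines cite retrieved pages inline."},
]
print(build_answer_prompt("What is Generative Engine Optimization?", sources))
```

In the paper's pipeline, the generated answer is then parsed to determine which sentences cite which source, which is what the visibility metrics below operate on.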

That context matters.


The Core Metrics: What Does “Visibility” Mean?

The paper introduces two primary categories of visibility metrics:

1. Position-Adjusted Word Count

This metric measures:

  • How many words in the generated answer are attributed to a given source
  • Adjusted so earlier citations count more than later ones

In simple terms:

If your site contributes more text to the answer — especially near the top — you gain more “impression.”
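As an illustration only (the paper's exact weighting differs in detail), a position-adjusted word count could be computed along these lines, with sentences earlier in the answer discounted less than later ones:

```python
import math
import re

def position_adjusted_word_count(answer_sentences, target_source):
    """Illustrative only: sum the words of sentences citing `target_source`,
    discounted so citations earlier in the answer count more."""
    n = len(answer_sentences)
    total = 0.0
    for pos, (text, cited_sources) in enumerate(answer_sentences):
        if target_source in cited_sources:
            words = len(re.findall(r"\w+", text))
            total += words * math.exp(-pos / n)  # earlier positions keep weight close to 1
    return total

# Toy answer: (sentence text, set of cited source IDs)
answer = [
    ("GEO was introduced at KDD 2024.", {1}),
    ("It targets citation visibility rather than blue-link rank.", {1, 3}),
    ("Some engines weight freshness differently.", {2}),
]
for source_id in (1, 2, 3):
    print(source_id, round(position_adjusted_word_count(answer, source_id), 2))
```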

2. Subjective Impression (LLM-Graded)

They also use an LLM-based evaluator (similar to G-Eval) to score:

  • Relevance
  • Influence
  • Uniqueness
  • Diversity
  • Click likelihood
  • Prominence
  • Content volume

These scores are normalized to compare methods.
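In the spirit of G-Eval, an LLM-as-judge rubric over these dimensions might look roughly like the sketch below. The dimension names mirror the paper's list; the prompt wording and function name are my own assumption, not the paper's evaluator.

```python
# Hedged sketch of an LLM-as-judge rubric over the paper's impression
# dimensions. The prompt wording is hypothetical, not the paper's evaluator.

DIMENSIONS = [
    "relevance", "influence", "uniqueness", "diversity",
    "click likelihood", "prominence", "content volume",
]

def build_judge_prompt(query: str, answer: str, source_id: int) -> str:
    """Ask a grader model to score one source's contribution to the answer."""
    rubric = "\n".join(f"- {name}: integer score from 1 to 5" for name in DIMENSIONS)
    return (
        f"Question: {query}\n\n"
        f"Generated answer (with inline citations):\n{answer}\n\n"
        f"Rate the contribution of source [{source_id}] on each dimension:\n"
        f"{rubric}\n"
        "Return one line per dimension, e.g. 'relevance: 4'."
    )

print(build_judge_prompt("What is GEO?", "GEO targets citation visibility [1].", 1))
```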

Important clarification:

These are designed measurement proxies, not proof of the ranking algorithm inside real-world AI systems.


The researchers appear to make a couple of assumptions, so I tested those suspicions against the paper's context and content, as well as against what I know about how search works in practice.

Suspicion #1: "Do All Generative Engines Rank the Same Way?"

No — and the paper does not prove that they do.

Generative engines differ across:

  • Retrieval mechanisms
  • Freshness weighting
  • Authority filters
  • Personalization
  • Safety constraints
  • Citation policies
  • Source whitelisting

The paper evaluates:

  • A custom RAG-style pipeline
  • A limited experiment with Perplexity.ai

That demonstrates directional robustness — but not universal behavior across all AI systems.

Interpreting GEO as “the formula for AI ranking” would be an overreach.


Suspicion #2: “Is There Really a Formula?”

The paper proposes formulas for measuring visibility. That is not the same as discovering the internal scoring function of generative engines.

Their visibility formulas:

  • Weight word count
  • Weight citation position
  • Normalize exposure

But they do not show that AI systems optimize these formulas internally.

They measure outcomes, not internal mechanics.


The Additional Assumptions Most Readers Miss

Beyond those two suspicions, several deeper assumptions deserve attention.


Assumption 1: Retrieval Happens First

Their pipeline retrieves the top search results first.

If your site is not retrieved into that candidate set, GEO does nothing.

This means:

Traditional SEO remains a prerequisite.

GEO optimizes for inclusion inside answers — but only after retrieval eligibility.


Assumption 2: The Model Is Forced to Cite Every Sentence

Their prompt strongly constrains the LLM to cite sources after each sentence.

This creates a predictable bias:

  • The model prefers clean, extractable, easily attributable claims.
  • Sources with quotable lines, clear statistics, and structured facts become easier to use.

Some commercial engines enforce citation more loosely.

If citation requirements change, the magnitude of GEO’s gains may shift.


Assumption 3: LLM-Graded “Subjective Impression” Reflects User Value

The subjective metric is scored by another language model.

This is scalable, but it can introduce bias toward:

  • Confident tone
  • Quantified claims
  • “Authoritative” phrasing
  • Structured writing

These are qualities models may prefer — not necessarily what users prefer.

The paper normalizes these scores due to calibration limitations, acknowledging measurement complexity.


Assumption 4: “Statistics Addition” and “Quotation Addition” Change Content

Some GEO methods involve adding:

  • Statistics
  • Quotes
  • Citations

These are not purely stylistic edits.

When done responsibly, they improve credibility.

When done carelessly, they risk incentivizing:

  • Cosmetic data insertion
  • Superficial citation padding

The study does not measure misinformation risk — only visibility gain.


Assumption 5: The System Is Stable

The paper acknowledges generative engines are evolving rapidly.

Prompt changes, retrieval expansions, anti-spam filters, or citation policy updates could all change optimization outcomes.

GEO is a snapshot in time — not a permanent formula.


What the 40% “Visibility Boost” Actually Means

The paper reports improvements of up to ~40% on position-adjusted word count and ~28% on subjective impression for top-performing strategies.

Important context:

  • Scores are normalized per answer.
  • Visibility gains for one source can reduce exposure for others.
  • Percent improvements are relative to baseline visibility, which may be small.

This does not mean:

  • 40% more traffic
  • 40% higher ranking
  • 40% more clicks

It means:

The optimized source contributed more to the generated answer in that pipeline.

That is meaningful — but not equivalent to SEO traffic growth.
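A toy calculation (numbers invented for illustration, not taken from the paper) shows why a 40% relative gain in visibility share is not 40% more of anything a marketer usually measures:

```python
# Toy numbers (not from the paper) showing why a 40% relative visibility gain
# is not 40% more traffic: shares are normalized per answer, so one source's
# gain comes out of the other sources' share.

before = {"your_site": 0.20, "site_b": 0.45, "site_c": 0.35}  # share of answer words
after  = {"your_site": 0.28, "site_b": 0.40, "site_c": 0.32}  # after content edits

gain = (after["your_site"] - before["your_site"]) / before["your_site"]
print(f"relative visibility gain: {gain:.0%}")                 # 40%
print(f"total share still sums to {sum(after.values()):.2f}")  # 1.00
```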


What Appears Genuinely Robust in the Findings

Across experiments, the strongest-performing strategies included:

  • Quotation Addition
  • Statistics Addition
  • Cite Sources
  • Fluency Optimization
  • Easy-to-understand rewriting

The most plausible explanation is not that “AI ranks quotes higher,” but that:

LLMs under citation constraints prefer sources that are easy to extract, ground, and attribute.

This is better described as extractability bias.

Content that is:

  • Clear
  • Fact-dense
  • Properly attributed
  • Structurally organized

is easier for AI systems to reuse safely.

That insight likely generalizes more broadly than the specific metrics.


GEO Is Not Replacing SEO — It’s Layering On Top of It

A more accurate framing:

Traditional SEO vs. GEO (as tested):

  • Optimize for ranking in a list → Optimize for inclusion inside a synthesized answer
  • Compete for position → Compete for citation prominence
  • CTR-driven → Extraction-driven

But:

  • Retrieval eligibility still matters.
  • Authority signals still matter.
  • Crawlability still matters.

GEO does not eliminate foundational SEO.


A Realistic, Non-Hype Interpretation for Marketers

If I strip away the marketing layer, the actionable takeaway becomes surprisingly straightforward:

Make your content easier for AI systems to confidently reuse and attribute.

That means:

  • Write atomic, standalone claims.
  • Include verified statistics with proper context.
  • Attribute quotes clearly.
  • Structure content with headings and logical flow.
  • Reduce ambiguity and fluff.
  • Improve clarity and fluency.

These are editorial quality improvements — not exploitative hacks.


Final Assessment: A Fair Critique

The GEO paper is a valuable early attempt to formalize how content behaves inside citation-based generative systems. It provides credible evidence that:

  • Content presentation affects citation prominence.
  • Extractable, attributable content performs better.
  • Classic keyword stuffing does not meaningfully improve visibility in their pipeline.

However, it does not:

  • Reveal a universal ranking formula,
  • Prove all generative engines behave identically,
  • Measure traffic or business impact,
  • Eliminate the need for traditional SEO.

GEO is best interpreted as:

Optimization for citable inclusion within RAG-style AI answers — not a discovered ranking algorithm for AI search.


References

Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’24). arXiv:2311.09735v3.

Liu, Y., Iter, D., Xu, Y., Wang, S., Xu, R., & Zhu, C. (2023). G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment.

Liu, N. F., Zhang, T., & Liang, P. (2023). Evaluating Verifiability in Generative Search Engines.
