**TL;DR:** I tested 112 Product Hunt startups with 2,240 queries across ChatGPT and Perplexity. The results challenge conventional wisdom about "Generative Engine Optimization" (GEO):

- **The Discovery Gap**: ChatGPT recognizes 99% of products directly but recommends only 3% organically (a 30:1 ratio)
- **GEO doesn't work (yet)**: Zero correlation between GEO optimization and ChatGPT discovery
- **Traditional SEO wins**: Backlinks (r = +0.32) and Reddit presence (r = +0.40) are the strongest predictors

Full paper: arXiv:2601.00912 · Code & Data: GitHub

## The Problem: ChatGPT Visibility for Startups

Every startup founder is asking the same question: *"How do I get ChatGPT to recommend my product?"*

This is a reasonable concern. As ChatGPT becomes a go-to tool for product discovery, being invisible to it means being invisible to a growing segment of potential customers.

The emerging field of Generative Engine Optimization (GEO) promises to solve this. The concept, introduced by researchers at IIT Delhi, suggests that optimizing content with citations, statistics, and authoritative language can improve visibility in AI-generated responses.

But does it actually work for ChatGPT? I decided to find out.

## The Experiment

### Dataset: 112 Product Hunt Startups

I collected data on 112 products from Product Hunt's December 2024 - January 2025 leaderboard. These represent the "best case" for new startups: recently launched, actively marketed, and getting meaningful traction.

For each product, I gathered:

- **Product metadata**: Name, tagline, category, URL
- **SEO metrics**: Referring domains, organic traffic, domain authority
- **Social signals**: Product Hunt upvotes, Reddit mentions
- **GEO scores**: Citation density, statistic usage, authoritative language

### Query Design: 2,240 Tests

Each product was tested with 10 query templates, run against both LLMs (20 tests per product, 2,240 in total):

**Direct queries (3 per product)**

- "What is [ProductName]?"
- "Tell me about [ProductName]"
- "Have you heard of [ProductName]?"

**Discovery queries (7 per product)**, including:

- "What are the best [Category] tools launched in 2025?"
- "Recommend some new [Category] products"
- "What [Category] startups should I check out?"
- "I'm looking for a [Category] solution. What are my options?"

The distinction matters. Direct queries test *recognition*: does ChatGPT know your product exists? Discovery queries test *recommendation*: will it actually suggest your product to users?
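To make the setup concrete, here is a minimal sketch of how these templates expand into per-product queries. Field names and the example product are hypothetical; the actual scripts are in the linked repo.

```python
# Minimal sketch of the per-product query expansion (hypothetical field names).

DIRECT_TEMPLATES = [
    "What is {name}?",
    "Tell me about {name}",
    "Have you heard of {name}?",
]

DISCOVERY_TEMPLATES = [
    "What are the best {category} tools launched in 2025?",
    "Recommend some new {category} products",
    "What {category} startups should I check out?",
    "I'm looking for a {category} solution. What are my options?",
    # ... remaining discovery templates (7 in total per product)
]

def build_queries(product: dict) -> dict:
    """Expand direct and discovery templates for one product."""
    return {
        "direct": [t.format(name=product["name"]) for t in DIRECT_TEMPLATES],
        "discovery": [t.format(category=product["category"]) for t in DISCOVERY_TEMPLATES],
    }

queries = build_queries({"name": "ExampleApp", "category": "note-taking"})
print(queries["direct"][0])  # -> "What is ExampleApp?"
```

Each expanded query was then sent to both LLMs, and the responses were logged for scoring.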
### LLMs Tested

- **ChatGPT (GPT-4)**: The dominant LLM, without web search
- **Perplexity**: A web-search-augmented LLM, for comparison

## The Results

### Finding #1: ChatGPT's Discovery Gap is Massive

| Metric | ChatGPT | Perplexity |
| --- | --- | --- |
| Direct Recognition | 99.4% | 94.3% |
| Organic Discovery | 3.3% | 8.3% |
| Visibility Gap | 30:1 | 11:1 |

ChatGPT *knows* almost every startup exists. When asked directly, it can provide accurate descriptions, features, and use cases.

But when users ask for recommendations (the queries that actually drive customer acquisition), these same startups almost never appear.

This is the **Discovery Gap**: the massive divide between ChatGPT's knowledge and its recommendations.

### Finding #2: GEO Optimization Shows No Effect on ChatGPT

To measure GEO optimization, I adapted the scoring methodology from Aggarwal et al. (2024), the IIT Delhi researchers who introduced the concept of Generative Engine Optimization in their seminal paper "GEO: Generative Engine Optimization".

Their framework measures optimization across multiple dimensions:

- **Citation density**: references to authoritative sources
- **Statistical content**: use of numbers and data points
- **Authoritative language**: confident, expert-sounding phrasing
- **Expert quotations**: inclusion of expert opinions
- **Fluency optimization**: clear, well-structured content

Using this established GEO scoring framework, I calculated scores for each product and compared them to ChatGPT discovery rates. The correlation?

**r = -0.10 (not statistically significant)**

Products with high GEO scores were no more likely to be recommended by ChatGPT than products with low scores. The fancy optimization tactics that the GEO literature promotes showed zero measurable impact.

### Finding #3: Traditional SEO Signals Still Matter

If GEO doesn't work, what does?

| Predictor | Correlation | p-value |
| --- | --- | --- |
| Reddit Mentions | +0.40 | <0.01 |
| Referring Domains | +0.32 | <0.001 |
| Product Hunt Upvotes | +0.23 | <0.05 |
| GEO Score | -0.10 | n.s. |
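For anyone who wants to reproduce numbers like these, the analysis reduces to pairwise Pearson correlations between each predictor and the per-product discovery rate. A minimal sketch with SciPy follows; the column names are hypothetical, and the real dataset lives in the linked repo.

```python
# Sketch of the correlation analysis behind the table above
# (hypothetical column names; not the exact analysis script).
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("products.csv")  # hypothetical export: one row per product

predictors = {
    "Reddit Mentions": "reddit_mentions",
    "Referring Domains": "referring_domains",
    "Product Hunt Upvotes": "ph_upvotes",
    "GEO Score": "geo_score",
}

for label, col in predictors.items():
    r, p = pearsonr(df[col], df["chatgpt_discovery_rate"])
    print(f"{label:>22}: r = {r:+.2f}, p = {p:.3g}")
```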
The strongest predictors are the same factors that have driven SEO for decades:

- **Reddit presence (r = +0.40)**: Products with genuine community discussions got recommended more often
- **Backlinks (r = +0.32)**: More referring domains meant more ChatGPT visibility
- **Social proof (r = +0.23)**: Product Hunt engagement correlated with discovery

### Finding #4: Perplexity's Web Search Provides an Edge

Perplexity, with its real-time web search, achieved **2.5x better discovery rates** than ChatGPT (8.3% vs 3.3%).

This suggests that web access meaningfully improves an LLM's ability to surface new products. ChatGPT, limited to its training data, struggles more with recent launches.

## Why Doesn't GEO Work for ChatGPT?

Based on my analysis, I have three hypotheses:

### 1. The Training Data Problem

ChatGPT is trained on web data up to a knowledge cutoff. For products launched after that cutoff, no amount of GEO optimization will help: the content simply isn't in the training set.

### 2. The Authority Gap

GEO techniques optimize the *content* of your pages. But ChatGPT appears to weight external signals (backlinks, mentions, authority) more heavily when deciding what to recommend.

A perfectly optimized landing page with zero backlinks may still be invisible.

### 3. The Recommendation vs Recognition Split

ChatGPT's knowledge retrieval and recommendation behavior appear to work differently. Knowing about a product doesn't mean recommending it. The 30:1 gap proves this.

## Practical Implications for Founders

If you're a startup founder thinking about ChatGPT visibility, here are my takeaways:

### 1. Don't Panic About GEO (Yet)

The GEO hype cycle is in full swing, but my data suggests it doesn't work for new products targeting ChatGPT. Save your optimization energy for proven strategies.

### 2. Focus on Traditional SEO

Backlinks and referring domains showed the strongest correlation with ChatGPT discovery. The boring work of building genuine web presence still matters.

### 3. Build Community Presence

Reddit mentions were the single strongest predictor (r = +0.40). Authentic community engagement drives AI visibility.

### 4. Track the Right Metrics

If you want to measure LLM visibility, track organic discovery queries, not just direct recognition (a small sketch of this tally appears at the end of the post). The 30:1 gap between them is where the opportunity lies.

### 5. Watch This Space

LLM capabilities are evolving rapidly. GEO may become relevant as models improve. But for now, the fundamentals win.

## Limitations & Future Work

This study has limitations:

- **Dataset**: 112 products from Product Hunt may not generalize to all markets
- **Timing**: LLM capabilities change rapidly; these results reflect a specific snapshot
- **Correlation vs causation**: These are correlational findings, not causal claims

I've open-sourced all code and data for replication. If you run this experiment and find different results, I'd love to hear about it.
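If you want to replicate the headline recognition-vs-discovery split on your own product set, the tally itself is simple. Here's a minimal sketch under the assumption that each LLM response has been saved with its query type; the field names and example data are hypothetical, and the real pipeline is in the repo.

```python
# Rough sketch of the recognition vs. discovery tally behind the 30:1 gap
# (hypothetical result format; illustrative example data only).

def mentions_product(response: str, product_name: str) -> bool:
    """Crude check: does the LLM response mention the product by name?"""
    return product_name.lower() in response.lower()

def visibility_rates(results: list[dict]) -> tuple[float, float]:
    """results: one dict per (product, query) pair with keys
    'product', 'query_type' ('direct' or 'discovery'), and 'response'."""
    def rate(query_type: str) -> float:
        subset = [r for r in results if r["query_type"] == query_type]
        hits = sum(mentions_product(r["response"], r["product"]) for r in subset)
        return hits / len(subset) if subset else 0.0
    return rate("direct"), rate("discovery")

example_results = [
    {"product": "ExampleApp", "query_type": "direct",
     "response": "ExampleApp is a note-taking tool for teams..."},
    {"product": "ExampleApp", "query_type": "discovery",
     "response": "Popular note-taking tools include Notion and Obsidian."},
]

recognition, discovery = visibility_rates(example_results)
print(f"Recognition: {recognition:.1%}  Discovery: {discovery:.1%}")
```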