
How to Get Cited by Perplexity AI

Perplexity doesn't use Bing or Google. It runs its own proprietary search index with a six-stage retrieval pipeline. Citations are embedded structurally before the AI generates its response, meaning your content must pass quality reranking to even be considered.

Perplexity AI uses a six-stage RAG pipeline to select which sources to cite

Perplexity doesn't use Bing or Google. It runs its own proprietary search index, built on a custom embedding model (pplx-embed) trained on 250 billion tokens across 30 languages. Getting cited by Perplexity requires understanding its six-stage retrieval pipeline, because citations are embedded structurally before the AI generates its response rather than added afterwards.

How Perplexity's citation pipeline works

Perplexity processes every query through six stages:

Stage 1: Query intent parsing

Perplexity analyses what you're actually asking. It classifies the query type (factual lookup, comparison, how-to, opinion) and identifies the key entities and concepts.

Stage 2: Embedding-based indexing

Your content has already been indexed and converted into vector embeddings by Perplexity's crawlers. When a query comes in, the system finds content whose embeddings are semantically similar to the query, not just keyword matches.
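To make "semantically similar" concrete, here is a minimal sketch of cosine similarity, a common way to compare embeddings. The three-dimensional vectors and page names are made-up illustrations, not real pplx-embed output, and whether Perplexity uses this exact measure is an assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (real ones have hundreds or thousands of dimensions).
query_vec = [0.9, 0.1, 0.0]
pages = {
    "guide-to-ai-citations": [0.8, 0.2, 0.1],
    "holiday-recipes": [0.0, 0.1, 0.9],
}

# Rank pages by similarity to the query, most similar first.
ranked = sorted(
    pages,
    key=lambda name: cosine_similarity(query_vec, pages[name]),
    reverse=True,
)
print(ranked[0])  # guide-to-ai-citations
```

The page whose vector points in nearly the same direction as the query ranks first, even though no keywords are compared at all.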

Stage 3: Multi-method retrieval

Perplexity uses three retrieval methods simultaneously: BM25 (traditional keyword matching), dense vector search (semantic similarity), and hybrid search (combining both). This means your content needs to satisfy both keyword relevance and semantic meaning.
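The sketch below illustrates how a keyword ranking and a semantic ranking can be merged. It implements a standard BM25 formula and combines the two rankings with reciprocal rank fusion. The fusion method, the parameter values (k1, b, k), the sample documents, and the hard-coded dense ranking are all assumptions for illustration; the article does not say how Perplexity combines its retrieval methods.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a standard BM25 variant."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)  # document frequency
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            freq = tf[term]
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * freq * (k1 + 1) / (freq + k1 * length_norm)
        scores.append(score)
    return scores

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings; items ranked high in any list float upward."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]

docs = [
    "how to get cited by perplexity ai",
    "perplexity pricing plans",
    "getting referenced in ai answer engines",
]
scores = bm25_scores("get cited by perplexity", docs)
keyword_ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Hypothetical output of a dense (embedding) search over the same documents.
dense_ranking = [2, 0, 1]

hybrid_ranking = reciprocal_rank_fusion([keyword_ranking, dense_ranking])
print(hybrid_ranking)  # [0, 2, 1]
```

Document 0 wins because it does well in both lists, which mirrors the point above: content needs both keyword relevance and semantic fit.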

Stage 4: Three-layer ML reranking

Retrieved results pass through three machine learning reranking layers (L1, L2, L3), each progressively more selective. Sources below a quality threshold of approximately 0.7 are dropped entirely. According to Onely's research, this is where most content fails: it gets retrieved but doesn't survive reranking.
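A cascade like this can be pictured as a series of filters. The sketch below is purely illustrative: the per-layer scores are invented, and applying the same 0.7 cutoff at every layer is an assumption, since the article only gives an approximate overall threshold.

```python
from dataclasses import dataclass

QUALITY_THRESHOLD = 0.7  # approximate figure quoted in the text

@dataclass
class Candidate:
    url: str
    scores: tuple  # hypothetical (L1, L2, L3) quality scores in [0, 1]

def cascade_rerank(candidates, threshold=QUALITY_THRESHOLD):
    """Pass candidates through three layers, dropping any that score too low."""
    survivors = list(candidates)
    for layer in range(3):
        survivors = [c for c in survivors if c.scores[layer] >= threshold]
    return survivors

retrieved = [
    Candidate("https://example.com/direct-answer", (0.90, 0.85, 0.80)),
    Candidate("https://example.com/buried-answer", (0.90, 0.60, 0.95)),
    Candidate("https://example.com/off-topic", (0.50, 0.90, 0.90)),
]

for candidate in cascade_rerank(retrieved):
    print(candidate.url)  # https://example.com/direct-answer
```

Two of the three pages were retrieved but never reach the citation stage, which is the failure mode described above.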

Stage 5: Structured prompt assembly

The surviving sources are assembled into a structured prompt with citations already embedded. This is architecturally different from ChatGPT: Perplexity decides which sources to cite before generating any text.

Stage 6: Constrained LLM synthesis

The LLM generates the response, constrained to cite only the pre-selected sources. It cannot introduce new sources during generation.
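Stages 5 and 6 can be sketched as two small functions: one that numbers the pre-selected sources inside the prompt, and one that checks an answer for citation markers outside that set. The prompt wording, the [n] citation format, and both function names are hypothetical; Perplexity's actual prompt format is not public in the article.

```python
import re

def build_prompt(question, sources):
    """Assemble a prompt in which every allowed source already has a number."""
    lines = ["Answer using ONLY the numbered sources below. Cite as [n].", ""]
    for i, (url, snippet) in enumerate(sources, start=1):
        lines.append(f"[{i}] {url}")
        lines.append(snippet)
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)

def invalid_citations(answer, source_count):
    """Return citation numbers in the answer that match no supplied source."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= source_count)

sources = [
    ("https://example.com/a", "PerplexityBot respects robots.txt rules."),
    ("https://example.com/b", "Structured data helps machines parse pages."),
]
prompt = build_prompt("How do I get cited?", sources)
print(prompt)

# An answer citing a source that was never supplied would be flagged.
print(invalid_citations("Allow the crawler [1] and add schema [3].", len(sources)))  # [3]
```

Because the source list is fixed before generation, a page that did not survive reranking has no number to be cited under.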

What Perplexity values most in content

Based on analysis of Perplexity's citation patterns:

Answer in the first 100 words

Research shows 90% of Perplexity's top-cited pages answer the query directly in the opening paragraph. If your page buries the answer below marketing copy or lengthy introductions, it will be retrieved but dropped during reranking.

JSON-LD structured data is a significant advantage

Pages with JSON-LD schema markup achieve a 47% Top-3 citation rate compared to 28% for pages without it. FAQPage, HowTo, Article, and Organization schemas all contribute to better citation performance.
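As an illustration, the sketch below generates a FAQPage JSON-LD script tag using the schema.org vocabulary. The function name and the sample question are placeholders; adapt the content to your own page.

```python
import json

def faq_jsonld(pairs):
    """Build a <script> tag containing schema.org FAQPage markup."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    # Escape "</" so answer text can never close the script tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"

tag = faq_jsonld([
    ("What is PerplexityBot?", "PerplexityBot is the crawler Perplexity uses to index pages."),
])
print(tag)
```

Place the resulting tag in the page's HTML so it is present in the server-rendered response.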

Topical depth beats domain authority

A surprising finding from Semrush research: 92.78% of Perplexity citations come from pages with fewer than 10 referring domains. This means niche expertise and content depth matter far more than backlink volume. A small consultancy with deep expertise on a specific topic can outperform a major publication.

Content freshness within 12-18 months

Approximately 70% of Perplexity's top citations come from content updated within the last 12-18 months. Stale content gets deprioritised during reranking, even if it's otherwise excellent.

Self-contained content blocks

Perplexity's RAG system extracts 200-400 word content blocks independently. Each section of your content should make sense on its own without requiring context from surrounding sections. Use clear H2/H3 headings and answer the heading's question within each block.
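A quick way to audit a page against the 200-400 word range is to count the words under each heading. This is a rough sketch: the function name is hypothetical, it assumes you have already split the page into (heading, body) pairs, and it checks length only, not whether a block actually reads well on its own.

```python
def audit_blocks(sections, min_words=200, max_words=400):
    """Label each (heading, body) section as short, ok, or long by word count."""
    report = []
    for heading, body in sections:
        count = len(body.split())
        if count < min_words:
            status = "short"
        elif count > max_words:
            status = "long"
        else:
            status = "ok"
        report.append((heading, count, status))
    return report

# Placeholder bodies of known length stand in for real page content.
sections = [
    ("How to allow PerplexityBot access", "word " * 250),
    ("Pricing", "word " * 50),
    ("Everything about AI search", "word " * 500),
]
for heading, count, status in audit_blocks(sections):
    print(f"{status:5} {count:4} {heading}")
```

Sections flagged "long" are candidates for splitting under a new H3; sections flagged "short" may need enough added context to stand alone.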

How to allow PerplexityBot access

Perplexity's crawler is identified as PerplexityBot in your robots.txt. To ensure your content is indexed:

  • Add User-Agent: PerplexityBot with Allow: / to your robots.txt
  • Don't block Perplexity's IP ranges in your firewall or CDN
  • Ensure your pages load without JavaScript, because Perplexity's crawler doesn't execute it
  • Use server-side rendering or static HTML for all content you want cited
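The robots.txt rule from the list above can be checked locally with Python's standard urllib.robotparser module before you deploy it. The sample file contents, the /admin/ rule, and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: PerplexityBot may crawl everything,
# other crawlers are kept out of a hypothetical /admin/ area.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /admin/
"""

def allows(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(allows(ROBOTS_TXT, "PerplexityBot", "https://example.com/blog/post"))  # True
print(allows(ROBOTS_TXT, "SomeOtherBot", "https://example.com/admin/"))      # False
```

This only validates the robots.txt rules. Firewall or CDN blocks and JavaScript-only rendering have to be checked separately.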

The engagement feedback loop

Perplexity has a feedback mechanism: if users consistently click through to your source and spend time reading, your content is ranked higher in future queries. Conversely, poor sources (those that users bounce from or that receive thumbs-down ratings) are deprioritised within approximately one week. This makes content quality genuinely self-reinforcing on Perplexity.

Is Perplexity citing your competitors instead of you?

RabbiiCo Studio's free AI Visibility Audit tests whether Perplexity, ChatGPT, Claude, and Gemini mention your business, and identifies exactly what's needed to start getting cited.
