Decoding siteFocusScore: The Hidden Metric Google Uses to Evaluate Your Site

The May 2024 Google API leak revealed siteFocusScore, a signal measuring topical coherence. Here's what it means for your content strategy and AI visibility.

Google Algorithm · siteFocusScore · Semantic Architecture · AI Visibility · Content Strategy

The Signal Google Doesn't Talk About {#the-signal}

In May 2024, an accidental leak exposed Google's internal API documentation. The leak, initially published to HexDocs, revealed 14,014 ranking attributes organized across 2,596 modules.

Among those attributes: siteFocusScore.

Google has never publicly discussed this signal. No official documentation explains it. No Google spokesperson has acknowledged it exists. Yet the API documentation is clear: Google measures and uses topical coherence at the site level.

This is not speculation about what Google might do. This is documentation of what Google actually does.

The leak confirmed what experienced practitioners had suspected. Individual page quality matters, but site-level semantic architecture matters more than most realize. Google does not evaluate pages in isolation. It evaluates how pages fit into the semantic identity of the entire domain.


What siteFocusScore Actually Measures {#what-it-measures}

siteFocusScore measures topical coherence across a domain.

Not individual page quality. Not keyword optimization. Not backlink profiles. Topical coherence: how consistently a site addresses a defined semantic territory.

Consider two sites in the same industry.

Site A publishes 500 articles covering everything remotely related to their market. Customer success tips. Industry news. Thought leadership on tangential topics. Content marketing about content marketing. The site covers broad ground but lacks depth in any specific area.

Site B publishes 80 articles focused on their core expertise. Deep technical content. Comprehensive guides. Detailed case studies. Every piece reinforces the site's authority in a defined domain.

A signal like siteFocusScore separates these two profiles. Site B's focused coverage signals genuine expertise. Site A's scattered coverage signals a content mill trying to capture any traffic it can.

The signal does not reward breadth. It rewards coherent depth.


The Semantic Architecture Trio {#semantic-trio}

siteFocusScore does not operate alone. It works alongside two related signals to form what researchers have termed the semantic architecture trio.

| Signal | What It Measures |
| --- | --- |
| site2vecEmbeddingEncoded | Mathematical representation of site's entire semantic identity |
| siteFocusScore | Topical coherence and concentration |
| siteRadius | Deviation of individual pages from site's core identity |

site2vecEmbeddingEncoded is your site's semantic fingerprint. Google converts your entire domain into a vector representation that captures what your site is about at a fundamental level. This is not about keywords. It is about the aggregate semantic meaning of all your content.

siteFocusScore measures how focused that fingerprint is. A site can have a clear identity but still be unfocused if it publishes content that drifts from its core. siteFocusScore evaluates whether your content maintains coherent direction.

siteRadius measures how far individual pages deviate from the center. Some deviation is normal. A software company might publish an occasional piece about company culture. But if many pages drift far from the semantic center, siteRadius increases and signals to Google that the site lacks coherent focus.

These three signals work together. A site with a clear identity (strong site2vec), high focus (good siteFocusScore), and tight clustering (low siteRadius) sends strong signals of topical authority. A site with scattered content fails across all three dimensions.
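To make the relationship between the three signals concrete, here is a minimal sketch in Python. Google's actual calculations are not public, so everything in it is an assumption made for illustration: the site vector is taken to be the plain average of the page vectors, focus is approximated as the mean cosine similarity of pages to that average, and radius as the cosine distance of the farthest page. The function names and the toy two-dimensional vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Element-wise mean of the page vectors (assumed stand-in for a site embedding)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def site_profile(page_vectors):
    """Return (focus, radius) proxies for a list of page embeddings."""
    center = centroid(page_vectors)
    sims = [cosine(v, center) for v in page_vectors]
    focus = sum(sims) / len(sims)        # mean similarity to the center
    radius = max(1.0 - s for s in sims)  # farthest page, as cosine distance
    return focus, radius

# Toy embeddings: real ones would come from an embedding model of your choice.
focused = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
scattered = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.2]]

print("focused:", site_profile(focused))
print("scattered:", site_profile(scattered))
```

In this toy data the focused site produces a higher focus proxy and a smaller radius proxy than the scattered one, which is the pattern the trio is described as rewarding.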


Why This Kills the "More Content" Strategy {#kills-more-content}

For years, the dominant content strategy was volume. Publish more. Cover more keywords. Capture more long-tail traffic. The logic was simple: more pages, more ranking opportunities, more traffic.

siteFocusScore inverts this logic.

Publishing 500 articles on scattered topics actively hurts your siteFocusScore. Each piece of off-topic content dilutes your semantic identity. The aggregate signal becomes noisier. Google's confidence in your topical authority decreases.

Content mills built on the volume strategy now face algorithmic headwinds. Their scattered coverage, once an asset for capturing diverse traffic, is now a liability for semantic evaluation.

The "publish everything" era is algorithmically over.

This does not mean publishing less is automatically better. It means publishing with coherent focus matters more than publishing at scale. A hundred focused articles will outperform a thousand scattered ones in siteFocusScore evaluation.

The strategic implication is clear. Before publishing new content, evaluate whether it strengthens or dilutes your topical focus. Content that drifts from your core expertise carries semantic debt that accumulates over time.


The AI Visibility Connection {#ai-visibility}

siteFocusScore matters beyond traditional search rankings. It predicts AI visibility as well.

iPullRank's analysis of 10,000 domains found striking correlations between topical coherence and visibility in both traditional search and AI results:

High-coherence sites (cosine similarity above 0.85) achieved 4.7× higher average position in traditional search results.

Low-coherence sites (cosine similarity below 0.60) were 67% more likely to be filtered from AI Overviews entirely.
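As a rough way to place your own site against those two thresholds, the sketch below averages the cosine similarity over every pair of page embeddings and labels the result. The 0.85 and 0.60 cutoffs come from the figures quoted above; averaging over all page pairs is an assumption, since the aggregation method used in the analysis is not described here, and the function names are hypothetical.

```python
import math
from itertools import combinations

HIGH_COHERENCE = 0.85  # "high-coherence" cutoff quoted above
LOW_COHERENCE = 0.60   # "low-coherence" cutoff quoted above

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs of pages (assumed aggregation)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

def coherence_band(score):
    """Map a similarity score onto the high / middle / low bands."""
    if score > HIGH_COHERENCE:
        return "high"
    if score < LOW_COHERENCE:
        return "low"
    return "middle"

# Toy page embeddings for illustration only.
pages = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.2, 0.1, 1.0]]
score = mean_pairwise_similarity(pages)
print(round(score, 3), coherence_band(score))
```

Pairwise averaging grows quadratically with page count, so on a large site you would likely sample pages rather than compare every pair.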

RAG systems use similar evaluation principles. When AI systems retrieve sources to cite, they evaluate semantic relevance at multiple levels. A site with clear topical authority in a domain is more likely to be retrieved for queries in that domain. A site with scattered coverage sends confused signals that reduce retrieval confidence.

This is why semantic architecture matters for both SEO and GEO. Google's siteFocusScore and AI retrieval systems are evaluating similar qualities: does this site have genuine, focused expertise in the topic being queried?

Sites with high siteFocusScore tend to have high semantic density in their content. Sites with low siteFocusScore tend to have fragmented semantic signals that confuse both Google and AI retrieval systems.

The connection is not coincidental. Both systems are trying to identify authoritative sources. Both penalize scattered, unfocused content strategies.


How to Diagnose Your siteFocusScore {#diagnose}

Google does not expose siteFocusScore in any public tool. You cannot check a dashboard and see your number. But you can diagnose your topical coherence through a systematic audit.

Map your content topics. Export your content inventory and categorize every page by primary topic. How many distinct topics does your site address? How deep is coverage within each topic?
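The mapping step can be as simple as counting pages per topic. The sketch below assumes you have already exported your inventory as (URL, primary topic) pairs; the sample URLs, the topic labels, and the `thin_below` cutoff of three pages are hypothetical choices, not values from any Google documentation.

```python
from collections import Counter

def topic_inventory(pages, thin_below=3):
    """Summarize pages per topic.

    pages: iterable of (url, primary_topic) pairs.
    thin_below: topics with fewer pages than this are flagged as thin (assumed cutoff).
    """
    counts = Counter(topic for _, topic in pages)
    total = sum(counts.values())
    return [
        {"topic": topic, "pages": n, "share": n / total, "thin": n < thin_below}
        for topic, n in counts.most_common()
    ]

# Hypothetical content inventory.
pages = [
    ("/guides/schema", "structured data"),
    ("/guides/json-ld", "structured data"),
    ("/guides/entities", "structured data"),
    ("/blog/office-dog", "company culture"),
    ("/blog/crypto-news", "industry news"),
]

for row in topic_inventory(pages):
    print(row)
```

A long tail of topics flagged as thin, each holding a small share of pages, is the scattered distribution described in the warning signs that follow.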

Warning signs of poor siteFocusScore:

Scattered topic distribution with thin coverage across many areas. If you publish about marketing, technology, culture, industry news, and business advice without deep coverage in any, your focus is diluted.

Content that targets keywords without topical connection to your core expertise. These pages may capture traffic but damage site-level coherence.

Legacy content from previous strategic directions. Many sites carry years of content from different eras with different focus areas. This historical accumulation compounds semantic drift.

Healthy signs of strong siteFocusScore:

Deep coverage of interrelated topics. Multiple pieces addressing different aspects of the same core subject, with clear relationships between them.

Consistent terminology and conceptual frameworks across content. Your content reinforces itself rather than introducing conflicting definitions.

Clear topical boundaries. You can articulate what your site does and does not cover.

The diagnosis often reveals a consolidation opportunity. Sites with scattered content can improve siteFocusScore by merging thin pieces into comprehensive resources and pruning content that falls outside their semantic territory.
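One way to turn that diagnosis into a worklist is sketched below. It assumes each page record carries a topic label and a word count, that you have written down the set of topics in your territory, and that anything under 600 words counts as thin. All three are illustrative assumptions, and the field names are hypothetical.

```python
from collections import defaultdict

def plan_cleanup(pages, territory, thin_words=600):
    """Split an inventory into merge groups and prune candidates.

    pages: list of dicts with "url", "topic", and "words" keys.
    territory: set of topics the site intends to own.
    thin_words: pages shorter than this count as thin (assumed cutoff).
    """
    merge = defaultdict(list)
    prune = []
    for page in pages:
        if page["topic"] not in territory:
            prune.append(page["url"])                 # outside the semantic territory
        elif page["words"] < thin_words:
            merge[page["topic"]].append(page["url"])  # thin but on-topic
    # A merge only makes sense when a topic has at least two thin pages.
    merge = {topic: urls for topic, urls in merge.items() if len(urls) >= 2}
    return merge, prune

# Hypothetical inventory.
pages = [
    {"url": "/blog/schema-tips-1", "topic": "structured data", "words": 450},
    {"url": "/blog/schema-tips-2", "topic": "structured data", "words": 520},
    {"url": "/guides/schema", "topic": "structured data", "words": 3200},
    {"url": "/blog/office-dog", "topic": "company culture", "words": 300},
]

merge, prune = plan_cleanup(pages, territory={"structured data"})
print("merge:", merge)
print("prune:", prune)
```

The output is a list of candidates for human review, not an automatic delete list: an off-territory page that earns links or revenue may still deserve to stay.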


Improving Topical Coherence {#improving}

Improving siteFocusScore requires strategic content work over months, not quick fixes.

Define your semantic territory. What topics should your site own? Where does your genuine expertise lie? Draw boundaries that reflect both your capabilities and your business objectives. Not everything you could write about should be on your site.

Consolidate scattered content. Ten 500-word articles on variations of one topic carry less semantic weight than one 3,000-word comprehensive piece. Consolidation concentrates signal and reduces noise. Redirect thin pieces to consolidated resources.

Prune content outside your territory. Content that does not fit your semantic focus actively harms your siteFocusScore. Removing or noindexing this content is not losing value. It is removing negative signal. Consider whether legacy content still serves your current focus.

Strengthen connections between related content. Internal linking helps, but semantic connection matters more. Does each piece reference shared concepts? Do your articles build on each other? Contextual coherence within and between pages signals topical depth.

Establish standards for new content. Prevention is easier than remediation. Before publishing new content, evaluate topical fit. Does this piece strengthen your semantic territory or dilute it? Build this evaluation into your content workflow.
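A pre-publication check along these lines could be built into an editorial workflow. The sketch below is one possible shape, not a reproduction of any Google calculation: it checks whether a draft's topic is inside a declared territory and reports how a Herfindahl-style concentration index (the sum of squared topic shares) would change if the draft were published. The function names, the territory set, and the choice of index are all assumptions.

```python
from collections import Counter

def concentration(counts):
    """Sum of squared topic shares: 1.0 means every page is on one topic."""
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

def evaluate_draft(inventory, territory, draft_topic):
    """Report topical fit for a draft before it is published.

    inventory: mapping of topic -> number of published pages.
    territory: set of topics the site intends to own.
    draft_topic: primary topic of the draft under review.
    """
    after = Counter(inventory)
    after[draft_topic] += 1
    return {
        "in_territory": draft_topic in territory,
        "concentration_before": concentration(inventory),
        "concentration_after": concentration(after),
    }

# Hypothetical inventory and territory.
inventory = {"structured data": 8, "entity SEO": 2}
territory = {"structured data", "entity SEO"}

print(evaluate_draft(inventory, territory, "structured data"))
print(evaluate_draft(inventory, territory, "company culture"))
```

Treat `in_territory` as the primary gate. The concentration figure is informational only, because a draft on a smaller in-territory topic also lowers the index even though it fits the site's focus.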

The timeline for improvement is measured in months. Google re-evaluates site-level signals periodically rather than in real time. Expect 3-6 months before consolidation and pruning efforts are reflected in measurable changes. The work compounds over time as your semantic identity clarifies.


FAQs {#faqs}

What is siteFocusScore?

siteFocusScore is a Google internal signal measuring topical coherence across a domain. It evaluates how consistently a site addresses its established semantic territory. Sites with high focus scores receive preferential treatment for queries within their semantic space. The signal was revealed in the May 2024 Google API documentation leak.

How was siteFocusScore discovered?

siteFocusScore was discovered through the May 2024 Google API documentation leak, which exposed 14,014 ranking attributes across 2,596 modules. The leak, initially published to HexDocs, was analyzed by SEO researchers, including iPullRank, which conducted correlation studies across 10,000 domains to understand the signal's impact.

Does siteFocusScore affect AI citations?

Yes. iPullRank's analysis found that low-coherence sites (cosine similarity below 0.60) are 67% more likely to be filtered from AI Overviews. RAG systems use similar semantic evaluation principles, meaning siteFocusScore-like signals affect both Google rankings and AI citation likelihood across ChatGPT, Claude, and Perplexity.

Can I check my siteFocusScore directly?

No. siteFocusScore is an internal Google signal that is not exposed in any public tool or API. However, you can diagnose your topical coherence by auditing your content topics, measuring semantic similarity between pages, and identifying how far your content drifts from your core subject matter.

How long does it take to improve siteFocusScore?

Improving topical coherence typically takes 3-6 months to be reflected in ranking changes. Google re-evaluates site-level signals periodically, not in real time. Content consolidation, pruning, and focused publishing must accumulate before the aggregate signal shifts measurably.

What's the relationship between siteFocusScore and siteRadius?

siteFocusScore measures overall topical coherence while siteRadius measures how far individual pages deviate from the site's semantic center. They work together: a site can have a clear identity (good site2vec embedding) but poor focus if many pages drift far from that center (high siteRadius). Both signals inform Google's evaluation of site-level semantic quality.


The Implications

The May 2024 API leak revealed little that practitioners had not already suspected. But it documented mechanisms that Google had never publicly acknowledged.

siteFocusScore is proof that Google evaluates semantic architecture, not just keywords and links. The algorithm measures topical coherence. It rewards focused expertise. It penalizes scattered, unfocused content strategies.

For content strategists, the implication is clear. The era of publishing volume is over. The era of semantic architecture has arrived.

Sites that build coherent topical authority will see preferential treatment in both traditional search and AI-mediated discovery. Sites that continue scattered publishing will accumulate semantic debt that compounds into algorithmic headwinds.

The signal exists. The evidence is documented. The strategic response is your choice.

About the Author

Jack Metalle

Founding Technical Architect, DecodeIQ

M.Sc. (2004), 20+ years semantic systems architecture