Why Sonar Has More Insights — and How to Close the Gap

Root cause analysis of the source volume disparity, with a concrete expansion plan to reach Sonar-level insight density

Prototype v2 — Dual-Stream Competitive Gap Data Pipeline · Updated Mar 2026

✅ March 2026 Update — Dual-Stream Pipeline Live

The original analysis (Feb 28) was based on a Steam-only prototype with 80 insights per game. The pipeline has been rebuilt with dual-stream (press + player), multi-source data, and GitHub Actions automation. Here's what changed:

- ~150 · avg insights per game now (was 80, Steam only)
- 1,518 · total insights across 10 games (was ~800: 100 Steam reviews × 10)
- 2 · separate sentiment streams: press (articles) + player (Steam)
| Change | Before (Feb 28) | After (Mar 2026) | Status |
|---|---|---|---|
| Data streams | Steam only | Press articles + Steam reviews | ✅ Done |
| Sources per game | 100 (Steam reviews only) | 100–122 (Steam + press articles) | ✅ Done |
| Category framework | 8 categories (keyword match) | 26 categories (AI extraction) | ✅ Done |
| Reference score | None | Steam All-Time % + Metacritic | ✅ Done |
| OpenCritic | Planned | ❌ Auth now required (was "free public") | Blocked |
| Reddit | Planned | ❌ HTTP 403 from GitHub Actions IPs (Azure block) | Blocked |
| YouTube transcripts | Phase 2 | ⏸ YT_BLOCKED=True (needs transcript proxy) | Backlog |
| GitHub Actions automation | None (local only) | workflow_dispatch — run from any browser | ✅ Done |
📊 Current gap (Balatro): Our prototype: 112 insights · 103 sources  →  Sprung Sonar: 370 insights · 56 sources  →  3.3× gap (down from 4.6× in Feb baseline). Main remaining drivers: YouTube (disabled) and Reddit (IP blocked).

1. The Numbers — Our Prototype vs. Sonar

Using Balatro as the benchmark: Sonar publishes its Balatro numbers in its press coverage, giving a clean apples-to-apples comparison point.

Our Prototype — Balatro (Mar 2026): 112 total insights (was 80 in the Feb baseline) · 2 source types · 100 Steam reviews + 3 press articles · 103 sources total

Sonar — Balatro: 370 insights from 56 sources · 3+ source types · YouTube + articles + community

That's a 3.3× insight gap on a single game — down from 4.6× in the Feb baseline. The remaining gap is almost entirely explained by YouTube (disabled) and Reddit (blocked on GitHub Actions IPs).

Across all 10 games (Mar 2026 run)

| Game | Press % | Player % | Total Ins | Sources | Steam All-Time | Metacritic | Sprung Sonar |
|---|---|---|---|---|---|---|---|
| Black Myth: Wukong | 58% | 58% | 151 | 102 | 94% | 81 | — |
| Palworld | N/A | 84% | 77 | 100 | 95% | TBD | — |
| Helldivers 2 | 93% | 56% | 139 | 103 | 83% | 82 | — |
| Balatro | 100% | 77% | 112 | 103 | 98% | 90 | 88% · 370 ins |
| Hell Is Us | 54% | 49% | 243 | 101 | 87% | 77 | — |
| Outer Wilds | 100% | 76% | 144 | 122 | 95% | 85 | 81% · 610 ins |
| Disco Elysium | 89% | 72% | 131 | 102 | 92% | 91 | — |
| Pentiment | 95% | 67% | 173 | 102 | 95% | 86 | — |
| Citizen Sleeper | 86% | 67% | 184 | 102 | 94% | 82 | 75% · 389 ins |
| Signalis | 84% | 78% | 154 | 102 | 97% | 81 | — |

Insight gap by dimension (updated Mar 2026)

| Dimension | Our prototype | Sprung Sonar |
|---|---|---|
| Total source docs (Balatro) | 103 | 56 |
| Insights per source doc | ~1.1 | ~6.6 |
| Category depth (# categories) | 16 | 26 |
| Total insights (Balatro) | 112 | 370 |

⚠ We now have MORE source documents than Sprung (103 vs 56) — but 100 of ours are short Steam reviews, while theirs are long-form YouTube videos, articles, and community threads. Quality > quantity here.

Category coverage is up from 8 to 16 for Balatro, and 19–25 of 26 for the other games. Balatro's lower coverage (16/26) reflects its focused card-game design rather than open-world variety.

2. Root Cause Breakdown

The insight gap comes from three separable problems, not one.

🔴 Source diversity
We use only Steam reviews. Sonar pulls from YouTube videos, written articles, and community boards. Each source type has different signal density — a 15-minute YouTube video can yield 30+ extractable insights; a 60-word Steam review yields ~1.
Impact: ~40×
🔴 Source volume
We cap at 100 reviews per game. Sonar had 56 distinct source documents for Balatro. Long-form sources (YouTube, articles) are fewer in number but far richer per piece — each one is independently curation-worthy.
Impact: ~56×
🟡 Extraction density
Our keyword matcher tags a whole review to one category max. Sonar extracts multiple discrete insights per source — a single article might contribute 8 insights across 5 categories. This is an AI extraction quality problem, not a data access problem.
Impact: ~6–8×
🟡 Category resolution
Sonar has 26 categories vs our 8. More categories = more surfaces for insights to land on = higher raw insight count even from the same source material. Their categories include things like Monetization, Player Agency, Replay Value, Social Features — all missed by us.
Impact: ~3×

3. Source-by-Source Breakdown

What each source type provides, what's available for free, and estimated implementation effort.

| Source | Sonar Uses? | We Use? | Insights / Source Doc | API Access | Effort |
|---|---|---|---|---|---|
| Steam Reviews | ❓ Unconfirmed | ✅ Yes (100/game) | ~1.1 (AI extraction) | Free, no key | Done |
| Gaming Press Articles | ✅ Confirmed | ✅ Yes (3–5/game) | 5–15 per article | Google Search + scrape | Done |
| YouTube Videos (transcripts) | ✅ Primary source | ⏸ Disabled | 10–40 per video | YouTube Data API v3 (free quota) | YT_BLOCKED=True — needs proxy |
| Reddit (r/gaming, game-specific) | ✅ Likely (community boards) | ❌ Blocked | 2–8 per thread | Blocked on GH Actions | — |
| Metacritic User Reviews | ❓ Unconfirmed | ❌ No | ~1 per review (like Steam) | Scrape only (no API) | Medium |
| OpenCritic (critic reviews) | ❓ Unconfirmed | ❌ Blocked | 8–15 per full review | Now requires API key (was listed as free) | Email developers@opencritic.com |
| Metacritic Critic Reviews | ❓ Unconfirmed | ❌ No | 8–15 per review | Scrape only | Medium |
| Steam Discussion Forums | ❓ Unconfirmed | ❌ No | 2–6 per thread | ISteamPublishedFile API (free) | Easy |
| Twitch/Streaming VODs | ❓ Unlikely (mentioned videos) | ❌ No | Variable (reaction-heavy) | — | Hard (transcription cost) |

4. Our Current Output — Top 5 by Category Mentions

These figures are from the Feb Steam-only baseline (1,000 reviews across 10 games, keyword matching) and show how mentions distributed across the original 8 categories. Note: these are keyword matches, not discrete insights per Sonar's definition.

| Rank | Source → Category | Total Mentions | % of All Matches | Gap vs. Sonar Equivalent |
|---|---|---|---|---|
| #1 | Steam Reviews → Gameplay | 532 | 32.4% | Well-covered — Sonar's equivalent likely similar |
| #2 | Steam Reviews → Player Experience | 393 | 24.0% | Comparable, but emotionally shallow (short text) |
| #3 | Steam Reviews → Aesthetics | 323 | 19.7% | YouTube is massively better here (visual reactions) |
| #4 | Steam Reviews → Mechanics | 100 | 6.1% | Undercounted — Steam reviews rarely break down loops |
| #5 | Steam Reviews → Game & World Systems | 91 | 5.5% | Highly undercounted — better in long-form articles |
⚠ The bottom 3 categories — Accessibility (52), Technical (74), UX-UI (55) — are structurally undercounted from Steam reviews. Players rarely articulate accessibility or UI problems in short-form text. Long-form YouTube and critic reviews cover these substantially better.

5. Remaining Expansion Plan — From 112 to 370+ Insights Per Game

Four concrete phases, prioritized by ROI (insight gain per hour of dev work). Fully executable with the existing Python pipeline.

Phase 1 — 2–3 hours
Max out Steam + add OpenCritic
Est. insight lift: ~2.5×
The simplest wins: pull 500 reviews per game instead of 100, and add OpenCritic, which aggregates full written critic reviews — these are 500–2,000 words each and yield 8–15 insights per review vs. ~0.8 from a Steam blurb. Caveat: this phase was written when OpenCritic had a free public API; per the Mar 2026 update, an API key is now required.

What to do:

1. Change max_reviews=100 to max_reviews=500 in analyze.py

2. Add fetch_opencritic_reviews(game_name) — search opencritic.com/api/game/search?criteria={name}, then fetch reviews at opencritic.com/api/review?game={id}. An API key is now required (see the Mar 2026 update).

3. Feed critic review text into the same classify pipeline. Each review is long — let AI extract multiple insights per review.

Sources active after Phase 1: Steam ×500 · OpenCritic
Phase 2 — 4–6 hours
Add YouTube transcripts
Est. insight lift: ~4–6×
This is likely Sonar's primary edge. A 15-minute YouTube video review is ~2,000–3,000 words and touches every dimension of a game. YouTube Data API v3 is free (10,000 units/day default quota). Transcripts are fetchable via the youtube-transcript-api Python library (no key needed).

What to do:

1. pip install youtube-transcript-api google-api-python-client

2. Search YouTube: f"{game_name} review" site:youtube.com — target video reviews, not "let's play" content. Filter by: duration 8–25 min (full reviews), view count >10k, published within 2 years.

3. Fetch auto-captions via youtube-transcript-api. Clean the transcript, chunk into 800-token blocks.

4. Run each chunk through Claude with a multi-insight extraction prompt: "Extract all distinct player experience insights from this transcript. For each insight, identify the category, sentiment, and a 1-sentence quote."

5. Target 5–10 videos per game → 150–400 insights per game from YouTube alone.
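Steps 3's fetch-and-chunk stage could be sketched as below, assuming the classic `youtube-transcript-api` interface (`get_transcript`; newer releases have changed the API surface) and a rough 4-characters-per-token heuristic for the 800-token blocks:

```python
def fetch_transcript_text(video_id: str) -> str:
    """Join a video's auto-caption segments into one cleaned string."""
    # Imported lazily so the pure chunker below works without the package.
    from youtube_transcript_api import YouTubeTranscriptApi  # pip install youtube-transcript-api

    segments = YouTubeTranscriptApi.get_transcript(video_id)
    return " ".join(seg["text"].replace("\n", " ") for seg in segments)


def chunk_text(text: str, max_tokens: int = 800, chars_per_token: int = 4) -> list[str]:
    """Greedy word-level chunking; ~4 chars/token is a rough heuristic."""
    max_chars = max_tokens * chars_per_token
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for word in text.split():
        if current and length + len(word) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk then goes through the multi-insight extraction prompt in step 4; word-level splitting keeps quotes intact across chunk boundaries better than fixed character slicing would.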

Sources active after Phase 2: Steam ×500 · OpenCritic · YouTube (5–10 videos/game)
Phase 3 — 3–4 hours
Add Reddit + Steam Discussions
Est. insight lift: ~1.5×
Reddit houses the most in-depth mechanical discussions anywhere — game-specific subreddits often have pinned threads specifically about UX, accessibility, and bugs. Steam Discussions are similarly dense for technical issues. Both are low-cost to add.

What to do:

1. Reddit: use PRAW (Python Reddit API Wrapper). Free tier allows 60 req/min. Search r/{game_subreddit} for top posts tagged "criticism", "feedback", "ux", "accessibility". Also search r/gaming with game name + keywords.

2. Steam Discussions: note that https://store.steampowered.com/appreviews/{appid} is Steam's review endpoint, not a forum endpoint; public discussion threads live under steamcommunity.com/app/{appid}/discussions/ and generally require scraping (the IGetDiscussionList API mentioned here could not be verified). No auth is needed for public games.

3. Filter threads for signal: sort by upvotes, minimum comment count >5. Each quality thread = 4–8 extractable insights.
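The Reddit side of the steps above could look like this with PRAW. The environment variable names are assumptions (the plan stores credentials as GitHub Secrets), and the `keep_signal_threads` score threshold is illustrative; the comment-count floor of 5 comes from step 3.

```python
def fetch_reddit_threads(subreddit_name: str, query: str, limit: int = 25) -> list[dict]:
    """Search a subreddit's top posts via PRAW (needs Reddit app credentials)."""
    import os
    import praw  # pip install praw

    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],          # assumed secret name
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],  # assumed secret name
        user_agent="competitive-gap-pipeline/0.1",
    )
    posts = reddit.subreddit(subreddit_name).search(query, sort="top", limit=limit)
    return [
        {"title": p.title, "text": p.selftext, "score": p.score, "comments": p.num_comments}
        for p in posts
    ]


def keep_signal_threads(threads: list[dict], min_score: int = 10, min_comments: int = 5) -> list[dict]:
    """Apply the plan's signal filter: sorted-by-upvotes input, comment count > 5."""
    return [t for t in threads if t["score"] >= min_score and t["comments"] > min_comments]
```

Each surviving thread (title + body + top comments, if fetched) is then a single source document for the extraction step, at the plan's estimated 4–8 insights per quality thread.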

Sources active after Phase 3: Steam ×500 · OpenCritic · YouTube · Reddit (r/{game}) · Steam Discussions
Phase 4 — 2 hours
Upgrade to multi-insight extraction
Est. insight lift: ~3× on same data
This is an AI prompt change, not a data change. Currently our classifier assigns one category per review. Sonar extracts multiple discrete insights per source. A single 200-word Steam review can contain observations about gameplay, technical performance, and UI — we're collapsing all that to one label.

What to do:

1. Change the classification prompt from: "Which category does this review primarily discuss?" to: "Extract all distinct insights from this review. For each, return: category, sentiment (positive/negative/mixed), severity (minor/notable/major), and a verbatim quote of 1–2 sentences."

2. Update the schema from one object per review to a list of insight objects. Update the HTML renderer to aggregate insights per category rather than reviews per category.

3. Expected output: 2–4 insights per Steam review, 6–12 per OpenCritic piece, 15–40 per YouTube transcript chunk.
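A sketch of the schema change in step 2: the prompt wording comes from step 1, while the validator shape (field names, allowed values, JSON-list output) is an assumption about how the model's response would be structured.

```python
import json

# Prompt text from step 1; the "Respond with a JSON list" suffix is an addition
# so the output can be machine-parsed.
EXTRACTION_PROMPT = (
    "Extract all distinct insights from this review. For each, return: "
    "category, sentiment (positive/negative/mixed), severity "
    "(minor/notable/major), and a verbatim quote of 1-2 sentences. "
    "Respond with a JSON list of objects."
)

VALID_SENTIMENTS = {"positive", "negative", "mixed"}
VALID_SEVERITIES = {"minor", "notable", "major"}


def parse_insights(raw_model_output: str) -> list[dict]:
    """Parse the model's JSON list and drop malformed insight objects.

    Downstream aggregation then groups insights per category instead of
    assigning one label per review.
    """
    try:
        items = json.loads(raw_model_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    insights = []
    for item in items:
        if not isinstance(item, dict):
            continue
        if item.get("sentiment") not in VALID_SENTIMENTS:
            continue
        if item.get("severity") not in VALID_SEVERITIES:
            continue
        if not item.get("category") or not item.get("quote"):
            continue
        insights.append(item)
    return insights
```

Validating and silently dropping malformed objects (rather than failing the batch) keeps one bad model response from stalling a 1,000-review run.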

Sources active after Phase 4: all sources · multi-insight extraction

6. Projected Insight Count After Full Expansion

Projected insights per game after all 4 phases

- 112 · current (Mar 2026; was ~80 in the Feb baseline)
- ~370 · Sprung Sonar benchmark (Balatro)
- ~500–800 · our target (remaining phases)
| After Phase | Sources Active | Estimated Insights/Game | Δ vs Current | Δ vs Sonar |
|---|---|---|---|---|
| Baseline — Mar 2026 ✅ | Steam ×100 + press articles | 112 | +1.4× vs Feb | −70% |
| Next: Reddit OAuth | + Reddit (GitHub Secret) | ~145 | +29% | −61% |
| Next: YouTube | + YouTube (5–10 vids/game) | ~350–450 | +3–4× | ≈ parity or above |
| Future: OpenCritic | + OpenCritic (API key req.) | ~400–500 | +3.6–4.5× | +8–35% |
| Full build | All sources + higher Steam cap | ~600–1000 | +5–9× | +60–170% vs Sonar |

7. Recommended Next Steps

- Biggest remaining win: YouTube transcripts. This is almost certainly Sonar's primary data source — their emphasis on "videos" is explicit in every press mention. A single 10-minute video review is ~2,000–3,000 words and touches every dimension of a game. Adding 5–10 videos per game would likely close the gap to parity or beyond. Blocker: youtube-transcript-api returns empty results on GitHub Actions; needs a self-hosted proxy or the /youtube-scrape skill.
- Easy win: Reddit OAuth credentials as GitHub Secret. Reddit HTTP 403s are caused by GitHub Actions' Azure IP range being blocked. Adding a Reddit app client ID/secret as secrets would restore ~30 additional insights per game at no API cost.
- OpenCritic is no longer "free public": The API now returns {"message":"API key is required. Email developers@opencritic.com"}. This was documented as a free public API in the original plan — that's no longer the case. Contact them for a key, or pivot to scraping Metacritic critic reviews directly.
- Press score caveat: Our press scores consistently run 10–15 points higher than Metacritic for well-reviewed games (Balatro 100% vs MC 90, Outer Wilds 100% vs MC 85). This is because our 3–5 article sample skews toward positive coverage. A larger, more diverse article sample — especially including mixed/negative reviews — would improve calibration.