When Your Product Starts Talking Back

How tracking everything reshapes what you build next

For the past few months, a big part of my time has gone into one of my game engines. A key unlock from the previous iteration to this latest one is a much deeper analytics and tracking system, and that work is already paying returns. Players had been posting on Discord that the fireball spell was overpowered, and we needed data to either confirm that or push back with evidence so we weren’t tuning in the dark.

What we got instead was a window into how players experienced the game versus how we designed it to be experienced. Those two things were further apart than we expected. The first surprise came within a week of turning on system-wide event logging. It didn’t come from the fireball data.

The Event Schema: What to Capture and Why

Before covering what the data revealed, it’s worth explaining what we tracked and why specific fields matter. The instinct with game analytics is to log events (“skill used,” “enemy defeated”) and count them. Counts answer what happened. They rarely answer why. The schema design decision that changes everything is adding behavioral context to each event.

Every skill-use event captured:

| Field | Type | Why It Matters |
| --- | --- | --- |
| skill_id | int | Which skill |
| char_level | int | Player’s level at time of cast |
| target_type | enum | Enemy category (ranged, melee, boss) |
| hit | bool | Did it land |
| damage_dealt | int | Actual output |
| mana_pct | float | Resource state at cast (constrained or free?) |
| cooldowns_active | int | How many other skills were unavailable |
| time_since_last_cast | float | Spacing pattern (spam vs. deliberate) |

Every enemy defeat added:

| Field | Type | Why It Matters |
| --- | --- | --- |
| enemy_id | int | Which enemy type |
| zone_id | int | Where it happened |
| threat_level | int | Was this a dangerous encounter |
| time_since_last_death | float | Player survival arc |
| skills_on_cooldown | int[] | Did the player have full options? |
| player_health_pct | float | How close was this? |
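As a concrete sketch of how a skill-use event might be assembled (field names match the schema above; the SkillUseEvent class and emit helper are illustrative, not our actual implementation):

```python
from dataclasses import dataclass, asdict
import time

@dataclass
class SkillUseEvent:
    skill_id: int
    char_level: int
    target_type: str          # "ranged" | "melee" | "boss"
    hit: bool
    damage_dealt: int
    mana_pct: float           # resource state at cast
    cooldowns_active: int     # how many other skills were unavailable
    time_since_last_cast: float
    ts: float = 0.0

def emit(event: SkillUseEvent) -> dict:
    # Stamp at emit time and serialize for the analytics queue.
    event.ts = time.time()
    return asdict(event)

evt = emit(SkillUseEvent(skill_id=3, char_level=12, target_type="melee",
                         hit=True, damage_dealt=48, mana_pct=0.35,
                         cooldowns_active=4, time_since_last_cast=1.2))
```

The point of the dataclass is that every cast carries its behavioral context with it; a bare "skill used" counter would discard exactly the fields the analysis below depends on.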

The cooldowns_active field on skill events turned out to be the one that answered the fireball question.

Pattern One: Constraint, Not Preference

Fireball appeared in roughly 73% of all mage skill-use events in the level 1–15 range. The forum posts weren’t wrong; our players were using it constantly. The forum interpretation was wrong.

When we filtered skill usage by cooldowns_active, the picture changed:

[Figure: Fireball dominance — choice or constraint? Mage skill usage, levels 1–15, representative 90-day cohort. Fireball’s share of all skill casts: 73% (the headline number). Casts made when all other skills were on cooldown: 68% — no real alternative. Casts chosen when alternatives were actually available: 12% — the actual player preference. Players weren’t choosing fireball; they were defaulting to it. Nerfing damage would have treated a symptom.]

Players weren’t spamming fireball because it was the best option. They were spamming it because the early skill tree’s acquisition cost made other spells economically inaccessible at those levels. Fireball was frequently the only skill available to cast.

A nerf would have reduced fireball’s damage while leaving the underlying constraint untouched. Players would still spam fireball—at lower output—and now feel weaker. The correct fix was adjusting skill acquisition cost to give players meaningful choices earlier.
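The constraint filter itself is a small piece of logic. A minimal sketch, assuming events shaped like the schema above (TOTAL_MAGE_SKILLS, the skill IDs, and the sample data are illustrative):

```python
# Constraint filter: was fireball chosen, or was it the only option?
TOTAL_MAGE_SKILLS = 6
FIREBALL = 1

def fireball_share(events, constrained):
    """Share of casts that are fireball, split by constraint context.

    constrained=True  -> all other skills were on cooldown
    constrained=False -> at least one alternative was available
    """
    def is_constrained(e):
        return e["cooldowns_active"] >= TOTAL_MAGE_SKILLS - 1
    subset = [e for e in events if is_constrained(e) == constrained]
    if not subset:
        return 0.0
    return sum(e["skill_id"] == FIREBALL for e in subset) / len(subset)

events = [
    {"skill_id": 1, "cooldowns_active": 5},  # fireball, no alternative
    {"skill_id": 1, "cooldowns_active": 5},
    {"skill_id": 1, "cooldowns_active": 2},  # fireball by choice
    {"skill_id": 4, "cooldowns_active": 2},  # chose something else
    {"skill_id": 2, "cooldowns_active": 1},
]
print(fireball_share(events, constrained=True))   # 1.0 -- only option
print(fireball_share(events, constrained=False))  # ~0.33 -- actual preference
```

The same aggregate (three fireball casts out of five) splits into two very different stories once the constraint context is applied.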

When a single skill or feature shows disproportionate usage, filter by constraint context before assuming it’s overtuned. Usage dominance can mean “players love this” or “players have no alternative.” The behavioral context — what other options were available — is what separates the two.

Pattern Two: Players Found a Better Use

Three months in, a skill-usage heatmap by level range showed something the community team didn’t expect. Ice lance—designed as a situational crowd-control tool—was the dominant offensive skill in the level 15–25 range, by a significant margin.

We mapped this against the enemy defeat data for that zone:

| Zone (Levels 15–25) | Dominant Enemy Type | Ice Lance Usage | Baseline Usage (Other Zones) |
| --- | --- | --- | --- |
| Ashfen Marshlands | Fast-moving swarm (small) | 41% | 9% |
| Ironhold Keep | Armored, slow (large) | 11% | 9% |
| Duskhaven Ruins | Mixed (ranged + melee) | 18% | 9% |

Ice lance’s slow effect was disproportionately valuable against fast-moving small enemies. The ability and the zone had been designed independently. Players discovered the combination organically and built a mid-game playstyle around it.
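Detecting this kind of context spike can be as simple as comparing per-zone usage share against the baseline. A sketch using the numbers from the table above (the 2x threshold is an illustrative choice, not a rule):

```python
# Context-spike check: flag zones where a skill's usage share is well
# above its baseline share elsewhere.
baseline = 0.09  # ice lance share across other zones

zone_usage = {
    "Ashfen Marshlands": 0.41,
    "Ironhold Keep": 0.11,
    "Duskhaven Ruins": 0.18,
}

spikes = {zone: share / baseline
          for zone, share in zone_usage.items()
          if share >= 2 * baseline}
print(spikes)  # Ashfen at ~4.6x baseline, Duskhaven at 2x
```

Anything that clears the threshold is a candidate for the "emergent strategy or bug?" question, rather than an automatic nerf.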

The community team’s first read was to nerf ice lance. The product read was different: what happens if we lean into what players already decided they wanted, rather than correcting it?

Two synergy abilities were added in the next update—late-game skills that built on ice lance’s crowd-control mechanic. Retention in the level 15–30 band improved by roughly 22% following that update, compared to the prior cohort. Players with an “ice mage” identity had a reason to continue.

Look for usage that’s significantly above baseline in a specific context. Unexpected spikes in bounded contexts usually indicate emergent player strategy, not a bug. Before correcting for it, ask whether the emergent strategy is something your design should validate rather than suppress. Players often find things worth building around.

Pattern Three: What Deaths Tell You About Design

The time_since_last_death and post-death behavior data produced the most counterintuitive finding of the project. One mid-game zone had one of the highest first-death rates in the game. Conventional interpretation: zone is poorly balanced, players are frustrated, fix the difficulty.

The post-death behavior told a different story:

| Zone Type | Quit Session After Dying | Retried Within 2 Minutes | Moved to Another Area |
| --- | --- | --- | --- |
| High-Frustration Zone | 38% | 41% | 21% |
| High-Death / High-Retry Zone | 9% | 84% | 7% |

Same death rate, but a completely different player response. In frustration zones, players died and left. In this zone, players died and immediately tried again. The deaths weren’t unfair—they were interesting.

Rather than adjusting the zone, we used it as a design template: extract what made deaths feel fair (clear visual telegraphing of enemy attacks, recoverable positioning mistakes, no one-shot mechanics below 30% health) and apply those principles to the zones where death was causing logout rather than retry.
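Classifying post-death behavior comes down to looking at the event stream in a window after the death. A sketch, where the event shape and the two-minute retry window are assumptions for illustration:

```python
# Post-event behavior sequence: what did the player do in the window
# after dying?
RETRY_WINDOW = 120.0  # seconds

def classify_post_death(death_ts, next_events):
    """Return 'retry', 'moved', or 'quit' for the action after a death."""
    for e in next_events:
        dt = e["ts"] - death_ts
        if dt > RETRY_WINDOW:
            break
        if e["type"] == "respawn_same_zone":
            return "retry"
        if e["type"] == "zone_change":
            return "moved"
    return "quit"  # no qualifying action inside the window

print(classify_post_death(100.0, [{"ts": 130.0, "type": "respawn_same_zone"}]))
# -> retry
print(classify_post_death(100.0, []))  # -> quit
```

Aggregating these labels per zone is what produces the table above: the death event alone is identical in both zones; the classification of the next action is what differs.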

Death rate alone is the wrong metric. Track the behavior sequence — what players do after a terminal event. A high-death zone where players retry is a signal of engaging design. A low-death zone where players log off can signal boredom. Aggregate outcome metrics obscure this; behavioral sequences reveal it.

Pattern Four: Extending the Schema Mid-Analysis

“Skills used at moment of death” wasn’t in the original schema. We added it after noticing that specific boss encounters had suspiciously similar death events—players dying while using the same skill combination.

That combination was a dominant strategy that worked until a specific boss mechanic triggered, at which point it caused an almost-guaranteed death. Players who discovered the strategy were effectively optimizing themselves into a hard failure point.

Once visible, we could respond: add a subtle audio cue before the failure-triggering mechanic as a warning signal. Deaths in that encounter dropped by roughly 40% in the following release without changing the difficulty rating.
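Surfacing a dominant death-time combination is a straightforward co-occurrence count once skills_at_death is in the schema. A sketch with illustrative data (skill IDs and boss ID are made up):

```python
from collections import Counter

# Co-occurrence analysis: which skill combinations were active at the
# moment of death for a given encounter?
deaths = [
    {"boss_id": 7, "skills_at_death": [3, 9]},
    {"boss_id": 7, "skills_at_death": [3, 9]},
    {"boss_id": 7, "skills_at_death": [1]},
    {"boss_id": 7, "skills_at_death": [9, 3]},  # order shouldn't matter
]

# Sort each combination so (3, 9) and (9, 3) count as the same strategy.
combos = Counter(tuple(sorted(d["skills_at_death"])) for d in deaths)
print(combos.most_common(1))  # [((3, 9), 3)] -- one combo dominates
```

When one combination dominates the death events for a single encounter, that is the "suspiciously similar deaths" signal worth investigating.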

Some of the most valuable metrics aren’t ones you planned. Build the event schema to be extensible — adding context fields to existing events is significantly cheaper than rebuilding the pipeline. When you notice a pattern that your current schema can’t fully explain, add the field that would explain it. The analysis you can run against existing and new data together is often where the highest-leverage insights live.

Reading the Same Data From Both Chairs

Each of these patterns looks different depending on which role you’re carrying when the data comes in. The product leader and the technical leader are reading the same events — they’re just asking different first questions. What makes this work in practice is that those questions depend on each other.

The Same Signals, Two First Questions

Product Leader
Pattern 1: Is this a balance problem or a design economics problem? Which lever do we actually move?

Pattern 2: Should we suppress this emergent behavior or build on it? What does it reveal about what players want?

Pattern 3: Is this zone broken, or is it good design we should replicate? What makes a death feel fair instead of cheap?

Pattern 4: Which failure states are most common, and what change would address the root cause?
Technical Leader
Pattern 1: Does our schema capture cooldowns_active? Can we filter by constraint context, or do we need to rebuild?

Pattern 2: Are zone_id and enemy_type cross-referenced in skill events? Is the query feasible?

Pattern 3: Are we capturing the behavioral sequence after death, or just the death event itself?

Pattern 4: Can we add skills_at_death to existing events retroactively, or only forward?

The product leader can only ask these questions if the technical leader built the schema to surface them. The technical leader only knows which context to capture if the product leader has thought through what matters. This is why the two roles need to design the analytics system together — before the pipeline is built, not after the first question comes back unanswerable.

The architecture below addresses the technical side of this directly: how to build a pipeline that keeps the game stable while keeping the analytics side extensible enough to answer questions you haven’t thought of yet.

The Data Configuration That Protects Your Game

One thing worth addressing explicitly: a poorly designed analytics pipeline can become a game stability problem, not just a product intelligence problem.

Event logging at the volume needed for behavioral analysis—potentially thousands of events per minute per active player—requires a clear separation between your game data path and your analytics data path:

[Diagram: Game analytics pipeline — separation from the critical path. Critical path: the Game Logic Layer (core gameplay, rules, player state) writes to the Game State DB (save files, progress, inventory) with strict read/write guarantees. Analytics path: a write-only, fire-and-forget Analytics Event Queue (best-effort — dropped events are OK) feeds a separate Analytics Processing Layer and an Analytics Store / Dashboard. Analytics writes must never block or fail the game loop. Two separate failure domains — analytics down does not mean game down.]

Analytics event writes must never block or fail the game loop — this is the non-negotiable. If your analytics pipeline goes down, the game continues. If analytics writes compete with save file writes on the same connection or thread, you’ve introduced a reliability dependency where none should exist.

In practice, this means:

  • Queue analytics events locally, flush asynchronously
  • Treat event delivery as best-effort (dropped events are acceptable; game corruption is not)
  • Use a separate database connection, separate schema, or separate service entirely
  • Test your game loop with the analytics pipeline artificially delayed or unavailable
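A minimal sketch of the fire-and-forget pattern described above (queue size, batch size, and the sample events are arbitrary; a real setup would hand batches to a separate service):

```python
import queue

class AnalyticsQueue:
    """Fire-and-forget analytics buffer. The game loop only ever does a
    non-blocking put; a background worker drains batches."""

    def __init__(self, maxsize=10_000):
        self._q = queue.Queue(maxsize=maxsize)
        self.dropped = 0

    def track(self, event: dict) -> None:
        """Called from the game loop. Must never block."""
        try:
            self._q.put_nowait(event)
        except queue.Full:
            self.dropped += 1  # best-effort: drop, don't stall the game

    def flush(self, batch_size=500) -> list:
        """Called from a background thread; drains up to one batch."""
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch  # hand off to the analytics service here

aq = AnalyticsQueue(maxsize=2)
aq.track({"skill_id": 1})
aq.track({"skill_id": 2})
aq.track({"skill_id": 3})  # queue full -> dropped; game loop unaffected
batch = aq.flush()
print(aq.dropped, len(batch))  # 1 2
```

The key property is that track() has exactly two outcomes — enqueue or drop — and neither one can block, raise into the game loop, or touch the game state database.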
The failure mode to avoid: analytics writes sharing a synchronous database connection with game state writes. Under load, analytics volume can starve your game state operations. This tends to surface in production at the worst possible time—high concurrent player counts—and produces symptoms that look like game state bugs rather than analytics bugs.

Teaching the Pattern, Not Just the Examples

Each of the scenarios above represents a reusable analytical pattern. Before running any usage analysis, it helps to have these patterns in your toolkit:

| Pattern | Question It Answers | Context Field Required |
| --- | --- | --- |
| Constraint filter | Is dominance choice or necessity? | alternatives_available |
| Context spike | Is this emergent player strategy? | zone_id, enemy_type |
| Post-event behavior | Is this terminal event engaging or frustrating? | next_action, time_to_next_action |
| Co-occurrence analysis | What player state correlates with failure? | skills_at_death, health_at_death |

These patterns aren’t unique to game development — they’re just easier to see there. A play session is bounded, every action is logged, and the feedback loop between a design decision and a player response is measured in hours rather than weeks. Most product environments don’t have that compression, but the underlying signals exist. They’re spread across longer timescales and noisier data, which makes them harder to find and easier to explain away.

The constraint filter shows up in any product where one feature or workflow carries a disproportionate share of usage. Before reading that as validation, it’s worth asking what alternatives existed at the moment of use. In B2B software especially, a workflow that handles 80% of activity often does so because it’s the only complete path through the system, not because users prefer it. The adoption number looks like success. It’s frequently covering up a design gap that nobody’s named yet.

Post-event behavioral sequences apply anywhere you have a terminal event worth understanding — subscription cancellation, checkout abandonment, task completion. What users do in the 90 seconds after they finish something, or fail at something, tends to be more diagnostic than the event itself. A user who cancels and immediately re-subscribes is telling you something different than one who cancels and goes quiet. The behavior after the fireball, after the death, after the checkout — that’s where the real signal is.

The analytical question you started with is rarely the most important one your data can answer. Building the schema to support follow-up questions — and reviewing the data for patterns you didn’t anticipate — is where the product intelligence actually lives.

The game was talking. Getting it to say something useful required building the infrastructure to ask better questions.

What are you building that you need to ask better questions about? The product intelligence you need is probably already in your data. You just have to know how to listen for it.