UX Metrics Frameworks: HEART, PULSE, GSM & CASTLE

Master four complementary UX measurement systems — HEART, PULSE, GSM, and CASTLE — and learn how to tie design decisions directly to outcomes that matter.

Shipping a redesign without measuring its impact is a guess dressed up as a decision. The four frameworks in this lesson give you structured vocabularies for turning fuzzy design goals into trackable signals: HEART (Google), PULSE (the product-health metrics HEART was designed to complement), GSM (Goals-Signals-Metrics), and CASTLE (an emerging enterprise model). That makes conversations with product managers, engineers, and executives far more concrete.

Knowing when to use each framework, and how to combine them, is the difference between saying “we improved the experience” and saying “task success rose from 61% to 84%, reducing support contacts by 22%.”

Why Frameworks Beat Ad-Hoc Metrics

Most teams default to metrics that are already set up: page views, daily active users (DAU), NPS. These aren’t useless. But they’re incomplete as UX evidence because they mix up usage with satisfaction, and engagement with real value.

The deeper problem is the say/do gap: attitudinal surveys measure what users say, not what they do. A user can answer 8 out of 10 on the likelihood-to-recommend question behind NPS and still abandon your checkout every time the address form errors. Behavioral data such as task success rates, time-on-task, and error rates is a far more reliable record of what actually happened. Good metrics frameworks push you to instrument both layers and compare them.

Frameworks also create alignment artifacts. When a designer, PM, and data analyst all sign off on the same GSM tree, you’ve ended the “we measure success differently” argument before it starts.

The HEART Framework

Google’s HEART framework was introduced by Kerry Rodden and colleagues in a 2010 CHI paper. It gives product teams five categories that together describe the full user experience.

| Category | What it captures | Example metric |
| --- | --- | --- |
| Happiness | Subjective satisfaction, perceived ease | UMUX-Lite score, CSAT |
| Engagement | Depth and frequency of use | Sessions per user per week, features used |
| Adoption | New users reaching a key milestone | % of new users completing onboarding in 7 days |
| Retention | Users returning over time | 30-day retention, churn rate |
| Task Success | Objective usability: completion rate, time, errors | Task completion rate, time-on-task |

A few nuances practitioners often miss:

  • Not all five categories apply to every project. A search-results redesign might focus on Task Success and Happiness. A notification-system redesign might focus on Engagement and Retention. Pick the categories that match the problem.
  • Engagement is a double-edged signal. High engagement can mean users love the product — or that it’s confusing and they’re spending extra time recovering from errors. Always pair Engagement metrics with Task Success to understand the direction.
  • Happiness metrics should come from validated instruments. Home-grown satisfaction questions have no psychometric baseline, so comparisons over time become unreliable. Use UMUX-Lite (2 items on a 7-point agreement scale) or SUS (10 items) for standardized benchmarking.
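
Scoring UMUX-Lite is simple arithmetic. The sketch below assumes the common scoring convention in which each of the two items is rated from 1 to 7, shifted to start at zero, summed, and rescaled to a 0 to 100 range. The function and parameter names are illustrative, not part of any standard library.

```python
def umux_lite_score(capabilities: int, ease: int) -> float:
    """Return a 0-100 UMUX-Lite score from two 7-point item ratings.

    `capabilities` and `ease` are hypothetical names for the two items
    (meets my requirements / easy to use), each rated 1-7.
    """
    for rating in (capabilities, ease):
        if not 1 <= rating <= 7:
            raise ValueError("UMUX-Lite items are rated from 1 to 7")
    # Shift each item to 0-6, sum to a 0-12 range, then rescale to 0-100.
    return (capabilities + ease - 2) / 12 * 100


def mean_umux_lite(responses):
    """Average the per-respondent scores for a list of (item1, item2) pairs."""
    scores = [umux_lite_score(c, e) for c, e in responses]
    return sum(scores) / len(scores)


# Usage: one respondent answering 6 and 5 scores 75.0.
print(umux_lite_score(6, 5))
```

Scoring each respondent first and then averaging keeps the per-person scores available for confidence intervals and segment comparisons.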

Goals-Signals-Metrics (GSM)

GSM is less a separate framework than the process for populating HEART (or any other metrics framework) with meaningful numbers. The hierarchy has three levels:

  1. Goal: What is the user trying to accomplish? What is the product trying to achieve? Write goals in plain language: “Users should be able to find the right pricing plan without contacting support.”
  2. Signal: Observable user behavior that shows progress toward the goal. Signals can be positive (completed plan selection) or negative (rage-clicked the compare table). Define signals before asking what’s feasible to instrument.
  3. Metric: The specific, quantified measurement of a signal. For example: “% of users who complete plan selection within a single session without contacting support.”
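
To make the metric level concrete, here is a minimal sketch of how the plan-selection example might be computed. It assumes a hypothetical `Session` record with one row per session; the field names are invented for illustration, and real analytics exports will look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    # Hypothetical session record; all field names are assumptions.
    user_id: str
    completed_plan_selection: bool
    contacted_support: bool


def plan_selection_success_rate(sessions):
    """Percent of users with at least one session that completes plan
    selection and contains no support contact."""
    users = {s.user_id for s in sessions}
    if not users:
        return 0.0
    successful = {
        s.user_id
        for s in sessions
        if s.completed_plan_selection and not s.contacted_support
    }
    return 100 * len(successful) / len(users)
```

Note the choice the code forces: a user who contacted support in one session and later completed selection in a clean session counts as a success here. Whether that matches the goal is exactly the kind of question the signal step should settle before instrumentation begins.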

Building a GSM Tree

Build the GSM tree collaboratively in a workshop with design, PM, engineering, and data. The output is a table, not a paragraph. Here is a condensed example for an enterprise SaaS dashboard:

| Goal | Signal | Metric |
| --- | --- | --- |
| Users find the right report quickly | Users reach the target report without backtracking | % of sessions reaching report with zero navigation backtrack |
| Dashboard feels trustworthy | Users don’t question data freshness | % of users who hover the last-updated timestamp (proxy for doubt) |
| Teams adopt sharing features | Users send a report link to a colleague | 30-day share-to-viewer conversion rate |

GSM enforces a discipline in the signal step. Most teams jump straight from goal to metric and skip the behavioral reasoning. The signal layer forces the question: “What would we actually observe if users were succeeding or failing?” That surfaces blind spots in your instrumentation plan before you’re stuck logging the wrong events.
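
One way to keep the signal step from being skipped is to store the tree as data and check it for gaps. The sketch below is a minimal, hypothetical representation; the `Goal`, `Signal`, and `find_gaps` names are assumptions for illustration, not part of GSM itself. It flags goals with no signal and signals with no metric.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Signal:
    description: str
    metrics: list[str] = field(default_factory=list)


@dataclass
class Goal:
    description: str
    signals: list[Signal] = field(default_factory=list)


def find_gaps(goals):
    """Return human-readable notes for incomplete branches of the tree."""
    gaps = []
    for goal in goals:
        if not goal.signals:
            gaps.append(f"Goal has no signal: {goal.description}")
        for signal in goal.signals:
            if not signal.metrics:
                gaps.append(f"Signal has no metric: {signal.description}")
    return gaps


tree = [
    Goal(
        "Users find the right report quickly",
        [Signal(
            "Users reach the target report without backtracking",
            ["% of sessions reaching report with zero navigation backtrack"],
        )],
    ),
    # A goal the workshop has not finished: no observable behavior yet.
    Goal("Dashboard feels trustworthy"),
]
print(find_gaps(tree))
```

Because a metric can only be attached to a signal in this structure, the goal-to-metric shortcut is not expressible at all.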

The PULSE Framework

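PULSE was described in the same 2010 paper that introduced HEART, as shorthand for the large-scale metrics many organizations already tracked. Its categories lean toward product health and engineering observability rather than user experience.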

| Category | Meaning | Typical source |
| --- | --- | --- |
| Page views | Volume of use and reach | Analytics |
| Uptime | Reliability and availability | SRE / monitoring |
| Latency | Perceived performance | RUM (Real User Monitoring) |
| Seven-day active users | Engagement breadth | Product analytics |
| Earnings | Business outcome proxy | Revenue data |

PULSE is primarily a product health dashboard, not a UX quality instrument. It tells you whether the product is working and being used. It does not tell you whether users are succeeding or satisfied. The classic mistake is treating PULSE metrics — especially page views and 7-day actives — as the main measure of success, equating engagement with value.
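
Seven-day active users is easy to state and easy to get subtly wrong at the window boundaries. As a minimal sketch, assuming an event log of `(user_id, date)` pairs (a hypothetical shape), the count is the number of distinct users with at least one event in the seven days ending on the reporting date.

```python
from datetime import date, timedelta


def seven_day_active_users(events, as_of):
    """Count distinct users with an event in the 7 days ending on `as_of`.

    `events` is an iterable of (user_id, date) pairs; this shape is an
    assumption made for the example.
    """
    # Inclusive window: as_of and the six days before it.
    window_start = as_of - timedelta(days=6)
    return len({user for user, day in events if window_start <= day <= as_of})


events = [
    ("a", date(2024, 5, 1)),
    ("a", date(2024, 5, 3)),  # repeat visits count once
    ("b", date(2024, 5, 7)),
    ("c", date(2024, 4, 30)),  # one day outside the window ending May 7
]
print(seven_day_active_users(events, date(2024, 5, 7)))
```

The number says nothing about what those users accomplished, which is why it belongs on the health dashboard and not in the success criteria.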

PULSE shines in joint OKR conversations with engineering and business. Designers often lack visibility into latency and uptime data, but both have direct UX consequences. A 200 ms vs. 2 s response time is a user-experience difference, not just an infrastructure difference. Including Latency in your PULSE dashboard gives you a hook for advocating performance budgets.

HEART + PULSE Together

Use them at different altitudes:

  • PULSE as the operational baseline — the product must be fast, reliable, and used.
  • HEART + GSM as the UX quality layer — within that working product, are users succeeding and satisfied?

If a regression in PULSE Latency shows up and your HEART Task Success data confirms it is causing abandonment, that multi-signal case is a compelling argument for a performance sprint.
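
A first pass at that multi-signal case can be as simple as splitting sessions by latency and comparing task completion in each group. The sketch below assumes hypothetical `(latency_ms, completed)` session pairs and an arbitrary 1000 ms threshold. Grouping like this shows association only, so treat it as a prompt for deeper analysis, not as proof of cause.

```python
def completion_rate_by_latency(sessions, threshold_ms=1000):
    """Compare task completion for fast and slow sessions.

    `sessions` is an iterable of (latency_ms, completed) pairs; the shape
    and the default threshold are assumptions for illustration.
    Returns a dict of rates, with None for an empty bucket.
    """
    buckets = {"fast": [0, 0], "slow": [0, 0]}  # [completed, total]
    for latency_ms, completed in sessions:
        key = "fast" if latency_ms <= threshold_ms else "slow"
        buckets[key][0] += int(completed)
        buckets[key][1] += 1
    return {
        key: (done / total if total else None)
        for key, (done, total) in buckets.items()
    }


sample = [
    (200, True), (300, True), (400, False),
    (1800, False), (2000, False), (2500, True), (3000, False),
]
print(completion_rate_by_latency(sample))
```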

The CASTLE Framework

CASTLE is a more recent framework designed specifically for complex enterprise and B2B products, where the user, the buyer, and the business beneficiary are often different people.

| Category | What it captures | Example measures |
| --- | --- | --- |
| Completion | Task and workflow completion | End-to-end process success, not just micro-task success |
| Adoption | Feature and platform uptake | % of licensed users actively using key capabilities |
| Satisfaction | Attitudinal quality | Validated survey scores (UMUX-Lite, CES) |
| Time efficiency | Effort and speed | Time-on-task, Customer Effort Score |
| Learnability | Onboarding and skill acquisition | Time-to-first-value, training hours required |
| Errors | Quality and reliability | Error rate, recovery rate, escalations to support |

CASTLE acknowledges a reality HEART underweights: in enterprise software, adoption and learnability are first-class UX outcomes. A feature that power users love but 80% of licensed seat holders have never touched is a UX failure — even if its HEART scores are excellent.

CASTLE in Practice

Enterprise UX teams typically run CASTLE measurement at three levels:

  • Product level — aggregate scores across the full platform
  • Workflow level — scores for specific end-to-end processes (for example, the invoice-approval workflow)
  • Feature level — scores for individual components (for example, the bulk-edit panel)

This hierarchy maps cleanly onto enterprise user research programs. Product-level CASTLE informs roadmap prioritization. Workflow-level findings drive redesign scopes. Feature-level data validates individual design decisions.

Do

Define CASTLE metrics per workflow, not just per product. A single aggregate CASTLE score hides which workflows are dragging down the overall experience. Map Completion and Time Efficiency metrics to specific job-to-be-done flows so you know exactly where to invest.

Don't

Don’t apply CASTLE to consumer apps — it creates unnecessary overhead. CASTLE earns its complexity only when you have multi-role workflows, large licensed seat counts, formal onboarding programs, and an IT/admin buyer persona separate from the end user.
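
The case for per-workflow measurement is easy to show with numbers. In this minimal sketch, the attempt records and the `vendor-onboarding` workflow name are hypothetical. A respectable-looking aggregate Completion rate hides one workflow that is failing most of the time.

```python
from collections import defaultdict


def completion_by_workflow(attempts):
    """Completion rate per workflow from (workflow, completed) pairs."""
    counts = defaultdict(lambda: [0, 0])  # [completed, total]
    for workflow, completed in attempts:
        counts[workflow][0] += int(completed)
        counts[workflow][1] += 1
    return {w: done / total for w, (done, total) in counts.items()}


def overall_completion(attempts):
    """Single aggregate completion rate across every workflow."""
    return sum(1 for _, completed in attempts if completed) / len(attempts)


# Hypothetical data: 9 of 10 invoice approvals succeed, 2 of 10 onboardings do.
attempts = (
    [("invoice-approval", True)] * 9 + [("invoice-approval", False)]
    + [("vendor-onboarding", True)] * 2 + [("vendor-onboarding", False)] * 8
)
print(overall_completion(attempts))
print(completion_by_workflow(attempts))
```

The same grouping applies to Time Efficiency or Errors: compute per workflow first, and roll up to a product-level figure only for reporting.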

Choosing the Right Framework

ScenarioRecommended approach
Consumer app, single user typeHEART + GSM
Enterprise SaaS, complex workflowsCASTLE + GSM
Cross-functional product health reviewPULSE as baseline, HEART for UX layer
Usability study on a specific featureHEART Task Success + SUS/UMUX-Lite
OKR-setting with leadershipGSM tree to back up HEART categories with specific metrics

The most common mistake is treating these as competing methodologies. They are complementary lenses. PULSE tells you the engine is running. HEART tells you the driver is comfortable. CASTLE tells you the whole logistics fleet is moving cargo efficiently. GSM is the process you use to decide which gauge on the dashboard to add next.

Connecting Metrics to Outcomes

Frameworks only create value if they connect to decisions. Three practices close that loop.

1. Set directional baselines before a project starts. Without a pre-launch baseline, post-launch improvement is unverifiable. Even a two-week instrumentation sprint to capture current Task Success rates before a redesign gives you something to compare against.

2. Define a North Star metric per initiative, not per product. A single product can have multiple initiatives running at the same time, each with its own primary metric from the HEART or CASTLE vocabulary. Combining them into one product-level number makes it impossible to attribute cause.

3. Report outcome-tied metrics, not activity metrics. “We ran 12 usability tests” is activity. “Task success on the filter panel increased from 54% to 78% after redesign” is an outcome. Present the latter to leadership; keep the former in your team’s internal documentation.
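
A baseline only helps if the before-and-after comparison accounts for sample size. The sketch below puts a confidence interval around a change in task success, using a plain (Wald) interval for the difference between two independent proportions. The counts are hypothetical, chosen to match the 54% and 78% figures above with 50 participants per round. Other interval methods exist and may suit small samples better.

```python
from math import sqrt
from statistics import NormalDist


def compare_task_success(base_success, base_n, post_success, post_n,
                         confidence=0.95):
    """Return (difference, (low, high)) for post-launch minus baseline."""
    p1 = base_success / base_n
    p2 = post_success / post_n
    diff = p2 - p1
    # Unpooled standard error for the difference of two proportions.
    se = sqrt(p1 * (1 - p1) / base_n + p2 * (1 - p2) / post_n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical counts: 27 of 50 succeeded before, 39 of 50 after.
diff, (low, high) = compare_task_success(27, 50, 39, 50)
print(f"change: {diff:+.0%}, 95% CI: {low:+.0%} to {high:+.0%}")
```

If the interval excludes zero, the improvement is unlikely to be sampling noise. Reporting the interval alongside the headline change also shows leadership how precise the estimate is.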

Mixed-Method Triangulation

No single metric tells the full story. Modern UX measurement triangulates across three data types:

  • Behavioral (what users do) — task completion rates, click paths, error logs, funnel drop-offs
  • Attitudinal (what users say) — UMUX-Lite, CES, open-ended interview themes
  • Physiological / biometric (what users show) — eye tracking, facial expression coding (less common; most valuable for high-stakes decisions)

HEART and CASTLE provide the vocabulary. GSM determines which signals to instrument. Mixed-method triangulation determines which data sources fill those signals. Relying on attitudinal surveys alone to predict behavior is the canonical say/do gap mistake: users often report higher ease and satisfaction than their behavior supports.

For quantitative benchmarking studies (comparing your product to a competitor, or to your own baseline from a year ago), a common guideline is at least 40 participants. At 95% confidence, that sample gives a margin of error of roughly ±15 percentage points on a binary measure such as task completion; tighter margins need more participants. The “5-user rule” applies only to qualitative problem-finding studies. Applying it to quantitative benchmarking will produce wildly unreliable numbers.
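
The relationship between sample size and precision can be checked directly. This is a minimal sketch using the normal-approximation margin of error for a proportion, with the worst case p = 0.5 as the default. The function names are illustrative, and small-sample studies may call for an adjusted interval method instead.

```python
from math import ceil, sqrt
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


def participants_needed(margin, p=0.5, confidence=0.95):
    """Smallest n whose margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


for n in (5, 40, 100):
    print(n, f"{margin_of_error(n):.1%}")
```

With 5 participants the worst-case margin exceeds 40 percentage points, which is why small qualitative samples cannot support benchmark claims.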