UX Metrics Frameworks: HEART, PULSE, GSM & CASTLE

Key takeaways

HEART (Happiness, Engagement, Adoption, Retention, Task Success) is the most widely adopted UX framework; use GSM to translate HEART categories into specific, instrumented metrics.
PULSE covers product health (performance, reliability, scale) and pairs with HEART — PULSE at operational altitude, HEART at UX quality altitude.
CASTLE adds Completion, Learnability, and Errors as first-class categories, making it the right choice for enterprise products with multi-role workflows and formal onboarding.
Behavioral data (task success, error rates, funnel analytics) is more reliable evidence than attitudinal surveys alone — triangulate both for defensible conclusions.
Define GSM signal layers before checking what is already instrumented, so you measure what matters rather than what happens to be easy to log.

The full lesson

Shipping a redesign without measuring its impact is a guess dressed up as a decision. The four frameworks in this lesson — HEART (Google), PULSE (Microsoft), GSM (Goals-Signals-Metrics), and CASTLE (an emerging enterprise model) — give you structured vocabularies for turning fuzzy design goals into trackable signals. That makes conversations with product managers, engineers, and executives far more concrete.

Knowing when to use each framework, and how to combine them, is the difference between saying “we improved the experience” and saying “task success rose from 61% to 84%, reducing support contacts by 22%.”

Why Frameworks Beat Ad-Hoc Metrics

Most teams default to metrics that are already set up: page views, daily active users (DAU), NPS. These aren’t useless. But they’re incomplete as UX evidence because they mix up usage with satisfaction, and engagement with real value.

The deeper problem is the say/do gap: attitudinal surveys measure what users say, not what they do. A user can answer 8 out of 10 on your NPS recommend question and still abandon checkout every time the address form throws an error. Behavioral data (task success rates, time-on-task, error rates) is far more reliable evidence of what actually happens. Good metrics frameworks push you to instrument both layers and compare them.

Frameworks also create alignment artifacts. When a designer, PM, and data analyst all sign off on the same GSM tree, you’ve ended the “we measure success differently” argument before it starts.

The HEART Framework

Google’s HEART framework was introduced by Kerry Rodden and colleagues in a 2010 CHI paper. It gives product teams five categories that together describe the full user experience.

Category	What it captures	Example metric
Happiness	Subjective satisfaction, perceived ease	UMUX-Lite score, CSAT
Engagement	Depth and frequency of use	Sessions per user per week, features used
Adoption	New users reaching a key milestone	% of new users completing onboarding in 7 days
Retention	Users returning over time	30-day retention, churn rate
Task Success	Objective usability — completion rate, time, errors	Task completion rate, time-on-task

A few nuances practitioners often miss:

Not all five categories apply to every project. A search-results redesign might focus on Task Success and Happiness. A notification-system redesign might focus on Engagement and Retention. Pick the categories that match the problem.
Engagement is a double-edged signal. High engagement can mean users love the product — or that it’s confusing and they’re spending extra time recovering from errors. Always pair Engagement metrics with Task Success to understand the direction.
Happiness metrics must come from validated instruments. Home-grown satisfaction questions have no psychometric baseline, so comparisons over time become unreliable. Use UMUX-Lite (2 items, Likert-7) or SUS for standardized benchmarking.
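Standardized scoring is what makes these instruments comparable across releases. A minimal Python sketch of both scoring rules, assuming responses arrive as integers in questionnaire order (the function names are illustrative, not from any library). It returns the raw UMUX-Lite score; published regression adjustments that map UMUX-Lite onto the SUS scale are not applied here.

```python
from statistics import mean


def umux_lite(item1: int, item2: int) -> float:
    """Raw UMUX-Lite score (0-100) from two 7-point agreement items."""
    for item in (item1, item2):
        if not 1 <= item <= 7:
            raise ValueError("UMUX-Lite items must be between 1 and 7")
    # Each item contributes 0-6 points; the 12-point maximum maps to 100.
    return (item1 - 1 + item2 - 1) * 100 / 12


def sus(responses: list[int]) -> float:
    """SUS score (0-100) from ten 1-5 responses in questionnaire order."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses between 1 and 5")
    total = 0
    for index, r in enumerate(responses):
        # Items 1, 3, 5, 7, 9 (even indices) are positively worded;
        # items 2, 4, 6, 8, 10 are negatively worded and reverse-scored.
        total += (r - 1) if index % 2 == 0 else (5 - r)
    return total * 2.5


# Benchmark a hypothetical sample by averaging individual scores.
umux_scores = [umux_lite(6, 7), umux_lite(5, 5), umux_lite(7, 7)]
print(round(mean(umux_scores), 1))
```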

Goals-Signals-Metrics (GSM)

GSM is not a separate tool. It is the process for populating HEART (or any metrics framework) with meaningful numbers. The hierarchy has three levels:

Goal — What is the user trying to accomplish? What is the product trying to achieve? Write goals in plain language: “Users should be able to find the right pricing plan without contacting support.”
Signal — Observable user behavior that shows progress toward the goal. Signals can be positive (completed plan selection) or negative (rage-clicked the compare table). Define signals before asking what’s feasible to instrument.
Metric — The specific, quantified measurement of a signal. For example: “% of users who complete plan selection within a single session without contacting support.”
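To make the hierarchy concrete, here is a sketch of one GSM entry and the metric above computed from session records. The `Session` fields are hypothetical event flags; your analytics schema will differ. Counting a user as failing if they contacted support in any session is an assumption about how strictly to read the goal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GsmEntry:
    goal: str
    signal: str
    metric: str


@dataclass(frozen=True)
class Session:
    user_id: str
    selected_plan: bool      # hypothetical flag: plan selection completed in this session
    contacted_support: bool  # hypothetical flag: support chat or ticket opened in this session


pricing = GsmEntry(
    goal="Users can find the right pricing plan without contacting support",
    signal="Completed plan selection (positive); contacted support (negative)",
    metric="% of users who complete plan selection in a single session without contacting support",
)


def single_session_selection_rate(sessions: list[Session]) -> float:
    """Percentage of users who completed plan selection within one session
    and never contacted support in any session."""
    users = {s.user_id for s in sessions}
    if not users:
        return 0.0
    selected = {s.user_id for s in sessions if s.selected_plan}
    contacted = {s.user_id for s in sessions if s.contacted_support}
    return 100 * len(selected - contacted) / len(users)


sessions = [
    Session("u1", selected_plan=True, contacted_support=False),
    Session("u2", selected_plan=False, contacted_support=True),
    Session("u2", selected_plan=True, contacted_support=False),
    Session("u3", selected_plan=True, contacted_support=True),
]
print(f"{pricing.metric}: {single_session_selection_rate(sessions):.0f}%")
```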

Building a GSM Tree

Build the GSM tree collaboratively in a workshop with design, PM, engineering, and data. The output is a table, not a paragraph. Here is a condensed example for an enterprise SaaS dashboard:

Goal	Signal	Metric
Users find the right report quickly	Users reach the target report without backtracking	% of sessions reaching report with zero navigation backtrack
Dashboard feels trustworthy	Users don’t question data freshness	% of users who hover the last-updated timestamp (proxy for doubt)
Teams adopt sharing features	Users send a report link to a colleague	30-day share-to-viewer conversion rate

The signal step is where GSM earns its keep. Most teams jump straight from goal to metric and skip the behavioral reasoning. The signal layer forces the question: “What would we actually observe if users were succeeding or failing?” That surfaces blind spots in your instrumentation plan before you’re stuck logging the wrong events.
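The first row of the dashboard table can be computed from navigation paths. This sketch assumes each session is logged as an ordered list of page identifiers and treats any repeat visit before the target report as a backtrack; both the schema and that proxy are assumptions for the team to agree on. The denominator here is all sessions, not only those that reached the report.

```python
def reached_without_backtrack(path: list[str], target: str) -> bool:
    """True if `target` is reached and no page before it was visited twice.

    A repeat visit is a hypothetical proxy for backtracking, such as
    returning to a menu after a wrong turn."""
    if target not in path:
        return False
    before_target = path[: path.index(target)]
    return len(before_target) == len(set(before_target))


def zero_backtrack_rate(paths: list[list[str]], target: str) -> float:
    """% of all sessions that reached `target` with zero backtracking."""
    if not paths:
        return 0.0
    hits = sum(reached_without_backtrack(p, target) for p in paths)
    return 100 * hits / len(paths)


paths = [
    ["home", "reports", "q3-revenue"],                     # direct
    ["home", "reports", "home", "reports", "q3-revenue"],  # backtracked
    ["home", "settings"],                                  # never arrived
    ["home", "q3-revenue"],                                # direct
]
print(zero_backtrack_rate(paths, "q3-revenue"))  # 50.0
```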

The PULSE Framework

Microsoft’s PULSE framework pre-dates HEART. Its categories lean more toward product health and engineering observability.

Category	Meaning	Typical source
Page views	Volume of use and reach	Analytics
Uptime	Reliability and availability	SRE / monitoring
Latency	Perceived performance	RUM (Real User Monitoring)
Seven-day active users	Engagement breadth	Product analytics
Earnings	Business outcome proxy	Revenue data
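Seven-day active users is a rolling count of unique users, not a sum of daily actives. A minimal sketch, assuming usage arrives as (user ID, date) pairs and that the window includes the reporting day:

```python
from datetime import date, timedelta


def seven_day_active_users(events: list[tuple[str, date]], as_of: date) -> int:
    """Unique users with at least one event in the 7 days ending on `as_of`."""
    window_start = as_of - timedelta(days=6)  # 7 calendar days, inclusive
    return len({user for user, day in events if window_start <= day <= as_of})


events = [
    ("ana", date(2024, 5, 1)),
    ("ana", date(2024, 5, 6)),    # same user again: counted once
    ("ben", date(2024, 5, 7)),
    ("caro", date(2024, 4, 30)),  # outside the window ending 2024-05-07
]
print(seven_day_active_users(events, date(2024, 5, 7)))  # 2
```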

PULSE is primarily a product health dashboard, not a UX quality instrument. It tells you whether the product is working and being used. It does not tell you whether users are succeeding or satisfied. The classic mistake is treating PULSE metrics — especially page views and 7-day actives — as the main measure of success, equating engagement with value.

PULSE shines in joint OKR conversations with engineering and business. Designers often lack visibility into latency and uptime data, but both have direct UX consequences. A 200 ms vs. 2 s response time is a user-experience difference, not just an infrastructure difference. Including Latency in your PULSE dashboard gives you a hook for advocating performance budgets.
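A performance budget is easiest to defend when it is stated as a percentile of real-user latency rather than an average, which hides the slow tail. This sketch uses the nearest-rank percentile and a hypothetical 1,000 ms budget at the 75th percentile (the percentile Google’s Core Web Vitals assessment uses); the sample values are made up.

```python
import math


def percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile of RUM latency samples."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


BUDGET_P75_MS = 1000  # hypothetical budget agreed with engineering

samples = [180, 220, 240, 310, 450, 520, 900, 1400, 2100, 2600]
p75 = percentile(samples, 75)
print(p75, "ms", "within budget" if p75 <= BUDGET_P75_MS else "over budget")
```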

HEART + PULSE Together

Use them at different altitudes:

PULSE as the operational baseline — the product must be fast, reliable, and used.
HEART + GSM as the UX quality layer — within that working product, are users succeeding and satisfied?

If PULSE shows a Latency regression and your HEART Task Success data shows abandonment rising on the same flows over the same period, that multi-signal case is a compelling argument for a performance sprint.
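One way to assemble that case is to split task attempts by the latency they experienced and compare completion rates. This sketch assumes RUM latency can be joined to each task attempt (for example, through a shared session ID) and uses a hypothetical 1,000 ms threshold. A gap between the buckets is correlational; confirming cause still needs a controlled comparison, such as an A/B test of the faster path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskAttempt:
    latency_ms: float  # RUM latency joined to this attempt (assumed join)
    completed: bool    # HEART Task Success outcome for the same attempt


def completion_by_latency(
    attempts: list[TaskAttempt], threshold_ms: float = 1000
) -> dict[str, Optional[float]]:
    """Completion rate (%) for attempts at or under vs. over a latency threshold."""
    buckets: dict[str, list[bool]] = {"fast": [], "slow": []}
    for attempt in attempts:
        key = "fast" if attempt.latency_ms <= threshold_ms else "slow"
        buckets[key].append(attempt.completed)
    # None marks a bucket with no attempts, rather than a 0% rate.
    return {
        key: (100 * sum(outcomes) / len(outcomes)) if outcomes else None
        for key, outcomes in buckets.items()
    }


attempts = [
    TaskAttempt(300, True), TaskAttempt(450, True),
    TaskAttempt(800, False), TaskAttempt(900, True),
    TaskAttempt(1800, False), TaskAttempt(2500, False),
    TaskAttempt(2200, True), TaskAttempt(3000, False),
]
print(completion_by_latency(attempts))
```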

The CASTLE Framework

CASTLE is a more recent framework designed specifically for complex enterprise and B2B products, where the user, the buyer, and the business beneficiary are often different people.

Category	Focus	What it captures
Completion	Task and workflow completion	End-to-end process success, not just micro-task success
Adoption	Feature and platform uptake	% of licensed users actively using key capabilities
Satisfaction	Attitudinal quality	Validated survey scores (UMUX-Lite, CES)
Time efficiency	Effort and speed	Time-on-task, Customer Effort Score
Learnability	Onboarding and skill acquisition	Time-to-first-value, training hours required
Errors	Quality and reliability	Error rate, recovery rate, escalations to support

CASTLE acknowledges a reality HEART underweights: in enterprise software, adoption and learnability are first-class UX outcomes. A feature that power users love but 80% of licensed seat holders have never touched is a UX failure — even if its HEART scores are excellent.
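Seat adoption is simple arithmetic once usage is attributed to licensed users. A sketch, assuming you can export the set of licensed user IDs who used each feature during the measurement window (the window length and what counts as “use” are decisions for your team; the feature names are hypothetical):

```python
def seat_adoption(
    licensed_seats: int, active_users_by_feature: dict[str, set[str]]
) -> dict[str, float]:
    """% of licensed seats that used each feature in the measurement window."""
    if licensed_seats <= 0:
        raise ValueError("licensed_seats must be positive")
    return {
        feature: 100 * len(users) / licensed_seats
        for feature, users in active_users_by_feature.items()
    }


usage = {
    "bulk-edit": {"u01", "u02", "u03", "u04"},
    "report-builder": {f"u{n:02d}" for n in range(1, 21)},  # all 20 seats
}
print(seat_adoption(licensed_seats=20, active_users_by_feature=usage))
```

Here the bulk-edit panel reaches 20% of seats, which is exactly the pattern described above: 80% of licensed seat holders have never touched it.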

CASTLE in Practice

Enterprise UX teams typically run CASTLE measurement at three levels:

Product level — aggregate scores across the full platform
Workflow level — scores for specific end-to-end processes (for example, the invoice-approval workflow)
Feature level — scores for individual components (for example, the bulk-edit panel)

This hierarchy maps cleanly onto enterprise user research programs. Product-level CASTLE informs roadmap prioritization. Workflow-level findings drive redesign scopes. Feature-level data validates individual design decisions.

Define CASTLE metrics per workflow, not just per product. A single aggregate CASTLE score hides which workflows are dragging down the overall experience. Map Completion and Time Efficiency metrics to specific job-to-be-done flows so you know exactly where to invest.
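A per-workflow breakdown can start as a simple group-by. This sketch assumes each attempt is logged as a (workflow, completed, seconds) tuple, with hypothetical workflow names. It reports time-on-task as the median of successful attempts only, since failed attempts end at arbitrary points and task times are usually right-skewed.

```python
from collections import defaultdict
from statistics import median


def castle_by_workflow(attempts: list[tuple[str, bool, float]]) -> dict:
    """Completion % and median successful time-on-task for each workflow."""
    grouped: dict[str, list[tuple[bool, float]]] = defaultdict(list)
    for workflow, completed, seconds in attempts:
        grouped[workflow].append((completed, seconds))
    report = {}
    for workflow, rows in grouped.items():
        success_times = [secs for done, secs in rows if done]
        report[workflow] = {
            "completion_pct": 100 * len(success_times) / len(rows),
            "median_time_s": median(success_times) if success_times else None,
        }
    return report


attempts = [
    ("invoice-approval", True, 95), ("invoice-approval", False, 240),
    ("invoice-approval", True, 120), ("invoice-approval", False, 300),
    ("vendor-onboarding", True, 60), ("vendor-onboarding", True, 75),
]
for workflow, scores in castle_by_workflow(attempts).items():
    print(workflow, scores)
```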


Don’t apply CASTLE to consumer apps — it creates unnecessary overhead. CASTLE earns its complexity only when you have multi-role workflows, large licensed seat counts, formal onboarding programs, and an IT/admin buyer persona separate from the end user.

Choosing the Right Framework

Scenario	Recommended approach
Consumer app, single user type	HEART + GSM
Enterprise SaaS, complex workflows	CASTLE + GSM
Cross-functional product health review	PULSE as baseline, HEART for UX layer
Usability study on a specific feature	HEART Task Success + SUS/UMUX-Lite
OKR-setting with leadership	GSM tree to back up HEART categories with specific metrics

The most common mistake is treating these as competing methodologies. They are complementary lenses. PULSE tells you the engine is running. HEART tells you the driver is comfortable. CASTLE tells you the whole logistics fleet is moving cargo efficiently. GSM is the process you use to decide which gauge on the dashboard to add next.

Connecting Metrics to Outcomes

Frameworks only create value if they connect to decisions. Three practices close that loop.

1. Set directional baselines before a project starts. Without a pre-launch baseline, post-launch improvement is unverifiable. Even a two-week instrumentation sprint to capture current Task Success rates before a redesign gives you something to compare against.

2. Define a North Star metric per initiative, not per product. A single product can have multiple initiatives running at the same time, each with its own primary metric from the HEART or CASTLE vocabulary. Combining them into one product-level number makes it impossible to attribute cause.

3. Report outcome-tied metrics, not activity metrics. “We ran 12 usability tests” is activity. “Task success on the filter panel increased from 54% to 78% after redesign” is an outcome. Present the latter to leadership; keep the former in your team’s internal documentation.
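A baseline also lets you check whether a before/after difference is larger than sampling noise. This sketch runs a pooled two-proportion z-test on hypothetical counts chosen to match the 54% to 78% example (27 of 50 before, 39 of 50 after). The normal approximation is rough at small sample sizes; with only a handful of participants per round, an exact test is the safer choice.

```python
import math


def two_proportion_z_test(
    successes_before: int, n_before: int, successes_after: int, n_after: int
) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_before = successes_before / n_before
    p_after = successes_after / n_after
    pooled = (successes_before + successes_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p_after - p_before) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z_test(27, 50, 39, 50)
print(f"z = {z:.2f}, p = {p:.3f}")
```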

Mixed-Method Triangulation

No single metric tells the full story. Modern UX measurement triangulates across three data types:

Behavioral (what users do) — task completion rates, click paths, error logs, funnel drop-offs
Attitudinal (what users say) — UMUX-Lite, CES, open-ended interview themes
Physiological / biometric (what users show) — eye tracking, facial expression coding (less common; most valuable for high-stakes decisions)

HEART and CASTLE provide the vocabulary. GSM determines which signals to instrument. Mixed-method triangulation determines which data sources fill those signals. Relying on attitudinal surveys alone to predict behavior is the canonical say/do gap mistake: users often report more ease and satisfaction than their behavior supports.

For quantitative benchmarking studies (comparing your product to a competitor, or to your own baseline from a year ago), plan on at least 40 participants. Even at that size, a 95% confidence interval around a completion rate near 50% spans roughly ±15 percentage points, so sample size directly limits how small a difference you can detect. The “5-user rule” applies only to qualitative problem-finding studies. Applying it to quantitative benchmarking will produce wildly unreliable numbers.
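The adjusted-Wald (Agresti-Coull) interval, a common choice for small-sample completion rates, shows why. This sketch compares two hypothetical studies that both observe an 80% completion rate, one with 5 participants and one with 40:

```python
import math


def adjusted_wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Adjusted-Wald (Agresti-Coull) interval for a completion rate.

    z = 1.96 corresponds to roughly 95% confidence."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to the valid range of a proportion.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


for successes, n in [(4, 5), (32, 40)]:  # both observe an 80% completion rate
    low, high = adjusted_wald_interval(successes, n)
    print(f"n={n}: {low:.0%} to {high:.0%}")
```

With five participants the interval spans most of the scale (about 36% to 98%); with forty it narrows to roughly 65% to 90%, which is still wide enough that a small change between releases may go undetected.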