Methodology

Last updated May 2026 · Source on GitHub

The Consumer Compass is a 0-100 gauge of U.S. consumer financial health. It combines labor, income, balance-sheet, credit, spending, sentiment, inflation, and big-ticket affordability data into a single score while preserving the underlying sub-scores and indicator histories.

Methodology overhaul status

The production headline score currently uses the v1 state-based methodology described below: transformed indicators are converted into expanding-window percentile scores, smoothed, then aggregated. The v2 overhaul keeps that state score as the anchor but adds explicit short momentum and medium-trend layers before any headline-weight change is adopted.

The key improvement is not simply "more momentum." A consumer health score needs to be stable enough to describe the present, but responsive enough to catch turning points. The working v2 target is a tested state / momentum / trend mix, with the exact weights chosen by backtest rather than by intuition.

Current quality-control fixes focus on two v1 mechanics: stale indicators are carried forward only within a frequency-specific freshness window before the headline is computed, and the household net-worth / DPI ratio normalizes FRED Z.1 net worth into the same dollar scale as disposable income. These fixes reduce artificial month-to-month jumps without changing the broader v2 testing plan.

One remaining pattern to watch is the small step-like move that can appear when quarterly indicators refresh inside a monthly chart. That pattern reflects release cadence and mixed data frequency, not ordinary consumer seasonality. The v2 work should make those effects visible through staleness and attribution rather than letting them masquerade as a clean economic signal.

Scoring Principles

No look-ahead bias

A score at time t is computed only from data available through time t. Historical percentiles use expanding windows, not the full future dataset.
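
As an illustration of the expanding-window idea, here is a minimal Python sketch. The function name `expanding_percentile` and the tie convention (share of observations to date that are less than or equal to the current value, current value included) are assumptions for this example, not a description of the production code.

```python
from bisect import bisect_right, insort


def expanding_percentile(values):
    """Return a 0-100 percentile for each value using only data seen so far."""
    history = []  # sorted observations through the current date
    scores = []
    for value in values:
        insort(history, value)
        # Share of observations to date that are <= the current value.
        rank = bisect_right(history, value)
        scores.append(100.0 * rank / len(history))
    return scores


full = expanding_percentile([3.0, 1.0, 2.0, 4.0])
prefix = expanding_percentile([3.0, 1.0])
# Appending later data never changes earlier scores.
print(full[:2] == prefix)
```

Because each score depends only on the observations before and at that date, extending the series leaves every earlier score unchanged, which is the property a full-sample percentile would violate.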

Economic direction first

Each series is transformed into the version that best matches consumer stress or strength before scoring: growth rates for flow data, levels for ratios, and target proximity for inflation.

Transparent before clever

The score should be explainable from visible indicators, weights, transforms, and validation results. New features must earn their place through walk-forward testing.

Signal Architecture

The current score is primarily a state score. The methodology overhaul makes three useful questions explicit: what the level is, how fast it is changing, and whether the change is persistent.

Layer | Question | Production today | V2 upgrade
State | How healthy is the current reading versus its own history? | Expanding-window percentile score, direction-adjusted. | Keep as the anchor, with robust z-score diagnostics for severity.
Short momentum | Is the indicator improving or deteriorating over the last 1-3 releases? | Published today as headline 1m and 3m score deltas. | Add indicator-level momentum scores and roll them into the composite.
Medium trend | Is the signal persistently moving over a 6-12 month horizon? | Visible in history charts and sparklines, not directly scored. | Add trend slope / moving-average crossover scores after backtesting.
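
The momentum and trend layers can be sketched roughly as follows. This is an illustration only: `score_delta` and `trailing_slope` are hypothetical helper names, and a least-squares slope is just one of the trend measures under consideration, not an adopted v2 method.

```python
def score_delta(scores, lag):
    """Change in score versus `lag` periods earlier; None until enough history."""
    return [None if i < lag else scores[i] - scores[i - lag]
            for i in range(len(scores))]


def trailing_slope(scores, window):
    """Least-squares slope (score points per period) over the last `window` points.

    Requires window >= 2. Returns None until a full window is available.
    """
    x_mean = (window - 1) / 2
    denominator = sum((x - x_mean) ** 2 for x in range(window))
    slopes = []
    for i in range(len(scores)):
        if i + 1 < window:
            slopes.append(None)
            continue
        ys = scores[i + 1 - window : i + 1]
        y_mean = sum(ys) / window
        numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
        slopes.append(numerator / denominator)
    return slopes


monthly_scores = [50.0, 52.0, 54.0, 56.0, 58.0, 60.0, 62.0]  # toy values
one_month = score_delta(monthly_scores, 1)
three_month = score_delta(monthly_scores, 3)
six_month_trend = trailing_slope(monthly_scores, 6)
```

The 1-month and 3-month deltas correspond to the short-momentum row, and a 6-to-12-period slope corresponds to the medium-trend row.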

Current Production Formula

For each scored indicator, the pipeline applies an economic transform, ranks the transformed value against its own history through that date, flips the direction where lower values are healthier, smooths the result, and averages the available indicators inside each sub-score.

state_score(t) = percentile_rank(value(t), history[0:t])
indicator_score(t) = direction_adjusted(state_score(t))
smoothed_indicator_score(t) = trailing_average(indicator_score, window)(t)
sub_score(t) = average(smoothed_indicator_scores(t))
headline(t) = weighted_average(sub_scores(t))
  • Higher-is-better indicators keep their percentile rank.
  • Lower-is-better indicators are inverted, so higher unemployment, rising delinquencies, and higher rates reduce the score.
  • CPI indicators are scored by proximity to 2% rather than by lower-is-always-better logic.
  • Indicators are carried forward only inside a freshness window before monthly sub-scores and the headline are computed.
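
The direction flip and the sub-score average can be sketched as below. This is a minimal illustration, assuming the inversion is `100 - percentile` and that an unavailable indicator is represented by `None`; neither detail is confirmed by the production pipeline.

```python
def direction_adjusted(state_score, higher_is_better):
    """Keep the 0-100 percentile, or flip it for lower-is-better indicators."""
    return state_score if higher_is_better else 100.0 - state_score


def sub_score(smoothed_scores):
    """Equal-weight average of the indicators that currently have a score."""
    available = [s for s in smoothed_scores if s is not None]
    return sum(available) / len(available) if available else None


# Toy example: a high unemployment percentile becomes a low health score.
unemployment = direction_adjusted(80.0, higher_is_better=False)
payrolls = direction_adjusted(60.0, higher_is_better=True)
labor = sub_score([unemployment, payrolls, None])  # third indicator unavailable
```

Averaging only the available indicators mirrors the rule that indicators count equally within a sub-score unless a source is unavailable.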

Pre-Score Transforms

Indicator family | Transform before scoring
Nonfarm Payrolls (PAYEMS) | Month-over-month change, then 3-month rolling average
Real Hourly Earnings, DPI, Retail Sales, PCE Food Services | Year-over-year percent change
Real PCE momentum | 3-month annualized rate of change
Net Worth / DPI | Household net worth divided by annualized disposable personal income
CPI, Core CPI, Shelter CPI | Year-over-year inflation scored by distance from 2%
TSA Throughput | Daily throughput compared with 2019 same-period baseline
Rates, delinquencies, claims, sentiment, saving rate | Economically meaningful level
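
Several of these transforms are simple enough to sketch directly. The helpers below are illustrative assumptions rather than the production code: the function names are hypothetical, the 3-month rate is annualized by compounding (a simple multiple of four is another common convention), and the unit multipliers for the net-worth ratio are parameters that should be read from each series' metadata.

```python
def mom_change_3m_avg(levels):
    """Month-over-month change, then a 3-month rolling average of the changes."""
    changes = [b - a for a, b in zip(levels, levels[1:])]
    return [sum(changes[i - 2 : i + 1]) / 3 for i in range(2, len(changes))]


def yoy_pct(levels, periods_per_year=12):
    """Year-over-year percent change."""
    return [100.0 * (levels[i] / levels[i - periods_per_year] - 1.0)
            for i in range(periods_per_year, len(levels))]


def annualized_3m(levels):
    """3-month rate of change compounded to an annual rate, in percent."""
    return [100.0 * ((levels[i] / levels[i - 3]) ** 4 - 1.0)
            for i in range(3, len(levels))]


def distance_from_target(yoy_inflation, target=2.0):
    """Absolute gap between inflation and the 2% target, in percentage points."""
    return abs(yoy_inflation - target)


def net_worth_to_dpi(net_worth, dpi_annualized, net_worth_unit, dpi_unit):
    """Ratio computed after converting both inputs to plain dollars."""
    return (net_worth * net_worth_unit) / (dpi_annualized * dpi_unit)
```

Under this sketch, inflation of 3.5% and 0.5% are equally far from target, so both would receive the same distance before percentile scoring.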

Smoothing and Update Cadence

Smoothing is applied after scoring so extreme single releases do not dominate the composite. Monthly indicators use a 3-month trailing average, weekly indicators use a 4-week trailing average, daily indicators use a 30-day trailing average, and quarterly indicators are carried forward only long enough to remain relevant in the monthly score timeline.
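
A trailing average of this kind might look like the sketch below. The window lengths come from the paragraph above; the function name, the handling of missing values, and the use of a shorter window at the start of a series are assumptions made for illustration.

```python
# Window lengths by native frequency, in that frequency's own periods.
SMOOTHING_WINDOWS = {"monthly": 3, "weekly": 4, "daily": 30}


def trailing_average(scores, window):
    """Mean of the current and preceding scores, up to `window` points.

    Missing values (None) are skipped; early points use whatever history exists.
    """
    smoothed = []
    for i in range(len(scores)):
        chunk = [s for s in scores[max(0, i + 1 - window) : i + 1] if s is not None]
        smoothed.append(sum(chunk) / len(chunk) if chunk else None)
    return smoothed


# Toy monthly scores: the drop to 0 in the last month is damped, not ignored.
monthly = trailing_average([30.0, 60.0, 90.0, 0.0], SMOOTHING_WINDOWS["monthly"])
```

Because the average only looks backward, smoothing does not introduce look-ahead bias.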

A v2 staleness adjustment should make release timing visible: weekly claims should move the labor signal faster than quarterly delinquency data, and stale quarterly data should never masquerade as fresh evidence.
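
One way to carry a value forward only inside a freshness window, while keeping its age visible, is sketched here. The function name, the period-index representation, and the example window of three periods are hypothetical; the methodology does not publish the actual window lengths.

```python
def carry_forward(observations, n_periods, max_age):
    """Fill each period with the latest release if it is at most `max_age` old.

    observations: dict mapping period index -> released value
    Returns a list of (value, age) pairs; value is None once the data is stale,
    and both are None before the first release.
    """
    filled = []
    last_value, last_period = None, None
    for t in range(n_periods):
        if t in observations:
            last_value, last_period = observations[t], t
        if last_period is None:
            filled.append((None, None))  # nothing released yet
            continue
        age = t - last_period
        filled.append((last_value if age <= max_age else None, age))
    return filled


# Toy quarterly series on a monthly timeline, with releases in months 0 and 3.
timeline = carry_forward({0: 55.0, 3: 60.0}, n_periods=9, max_age=3)
```

Returning the age alongside the value lets a chart or attribution table show when a reading is a carried-forward quarterly number rather than fresh evidence.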

Sub-Score Weights

Sub-score | Headline weight | Signals
Labor & Income | 20% | UNRATE, payroll growth, initial claims, continued claims, real wages
Credit Stress | 20% | Bank delinquencies, charge-offs, NY Fed serious delinquency transitions, SLOOS tightening
Balance Sheet | 15% | Saving rate, real income growth, debt service ratio, net worth / DPI
Spending & Demand | 15% | Real PCE momentum, real retail sales, travel demand, food services
Sentiment | 10% | UMich sentiment, OECD confidence proxy, NY Fed missed-payment probability
Inflation | 10% | Headline CPI, core CPI, shelter CPI, gasoline prices
Big Ticket | 10% | Mortgage rates, auto loan rates, credit-card rates, housing affordability

Within each sub-score, production v1 weights indicators equally unless a source is unavailable. V2 will test modest leading/coincident/lagging tilts, but only if they improve validation.
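
The headline aggregation with the weights above can be sketched as a weighted average. The weights are taken from the table; renormalizing over the remaining sub-scores when one is missing is an assumption of this example, not a documented production rule.

```python
HEADLINE_WEIGHTS = {
    "Labor & Income": 0.20,
    "Credit Stress": 0.20,
    "Balance Sheet": 0.15,
    "Spending & Demand": 0.15,
    "Sentiment": 0.10,
    "Inflation": 0.10,
    "Big Ticket": 0.10,
}


def headline(sub_scores, weights=HEADLINE_WEIGHTS):
    """Weighted average of available sub-scores, with weights renormalized."""
    available = {name: s for name, s in sub_scores.items() if s is not None}
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        return None
    return sum(weights[name] * s for name, s in available.items()) / total_weight


# Toy example: every sub-score at 50 except a strong labor reading.
example = dict.fromkeys(HEADLINE_WEIGHTS, 50.0)
example["Labor & Income"] = 80.0
score = headline(example)
```

With a 20% weight, a 30-point improvement in one sub-score moves the headline by about 6 points, which shows how much any single sub-score can shift the total.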

Validation Standard

The score is useful only if it behaves like a disciplined signal rather than a nice-looking dashboard. Validation should be published alongside the methodology and rerun when transforms, weights, or indicator sets change.

Test | What it checks
Recession response | Score should deteriorate before, or no later than, the start of each NBER recession, without relying on future data.
Consumer outcome fit | Backtest against real PCE, retail sales, delinquencies, charge-offs, and employment weakness.
False-positive discipline | Track periods where the score warned but consumer data did not subsequently weaken.
Revision awareness | Prefer vintages where available and flag indicators with heavy benchmark revisions.
Staleness control | Do not let delayed quarterly indicators dominate a fresh monthly or weekly signal.
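
As one possible shape for the false-positive check, the sketch below labels each warning by what happened afterward. Everything here is hypothetical: the function name, the definition of a warning as a downward crossing of a threshold, the threshold and horizon values, and the toy inputs. Later outcomes are used only to grade the warning, never to compute the score.

```python
def classify_warnings(scores, weakened, threshold, horizon):
    """Label each downward threshold crossing as a hit or a false positive.

    scores:   headline scores by period
    weakened: booleans by period, True where consumer data was judged weak
    horizon:  number of following periods in which weakness counts as a hit
    """
    results = []
    for t in range(1, len(scores)):
        crossed = scores[t - 1] >= threshold and scores[t] < threshold
        if crossed:
            outcome_window = weakened[t + 1 : t + 1 + horizon]
            label = "hit" if any(outcome_window) else "false_positive"
            results.append((t, label))
    return results


toy_scores = [60.0, 45.0, 55.0, 40.0, 42.0]
toy_weakened = [False, False, False, False, True]
labels = classify_warnings(toy_scores, toy_weakened, threshold=50.0, horizon=2)
```

In this toy run, the first crossing is not followed by weakness within two periods and the second is, so the run yields one false positive and one hit.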

V2 Roadmap

  1. Publish state / momentum / trend views separately before changing the headline score.
  2. Backtest several candidate state / momentum / trend mixes: 60/25/15, 55/30/15, and Perplexity-style 40/40/20.
  3. Adopt the mix that improves turning-point detection without making the headline score too noisy.
  4. Add indicator-level release lag, revision risk, and leading/coincident/lagging metadata to the UI.
  5. Publish a reproducible validation report with weights, excluded series, and known failures.
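
To make the candidate mixes concrete, a blend of the three layers could be computed as below. This sketch assumes each layer has already been mapped to the same 0-100 scale and that the mix is ordered state / momentum / trend; both are assumptions, and none of these mixes has been adopted.

```python
# Candidate (state, momentum, trend) weights from the roadmap.
CANDIDATE_MIXES = [
    (0.60, 0.25, 0.15),
    (0.55, 0.30, 0.15),
    (0.40, 0.40, 0.20),
]


def blend(state, momentum, trend, mix):
    """Weighted combination of three layer scores on a common 0-100 scale."""
    w_state, w_momentum, w_trend = mix
    return w_state * state + w_momentum * momentum + w_trend * trend


# Toy reading: a healthy level with deteriorating momentum.
for mix in CANDIDATE_MIXES:
    print(mix, round(blend(60.0, 40.0, 50.0, mix), 2))
```

The heavier the momentum weight, the more a deteriorating reading pulls the blended score below the state score, which is the responsiveness-versus-noise trade-off the backtest is meant to settle.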

Data Sources and Legal Notes

  • FRED: Federal Reserve Bank of St. Louis, including several federal and private-source series redistributed through FRED.
  • BLS: Bureau of Labor Statistics labor data.
  • BEA: Bureau of Economic Analysis income and spending data.
  • NY Fed HHDC and SCE: Household credit and consumer-expectations data.
  • EIA: U.S. retail gasoline price data.
  • TSA: Daily throughput data used as a travel-demand proxy.
  • SEC EDGAR: Public company filings used for earnings-call quote context.
  • Manheim MUVVI and restricted vendor series: Used only where redistribution terms allow; otherwise linked or excluded from scoring.

The Consumer Compass is not investment advice. Scores are statistical computations, and the source code is MIT-licensed.