Methodology

Last updated May 2026 · Source on GitHub

The Consumer Compass is a 0-100 gauge of U.S. consumer financial health. It combines labor, income, balance-sheet, credit, spending, sentiment, inflation, and big-ticket affordability data into a single score while preserving the underlying sub-scores and indicator histories.

Methodology overhaul status

The production headline score currently uses the v1 state-based methodology described below: transformed indicators are converted into expanding-window percentile scores, smoothed, then aggregated. The v2 overhaul keeps that state score as the anchor but adds explicit short momentum and medium-trend layers before any headline-weight change is adopted.

The key improvement is not simply "more momentum." A consumer health score needs to be stable enough to describe the present, but responsive enough to catch turning points. The working v2 target is a tested state / momentum / trend mix, with the exact weights chosen by backtest rather than by intuition.

Current quality-control fixes focus on two v1 mechanics. First, stale indicators are carried forward only within a frequency-specific freshness window before the headline is computed. Second, the household net-worth / DPI ratio converts FRED Z.1 net worth to the same dollar scale as disposable income before dividing. These fixes reduce artificial month-to-month jumps without changing the broader v2 testing plan.
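The unit fix for the net-worth / DPI ratio can be illustrated with a minimal sketch. The function name, the unit labels, and the sample figures are hypothetical; the actual units of each FRED series should be read from its metadata rather than assumed.

```python
# Hypothetical unit table: multipliers that convert a reported value to billions of dollars.
UNIT_TO_BILLIONS = {"millions": 1e-3, "billions": 1.0}


def net_worth_to_dpi(net_worth, net_worth_unit, dpi, dpi_unit):
    """Ratio of net worth to disposable income after putting both on one dollar scale."""
    nw_billions = net_worth * UNIT_TO_BILLIONS[net_worth_unit]
    dpi_billions = dpi * UNIT_TO_BILLIONS[dpi_unit]
    return nw_billions / dpi_billions


# Illustrative (made-up) figures: net worth reported in millions, DPI in billions.
ratio = net_worth_to_dpi(150_000_000, "millions", 20_000, "billions")
naive = 150_000_000 / 20_000  # dividing raw values mixes scales and inflates the ratio 1000x
```

Dividing the raw values without conversion gives a ratio that is off by the scale factor, which is the kind of artificial jump the fix is meant to remove.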

One remaining pattern to watch is the small step-like move that can appear when quarterly indicators refresh inside a monthly chart. That pattern reflects release cadence and mixed data frequency, not ordinary consumer seasonality. The v2 work should make those effects visible through staleness and attribution rather than letting them masquerade as a clean economic signal.

The debt stack now explicitly separates the amount owed from the payment burden: FRED Z.1 household debt / DPI captures broad leverage, while the Fed household debt-service ratio captures required payments. NY Fed student-loan serious-delinquency transitions add a borrower-stress signal that aggregate debt-service data can miss.

The range-bound look over the last several years is also a v1 limitation: an expanding-percentile state score can stay flat when several components remain in middling historical percentiles, even if the lived consumer story feels like gradual erosion. That is exactly why the v2 work separates level, momentum, and medium-trend signals instead of relying on the state score alone.

Scoring Principles

No look-ahead bias

A score at time t is computed only from data available through time t. Historical percentiles use expanding windows, not the full future dataset.
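A minimal sketch of an expanding-window percentile follows. The tie handling (share of observations at or below the current value) and the function name are assumptions, not the confirmed production definition.

```python
from bisect import bisect_right, insort


def expanding_percentile(values):
    """Percentile rank (0-100) of each value against history up to and including it."""
    seen = []    # sorted history through the current date only
    scores = []
    for v in values:
        insort(seen, v)
        # Share of observations so far that are at or below the current value.
        scores.append(100.0 * bisect_right(seen, v) / len(seen))
    return scores


history = [3.0, 1.0, 2.0]
scores = expanding_percentile(history)
# Appending a later observation must not change any earlier score.
scores_with_future = expanding_percentile(history + [10.0])
```

Because each score only sees the values before it and itself, extending the series leaves earlier scores unchanged, which is the no-look-ahead property in practice.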

Economic direction first

Each series is transformed into the version that best matches consumer stress or strength before scoring: growth rates for flow data, levels for ratios, and target proximity for inflation.

Transparent before clever

The score should be explainable from visible indicators, weights, transforms, and validation results. New features must earn their place through walk-forward testing.

Signal Architecture

The current score is primarily a state score. The methodology overhaul makes three useful questions explicit: what is the level, how fast is it changing, and is the change persistent?

Layer	Question	Production today	V2 upgrade
State	How healthy is the current reading versus its own history?	Expanding-window percentile score, direction-adjusted.	Keep as the anchor, with robust z-score diagnostics for severity.
Short momentum	Is the indicator improving or deteriorating over the last 1-3 releases?	Published today as headline 1m and 3m score deltas.	Add indicator-level momentum scores and roll them into the composite.
Medium trend	Is the signal persistently moving over a 6-12 month horizon?	Visible in history charts and sparklines, not directly scored.	Add trend slope / moving-average crossover scores after backtesting.
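The three layers in the table can be sketched as small functions over a score history. This is a sketch under stated assumptions: the 6- and 12-period windows, the 1.4826 MAD scaling constant, and all function names are illustrative choices, not settled v2 parameters.

```python
from statistics import median


def score_delta(scores, lag):
    """Change in score versus `lag` periods earlier; None until enough history exists."""
    return [None if i < lag else scores[i] - scores[i - lag] for i in range(len(scores))]


def ma_crossover(scores, short=6, long=12):
    """Short trailing mean minus long trailing mean; positive suggests an improving trend."""
    out = []
    for i in range(len(scores)):
        if i + 1 < long:
            out.append(None)  # not enough history for the long window yet
            continue
        short_ma = sum(scores[i + 1 - short:i + 1]) / short
        long_ma = sum(scores[i + 1 - long:i + 1]) / long
        out.append(short_ma - long_ma)
    return out


def robust_z(history):
    """Robust z-score of the latest value, using median and MAD of history through that date."""
    med = median(history)
    mad = median(abs(x - med) for x in history)
    if mad == 0:
        return 0.0  # flat history: no meaningful dispersion to scale by
    return (history[-1] - med) / (1.4826 * mad)
```

For example, `score_delta(headline_history, 1)` and `score_delta(headline_history, 3)` correspond to the 1m and 3m deltas already published, while the crossover and robust z-score are candidates for the v2 columns.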

Current Production Formula

For each scored indicator, the pipeline applies an economic transform, ranks the transformed value against its own history through that date, flips the direction where lower values are healthier, smooths the result, and averages the available indicators inside each sub-score.

state_score(t) = percentile_rank(value(t), history[0:t])

indicator_score(t) = direction_adjusted(state_score(t))

smoothed_indicator_score(t) = trailing_average(indicator_score, window ending at t)

sub_score(t) = average(available smoothed_indicator_scores(t))

headline(t) = weighted_average(sub_scores(t))

Higher-is-better indicators keep their percentile rank.
Lower-is-better indicators are inverted, so higher unemployment, rising delinquencies, and higher rates reduce the score.
CPI indicators are scored by proximity to 2% rather than by lower-is-always-better logic.
Indicators are carried forward only inside a freshness window before monthly sub-scores and the headline are computed.
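The direction rules above can be sketched as follows. Function names are hypothetical, and using `None` to mark an unavailable indicator is an assumption made for illustration.

```python
def direction_adjust(percentile, higher_is_better):
    """Keep the percentile for higher-is-better series; invert it otherwise."""
    return percentile if higher_is_better else 100.0 - percentile


def target_distance(inflation_yoy, target=2.0):
    """Absolute distance from the inflation target, in percentage points.

    The distance is then scored as a lower-is-better series, so 0.5% and 3.5%
    inflation are treated as equally far from a 2% target.
    """
    return abs(inflation_yoy - target)


def sub_score(indicator_scores):
    """Equal-weight average of the indicators that currently have a score."""
    available = [s for s in indicator_scores.values() if s is not None]
    return sum(available) / len(available) if available else None


# Illustrative values only.
labor = sub_score({
    "payroll_growth": direction_adjust(60.0, higher_is_better=True),
    "unemployment_rate": direction_adjust(60.0, higher_is_better=False),
    "initial_claims": None,  # outside its freshness window, so excluded
})
```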

Pre-Score Transforms

Indicator family	Transform before scoring
Nonfarm Payrolls (PAYEMS)	Month-over-month change, then 3-month rolling average
Real Hourly Earnings, DPI, Retail Sales, PCE Food Services	Year-over-year percent change
Real PCE momentum	3-month annualized rate of change
Debt / DPI and Net Worth / DPI	Household debt and net worth compared with disposable personal income
CPI, Core CPI, Shelter CPI	Year-over-year inflation scored by distance from 2%
TSA Throughput	Daily throughput compared with the same-period 2019 baseline
Rates, delinquencies, claims, sentiment, saving rate	Economically meaningful level
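A few of the transforms in the table can be sketched with plain lists. The compound annualization formula and the function names are assumptions; the production pipeline may annualize differently.

```python
def mom_change_3m_avg(levels):
    """Month-over-month change, then a 3-month trailing average (payroll-style)."""
    changes = [None] + [levels[i] - levels[i - 1] for i in range(1, len(levels))]
    # Three consecutive changes are first available at index 3.
    return [None if i < 3 else sum(changes[i - 2:i + 1]) / 3 for i in range(len(levels))]


def yoy_pct(levels, periods=12):
    """Year-over-year percent change for a series with `periods` observations per year."""
    return [
        None if i < periods else 100.0 * (levels[i] / levels[i - periods] - 1.0)
        for i in range(len(levels))
    ]


def annualized_3m(levels):
    """3-month rate of change, compounded to an annual rate (assumed formula)."""
    return [
        None if i < 3 else 100.0 * ((levels[i] / levels[i - 3]) ** 4 - 1.0)
        for i in range(len(levels))
    ]
```

Each function returns `None` until enough history exists, so early observations are never scored on a partial transform.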

Smoothing and Update Cadence

Smoothing is applied after scoring so extreme single releases do not dominate the composite. Monthly indicators use a 3-month trailing average, weekly indicators use a 4-week trailing average, and daily indicators use a 30-day trailing average. Quarterly indicators are carried forward only within their freshness window, so they stay in the monthly score timeline no longer than they remain relevant.
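Smoothing and freshness-limited carry-forward might look like the sketch below. The handling of partial windows at the start of a series and the `max_age` value in the example are assumptions; the text above does not state the actual freshness windows.

```python
def trailing_mean(scores, window):
    """Trailing average over the last `window` positions, skipping missing values."""
    out = []
    for i in range(len(scores)):
        recent = [s for s in scores[max(0, i - window + 1):i + 1] if s is not None]
        out.append(sum(recent) / len(recent) if recent else None)
    return out


def carry_forward(values, max_age):
    """Forward-fill gaps, but only up to `max_age` periods after the last real observation."""
    out, last, age = [], None, 0
    for v in values:
        if v is not None:
            last, age = v, 0
        else:
            age += 1
        out.append(last if last is not None and age <= max_age else None)
    return out


# A quarterly reading placed on a monthly timeline, with a hypothetical 3-month window.
monthly = carry_forward([70.0, None, None, None, None, 64.0], max_age=3)
smoothed = trailing_mean([50.0, 60.0, 70.0, 80.0], window=3)
```

Once a value ages past the window it drops out as `None` instead of being silently reused, so downstream averages can exclude it.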

A v2 staleness adjustment should make release timing visible: weekly claims should move the labor signal faster than quarterly delinquency data, and stale quarterly data should never masquerade as fresh evidence.

Sub-Score Weights

Sub-score	Headline weight	Signals
Labor & Income	20%	UNRATE, payroll growth, initial claims, continued claims, real wages
Credit Stress	20%	Bank delinquencies, charge-offs, NY Fed credit-card and student-loan serious delinquency transitions, SLOOS tightening
Balance Sheet	15%	Saving rate, real income growth, debt / DPI, debt service ratio, net worth / DPI
Spending & Demand	15%	Real PCE momentum, real retail sales, travel demand, food services
Sentiment	10%	UMich sentiment, OECD confidence proxy, NY Fed missed-payment probability
Inflation	10%	Headline CPI, core CPI, shelter CPI, gasoline prices
Big Ticket	10%	Mortgage rates, auto loan rates, credit-card rates, housing affordability

Within each sub-score, production v1 weights indicators equally unless a source is unavailable. V2 will test modest leading/coincident/lagging tilts, but only if they improve validation.
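The headline aggregation can be sketched with the weights from the table. Renormalizing over the sub-scores that are available is an assumption; the text above does not say how a missing sub-score is handled at the headline level.

```python
# Headline weights as listed in the table above.
WEIGHTS = {
    "Labor & Income": 0.20,
    "Credit Stress": 0.20,
    "Balance Sheet": 0.15,
    "Spending & Demand": 0.15,
    "Sentiment": 0.10,
    "Inflation": 0.10,
    "Big Ticket": 0.10,
}


def headline(sub_scores, weights=WEIGHTS):
    """Weighted average of sub-scores, renormalized over those that are available."""
    pairs = [(weights[name], s) for name, s in sub_scores.items() if s is not None]
    total = sum(w for w, _ in pairs)
    if total == 0:
        return None
    return sum(w * s for w, s in pairs) / total


# Illustrative values: every sub-score at 50 except a strong labor reading.
example = {name: 50.0 for name in WEIGHTS}
example["Labor & Income"] = 100.0
score = headline(example)
```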

Validation Standard

The score is useful only if it behaves like a disciplined signal rather than a nice-looking dashboard. Validation should be published alongside the methodology and rerun when transforms, weights, or indicator sets change.

Test	What it checks
Recession response	Score should deteriorate before, or no later than, the start of each NBER-dated recession, without relying on future data.
Consumer outcome fit	Backtest against real PCE, retail sales, delinquencies, charge-offs, and employment weakness.
False-positive discipline	Track periods where the score warned but consumer data did not subsequently weaken.
Revision awareness	Prefer real-time data vintages where available and flag indicators with heavy benchmark revisions.
Staleness control	Do not let delayed quarterly indicators dominate a fresh monthly or weekly signal.
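The false-positive check could be sketched like this. How a "warning" and a "weakening" are defined, and the 6-period horizon, are hypothetical choices that the validation report would need to specify.

```python
def warning_hits(warnings, weakened, horizon=6):
    """Map each warning index to whether weakening followed within `horizon` periods."""
    results = {}
    for t, warned in enumerate(warnings):
        if warned:
            results[t] = any(weakened[t + 1:t + 1 + horizon])
    return results


def false_positive_rate(warnings, weakened, horizon=6):
    """Share of warnings that were not followed by weakening; None if there were no warnings."""
    hits = warning_hits(warnings, weakened, horizon)
    if not hits:
        return None
    return sum(1 for followed in hits.values() if not followed) / len(hits)


# Illustrative flags on a monthly timeline.
warnings = [False, True, False, False, True, False, False, False]
weakened = [False, False, False, True, False, False, False, False]
rate = false_positive_rate(warnings, weakened, horizon=2)
```

Only the evaluation looks at later outcomes; the warning flags themselves would still be derived from scores computed without future data.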

V2 Roadmap

1. Publish state / momentum / trend views separately before changing the headline score.
2. Backtest several candidate mixes: 60/25/15, 55/30/15, and Perplexity-style 40/40/20.
3. Adopt the mix that improves turning-point detection without making the headline score too noisy.
4. Add indicator-level release lag, revision risk, and leading/coincident/lagging metadata to the UI.
5. Publish a reproducible validation report with weights, excluded series, and known failures.
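Comparing the candidate mixes from the roadmap might start from a sketch like the one below. It assumes the momentum and trend layers have already been mapped to the same 0-100 scale as the state score, and it uses mean absolute period-over-period change as a crude, hypothetical noise measure.

```python
# Candidate state / momentum / trend weights listed in the roadmap.
CANDIDATE_MIXES = {
    "60/25/15": (0.60, 0.25, 0.15),
    "55/30/15": (0.55, 0.30, 0.15),
    "40/40/20": (0.40, 0.40, 0.20),
}


def blend(state, momentum, trend, mix):
    """Blend three aligned 0-100 score series using (state, momentum, trend) weights."""
    w_s, w_m, w_t = mix
    return [w_s * s + w_m * m + w_t * t for s, m, t in zip(state, momentum, trend)]


def noisiness(series):
    """Mean absolute period-over-period change; higher means a jumpier headline."""
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(diffs) / len(diffs)


# Illustrative inputs: a flat state score with an oscillating momentum layer.
state = [50.0] * 4
momentum = [40.0, 60.0, 40.0, 60.0]
trend = [50.0] * 4
noise_by_mix = {
    name: noisiness(blend(state, momentum, trend, mix))
    for name, mix in CANDIDATE_MIXES.items()
}
```

In this toy case the noise rises with the momentum weight, which is the trade-off step 3 asks the backtest to weigh against turning-point detection.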

Data Sources and Legal Notes

FRED: Federal Reserve Bank of St. Louis, including several federal and private-source series redistributed through FRED.
BLS: Bureau of Labor Statistics labor data.
BEA: Bureau of Economic Analysis income and spending data.
NY Fed HHDC and SCE: Household credit and consumer-expectations data.
EIA: U.S. retail gasoline price data.
TSA: Daily throughput data used as a travel-demand proxy.
SEC EDGAR: Public company filings used for earnings-call quote context.
Manheim MUVVI and restricted vendor series: Used only where redistribution terms allow; otherwise linked or excluded from scoring.

The Consumer Compass is not investment advice. Scores are statistical computations, and the source code is MIT-licensed.