Methodology
Last updated May 2026 · Source on GitHub
The Consumer Compass is a 0-100 gauge of U.S. consumer financial health. It combines labor, income, balance-sheet, credit, spending, sentiment, inflation, and big-ticket affordability data into a single score while preserving the underlying sub-scores and indicator histories.
Methodology overhaul status
The production headline score currently uses the v1 state-based methodology described below: transformed indicators are converted into expanding-window percentile scores, smoothed, then aggregated. The v2 overhaul keeps that state score as the anchor but adds explicit short-term momentum and medium-term trend layers before any headline-weight change is adopted.
The key improvement is not simply "more momentum." A consumer health score needs to be stable enough to describe the present, but responsive enough to catch turning points. The working v2 target is a tested state / momentum / trend mix, with the exact weights chosen by backtest rather than by intuition.
Current quality-control fixes focus on two v1 mechanics. First, stale indicators are carried forward only within a frequency-specific freshness window before the headline is computed. Second, the household net-worth / DPI ratio converts Federal Reserve Z.1 net worth (retrieved from FRED) into the same dollar units as disposable income before dividing. These fixes reduce artificial month-to-month jumps without changing the broader v2 testing plan.
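The freshness-window carry-forward can be sketched in plain Python. This is a minimal illustration, not the production code. It assumes the series is aligned to a monthly timeline, with `None` for months that have no new release. The window lengths in the hypothetical `FRESHNESS_MONTHS` table are placeholders rather than the pipeline's actual settings.

```python
from typing import Optional

# Hypothetical freshness windows in months, per release frequency.
# Illustrative placeholders only, not the production settings.
FRESHNESS_MONTHS = {"weekly": 1, "monthly": 2, "quarterly": 4}


def carry_forward(values: list[Optional[float]], frequency: str) -> list[Optional[float]]:
    """Forward-fill a monthly-aligned series, but only inside the freshness window.

    None marks a month with no new release. A month may reuse the last release
    only while that release is at most FRESHNESS_MONTHS[frequency] months old;
    after that it stays None, so the indicator drops out of its sub-score.
    """
    window = FRESHNESS_MONTHS[frequency]
    filled: list[Optional[float]] = []
    last_value: Optional[float] = None
    age = 0  # months since the last actual release
    for value in values:
        if value is not None:
            last_value, age = value, 0
            filled.append(value)
            continue
        age += 1
        filled.append(last_value if last_value is not None and age <= window else None)
    return filled


# A quarterly series whose next release is late.
quarterly = [3.1, None, None, 3.3, None, None, None, None, None]
print(carry_forward(quarterly, "quarterly"))
# → [3.1, 3.1, 3.1, 3.3, 3.3, 3.3, 3.3, 3.3, None]
```

Once the window expires, the month stays empty instead of repeating an old value. This is what keeps a late quarterly release from looking like fresh evidence.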
One remaining pattern to watch is the small step-like move that can appear when quarterly indicators refresh inside a monthly chart. That pattern reflects release cadence and mixed data frequency, not ordinary consumer seasonality. The v2 work should make those effects visible through staleness and attribution rather than letting them masquerade as a clean economic signal.
Scoring Principles
No look-ahead bias
A score at time t is computed only from data available through time t. Historical percentiles use expanding windows, not the full future dataset.
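Here is a minimal sketch of an expanding-window percentile. It assumes a mid-rank convention for ties; the exact percentile definition used in production is not stated here. Each score is computed only from the values seen so far, so appending later data never changes an earlier score.

```python
from bisect import bisect_left, bisect_right, insort


def expanding_percentile(series: list[float]) -> list[float]:
    """Percentile-rank each value against the history available at that date.

    The score at index t uses only series[0..t]. Ties take the mid-rank, so the
    very first observation scores 50 rather than 0 or 100.
    """
    history: list[float] = []  # sorted observations seen so far
    scores: list[float] = []
    for x in series:
        insort(history, x)
        below = bisect_left(history, x)          # observations strictly below x
        at_or_below = bisect_right(history, x)   # observations <= x
        mid_rank = (below + at_or_below) / 2
        scores.append(100.0 * mid_rank / len(history))
    return scores


print(expanding_percentile([1.0, 3.0, 2.0, 5.0]))
# → [50.0, 75.0, 50.0, 87.5]
```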
Economic direction first
Each series is transformed into the version that best matches consumer stress or strength before scoring: growth rates for flow data, levels for ratios, and target proximity for inflation.
Transparent before clever
The score should be explainable from visible indicators, weights, transforms, and validation results. New features must earn their place through walk-forward testing.
Signal Architecture
The current score is primarily a state score. The methodology overhaul makes three useful questions explicit: what the level is, how fast it is changing, and whether the change is persistent.
| Layer | Question | Production today | V2 upgrade |
|---|---|---|---|
| State | How healthy is the current reading versus its own history? | Expanding-window percentile score, direction-adjusted. | Keep as the anchor, with robust z-score diagnostics for severity. |
| Short momentum | Is the indicator improving or deteriorating over the last 1-3 releases? | Published today as headline 1m and 3m score deltas. | Add indicator-level momentum scores and roll them into the composite. |
| Medium trend | Is the signal persistently moving over a 6-12 month horizon? | Visible in history charts and sparklines, not directly scored. | Add trend slope / moving-average crossover scores after backtesting. |
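Three small functions can illustrate the momentum and trend layers. `score_delta` mirrors the published 1m and 3m deltas. `trailing_slope` and `ma_crossover` are hypothetical versions of the V2 trend-slope and moving-average crossover ideas, with window lengths chosen only for illustration. This sketch leaves open how these raw values would be mapped onto a 0-100 score, because the V2 design is still being backtested.

```python
from typing import Optional


def score_delta(scores: list[float], lag: int) -> list[Optional[float]]:
    """Short momentum: change versus `lag` periods earlier (lag=1 or 3 for 1m/3m)."""
    return [None if t < lag else scores[t] - scores[t - lag] for t in range(len(scores))]


def trailing_slope(scores: list[float], window: int) -> list[Optional[float]]:
    """Medium trend: least-squares slope of the last `window` scores, in points per period."""
    if window < 2:
        raise ValueError("a slope needs at least two points")
    x_mean = (window - 1) / 2
    denom = sum((x - x_mean) ** 2 for x in range(window))
    slopes: list[Optional[float]] = []
    for t in range(len(scores)):
        if t + 1 < window:
            slopes.append(None)  # not enough history yet
            continue
        ys = scores[t + 1 - window : t + 1]
        y_mean = sum(ys) / window
        numer = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
        slopes.append(numer / denom)
    return slopes


def ma_crossover(scores: list[float], short_window: int = 3,
                 long_window: int = 12) -> list[Optional[int]]:
    """+1 when the short moving average is above the long one, -1 when below, 0 if equal."""
    signals: list[Optional[int]] = []
    for t in range(len(scores)):
        if t + 1 < long_window:
            signals.append(None)
            continue
        short_ma = sum(scores[t + 1 - short_window : t + 1]) / short_window
        long_ma = sum(scores[t + 1 - long_window : t + 1]) / long_window
        signals.append((short_ma > long_ma) - (short_ma < long_ma))
    return signals


rising = [50.0 + 2 * t for t in range(12)]
print(score_delta(rising, 3)[-1], trailing_slope(rising, 6)[-1], ma_crossover(rising)[-1])
```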
Current Production Formula
For each scored indicator, the pipeline applies an economic transform, ranks the transformed value against its own history through that date, flips the direction where lower values are healthier, smooths the result, and averages the available indicators inside each sub-score.
- Higher-is-better indicators keep their percentile rank.
- Lower-is-better indicators are inverted, so weaker labor, rising delinquencies, and higher rates reduce the score.
- CPI indicators are scored by proximity to 2% rather than by lower-is-always-better logic.
- Indicators are carried forward only inside a freshness window before monthly sub-scores and the headline are computed.
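The following sketch covers the direction flip and the equal-weight sub-score average. The indicator keys and input percentiles are illustrative. The sketch assumes that `None` marks an indicator that is unavailable or outside its freshness window, and it omits the transform, ranking, and smoothing steps.

```python
from typing import Optional


def direction_adjust(percentile: float, higher_is_better: bool) -> float:
    """Flip lower-is-better indicators so that 100 always means healthier."""
    return percentile if higher_is_better else 100.0 - percentile


def sub_score(indicator_scores: dict[str, Optional[float]]) -> Optional[float]:
    """Equal-weight average of the indicators available this month.

    None marks an indicator that is missing or outside its freshness window;
    it is skipped rather than treated as zero.
    """
    available = [s for s in indicator_scores.values() if s is not None]
    if not available:
        return None
    return sum(available) / len(available)


# Illustrative percentiles, not real readings.
labor = {
    "UNRATE": direction_adjust(80.0, higher_is_better=False),  # 80th-percentile unemployment flips to 20
    "real_wages": direction_adjust(60.0, higher_is_better=True),
    "continued_claims": None,  # stale this month
}
print(sub_score(labor))  # → 40.0
```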
Pre-Score Transforms
| Indicator family | Transform before scoring |
|---|---|
| Nonfarm Payrolls (PAYEMS) | Month-over-month change, then 3-month rolling average |
| Real Hourly Earnings, DPI, Retail Sales, PCE Food Services | Year-over-year percent change |
| Real PCE momentum | 3-month annualized rate of change |
| Net Worth / DPI | Household net worth divided by annualized disposable personal income |
| CPI, Core CPI, Shelter CPI | Year-over-year inflation scored by distance from 2% |
| TSA Throughput | Daily throughput compared with 2019 same-period baseline |
| Rates, delinquencies, claims, sentiment, saving rate | Economically meaningful level |
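The pre-score transforms in the table translate into short functions. This sketch makes several assumptions:

- The growth-rate helpers expect monthly data.
- Net worth is reported in millions of dollars and DPI in billions at an annual rate.
- Inflation proximity is measured as absolute distance from 2%.
- TSA throughput is expressed as a percent of the 2019 figure.

The example inputs are round illustrative numbers, not real readings.

```python
from typing import Optional

Series = list[float]


def mom_change_3m_avg(levels: Series) -> list[Optional[float]]:
    """Payrolls-style: month-over-month change, then a 3-month trailing average."""
    out: list[Optional[float]] = [None] * len(levels)
    for t in range(3, len(levels)):
        changes = [levels[i] - levels[i - 1] for i in range(t - 2, t + 1)]
        out[t] = sum(changes) / 3
    return out


def yoy_pct(values: Series, periods: int = 12) -> list[Optional[float]]:
    """Year-over-year percent change (periods=12 for monthly data)."""
    return [None if t < periods else 100.0 * (values[t] / values[t - periods] - 1.0)
            for t in range(len(values))]


def annualized_3m(values: Series) -> list[Optional[float]]:
    """3-month annualized rate of change for monthly data, in percent."""
    return [None if t < 3 else 100.0 * ((values[t] / values[t - 3]) ** 4 - 1.0)
            for t in range(len(values))]


def net_worth_to_dpi(net_worth_millions: float, dpi_billions_annual_rate: float) -> float:
    """Net worth / annualized DPI after putting both in billions of dollars.

    Assumes net worth arrives in millions and DPI in billions at an annual rate.
    """
    return (net_worth_millions / 1_000.0) / dpi_billions_annual_rate


def inflation_distance(yoy_inflation_pct: float, target: float = 2.0) -> float:
    """Absolute distance from the 2% target; scored as lower-is-better."""
    return abs(yoy_inflation_pct - target)


def tsa_vs_2019(throughput: float, same_period_2019: float) -> float:
    """Throughput as a percent of the same period in 2019."""
    return 100.0 * throughput / same_period_2019


print(mom_change_3m_avg([100.0, 102.0, 105.0, 106.0, 110.0]))
print(net_worth_to_dpi(150_000_000.0, 20_000.0))  # → 7.5
print(inflation_distance(0.0), inflation_distance(4.0))  # → 2.0 2.0
```

Under the symmetric-distance assumption, 0% and 4% inflation score equally after ranking. This reflects the "proximity to 2%" logic rather than lower-is-always-better.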
Smoothing and Update Cadence
Smoothing is applied after scoring so that extreme single releases do not dominate the composite. Monthly indicators use a 3-month trailing average, weekly indicators a 4-week trailing average, and daily indicators a 30-day trailing average. Quarterly indicators are carried forward only within their freshness window on the monthly score timeline.
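The trailing smoothing step might look like the following. The window lengths come from the paragraph above. Returning `None` until a full window exists is an assumption of this sketch. Quarterly series are left out because they are treated here as carried forward rather than smoothed.

```python
from typing import Optional

# Trailing smoothing windows by release frequency (monthly, weekly, daily).
SMOOTHING_WINDOW = {"monthly": 3, "weekly": 4, "daily": 30}


def smooth_scores(scores: list[float], frequency: str) -> list[Optional[float]]:
    """Trailing mean of already-computed scores; None until a full window exists."""
    window = SMOOTHING_WINDOW[frequency]
    smoothed: list[Optional[float]] = []
    for t in range(len(scores)):
        if t + 1 < window:
            smoothed.append(None)
        else:
            smoothed.append(sum(scores[t + 1 - window : t + 1]) / window)
    return smoothed


# One extreme monthly release is diluted across three months.
print(smooth_scores([50.0, 50.0, 95.0, 50.0, 50.0], "monthly"))
# → [None, None, 65.0, 65.0, 65.0]
```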
A v2 staleness adjustment should make release timing visible: weekly claims should move the labor signal faster than quarterly delinquency data, and stale quarterly data should never masquerade as fresh evidence.
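One possible shape for a v2 staleness adjustment follows. Everything in it is hypothetical:

- the `IndicatorStatus` record and the `CADENCE_DAYS` table;
- measuring age from the end of the period an observation describes;
- the linear decay from full weight after one release cycle to zero after two.

The example shows how release timing could become an explicit, inspectable number rather than a silent carry-forward.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical expected spacing between releases, in days.
CADENCE_DAYS = {"weekly": 7, "monthly": 30, "quarterly": 91}


@dataclass
class IndicatorStatus:
    name: str
    frequency: str
    last_observation: date  # end of the period the latest value describes


def age_in_cycles(ind: IndicatorStatus, as_of: date) -> float:
    """How many release cycles old the latest observation is."""
    return (as_of - ind.last_observation).days / CADENCE_DAYS[ind.frequency]


def staleness_weight(ind: IndicatorStatus, as_of: date, max_cycles: float = 2.0) -> float:
    """Hypothetical decay: full weight for one cycle, then a linear fade to 0 at max_cycles."""
    cycles = age_in_cycles(ind, as_of)
    if cycles <= 1.0:
        return 1.0
    return max(0.0, (max_cycles - cycles) / (max_cycles - 1.0))


as_of = date(2026, 5, 15)
for ind in [
    IndicatorStatus("initial_claims", "weekly", date(2026, 5, 9)),
    IndicatorStatus("bank_delinquencies", "quarterly", date(2025, 12, 31)),
]:
    days_old = (as_of - ind.last_observation).days
    print(ind.name, days_old, round(staleness_weight(ind, as_of), 2))
# → initial_claims 6 1.0
# → bank_delinquencies 135 0.52
```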
Sub-Score Weights
| Sub-score | Headline weight | Signals |
|---|---|---|
| Labor & Income | 20% | UNRATE, payroll growth, initial claims, continued claims, real wages |
| Credit Stress | 20% | Bank delinquencies, charge-offs, NY Fed serious delinquency transitions, SLOOS tightening |
| Balance Sheet | 15% | Saving rate, real income growth, debt service ratio, net worth / DPI |
| Spending & Demand | 15% | Real PCE momentum, real retail sales, travel demand, food services |
| Sentiment | 10% | UMich sentiment, OECD confidence proxy, NY Fed missed-payment probability |
| Inflation | 10% | Headline CPI, core CPI, shelter CPI, gasoline prices |
| Big Ticket | 10% | Mortgage rates, auto loan rates, credit-card rates, housing affordability |
Within each sub-score, production v1 weights indicators equally unless a source is unavailable. V2 will test modest leading/coincident/lagging tilts, but only if they improve validation.
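The headline score combines the sub-scores as a weighted average using the weights in the table. The document does not specify a rule for missing sub-scores. As a labeled guess, this sketch redistributes a missing sub-score's weight proportionally across the sub-scores that are present.

```python
from typing import Optional

# Headline weights from the sub-score table (they sum to 100%).
HEADLINE_WEIGHTS = {
    "Labor & Income": 0.20,
    "Credit Stress": 0.20,
    "Balance Sheet": 0.15,
    "Spending & Demand": 0.15,
    "Sentiment": 0.10,
    "Inflation": 0.10,
    "Big Ticket": 0.10,
}


def headline_score(sub_scores: dict[str, Optional[float]]) -> Optional[float]:
    """Weighted average of 0-100 sub-scores.

    Assumption: an unavailable sub-score's weight is redistributed
    proportionally across the remaining sub-scores.
    """
    present = {k: v for k, v in sub_scores.items() if v is not None}
    total_weight = sum(HEADLINE_WEIGHTS[k] for k in present)
    if total_weight == 0:
        return None
    return sum(HEADLINE_WEIGHTS[k] * v for k, v in present.items()) / total_weight


scores = {name: 60.0 for name in HEADLINE_WEIGHTS}
scores["Inflation"] = 30.0  # one weak sub-score pulls the headline down by 3 points
print(round(headline_score(scores), 2))  # → 57.0
```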
Validation Standard
The score is useful only if it behaves like a disciplined signal rather than a nice-looking dashboard. Validation should be published alongside the methodology and rerun when transforms, weights, or indicator sets change.
| Test | What it checks |
|---|---|
| Recession response | The score should deteriorate before, or no later than, the start of NBER-dated recessions, without relying on future data. |
| Consumer outcome fit | Backtest against real PCE, retail sales, delinquencies, charge-offs, and employment weakness. |
| False-positive discipline | Track periods where the score warned but consumer data did not subsequently weaken. |
| Revision awareness | Prefer real-time data vintages where available and flag indicators with heavy benchmark revisions. |
| Staleness control | Do not let delayed quarterly indicators dominate a fresh monthly or weekly signal. |
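Two of these tests, recession response and false-positive discipline, can be expressed as simple checks on a score history. The following choices are hypothetical and are not published validation settings:

- the warning rule (a drop of at least 5 points over 3 months);
- the 12-month horizon and lookback;
- the function names.

The `weakened` flags would come from the consumer outcome series listed in the table.

```python
from typing import Optional


def warning_months(scores: list[float], drop: float = 5.0, lag: int = 3) -> list[int]:
    """Months where the score fell by at least `drop` points over `lag` months.

    Uses only scores[t - lag] and scores[t], so no future data is involved.
    """
    return [t for t in range(lag, len(scores)) if scores[t] - scores[t - lag] <= -drop]


def false_positives(warnings: list[int], weakened: list[bool], horizon: int = 12) -> list[int]:
    """Warnings not followed by consumer weakness in the next `horizon` months."""
    return [t for t in warnings if not any(weakened[t + 1 : t + horizon + 1])]


def recession_lead(warnings: list[int], start: int, lookback: int = 12) -> Optional[int]:
    """Months between the earliest warning in the lookback window and a recession start."""
    early = [t for t in warnings if start - lookback <= t <= start]
    return start - min(early) if early else None


scores = [60.0, 60.0, 60.0, 52.0, 52.0, 52.0, 60.0, 60.0]
warnings = warning_months(scores)
print(warnings, recession_lead(warnings, start=6))  # → [3, 4, 5] 3
```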
V2 Roadmap
1. Publish state / momentum / trend views separately before changing the headline score.
2. Backtest several candidate mixes: 60/25/15, 55/30/15, and Perplexity-style 40/40/20.
3. Adopt the mix that improves turning-point detection without making the headline score too noisy.
4. Add indicator-level release lag, revision risk, and leading/coincident/lagging metadata to the UI.
5. Publish a reproducible validation report with weights, excluded series, and known failures.
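Comparing the candidate mixes in the roadmap means blending the three layers with each set of weights and checking how noisy the result is. This sketch assumes momentum and trend have already been mapped to the same 0-100 scale as the state score. It uses mean absolute month-over-month change as one possible noise measure, and the layer values are made up for illustration. Turning-point detection would be scored separately with the validation tests.

```python
from statistics import mean

# Candidate state / momentum / trend mixes listed in the roadmap.
CANDIDATE_MIXES = {
    "60/25/15": (0.60, 0.25, 0.15),
    "55/30/15": (0.55, 0.30, 0.15),
    "40/40/20": (0.40, 0.40, 0.20),
}


def blend(state: float, momentum: float, trend: float,
          mix: tuple[float, float, float]) -> float:
    """Weighted mix of three layer scores, each assumed to be on the 0-100 scale."""
    w_state, w_momentum, w_trend = mix
    return w_state * state + w_momentum * momentum + w_trend * trend


def noise(series: list[float]) -> float:
    """Mean absolute month-over-month change: one simple roughness measure."""
    return mean(abs(b - a) for a, b in zip(series, series[1:]))


# Made-up layer histories, already on a 0-100 scale.
state = [55.0, 54.0, 53.0, 50.0]
momentum = [50.0, 40.0, 35.0, 30.0]
trend = [50.0, 48.0, 45.0, 42.0]

for name, mix in CANDIDATE_MIXES.items():
    headline = [blend(s, m, t, mix) for s, m, t in zip(state, momentum, trend)]
    print(name, [round(h, 1) for h in headline], round(noise(headline), 2))
```

Heavier momentum weights react faster to the same deterioration but also move more from month to month. This is the trade-off that step 3 asks the backtest to settle.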
Data Sources and Legal Notes
- FRED: Federal Reserve Bank of St. Louis, including several federal and private-source series redistributed through FRED.
- BLS: Bureau of Labor Statistics labor data.
- BEA: Bureau of Economic Analysis income and spending data.
- NY Fed HHDC and SCE: Household credit and consumer-expectations data.
- EIA: U.S. retail gasoline price data.
- TSA: Daily throughput data used as a travel-demand proxy.
- SEC EDGAR: Public company filings used for earnings-call quote context.
- Manheim MUVVI and restricted vendor series: Used only where redistribution terms allow; otherwise linked or excluded from scoring.
The Consumer Compass is not investment advice. Scores are statistical computations, and the source code is MIT-licensed.