Chapter 5: Market Selection and Synthetic Control Methods

November 1, 2025

The foundation of reliable Matched Market Testing lies in selecting appropriate test and control markets, then creating accurate synthetic controls. This technical deep-dive covers the statistical methods and quality assurance processes that ensure robust experimental design.

Strategic Market Selection Approach

The first step in our MMT process is to select test and control markets carefully so that measurement is reliable. We choose medium-to-large test markets that are representative of the broader market, and we exclude major metropolitan areas such as New York, Los Angeles, and Chicago to avoid disrupting significant revenue streams.

Our market selection process combines statistical rigor with practical business considerations.

Statistical Correlation Analysis

Core Methodology

We analyze historical time-series data for key performance indicators (KPIs) such as sales or conversions across all potential markets. Using the Pearson correlation coefficient, we quantify relationships between markets over a pre-test period. We require a positive correlation of at least 0.5 between test and potential control markets and prefer values above 0.7.

This ensures control markets have historically trended similarly to test markets, creating a reliable comparative baseline.

Correlation Requirements

Minimum Correlation: At least 0.5 on the main KPI between test and control markets; above 0.7 preferred for highest confidence
Correlation Stability: Consistent correlation across multiple time windows
Log-Correlation: Often examined to account for size differences between markets

Technical Implementation

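# Example correlation analysis using R
library(tidyverse)
library(corrplot)  # optional: corrplot(market_correlations) plots the matrix

# Calculate Pearson correlation between test and potential control markets.
# Assumes historical_data has columns date, market, and conversions,
# and that test_market holds the test market's name as a string.
market_correlations <- historical_data %>%
  select(date, market, conversions) %>%
  pivot_wider(names_from = market, values_from = conversions) %>%
  select(-date) %>%  # cor() accepts numeric columns only
  cor(use = "complete.obs", method = "pearson")

# Filter markets with >0.7 correlation to test market, excluding the test market itself
test_correlations <- market_correlations[test_market, ]
suitable_controls <- names(test_correlations)[
  which(test_correlations > 0.7 & names(test_correlations) != test_market)
]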
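
The stability and log-correlation checks listed under Correlation Requirements can be sketched in base R. The simulated series, the 13-week window length, and all variable names below are assumptions made for illustration, not part of the MMT process itself.

```r
# Hypothetical weekly conversions for a test market and one larger candidate control.
set.seed(42)
weeks  <- 52
trend  <- seq(100, 150, length.out = weeks)
season <- 10 * sin(2 * pi * seq_len(weeks) / 13)   # pattern repeating every 13 weeks
test_kpi    <- trend + season + rnorm(weeks, sd = 2)
control_kpi <- 3 * (trend + season) + rnorm(weeks, sd = 6)

# Pearson correlation on the raw and the log-transformed series.
raw_cor <- cor(test_kpi, control_kpi, method = "pearson")
log_cor <- cor(log(test_kpi), log(control_kpi), method = "pearson")

# Correlation within consecutive, non-overlapping 13-week windows.
window_id   <- ceiling(seq_len(weeks) / 13)
window_cors <- sapply(split(seq_len(weeks), window_id), function(idx) {
  cor(test_kpi[idx], control_kpi[idx], method = "pearson")
})

# The candidate is treated as stable if every window clears the 0.5 minimum.
stable <- all(window_cors >= 0.5)

print(round(c(raw = raw_cor, log = log_cor), 3))
print(round(window_cors, 3))
print(stable)
```

A control that looks strong over the full year but drops below the minimum in one window is worth a closer look before it enters the control pool.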

Comprehensive Matching Criteria

Beyond statistical correlation, we evaluate markets across multiple dimensions:

Demographic Alignment

Population characteristics such as age, income, and education levels should differ by no more than 10% between test and control markets.
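
One reading of the 10% tolerance is a relative difference per metric, measured against the test market. The sketch below takes that interpretation as an assumption; the metric names and figures are hypothetical.

```r
# Hypothetical demographic profiles for a test market and a candidate control.
test_profile    <- c(median_age = 36.5, median_income = 62000, pct_college = 0.34)
control_profile <- c(median_age = 38.0, median_income = 58500, pct_college = 0.29)

# Relative difference of the control from the test market, per metric.
rel_diff <- abs(control_profile - test_profile) / test_profile

# TRUE where the metric stays inside the 10% tolerance.
within_tolerance <- rel_diff <= 0.10

print(round(rel_diff, 3))
print(within_tolerance)
```

Here age and income pass, while the college share differs by roughly 14.7% and would be flagged for review.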

Economic Conditions

Similar unemployment rates
Comparable cost of living indices
Consistent economic growth patterns
Regional economic stability

Market Size Considerations

Control markets should collectively represent similar volume characteristics to test markets
Test Market Coverage: Typically 10-15% of total business volume for optimal balance
Minimum Volume Thresholds: At least 10 conversions per day or $500 daily revenue per test market
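
The volume and coverage thresholds above can be checked with a few lines of base R. This is a minimal sketch: the market names, the daily figures, and the business-wide total are invented for illustration.

```r
# Hypothetical daily averages per candidate test market.
markets <- data.frame(
  market            = c("Market A", "Market B", "Market C", "Market D"),
  daily_conversions = c(25, 8, 12, 4),
  daily_revenue     = c(1400, 650, 450, 300),
  stringsAsFactors  = FALSE
)
total_daily_conversions <- 400  # assumed business-wide total

# A market qualifies with at least 10 conversions per day OR at least $500 per day.
markets$meets_volume <- markets$daily_conversions >= 10 |
  markets$daily_revenue >= 500

# Share of total volume covered by the qualifying test markets.
eligible    <- markets[markets$meets_volume, ]
coverage    <- sum(eligible$daily_conversions) / total_daily_conversions
coverage_ok <- coverage >= 0.10 && coverage <= 0.15

print(eligible$market)
print(coverage)
print(coverage_ok)
```

With these figures, Markets A, B, and C qualify and together cover 11.25% of volume, which falls inside the 10-15% target.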

Competitive Landscape

Similar competitive presence across markets
Comparable promotional activity levels
Consistent market maturity stages
Similar brand awareness levels

Geographic Independence

Sufficient separation to prevent spillover effects between test and control regions, ensuring true isolation of marketing effects.

Quality Thresholds and Exclusions

Data Quality Requirements

Historical Data: 12-24 months of stable historical performance data required
Completeness: <5% missing data points across measurement period
Consistency: Stable reporting methodologies across test and control markets
Outlier Detection: Identification and treatment of anomalous data points that could skew results
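
The completeness and outlier checks might look like the following in base R. The 1.5 x IQR rule and linear interpolation are assumed choices for this sketch; the chapter does not prescribe a specific detection or treatment method, and the data is simulated.

```r
# Hypothetical daily conversions with three gaps and one anomalous spike.
set.seed(1)
conversions <- round(rnorm(200, mean = 50, sd = 5))
conversions[c(10, 75, 120)] <- NA   # three missing days
conversions[150] <- 500             # e.g. a reporting glitch

# Completeness: the share of missing points must stay below 5%.
missing_share   <- mean(is.na(conversions))
complete_enough <- missing_share < 0.05

# Outlier detection with the 1.5 * IQR rule (an assumed choice).
q     <- unname(quantile(conversions, c(0.25, 0.75), na.rm = TRUE))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- !is.na(conversions) &
  (conversions < q[1] - fence | conversions > q[2] + fence)

# One possible treatment: blank out outliers, then interpolate all gaps linearly.
cleaned <- conversions
cleaned[is_outlier] <- NA
ok <- !is.na(cleaned)
cleaned <- approx(which(ok), cleaned[ok],
                  xout = seq_along(cleaned), rule = 2)$y

print(missing_share)
print(which(is_outlier))
```

Whatever treatment is chosen, applying it identically to test and control markets keeps the reporting methodology consistent.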

Market Exclusions

Reset Periods: 90-day waiting period for markets previously used in testing
Operational Feasibility: Markets must allow precise geographic targeting within advertising platforms
Major Metro Exclusions: Avoid NYC, LA, Chicago to protect significant revenue streams

Creating the Counterfactual: Synthetic Controls

Advanced Synthetic Control Methodology

Rather than taking a simple average of control markets, we use synthetic control methods to build a more accurate counterfactual. This approach recognizes that some control markets predict test market behavior better than others, so each control is weighted according to how well it tracks the test market.

The Optimization Process

We solve an optimization problem that finds the optimal weights for control markets by minimizing the Root Mean Squared Prediction Error (RMSPE) between the test market and weighted control group over the pre-intervention period.

Example: Instead of equally weighting three control markets (33% each), our algorithm might determine that 60% Market A + 30% Market B + 10% Market C creates the most accurate replica of the test market’s historical patterns.
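
To make the optimization concrete, here is a minimal base R sketch that searches for non-negative weights summing to one by minimizing pre-period RMSPE. It matches on the outcome series directly and uses a softmax reparameterization with optim(), which is a simplification chosen for illustration rather than the procedure a dedicated package follows. The data is simulated, with the test market built from known weights.

```r
# Simulated pre-period KPI for three control markets (columns) over 60 days.
set.seed(123)
n_days   <- 60
controls <- cbind(
  A = 100 + cumsum(rnorm(n_days)),
  B = 80  + cumsum(rnorm(n_days)),
  C = 120 + cumsum(rnorm(n_days))
)

# Test market built from known weights plus noise.
true_w <- c(A = 0.6, B = 0.3, C = 0.1)
test   <- as.vector(controls %*% true_w) + rnorm(n_days, sd = 0.5)

# Root mean squared prediction error for a given weight vector.
rmspe <- function(w) sqrt(mean((test - as.vector(controls %*% w))^2))

# Softmax keeps weights non-negative and summing to one during the search.
softmax <- function(theta) exp(theta) / sum(exp(theta))

# Start from equal weights (theta = 0) and minimize pre-period RMSPE.
fit <- optim(par = rep(0, ncol(controls)),
             fn = function(theta) rmspe(softmax(theta)),
             method = "BFGS")
w_hat <- setNames(softmax(fit$par), colnames(controls))

print(round(w_hat, 2))
print(c(equal = rmspe(rep(1 / 3, 3)), optimized = rmspe(w_hat)))
```

Because the search starts from equal weights, the optimized RMSPE can only match or improve on the simple average, which is the comparison the example above describes.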

Technical Implementation with tidysynth

Using the open-source tidysynth R package, we create weighted combinations that produce superior “business-as-usual” baselines.

library(tidysynth)

# Create synthetic control.
# Assumes pre_period is the vector of dates before intervention_date.
synth_fit <- historical_data %>%
  synthetic_control(outcome = conversions,
                    unit = market,
                    time = date,
                    i_unit = "test_market",  # name of the treated market
                    i_time = intervention_date) %>%
  generate_predictor(time_window = pre_period,
                     conversions = mean(conversions, na.rm = TRUE)) %>%
  generate_weights(optimization_window = pre_period) %>%
  generate_control()

This methodology accounts for nuanced relationships between markets and provides more accurate counterfactuals for what would have happened in test markets without intervention.

Quality Assurance Metrics

Synthetic Control Quality Scores

Causal Impact (CI) Score: <0.7 indicates reliable predictive capability
Pre-Test RMSPE: Lower values indicate better synthetic control fit
Weight Distribution: Balanced allocation across multiple control markets preferred to avoid over-reliance on any single control

Market Size and Power Metrics

Control Pool Size: Minimum 10-15 potential control markets for robust synthetic control creation
Geographic Distribution: Ensure test markets don’t over-represent specific regions
Volume Balance: Control markets should collectively match test market characteristics

Pre-Test Validation

Pre-Test Fit: Measures how closely the synthetic control matches test market historical trends
Weight Distribution: Ensures balanced allocation across multiple markets
Stability Testing: Validates consistent weights across different time periods and calculation methods
Adaptive Recalibration: Allows weight adjustments if market dynamics change significantly between test design and execution
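
The pre-test checks above reduce to a few summary numbers. The sketch below assumes weights have already been estimated on two different pre-period windows. The weights, the series, the 0.70 cap, and the 0.10 stability tolerance are all hypothetical values chosen for illustration.

```r
# Hypothetical control weights estimated on two different pre-period windows.
weights_window_1 <- c(A = 0.58, B = 0.31, C = 0.11)
weights_window_2 <- c(A = 0.52, B = 0.35, C = 0.13)

# Pre-test fit: RMSPE between actual and synthetic series, scaled by the mean KPI.
actual    <- c(100, 104, 98, 110, 107)
synthetic <- c(101, 103, 99, 108, 108)
pre_rmspe        <- sqrt(mean((actual - synthetic)^2))
normalized_rmspe <- pre_rmspe / mean(actual)

# Weight distribution: flag over-reliance on a single control market.
max_weight_cap <- 0.70   # assumed cap, not a figure from this chapter
dominant <- max(weights_window_1) > max_weight_cap

# Effective number of controls (inverse Herfindahl index); higher is more balanced.
effective_n <- 1 / sum(weights_window_1^2)

# Stability: largest absolute change in any weight between the two windows.
max_shift <- max(abs(weights_window_1 - weights_window_2))
stable    <- max_shift <= 0.10   # assumed tolerance

print(round(c(rmspe = pre_rmspe, normalized = normalized_rmspe), 4))
print(round(c(effective_n = effective_n, max_shift = max_shift), 2))
```

With these inputs, no market dominates, the effective number of controls is about 2.25, and the largest weight shift is 0.06, so the weights would pass the assumed stability tolerance.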

Synthetic Control Advantages

This synthetic control approach transforms multiple imperfect control markets into a single, highly accurate counterfactual that captures the complex dynamics influencing test market performance.

Benefits Over Simple Averaging

Higher Accuracy: Optimized weights create better historical fit
Reduced Variance: More stable counterfactual predictions
Flexibility: Adapts to unique market characteristics
Transparency: Clear mathematical foundation for market weighting

Quality Validation Process

Historical Fit Assessment: Evaluate how well synthetic control replicates test market pre-period
Weight Reasonableness: Ensure no single market dominates synthetic control
Stability Testing: Validate consistent performance across different time windows
Placebo Testing: Apply methodology to non-test periods to validate approach
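
A placebo test in time can be sketched as follows: pretend an intervention began at a date when nothing changed, then confirm that the synthetic control shows no lift. The data is simulated and the weights are taken as given. In practice the weights would be re-estimated using only data before the placebo date.

```r
# Simulated KPI for two control markets and a test market with no intervention.
set.seed(7)
n_days   <- 120
controls <- cbind(A = 100 + cumsum(rnorm(n_days)),
                  B = 80  + cumsum(rnorm(n_days)))
weights   <- c(A = 0.7, B = 0.3)   # assumed to come from an earlier fit
synthetic <- as.vector(controls %*% weights)
test      <- synthetic + rnorm(n_days, sd = 1)

# Pretend an intervention started on day 91, when nothing actually changed.
placebo_start <- 91
pre  <- seq_len(placebo_start - 1)
post <- placebo_start:n_days

# Compare prediction error after the fake start date with error before it.
rmspe <- function(idx) sqrt(mean((test[idx] - synthetic[idx])^2))
rmspe_ratio <- rmspe(post) / rmspe(pre)

# Placebo "lift": average gap in the fake test period, expected to be near zero.
placebo_lift <- mean(test[post] - synthetic[post])

print(round(c(rmspe_ratio = rmspe_ratio, placebo_lift = placebo_lift), 3))
```

A ratio far above 1 or a placebo lift clearly different from zero suggests the synthetic control would report effects that are not there.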

Implementation Best Practices

Market Selection Workflow

1. Historical Data Collection: Gather 12-24 months of market-level performance data
2. Correlation Analysis: Calculate Pearson correlations between all market pairs
3. Multi-Criteria Filtering: Apply demographic, economic, and operational filters
4. Synthetic Control Creation: Use tidysynth to optimize control market weights
5. Quality Validation: Assess synthetic control fit and stability metrics
6. Final Selection: Choose test and control markets meeting all quality thresholds

Tools and Resources

Primary Tool: tidysynth R package for synthetic control implementation
GitHub Repository: edunford/tidysynth
Additional Resources: CausalImpact R package for post-test analysis validation

This rigorous market selection and synthetic control process ensures that MMT results provide reliable, actionable insights for marketing optimization and scaling decisions.

Next Steps: Learn about Platform-Specific MMT Implementation
