Every investor talks about risk, but few can explain what truly drives it. We measure volatility, beta, or maximum drawdown and assume we understand our exposure. Yet these numbers are only outcomes: they tell us how much pain there was, not why it happened. Sometimes a high-P/E stock crashes harder. Sometimes a low-margin business stays resilient. Correlation alone can't explain what actually causes those reactions. That's where most models stop, and where this story begins.

This experiment tries to answer one deceptively simple question: do fundamentals and market traits directly cause deeper drawdowns, or do they just move in sync with them?

To find out, I built a causal AI framework that analyzes how valuation, volatility, and profitability affect a stock's downside. The data comes from EODHD's historical APIs, covering S&P 500 stocks from 2018 to today, which is more than enough to see how company characteristics shape real risk rather than statistical noise.

The goal isn't to forecast crashes or design a trading signal. It's to understand what truly makes a portfolio fragile, and how causal inference can separate reason from coincidence.

From Correlation to Causation

It's easy to assume that if two things move together, one must cause the other. High valuations, rising volatility, falling prices; the patterns often look convincing enough. I used to take those relationships at face value until I realized that correlation is mostly storytelling. It looks logical, it feels logical, but it rarely holds up when conditions change.

That's where causal inference comes in. Instead of asking what usually happens together, it asks what would have happened if something were different.

For instance, if two companies were identical except for their valuation, would the higher-valued one experience a deeper drawdown? That's the kind of "what-if" question correlation can't answer, but causal models can.

In this study, I treated valuation, volatility, and profitability as the key "treatments." The idea was to see how each factor, when changed, affects a company's downside risk while holding everything else constant. It's less about prediction and more about simulation, i.e., building alternate realities to see which traits consistently lead to more pain.

I used EODHD's data to make this possible: years of S&P 500 history packed with price, volume, and fundamental data. The quality and consistency of that feed made it possible to treat this like a real-world experiment instead of just another backtest.

The Setup: Getting the Data Right

Before getting into the causal analysis, I needed a proper foundation: clean, consistent data. Everything starts here.

**Importing Packages**

I first imported the essential packages. Nothing fancy, just what's needed to pull data, shape it, and run the models later.
```python
import pandas as pd
import numpy as np
import requests
import time
from datetime import datetime, timedelta
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
import matplotlib.pyplot as plt
```

This covers the basics: pandas and numpy for handling data, requests for API calls, sklearn for the causal models, and matplotlib for plotting.

**Config and Extracting Tickers**

Next, I configured the API and built a small universe of S&P 500 tickers. Instead of typing them out manually, I pulled the current list directly from the EODHD Fundamental Data endpoint.

```python
api_key = 'YOUR EODHD API KEY'
base_url = 'https://eodhd.com/api'

# pull the current S&P 500 constituents from the index fundamentals
tickers_url = f'{base_url}/fundamentals/GSPC.INDX?api_token={api_key}&fmt=json'
tickers_raw = requests.get(tickers_url).json()
tickers = list(pd.DataFrame(tickers_raw['Components']).T['Code'])
tickers = [item + '.US' for item in tickers]

start_date = '2018-01-01'
end_date = datetime.today().strftime('%Y-%m-%d')
```

EODHD's /fundamentals/GSPC.INDX endpoint provides the full list of S&P 500 constituents. Appending .US to each code keeps the symbols compatible with the /eod/ endpoint used for fetching daily prices.

**Fetching Historical Data**

Once the tickers were ready, I moved on to fetching price data and calculating returns. The function below retrieves daily prices from EODHD's Historical Data endpoint for each symbol between the start and end dates, keeps the closing price, and stitches everything together.
```python
def fetch_eod_prices(ticker, start, end):
    url = f'{base_url}/eod/{ticker}?from={start}&to={end}&api_token={api_key}&fmt=json'
    r = requests.get(url).json()
    if not isinstance(r, list) or not r:
        return pd.DataFrame(columns=['date', 'close', 'ticker'])
    df = pd.DataFrame(r)[['date', 'close']]
    df['ticker'] = ticker
    print(f'{ticker} FETCHED DATA')
    time.sleep(1)  # stay well inside the API rate limit
    return df

frames = [fetch_eod_prices(t, start_date, end_date) for t in tickers]
prices_df = pd.concat(frames, ignore_index=True).dropna(subset=['close'])
prices_df['date'] = pd.to_datetime(prices_df['date'])
prices_df = prices_df.sort_values(['ticker', 'date'])
prices_df['ret'] = prices_df.groupby('ticker')['close'].pct_change().fillna(0.0)
prices_df = prices_df[['ticker', 'date', 'close', 'ret']]
prices_df = prices_df.reset_index(drop=True)
prices_df
```

This creates a unified dataset of roughly a million rows across all S&P 500 tickers, each with the closing price and daily return. The output confirms everything is aligned:

- Each ticker is ordered by date.
- The first row per ticker shows a return of zero (which validates the grouping logic).
- The data is dense, balanced, and ready for feature engineering in the next step.

Measuring Stress: Calculating Drawdowns

Once the return data was ready, I needed to turn it into something that represents risk. Returns tell you how a stock moved, but not how it behaved under pressure. Drawdown captures exactly that: how deep a stock fell from its recent peak during a stress period. For example, a stock that peaks at $110 and then bottoms at $80 within a window has a maximum drawdown of about -27%.

For this, I defined two major market stress windows:

- COVID crash (Feb–Apr 2020)
- Rate-hike shock (Aug–Oct 2022)

Each window captures a different macro shock, one liquidity-driven, the other inflation-driven. This gives a good contrast for the causal inference later.
**Calculating Drawdowns**

```python
def max_drawdown(series: pd.Series) -> float:
    # running peak and the percentage drop from it
    cummax = series.cummax()
    dd = series / cummax - 1.0
    return float(dd.min())

stress_windows = [
    ('2020-02-15', '2020-04-30'),  # COVID crash
    ('2022-08-01', '2022-10-31')   # rate-hike shock
]

rows = []
for t in prices_df['ticker'].unique():
    df_t = prices_df.loc[prices_df['ticker'] == t, ['date', 'close']].set_index('date').sort_index()
    for i, (start, end) in enumerate(stress_windows, start=1):
        s = df_t.loc[start:end, 'close']
        if s.empty:
            continue
        rows.append({
            'ticker': t,
            'window_id': i,
            'start': start,
            'end': end,
            'max_dd': max_drawdown(s)
        })

drawdowns_df = pd.DataFrame(rows)
drawdowns_df
```

The max_drawdown() function tracks the running peak of a stock's price and measures the percentage drop from that peak. For each ticker, I then extract the prices within the defined stress windows and compute the worst drawdown in each range. Iterating through two different market crises shows how each company handled external shocks. Each stock now has up to two entries, one per stress event, along with its corresponding maximum drawdown.

Fetching Fundamentals & Defining Treatments

With the price and drawdown data ready, the next step was to define the "treatments." In causal inference, a treatment represents an event or characteristic whose effect we want to measure. Here, I wanted to understand how certain financial traits, like valuation, volatility, and profitability, affect a company's risk profile during market stress.

The three treatments I picked were:

- High PE ratio: stocks that are priced expensively relative to their earnings.
- High beta: stocks that move aggressively with the market.
- Low profit margin: stocks with thinner profitability cushions.

To build these, I used EODHD's Fundamentals API, which provides all the key metrics I needed.
**Fetching Fundamentals Data**

```python
def fetch_fundamentals(ticker):
    url = f'{base_url}/fundamentals/{ticker}?api_token={api_key}&fmt=json'
    r = requests.get(url).json()
    if not isinstance(r, dict):
        return pd.DataFrame()
    data = r.get('Highlights', {})
    general = r.get('General', {})
    val = r.get('Valuation', {})
    row = {
        'ticker': ticker,
        'sector': general.get('Sector', np.nan),
        'market_cap': data.get('MarketCapitalization', np.nan),
        'beta': r.get('Technicals', {}).get('Beta', np.nan),
        'eps': data.get('EarningsShare', np.nan),
        'div_yield': data.get('DividendYield', np.nan),
        'net_margin': data.get('ProfitMargin', np.nan),
        'pe_ratio': val.get('TrailingPE'),
        'pb_ratio': val.get('PriceBookMRQ'),
    }
    print(f'{ticker} FETCHED DATA')
    time.sleep(1)
    return pd.DataFrame([row])

fund_frames = [fetch_fundamentals(t) for t in tickers]
fundamentals_df = pd.concat(fund_frames, ignore_index=True)

num_cols = ['market_cap', 'beta', 'eps', 'div_yield', 'net_margin', 'pe_ratio', 'pb_ratio']
fundamentals_df[num_cols] = fundamentals_df[num_cols].apply(pd.to_numeric, errors='coerce')
fundamentals_df
```

Each call to /fundamentals/{ticker} returns multiple nested blocks: General, Highlights, Valuation, and Technicals. I extracted what I needed from each:

- General: sector
- Highlights: earnings per share, profit margin, dividend yield, market cap
- Valuation: PE and PB ratios
- Technicals: beta

All fields are flattened into a single record per ticker and combined into one DataFrame. The structure is wide enough to hold all the core features and still compact enough to merge easily with the returns data later.

**Merging Data and Creating Treatments**

This step creates the dataset used for causal inference, combining financial performance with firm-level traits.
```python
# merge fundamentals with drawdowns per ticker-window
combined_df = drawdowns_df.merge(fundamentals_df, on='ticker', how='left')

# basic controls
combined_df['log_mcap'] = np.log(combined_df['market_cap'].replace({0: np.nan}))
combined_df['sector'] = combined_df['sector'].fillna('Unknown')
combined_df.head()
```

Each row here represents a stock-window pair: its drawdown during a specific stress period, its valuation profile, and its sector context. Log-transforming market cap helps stabilize the data, since firm sizes vary wildly, while the sector labels later help balance comparisons.

**Defining Treatment Flags**

With the merged dataset ready, I defined what qualifies as "treated." For example, a stock is treated as high PE if it falls above the 70th-percentile PE within its own sector. That ensures comparisons happen within similar industries instead of globally.

```python
def sector_pe_thresholds(df, min_count=5, q=0.70):
    # 70th-percentile PE per sector, falling back to the global threshold where missing
    th = df.groupby('sector')['pe_ratio'].quantile(q).rename('pe_thr').to_dict()
    global_thr = df['pe_ratio'].quantile(q)
    return {s: (thr if not np.isnan(thr) else global_thr) for s, thr in th.items()}

pe_thr_map = sector_pe_thresholds(fundamentals_df)

combined_df['high_pe'] = combined_df.apply(
    lambda r: 1 if pd.notna(r['pe_ratio']) and r['pe_ratio'] > pe_thr_map.get(r['sector'], np.nan) else 0,
    axis=1
)
combined_df['high_beta'] = (combined_df['beta'] > 1.2).astype(int)      # market sensitivity proxy
combined_df['low_margin'] = (combined_df['net_margin'] < 0).astype(int)  # weak profitability

treat_cols = ['high_pe', 'high_beta', 'low_margin']
ctrl_cols = ['log_mcap', 'sector']

model_df = combined_df[['ticker', 'window_id', 'start', 'end', 'max_dd',
                        'pe_ratio', 'beta', 'net_margin', 'market_cap'] + treat_cols + ctrl_cols].copy()
model_df.head(10)
```

The logic is simple but deliberate:

- High PE stocks represent overvalued firms within their own sector.
- High beta stocks have a beta above 1.2, meaning they are at least 20% more sensitive to market moves.
- Low margin stocks have negative net margins and reflect profitability risk.

Each stock now carries its drawdown, financial metrics, and binary treatment flags.
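Before moving on, a quick sanity check helps. The snippet below is a minimal sketch that is not part of the original pipeline; it assumes model_df exists as built above and simply counts treated versus control rows for each flag, since a heavily lopsided group would leave the propensity models later with little overlap to work with.

```python
# Hypothetical sanity check: how many stock-window rows fall into each treatment group?
# (assumes model_df from the previous step)
for flag in ['high_pe', 'high_beta', 'low_margin']:
    counts = model_df[flag].value_counts().to_dict()
    print(f"{flag}: treated={counts.get(1, 0)}, control={counts.get(0, 0)}")
```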
We have exactly what's needed for causal analysis in the next section.

Estimating Causal Effects

With all the inputs ready (drawdowns, fundamentals, and treatment flags), it was time to move beyond correlations and actually test for causation. The question was straightforward: how much more (or less) do high-PE, high-beta, and low-margin stocks fall during market stress after accounting for factors like size and sector?

That's a classic causal inference problem. Instead of using traditional regressions, I relied on two modern estimators:

- Inverse Probability Weighting (IPW): reweights observations to mimic a randomized experiment.
- Doubly Robust (DR) estimator: combines regression and reweighting to reduce bias.

**Implementing the Estimators**

The first function computes the IPW estimate. It fits a logistic model to estimate the propensity score, which is the probability of being "treated" (for example, being a high-PE stock) given the controls. It then adjusts the sample weights to make the treated and control groups comparable. The second function goes a step further by combining a regression model with weighting to create a doubly robust estimate. Even if one of the models is slightly misspecified, the results tend to remain stable.

```python
def causal_ipw(data, treatment, outcome, controls):
    df = data.dropna(subset=[treatment, outcome] + controls).copy()
    X = pd.get_dummies(df[controls], drop_first=True)
    T = df[treatment]
    y = df[outcome]

    # propensity score model
    ps_model = LogisticRegression(max_iter=500)
    ps_model.fit(X, T)
    p = ps_model.predict_proba(X)[:, 1].clip(0.01, 0.99)

    # inverse weighting
    weights = T/p + (1-T)/(1-p)
    ate = np.mean(weights * (T - p) * y) / np.mean(weights * (T - p)**2)
    return ate

def causal_dr(data, treatment, outcome, controls):
    df = data.dropna(subset=[treatment, outcome] + controls).copy()
    X = pd.get_dummies(df[controls], drop_first=True)
    T = df[treatment].values
    y = df[outcome].values

    # outcome and treatment models
    m_y = RandomForestRegressor(n_estimators=200, random_state=42)
    m_t = LogisticRegression(max_iter=500)
    m_y.fit(np.column_stack([X, T]), y)
    m_t.fit(X, T)

    p = m_t.predict_proba(X)[:, 1].clip(0.01, 0.99)
    y_hat_1 = m_y.predict(np.column_stack([X, np.ones(len(X))]))
    y_hat_0 = m_y.predict(np.column_stack([X, np.zeros(len(X))]))

    # doubly robust score: regression estimate plus weighted residual corrections
    dr_scores = (T*(y - y_hat_1)/p) - ((1-T)*(y - y_hat_0)/(1-p)) + (y_hat_1 - y_hat_0)
    ate = np.mean(dr_scores)
    return ate

results = []
for t in ['high_pe', 'high_beta', 'low_margin']:
    ate_ipw = causal_ipw(model_df, t, 'max_dd', ['log_mcap', 'sector'])
    ate_dr = causal_dr(model_df, t, 'max_dd', ['log_mcap', 'sector'])
    results.append({'treatment': t, 'ate_ipw': ate_ipw, 'ate_dr': ate_dr})

effects_df = pd.DataFrame(results)
effects_df
```
**Results**

The effects_df table is where things start getting interesting. Each row represents one treatment, and the two columns (ate_ipw and ate_dr) show how that characteristic influences the maximum drawdown on average.

**Interpretation**

Here's what stands out:

- High PE stocks tend to fall about 0.21 more (IPW) during stress periods. The DR estimator shows a smaller effect, but the direction is consistent. Overvalued stocks feel the heat first.
- High beta stocks show a 0.25 deeper drawdown, which is not surprising. They amplify market moves both ways.
- Low-margin companies are hit the hardest, with nearly 0.37 of additional downside. This suggests that profitability cushions truly help during panic phases.

While the magnitudes differ slightly across estimators, the pattern is clear: fragile fundamentals amplify stress. And that's exactly the kind of causal evidence I wanted to see. Not just which stocks correlate with volatility, but which ones actually cause higher risk exposure when the market turns.

Visualizing and Validating the Effects

Once I had the causal estimates, I wanted to see what they actually looked like. Numbers tell you the "what," but visuals often reveal the "how." At this point, the objective wasn't to add more math, but to check whether the intuition from the earlier steps holds up when plotted.

This stage also acts as a sanity check. If the visual patterns contradict the statistical results, something is usually off in the setup. If both align, you can be more confident that the model's logic is sound.

**Plotting the Causal Effects**

The first visualization compares the estimated effects from the two causal models, IPW and DR. This helps check whether both estimators point in the same direction and how strongly each treatment affects the drawdown.
```python
effects_plot = effects_df.melt(id_vars='treatment', value_vars=['ate_ipw', 'ate_dr'],
                               var_name='method', value_name='ate')

plt.figure(figsize=(6, 4))
for m in effects_plot['method'].unique():
    subset = effects_plot[effects_plot['method'] == m]
    plt.bar(subset['treatment'], subset['ate'], alpha=0.7, label=m)
plt.axhline(0, color='black', linewidth=0.8)
plt.ylabel('Estimated Causal Effect on Max Drawdown')
plt.title('Causal Effect Estimates (IPW vs DR)')
plt.legend()
plt.tight_layout()
plt.show()
```

The bar chart captures the story behind all those computations. Each bar represents the estimated causal effect of a financial trait on maximum drawdown. Across the three treatments, the trend is consistent: high PE, high beta, and low margin all deepen downside risk, with low-margin stocks showing the sharpest impact. The difference between the blue and orange bars (IPW and DR) is minor, which is exactly what you want to see; it means both models converge toward the same conclusion.

The takeaway is clear: fundamental fragility doesn't just coincide with volatility. It drives it.

Checking Overlap Between Treated and Control Groups

Before trusting any causal estimate, it's essential to check whether the treated and control samples are comparable. Causal inference assumes overlap, meaning every treated stock has a comparable control stock with similar characteristics, except for the treatment itself. If there's no overlap, the results become meaningless because we're comparing two entirely different worlds.

**Propensity Score Distribution**

To test this, I plotted the propensity scores for each treatment. The score represents the probability of a stock being "treated" (for instance, having a high PE ratio) given its market cap and sector.
```python
for t in ['high_pe', 'high_beta', 'low_margin']:
    df = model_df.dropna(subset=[t, 'log_mcap', 'sector']).copy()
    X = pd.get_dummies(df[['log_mcap', 'sector']], drop_first=True)
    T = df[t]

    # propensity score model per treatment
    ps_model = LogisticRegression(max_iter=500)
    ps_model.fit(X, T)
    ps = ps_model.predict_proba(X)[:, 1]

    plt.figure(figsize=(6, 4))
    plt.hist(ps[T == 1], bins=20, alpha=0.6, label='treated', color='tab:blue')
    plt.hist(ps[T == 0], bins=20, alpha=0.6, label='control', color='tab:orange')
    plt.title(f'Propensity Score Overlap — {t}')
    plt.xlabel('propensity score')
    plt.ylabel('count')
    plt.legend()
    plt.tight_layout()
    plt.show()
```

The histograms show how well the model managed to balance the treated and control groups across the three traits:

- For high PE stocks, there's healthy overlap between the blue and orange bars, which means the weighting process does its job fairly well.
- For high beta, the distributions are more scattered, hinting that these stocks behave differently enough to make perfect matching difficult.
- Low margin companies show the least overlap, which is expected, since unprofitable firms often cluster in specific sectors like biotech or early-stage tech.

While the overlaps aren't perfect, they're sufficient to move forward. The key point is that there's enough shared ground between treated and control groups to make causal comparison meaningful.

**Verifying Covariate Balance**

Even though the visual overlaps looked decent, I wanted a numerical check to confirm that the weighting step did what it was supposed to. The idea is simple: after reweighting, the treated and control groups should look nearly identical in their key characteristics, except for the treatment itself. To test this, I compared the mean difference in log market capitalization between high-beta stocks and their controls, both before and after weighting.
```python
t = 'high_beta'
df = model_df.dropna(subset=[t, 'log_mcap', 'sector']).copy()
X = pd.get_dummies(df[['log_mcap', 'sector']], drop_first=True)
T = df[t]

ps_model = LogisticRegression(max_iter=500)
ps_model.fit(X, T)
ps = ps_model.predict_proba(X)[:, 1]
w = T/ps + (1-T)/(1-ps)

# raw vs. weighted difference in mean log market cap
m_treat_raw = df.loc[T == 1, 'log_mcap'].mean()
m_ctrl_raw = df.loc[T == 0, 'log_mcap'].mean()
m_treat_w = np.average(df['log_mcap'], weights=T*w)
m_ctrl_w = np.average(df['log_mcap'], weights=(1-T)*w)

print('Covariate balance for log_mcap (high_beta)')
print(f'Raw diff: {m_treat_raw - m_ctrl_raw:.4f}')
print(f'Weighted diff: {m_treat_w - m_ctrl_w:.4f}')
```

Before weighting, the average difference in log market cap between high-beta and low-beta stocks was around 0.19, which means the two groups weren't perfectly comparable. After applying the weights, that difference dropped to about 0.0008, nearly zero.

This confirms that the balancing process worked as intended. By reweighting observations according to their propensity scores, the treated and control groups now share almost identical distributions of the control variables. That alignment is what gives the causal estimates credibility. Without it, we would simply be comparing random subsets of the market instead of isolating the effect of specific traits like valuation, volatility, or profitability.

What This Means for Risk Management

After spending hours inside Python, it's easy to forget what all of this connects to: the real market. The value of causal modeling is that it reframes how we think about risk. Instead of assuming or guessing which factors drive volatility, it gives us a measurable way to quantify how much each one truly contributes.

In traditional analysis, drawdowns are treated as outcomes of randomness or sentiment. But once we look through a causal lens, the randomness begins to fade. We start to see structure, and the patterns that repeat when certain traits combine.

A portfolio manager can use this insight to design more resilient allocations. For instance, if low-margin firms consistently amplify downside during stress windows, those positions can be hedged or sized down ahead of similar conditions. If high-beta stocks display asymmetric losses even after accounting for size and sector, they deserve a risk premium or tighter stop rules.

The same principle applies to stress testing. Rather than simulating arbitrary shocks, we can construct counterfactual scenarios (illustrated in the sketch below):

- What would have happened if the portfolio held fewer high-PE names during a sell-off?
- How much of the observed risk was structural, and how much was avoidable?

Causal models turn these questions into measurable answers. They allow risk managers to move from intuition to evidence, from correlation to attribution.
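As an illustration of the first question, here is a minimal, hypothetical sketch of how the estimated effects could feed such a counterfactual. It reuses model_df and effects_df from above, looks at an equal-weight view of the COVID window, and compares the observed average drawdown against two simplistic "what-if" versions: dropping the high-PE names entirely, and keeping them but subtracting the DR estimate of the high-PE effect. Both adjustments are simplifying assumptions for illustration, not part of the original workflow.

```python
# Hypothetical counterfactual sketch (assumes model_df and effects_df from earlier steps)
# "What if the portfolio held no high-PE names during the COVID window?"
window = model_df[model_df['window_id'] == 1]            # COVID stress window
observed = window['max_dd'].mean()                       # equal-weight observed avg drawdown

# naive counterfactual 1: simply exclude high-PE names
trimmed = window[window['high_pe'] == 0]['max_dd'].mean()

# naive counterfactual 2: keep the names but remove the estimated high-PE effect
ate_high_pe = effects_df.loc[effects_df['treatment'] == 'high_pe', 'ate_dr'].iloc[0]
adjusted = (window['max_dd'] - window['high_pe'] * ate_high_pe).mean()

print(f'Observed avg drawdown:            {observed:.3f}')
print(f'Without high-PE names:            {trimmed:.3f}')
print(f'With high-PE effect removed (DR): {adjusted:.3f}')
```

Neither number is a forecast; the point is simply to translate the causal estimates into portfolio-level terms, which is the kind of attribution described above.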
Closing Thoughts

This entire workflow, from raw data to counterfactual reasoning, was never meant to predict the next crash or rally. It was meant to understand why portfolios behave the way they do when markets turn unstable. By combining fundamental traits, stress windows, and causal estimators, we built a framework that explains the structural drivers of risk rather than chasing signals. Each piece of code served a purpose: defining clean treatments, balancing controls, estimating effects, and validating the logic through overlap and weighting checks.

Causal inference will not replace forecasting or technical analysis. But it adds a new dimension, one focused on understanding rather than prediction. When markets fall, it helps separate what was inevitable from what was amplified by exposure to specific traits.

Markets don't just move randomly. They react to structure, and causal AI helps us see that structure clearly.