Granger Causality

Testing Granger Causality in Gold Prices

Given two time series, $Y_t$ and $X_t$, a Granger causality test asks whether past values $X_{t-\gamma}$ improve predictions of $Y_t$ beyond what past values of $Y_t$ already provide. Despite the name, the test measures incremental predictive information rather than causal mechanism. This project tests whether that criterion is enough to select inputs for a useful gold-price model.

Source Code

The source code is on GitHub.

Preliminaries

The candidate series are gold prices, oil prices, unemployment, interest rates, and market capitalization. I first test their stationarity and cointegration, then run Granger tests against gold-price changes. Finally, I compare a VAR fitted only on selected variables with one fitted on the full stationary feature set. The COVID period remains in the data because the validation should include abrupt macroeconomic changes.

Exploratory Analysis

I join the series by date and inspect the missing values created by different publication schedules.
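As a minimal sketch of that join, assuming each series is a pandas `Series` indexed by date: the two series below are synthetic stand-ins (a daily one and a monthly one), not the project's data, and the names `gold` and `unemployment` are used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the downloaded series: one daily, one monthly.
daily_idx = pd.date_range("2020-01-01", periods=60, freq="D")
monthly_idx = pd.date_range("2020-01-01", periods=2, freq="MS")
gold = pd.Series(np.linspace(1500.0, 1560.0, 60), index=daily_idx, name="gold")
unemployment = pd.Series([3.6, 3.5], index=monthly_idx, name="unemployment")

# An outer join on the date index keeps every date that appears in any series.
joined = pd.concat([gold, unemployment], axis=1, join="outer").sort_index()

# Count the gaps each series has on the combined calendar.
missing = joined.isna().sum()
print(missing)
```

The lower-frequency series is missing on most rows of the combined calendar, which is the pattern a difference in publication schedules produces.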

I forward-fill each series from its previous observation.
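A small sketch of the forward fill, assuming the joined data is a pandas `DataFrame`; the values are made up for illustration. Forward filling only carries past observations forward, so it does not leak future values into earlier rows.

```python
import numpy as np
import pandas as pd

# Illustrative frame with gaps; the numbers are invented for the example.
idx = pd.date_range("2020-01-01", periods=6, freq="D")
frame = pd.DataFrame(
    {
        "gold": [1500.0, 1502.0, np.nan, 1505.0, 1504.0, 1507.0],
        "unemployment": [3.6, np.nan, np.nan, np.nan, 3.5, np.nan],
    },
    index=idx,
)

# Replace each missing value with the most recent earlier observation.
filled = frame.ffill()

# Rows before a series' first observation have nothing to carry forward,
# so drop any that remain incomplete.
filled = filled.dropna()
print(filled)
```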

The aligned series can then be plotted on a common timeline.

Only unemployment appears plausibly stationary in levels. A VAR and Granger test require stationary inputs, so I use the augmented Dickey-Fuller test. Its null hypothesis is that a series has a unit root, while the alternative is stationarity.

At $\alpha = 0.05$, only unemployment rejects the unit-root null in levels. For each remaining series, I find the minimum order of differencing $d$ needed to reach covariance stationarity, which makes the series $I(d)$. I test both first and second differences.

After first differencing, the other series reject the unit-root null at $\alpha = 0.05$. Unemployment remains in levels as an $I(0)$ series.

The next step tests cointegration among the $I(1)$ variables. Cointegrated series can share a stable long-run relationship even when each series is nonstationary on its own.

I use the Johansen test because it can estimate the cointegration rank across several time series. The test runs on the level data with one lag. Its null hypotheses concern the number of cointegrating relationships in the system.

The result suggests at least one long-run relationship among the level series. A Johansen test applies to the system as a whole, so it does not by itself identify oil as the variable responsible or establish a pairwise causal relationship.

Before the Granger tests, I inspect the distribution of each stationary series with the Anderson-Darling normality test.
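The comparison of statistic against critical values can be sketched with `scipy.stats.anderson`. The sample is drawn from a Laplace distribution purely as a heavy-tailed stand-in for a differenced financial series; it is not the project's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
changes = rng.laplace(size=2000)  # heavy-tailed synthetic sample

result = stats.anderson(changes, dist="norm")

# Normality is rejected at a given level when the statistic exceeds
# the critical value reported for that level.
rejects = {
    float(level): bool(result.statistic > crit)
    for level, crit in zip(result.significance_level, result.critical_values)
}
print(f"A^2 = {result.statistic:.2f}")
print(rejects)
```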

Each test statistic exceeds its critical value, so the series reject the normality null at the reported levels. The plots below show the departures more directly.

The histograms are roughly bell-shaped, but skew and kurtosis differ from a normal distribution. I test both moments separately below.
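A minimal sketch of testing the two moments separately, assuming SciPy's `skewtest` and `kurtosistest` are acceptable stand-ins for whatever tests the project used. The sample is again synthetic Laplace noise, which is symmetric but heavy-tailed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
changes = rng.laplace(size=2000)  # symmetric, heavy-tailed synthetic sample

# Each test has the null that the moment matches a normal distribution.
skew_stat, skew_p = stats.skewtest(changes)
kurt_stat, kurt_p = stats.kurtosistest(changes)

# Fisher definition: a normal distribution has excess kurtosis 0.
excess_kurtosis = stats.kurtosis(changes)

print(f"skewness test p-value: {skew_p:.4f}")
print(f"kurtosis test p-value: {kurt_p:.4g}")
print(f"excess kurtosis: {excess_kurtosis:.2f}")
```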

High kurtosis means more probability mass in the center and tails relative to a normal distribution. In financial changes, it often reflects many small moves interrupted by rare large ones. That departure matters because a Gaussian VAR error assumption will understate tail behavior.

Granger Tests and Forecasting

I now test whether each stationary candidate series adds predictive information for first-differenced gold. The null hypothesis is that past values of $X_t$ do not improve predictions of $Y_t$, where $Y_t$ is the change in gold price.

Unemployment rejects the null at $\alpha = 0.05$ for lags 1 and 2. First-differenced oil is borderline at the same lags. Interest-rate and market-capitalization changes fail to reject the null. In plain language, past unemployment adds predictive information for the next gold-price change in this sample. That result does not show that unemployment causes gold prices to move.

I fit a VAR and evaluate whether it predicts the direction of the next gold-price change. A rolling window produces 100 one-step-ahead forecasts, scored as correct or incorrect by sign.

The selected-variable model predicts the correct direction only 41% of the time, below a coin-flip baseline. The Granger-selected inputs therefore do not produce a useful directional model in this validation window.

For comparison, I repeat the same rolling evaluation with a VAR fitted on all of the stationary series rather than only the Granger-selected variables.

The full stationary feature set performs better despite including variables that failed the individual Granger tests. In this experiment, Granger significance alone is a poor feature-selection rule for out-of-sample directional forecasting.

References

Data Sources

https://fred.stlouisfed.org/series/