by Samuel Forster, Product Manager & Daniel Leveau, VP Investor Solutions
Volatility is a central feature of financial markets, affecting a wide range of investors’ daily activities, including risk management, portfolio construction and option pricing. To improve their investment decisions, investors are spending considerable time and effort finding new ways to measure and estimate volatility accurately. Incorporating intraday data in your quantitative analysis will enable you to extract more relevant information.
This blog post explores the advantages of applying intraday data to volatility forecasts.
Why intraday data?
Only a fraction of the huge amount of information produced in financial markets will inform investment decisions. When assessing and estimating risk, end-of-day data is the standard. However, with the objective of extracting a maximal amount of information from available price movements, investors are increasingly using intraday data. This is expected to increase the accuracy of estimates and, in turn, produce better long-term investment results.
To illustrate the impact of different data frequencies, the graph below shows a snapshot of price movements for the March 2022 S&P 500 futures contract at observation frequencies ranging from end-of-day to 1-minute bars. Sampling every 10 minutes yields 138 data points per day [1], compared with 1 data point per day for end-of-day close prices. The green area in the graph indicates when the underlying US cash equity market is open [2], and the red area indicates the hour when the S&P 500 futures contract is closed for trading [3].
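The data-point count is simple arithmetic: a session of H hours sampled every m minutes yields 60H/m bars. Below is a minimal Python sketch, assuming a continuous 23-hour session as described in footnote [1]; the helper name `bars_per_day` is ours, chosen for illustration.

```python
def bars_per_day(session_hours, bar_minutes):
    """Number of complete bars of `bar_minutes` minutes in a session of `session_hours` hours."""
    return int(session_hours * 60 // bar_minutes)

# ES futures trade 23 hours per day (footnote [1])
for minutes in (1, 5, 10, 30, 60):
    print(f"{minutes:>2}-minute bars: {bars_per_day(23, minutes):>4} per day")
```

At 10 minutes this reproduces the 138 points quoted above; the same session sampled at 1 minute gives 1,380 points per day.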
Intraday distribution of volatility
To further emphasize that relevant information may be missed when focusing solely on end-of-day data, we have broken down volatility by intraday interval. The graph below shows the proportion of the realized variance of S&P 500 futures contracts over the previous 15 years attributable to each 30-minute interval of the day.
The graph shows that variance changes throughout the trading day, with the largest price movements around the open and close of the underlying cash market. Notably, 37% of the variance occurs outside official US stock market trading hours. This is partly because many price-moving events occur outside those hours, with a larger proportion of volatility arising in the hours before the US market open than in the hours after its close. This could be explained by trading activity in Europe and Asia.
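A breakdown like this can be computed by summing squared 30-minute log returns per time-of-day bucket across all days and dividing by the grand total. The sketch below is a minimal illustration, not the authors' pipeline: it assumes bucket start times are already expressed in Eastern Time, and the function names and toy returns are hypothetical.

```python
from collections import defaultdict
from datetime import time

# US cash equity session in Eastern Time (footnote [2])
CASH_OPEN, CASH_CLOSE = time(9, 30), time(16, 0)

def variance_share_by_bucket(bars):
    """bars: iterable of (bucket_start, log_return) pairs, bucket_start a datetime.time in ET.
    Returns {bucket_start: share of the total sum of squared returns}."""
    sums = defaultdict(float)
    for start, r in bars:
        sums[start] += r * r
    total = sum(sums.values())
    return {start: s / total for start, s in sorted(sums.items())}

def share_outside_cash_hours(shares):
    """Total share of buckets starting before the cash open or at/after the cash close."""
    return sum(s for start, s in shares.items()
               if start < CASH_OPEN or start >= CASH_CLOSE)

# Toy input (hypothetical returns for one day, four buckets)
bars = [(time(8, 0), 0.002), (time(9, 30), 0.004),
        (time(15, 30), 0.003), (time(16, 0), 0.001)]
shares = variance_share_by_bucket(bars)
print(f"{share_outside_cash_hours(shares):.3f}")  # → 0.167, i.e. 5/30 of the variance
```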
What is the true measure of volatility?
Before presenting various estimation methods and measuring their respective accuracies, we first need to define precisely what is meant by volatility.
Following the work of researchers such as Andersen and Bollerslev [4], we assume that the log returns of assets can be described by an expression of the form $r_t = \sigma_t z_t$, where $\sigma_t$ is the unobservable, time-varying volatility of the underlying asset and $z_t$ is a source of random noise. It is $\sigma_t$ that we will forecast.
A trading day can be split into time intervals of, for example, 1 hour or 5 minutes, and log returns can be calculated over each of these intervals. For the 30-minute interval, we calculate the returns from 9:00-9:30, 9:30-10:00, and so on. Within this framework, as the frequency at which returns are sampled increases, and assuming certain conditions hold, the sum of squared returns (the realized variance) converges to the true daily variance $\sigma_t^2$. In symbols, letting $r_{t,k,N}$ denote the log return over bucket $k$ on day $t$, which has been split into $N$ buckets, we have:

$$\sum_{k=1}^{N} r_{t,k,N}^{2} \;\longrightarrow\; \sigma_t^{2} \quad \text{as } N \to \infty$$
It follows that a good proxy for the unobservable volatility of an asset can be obtained by considering returns over highly refined time buckets. Two guiding principles are:

1. Higher-frequency returns should provide better estimates.
2. Returns should not display strong autocorrelation.
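Principle 1 can be illustrated with a small simulation. The sketch assumes an idealized version of the model above: constant volatility within the day, i.i.d. Gaussian noise and no microstructure effects, so each of the N buckets has variance σ_t²/N. Under these assumptions the realized variance is unbiased for σ_t² and its relative standard deviation shrinks like √(2/N). The function names and the 1% daily volatility are hypothetical.

```python
import random
import statistics

def realized_variance(returns):
    """Sum of squared intraday log returns for one day."""
    return sum(r * r for r in returns)

def simulate_day(sigma, n_buckets, rng):
    """Idealized day: n equal buckets of i.i.d. Gaussian returns whose variances sum to sigma**2."""
    scale = sigma / n_buckets ** 0.5
    return [rng.gauss(0.0, scale) for _ in range(n_buckets)]

def rv_accuracy(sigma, n_buckets, n_days, rng):
    """Mean and standard deviation of realized variance across simulated days, relative to sigma**2."""
    estimates = [realized_variance(simulate_day(sigma, n_buckets, rng)) / sigma**2
                 for _ in range(n_days)]
    return statistics.fmean(estimates), statistics.stdev(estimates)

rng = random.Random(42)
# 1 bucket = a single daily return; 23, 276, 1380 = hourly, 5-minute, 1-minute bars over 23 hours
for n in (1, 23, 276, 1380):
    mean, sd = rv_accuracy(0.01, n, 500, rng)
    print(f"N={n:>4}: mean={mean:.3f}  relative sd={sd:.3f}  (theory {(2 / n) ** 0.5:.3f})")
```

Because the simulation contains no bid-ask bounce, it shows only the benefit of higher frequency; the cost described by principle 2 is absent by construction.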
Principle 1 suggests that the obvious choice of proxy for realized volatility would be 1-minute returns, as this is the highest frequency used in our analysis. However, due to the ‘bid-ask bounce’, where trade prices oscillate between fixed bid and ask levels, there is considerable negative autocorrelation at the 1-minute frequency, thus violating principle 2.
How do these two principles hold up in practice? The chart below shows the lag-1 autocorrelation of returns for S&P 500, 10Y Treasury Note and WTI Crude Oil futures. In each case, the magnitude of the autocorrelation is higher at the 1-minute frequency than at the 5-minute frequency, considerably so for the 10Y Treasury Note futures. Based on these observations, we take the volatility computed from 5-minute returns as the proxy for the true volatility, and we use the 5-minute frequency as the reference point for measuring the accuracy of the various volatility estimates.
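The effect of the bid-ask bounce on the two sampling frequencies can be reproduced with a stylized Roll-type model: trades print at an efficient log price plus or minus half the spread, with buys and sells equally likely and independent. Consecutive observed returns then share one bounce term, giving a lag-1 autocorrelation of -h²/(kσ² + 2h²) for returns aggregated over k bars, where σ is the per-bar efficient volatility and h the half-spread, so aggregating 1-minute returns into 5-minute returns dilutes it. All parameters and names below are hypothetical, chosen for illustration rather than fitted to the futures in the chart.

```python
import random
import statistics

def simulate_trade_returns(n, sigma_eff, half_spread, rng):
    """Stylized Roll-type model: trade price = efficient log price +/- half the spread.
    Returns n consecutive 1-bar log returns of the observed trade prices."""
    efficient = 0.0
    prev_trade = half_spread * rng.choice((-1, 1))
    returns = []
    for _ in range(n):
        efficient += rng.gauss(0.0, sigma_eff)  # efficient price follows a random walk
        trade = efficient + half_spread * rng.choice((-1, 1))  # buy or sell at random
        returns.append(trade - prev_trade)
        prev_trade = trade
    return returns

def aggregate(returns, k):
    """Sum non-overlapping blocks of k consecutive log returns (a partial tail is dropped)."""
    return [sum(returns[i:i + k]) for i in range(0, len(returns) - k + 1, k)]

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation."""
    mean = statistics.fmean(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den

rng = random.Random(7)
one_min = simulate_trade_returns(50_000, sigma_eff=1e-4, half_spread=1e-4, rng=rng)
five_min = aggregate(one_min, 5)
print(f"1-minute lag-1 autocorrelation: {lag1_autocorrelation(one_min):+.3f} (theory {-1/3:+.3f})")
print(f"5-minute lag-1 autocorrelation: {lag1_autocorrelation(five_min):+.3f} (theory {-1/7:+.3f})")
```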
Methodologies to estimate volatility
Below we present 14 volatility estimates, calculated from the returns of S&P 500 futures from 2007 to 2022. The methodologies can be split into three broad segments and seven specific categories.
How to best estimate volatility
To evaluate the various estimates, we use realized volatility as proxied by the 5-minute measure. The chart below shows this realized volatility, annualized, over the full sample period. As expected, the largest spikes in volatility occur during major sell-offs, such as in March 2020 and during the Global Financial Crisis in 2008.
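A daily realized volatility of this kind is the square root of the sum of squared 5-minute log returns, scaled by √252 to annualize. The text does not state the annualization factor or how the return across the daily 17:00-18:00 break is treated, so both are assumptions here: the sketch uses 252 trading days per year and simply squares whatever returns it is given. Function names are illustrative.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # common convention; assumed, not stated in the text

def daily_realized_volatility(five_min_returns):
    """Square root of the sum of squared 5-minute log returns for one day."""
    return math.sqrt(sum(r * r for r in five_min_returns))

def annualize(daily_vol, periods_per_year=TRADING_DAYS_PER_YEAR):
    """Scale a daily volatility to annual terms using the square-root-of-time rule."""
    return daily_vol * math.sqrt(periods_per_year)

# Toy day: 100 five-minute returns of 0.1% each, i.e. 1% daily volatility
day = [0.001] * 100
print(f"{annualize(daily_realized_volatility(day)):.4f}")  # → 0.1587
```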
To assess the accuracy of the various estimation methodologies, we use standard techniques: comparisons of mean squared error and mean absolute error, as well as regression models. Interestingly, the ranking of the methods is very similar under each of these measures.
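The error metrics are straightforward to compute. For the regression, a common choice in the volatility-forecasting literature is a Mincer-Zarnowitz regression of the realized value on the forecast, where an intercept near 0 and a slope near 1 indicate an unbiased forecast; the text does not say which regression the authors ran, so this is an assumption. The sketch uses only the standard library and computes ordinary least squares by hand; the function and key names are hypothetical.

```python
import statistics

def evaluate_forecasts(forecasts, realized):
    """MSE, MAE and OLS coefficients of realized = a + b * forecast + error."""
    if len(forecasts) != len(realized) or len(forecasts) < 2:
        raise ValueError("need two equally long series with at least two points")
    errors = [f - r for f, r in zip(forecasts, realized)]
    mse = statistics.fmean([e * e for e in errors])
    mae = statistics.fmean([abs(e) for e in errors])
    # Ordinary least squares by hand (Mincer-Zarnowitz-style regression)
    mx, my = statistics.fmean(forecasts), statistics.fmean(realized)
    sxx = sum((x - mx) ** 2 for x in forecasts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(forecasts, realized))
    slope = sxy / sxx
    intercept = my - slope * mx
    return {"mse": mse, "mae": mae, "intercept": intercept, "slope": slope}

# Toy example (hypothetical annualized volatilities)
print(evaluate_forecasts([0.10, 0.20, 0.30], [0.12, 0.18, 0.33]))
```

Whether forecasts are compared in volatility or variance units is not specified in the text; the sketch treats both series as being in the same units.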
Below are the results for mean absolute error. The high-frequency intraday models all outperform the end-of-day and GARCH models by a considerable margin. The end-of-day methodologies ‘Constant’ and ‘Expanding Window’ are, as expected, rather poor. EWMA improves upon simple rolling windows, GARCH(1,1) offers a better estimation method still, and GJR-GARCH improves accuracy further.
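For readers unfamiliar with the end-of-day models named above, the sketch below shows their one-step-ahead variance recursions on daily log returns, with drift ignored as in footnote [4]. The exact specifications, initialization and fitted parameters behind the results are not given in the text, so everything here is an assumption: λ = 0.94 is the value popularized by RiskMetrics for daily data, the GARCH parameters are hypothetical, and parameter estimation (usually by maximum likelihood) is omitted. Setting `gamma` to 0 reduces GJR-GARCH to GARCH(1,1).

```python
def rolling_variance(returns, window):
    """Mean of the last `window` squared daily returns (zero-mean assumption)."""
    recent = returns[-window:]
    return sum(r * r for r in recent) / len(recent)

def ewma_variance(returns, lam=0.94, init_var=None):
    """EWMA recursion: s2 <- lam * s2 + (1 - lam) * r**2, applied to each return in turn."""
    s2 = returns[0] ** 2 if init_var is None else init_var
    for r in returns:
        s2 = lam * s2 + (1 - lam) * r * r
    return s2

def gjr_garch_variance(returns, omega, alpha, beta, gamma=0.0, init_var=None):
    """GJR-GARCH(1,1) recursion; gamma=0 gives plain GARCH(1,1):
    s2 <- omega + (alpha + gamma * [r < 0]) * r**2 + beta * s2."""
    if init_var is None:
        # Unconditional variance, assuming symmetric innovations and alpha + gamma/2 + beta < 1
        init_var = omega / (1 - alpha - gamma / 2 - beta)
    s2 = init_var
    for r in returns:
        asym = gamma if r < 0 else 0.0  # extra weight on negative returns (leverage effect)
        s2 = omega + (alpha + asym) * r * r + beta * s2
    return s2

# Hypothetical parameters: a down day raises the GJR forecast more than an equal up day
params = dict(omega=1e-6, alpha=0.05, beta=0.90, gamma=0.08)
print(gjr_garch_variance([-0.02], **params), gjr_garch_variance([0.02], **params))
```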
The ranking of the results is almost exactly what would be predicted from the assumptions that more sophisticated end-of-day models produce better estimates, and that intraday models extract more information and hence produce better results.
Furthermore, the ranking of the intraday models also agrees with the theory. Forecasts based on higher-frequency data are generally more accurate, but the negative autocorrelation at the highest available frequency has an adverse impact on the results.
Frequency matters
What implications do these results have for investors? Many would agree that it is not prudent to proclaim any quantitative method universally superior. Any chosen approach needs to fit the investor’s specific situation and investment objective. However, our results show that the accuracy of volatility estimates can be improved by extracting information from intraday data. This holds particularly true when comparing the intraday results with traditional end-of-day methods.
However, making efficient use of intraday data is often easier said than done. Investors need both access to high-quality, operationally ready data and cutting-edge technological infrastructure to take full advantage of data-driven investment processes.
References
[1] ES futures trade 23 hours per day, giving 6 × 23 = 138 data points at the 10-minute frequency.
[2] Cash market hours are 9:30-16:00 Eastern Time (ET), labeled ‘America/New_York’ in the chart.
[3] 17:00-18:00 Eastern Time (ET)
[4] Various academic studies show that ignoring drift may lead to more accurate volatility estimates due to the inherent difficulty in estimating expected returns
This document is not, and should not be construed as financial advice or an invitation to purchase financial products. It is provided for information purposes only and is subject to the terms and conditions of our disclaimer which can be accessed at: https://www.sigtech.com/legal/generaldisclaimer