What is a Time Series?
A time series is a series of data points ordered in time. A time series adds an explicit order dependence between observations: a time dimension. In a typical machine learning dataset, the observations are treated as equally informative when the future is being predicted. In a time series, the order of the observations provides an additional source of information that should be analyzed and used in the prediction process.
Time series are typically assumed to be generated at regularly spaced intervals of time (e.g. daily temperature), and so are called regular time series. But the data in a time series does not have to arrive at regular intervals; in that case it is called an irregular time series. In an irregular time series the data still follows a temporal sequence, but the measurements may not occur at regular intervals; for example, the data might be generated in bursts or with varying gaps between observations. Deposits into or withdrawals from an ATM are an example of an irregular time series.
Time series can have one or more variables that change over time. If only one variable varies over time, we call it a univariate time series; if there is more than one variable, it is called a multivariate time series. A tri-axial accelerometer is an example of the latter: there are three acceleration variables, one for each axis (x, y, z), and they vary simultaneously over time.
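As a small illustrative sketch (assuming the pandas and numpy libraries; the values and column names are made up), a regular univariate series can be represented as values indexed by equally spaced timestamps, and a multivariate series as several such columns sharing the same index:

import numpy as np
import pandas as pd

# a regular, univariate time series: one temperature reading per day
idx = pd.date_range("2023-01-01", periods=7, freq="D")
temperature = pd.Series([3.1, 2.8, 4.0, 5.2, 4.7, 3.9, 4.4], index=idx)

# a multivariate time series: three accelerometer axes sampled at the same instants
accel = pd.DataFrame(np.random.randn(7, 3), index=idx, columns=["acc_x", "acc_y", "acc_z"])

print(temperature)
print(accel.head())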
Some major purposes of the statistical analysis of time series are:
To understand the variability of the time series.
To identify the regular and irregular oscillations of the time series.
To describe the characteristics of these oscillations.
To understand the physical processes that give rise to each of these oscillations.
The Stationary and Markov Property
Stationary Property
In the most intuitive sense, stationarity means that the statistical properties of the process generating a time series do not change over time. It does not mean that the series itself does not change over time, only that the way it changes does not itself change over time. A rough algebraic analogue is a linear function rather than a constant one: the value of a linear function changes as 𝒙 grows, but the way it changes remains constant; it has a constant slope, a single value that captures that rate of change.
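As a minimal sketch of how stationarity is often checked in practice (assuming the statsmodels library and a pandas Series named series holding the observations), the augmented Dickey-Fuller test is one common tool:

from statsmodels.tsa.stattools import adfuller

# null hypothesis of the ADF test: the series has a unit root (i.e. is non-stationary)
adf_stat, p_value, used_lag, n_obs, critical_values, _ = adfuller(series)
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the series looks stationary.")
else:
    print("Fail to reject the null hypothesis: the series may be non-stationary.")

A low p-value is evidence against a unit root; it does not by itself prove that all statistical properties are constant over time.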
Markov Property
There exist some well-known families of random processes: Gaussian processes, Poisson processes, autoregressive models, moving-average models, Markov chains and others. Each of these particular cases has specific properties that allow us to better study and understand it.
The Markov property makes the study of a random process much easier. In a very informal way, the Markov property says, for a random process, that if we know the value taken by the process at a given time, we will not get any additional information about the future behaviour of the process by gathering more knowledge about the past. Stated in slightly more mathematical terms, at any given time the conditional distribution of future states of the process, given the present and past states, depends only on the present state and not at all on the past states (the memoryless property). A random process with the Markov property is called a Markov process.
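Written out for a discrete-time process, the property says that for any time t and any value x,

P(X_{t+1} ≤ x | X_t, X_{t−1}, …, X_0) = P(X_{t+1} ≤ x | X_t),

i.e. conditioning on the whole history gives the same distribution for the next state as conditioning on the present state alone.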
Autocovariance and Autocorrelation functions
Autocovariance function
Given a stochastic process, the autocovariance is a function that gives the covariance of the process with itself at pairs of time points, and it is closely related to the autocorrelation of the process in question. In a time series, the autocovariance measures the covariance between the present value (x_t) and past values such as (x_{t−1}), (x_{t−2}) and so on, and it is denoted by γ. For a stationary time series the mean μ does not change over time, so the autocovariance at lag k becomes:
γ(k) = Cov(x_t, x_{t−k}) = E[(x_t − μ)(x_{t−k} − μ)]
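As a hedged sketch (assuming statsmodels and numpy, and a pandas Series named series), the sample autocovariance can be computed directly:

import numpy as np
from statsmodels.tsa.stattools import acovf

# sample autocovariances for lags 0..10 (lag 0 is simply the variance)
gamma = acovf(series, nlag=10, fft=True)
print(gamma)

# the same quantity computed by hand for lag k, using acovf's default normalization by n
k = 1
x = np.asarray(series, dtype=float)
mu = x.mean()
gamma_k = ((x[k:] - mu) * (x[:-k] - mu)).sum() / len(x)
print(gamma_k)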
Autocorrelation function
In time series we deal with variables observed over time, such as the sales of a company over the years (or future temperature, ozone levels, etc.). When predicting the future sales of a company, the recent past sales typically influence the forecast more than older ones. Finding the correlation between the present value (x_t) and the previous value (x_{t−1}), then between (x_t) and (x_{t−2}), (x_{t−3}) and so on, i.e. the correlation within the same column, is what autocorrelation captures.
Autocorrelation can be defined as the correlation between a variable and lagged values of itself (in our case the correlation between x_t and x_{t−1}, x_t and x_{t−2}, etc.), and it is denoted by ρ.
The autocorrelation function (ACF) of a time series is defined as the autocovariance at lag k normalized by the variance:
ρ(k) = γ(k) / γ(0) = Cov(x_t, x_{t−k}) / Var(x_t)
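A minimal sketch of computing and plotting the ACF (assuming statsmodels and matplotlib, and a pandas Series named series):

import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acf
from statsmodels.graphics.tsaplots import plot_acf

# sample autocorrelations for lags 0..20 (rho(0) is always 1)
rho = acf(series, nlags=20)
print(rho[:5])

# correlogram: the ACF as a bar plot with approximate confidence bands
plot_acf(series, lags=20)
plt.show()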
ARIMA
The Autoregressive Integrated Moving Average (ARIMA) model uses time-series data and statistical analysis to interpret the data and make future predictions. The ARIMA model aims to explain the data using its own past values and uses a form of linear regression to make predictions.
Understanding the ARIMA Model
The “AR” in ARIMA stands for autoregression, indicating that the model uses the dependent relationship between current data and its past values. In other words, it shows that the data is regressed on its past values.
This technique is similar to the moving average technique, except that the forecast of the next observation is based on a regression equation that uses past observations (and not past errors as with the MA model). Just like with MA-models, the order of the AR model also indicates how many previous observations are used for the forecast.
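As a hedged sketch (assuming statsmodels and a stationary pandas Series named series; the order of 2 is chosen only for illustration), an AR(p) model regresses the series on its own last p observations:

from statsmodels.tsa.ar_model import AutoReg

# AR(2): regress x_t on x_{t-1} and x_{t-2}
ar_result = AutoReg(series, lags=2).fit()
print(ar_result.params)             # intercept and the two lag coefficients
print(ar_result.forecast(steps=5))  # forecast the next 5 observations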
The “I” stands for integrated, which means that the data has been differenced to make it stationary. Stationary data here refers to time-series data that has been made stationary by subtracting from each observation the value that precedes it, possibly more than once.
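For example, with a pandas Series named series, first- and second-order differencing might look like this (a sketch, not a prescription for how many differences to take):

# first difference: y_t = x_t - x_{t-1}
diff1 = series.diff().dropna()

# second difference, if the first-differenced series is still not stationary
diff2 = series.diff().diff().dropna()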
The “MA” stands for moving average model, indicating that the forecast or outcome of the model depends linearly on past forecast errors; in other words, the errors in forecasting are linear functions of past errors. Note that moving average models are different from the statistical moving averages used for smoothing.
With this time-series technique, the next observation (i.e. the forecast) is based on a weighted combination of one or more past errors. Mathematically, in an MA model a forecast comes from a regression-like equation on past errors (also called ‘noise’). An MA(1) model is a first-order MA model in which the forecast is based on only the last error, in an MA(2) model it is based on the last two errors, and in an MA(n) model the forecast is based on the last n errors.
Each of the AR, I, and MA components is included in the model as a parameter. The parameters are assigned specific integer values that indicate the type of ARIMA model: in the ARIMA(p, d, q) notation, p is the autoregressive order, d is the degree of differencing, and q is the moving-average order.
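A minimal sketch of fitting such a model (assuming statsmodels and a pandas Series named series with a regular date index; the order (1, 1, 1) is chosen purely for illustration):

from statsmodels.tsa.arima.model import ARIMA

# ARIMA(p=1, d=1, q=1): one AR lag, one difference, one MA lag
model = ARIMA(series, order=(1, 1, 1))
result = model.fit()
print(result.summary())

# forecast the next 10 periods
print(result.forecast(steps=10))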
Fitting Time Series Model to Data
Similarly to fitting a parametric distribution to data, it is also possible to fit parametric time-series models to historical data to create forecasts. The main difference is that when fitting a distribution to data, we assume the data points are independent and identically distributed; in plain words, the data points come from the same distribution and are not correlated. For example, if we have measurements of the heights of 100 people, each measurement is assumed to come from a randomly selected individual drawn from a single population distribution of heights. In contrast, time-series data is by nature sequential, as the value in the next period is linked to that of previous periods. For example, the daily price of a commodity is highly dependent on its price in prior days. This type of dependency is called autocorrelation or serial correlation, and it must be incorporated in a time-series fit.
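One common way to confirm that such serial correlation is actually present, as a hedged sketch assuming statsmodels and a pandas Series named series, is the Ljung-Box test:

from statsmodels.stats.diagnostic import acorr_ljungbox

# null hypothesis: no autocorrelation up to the chosen lag
lb = acorr_ljungbox(series, lags=[10])
print(lb)  # small p-values suggest serial correlation that a time-series model should capture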
GARCH models for measuring volatility
GARCH models describe financial markets in which volatility can change, becoming more volatile during periods of financial crises or world events and less volatile during periods of relative calm and steady economic growth.
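A minimal sketch of fitting a GARCH(1, 1) model (assuming the third-party arch package and a series of percentage returns named returns; the specification is only illustrative):

from arch import arch_model

# GARCH(1, 1): conditional variance driven by the last squared shock and the last conditional variance
am = arch_model(returns, vol="Garch", p=1, q=1)
res = am.fit(disp="off")
print(res.summary())
print(res.conditional_volatility.tail())  # estimated volatility over time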
Conclusion
Time series analysis is a must for every company that wants to understand seasonality, cyclicality, trend and randomness in its sales and other attributes. In the coming blogs we will learn more about how to perform time series analysis with R, Python and Hadoop.