Time Series Forecasting
Origins and Unfolding
Part 1 : The History & Basic Intuition
Time. An indefinite continued progress of existence and events that occur in apparently irreversible succession from the past, through the present and into the future. Time is a component quantity of various measurements used to sequence events…
Time is a pretty interesting dimension. Irrespective of who you are, it defines you and your every action. Even the number of completed tasks on your to-do list can constitute a time series, one which you never bothered to keep, Once Upon A Time!
I will be writing a series of articles on Time Series Forecasting, of which this is Part 1, to delve deep into the core concepts of the subject, from its inception through to modelling. I will present my views in as plain a fashion as possible. If you feel I went wrong somewhere, please feel free to leave a comment. I would be happy to consider your opinion.
Time has been there since... god knows when, but it has always been there, and people with a scientific urge have long leveraged it, scribbling the observations from their experiments against it.
A few hundred years later came Tycho Brahe [1580s], also called astronomy's first true observer. He built a Danish observatory, from where he measured the positions of the planets and stars.
The field has since evolved significantly. Notable advances in Time Series Analysis were made by G. U. Yule in the 1920s. Yule is also known for earlier statistical contributions such as Yule's Coefficient of Association and Yule's Paradox [later to be known as Simpson's Paradox → opposite conclusions can be derived from the same data depending on how the data was divided in the first place → Causality Consideration], among many others.
G. U. Yule was offered a “demonstrator” post by Karl Pearson at University College London in 1892. Karl Pearson is the person behind the χ² test [Chi-Square Test] and Pearson correlation, which we will come across in later articles.
Herman Wold is well known for the “Wold decomposition”, which came to light in 1938 and separates a stationary time series into a deterministic component and a random component.
“An Introduction to the Theory of Statistics”: the eleventh edition was the first to be jointly undertaken by G. U. Yule and M. G. Kendall. Kendall correlation is named after the latter.
In 1970, “Time Series Analysis” by G. E. P. Box and G. M. Jenkins came out, containing the full modelling procedure for individual series: specification, estimation, diagnostics and forecasting. Then, as time progressed, models such as multivariate ARMA, VAR, ARIMA and SARIMA came into wide use. Today even machine learning is being used to perform time series forecasting.
If some of the technical terms above were confusing, that's completely fine. We will go through each one in detail, and much more, in the articles to come.
Time Series Analysis can prove to be quite an important weapon in your data science arsenal. If I were to summarise what Time Series Analysis really means, in a nutshell → giving meaning to, or deriving meaning from, sequential data recorded against time at specific intervals [Frequency], and the effect of one time series on another.
Now, let's explore the depth of your imagination, shall we? Imagine...
“Garry Jordan, a grocery shop owner in Aalborg Town, wants you to help him increase his profits. You have to build a predictive model that can tell him how much demand there will be for each product he sells in his shop, so that he can manage his supply chain and discounts accordingly. He has 10 years of recorded retail sales data for his products, from Jan 2009 to Dec 2018.”
Don’t worry about the discounts and supply chain part, just predict the demand for now...
For this example, an artificial dataset was curated; the notebook is here. If you don't understand parts of it, that's quite all right. I will be covering them in detail in the articles to come.
Context Setting: It is always beneficial to understand the context behind any modelling procedure. For this example, let me set the context...
Aalborg town used to be a rural area. In 2009, the government decided to invest in it to turn it into an Urban Tech Centre. Seeing the growing investment and the large-scale opportunities, people from nearby towns started migrating to Aalborg, increasing the population of the town over the years [see the blue dotted line in the above plot w.r.t. time] and, in tandem, increasing the retail sales as well → For now, associate this phenomenon with Trend: as the population increased, the retail sales increased.
Aalborg is a lively town where numerous festivals are celebrated, especially in the months of January, April, October & December.
During these seasons the retail sales go through the roof → Associate this with Seasonality. Whenever there is a festival, people shop more. Obvious, right?
For example, in 2014,
Right there, you now have two variables:
- Population → Having a high correlation with the Retail Sales Trend
- Number of Festivals in a month → Having a high correlation with the Seasonality of the Retail Sales
Here our Target Variable is Retail Sales, and the External Regressors (or Features) are Population & Number of Festivals.
Note that features which are not highly correlated with the target variable's overall time series can still have a high correlation with a Component of that time series.
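To make the point above concrete, here is a minimal sketch in Python (NumPy only). Every number in it is made up for illustration: the population figures, the size of the festival bump and the noise level are assumptions, not values from Garry's data or from the notebook. It removes a straight-line trend from the sales (a simplifying assumption; real trends are rarely this tidy) and compares how each feature correlates with the raw series versus the detrended one.

```python
import numpy as np

rng = np.random.default_rng(42)              # fixed seed so the sketch is repeatable

n_months = 120                               # Jan 2009 .. Dec 2018
t = np.arange(n_months)
month = t % 12 + 1                           # calendar month, 1..12

# Hypothetical features (illustrative numbers only)
population = 50_000 + 400 * t                                  # steady migration into town
festivals = np.isin(month, [1, 4, 10, 12]).astype(float)       # 1 in a festival month, else 0

# Hypothetical target: population-driven trend + festival bump + noise
sales = 0.02 * population + 150 * festivals + rng.normal(0, 50, n_months)

# Remove a straight-line trend (assumption: the trend is roughly linear)
slope, intercept = np.polyfit(t, sales, 1)
detrended = sales - (slope * t + intercept)

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

corr_population_raw = corr(population, sales)
corr_festival_raw = corr(festivals, sales)
corr_festival_detrended = corr(festivals, detrended)

print(f"population vs raw sales:      {corr_population_raw:.2f}")
print(f"festivals  vs raw sales:      {corr_festival_raw:.2f}")
print(f"festivals  vs detrended sales: {corr_festival_detrended:.2f}")
```

With these made-up numbers, population should track the raw series closely, while the festival indicator correlates only weakly with raw sales (the trend dominates) and much more strongly once the trend is removed.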
This leads to a part of my next article, on the subject of “Exploratory Data Analysis & Decomposition Of a Time Series”, i.e. a Time Series can be divided into its fundamental components → Seasonality, Trend and Noise, and in some cases Cyclicity as well. If these terms don't make much sense now, that's all right. I will be covering them in much more detail in the next article, along with much more.
This was largely theoretical, I know. I wanted to set the context and give a brief overview of the subject. From the next article on, it will be programming intensive, in Python & R. So get ready to get your hands dirty...
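As a small preview of what such a decomposition does, the sketch below splits a made-up monthly series into Trend, Seasonality and Noise using a classical additive approach: a centred 12-month moving average for the trend, per-calendar-month averages of what remains for the seasonality, and the leftover as noise. The series, its parameters and all variable names are assumptions for illustration, and this is only one of several ways to decompose a series; libraries such as statsmodels offer a ready-made seasonal_decompose function.

```python
import numpy as np

rng = np.random.default_rng(0)               # fixed seed so the sketch is repeatable

n_months = 120                               # 10 years of monthly data
t = np.arange(n_months)
month = t % 12                               # 0 = Jan, ..., 11 = Dec

# Made-up series: linear trend + bump in festival months + noise
festival_months = [0, 3, 9, 11]              # Jan, Apr, Oct, Dec
sales = (1000 + 8 * t
         + np.where(np.isin(month, festival_months), 150.0, 0.0)
         + rng.normal(0, 50, n_months))

# 1) Trend: centred 12-month moving average (half weight on the two end months)
weights = np.r_[0.5, np.ones(11), 0.5] / 12
trend = np.full(n_months, np.nan)            # undefined for the first and last 6 months
trend[6:-6] = np.convolve(sales, weights, mode="valid")

# 2) Seasonality: average detrended value per calendar month, centred on zero
detrended = sales - trend
seasonal_by_month = np.array(
    [np.nanmean(detrended[month == m]) for m in range(12)]
)
seasonal_by_month -= seasonal_by_month.mean()
seasonal = seasonal_by_month[month]          # repeat the 12 values across all years

# 3) Noise: whatever trend and seasonality do not explain
residual = sales - trend - seasonal

print("seasonal effect per month:", np.round(seasonal_by_month, 1))
```

By construction, trend + seasonal + residual adds back up to the original series wherever the moving average is defined, and the festival months should stand out with the largest seasonal effects.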
Please leave your thoughts in a comment below. Would love to read your feedback 😃