Time Series Forecasting

Origins and Unfolding

Achintya Gupta
Oct 14, 2019

Part 1: The History & Basic Intuition

Time. An indefinite continued progress of existence and events that occur in apparently irreversible succession from the past, through the present and into the future. Time is a component quantity of various measurements used to sequence events…

Time is a pretty interesting dimension: irrespective of who you are, it defines you and your every action. Even the number of completed tasks on your To-Do list constitutes a time series, one you never bothered to keep, Once Upon A Time!

I will be writing a series of articles on Time Series Forecasting, of which this is Part 1, to delve deep into the core concepts of the subject, from its origins through to modelling. I will be presenting my views in as layman-friendly a fashion as possible. If you feel I went wrong somewhere, please feel free to leave a comment; I would be happy to consider your opinion.

Hellenistic Culture: The earliest forms of Time Series Observations can be seen in the field of astronomy, where the Greeks inherited astronomical records from the Babylonians and used the data to construct a cosmological framework.

Time has been around since... god knows when, but it has always been there, and people with a scientific urge have leveraged it to record the observations from their experiments against it.

Tycho Brahe (1546–1601) was a famous Danish astronomer who dedicated himself to obtaining astronomical observations of unprecedented accuracy. Johannes Kepler (1571–1630) used his observations of Mars to uncover the “true” nature of the solar system.

Many centuries later came Tycho Brahe [1580s], sometimes called astronomy’s first true observer. He built a Danish observatory, from which he measured the positions of the planets and stars.

These are the actual recordings by Tycho Brahe [some of them; the entire visualisation that I made can be found in my repo], spanning [1582.12.11–1600.03.16]. Although most of the data is missing, the accuracy with which it was recorded closely matches calculations made using modern-day techniques. It is one of the earliest TIME SERIES, and a beautiful one: not just the graph, but the fact that it was one of the tools that helped our civilisation establish that the Sun, not the Earth, is at the centre of the solar system. [This GIF was made in Python using this data]

This field has since evolved significantly. Notable advances were made by G.U. Yule in the early 20th century, such as Yule’s Coefficient of Association and Yule’s Paradox [later to be known as Simpson’s Paradox: opposite conclusions can be derived from the same data depending on how the data was divided in the first place → Causality Consideration], among many others.
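To make the “opposite conclusions” point concrete, here is a small worked sketch in Python. The counts below are illustrative numbers picked only to exhibit the effect (they are not data from this article): treatment A has the higher success rate within each group, yet treatment B comes out ahead once the groups are pooled.

```python
# Illustrative (successes, trials) per treatment, split by a grouping variable.
# These counts are hypothetical and chosen only to exhibit Yule's/Simpson's paradox.
groups = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    """Success rate as a fraction."""
    return successes / trials


def pooled_rate(treatment):
    """Success rate for one treatment after merging all groups together."""
    successes = sum(g[treatment][0] for g in groups.values())
    trials = sum(g[treatment][1] for g in groups.values())
    return successes / trials


# Divided by group: A has the higher rate in every group.
for name, g in groups.items():
    ra, rb = rate(*g["A"]), rate(*g["B"])
    print(f"{name:>6}: A={ra:.2f}  B={rb:.2f}  better={'A' if ra > rb else 'B'}")

# Pooled: the ranking flips, because A was mostly tried in the harder "large" group.
pa, pb = pooled_rate("A"), pooled_rate("B")
print(f"pooled: A={pa:.2f}  B={pb:.2f}  better={'A' if pa > pb else 'B'}")
```

A wins in both groups (0.93 vs 0.87 and 0.73 vs 0.69), but B wins overall (0.78 vs 0.83). How the data was divided decides the conclusion.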

G.U. Yule was offered a “demonstrator” post by Karl Pearson at University College, London, in 1892. Karl Pearson is the person behind the χ² test [Chi-Square Test] and the Pearson correlation, which we will come across in later articles.

Herman Wold is well known for the “Wold decomposition”, which came to light in 1938 and separates a stationary time series into a deterministic component and a random component.
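For reference, this is how the decomposition is usually stated in textbooks (standard notation, not taken from this article). A covariance-stationary series Y_t splits into an infinite moving average of white-noise shocks ε_t plus a deterministic part η_t:

```latex
Y_t = \sum_{j=0}^{\infty} b_j \, \varepsilon_{t-j} + \eta_t,
\qquad b_0 = 1, \qquad \sum_{j=0}^{\infty} b_j^2 < \infty
```

Here the b_j are fixed weights. The shocks ε_t are the one-step-ahead prediction errors and are uncorrelated with η_t. The part η_t can be predicted exactly (linearly) from its own past.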

“An Introduction to the Theory of Statistics”: the eleventh edition was the first to be jointly undertaken by G.U. Yule and M.G. Kendall. Kendall’s rank correlation is named after Kendall.

In 1970, “Time Series Analysis” by G. E. P. Box and G. M. Jenkins came out, containing the full modelling procedure for individual series: specification, estimation, diagnostics and forecasting. The book popularised ARIMA models, including seasonal ones (now usually called SARIMA). As time progressed, multivariate extensions such as multivariate ARMA and VAR models followed. Today even machine learning is being used to perform time series forecasting.
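As a taste of that workflow, here is a toy sketch of its last three steps in plain Python. It uses the simplest member of the ARMA family, an AR(1) model y_t = φ·y_{t-1} + ε_t. This is my own illustration under simplifying assumptions, not the full Box-Jenkins procedure: the data are simulated and zero-mean, φ is estimated by least squares, and diagnostics are a single residual check. The specification step is skipped because we know which model generated the data.

```python
import random

random.seed(42)

# Simulate a zero-mean AR(1) series: y_t = phi * y_{t-1} + Gaussian noise
PHI_TRUE = 0.7
y = [0.0]
for _ in range(499):
    y.append(PHI_TRUE * y[-1] + random.gauss(0.0, 1.0))

# Estimation: least-squares estimate of phi (no intercept, since the series is zero-mean)
num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
phi_hat = num / den


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    m = sum(x) / len(x)
    cov = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    var = sum((v - m) ** 2 for v in x)
    return cov / var


# Diagnostics (one crude check): residuals of a good fit should look like white noise,
# so their lag-1 autocorrelation should be close to 0.
residuals = [y[t] - phi_hat * y[t - 1] for t in range(1, len(y))]
resid_acf1 = lag1_autocorr(residuals)

# Forecasting: the h-step-ahead AR(1) forecast is phi_hat**h * (last observed value).
forecasts = [phi_hat ** h * y[-1] for h in range(1, 4)]

print(f"phi_hat={phi_hat:.3f}  residual lag-1 acf={resid_acf1:.3f}")
print("next 3 forecasts:", [round(f, 3) for f in forecasts])
```

Because the estimated φ lies between 0 and 1, the forecasts shrink towards the series mean (0 here) as the horizon grows.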

If some of the technical terms above were confusing, that’s completely fine; we will go through each one in detail, and much more, in the articles to come.

Time Series Analysis can prove to be quite an important weapon in your data science arsenal. If I were to summarise what Time Series Analysis really means, in a nutshell → giving meaning to, or deriving meaning from, sequential data recorded against time at specific intervals [Frequency], as well as the effect of one time series on another.
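To make “the effect of one time series on another” a little more concrete, here is a minimal sketch in plain Python. It measures the correlation between two series at different lags. Both series are made up for illustration: y is constructed to react to x exactly two steps later, so the lagged correlation should peak at lag 2. A strong lagged correlation hints at an effect but does not prove one.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def lagged_corr(x, y, lag):
    """Correlation between x[t] and y[t + lag], i.e. does x lead y by `lag` steps?"""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


# Hypothetical monthly series, sampled at a fixed interval (one value per month).
x = [math.sin(t / 3) for t in range(36)]
# y responds to x two months later (scaled by 2); the first two months are unknown, set to 0.
y = [0.0, 0.0] + [2 * v for v in x[:-2]]

scores = {lag: lagged_corr(x, y, lag) for lag in range(6)}
best_lag = max(scores, key=scores.get)
print({k: round(v, 3) for k, v in scores.items()}, "best lag:", best_lag)
```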

Now, let’s explore the depth of your imagination, shall we? Imagine...

Garry Jordan’s Grocery Shop @ Aalborg Town

“Garry Jordan, a grocery shop owner in Aalborg Town, wants you to help him increase his profits. You have to build a predictive model that tells him how much demand there will be for each product he sells in his shop, so that he can manage his supply chain and discounts accordingly. He has 10 years of recorded retail sales data for his products, from Jan 2009 to Dec 2018.”

Don’t worry about the discounts and supply chain part; just predict the demand for now.

For this example, an artificial dataset was curated; the notebook for it is here. If you don’t understand parts of it, that’s quite all right; I will be covering them in detail in the articles to come.

Context Setting: It is always beneficial to understand the context behind any modelling procedure. For this example, let me set the context.

Evolving Aalborg Town [Credits@Background Image]

Aalborg Town used to be a rural area. In 2009, the government decided to invest in it to make it an Urban Tech Centre. Seeing the investment growth and the large-scale opportunities, people from nearby towns started migrating to Aalborg, increasing the population of the town over the years [see the blue dotted line in the above plot w.r.t. time] and, in tandem, increasing the retail sales as well → For now, associate this phenomenon with Trend: as the population increased, the retail sales increased.

Aalborg is a lively town where numerous festivals are celebrated, especially in the months of January, April, October & December.

During these seasons, retail sales go through the roof → Associate this with Seasonality. Whenever there is a festival, people shop more. Obvious, right?
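The article’s own notebook isn’t reproduced here. Instead, here is a hypothetical sketch of how a series like Garry’s could be generated. It has three parts: a population-driven upward trend, bumps in the festival months (January, April, October, December), and some random noise, for 120 months from Jan 2009 to Dec 2018. The growth rate, bump size and noise level are all made-up numbers for illustration.

```python
import random
from datetime import date

random.seed(7)

FESTIVAL_MONTHS = {1, 4, 10, 12}  # January, April, October, December

rows = []
for i in range(120):  # 10 years x 12 months: Jan 2009 .. Dec 2018
    year, month = 2009 + i // 12, i % 12 + 1
    population = 20_000 + 150 * i              # hypothetical steady migration into town
    trend = 0.5 * population                   # sales grow with population -> Trend
    seasonal = 3_000 if month in FESTIVAL_MONTHS else 0  # festival bump -> Seasonality
    noise = random.gauss(0, 300)               # everything else -> Noise
    rows.append((date(year, month, 1), population, trend + seasonal + noise))

# Festival months sell more than the other months of the same year, e.g. in 2014:
sales_2014 = {d.month: s for d, _, s in rows if d.year == 2014}
fest = [s for m, s in sales_2014.items() if m in FESTIVAL_MONTHS]
other = [s for m, s in sales_2014.items() if m not in FESTIVAL_MONTHS]
print(f"2014 festival-month mean: {sum(fest) / len(fest):.0f}, other months: {sum(other) / len(other):.0f}")
```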

For example, in 2014:

Festival months: try to spot the bumps in sales corresponding to these months.

Right there, you now have two variables:

  1. Population → highly correlated with the Retail Sales Trend
  2. Number of Festivals in a month → highly correlated with the Seasonality of the Retail Sales

Here, our Target Variable is Retail Sales, and the External Regressors (or Features) are Population & Number of Festivals.
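One simple way to use external regressors is an ordinary least-squares regression of the target on the features. The sketch below is only an illustration with made-up, noise-free data, built so that sales = 2000 + 0.05 × population + 400 × festivals. The fit should therefore recover those coefficients (up to rounding). The festival counts per month are also hypothetical.

```python
# Hypothetical number of festivals per calendar month (other months have none).
FESTIVALS = {1: 2, 4: 1, 10: 1, 12: 3}

months = range(120)  # t = 0 is Jan 2009, t = 119 is Dec 2018
population = [50_000 + 300 * t for t in months]
festivals = [FESTIVALS.get(t % 12 + 1, 0) for t in months]
# Target built from the two regressors with known coefficients (no noise, for clarity).
sales = [2_000 + 0.05 * p + 400 * f for p, f in zip(population, festivals)]


def mean(v):
    return sum(v) / len(v)


def cross(a, b):
    """Sum of products of deviations from the mean (a centred cross-product)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))


# Normal equations for sales = b0 + b1*population + b2*festivals, solved on centred data
# with Cramer's rule (fine for two regressors; use a linear-algebra library for more).
s_pp, s_ff, s_pf = cross(population, population), cross(festivals, festivals), cross(population, festivals)
s_py, s_fy = cross(population, sales), cross(festivals, sales)
det = s_pp * s_ff - s_pf ** 2
b1 = (s_py * s_ff - s_pf * s_fy) / det
b2 = (s_pp * s_fy - s_pf * s_py) / det
b0 = mean(sales) - b1 * mean(population) - b2 * mean(festivals)

print(f"intercept={b0:.2f}  per person={b1:.4f}  per festival={b2:.2f}")
```

On real data there would be noise and the estimates would only approximate the true effects.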

Note that a feature, though it might not be highly correlated with the target variable’s time series as a whole, can still have a high correlation with a Component of the target variable’s time series → Yes, really.
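Here is a small, hypothetical illustration of that claim in plain Python. Sales are made of a strong upward trend plus a comparatively small festival effect. The festival count barely correlates with raw sales, because the trend dominates. Once a straight-line trend is removed by least squares, however, it correlates almost perfectly with what is left. The series and festival counts are invented, and removing a linear fit is only one simple way to isolate the non-trend part.

```python
import math

FESTIVALS = {1: 2, 4: 1, 10: 1, 12: 3}  # hypothetical festivals per calendar month
festivals = [FESTIVALS.get(t % 12 + 1, 0) for t in range(120)]
# Strong trend (100 + 5t) plus a comparatively small festival effect (10 per festival).
sales = [100 + 5 * t + 10 * f for t, f in enumerate(festivals)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def remove_linear_trend(y):
    """Subtract the least-squares straight line a + b*t from y."""
    n = len(y)
    t_mean, y_mean = (n - 1) / 2, sum(y) / n
    slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in enumerate(y)]


raw_corr = pearson(festivals, sales)
component_corr = pearson(festivals, remove_linear_trend(sales))
print(f"festivals vs raw sales: {raw_corr:.2f}, vs detrended sales: {component_corr:.2f}")
```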

This leads into part of my next article, on “Exploratory Data Analysis & Decomposition of a Time Series”, i.e. a Time Series can be divided into its fundamental components → Seasonality, Trend and Noise, and in some cases Cyclicity as well. If these terms don’t make much sense now, that’s alright; I will be covering them in much more detail in the next article.
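Ahead of that article, here is a minimal sketch of one classical way to do this: an additive decomposition (series = trend + seasonality + noise) in plain Python. It works in three steps:

- A centred 2×12 moving average estimates the trend.
- Averaging the detrended values for each calendar month estimates the seasonality.
- Whatever remains is noise.

This is a textbook technique chosen for illustration, not necessarily the method the next article uses. It assumes monthly data with an even period of 12 and at least a few full years. Cyclicity is not separated out.

```python
def centred_moving_average(y, period=12):
    """2 x `period` centred moving average (even period); None where the window doesn't fit."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]  # period + 1 values centred on t
        # Half weight on the two end points keeps the average centred on t.
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def additive_decompose(y, period=12):
    """Split y into trend, seasonal and residual (noise): y = trend + seasonal + residual."""
    trend = centred_moving_average(y, period)
    # Average the detrended values for each position in the cycle (e.g. each calendar month).
    buckets = [[] for _ in range(period)]
    for t, (value, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(value - tr)
    pattern = [sum(b) / len(b) for b in buckets]
    # Re-centre so the seasonal effects sum to zero over one cycle.
    offset = sum(pattern) / period
    pattern = [p - offset for p in pattern]
    seasonal = [pattern[t % period] for t in range(len(y))]
    residual = [None if tr is None else v - tr - s for v, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, residual


# Hypothetical check: a straight-line trend plus a fixed monthly pattern (summing to zero),
# with festival months (Jan, Apr, Oct, Dec) standing out; t = 0 is a January.
true_pattern = [8, -2, -3, 5, -4, -3, -2, -3, -2, 6, -4, 4]
series = [50 + 2 * t + true_pattern[t % 12] for t in range(120)]
trend, seasonal, residual = additive_decompose(series)
print("estimated monthly pattern:", [round(s, 2) for s in seasonal[:12]])
```

On this noise-free toy series the pattern is recovered exactly. On real (noisy) sales the residual would carry the noise instead of being zero.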

This was entirely theoretical, I know. I wanted to set the context and give a brief overview of the subject, but from the next article on, it will be programming-intensive, in Python & R. So get ready to get your hands dirty!

Please leave your thoughts in a comment below. Would love to read your feedback 😃


Achintya Gupta

Someone who romanticises about the origins, the history. Teaching Enthusiast. Firmly believe in AI for social good.