Puzzling for the relevant data

The blogs of some of my colleagues have inspired me to put my thoughts in black and white. I am Jari Marijnissen, Data Scientist at Smart Profile and part of the Smart Profile Labs team. Today I would like to explain why I find data so interesting.

About nine years ago, Harvard Business Review published its famous article “Data Scientist: The Sexiest Job of the 21st Century”. At the time, I was still sitting with a Binas in my hands, sweating over the average speed of a vehicle to get my first SET in physics.

A few years later, I discovered that I find data sexier than the speed of a Fiat Multipla. For many a young man, cars are of course much more interesting than some numbers and letters on a screen. So I can hear you thinking: “Jari, what makes data so fun?” Nice of you to ask!

Puzzling

It all started during the second period of the second year of my HBO degree in Computer Science. After almost a year and a half of coding, the period came when we started to analyse data. Ever since, I have compared that work to a puzzle.

We turned the well-known Contoso demo dataset upside down to uncover facts nobody had spotted yet. First we visualised the ‘known’ facts, the corners and edges of the puzzle, and then used that information to find the unknown unknowns, the solid-coloured pieces of the puzzle.

In addition to the data from Contoso, we were also asked for insights based on external data. Think of it as an expansion set of 500 pieces. Those 500 pieces gave us just the right insight, enabling us to extract more value from the data. Super cool!

External data

This expansion set can come from all kinds of domains. Do you want to add weather data, for example, or demographic data to create a different insight? Finding it is a quest in itself or, you guessed it, a puzzle. Depending on the question, I scour the internet in search of that one relevant dataset. But what I like at least as much is unearthing a dataset that unexpectedly adds value to a project.

For example, I once came across an open-source dataset (OpenAddresses) in which the geo-coordinates of addresses in various countries around the world are recorded, including Taiwan and Chile, to name but a few exotic examples. This data could be used to find companies within a certain radius of a location, such as an organisation’s office. This way, the market around a certain office can be visualised.
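To make the radius idea concrete, here is a minimal sketch in Python. It assumes each company is already linked to a latitude and longitude, and it uses the haversine formula for the distance. The company names, coordinates and the helper functions `haversine_km` and `companies_within_radius` are made up for illustration and are not part of any real dataset or library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def companies_within_radius(companies, office, radius_km):
    """Return the names of companies at most radius_km away from the office."""
    office_lat, office_lon = office
    return [
        name
        for name, lat, lon in companies
        if haversine_km(office_lat, office_lon, lat, lon) <= radius_km
    ]


# Hypothetical example data: (name, latitude, longitude)
office = (52.3702, 4.8952)  # roughly the centre of Amsterdam
companies = [
    ("Company A", 52.3791, 4.9003),  # about 1 km from the office
    ("Company B", 52.0907, 5.1214),  # Utrecht, about 35 km away
    ("Company C", 51.9244, 4.4777),  # Rotterdam, about 57 km away
]

nearby = companies_within_radius(companies, office, radius_km=40)
print(nearby)  # ['Company A', 'Company B']
```

For a handful of companies a plain loop like this is enough. With millions of addresses you would probably want a spatial index or a geo-enabled database instead.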

Another nice aspect of external data is how you get access to a new data source. While you may occasionally find a relevant Excel or CSV file from Statistics Netherlands or Statbel, at other times you find an API and have to write a script to obtain the data. Again, it is a puzzle to get hold of the data. The source mentioned above is an excellent example of this process: you can download a file from the website or retrieve a complete set via the API. Many roads lead to Rome.

For me, the puzzle of finding the relevant data and unlocking it is one of the aspects that makes working with data so much fun. One of many, indeed. There are so many other cool things you can do and create with data. How about a graph of the speed of a Fiat Multipla?

Did you like Jari’s blog? Connect with him on LinkedIn, or subscribe to the newsletter to stay up to date with the latest news.