Introducing ML Registry – v1.0

In the context of data science, we tend to see governance as a tedious activity that any large-scale organisation has to comply with. It is not rare for data scientists to see governance as a real blocker to innovation. Nothing could be further from the truth. Passionate data scientists only have one goal in mind, driving …

Releasing gdelt-spark v2.0

Version 2.0 updates: A couple of months ago I released the very first version of Gdelt Spark, my pet project to integrate Spark with the GDELT universe. See the previous blog post, releasing-gdelt-spark-v1-0. Today, I am proud to release v2.0, which allows Spark developers and scientists to download GDELT text content as well as article …

Connect Tableau Desktop to SparkSQL

Last (but not least) post of 2014, and a new hacking challenge. Based on the work I've done on SQLDeveloper, I was wondering how to connect Tableau Desktop to my SparkSQL cluster. Install Tableau Desktop: I'm quite new to Tableau, but it's worth giving it a try. However, spending $999 for a challenge isn't worth it, …

Processing GDELT data using Hadoop InputFormat and SparkSQL

GDELT. A quick overview of the GDELT public data set: "GDELT Project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organisations, counts, themes, sources, and events driving our global society every second of every day, creating a free open platform …