Processing GDELT data using Hadoop InputFormat and SparkSQL

GDELT: a quick overview of the GDELT public data set: "GDELT Project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organisations, counts, themes, sources, and events driving our global society every second of every day, creating a free open platform …

Create a simple Spark job

MapReduce is dead, long live Spark! Following the latest big data trends, the logical next step for me is to start getting my head around Apache Spark. YARN is indeed yet another resource negotiator, but definitely not yet another big data application... Although you can still execute MapReduce on YARN, writing a YARN application is …