Custom RecordReader – Processing String / Pattern delimited records

Now that both InputFormat and RecordReader are familiar concepts for you (if not, you can still refer to the article Hadoop RecordReader and FileInputFormat), it is time to get to the heart of the subject. The default implementation of TextInputFormat is based on a line-by-line approach. Each line found in the data set will be supplied to MapReduce …
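To make the difference between line-delimited and string-delimited records concrete, here is a minimal sketch in plain Java, with no Hadoop dependency. It is not the RecordReader from the post: the class name `DelimitedRecords`, the helper `readRecords`, and the `|||` delimiter are all hypothetical and chosen only for illustration. The idea is the same as in a line-based reader, except that a record ends at an arbitrary delimiter string instead of at a newline.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class DelimitedRecords {

    // Hypothetical helper (not part of Hadoop): reads the stream one character
    // at a time and emits a record each time the delimiter string is seen,
    // much as a line-based reader emits a record at each newline.
    static List<String> readRecords(Reader in, String delimiter) throws IOException {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            current.append((char) c);
            // Check whether the buffer now ends with the delimiter.
            int start = current.length() - delimiter.length();
            if (start >= 0 && current.indexOf(delimiter, start) == start) {
                records.add(current.substring(0, start)); // record without the delimiter
                current.setLength(0);
            }
        }
        // Whatever follows the last delimiter is the final record.
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // The first record spans two lines; a line-based reader would split it.
        String data = "first line\nstill first record|||second record|||third";
        for (String record : readRecords(new StringReader(data), "|||")) {
            System.out.println("[" + record + "]");
        }
    }
}
```

In a real job the same logic would live inside a RecordReader and would also have to deal with input split boundaries, which this sketch ignores.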


WordCount with Custom Record Reader of TextInputFormat

Tutorials for Data Science, Machine Learning, AI & Big Data

In this Hadoop tutorial we will modify our previous WordCount program, which used our own custom mapper and reducer, by adding a custom record reader. Before we attack the problem, let us look at the theory required to understand the topic.
