Now that both InputFormat and RecordReader are familiar concepts (if not, you can still refer to the article Hadoop RecordReader and FileInputFormat), it is time to get to the heart of the subject. The default implementation of TextInputFormat is based on a line-by-line approach. Each line found in the data set will be supplied to MapReduce … Continue reading Custom RecordReader – Processing String / Pattern delimited records
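To make the contrast with the line-by-line approach concrete, here is a minimal sketch in plain Java of cutting a text stream into records on a string delimiter instead of on newlines. It deliberately uses no Hadoop classes: the class name `DelimitedRecordSketch`, the method `splitRecords` and the `###` delimiter are all hypothetical, chosen only for illustration, and a real RecordReader would additionally have to deal with input splits and byte offsets.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimitedRecordSketch {

    // Hypothetical helper: returns the pieces of `data` found between
    // occurrences of `delimiter`. Newlines get no special treatment,
    // so a single record may span several lines.
    static List<String> splitRecords(String data, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int hit;
        // Walk forward from one delimiter occurrence to the next.
        while ((hit = data.indexOf(delimiter, start)) != -1) {
            records.add(data.substring(start, hit));
            start = hit + delimiter.length();
        }
        // Whatever follows the last delimiter is the final record.
        if (start < data.length()) {
            records.add(data.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "first\nrecord###second record###third";
        for (String record : splitRecords(data, "###")) {
            System.out.println("[" + record + "]");
        }
    }
}
```

With this input the first record keeps its embedded newline, which is exactly what a line-based reader would have split into two separate records.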
In this Hadoop tutorial we will look at a modification of our previous wordcount program, with our own custom mapper and reducer, that implements a concept called a custom record reader. Before we attack the problem, let us look at some of the theory required to understand the topic.
Today's new challenge... I want to create a custom MapReduce job that can handle more than a single line at a time. Actually, it took me some time to understand the implementation of the default LineRecordReader class, not because of its implementation versus my Java skill set, but rather because I was not familiar with its … Continue reading RecordReader and FileInputFormat
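As a rough illustration of what "more than a single line at a time" could mean, the sketch below groups every N consecutive lines into one record. It is plain Java rather than Hadoop code, and the names `MultiLineRecordSketch` and `groupLines` are hypothetical; it is not how LineRecordReader is implemented, only a picture of the record shape such a job would receive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiLineRecordSketch {

    // Hypothetical helper: joins every `linesPerRecord` consecutive lines
    // of `data` into one record. The last record may hold fewer lines.
    static List<String> groupLines(String data, int linesPerRecord) {
        if (linesPerRecord < 1) {
            throw new IllegalArgumentException("linesPerRecord must be >= 1");
        }
        List<String> records = new ArrayList<>();
        if (data.isEmpty()) {
            return records;
        }
        String[] lines = data.split("\n");
        for (int i = 0; i < lines.length; i += linesPerRecord) {
            int end = Math.min(i + linesPerRecord, lines.length);
            // Re-join the slice so each record is a single multi-line string.
            records.add(String.join("\n", Arrays.copyOfRange(lines, i, end)));
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "l1\nl2\nl3\nl4\nl5";
        for (String record : groupLines(data, 2)) {
            System.out.println("[" + record.replace("\n", " | ") + "]");
        }
    }
}
```

In a real job the equivalent grouping would have to happen inside the record reader, since the mapper only ever sees one record per call.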