Sunday, March 31, 2013

How to ensure that one file is completely processed by the same mapper?

There are times when we want a particular file to be read and processed entirely by a single mapper. This requirement arises when each file holds sequential data and you want to process all of its records in exactly the order in which they appear in the input file.

So, basically, what we are asking here is: please don't split our input files and distribute the pieces among different mappers. Simple.
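To make the effect of "not splitting" concrete, here is a small plain-Java sketch of how a file might be carved into splits. It is only a simplified model and not Hadoop's actual code: the class SplitSketch, the method planSplits() and the file and split sizes are all made up for illustration, and it ignores details such as block locations. The point is just that a splittable file yields several splits (and so several mappers), while a non-splittable file yields exactly one.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of split planning; NOT Hadoop's actual implementation.
// SplitSketch and planSplits are hypothetical names used only for illustration.
final class SplitSketch {

    /** Returns {offset, length} pairs describing the splits planned for one file. */
    static List<long[]> planSplits(long fileLength, long splitSize, boolean splittable) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        if (!splittable) {
            // The whole file is a single split, so a single mapper reads it in order.
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        // Otherwise cut the file into consecutive chunks of at most splitSize bytes.
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            splits.add(new long[] {offset, Math.min(splitSize, fileLength - offset)});
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 300L * 1024 * 1024; // assumed 300 MB input file
        long splitSize = 128L * 1024 * 1024;  // assumed 128 MB split size
        System.out.println("splittable:     " + planSplits(fileLength, splitSize, true).size());
        System.out.println("non-splittable: " + planSplits(fileLength, splitSize, false).size());
    }
}
```

With the assumed sizes above, the splittable case is cut into three pieces and the non-splittable case stays as one, which is exactly the behaviour we are asking for.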

It is even simpler to achieve:

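You have to create your own version of FileInputFormat and override isSplitable() (note the spelling, with a single "t"). FileInputFormat itself is abstract, so in practice you extend a concrete subclass such as TextInputFormat. With the new org.apache.hadoop.mapreduce API, which is the one that provides setInputFormatClass(), it looks like this (the older org.apache.hadoop.mapred API takes (FileSystem fs, Path filename) instead):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// TextInputFormat is a concrete FileInputFormat that reads files line by line.
public class NonSplittableFileInputFormat extends TextInputFormat {

    @Override
    protected boolean isSplitable(JobContext context, Path filename) {
        // Never split: each input file becomes one split, hence one mapper.
        return false;
    }
}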

Then pass the above class to setInputFormatClass() on your Job, i.e. job.setInputFormatClass(NonSplittableFileInputFormat.class).
