Hadoop: A wordcount without explicit mapper/reducer

I was trying out my first Hadoop program and was a little wary of writing my own mapper and reducer. But I still wanted a program that would give me the count of every word in the input files.

So I wrote a Hadoop driver program that uses TokenCounterMapper as the map class. This class ships with Hadoop: it tokenizes the input text and emits each word with a count of 1.
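To make the "emits each word with count 1" part concrete, here is a rough plain-Java sketch of what the map step does to a single line of input. It is not Hadoop's own code: the class name TokenCounterSketch and its map method are made up for illustration, and I am assuming whitespace tokenization via StringTokenizer, with an in-memory list standing in for the Hadoop context.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for the map step; runs without Hadoop on the classpath.
public class TokenCounterSketch {

    // Split one input line on whitespace and emit a (word, 1) pair per token.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            // A real mapper would call context.write(word, one) here.
            emitted.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // prints [to=1, be=1, or=1, not=1, to=1, be=1]
        System.out.println(map("to be or not to be"));
    }
}
```

Note that repeated words are not merged at this stage; every occurrence gets its own pair with the value 1.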

Just the recipe that I ordered …

Now I needed a reducer that could actually count. So I used the IntSumReducer class, which sums the values in the reducer's input list for each key and writes the total to the context.
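Roughly, the reduce step boils down to the following plain-Java sketch. Again this is only an illustration under my own assumptions: IntSumSketch and its reduce method are hypothetical names, and a plain list stands in for the grouped values that Hadoop hands to the reducer after it has collected all the 1s emitted for the same word.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the reduce step; runs without Hadoop on the classpath.
public class IntSumSketch {

    // Add up all the values that arrived for one key and return the total.
    public static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        // A real reducer would call context.write(key, sum) here.
        return sum;
    }

    public static void main(String[] args) {
        // The word "to" was seen twice, so the reducer receives two 1s for it.
        List<Integer> countsForTo = Arrays.asList(1, 1);
        System.out.println("to\t" + reduce("to", countsForTo));
    }
}
```

Since each value is a 1 emitted for one occurrence of the word, the sum is exactly the number of times that word appeared in the input.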

Bingo, the program does what it is supposed to do: counting words…

Here is the listing…

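import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class wordcount extends Configured implements Tool {
  public static void main(String[] args) throws Exception {
    // ToolRunner parses generic Hadoop options (-D, -conf, ...) and passes the rest to run()
    System.exit(ToolRunner.run(new Configuration(), new wordcount(), args));
  }

  @Override
  public int run(String[] args) throws Exception {
    // Build the job from the configuration ToolRunner supplied
    // (Hadoop 2+ prefers Job.getInstance(getConf()))
    Job job = new Job(getConf());
    job.setJarByClass(wordcount.class);
    job.setMapperClass(TokenCounterMapper.class);  // emits (word, 1) per token
    job.setReducerClass(IntSumReducer.class);      // sums the counts per word
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // args[0]: input path
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // args[1]: output path
    // waitForCompletion() submits the job itself; return 0 on success, 1 on failure
    return job.waitForCompletion(true) ? 0 : 1;
  }
}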