Friday, December 28, 2012

EMR Streaming job using Java code for mapper and reducer [Creating custom jar]

Here is a basic sample of how to create a custom jar for an EMR streaming job.

Let's assume that the mapper needs to read a CSV file from EMR's distributed cache as well as the CSV files in the input S3 bucket, do some calculations, and write CSV lines as its output.
There will be one main class, which contains one implementation each of the following classes:

org.apache.hadoop.mapreduce.Mapper;
org.apache.hadoop.mapreduce.Reducer;


Each of these has to override its method, map() and reduce() respectively, to do the desired job.
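To make the map() and reduce() contract concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It only imitates what the framework does: map() turns one input line into a key/value pair, the pairs are grouped by key, and reduce() receives all values of one key. The class name MapReduceContractSketch, the two-column "key,number" input layout, and the summing logic are assumptions made for illustration; they are not part of the actual job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the framework; for illustration only.
final class MapReduceContractSketch {

    // map step: one CSV line in, one (key, value) pair out.
    static String[] map(String csvLine) {
        String[] cols = csvLine.split(",", -1);
        return new String[] { cols[0], cols[1] };
    }

    // reduce step: all values of one key in, one CSV output line out.
    static String reduce(String key, Iterable<String> values) {
        long total = 0;
        for (String value : values) {
            total += Long.parseLong(value);
        }
        return key + "," + total;
    }

    // Imitates the shuffle: group the mapped pairs by key, then reduce each group.
    static List<String> run(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] pair = map(line);
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            output.add(reduce(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        for (String line : run(Arrays.asList("a,1", "b,5", "a,2"))) {
            System.out.println(line);
        }
    }
}
```

In a real job, Hadoop does the grouping between the two steps across the cluster, and the keys and values are Writable types such as Text rather than String.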

The complete Java class, containing the mapper, the reducer and the driver, would look like the following:

import java.io.File;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SomeJob extends Configured implements Tool {

    private static final String JOB_NAME = "My Job";

    /**
     * This is Mapper.
     */
    public static class MapJob extends Mapper<LongWritable, Text, Text, Text> {

        private Text outputKey = new Text();
        private Text outputValue = new Text();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {

            // Get the cached file (the first and only file added in run())
            Path file = DistributedCache.getLocalCacheFiles(context.getConfiguration())[0];

            File fileObject = new File(file.toString());
            // Do whatever required with file data
        }

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // key is the byte offset of the line, value is the line itself
            outputKey.set("Some key calculated or derived");
            outputValue.set("Some Value calculated or derived");
            context.write(outputKey, outputValue);
        }
    }

    /**
     * This is Reducer.
     */
    public static class ReduceJob extends Reducer<Text, Text, Text, Text> {

        private Text outputKey = new Text();
        private Text outputValue = new Text();

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException,
                InterruptedException {
            outputKey.set("Some key calculated or derived");
            outputValue.set("Some Value calculated or derived");
            context.write(outputKey, outputValue);
        }
    }

    @Override
    public int run(String[] args) throws Exception {

        try {
            Configuration conf = getConf();
            // args[2] is the URI of the CSV file to put into the distributed cache
            DistributedCache.addCacheFile(new URI(args[2]), conf);
            Job job = new Job(conf);

            job.setJarByClass(SomeJob.class);
            job.setJobName(JOB_NAME);

            job.setMapperClass(MapJob.class);
            job.setReducerClass(ReduceJob.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            // args[0] is the input path, args[1] is the output path
            FileInputFormat.setInputPaths(job, args[0]);
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            boolean success = job.waitForCompletion(true);
            return success ? 0 : 1;
        } catch (Exception e) {
            e.printStackTrace();
            return 1;
        }

    }

    public static void main(String[] args) throws Exception {

        if (args.length < 3) {
            System.out
                    .println("Usage: SomeJob <input path> <output path> <cache file URI>");
            System.exit(-1);
        }

        int result = ToolRunner.run(new SomeJob(), args);
        System.exit(result);
    }

}
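The setup() method above only hints at what to do with the cached file. One possible approach, sketched here in plain Java without any Hadoop dependency, is to load the cached CSV into an in-memory map once and then consult it for every input line. The class name CacheLookupSketch, the two-column "id,label" layout of the cached file, and the naive comma splitting (no quoted fields) are all assumptions for illustration, not part of the original job.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of the original job.
final class CacheLookupSketch {

    // Reads "id,label" lines into a map; blank or malformed lines are skipped.
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split(",", -1);
                if (cols.length >= 2 && !cols[0].trim().isEmpty()) {
                    lookup.put(cols[0].trim(), cols[1].trim());
                }
            }
        }
        return lookup;
    }

    // Appends the looked-up label to an input CSV line, keyed on its first column.
    static String enrich(String inputLine, Map<String, String> lookup) {
        String id = inputLine.split(",", -1)[0].trim();
        String label = lookup.getOrDefault(id, "UNKNOWN");
        return inputLine + "," + label;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> lookup = load(new StringReader("1,books\n2,music\n"));
        System.out.println(enrich("2,19.99", lookup));
    }
}
```

In the mapper, setup() could call load() with a java.io.FileReader opened on the cached file and keep the resulting map in a field, and map() could pass value.toString() to enrich(). Loading the file once per task in setup(), rather than once per record in map(), is the reason the cache lookup lives there.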

To spawn the cluster, the command should look like this:

ruby elastic-mapreduce --create --alive --plain-output --master-instance-type m1.xlarge --slave-instance-type m1.xlarge --num-instances 11 --name "Java Pipeline" --bootstrap-action s3://elasticmapreduce/bootstrap-actions/install-ganglia --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "--mapred-config-file, s3://com.versata.emr/conf/mapred-site-tuned.xml"

This command should return a job flow ID, which is then used to add steps. The cluster executes the steps in order, each one in a distributed fashion.

To add Job Steps:

Step 1:

ruby elastic-mapreduce --jobflow <jobflow-id> --jar s3://somepath/job-one.jar --arg s3://somepath/input-one --arg s3://somepath/output-one --args -m,mapred.min.split.size=52880 -m,mapred.task.timeout=0

Step 2:

ruby elastic-mapreduce --jobflow <jobflow-id> --jar s3://somepath/job-two.jar --arg s3://somepath/output-one --arg s3://somepath/output-two --args -m,mapred.min.split.size=52880 -m,mapred.task.timeout=0
