Saturday, January 5, 2013

What are Combiners in Hadoop?

Combiners are one of those things every Hadoop developer wants to use but seldom knows when and how to use. Combiners are basically mini-reducers: they lessen the workload that is passed on to the reducers. Your mapper may emit more than one record per key, and all the records for a key are ultimately grouped and passed to a single call of the reduce() method. If these records can be combined per key before they leave the mapper, less data has to be shuffled across the network to reach the right reducer, which improves the job's performance. The sorting in the reduce phase also gets quicker, because there are fewer records to sort. The data flow among mapper, combiner and reducer is shown in the super-simple diagram below:
[Image: data flow from mapper through combiner to reducer]
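To make the saving concrete, here is a minimal sketch in plain Java that imitates the flow for a word count. It does not use the Hadoop API: the class CombinerFlowDemo and the methods map, sumPerKey and shuffle are hypothetical names made up for this illustration, and each input string stands in for one mapper's split.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation of mapper -> combiner -> shuffle.
public class CombinerFlowDemo {

    // Map step: emit one (word, 1) record per word in the split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Sum the counts per key; used here both as the combine and the reduce step.
    static Map<String, Integer> sumPerKey(Collection<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            sums.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return sums;
    }

    // Collect the records that would cross the network to the reducers.
    static List<Map.Entry<String, Integer>> shuffle(List<String> splits, boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapOutput = map(split);
            if (useCombiner) {
                // Combine locally: one record per distinct key per mapper.
                shuffled.addAll(sumPerKey(mapOutput).entrySet());
            } else {
                shuffled.addAll(mapOutput);
            }
        }
        return shuffled;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("to be or not to be", "to do or not to do");
        List<Map.Entry<String, Integer>> plain = shuffle(splits, false);
        List<Map.Entry<String, Integer>> combined = shuffle(splits, true);
        System.out.println("shuffled without combiner: " + plain.size());
        System.out.println("shuffled with combiner:    " + combined.size());
        System.out.println("same final counts: " + sumPerKey(plain).equals(sumPerKey(combined)));
    }
}
```

With these two splits, 12 records are shuffled without the combiner and 8 with it, and the final counts are identical. The fewer distinct keys each mapper sees, the bigger the saving.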
How to use? A combiner is implemented exactly like a reducer. If you are writing in Java, it is a class that extends org.apache.hadoop.mapreduce.Reducer (new API) or implements org.apache.hadoop.mapred.Reducer (older API) and overrides/implements the reduce() method. You register it on the job with setCombinerClass(). Because the combiner's output is fed to the reducer, its input and output key/value types must both match the mapper's output types. A common convention is to use the reducer itself as the combiner, but that is not right for every scenario: you may write a separate class that does some mapper-side aggregation different from what your reducers perform.
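The idea of reusing the reducer as the combiner can be sketched without the Hadoop libraries. In the plain-Java example below, SimpleReducer is a hypothetical stand-in for a reduce() implementation (it is not a Hadoop type), and runJob is an invented helper that applies an optional combiner to each mapper's values before handing everything to the reducer. In a real job the equivalent wiring would be done on the Job object with setCombinerClass() next to setReducerClass().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free illustration of using one class as combiner and reducer.
public class ReducerAsCombinerDemo {

    // Stand-in for reduce(): all values of one key in, one value out.
    interface SimpleReducer {
        int reduce(String key, List<Integer> values);
    }

    // A summing reducer; input and output value types are the same (int),
    // which is what allows it to double as a combiner.
    static class SumReducer implements SimpleReducer {
        @Override
        public int reduce(String key, List<Integer> values) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            return sum;
        }
    }

    // perMapperValues holds, for a single key, the values emitted by each mapper.
    // A null combiner means "no combiner configured".
    static int runJob(String key, List<List<Integer>> perMapperValues,
                      SimpleReducer combiner, SimpleReducer reducer) {
        List<Integer> shuffled = new ArrayList<>();
        for (List<Integer> mapperValues : perMapperValues) {
            if (combiner == null) {
                shuffled.addAll(mapperValues);
            } else {
                shuffled.add(combiner.reduce(key, mapperValues));
            }
        }
        return reducer.reduce(key, shuffled);
    }

    public static void main(String[] args) {
        SimpleReducer sum = new SumReducer();
        List<List<Integer>> values = List.of(List.of(1, 1, 1), List.of(1, 1));
        System.out.println("reducer only:         " + runJob("hadoop", values, null, sum));
        System.out.println("reducer as combiner:  " + runJob("hadoop", values, sum, sum));
    }
}
```

Both calls return 5 for this input. The reuse works here only because summing takes and returns the same value type and gives the same answer however the values are grouped.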

When to use? As the name suggests, combiners should be used only when there is something to combine. In general, apply them to functions that are commutative (a.b = b.a) and associative (a.(b.c) = (a.b).c). Hadoop does not enforce this, so it is not a hard and fast rule, but it matters for correctness: the framework may run a combiner on only a subset of your keys and values, more than once, or not at all, so the job must produce the same result in every case. Also, if your mapper output has very few duplicate keys, a combiner may backfire and become a useless burden. So use combiners only when there is enough scope for combining.
Quoting from Chuck Lam's 'Hadoop in Action': "A combiner doesn't necessarily improve performance. You should monitor the job's behavior to see if the number of records outputted by the combiner is meaningfully less than the number of records going in. The reduction must justify the extra execution time of running a combiner. "
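Coming back to the commutative and associative guideline, an average is the usual counterexample: it is commutative but not associative, so a combiner that emits per-mapper averages changes the answer. The plain-Java sketch below is only an illustration under that assumption. MeanCombinerDemo and its methods are hypothetical names, and each inner list stands for the values one mapper emits for a single key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Hadoop-free illustration of a combiner that breaks the result
// and a safe alternative based on partial sums and counts.
public class MeanCombinerDemo {

    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Unsafe: the "combiner" emits each mapper's mean, the "reducer" averages the means.
    static double meanOfMeans(List<List<Double>> perMapper) {
        List<Double> partialMeans = new ArrayList<>();
        for (List<Double> mapperValues : perMapper) {
            partialMeans.add(mean(mapperValues));
        }
        return mean(partialMeans);
    }

    // Safe: the "combiner" emits (sum, count) per mapper; the "reducer" adds
    // them up and divides once. Addition is commutative and associative.
    static double meanFromSumAndCount(List<List<Double>> perMapper) {
        double totalSum = 0;
        long totalCount = 0;
        for (List<Double> mapperValues : perMapper) {
            double partialSum = 0;
            for (double v : mapperValues) {
                partialSum += v;
            }
            totalSum += partialSum;
            totalCount += mapperValues.size();
        }
        return totalSum / totalCount;
    }

    public static void main(String[] args) {
        List<List<Double>> perMapper = List.of(List.of(10.0, 20.0, 30.0), List.of(40.0));
        System.out.println("mean of means:       " + meanOfMeans(perMapper));
        System.out.println("mean from sum/count: " + meanFromSumAndCount(perMapper));
    }
}
```

For this input the true mean is 25.0. The sum-and-count version returns 25.0, while the mean of means returns 30.0, because the single value from the second mapper gets as much weight as the three values from the first.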
So go ahead and see whether you can optimize your Hadoop job using a combiner, and do share your thoughts/inputs. Cheers.
