Sunday, April 28, 2013

Reduce EMR costs: use spot instances and yes, without risk!

As AWS states, Spot Instances can significantly lower your computing costs for time-flexible, interruption-tolerant tasks. But Hadoop jobs running on EMR generally aren't interruption-tolerant. So how do we use spot instances without losing the cluster in the middle of a running task?

The answer lies in not making every machine in the cluster a spot instance: use some on-demand instances and some spot instances.

Now, how do we achieve this?

There are three instance groups into which the nodes launched as part of a cluster fall:
  1. master instance group: This is mandatory and contains a single master node.
  2. core instance group: If the cluster has any slave nodes, there must be at least one node in this group. Nodes in this group run both a TaskTracker and a DataNode.
  3. task instance group: If a cluster has a core instance group, it can also have a task instance group containing one or more nodes that run only TaskTrackers. They do not run DataNodes (they hold no HDFS data).

You can choose either on-demand or spot instances for each instance group in a job flow. This is valid for all of the instance groups above. However, from the definitions above, if you lose the master or a core machine, your job is bound to fail. Theoretically, you can have something like:

elastic-mapreduce --create --alive --plain-output \
--instance-group master --instance-type m1.small --instance-count 1 --bid-price 0.091 \
--instance-group core --instance-type m1.small --instance-count 10 --bid-price 0.031 \
--instance-group task --instance-type m1.small --instance-count 30 --bid-price 0.021

But realistically, keep in mind that when you request spot instances and the current spot price exceeds your maximum bid, the instances will either not be provisioned or be removed from the running job flow. So if the spot price rises above your bid at any time and you lose the MASTER node or any CORE node, the job will fail. Both CORE and TASK nodes run TaskTrackers, but only CORE nodes run DataNodes, so you need at least one CORE node.
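The rule above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not anything EMR provides: the function names and the `bids` dictionary are hypothetical, a bid of `None` stands for an on-demand group, and the spot prices passed in are made-up values.

```python
# Hypothetical helpers that model the rule described above:
# a spot group is lost when the spot price exceeds its bid,
# and the job flow fails if the master or core group is lost.

def lost_groups(bids, spot_price):
    """Return the instance groups whose bid is below the current spot price.

    A bid of None means the group is on-demand and cannot be outbid.
    """
    return [name for name, bid in bids.items()
            if bid is not None and spot_price > bid]


def job_flow_survives(bids, spot_price):
    """The job flow survives only if neither master nor core is lost."""
    lost = lost_groups(bids, spot_price)
    return "master" not in lost and "core" not in lost


# Bids taken from the all-spot command above.
all_spot = {"master": 0.091, "core": 0.031, "task": 0.021}
# Same cluster with master and core on-demand (assumed layout).
mixed = {"master": None, "core": None, "task": 0.021}

for price in (0.025, 0.04):  # illustrative spot prices, not real quotes
    print(price, lost_groups(all_spot, price), job_flow_survives(all_spot, price))
    print(price, lost_groups(mixed, price), job_flow_survives(mixed, price))
```

With the all-spot layout, a price that overtakes the core bid takes the whole job down, while the mixed layout only ever loses task nodes.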

To hedge against the complete loss of a job flow, you can create multiple instance groups where the `CORE` group is a smaller set of traditional on-demand machines and the `TASK` group is made up of spot instances. In this configuration, the `TASK` group will only benefit the mapper phases of a job flow, as work from the `TASK` group is handed back up to the `CORE` group for reduction.

So, say you have to run a job that would ideally need 40 slave machines. You can then have 10 machines (the CORE group) as traditional on-demand instances and the other 30 (the TASK group) as spot instances. The syntax for creating the multiple instance groups is below:

elastic-mapreduce --create --alive --plain-output \
--instance-group master --instance-type m1.small --instance-count 1 \
--instance-group core --instance-type m1.small --instance-count 10 \
--instance-group task --instance-type m1.small --instance-count 30 --bid-price 0.021

This helps you save cost by running some of your nodes as spot instances while making sure that the job does not fail.
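To get a rough feel for the saving, here is a minimal back-of-the-envelope sketch in Python. The on-demand rate is a hypothetical placeholder, not a quoted AWS price. The spot rate is assumed to equal the 0.021 bid used above, which is an upper bound since you pay the market price. The master node and any EMR surcharge are ignored.

```python
# Rough hourly cost comparison for the 40-slave example above.
# ON_DEMAND_RATE is a hypothetical figure; substitute the real price
# for your instance type and region.

ON_DEMAND_RATE = 0.06   # assumed $/hour per on-demand node (placeholder)
SPOT_RATE = 0.021       # assumed equal to the task-group bid (upper bound)


def hourly_cost(on_demand_nodes, spot_nodes,
                on_demand_rate=ON_DEMAND_RATE, spot_rate=SPOT_RATE):
    """Hourly cost of the slave nodes only (master and EMR fees ignored)."""
    return on_demand_nodes * on_demand_rate + spot_nodes * spot_rate


all_on_demand = hourly_cost(40, 0)   # 40 on-demand slaves
mixed = hourly_cost(10, 30)          # 10 on-demand CORE + 30 spot TASK
saving = 1 - mixed / all_on_demand   # fraction saved per hour

print(f"all on-demand: ${all_on_demand:.2f}/hour")
print(f"mixed:         ${mixed:.2f}/hour")
print(f"saving:        {saving:.1%}")
```

The exact percentage depends entirely on the two rates you plug in, so treat the output as an illustration of the arithmetic rather than a prediction.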

However, keep in mind that, depending on your bid price and the time the job takes to complete, the spot instances may come and go. In the worst case you might end up incurring the same cost and taking longer to complete the job. It all depends on your bid price, so choose it wisely. We have also been successful in running short tasks (20 minutes) with all the machines as spot instances!
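The worst case described above can be illustrated with a deliberately simplified model in Python. It assumes the job scales linearly with the number of nodes, that spot nodes are available for a fixed fraction of the run (`spot_availability`, a hypothetical parameter), that billing is continuous rather than rounded to the hour, and that the on-demand rate is a placeholder value. None of these assumptions come from AWS.

```python
# Simplified model: how spot availability affects run time and cost.
# All rates and the availability fraction are illustrative assumptions.

def estimate(work_node_hours, core_nodes, task_nodes,
             on_demand_rate, spot_rate, spot_availability):
    """Return (hours, cost) for a job needing `work_node_hours` of work.

    spot_availability is the fraction of the run (0.0 to 1.0) during which
    the spot TASK nodes are actually present and being billed.
    """
    effective_nodes = core_nodes + spot_availability * task_nodes
    hours = work_node_hours / effective_nodes
    hourly = (core_nodes * on_demand_rate
              + spot_availability * task_nodes * spot_rate)
    return hours, hours * hourly


WORK = 40.0             # a job that takes 1 hour on 40 nodes
ON_DEMAND_RATE = 0.06   # hypothetical $/hour
SPOT_RATE = 0.021       # assumed equal to the bid used above

baseline = estimate(WORK, 40, 0, ON_DEMAND_RATE, SPOT_RATE, 0.0)
print("all on-demand:", baseline)
for availability in (1.0, 0.5, 0.0):
    print(availability,
          estimate(WORK, 10, 30, ON_DEMAND_RATE, SPOT_RATE, availability))
```

Under these assumptions, when the spot nodes are never available the 10 on-demand CORE nodes do all the work: the job takes four times as long and costs the same as the all-on-demand run, which matches the worst case described above.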


