Sunday, April 28, 2013

Reduce EMR costs: use spot instances and, yes, without risk!

As AWS states, Spot Instances can significantly lower your computing costs for time-flexible, interruption-tolerant tasks. But Hadoop jobs running on EMR generally aren't interruption-tolerant. So how do we use spot instances without losing the cluster in the middle of a running task?

The answer lies in not making every machine in the cluster a spot instance: use some on-demand instances and some spot instances.

Now, how do we achieve this?

The nodes launched as part of a cluster fall into 3 essential instance groups:
  1. master instance group: This is mandatory and contains a single master node.
  2. core instance group: If the cluster has any slave nodes, there must be at least one node in this instance group. Nodes in this group run both a TaskTracker and a DataNode.
  3. task instance group: If a cluster has a core instance group, it can also have a task instance group containing one or more nodes. These nodes run only a TaskTracker; they do not run a DataNode, so they hold no HDFS data.

You can choose either on-demand or spot instances for each instance group in a job flow, including the master and core groups. However, from the definitions above, if you lose the master node or a core node, your job is bound to fail. Theoretically, you could have something like:

elastic-mapreduce --create --alive --plain-output
...
--instance-group master --instance-type m1.small --instance-count 1 --bid-price 0.091 \
--instance-group core --instance-type m1.small --instance-count 10 --bid-price 0.031 \
--instance-group task --instance-type m1.small --instance-count 30 --bid-price 0.021
...

But realistically, keep in mind that if the current spot price exceeds your maximum bid, the spot instances will either not be provisioned or will be removed from the running job flow. So if the spot price rises above your bid at any time and you lose a CORE or MASTER node, the job will fail. Both CORE and TASK nodes run TaskTrackers, but only CORE nodes run DataNodes, so you need at least one CORE node.
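The bid rule described above can be sketched as a small shell check. This is only an illustration of the comparison, not something the EMR CLI provides: `spot_outcome` is a hypothetical helper name, and the spot prices passed to it are made-up values.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a spot instance group survives,
# given a maximum bid and the current spot price (both in USD/hour).
spot_outcome() {
  bid="$1"
  spot_price="$2"
  # awk does the floating-point comparison; exit status 0 means bid >= price.
  if awk -v b="$bid" -v p="$spot_price" 'BEGIN { exit !(b >= p) }'; then
    echo "kept"
  else
    echo "lost"
  fi
}

spot_outcome 0.021 0.015   # price is below the bid: prints "kept"
spot_outcome 0.021 0.030   # price exceeds the bid: prints "lost"
```

If the group that comes back as "lost" is the MASTER or CORE group, the whole job fails; if it is the TASK group, the cluster only loses capacity.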

To hedge against the complete loss of a job flow, you can create multiple instance groups where the `CORE` group is a smaller complement of traditional on-demand instances and the `TASK` group is made up of spot instances. In this configuration, the `TASK` group will only benefit the mapper phases of a job flow, as work from the `TASK` group is "handed back up" to the `CORE` group for reduction.

So, if you have to run a job that would ideally need 40 slave machines, you can run, say, 10 machines as traditional on-demand instances (CORE group) and the other 30 as spot instances (TASK group). The syntax for creating the multiple instance groups is below:

elastic-mapreduce --create --alive --plain-output
...
--instance-group master --instance-type m1.small --instance-count 1 \
--instance-group core --instance-type m1.small --instance-count 10 \
--instance-group task --instance-type m1.small --instance-count 30 --bid-price 0.021

This lets you save cost by running spot instances as your task nodes while making sure that the job does not fail if they are taken away.
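For a rough feel of the saving, the hourly instance cost of the two layouts can be compared. This is a sketch under stated assumptions: the on-demand rate is a placeholder rather than a quoted AWS price, the spot rate is taken to be the 0.021 bid from the command above (the most you would pay, not the actual market price), and any EMR service fee is ignored. `hourly_cost` is a hypothetical helper.

```shell
#!/bin/sh
# Placeholder on-demand rate in USD/hour; substitute the real m1.small price.
ON_DEMAND_RATE=0.060
# Upper bound on the spot rate: the bid used for the TASK group.
SPOT_BID=0.021

# hourly_cost CORE_COUNT TASK_COUNT TASK_RATE
# One on-demand master, on-demand core nodes, and task nodes at TASK_RATE.
hourly_cost() {
  awk -v od="$ON_DEMAND_RATE" -v core="$1" -v task="$2" -v rate="$3" \
    'BEGIN { printf "%.3f\n", (1 + core) * od + task * rate }'
}

hourly_cost 40 0 "$ON_DEMAND_RATE"   # all 40 slaves on-demand: prints 2.460
hourly_cost 10 30 "$SPOT_BID"        # 10 on-demand core + 30 spot task: prints 1.290
```

With these placeholder numbers the mixed cluster costs roughly half as much per hour, but the real saving depends on the actual prices and on how long the spot instances stay available.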

PS:
Keep in mind that, depending on your bid price and the time taken to complete the job, the spot instances may come and go, so in the worst case you might end up incurring the same cost and taking longer to complete the job. It all depends on your bid price, so choose it wisely. We have also been successful in running short tasks (20 minutes) with all the machines as spot instances!
