Monday, March 25, 2013

Amazon EMR: Can we resize a cluster having a running job?


Can we increase or decrease the number of nodes in a running job flow using Amazon’s EMR?
Hell yeah!
To quote Amazon:
You can modify the size of a running job flow using either the API or the CLI. The AWS Management Console allows you to monitor job flows that you resized, but it does not provide the option to resize job flows.
But I will cover only the CLI part here.
The nodes launched as part of a cluster fall into three instance groups:
  1. master instance group: Mandatory; it contains a single master node.
  2. core instance group: If the cluster has any slave nodes, this group must contain at least one of them. Nodes in this group act as both task nodes (running Hadoop tasks) and data nodes (storing HDFS data).
  3. task instance group: A cluster that has a core instance group can also have a task instance group containing one or more task nodes. These run tasks only and have no DataNode, so they store no HDFS data.
While trying to resize your cluster, you must bear in mind the following:
  • The number of core nodes can only be increased, not decreased, because removing a core node could lose the HDFS data stored on it.
  • The number of task nodes can be both increased and decreased.
  • There is always exactly one master node, so its count cannot be changed.
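To make these rules concrete, here is a minimal Ruby sketch of a check you might run before asking EMR for a resize. The method name `resize_error` and the `:master`/`:core`/`:task` symbols are hypothetical, not part of the EMR CLI, and the only rules encoded are the three listed above.

```ruby
# Sketch of the resize rules listed above (EMR behaviour as described in this post).
# resize_error and the role symbols are hypothetical names, not part of the EMR CLI.
# Returns nil if the resize is allowed, otherwise a message explaining why not.
def resize_error(role, current_count, new_count)
  unless new_count.is_a?(Integer) && new_count >= 0
    return "instance count must be a non-negative integer"
  end

  case role
  when :master
    # There is always exactly one master node.
    "the master instance group cannot be resized"
  when :core
    # Core nodes hold HDFS data, so they may only be added.
    "core nodes can only be increased, not decreased" if new_count < current_count
  when :task
    nil # task nodes can be added or removed
  else
    "unknown instance group role: #{role.inspect}"
  end
end

p resize_error(:core, 5, 10)  # prints nil
p resize_error(:core, 5, 3)   # prints "core nodes can only be increased, not decreased"
p resize_error(:task, 4, 2)   # prints nil
```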
The EMR CLI provides the following parameters for resizing clusters:
--modify-instance-group INSTANCE_GROUP_ID    Modify an existing instance group.
--instance-count INSTANCE_COUNT    Set the count of nodes for an instance group.
Say you have a cluster with 5 core nodes and 4 task nodes.
To increase the number of core nodes to 10, first find the INSTANCE_GROUP_ID of the cluster’s core instance group. You can get it with the CLI’s --describe option:
ruby elastic-mapreduce --jobflow JobFlowID --describe
This returns JSON that includes the “InstanceGroupId” of the “Core Instance Group”.
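If you would rather not dig through that JSON by hand, a small Ruby script can pull the ID out. This is a sketch that assumes the output has the DescribeJobFlows shape (JobFlows, then Instances, then InstanceGroups, with each group carrying `InstanceRole` and `InstanceGroupId`); the helper name and the sample IDs are made up for illustration. It matches on `InstanceRole` (`CORE` or `TASK`) rather than on the group's display name.

```ruby
require 'json'

# Hypothetical helper: given the JSON printed by
#   ruby elastic-mapreduce --jobflow JobFlowID --describe
# return the InstanceGroupId of the group with the given role, or nil.
# ASSUMPTION: the output follows the DescribeJobFlows structure.
def instance_group_id(describe_json, role)
  job_flow = JSON.parse(describe_json).fetch("JobFlows").first
  groups = job_flow.fetch("Instances").fetch("InstanceGroups")
  group = groups.find { |g| g["InstanceRole"] == role }
  group && group["InstanceGroupId"]
end

# Trimmed, made-up sample output (all IDs are fake).
sample = <<~EOS
  {
    "JobFlows": [{
      "JobFlowId": "j-EXAMPLE",
      "Instances": {
        "InstanceGroups": [
          {"Name": "Master Instance Group", "InstanceRole": "MASTER", "InstanceGroupId": "ig-MASTER01"},
          {"Name": "Core Instance Group", "InstanceRole": "CORE", "InstanceGroupId": "ig-CORE0001"},
          {"Name": "Task Instance Group", "InstanceRole": "TASK", "InstanceGroupId": "ig-TASK0001"}
        ]
      }
    }]
  }
EOS

puts instance_group_id(sample, "CORE")  # prints ig-CORE0001
```

In practice you would feed it the real output instead of the sample, for example by piping the --describe command into the script and reading it with `STDIN.read`.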
Then run:
ruby elastic-mapreduce --modify-instance-group InstanceGroupID --instance-count COUNT
In our example, COUNT is 10.
NOTE: The count is not the number of nodes to add, but the new total number of nodes in that instance group. So when growing from 5 to 10 core nodes, the count must be 10, not 5.
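Because `--instance-count` wants the new total, it is easy to pass the increment by mistake. A hypothetical helper like the one below does the arithmetic and builds the command line for you; it only builds and prints the command, and the group ID is a placeholder.

```ruby
# Hypothetical helper: turn "add/remove delta nodes" into the full
# elastic-mapreduce command. --instance-count receives the NEW TOTAL,
# i.e. current_count + delta, never the delta itself.
def modify_group_command(group_id, current_count, delta)
  new_total = current_count + delta
  raise ArgumentError, "resulting count would be negative" if new_total < 0

  ["ruby", "elastic-mapreduce",
   "--modify-instance-group", group_id,
   "--instance-count", new_total.to_s]
end

cmd = modify_group_command("ig-CORE0001", 5, 5)
puts cmd.join(" ")
# prints: ruby elastic-mapreduce --modify-instance-group ig-CORE0001 --instance-count 10
```

To actually run it, you could pass the array to Ruby's `system(*cmd)`, which executes the command directly without going through a shell.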
Similarly, to decrease the number of task nodes to 2, find the “InstanceGroupId” of the “Task Instance Group” and run:
ruby elastic-mapreduce --modify-instance-group TaskInstanceGroupID --instance-count 2

7 comments:

  2. Hello AmarKant,
    Nice info. I was pulling my hair out over how to resize a running job flow, and this gives a great clue.

    Can you please give some idea of how these CLI scripts would be called once I terminate the cluster and create a new one with the same AMI?

    Could you also explain the above CLI commands in a bit more detail?

    My situation: when a number of EMR tasks are in the waiting state, we need to add more task nodes to process the data. How will AWS watch for this and decide when to increase or decrease the task nodes?

    Thank you.
    Sanjeev

