PrepAway - Latest Free Exam Questions & Answers

How would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?

You observe that the number of spilled records from map tasks far exceeds the number of map
output records. Your child heap size is 1 GB and your io.sort.mb value is set to 100 MB. How would
you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?

A.
Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as
close as possible to) the number of map output records.

B.
Decrease the io.sort.mb value below 100MB.

C.
Increase io.sort.mb as high as you can, as close to 1 GB as possible.

D.
For a 1 GB child heap size, an io.sort.mb of 128 MB will always maximize the memory to disk I/O ratio.

Explanation:
There are a few tradeoffs to consider:
1. The number of seeks being done when merging files. If you increase the merge factor too high,
the seek cost on disk will exceed the savings from doing a parallel merge (note that the OS
cache may mitigate this somewhat).
2. Increasing the sort factor decreases the amount of data in each partition: each partition of
sorted data is roughly io.sort.mb / io.sort.factor. The general rule of thumb is to have
io.sort.mb = 10 * io.sort.factor (this is based on the seek latency of the disk relative to its
transfer speed, and could likely be tuned further if it is your bottleneck). If you keep these two
in line with each other, the seek overhead from merging should be minimized.
3. If you increase io.sort.mb, you increase memory pressure on the cluster, leaving less
memory available for job tasks. Total memory usage for sorting is (number of concurrent map
tasks) * io.sort.mb, so you could find yourself causing extra GCs if this is set too high.
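The arithmetic behind tradeoffs 2 and 3 can be sketched briefly. The helper functions below are hypothetical (they are not part of any Hadoop API); they just restate the rule of thumb and the cluster-wide memory footprint described above, with io.sort.mb and io.sort.factor passed in as plain numbers.

```python
def sort_memory_footprint_mb(concurrent_map_tasks, io_sort_mb):
    """Total buffer memory (MB) devoted to sorting across concurrent mappers."""
    return concurrent_map_tasks * io_sort_mb

def partition_size_mb(io_sort_mb, io_sort_factor):
    """Approximate size (MB) of each sorted run merged per round."""
    return io_sort_mb / io_sort_factor

def rule_of_thumb_sort_mb(io_sort_factor):
    """The io.sort.mb = 10 * io.sort.factor heuristic from the explanation."""
    return 10 * io_sort_factor

# Example: 8 concurrent mappers, io.sort.mb = 100, io.sort.factor = 10
print(sort_memory_footprint_mb(8, 100))  # 800 MB of sort buffers cluster-wide
print(partition_size_mb(100, 10))        # 10.0 MB per sorted run
print(rule_of_thumb_sort_mb(10))         # 100, in line with the heuristic
```

With these numbers, 100 MB of sort buffer per mapper stays consistent with a sort factor of 10 while keeping the aggregate footprint (800 MB across 8 mappers) visible at a glance.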

Essentially:
- If you find yourself swapping heavily, then there's a good chance you have set the sort factor too
high.
- If the ratio between io.sort.mb and io.sort.factor isn't correct, then you may need to change
io.sort.mb (if you have the memory) or lower the sort factor.
- If you find that you are spending more time in your mappers than in your reducers, then you may
want to increase the number of map tasks and decrease the sort factor (assuming there is
memory pressure).
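The diagnostic in answer A can be sketched as a comparison of two job counters, SPILLED_RECORDS and MAP_OUTPUT_RECORDS. The counter values below are made up for illustration; in practice you would read them from the job's counter page in the JobTracker or job history UI.

```python
def spill_ratio(spilled_records, map_output_records):
    """Ratio of records written to disk vs. records the mappers produced."""
    return spilled_records / map_output_records

def suggest_tuning(spilled_records, map_output_records):
    """Restate answer A: tune io.sort.mb until spills ~= map output records."""
    if spill_ratio(spilled_records, map_output_records) > 1.0:
        # Records are hitting disk more than once per record produced,
        # so the sort buffer is too small: raise io.sort.mb.
        return "increase io.sort.mb"
    return "io.sort.mb is adequate (at most one spill per record)"

# Hypothetical counters for the scenario in the question:
print(suggest_tuning(250_000_000, 100_000_000))  # increase io.sort.mb
print(suggest_tuning(100_000_000, 100_000_000))  # io.sort.mb is adequate ...
```

A ratio of exactly 1.0 is the best achievable case, since every map output record must be spilled at least once; that is why answer A targets "equals (or as close as possible to)".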
How could I tell if my hadoop config parameter io.sort.factor is too small or too big?
http://stackoverflow.com/questions/8642566/how-could-i-tell-if-my-hadoop-config-parameter-iosort-factor-is-too-small-or-to
