You have a table with 5 TB of data, 10 RegionServers, and a region size of 256MB. You want to
continue with puts to widely dispersed row IDs in your table. Which of the following will improve performance?
Increase your buffer cache in the RegionServers
Increase the number of RegionServers to 15
Decrease your number of RegionServers to 5
Decrease your region size to 128MB
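The arithmetic behind the options can be made concrete. The sketch below estimates how many regions each RegionServer would host under each scenario. It assumes binary units (1 TB = 1024 × 1024 MB) and a perfectly even spread of regions, which HBase's balancer only approximates. The helper names are illustrative, not part of any HBase API.

```python
# Illustrative arithmetic only: assumes binary units and an even
# distribution of regions across RegionServers.

TB_IN_MB = 1024 * 1024


def region_count(data_tb: float, region_size_mb: int) -> int:
    """Approximate number of regions needed to hold data_tb of data."""
    return int(data_tb * TB_IN_MB // region_size_mb)


def regions_per_server(data_tb: float, region_size_mb: int, servers: int) -> float:
    """Average number of regions each RegionServer must host."""
    return region_count(data_tb, region_size_mb) / servers


# (region size in MB, number of RegionServers) for each option
scenarios = {
    "current (256MB, 10 servers)": (256, 10),
    "15 RegionServers": (256, 15),
    "5 RegionServers": (256, 5),
    "128MB regions": (128, 10),
}

for label, (size_mb, servers) in scenarios.items():
    per_server = regions_per_server(5, size_mb, servers)
    print(f"{label}: {per_server:.0f} regions per RegionServer")
```

Under these assumptions the current layout is about 2048 regions per server. Adding servers lowers that to roughly 1365, while halving the region size or halving the server count doubles it to 4096. The buffer-cache option does not change the region count, so this arithmetic does not capture it.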
Determining the “right” region size can be tricky, and there are a few factors to consider:
HBase scales by having regions across many servers. Thus if you have 2 regions for 16GB of data
on a 20-node cluster, your data will be concentrated on just a few machines, and nearly the entire
cluster will be idle. This really can't be stressed enough, since a common problem is loading 200MB
of data into HBase and then wondering why your awesome 10-node cluster isn't doing anything.
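The 16GB example can be quantified with a short sketch. It assumes each region is served by a single RegionServer, so a table with R regions can keep at most R servers busy. The function names are hypothetical and chosen for illustration.

```python
# Sketch: upper bound on how many RegionServers can serve a table's data,
# assuming each region is hosted by exactly one RegionServer.


def max_busy_servers(regions: int, servers: int) -> int:
    """At most one server per region can be serving the table."""
    return min(regions, servers)


def min_idle_fraction(regions: int, servers: int) -> float:
    """Lower bound on the fraction of servers hosting none of the table's regions."""
    return 1 - max_busy_servers(regions, servers) / servers


# The example from the text: 2 regions on a 20-node cluster.
print(max_busy_servers(2, 20))
print(f"{min_idle_fraction(2, 20):.0%}")
```

With 2 regions on 20 nodes, at most 2 servers hold the data, leaving at least 90% of the cluster idle for that table.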
On the other hand, a high region count has been known to make things slow. This is getting better
with each release of HBase, but it is probably better to have 700 regions than 3000 for the same
amount of data.
There is not much difference between 1 region and 10 in the memory footprint of indexes, etc.,
held by the RegionServer.