Your client application needs to scan a region for the row key value 104.
Given a store that contains the following list of Row Key values:
100, 101, 102, 103, 104, 105, 106, 107
A bloom filter would return which of the following?
Confirmation that 104 may be contained in the set
Confirmation that 104 is contained in the set
The hash of column family
The file offset of the value 104
* When an HFile is opened, typically when a region is deployed to a RegionServer, the bloom filter
is loaded into memory and used to determine whether a given key may be in that store file.
* Get/Scan(Row) currently does a parallel N-way get of that Row from all StoreFiles in a Region.
This means that you are doing N read requests from disk. BloomFilters provide a lightweight in-memory structure to reduce those N disk reads to only the files likely to contain that Row (N-B).
* Keep in mind that HBase only has a block index per file, which is rather coarse-grained and tells
the reader that a key may be in the file because it falls into a start and end key range in the block
index. Whether the key is actually present can only be determined by loading that block and scanning
it. This also places a burden on the block cache and you may create a lot of unnecessary churn
that the bloom filters would help avoid.
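
The behaviour described above can be illustrated with a minimal sketch. This is not HBase's implementation: the BloomFilter class, the files_to_read helper, the bit-array size, the hash scheme, and the file names are all hypothetical, chosen only to show why a bloom filter answers "may be contained" rather than "is contained", and how it narrows N store files down to the candidates worth reading.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (illustrative only, not the HBase implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True only means "may be present",
        # because other keys could have set the same bits.
        return all(self.bits[pos] for pos in self._positions(key))


def files_to_read(store_files, row_key):
    """Return the names of store files whose filter does not rule out row_key."""
    return [name for name, bf in store_files.items() if bf.might_contain(row_key)]


# Hypothetical store files: one holds the row keys from the question.
store_files = {}
for name, keys in {"file_a": range(100, 108), "file_b": range(200, 208)}.items():
    bf = BloomFilter()
    for key in keys:
        bf.add(key)
    store_files[name] = bf

candidates = files_to_read(store_files, 104)
print(candidates)
```

A filter never produces a false negative, so file_a is always among the candidates for row key 104. A file that does not hold the key is usually excluded, but a false positive is possible, which is why the answer is "may be contained in the set".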