How many sets of physical files will be read during a scan of the entire dataset immediately
following a major compaction?
There are two column families (Managers and Skills), so there will be two files, one per column family.
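The count above can be sketched as a small calculation. This is a minimal illustration, not HBase code. It assumes the table has a single region and that a major compaction leaves exactly one StoreFile per store (one store per region and column family), which is the default behaviour. The function name `store_files_after_major_compaction` is hypothetical.

```python
def store_files_after_major_compaction(num_regions, column_families):
    """Return the expected StoreFile count per column family.

    Assumption: a major compaction rewrites each store (one per region
    and column family) into a single StoreFile.
    """
    return {family: num_regions for family in column_families}


# Assumed single-region table with the two families from the example.
counts = store_files_after_major_compaction(1, ["Managers", "Skills"])
total = sum(counts.values())
print(counts, total)
```

With more regions the total grows as regions times column families, which is one more reason to keep the number of families small.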
* Physically, all column family members are stored together on the filesystem. Because tunings
and storage specifications are done at the column family level, it is advised that all column family
members have the same general access pattern and size characteristics.
* HBase currently does not do well with anything above two or three column families, so keep the
number of column families in your schema low. Currently, flushing and compactions are done on a
per-Region basis, so if one column family carries the bulk of the data and triggers flushes, the
adjacent families will also be flushed even though the amount of data they carry is small. When
there are many column families, the flushing and compaction interaction can cause a lot of
needless I/O (to be addressed by changing flushing and compaction to work on a per column family
basis).
* When changes are made to either Tables or ColumnFamilies (e.g., region size, block size), these
changes take effect the next time there is a major compaction and the StoreFiles get re-written.
* StoreFiles are composed of blocks. The blocksize is configured on a per-ColumnFamily basis.
Compression happens at the block level within StoreFiles.
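
The per-Region flushing described in the notes above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it models a single region whose flush is triggered when the combined in-memory size of all families reaches a threshold, and every family with buffered data then writes one file. The function `simulate_region_flushes`, the byte counts, and the threshold are hypothetical and do not correspond to real HBase settings.

```python
from collections import defaultdict


def simulate_region_flushes(writes, flush_threshold):
    """Simulate per-region flushing for one region.

    writes: iterable of (family, num_bytes) pairs.
    Returns a dict mapping each family to the sizes of the files it flushed.
    """
    memstore = defaultdict(int)   # buffered bytes per column family
    flushed = defaultdict(list)   # flushed file sizes per column family
    for family, num_bytes in writes:
        memstore[family] += num_bytes
        # The trigger is region-wide: the total across all families.
        if sum(memstore.values()) >= flush_threshold:
            for fam, size in memstore.items():
                if size > 0:
                    flushed[fam].append(size)  # every family writes a file
            memstore.clear()
    return dict(flushed)


# "Skills" carries almost all of the data; "Managers" carries very little.
writes = [("Managers", 1), ("Skills", 99)] * 10
files = simulate_region_flushes(writes, flush_threshold=200)
print(files)
```

In this toy run the small family is flushed every time the large one fills the region, so it accumulates as many files as the large family, each holding only a couple of bytes. Those tiny files are the needless I/O that later compactions have to clean up.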