Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume
that the two tables are formatted as comma-separated files in HDFS.
Yes, but only if one of the tables fits into memory.
Yes, so long as both tables fit into memory.
No, MapReduce cannot perform relational operations.
No, but it can be done with either Pig or Hive.
* Join Algorithms in MapReduce
A) Reduce-side join
B) Map-side join
C) In-memory join
/ Striped variant
/ Memcached variant
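
As a rough illustration of option A, the sketch below simulates a reduce-side join in plain Python rather than on Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`), the tags `"R"` and `"S"`, and the sample rows are all hypothetical; the only assumption carried over from the slides is that each record is a comma-separated line whose first field is the join key.

```python
from collections import defaultdict
from itertools import product

def map_phase(lines, tag):
    """Emit (key, (tag, rest_of_record)) for each comma-separated line."""
    for line in lines:
        key, _, rest = line.strip().partition(",")
        yield key, (tag, rest)

def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Cross the records tagged R with the records tagged S for one key."""
    left = [rest for tag, rest in values if tag == "R"]
    right = [rest for tag, rest in values if tag == "S"]
    for l, r in product(left, right):
        yield (key, l, r)

def reduce_side_join(r_lines, s_lines):
    # Both tables go through the mappers and are shuffled by join key.
    pairs = list(map_phase(r_lines, "R")) + list(map_phase(s_lines, "S"))
    groups = shuffle(pairs)
    out = []
    for key in sorted(groups):
        out.extend(reduce_phase(key, groups[key]))
    return out

# Hypothetical sample tables: key,name and key,item
users = ["1,alice", "2,bob", "3,carol"]
orders = ["1,book", "1,pen", "3,lamp", "4,desk"]
joined = reduce_side_join(users, orders)
```

Tagging each record with its source table is what lets a single reducer tell the two sides apart once they arrive under the same key.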
* Which join to use?
/ In-memory join > map-side join > reduce-side join (in order of preference)
/ Limitations of each?
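In-memory join: one table must fit in memory
Map-side join: both inputs must be sorted by the join key and partitioned the same way
Reduce-side join: general purpose, but both tables are shuffled across the network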
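
To make the first two limitations concrete, here is a minimal single-machine sketch in Python, not actual MapReduce code. `in_memory_join` is a hash join that holds the smaller table in a dictionary, and `merge_join` is a sort-merge join that assumes both inputs are already sorted by key, which is the precondition a map-side join relies on. Function names and sample rows are hypothetical.

```python
def in_memory_join(small, large):
    """Hash join: load the small table into a dict, then stream the large one."""
    table = {}
    for key, value in small:
        table.setdefault(key, []).append(value)
    for key, value in large:
        for match in table.get(key, []):
            yield (key, match, value)

def merge_join(left, right):
    """Merge join over two lists of (key, value) already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then cross the runs.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out

# Hypothetical sample tables, both sorted by key
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (1, "pen"), (3, "lamp"), (4, "desk")]
hash_result = list(in_memory_join(users, orders))
merge_result = merge_join(users, orders)
```

The hash join needs memory proportional to the smaller table, while the merge join needs almost none but produces wrong results if either input is unsorted.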