You need to analyze 60,000,000 images stored in JPEG format, each of which is
approximately 25 KB. Because you Hadoop cluster isn’t optimized for storing and
processing many small files, you decide to do the following actions: 1. Group the individual
images into a set of larger files 2. Use the set of larger files as input for a MapReduce job
that processes them directly with python using Hadoop streaming. Which data serialization
system gives the flexibility to do this?

A.
JSON
B.
SequenceFiles
C.
Avro
D.
HTML
E.
CSV
F.
XML
B
0
0
C is the correct answer.
0
0
B because Avro is better for queries and schema
and sequence is good for putting lots of small files in one bigger file
0
0
B is the best answer
0
0
B or C
0
0
right answer is C
0
0