PrepAway - Latest Free Exam Questions & Answers

Which format should you use to store this data in HDFS?

You want to perform analysis on a large collection of images. You want to store this data in HDFS
and process it with MapReduce, but you also want to give your data analysts and data scientists
the ability to process the data directly from HDFS with an interpreted high-level programming
language like Python. Which format should you use to store this data in HDFS?


A. SequenceFiles

B. Avro

C. JSON

D. HTML

E. XML

F. CSV

Correct Answer: A

Explanation:
Using Hadoop Sequence Files
So what should we do to deal with a huge number of images? Use Hadoop sequence files!
These are map files that can inherently be read by MapReduce applications (there is an input
format specifically for sequence files) and that are splittable by MapReduce, so one huge file
can serve as the input to many map tasks. By using sequence files we let Hadoop play to its
strengths: it can split the work into chunks so the processing runs in parallel, while the chunks
stay large enough that the processing remains efficient.
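To make the input-format point concrete, here is a minimal sketch of a MapReduce driver (Java, using the standard org.apache.hadoop.mapreduce API) that reads such a sequence file. The class names ImageJobDriver and ImageMapper, the map-only placeholder logic, and the command-line paths are assumptions for illustration, not part of the original article:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ImageJobDriver {

    // Each map() call receives one image: the filename as the key and the
    // raw image bytes as the value, with no parsing code needed in the job.
    public static class ImageMapper
            extends Mapper<Text, BytesWritable, Text, Text> {
        @Override
        protected void map(Text filename, BytesWritable imageBytes, Context context)
                throws java.io.IOException, InterruptedException {
            // Placeholder analysis: emit the filename and the image size in bytes.
            context.write(filename, new Text(Integer.toString(imageBytes.getLength())));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "image-analysis");
        job.setJarByClass(ImageJobDriver.class);
        job.setMapperClass(ImageMapper.class);
        job.setNumReduceTasks(0); // map-only job for this sketch

        // SequenceFileInputFormat hands each (Text, BytesWritable) record to the
        // mapper and lets the framework split one large file across many map tasks.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        SequenceFileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Because the input format is splittable, a single multi-gigabyte sequence file of images is automatically divided among many map tasks instead of being processed by one.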
Since sequence files are map files, the desired layout is for the key to be a Text holding the
HDFS filename and the value to be a BytesWritable containing the image content of the file.
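A rough sketch of building such a file follows (again in Java; the local directory, the HDFS output path, and the class name ImageSequenceFileWriter are hypothetical). Each local image becomes one record, with the filename as the Text key and the raw bytes as the BytesWritable value:

import java.io.File;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImageSequenceFileWriter {
    public static void main(String[] args) throws Exception {
        // Hypothetical paths -- adjust to your cluster and local layout.
        Path seqFile = new Path("hdfs:///data/images/images.seq");
        File localDir = new File("/tmp/images");

        Configuration conf = new Configuration();
        Text key = new Text();
        BytesWritable value = new BytesWritable();

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(seqFile),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (File img : localDir.listFiles()) {
                byte[] bytes = Files.readAllBytes(img.toPath());
                key.set(img.getName());            // filename as the key
                value.set(bytes, 0, bytes.length); // raw image content as the value
                writer.append(key, value);         // one record per image
            }
        }
    }
}

Packing many small images into one sequence file this way also sidesteps HDFS's small-files problem, since the NameNode tracks one large file rather than millions of tiny ones.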
Reference: Hadoop binary files processing introduced by image duplicates finder
