
How will you gather this data for your analysis?

You want to understand more about how users browse your public website, such as which pages
they visit prior to placing an order. You have a farm of 200 web servers hosting your website. How
will you gather this data for your analysis?


A.
Ingest the server web logs into HDFS using Flume.

B.
Write a MapReduce job, with the web servers as mappers and the Hadoop cluster nodes as reducers.

C.
Import all users’ clicks from your OLTP databases into Hadoop, using Sqoop.

D.
Channel these clickstreams into Hadoop using Hadoop Streaming.

E.
Sample the weblogs from the web servers, copying them into Hadoop using curl.
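
Option A is worth illustrating: web server logs are flat files, and Flume is the standard tool for streaming them into HDFS. Below is a minimal sketch of a Flume agent configuration; the agent name, file paths, and property values are assumptions for illustration, not part of the original question.

# Illustrative Flume agent "a1": tail a web server access log
# and deliver the events into HDFS.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Tail the access log as it grows (log path is an assumption)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

# Buffer events in memory between source and sink
a1.channels.c1.type = memory

# Write events into date-partitioned HDFS directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/weblogs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1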

Explanation:
Hadoop MapReduce for Parsing Weblogs
Here are the steps for parsing a log file using Hadoop MapReduce:
Load log files into the HDFS location using this Hadoop command:
hadoop fs -put <local file path of weblogs> <hadoop HDFS location>
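For example, with a hypothetical local log file and HDFS target directory:
hadoop fs -put /var/log/httpd/access_log /user/hadoop/weblogs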
The opencsv 2.3 library (Opencsv2.3.jar) is used for parsing the log records.
Below is the Mapper program for parsing the log file from the HDFS location.
// Imports required by the mapper (ParseMapper is declared as a static
// nested class inside the job's driver class):
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import au.com.bytecode.opencsv.CSVParser;

public static class ParseMapper
        extends Mapper<Object, Text, NullWritable, Text> {

    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the log line on spaces, treating double-quoted sections
        // (e.g. the request string or user agent) as single fields.
        CSVParser parser = new CSVParser(' ', '"');
        String[] sp = parser.parseLine(value.toString());

        // Re-join the fields into one comma-separated record.
        StringBuilder rec = new StringBuilder();
        for (int i = 0; i < sp.length; i++) {
            rec.append(sp[i]);
            if (i != sp.length - 1) {
                rec.append(",");
            }
        }
        word.set(rec.toString());
        context.write(NullWritable.get(), word);
    }
}
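
The Mapper alone cannot be submitted to the cluster; it needs a job driver. Below is a minimal map-only driver sketch (the class name, job name, and settings are illustrative assumptions, not the program attached to the original article):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class: wires ParseMapper (nested above) into a
// map-only job, so parsed records are written straight to the output.
public class LogParseDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "weblog parse");
        job.setJarByClass(LogParseDriver.class);
        job.setMapperClass(ParseMapper.class);
        job.setNumReduceTasks(0); // map-only: mapper output is written as-is
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}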
The command below runs the Hadoop-based log parse. The MapReduce program is
attached in this article. You can add extra parsing methods to the class; be
sure to build a new JAR after any change and move it to the node from which
you submit jobs to the Hadoop cluster.
hadoop jar <path of logparse jar> <hadoop HDFS logfile path> <output path of parsed log file>
The output file is stored in the HDFS location, and the output file name starts with “part-“.
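The parsed records can then be inspected directly from HDFS, for example:
hadoop fs -cat <output path of parsed log file>/part-*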
