
How will you gather this data for your analysis?

You want to understand more about how users browse your public website, such as which pages
they visit prior to placing an order. You have a farm of 200 web servers hosting your website. How
will you gather this data for your analysis?


A.
Ingest the server web logs into HDFS using Flume.

B.
Write a MapReduce job, with the web servers as mappers and the Hadoop cluster nodes as reducers.

C.
Import all users’ clicks from your OLTP databases into Hadoop, using Sqoop.

D.
Channel these clickstreams into Hadoop using Hadoop Streaming.

E.
Sample the weblogs from the web servers, copying them into Hadoop using curl.
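
Option A is worth illustrating: web server logs are flat files, and Flume is the standard tool for streaming them into HDFS. Below is a minimal sketch of a Flume agent configuration; the agent name, file paths, and property values are assumptions for illustration, not part of the original question.

# Illustrative Flume agent "a1": tail a web server access log
# and deliver the events into HDFS.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Tail the access log as it grows (log path is an assumption)
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/httpd/access_log
a1.sources.r1.channels = c1

# Buffer events in memory between source and sink
a1.channels.c1.type = memory

# Write events into date-partitioned HDFS directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/weblogs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1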

Explanation:
Hadoop MapReduce for Parsing Weblogs
Here are the steps for parsing a log file using Hadoop MapReduce:
Load log files into the HDFS location using this Hadoop command:
hadoop fs -put <local file path of weblogs> <hadoop HDFS location>
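For example, with a hypothetical local log file and HDFS target directory:
hadoop fs -put /var/log/httpd/access_log /user/hadoop/weblogs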
The opencsv 2.3 library (Opencsv2.3.jar) is used for parsing the log records.
Below is the Mapper program for parsing the log file from the HDFS location.
// Imports required by the mapper (ParseMapper is declared as a static
// nested class inside the job's driver class):
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import au.com.bytecode.opencsv.CSVParser;

public static class ParseMapper
        extends Mapper<Object, Text, NullWritable, Text> {

    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the log line on spaces, treating double-quoted sections
        // (e.g. the request string or user agent) as single fields.
        CSVParser parser = new CSVParser(' ', '"');
        String[] sp = parser.parseLine(value.toString());

        // Re-join the fields into one comma-separated record.
        StringBuilder rec = new StringBuilder();
        for (int i = 0; i < sp.length; i++) {
            rec.append(sp[i]);
            if (i != sp.length - 1) {
                rec.append(",");
            }
        }
        word.set(rec.toString());
        context.write(NullWritable.get(), word);
    }
}
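
The Mapper alone cannot be submitted to the cluster; it needs a job driver. Below is a minimal map-only driver sketch (the class name, job name, and settings are illustrative assumptions, not the program attached to the original article):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class: wires ParseMapper (nested above) into a
// map-only job, so parsed records are written straight to the output.
public class LogParseDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "weblog parse");
        job.setJarByClass(LogParseDriver.class);
        job.setMapperClass(ParseMapper.class);
        job.setNumReduceTasks(0); // map-only: mapper output is written as-is
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}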
The command below runs the Hadoop-based log parse. The MapReduce program is
attached in this article. You can add extra parsing methods to the class; be
sure to build a new JAR after any change and move it to the node from which
you submit jobs to the Hadoop cluster.
hadoop jar <path of logparse jar> <hadoop HDFS logfile path> <output path of parsed log file>
The output file is stored in the HDFS location, and the output file name starts with “part-“.
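The parsed records can then be inspected directly from HDFS, for example:
hadoop fs -cat <output path of parsed log file>/part-*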
