Briefing Cloudera Knowledge

You want to understand more about how users browse your public website. For example, you want
to know which pages they visit prior to placing an order. You have a server farm of 200 web
servers hosting your website. Which is the most efficient process to gather these web server logs
into your Hadoop cluster for analysis?

A.
Sample the web server logs from the web servers and copy them into HDFS using curl

B.
Ingest the web server logs into HDFS using Flume

C.
Import all user clicks from your OLTP databases into Hadoop using Sqoop

D.
Write a MapReduce job with the web servers as mappers and the Hadoop cluster nodes as
reducers

E.
Channel these clickstreams into Hadoop using Hadoop Streaming
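
Option B describes Apache Flume, which is purpose-built for streaming log data from many sources into HDFS. As a rough illustration of what that looks like in practice, here is a minimal sketch of a Flume agent configuration that tails a web server's access log and writes it to HDFS. The agent name, file paths, and namenode host are illustrative assumptions, not values from the question.

```properties
# Hypothetical Flume agent "agent1": one such agent would run on each web server.
agent1.sources = weblog
agent1.channels = mem
agent1.sinks = hdfs-sink

# Source: tail the local access log as new entries arrive (path is an assumption).
agent1.sources.weblog.type = exec
agent1.sources.weblog.command = tail -F /var/log/httpd/access_log
agent1.sources.weblog.channels = mem

# Channel: buffer events in memory between source and sink.
agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

# Sink: deliver events into HDFS, partitioned by date (namenode host is an assumption).
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.channel = mem
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/logs/web/%Y-%m-%d
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
```

An agent like this on each of the 200 web servers pushes logs into the cluster continuously, with no manual copying and no load placed on the OLTP databases.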