Which method in the Mapper should you use to implement code for reading the file and populating the associative array?

seenagape, May 28, 2015

You want to populate an associative array in order to perform a map-side join. You’ve decided to
put this information in a text file, place that file into the DistributedCache and read it in your
Mapper before any records are processed.
Identify which method in the Mapper you should use to implement the code for reading the file and
populating the associative array.

A. combine
B. map
C. init
D. configure

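Explanation:
D (configure). In the old API (org.apache.hadoop.mapred), configure(JobConf) is called once per task,
before any call to map(), so it is the place to read the cached file. See step 3 below.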
Here is an illustrative example of how to use the DistributedCache:
// Setting up the cache for the application
1. Copy the requisite files to the FileSystem:
$ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat
$ bin/hadoop fs -copyFromLocal map.zip /myapp/map.zip
$ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar
$ bin/hadoop fs -copyFromLocal mytar.tar /myapp/mytar.tar
$ bin/hadoop fs -copyFromLocal mytgz.tgz /myapp/mytgz.tgz
$ bin/hadoop fs -copyFromLocal mytargz.tar.gz /myapp/mytargz.tar.gz
2. Set up the application's JobConf:
JobConf job = new JobConf();
// Each URI needs its closing parenthesis before the job argument
DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), job);
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job);
3. Use the cached files in the Mapper or Reducer:
public static class MapClass extends MapReduceBase
    implements Mapper<K, V, K, V> {
  private Path[] localArchives;
  private Path[] localFiles;

  // Called once per task, before the first call to map()
  public void configure(JobConf job) {
    try {
      // Get the cached archives/files
      localArchives = DistributedCache.getLocalCacheArchives(job);
      localFiles = DistributedCache.getLocalCacheFiles(job);
    } catch (IOException e) {
      // configure() cannot throw checked exceptions
      throw new RuntimeException(e);
    }
  }

  public void map(K key, V value,
                  OutputCollector<K, V> output, Reporter reporter)
      throws IOException {
    // Use data from the cached archives/files here
    // …
    // …
    output.collect(key, value);
  }
}
Reference: org.apache.hadoop.filecache, class DistributedCache
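
The example above stops at obtaining the local paths. As a rough sketch of the remaining step, the code below shows one way the lookup file could be read into an associative array and then used to join records. It uses only the Java standard library so that it can be run without a cluster. The class name LookupTable, the methods load and join, and the tab-separated "key<TAB>value" file layout are all assumptions made for illustration; none of them come from Hadoop or from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a Hadoop class): the in-memory side of a map-side join.
final class LookupTable {

    // Parse "key<TAB>value" lines into a HashMap.
    // The tab-separated layout is an assumption about lookup.dat.
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip blank or malformed lines
                }
                table.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return table;
    }

    // Join one input record ("key<TAB>rest") against the table.
    // Returns null when the key has no match (inner-join behaviour).
    static String join(Map<String, String> table, String record) {
        int tab = record.indexOf('\t');
        String key = tab < 0 ? record : record.substring(0, tab);
        String match = table.get(key);
        if (match == null) {
            return null;
        }
        String rest = tab < 0 ? "" : record.substring(tab + 1);
        return key + "\t" + match + "\t" + rest;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the cached file; a real task would read the local copy instead.
        Map<String, String> table = load(new StringReader("1\tAlice\n2\tBob\n"));
        System.out.println(join(table, "1\tbook")); // matched record
        System.out.println(join(table, "9\tpen"));  // prints "null": no match
    }
}
```

In a job written against the old API, configure() would presumably call something like load(new FileReader(localFiles[0].toString())) once and keep the resulting map in a field, and map() would then call join for each incoming record. Doing the load in map() instead would reread the file for every record.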

4 Comments

Lu says:

June 12, 2014 at 9:50 pm

It should be “B”, since configure is only used in the old MapReduce API (see page 294 of Hadoop: The Definitive Guide, 3rd edition).
Question 43 is almost the same question.

Westby says:

January 23, 2015 at 6:17 am

In the new API, the method is setup(), not map.

Sameer says:

February 24, 2015 at 8:30 am

configure().

In new API – setup().

Vishal says:

August 24, 2015 at 2:25 pm

D. configure

