PrepAway - Latest Free Exam Questions & Answers

What is the cause of the error?

A user comes to you complaining that when she attempts to submit a Hadoop job, it fails. There is
a directory in HDFS named /data/input. The JAR is named j.jar, and the driver class is named
DriverClass.

She runs the command:
Hadoop jar j.jar DriverClass /data/input/data/output
The error message returned includes the line:
PriviledgedActionException as:training (auth:SIMPLE)
cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
Input path does not exist: file:/data/input
What is the cause of the error?


A.
The user is not authorized to run the job on the cluster

B.
The output directory already exists

C.
The name of the driver has been spelled incorrectly on the command line

D.
The directory name is misspelled in HDFS

E.
The Hadoop configuration files on the client do not point to the cluster

10 Comments on “What is the cause of the error?”

  1. Gaurav says:

    The answer is D.
    There is no space between the input and output directories. The command should be:
    hadoop jar j.jar DriverClass /data/input /data/output

  2. b says:

    I believe it is E.

    The error says that “file:/data/input” was not found. That is a local path, which means Hadoop is not configured to resolve it to “hdfs://namenode:port/data/input” by default. -> Answer E
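    The fallback b describes can be sketched in plain shell. This is illustrative logic only, not Hadoop source: the real resolution happens inside Hadoop's FileSystem machinery using the `fs.defaultFS` property from core-site.xml, and the URIs below are hypothetical.

    ```shell
    # Sketch of how a path with no scheme inherits the client's default
    # filesystem (assumed logic for illustration, not Hadoop's actual code).
    resolve_path() {
      local default_fs="$1" path="$2"
      case "$path" in
        *:/*) echo "$path" ;;                 # already has a scheme, keep it
        *)    echo "${default_fs%/}$path" ;;  # prepend the default filesystem
      esac
    }

    resolve_path "file:///"         "/data/input"   # -> file:///data/input
    resolve_path "hdfs://mycluster" "/data/input"   # -> hdfs://mycluster/data/input
    ```

    When the client's configuration files are missing or do not point at the cluster, `fs.defaultFS` falls back to the local filesystem, which is why the error message shows a file: URI rather than an hdfs:// one.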

  3. Hitesh says:

    Ans: D

    To verify this, I actually tried to reproduce the error, using an input directory “input1” that doesn't exist on the HDFS filesystem:

    [training@elephant conf]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input1 b
    15/08/23 23:06:16 INFO client.RMProxy: Connecting to ResourceManager at horse/192.168.123.3:8032
    15/08/23 23:06:19 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/training/.staging/job_1440218632040_0008
    15/08/23 23:06:19 WARN security.UserGroupInformation: PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://mycluster/user/training/input1
    org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://mycluster/user/training/input1
    [training@elephant conf]$

    Now when I tried to run it with the proper input directory “input”, it ran successfully:

    [training@elephant conf]$ hadoop fs -ls
    Found 2 items
    drwx------   - training supergroup          0 2015-08-23 23:06 .staging
    drwxr-xr-x   - training supergroup          0 2015-08-20 03:30 elephant

    [training@elephant conf]$
    [training@elephant conf]$ hadoop fs -mkdir input
    [training@elephant conf]$ hadoop fs -put shakespeare.txt input
    [training@elephant conf]$ hadoop fs -ls input
    Found 1 items
    -rw-r--r--   3 training supergroup    5447165 2015-08-23 23:40 input/shakespeare.txt
    [training@elephant conf]$

    [training@elephant conf]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input output
    15/08/23 23:42:46 INFO client.RMProxy: Connecting to ResourceManager at horse/192.168.123.3:8032
    15/08/23 23:42:48 INFO input.FileInputFormat: Total input paths to process : 1
    15/08/23 23:42:49 INFO mapreduce.JobSubmitter: number of splits:1
    15/08/23 23:42:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1440218632040_0009
    15/08/23 23:42:52 INFO impl.YarnClientImpl: Submitted application application_1440218632040_0009
    15/08/23 23:42:52 INFO mapreduce.Job: The url to track the job: http://horse:8088/proxy/application_1440218632040_0009/
    15/08/23 23:42:52 INFO mapreduce.Job: Running job: job_1440218632040_0009
    15/08/23 23:43:19 INFO mapreduce.Job: Job job_1440218632040_0009 running in uber mode : false
    15/08/23 23:43:19 INFO mapreduce.Job: map 0% reduce 0%
    15/08/23 23:43:45 INFO mapreduce.Job: map 100% reduce 0%
    15/08/23 23:44:05 INFO mapreduce.Job: map 100% reduce 100%
    15/08/23 23:44:07 INFO mapreduce.Job: Job job_1440218632040_0009 completed successfully
    15/08/23 23:44:07 INFO mapreduce.Job: Counters: 49
    File System Counters
    FILE: Number of bytes read=973070
    FILE: Number of bytes written=2127061
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=5447282
    HDFS: Number of bytes written=713496
    HDFS: Number of read operations=6
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
    Job Counters
    Launched map tasks=1
    Launched reduce tasks=1
    Data-local map tasks=1
    Total time spent by all maps in occupied slots (ms)=21492
    Total time spent by all reduces in occupied slots (ms)=16917
    Total time spent by all map tasks (ms)=21492
    Total time spent by all reduce tasks (ms)=16917
    Total vcore-seconds taken by all map tasks=21492
    Total vcore-seconds taken by all reduce tasks=16917
    Total megabyte-seconds taken by all map tasks=22007808
    Total megabyte-seconds taken by all reduce tasks=17323008
    Map-Reduce Framework
    Map input records=124192
    Map output records=899594
    Map output bytes=8528715
    Map output materialized bytes=973070
    Input split bytes=117
    Combine input records=899594
    Combine output records=67108
    Reduce input groups=67108
    Reduce shuffle bytes=973070
    Reduce input records=67108
    Reduce output records=67108
    Spilled Records=134216
    Shuffled Maps =1
    Failed Shuffles=0
    Merged Map outputs=1
    GC time elapsed (ms)=514
    CPU time spent (ms)=7260
    Physical memory (bytes) snapshot=268238848
    Virtual memory (bytes) snapshot=1298903040
    Total committed heap usage (bytes)=132149248
    Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
    File Input Format Counters
    Bytes Read=5447165
    File Output Format Counters
    Bytes Written=713496
    [training@elephant conf]$

    [training@elephant conf]$ hadoop fs -ls output
    Found 2 items
    -rw-r--r--   3 training supergroup          0 2015-08-23 23:44 output/_SUCCESS
    -rw-r--r--   3 training supergroup     713496 2015-08-23 23:44 output/part-r-00000
    [training@elephant conf]$

    ===================================================
    Now if you re-run this job with the same output directory, it will fail because the output directory already exists:

    [training@elephant conf]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input output
    15/08/23 23:45:49 INFO client.RMProxy: Connecting to ResourceManager at horse/192.168.123.3:8032
    15/08/23 23:45:50 WARN security.UserGroupInformation: PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://mycluster/user/training/output already exists
    org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://mycluster/user/training/output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    [training@elephant conf]$
  4. Dev says:

    E is the correct answer. The error says “Input path does not exist: file:/data/input”.

    If it were an issue with HDFS, the error would look like “Input path does not exist: hdfs://cdh-X:9000/LERNT.txt”.

  5. San says:

    It is definitely not D, as there is a space between input and output in the real exam,
    i.e., Hadoop jar j.jar DriverClass /data/input /data/output.
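    The space matters because the shell, not Hadoop, splits the command line into arguments before the driver ever sees them. A quick sketch (plain shell, no Hadoop needed):

    ```shell
    # How the shell tokenizes the two variants of the command line.
    count_args() { echo $#; }

    count_args /data/input/data/output    # -> 1: one concatenated path, so the
                                          #    driver would see no output argument
    count_args /data/input /data/output   # -> 2: separate input and output paths
    ```

    With the space missing, a typical driver that reads args[0] as input and args[1] as output would fail on the missing second argument rather than with an input-path error, which is another reason the quoted error points to the client configuration rather than the spacing.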

