PrepAway - Latest Free Exam Questions & Answers

What is the cause of the error?

A user comes to you complaining that when she attempts to submit a Hadoop job, it fails. There is
a directory in HDFS named /data/input. The JAR is named j.jar, and the driver class is named
DriverClass.

She runs the command:
Hadoop jar j.jar DriverClass /data/input/data/output
The error message returned includes the line:
PriviledgedActionException as:training (auth:SIMPLE)
cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
Input path does not exist: file:/data/input
What is the cause of the error?


A.
The user is not authorized to run the job on the cluster

B.
The output directory already exists

C.
The name of the driver has been spelled incorrectly on the command line

D.
The directory name is misspelled in HDFS

E.
The Hadoop configuration files on the client do not point to the cluster

10 Comments on “What is the cause of the error?”

  1. Gaurav says:

    The answer is D.
    There is no space between the input and output directories. The command should be:
    hadoop jar j.jar DriverClass /data/input /data/output

  2. b says:

    I believe it is E.

    The error says that “file:/data/input” was not found. That is a local path, which means Hadoop is not configured to resolve it to “hdfs://namenode:port/data/input” by default. -> Answer E
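    The fallback b describes can be sketched in plain shell. This is illustrative logic only, not Hadoop source: the real resolution happens inside Hadoop's FileSystem machinery using the `fs.defaultFS` property from core-site.xml, and the URIs below are hypothetical.

    ```shell
    # Sketch of how a path with no scheme inherits the client's default
    # filesystem (assumed logic for illustration, not Hadoop's actual code).
    resolve_path() {
      local default_fs="$1" path="$2"
      case "$path" in
        *:/*) echo "$path" ;;                 # already has a scheme, keep it
        *)    echo "${default_fs%/}$path" ;;  # prepend the default filesystem
      esac
    }

    resolve_path "file:///"         "/data/input"   # -> file:///data/input
    resolve_path "hdfs://mycluster" "/data/input"   # -> hdfs://mycluster/data/input
    ```

    When the client's configuration files are missing or do not point at the cluster, `fs.defaultFS` falls back to the local filesystem, which is why the error message shows a file: URI rather than an hdfs:// one.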

  3. Hitesh says:

    Ans: D

    To verify this, I actually tried to reproduce the error, using an input directory “input1” that doesn't exist on the HDFS filesystem:

    [training@elephant conf]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input1 b
    15/08/23 23:06:16 INFO client.RMProxy: Connecting to ResourceManager at horse/192.168.123.3:8032
    15/08/23 23:06:19 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/training/.staging/job_1440218632040_0008
    15/08/23 23:06:19 WARN security.UserGroupInformation: PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://mycluster/user/training/input1
    org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://mycluster/user/training/input1
    [training@elephant conf]$

    Now when I tried to run it with the proper input directory “input”, it ran successfully:

    [training@elephant conf]$ hadoop fs -ls
    Found 2 items
    drwx------   - training supergroup          0 2015-08-23 23:06 .staging
    drwxr-xr-x   - training supergroup          0 2015-08-20 03:30 elephant

    [training@elephant conf]$
    [training@elephant conf]$ hadoop fs -mkdir input
    [training@elephant conf]$ hadoop fs -put shakespeare.txt input
    [training@elephant conf]$ hadoop fs -ls input
    Found 1 items
    -rw-r--r--   3 training supergroup    5447165 2015-08-23 23:40 input/shakespeare.txt
    [training@elephant conf]$

    [training@elephant conf]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input output
    15/08/23 23:42:46 INFO client.RMProxy: Connecting to ResourceManager at horse/192.168.123.3:8032
    15/08/23 23:42:48 INFO input.FileInputFormat: Total input paths to process : 1
    15/08/23 23:42:49 INFO mapreduce.JobSubmitter: number of splits:1
    15/08/23 23:42:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1440218632040_0009
    15/08/23 23:42:52 INFO impl.YarnClientImpl: Submitted application application_1440218632040_0009
    15/08/23 23:42:52 INFO mapreduce.Job: The url to track the job: http://horse:8088/proxy/application_1440218632040_0009/
    15/08/23 23:42:52 INFO mapreduce.Job: Running job: job_1440218632040_0009
    15/08/23 23:43:19 INFO mapreduce.Job: Job job_1440218632040_0009 running in uber mode : false
    15/08/23 23:43:19 INFO mapreduce.Job: map 0% reduce 0%
    15/08/23 23:43:45 INFO mapreduce.Job: map 100% reduce 0%
    15/08/23 23:44:05 INFO mapreduce.Job: map 100% reduce 100%
    15/08/23 23:44:07 INFO mapreduce.Job: Job job_1440218632040_0009 completed successfully
    15/08/23 23:44:07 INFO mapreduce.Job: Counters: 49
    File System Counters
    FILE: Number of bytes read=973070
    FILE: Number of bytes written=2127061
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=5447282
    HDFS: Number of bytes written=713496
    HDFS: Number of read operations=6
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
    Job Counters
    Launched map tasks=1
    Launched reduce tasks=1
    Data-local map tasks=1
    Total time spent by all maps in occupied slots (ms)=21492
    Total time spent by all reduces in occupied slots (ms)=16917
    Total time spent by all map tasks (ms)=21492
    Total time spent by all reduce tasks (ms)=16917
    Total vcore-seconds taken by all map tasks=21492
    Total vcore-seconds taken by all reduce tasks=16917
    Total megabyte-seconds taken by all map tasks=22007808
    Total megabyte-seconds taken by all reduce tasks=17323008
    Map-Reduce Framework
    Map input records=124192
    Map output records=899594
    Map output bytes=8528715
    Map output materialized bytes=973070
    Input split bytes=117
    Combine input records=899594
    Combine output records=67108
    Reduce input groups=67108
    Reduce shuffle bytes=973070
    Reduce input records=67108
    Reduce output records=67108
    Spilled Records=134216
    Shuffled Maps =1
    Failed Shuffles=0
    Merged Map outputs=1
    GC time elapsed (ms)=514
    CPU time spent (ms)=7260
    Physical memory (bytes) snapshot=268238848
    Virtual memory (bytes) snapshot=1298903040
    Total committed heap usage (bytes)=132149248
    Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
    File Input Format Counters
    Bytes Read=5447165
    File Output Format Counters
    Bytes Written=713496
    [training@elephant conf]$

    [training@elephant conf]$ hadoop fs -ls output
    Found 2 items
    -rw-r--r--   3 training supergroup          0 2015-08-23 23:44 output/_SUCCESS
    -rw-r--r--   3 training supergroup     713496 2015-08-23 23:44 output/part-r-00000
    [training@elephant conf]$

    ===================================================
    Now if you re-run this job with the same output directory, it will fail because the output directory already exists:

    [training@elephant conf]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input output
    15/08/23 23:45:49 INFO client.RMProxy: Connecting to ResourceManager at horse/192.168.123.3:8032
    15/08/23 23:45:50 WARN security.UserGroupInformation: PriviledgedActionException as:training (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://mycluster/user/training/output already exists
    org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://mycluster/user/training/output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:343)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1295)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1292)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1292)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1313)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    [training@elephant conf]$
  4. Dev says:

    E is the correct answer. The error says “Input path does not exist: file:/data/input”.

    If it were an issue with HDFS, the error would look like “Input path does not exist: hdfs://cdh-X:9000/LERNT.txt”.

  5. San says:

    It is definitely not D, as there is a space between input and output in the real exam,
    i.e., Hadoop jar j.jar DriverClass /data/input /data/output.
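    The space matters because the shell, not Hadoop, splits the command line into arguments before the driver ever sees them. A quick sketch (plain shell, no Hadoop needed):

    ```shell
    # How the shell tokenizes the two variants of the command line.
    count_args() { echo $#; }

    count_args /data/input/data/output    # -> 1: one concatenated path, so the
                                          #    driver would see no output argument
    count_args /data/input /data/output   # -> 2: separate input and output paths
    ```

    With the space missing, a typical driver that reads args[0] as input and args[1] as output would fail on the missing second argument rather than with an input-path error, which is another reason the quoted error points to the client configuration rather than the spacing.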

