
Getting Hadoop running in standalone mode

Installing sun-java6-jdk

$ sudo apt-get install sun-java6-jdk
$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Client VM (build 19.1-b02, mixed mode, sharing)

Next, download the stable Hadoop release from one of the mirrors listed at http://www.apache.org/dyn/closer.cgi/hadoop/core/

$ wget http://www.meisei-u.ac.jp/mirror/apache/dist//hadoop/core/stable/hadoop-0.20.2.tar.gz
$ tar zxvf hadoop-0.20.2.tar.gz
$ cd hadoop-0.20.2/conf/
$ which java
/usr/bin/java
$ vi hadoop-env.sh
export JAVA_HOME=/usr/
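
JAVA_HOME=/usr works here because `which java` reported /usr/bin/java, and Hadoop invokes $JAVA_HOME/bin/java. As a sketch, the same value can be derived mechanically from any java binary path by stripping the trailing /bin/java:

```shell
# Derive JAVA_HOME from the path of the java binary.
# Here the path is hard-coded to the value `which java` printed above;
# on another machine you would substitute your own output.
JAVA_BIN=/usr/bin/java
JAVA_HOME=${JAVA_BIN%/bin/java}   # strip the /bin/java suffix -> /usr
echo "$JAVA_HOME"
```

(If java is a symlink, as it often is with alternatives-managed installs, `readlink -f "$(which java)"` first resolves it to the real binary.)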
$ cd ../bin/
$ ./hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar             run a jar file
  distcp   copy file or directories recursively
  archive -archiveName NAME *  create a hadoop archive
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

Running in standalone mode

$ cd ..
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
11/04/21 23:59:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/04/21 23:59:29 INFO mapred.FileInputFormat: Total input paths to process : 5
11/04/21 23:59:29 INFO mapred.FileInputFormat: Total input paths to process : 5
11/04/21 23:59:29 INFO mapred.JobClient: Running job: job_local_0001
11/04/21 23:59:29 INFO mapred.MapTask: numReduceTasks: 1
11/04/21 23:59:29 INFO mapred.MapTask: io.sort.mb = 100
11/04/21 23:59:30 INFO mapred.JobClient:  map 0% reduce 0%
11/04/21 23:59:30 INFO mapred.MapTask: data buffer = 79691776/99614720
11/04/21 23:59:30 INFO mapred.MapTask: record buffer = 262144/327680
11/04/21 23:59:30 INFO mapred.MapTask: Starting flush of map output
11/04/21 23:59:31 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/04/21 23:59:31 INFO mapred.LocalJobRunner: file:/home/ubuntu/hadoop-0.20.2/input/capacity-scheduler.xml:0+3936
11/04/21 23:59:31 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/04/21 23:59:31 INFO mapred.MapTask: numReduceTasks: 1
11/04/21 23:59:31 INFO mapred.MapTask: io.sort.mb = 100
11/04/21 23:59:31 INFO mapred.MapTask: data buffer = 79691776/99614720
11/04/21 23:59:31 INFO mapred.MapTask: record buffer = 262144/327680
11/04/21 23:59:31 INFO mapred.MapTask: Starting flush of map output
11/04/21 23:59:31 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
11/04/21 23:59:31 INFO mapred.LocalJobRunner: file:/home/ubuntu/hadoop-0.20.2/input/core-site.xml:0+178
11/04/21 23:59:31 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
11/04/21 23:59:31 INFO mapred.MapTask: numReduceTasks: 1
11/04/21 23:59:31 INFO mapred.MapTask: io.sort.mb = 100
11/04/21 23:59:31 INFO mapred.MapTask: data buffer = 79691776/99614720
11/04/21 23:59:31 INFO mapred.MapTask: record buffer = 262144/327680
11/04/21 23:59:31 INFO mapred.MapTask: Starting flush of map output
11/04/21 23:59:31 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
11/04/21 23:59:31 INFO mapred.LocalJobRunner: file:/home/ubuntu/hadoop-0.20.2/input/hdfs-site.xml:0+178
11/04/21 23:59:31 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000002_0' done.
11/04/21 23:59:31 INFO mapred.MapTask: numReduceTasks: 1
11/04/21 23:59:31 INFO mapred.MapTask: io.sort.mb = 100
11/04/21 23:59:31 INFO mapred.JobClient:  map 100% reduce 0%
11/04/21 23:59:32 INFO mapred.MapTask: data buffer = 79691776/99614720
11/04/21 23:59:32 INFO mapred.MapTask: record buffer = 262144/327680
11/04/21 23:59:32 INFO mapred.MapTask: Starting flush of map output
11/04/21 23:59:32 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
11/04/21 23:59:32 INFO mapred.LocalJobRunner: file:/home/ubuntu/hadoop-0.20.2/input/mapred-site.xml:0+178
11/04/21 23:59:32 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000003_0' done.
11/04/21 23:59:32 INFO mapred.MapTask: numReduceTasks: 1
11/04/21 23:59:32 INFO mapred.MapTask: io.sort.mb = 100
11/04/21 23:59:33 INFO mapred.MapTask: data buffer = 79691776/99614720
11/04/21 23:59:33 INFO mapred.MapTask: record buffer = 262144/327680
11/04/21 23:59:33 INFO mapred.MapTask: Starting flush of map output
11/04/21 23:59:33 INFO mapred.MapTask: Finished spill 0
11/04/21 23:59:33 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000004_0 is done. And is in the process of commiting
11/04/21 23:59:33 INFO mapred.LocalJobRunner: file:/home/ubuntu/hadoop-0.20.2/input/hadoop-policy.xml:0+4190
11/04/21 23:59:33 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000004_0' done.
11/04/21 23:59:33 INFO mapred.LocalJobRunner: 
11/04/21 23:59:33 INFO mapred.Merger: Merging 5 sorted segments
11/04/21 23:59:33 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 21 bytes
11/04/21 23:59:33 INFO mapred.LocalJobRunner: 
11/04/21 23:59:33 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
11/04/21 23:59:33 INFO mapred.LocalJobRunner: 
11/04/21 23:59:33 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
11/04/21 23:59:33 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/home/ubuntu/hadoop-0.20.2/grep-temp-859935326
11/04/21 23:59:33 INFO mapred.LocalJobRunner: reduce > reduce
11/04/21 23:59:33 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
11/04/21 23:59:34 INFO mapred.JobClient:  map 100% reduce 100%
11/04/21 23:59:34 INFO mapred.JobClient: Job complete: job_local_0001
11/04/21 23:59:34 INFO mapred.JobClient: Counters: 13
11/04/21 23:59:34 INFO mapred.JobClient:   FileSystemCounters
11/04/21 23:59:34 INFO mapred.JobClient:     FILE_BYTES_READ=969833
11/04/21 23:59:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1030667
11/04/21 23:59:34 INFO mapred.JobClient:   Map-Reduce Framework
11/04/21 23:59:34 INFO mapred.JobClient:     Reduce input groups=1
11/04/21 23:59:34 INFO mapred.JobClient:     Combine output records=1
11/04/21 23:59:34 INFO mapred.JobClient:     Map input records=219
11/04/21 23:59:34 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/04/21 23:59:34 INFO mapred.JobClient:     Reduce output records=1
11/04/21 23:59:34 INFO mapred.JobClient:     Spilled Records=2
11/04/21 23:59:34 INFO mapred.JobClient:     Map output bytes=17
11/04/21 23:59:34 INFO mapred.JobClient:     Map input bytes=8660
11/04/21 23:59:34 INFO mapred.JobClient:     Combine input records=1
11/04/21 23:59:34 INFO mapred.JobClient:     Map output records=1
11/04/21 23:59:34 INFO mapred.JobClient:     Reduce input records=1
11/04/21 23:59:34 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
11/04/21 23:59:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/04/21 23:59:34 INFO mapred.FileInputFormat: Total input paths to process : 1
11/04/21 23:59:34 INFO mapred.JobClient: Running job: job_local_0002
11/04/21 23:59:34 INFO mapred.FileInputFormat: Total input paths to process : 1
11/04/21 23:59:34 INFO mapred.MapTask: numReduceTasks: 1
11/04/21 23:59:34 INFO mapred.MapTask: io.sort.mb = 100
11/04/21 23:59:35 INFO mapred.MapTask: data buffer = 79691776/99614720
11/04/21 23:59:35 INFO mapred.MapTask: record buffer = 262144/327680
11/04/21 23:59:35 INFO mapred.MapTask: Starting flush of map output
11/04/21 23:59:35 INFO mapred.MapTask: Finished spill 0
11/04/21 23:59:35 INFO mapred.TaskRunner: Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
11/04/21 23:59:35 INFO mapred.LocalJobRunner: file:/home/ubuntu/hadoop-0.20.2/grep-temp-859935326/part-00000:0+111
11/04/21 23:59:35 INFO mapred.TaskRunner: Task 'attempt_local_0002_m_000000_0' done.
11/04/21 23:59:35 INFO mapred.LocalJobRunner: 
11/04/21 23:59:35 INFO mapred.Merger: Merging 1 sorted segments
11/04/21 23:59:35 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 21 bytes
11/04/21 23:59:35 INFO mapred.LocalJobRunner: 
11/04/21 23:59:35 INFO mapred.TaskRunner: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
11/04/21 23:59:35 INFO mapred.LocalJobRunner: 
11/04/21 23:59:35 INFO mapred.TaskRunner: Task attempt_local_0002_r_000000_0 is allowed to commit now
11/04/21 23:59:35 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to file:/home/ubuntu/hadoop-0.20.2/output
11/04/21 23:59:35 INFO mapred.LocalJobRunner: reduce > reduce
11/04/21 23:59:35 INFO mapred.TaskRunner: Task 'attempt_local_0002_r_000000_0' done.
11/04/21 23:59:35 INFO mapred.JobClient:  map 100% reduce 100%
11/04/21 23:59:35 INFO mapred.JobClient: Job complete: job_local_0002
11/04/21 23:59:35 INFO mapred.JobClient: Counters: 13
11/04/21 23:59:35 INFO mapred.JobClient:   FileSystemCounters
11/04/21 23:59:35 INFO mapred.JobClient:     FILE_BYTES_READ=640527
11/04/21 23:59:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=684253
11/04/21 23:59:35 INFO mapred.JobClient:   Map-Reduce Framework
11/04/21 23:59:35 INFO mapred.JobClient:     Reduce input groups=1
11/04/21 23:59:35 INFO mapred.JobClient:     Combine output records=0
11/04/21 23:59:35 INFO mapred.JobClient:     Map input records=1
11/04/21 23:59:35 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/04/21 23:59:35 INFO mapred.JobClient:     Reduce output records=1
11/04/21 23:59:35 INFO mapred.JobClient:     Spilled Records=2
11/04/21 23:59:35 INFO mapred.JobClient:     Map output bytes=17
11/04/21 23:59:35 INFO mapred.JobClient:     Map input bytes=25
11/04/21 23:59:35 INFO mapred.JobClient:     Combine input records=0
11/04/21 23:59:35 INFO mapred.JobClient:     Map output records=1
11/04/21 23:59:35 INFO mapred.JobClient:     Reduce input records=1
$ cat output/*
1	dfsadmin
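
So across the copied conf/*.xml files, only one string matched the pattern 'dfs[a-z.]+': "dfsadmin", once. What the example's two local jobs computed (count regex matches, then sort by count) can be sketched with plain Unix tools; the input line below is a made-up stand-in for the conf file contents:

```shell
# Local sketch of the Hadoop grep example: extract every match of the
# regex, count occurrences of each distinct match, sort by count descending.
printf '%s\n' 'security settings for dfsadmin' \
  | grep -oE 'dfs[a-z.]+' \
  | sort | uniq -c | sort -rn
```

This prints "1 dfsadmin", matching the job output above (the second MapReduce job in the log corresponds to the final sort-by-count step).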

Reference
http://oss.infoscience.co.jp/hadoop/common/docs/current/quickstart.html