Continued from the previous post, running Hadoop in standalone mode.

Pseudo-distributed mode
$ vi conf/core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

$ vi conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

$ vi conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
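Hadoop reads these settings from `<configuration>`/`<property>` blocks, so a stray tag left behind while hand-editing will crash the daemons at startup. A quick well-formedness check can catch that early; the sketch below writes a sample core-site.xml to a temp file rather than touching conf/, and assumes python3 is on the PATH:

```shell
# Write a sample core-site.xml to a scratch file (illustrative content only)
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
# Parse it; ElementTree raises (and the command exits non-zero) on malformed XML
python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1]); print("well-formed")' "$tmp"
```

Run the same one-liner against the real conf/*.xml files after editing them.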
$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 99:94:0d:0f:fb:10:bd:7d:7e:89:0f:6e:21:71:50:80.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
ubuntu@localhost's password:
Permission denied, please try again.
ubuntu@localhost's password:
Permission denied, please try again.
ubuntu@localhost's password:
Permission denied (publickey,password).
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /home/ubuntu/.ssh/id_dsa.
Your public key has been saved in /home/ubuntu/.ssh/id_dsa.pub.
The key fingerprint is:
f4:82:92:0c:55:af:07:c4:00:60:5b:39:a8:cf:fd:2e ubuntu@ubuntu-VirtualBox
The key's randomart image is:
(randomart image omitted)
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
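The generate-and-append steps can be rehearsed in a scratch directory before touching ~/.ssh. Note that recent OpenSSH releases reject DSA keys, so this sketch uses RSA; the paths and key type here are illustrative, not the ones from the session above:

```shell
# Generate a passphrase-less keypair in a scratch directory
# (RSA, since modern OpenSSH disables DSA; the session above used -t dsa)
tmp=$(mktemp -d)
ssh-keygen -q -t rsa -N '' -f "$tmp/id_rsa"
# Append the public key to authorized_keys, as done for ~/.ssh above
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"
# sshd ignores authorized_keys files with loose permissions
chmod 600 "$tmp/authorized_keys"
ls "$tmp"
```

After the real key is in ~/.ssh/authorized_keys, `ssh localhost` should log in without a password prompt.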
$ bin/hadoop namenode -format
11/04/23 01:02:49 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu-VirtualBox/192.168.1.4
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
11/04/23 01:02:49 INFO namenode.FSNamesystem: fsOwner=ubuntu,ubuntu,adm,dialout,cdrom,plugdev,lpadmin,admin,sambashare
11/04/23 01:02:49 INFO namenode.FSNamesystem: supergroup=supergroup
11/04/23 01:02:49 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/04/23 01:02:50 INFO common.Storage: Image file of size 96 saved in 0 seconds.
11/04/23 01:02:50 INFO common.Storage: Storage directory /tmp/hadoop-ubuntu/dfs/name has been successfully formatted.
11/04/23 01:02:50 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu-VirtualBox/192.168.1.4
************************************************************/
$ bin/start-all.sh
starting namenode, logging to /home/ubuntu/hadoop-0.20.2/bin/../logs/hadoop-ubuntu-namenode-ubuntu-VirtualBox.out
localhost: starting datanode, logging to /home/ubuntu/hadoop-0.20.2/bin/../logs/hadoop-ubuntu-datanode-ubuntu-VirtualBox.out
localhost: starting secondarynamenode, logging to /home/ubuntu/hadoop-0.20.2/bin/../logs/hadoop-ubuntu-secondarynamenode-ubuntu-VirtualBox.out
starting jobtracker, logging to /home/ubuntu/hadoop-0.20.2/bin/../logs/hadoop-ubuntu-jobtracker-ubuntu-VirtualBox.out
localhost: starting tasktracker, logging to /home/ubuntu/hadoop-0.20.2/bin/../logs/hadoop-ubuntu-tasktracker-ubuntu-VirtualBox.out
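Once start-all.sh returns, `jps` (shipped with the JDK) should list five Hadoop daemons. A sketch of the check, assuming the Hadoop 0.20.x daemon names; `sample_jps` stands in for real `jps` output so the loop itself can be tried anywhere:

```shell
# In pseudo-distributed mode all five daemons run on one host.
# On the actual machine, replace sample_jps with the output of: jps
sample_jps='1234 NameNode
1250 DataNode
1301 SecondaryNameNode
1352 JobTracker
1403 TaskTracker'
for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
  printf '%s: ' "$d"
  echo "$sample_jps" | grep -qw "$d" && echo running || echo MISSING
done
```

If a daemon is missing, its log under logs/ (the .out/.log files named in the start-all.sh output above) usually explains why.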
Check the web UIs
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
Copy files into HDFS
$ bin/hadoop fs -put conf input
Run the example job
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
11/04/23 17:09:30 INFO mapred.FileInputFormat: Total input paths to process : 13
11/04/23 17:09:31 INFO mapred.JobClient: Running job: job_201104230103_0001
11/04/23 17:09:32 INFO mapred.JobClient: map 0% reduce 0%
11/04/23 17:10:00 INFO mapred.JobClient: map 15% reduce 0%
11/04/23 17:10:12 INFO mapred.JobClient: map 30% reduce 0%
11/04/23 17:10:23 INFO mapred.JobClient: map 30% reduce 10%
11/04/23 17:10:30 INFO mapred.JobClient: map 46% reduce 10%
11/04/23 17:10:43 INFO mapred.JobClient: map 61% reduce 15%
11/04/23 17:10:52 INFO mapred.JobClient: map 76% reduce 15%
11/04/23 17:10:59 INFO mapred.JobClient: map 76% reduce 25%
11/04/23 17:11:02 INFO mapred.JobClient: map 92% reduce 25%
11/04/23 17:11:08 INFO mapred.JobClient: map 100% reduce 25%
11/04/23 17:11:14 INFO mapred.JobClient: map 100% reduce 33%
11/04/23 17:11:20 INFO mapred.JobClient: map 100% reduce 100%
11/04/23 17:11:23 INFO mapred.JobClient: Job complete: job_201104230103_0001
11/04/23 17:11:24 INFO mapred.JobClient: Counters: 18
11/04/23 17:11:24 INFO mapred.JobClient: Job Counters
11/04/23 17:11:24 INFO mapred.JobClient: Launched reduce tasks=1
11/04/23 17:11:24 INFO mapred.JobClient: Launched map tasks=13
11/04/23 17:11:24 INFO mapred.JobClient: Data-local map tasks=13
11/04/23 17:11:24 INFO mapred.JobClient: FileSystemCounters
11/04/23 17:11:24 INFO mapred.JobClient: FILE_BYTES_READ=158
11/04/23 17:11:24 INFO mapred.JobClient: HDFS_BYTES_READ=18249
11/04/23 17:11:24 INFO mapred.JobClient: FILE_BYTES_WRITTEN=804
11/04/23 17:11:24 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=280
11/04/23 17:11:24 INFO mapred.JobClient: Map-Reduce Framework
11/04/23 17:11:24 INFO mapred.JobClient: Reduce input groups=7
11/04/23 17:11:24 INFO mapred.JobClient: Combine output records=7
11/04/23 17:11:24 INFO mapred.JobClient: Map input records=555
11/04/23 17:11:24 INFO mapred.JobClient: Reduce shuffle bytes=230
11/04/23 17:11:24 INFO mapred.JobClient: Reduce output records=7
11/04/23 17:11:24 INFO mapred.JobClient: Spilled Records=14
11/04/23 17:11:24 INFO mapred.JobClient: Map output bytes=193
11/04/23 17:11:24 INFO mapred.JobClient: Map input bytes=18249
11/04/23 17:11:24 INFO mapred.JobClient: Combine input records=10
11/04/23 17:11:24 INFO mapred.JobClient: Map output records=10
11/04/23 17:11:24 INFO mapred.JobClient: Reduce input records=7
11/04/23 17:11:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/04/23 17:11:24 INFO mapred.FileInputFormat: Total input paths to process : 1
11/04/23 17:11:27 INFO mapred.JobClient: Running job: job_201104230103_0002
11/04/23 17:11:28 INFO mapred.JobClient: map 0% reduce 0%
11/04/23 17:11:41 INFO mapred.JobClient: map 100% reduce 0%
11/04/23 17:11:53 INFO mapred.JobClient: map 100% reduce 100%
11/04/23 17:11:55 INFO mapred.JobClient: Job complete: job_201104230103_0002
11/04/23 17:11:55 INFO mapred.JobClient: Counters: 18
11/04/23 17:11:55 INFO mapred.JobClient: Job Counters
11/04/23 17:11:55 INFO mapred.JobClient: Launched reduce tasks=1
11/04/23 17:11:55 INFO mapred.JobClient: Launched map tasks=1
11/04/23 17:11:55 INFO mapred.JobClient: Data-local map tasks=1
11/04/23 17:11:55 INFO mapred.JobClient: FileSystemCounters
11/04/23 17:11:55 INFO mapred.JobClient: FILE_BYTES_READ=158
11/04/23 17:11:55 INFO mapred.JobClient: HDFS_BYTES_READ=280
11/04/23 17:11:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=348
11/04/23 17:11:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=96
11/04/23 17:11:55 INFO mapred.JobClient: Map-Reduce Framework
11/04/23 17:11:55 INFO mapred.JobClient: Reduce input groups=3
11/04/23 17:11:55 INFO mapred.JobClient: Combine output records=0
11/04/23 17:11:55 INFO mapred.JobClient: Map input records=7
11/04/23 17:11:55 INFO mapred.JobClient: Reduce shuffle bytes=158
11/04/23 17:11:55 INFO mapred.JobClient: Reduce output records=7
11/04/23 17:11:55 INFO mapred.JobClient: Spilled Records=14
11/04/23 17:11:55 INFO mapred.JobClient: Map output bytes=138
11/04/23 17:11:55 INFO mapred.JobClient: Map input bytes=194
11/04/23 17:11:55 INFO mapred.JobClient: Combine input records=0
11/04/23 17:11:55 INFO mapred.JobClient: Map output records=7
11/04/23 17:11:55 INFO mapred.JobClient: Reduce input records=7
Copy the results out of HDFS
$ bin/hadoop fs -get output output
Check the results
$ bin/hadoop fs -cat output/*
cat: File does not exist: output/output
3 dfs.class
2 dfs.period
1 dfs.file
1 dfs.replication
1 dfs.servers
1 dfsadmin
1 dfsmetrics.log
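The example job is essentially a distributed grep-and-count, so the same kind of tally can be approximated on one machine with standard tools. A sketch over a small inline sample (the sample text is illustrative, not the real conf/ contents):

```shell
# Small sample standing in for the conf/ files copied into HDFS earlier
sample=$(mktemp)
cat > "$sample" <<'EOF'
<name>dfs.replication</name>
<name>dfs.replication</name>
<name>dfs.class</name>
dfsadmin
EOF
# Same pattern as the job: extract dfs[a-z.]+ matches, count, sort by count
# (the most frequent match is listed first, as in the job output above)
grep -Eho 'dfs[a-z.]+' "$sample" | sort | uniq -c | sort -nr
```

The MapReduce version does the extraction in the map phase, the counting in the reduce phase, and a second job to sort by count, which is why two jobs appear in the log above.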
Stop the daemons
$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode