Production Level Knowledge & Tips


Apache Hadoop Installation

This is my install log from when I evaluated Hadoop a while back.
Personally, I think the CDH distribution of Hadoop is the more stable choice these days.
I plan to post the CDH installation procedure at some point.

Setting Up Standalone Mode

Environment

Amazon EC2 t1.micro instance (Ubuntu 12.04 LTS)

Installing the JDK

$sudo apt-get install default-jdk

Installing Apache Hadoop

[user1@node1]
$sudo adduser hadoop
$wget http://ftp.jaist.ac.jp/pub/apache/hadoop/common/hadoop-2.0.3-alpha/hadoop-2.0.3-alpha.tar.gz
$tar zxvf hadoop-2.0.3-alpha.tar.gz
$sudo mv hadoop-2.0.3-alpha /usr/local/hadoop
$sudo rm hadoop-2.0.3-alpha.tar.gz
$sudo chown hadoop:hadoop -R /usr/local/hadoop/
$sudo su hadoop
[hadoop@node1]
$cd ~
$ which java
/usr/bin/java
$vim .bashrc
export JAVA_HOME=/usr
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$HADOOP_INSTALL/bin:$JAVA_HOME/bin:$PATH
$source .bashrc
$hadoop version
Hadoop 2.0.3-alpha
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.0.3-alpha/hadoop-common-project/hadoop-common -r 1443299
Compiled by hortonmu on Thu Feb  7 03:33:19 UTC 2013
From source with checksum 30d3d872f9f4a8d4c53d8cfaa17393f4

Running the Sample (wordcount)

[hadoop@node1]
$cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
$vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export HADOOP_CLIENT_OPTS="-Xmx1024m $HADOOP_CLIENT_OPTS"
(changed from 128m to 1024m)
$mkdir -p hadoop-job/input
$vim hadoop-job/input/a
a b c
$vim hadoop-job/input/b
a a b c c c
$cd hadoop-job/
$hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.3-alpha.jar wordcount input output
13/03/23 14:26:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/23 14:26:41 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
13/03/23 14:26:41 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
13/03/23 14:26:42 INFO input.FileInputFormat: Total input paths to process : 2
13/03/23 14:26:43 INFO mapreduce.JobSubmitter: number of splits:2
13/03/23 14:26:43 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/03/23 14:26:43 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/03/23 14:26:43 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/03/23 14:26:43 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/03/23 14:26:43 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/03/23 14:26:43 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/03/23 14:26:43 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/03/23 14:26:43 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/03/23 14:26:43 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/03/23 14:26:43 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/03/23 14:26:43 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/03/23 14:26:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local780961261_0001
13/03/23 14:26:43 WARN conf.Configuration: file:/tmp/hadoop-hadoop/mapred/staging/hadoop780961261/.staging/job_local780961261_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
13/03/23 14:26:43 WARN conf.Configuration: file:/tmp/hadoop-hadoop/mapred/staging/hadoop780961261/.staging/job_local780961261_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
13/03/23 14:26:44 WARN conf.Configuration: file:/tmp/hadoop-hadoop/mapred/local/localRunner/job_local780961261_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
13/03/23 14:26:44 WARN conf.Configuration: file:/tmp/hadoop-hadoop/mapred/local/localRunner/job_local780961261_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
13/03/23 14:26:44 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
13/03/23 14:26:44 INFO mapreduce.Job: Running job: job_local780961261_0001
13/03/23 14:26:44 INFO mapred.LocalJobRunner: OutputCommitter set in config null
13/03/23 14:26:44 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
13/03/23 14:26:44 INFO mapred.LocalJobRunner: Waiting for map tasks
13/03/23 14:26:44 INFO mapred.LocalJobRunner: Starting task: attempt_local780961261_0001_m_000000_0
13/03/23 14:26:44 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
13/03/23 14:26:44 INFO mapred.MapTask: Processing split: file:/home/hadoop/hadoop-job/input/b:0+12
13/03/23 14:26:44 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/03/23 14:26:45 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
13/03/23 14:26:45 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
13/03/23 14:26:45 INFO mapred.MapTask: soft limit at 83886080
13/03/23 14:26:45 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
13/03/23 14:26:45 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
13/03/23 14:26:45 INFO mapred.LocalJobRunner:
13/03/23 14:26:45 INFO mapred.MapTask: Starting flush of map output
13/03/23 14:26:45 INFO mapred.MapTask: Spilling map output
13/03/23 14:26:45 INFO mapred.MapTask: bufstart = 0; bufend = 36; bufvoid = 104857600
13/03/23 14:26:45 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214376(104857504); length = 21/6553600
13/03/23 14:26:45 INFO mapred.MapTask: Finished spill 0
13/03/23 14:26:45 INFO mapreduce.Job: Job job_local780961261_0001 running in uber mode : false
13/03/23 14:26:45 INFO mapreduce.Job:  map 0% reduce 0%
13/03/23 14:26:45 INFO mapred.Task: Task:attempt_local780961261_0001_m_000000_0 is done. And is in the process of committing
13/03/23 14:26:45 INFO mapred.LocalJobRunner: map
13/03/23 14:26:45 INFO mapred.Task: Task 'attempt_local780961261_0001_m_000000_0' done.
13/03/23 14:26:45 INFO mapred.LocalJobRunner: Finishing task: attempt_local780961261_0001_m_000000_0
13/03/23 14:26:45 INFO mapred.LocalJobRunner: Starting task: attempt_local780961261_0001_m_000001_0
13/03/23 14:26:45 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
13/03/23 14:26:45 INFO mapred.MapTask: Processing split: file:/home/hadoop/hadoop-job/input/a:0+6
13/03/23 14:26:45 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
13/03/23 14:26:45 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
13/03/23 14:26:45 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
13/03/23 14:26:45 INFO mapred.MapTask: soft limit at 83886080
13/03/23 14:26:45 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
13/03/23 14:26:45 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
13/03/23 14:26:45 INFO mapred.LocalJobRunner:
13/03/23 14:26:45 INFO mapred.MapTask: Starting flush of map output
13/03/23 14:26:45 INFO mapred.MapTask: Spilling map output
13/03/23 14:26:45 INFO mapred.MapTask: bufstart = 0; bufend = 18; bufvoid = 104857600
13/03/23 14:26:45 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214388(104857552); length = 9/6553600
13/03/23 14:26:45 INFO mapred.MapTask: Finished spill 0
13/03/23 14:26:45 INFO mapred.Task: Task:attempt_local780961261_0001_m_000001_0 is done. And is in the process of committing
13/03/23 14:26:45 INFO mapred.LocalJobRunner: map
13/03/23 14:26:45 INFO mapred.Task: Task 'attempt_local780961261_0001_m_000001_0' done.
13/03/23 14:26:45 INFO mapred.LocalJobRunner: Finishing task: attempt_local780961261_0001_m_000001_0
13/03/23 14:26:45 INFO mapred.LocalJobRunner: Map task executor complete.
13/03/23 14:26:45 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
13/03/23 14:26:45 INFO mapred.Merger: Merging 2 sorted segments
13/03/23 14:26:45 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 44 bytes
13/03/23 14:26:45 INFO mapred.LocalJobRunner:
13/03/23 14:26:45 WARN conf.Configuration: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
13/03/23 14:26:45 INFO mapred.Task: Task:attempt_local780961261_0001_r_000000_0 is done. And is in the process of committing
13/03/23 14:26:45 INFO mapred.LocalJobRunner:
13/03/23 14:26:45 INFO mapred.Task: Task attempt_local780961261_0001_r_000000_0 is allowed to commit now
13/03/23 14:26:45 INFO output.FileOutputCommitter: Saved output of task 'attempt_local780961261_0001_r_000000_0' to file:/home/hadoop/hadoop-job/output/_temporary/0/task_local780961261_0001_r_000000
13/03/23 14:26:45 INFO mapred.LocalJobRunner: reduce > reduce
13/03/23 14:26:45 INFO mapred.Task: Task 'attempt_local780961261_0001_r_000000_0' done.
13/03/23 14:26:46 INFO mapreduce.Job:  map 100% reduce 100%
13/03/23 14:26:46 INFO mapreduce.Job: Job job_local780961261_0001 completed successfully
13/03/23 14:26:46 INFO mapreduce.Job: Counters: 27
        File System Counters
                FILE: Number of bytes read=815788
                FILE: Number of bytes written=1328068
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=2
                Map output records=9
                Map output bytes=54
                Map output materialized bytes=60
                Input split bytes=202
                Combine input records=9
                Combine output records=6
                Reduce input groups=3
                Reduce shuffle bytes=0
                Reduce input records=6
                Reduce output records=3
                Spilled Records=12
                Shuffled Maps =0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=209
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=437268480
        File Input Format Counters
                Bytes Read=18
        File Output Format Counters
                Bytes Written=24
$vim output/part-r-00000
a       3
b       2
c       4
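The same counts can be reproduced without Hadoop using standard Unix tools, which is a handy sanity check for the wordcount output (a rough local equivalent, not the MapReduce job itself):

```shell
# Recreate the two input files used above
mkdir -p /tmp/wc-input
printf 'a b c\n' > /tmp/wc-input/a
printf 'a a b c c c\n' > /tmp/wc-input/b

# Split on whitespace, then count occurrences of each word
cat /tmp/wc-input/* | tr -s ' ' '\n' | sort | uniq -c | awk '{print $2, $1}'
```

This prints `a 3`, `b 2`, `c 4`, matching part-r-00000.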

Setting Up Pseudo-Distributed Mode

Preparing the SSH Key

[hadoop@node1]
$cd ~
$ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_dsa): /home/hadoop/.ssh/id_dsa
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):  --- no passphrase ---
Enter same passphrase again: --- no passphrase ---
Your identification has been saved in /home/hadoop/.ssh/id_dsa.
Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.
The key fingerprint is:
The key's randomart image is:
$mv .ssh/id_dsa.pub .ssh/authorized_keys
$chmod 600 .ssh/authorized_keys
$ssh localhost
$exit
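One caveat on the `mv` above: it overwrites any existing authorized_keys. If the account already has keys, appending is safer; a minimal sketch:

```shell
# Append the new public key instead of replacing the whole file
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```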

Configuration

$vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export HADOOP_CLIENT_OPTS="-Xmx1024m $HADOOP_CLIENT_OPTS"
(changed from 128m to 1024m)
export JAVA_HOME=/usr
$vim /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9010</value>
    </property>
 </configuration>
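Note that `fs.default.name` is the old-style key; in Hadoop 2.x it is deprecated in favor of `fs.defaultFS` (both still work, but the old key produces a deprecation warning). An equivalent core-site.xml using the new key, with the same host and port as above, would be:

```xml
<configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9010</value>
    </property>
</configuration>
```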
$cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
$vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9011</value>
  </property>
</configuration>
$vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Preparing the Filesystem

$hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

13/03/24 08:12:01 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ip-10-249-88-174.us-west-2.compute.internal/10.249.88.174
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.0.3-alpha
STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/jsr305-1.3.9.jar:/usr/local/hadoop/share/hadoop/common/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar:/usr/local/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-annotations-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-auth-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.4.0a.jar:/usr/local/hadoop/share/hadoop/common/lib/netty-3.2.4.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-json-1.8.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-math-2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jsch-0.1.42.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-3.4.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-io-2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/kfs-0.3.jar:/usr/local/hadoop/share/hadoop/common/lib/jline-0.9.94.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-logging-1.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/usr/local/hadoop/share/hadoop/common/lib/avro-1.5.3.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.8.8.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-collections-3.2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/common/lib/stax-api-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-net-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/
commons-lang-2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-el-1.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jasper-runtime-5.5.23.jar:/usr/local/hadoop/share/hadoop/common/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-httpclient-3.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-core-1.8.jar:/usr/local/hadoop/share/hadoop/common/lib/activation-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-digester-1.8.jar:/usr/local/hadoop/share/hadoop/common/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/jasper-compiler-5.5.23.jar:/usr/local/hadoop/share/hadoop/common/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/common/lib/snappy-java-1.0.3.2.jar:/usr/local/hadoop/share/hadoop/common/lib/jets3t-0.6.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-server-1.8.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-api-1.6.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-configuration-1.6.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-xc-1.8.8.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.8.8.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-core-asl-1.8.8.jar:/usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-2.0.3-alpha-tests.jar:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/jsr305-1.3.9.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jsp-api-2.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.4.0a.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/co
mmons-daemon-1.0.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-io-2.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-logging-1.1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-lang-2.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-el-1.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jasper-runtime-5.5.23.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/guava-11.0.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-core-1.8.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-server-1.8.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.0.3-alpha-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/yarn/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/junit-4.8.2.jar:/usr/local/hadoop/share/hadoop/yarn/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-guice-1.8.jar:/usr/local/hadoop/share/hadoop/yarn/lib/hadoop-annotations-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/yarn/lib/protobuf-java-2.4.0a.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/netty-3.2.4.Final.jar:/usr/local/hadoop/share/hadoop/yarn/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/commons-io-2.1.jar:/usr/local/hadoop/share/hadoop/yarn/lib/avro-1.5.3.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/local/hadoop/share/hadoop/yarn/lib/guice-3.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jersey-core-1.8.jar:/usr/local/hadoop/share/hadoop/yarn/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/yarn/lib/snappy-java-1.0.3.2.jar:/usr/local/hadoo
p/share/hadoop/yarn/lib/jersey-server-1.8.jar:/usr/local/hadoop/share/hadoop/yarn/lib/jackson-core-asl-1.8.8.jar:/usr/local/hadoop/share/hadoop/yarn/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-client-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-common-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-common-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.0.3-alpha-tests.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-tests-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-site-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/yarn/hadoop-yarn-api-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/asm-3.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/junit-4.8.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-guice-1.8.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/hadoop-annotations-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/netty-3.2.4.Final.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/javax.inject-1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/commons-io-2.1.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/avro-1.5.3.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/guice-3.0.jar:/usr/local/hadoop
/share/hadoop/mapreduce/lib/jersey-core-1.8.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/aopalliance-1.0.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/snappy-java-1.0.3.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jersey-server-1.8.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/local/hadoop/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.0.3-alpha.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.3-alpha-tests.jar
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.0.3-alpha/hadoop-common-project/hadoop-common -r 1443299; compiled by 'hortonmu' on Thu Feb  7 03:33:19 UTC 2013
STARTUP_MSG:   java = 1.6.0_27
************************************************************/
13/03/24 08:12:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-b82f0b89-aa2c-4f49-a08a-60e2f7e41ead
13/03/24 08:12:03 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
13/03/24 08:12:03 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
13/03/24 08:12:03 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
13/03/24 08:12:03 INFO blockmanagement.BlockManager: defaultReplication         = 1
13/03/24 08:12:03 INFO blockmanagement.BlockManager: maxReplication             = 512
13/03/24 08:12:03 INFO blockmanagement.BlockManager: minReplication             = 1
13/03/24 08:12:03 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
13/03/24 08:12:03 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
13/03/24 08:12:03 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
13/03/24 08:12:03 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
13/03/24 08:12:03 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
13/03/24 08:12:03 INFO namenode.FSNamesystem: supergroup          = supergroup
13/03/24 08:12:03 INFO namenode.FSNamesystem: isPermissionEnabled = true
13/03/24 08:12:03 INFO namenode.FSNamesystem: HA Enabled: false
13/03/24 08:12:03 INFO namenode.FSNamesystem: Append Enabled: true
13/03/24 08:12:03 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/03/24 08:12:03 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
13/03/24 08:12:03 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
13/03/24 08:12:03 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
13/03/24 08:12:04 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
13/03/24 08:12:04 INFO namenode.FSImage: Saving image file /tmp/hadoop-hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
13/03/24 08:12:04 INFO namenode.FSImage: Image file of size 121 saved in 0 seconds.
13/03/24 08:12:04 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
13/03/24 08:12:04 INFO util.ExitUtil: Exiting with status 0
13/03/24 08:12:04 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-10-249-88-174.us-west-2.compute.internal/10.249.88.174
************************************************************/
$ls /tmp/hadoop-hadoop/dfs/name/current/
fsimage_0000000000000000000      seen_txid
fsimage_0000000000000000000.md5  VERSION
$/usr/local/hadoop/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
13/03/24 08:15:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-ip-10-249-88-174.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-ip-10-249-88-174.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-ip-10-249-88-174.out
13/03/24 08:15:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-ip-10-249-88-174.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-ip-10-249-88-174.out
$jps
1659 DataNode
2201 NodeManager
1891 SecondaryNameNode
2231 Jps
2031 ResourceManager
1470 NameNode
$hadoop fs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2013-03-24 07:39 /tmp
$cd hadoop-job
$hadoop fs -put input input
$hadoop fs -ls input
Found 2 items
-rw-r--r--   1 hadoop supergroup          6 2013-03-24 07:40 /user/hadoop/input/a
-rw-r--r--   1 hadoop supergroup         12 2013-03-24 07:40 /user/hadoop/input/b
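The sizes reported by `hadoop fs -ls` are a quick way to confirm the upload: they match the local input files byte for byte (each word plus its trailing separator is one byte):

```shell
printf 'a b c\n' | wc -c        # input/a: 6 bytes
printf 'a a b c c c\n' | wc -c  # input/b: 12 bytes
```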

Running the Job

$hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.3-alpha.jar wordcount input output
$hadoop fs -cat output/part-r-00000
a       3
b       2
c       4

Compiling and Running a Custom Job (NewMaxTemperature)

// NewMaxTemperature: finds the maximum temperature per year in the weather
// dataset, using the new context-object MapReduce API.
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewMaxTemperature {

  static class NewMaxTemperatureMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {

      String line = value.toString();
      String year = line.substring(15, 19);
      int airTemperature;
      if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
        airTemperature = Integer.parseInt(line.substring(88, 92));
      } else {
        airTemperature = Integer.parseInt(line.substring(87, 92));
      }
      String quality = line.substring(92, 93);
      if (airTemperature != MISSING && quality.matches("[01459]")) {
        context.write(new Text(year), new IntWritable(airTemperature));
      }
    }
  }

  static class NewMaxTemperatureReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {

      int maxValue = Integer.MIN_VALUE;
      for (IntWritable value : values) {
        maxValue = Math.max(maxValue, value.get());
      }
      context.write(key, new IntWritable(maxValue));
    }
  }

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: NewMaxTemperature <input path> <output path>");
      System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(NewMaxTemperature.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(NewMaxTemperatureMapper.class);
    job.setReducerClass(NewMaxTemperatureReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
The test data (sample/sample.txt) is a set of fixed-width NCDC weather records:
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
$ls sample/
NewMaxTemperature.java   sample.txt
$hadoop fs -put sample/sample.txt sample.txt
$javac -classpath /usr/local/hadoop/hadoop-core-1.1.2.jar sample/NewMaxTemperature.java
$jar -cvf sample/newmaxtemperature.jar sample
$hadoop jar sample/newmaxtemperature.jar sample.NewMaxTemperature sample.txt sample-out
$hadoop fs -cat sample-out/part-r-00000
1949    111
1950    22
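Before packaging the jar, the mapper/reducer logic can be spot-checked locally with awk. This is a rough port of the Java code above (same fixed-column offsets, but 1-indexed as awk requires), not part of the original log:

```shell
awk '{
  year = substr($0, 16, 4)                  # line.substring(15, 19)
  if (substr($0, 88, 1) == "+")
    temp = substr($0, 89, 4) + 0            # skip the leading plus sign
  else
    temp = substr($0, 88, 5) + 0
  quality = substr($0, 93, 1)               # line.substring(92, 93)
  if (temp != 9999 && quality ~ /[01459]/ && (!(year in max) || temp > max[year]))
    max[year] = temp
}
END { for (y in max) print y, max[y] }' sample/sample.txt | sort
```

Run against sample.txt, this prints the same `1949 111` / `1950 22` result as the job.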
