Apache Mahout provides ready-made machine learning algorithms, so we can run them from the command line without implementing the algorithms ourselves.
This article assumes that Hadoop is already installed.
Environment
- OS
  - Linux version 2.6.32-279.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Jun 22 12:19:21 UTC 2012
- Mahout
  - mahout-0.5-cdh3u6
- Hadoop
  - hadoop-0.20.2-cdh3u5
Install Mahout
$ cd /usr/local
$ wget http://archive-primary.cloudera.com/cdh/3/mahout-0.5-cdh3u6.tar.gz
$ tar zxvf mahout-0.5-cdh3u6.tar.gz
$ ln -s mahout-0.5-cdh3u6 mahout
$ rm mahout-0.5-cdh3u6.tar.gz
$ vim ~/.bashrc
---
# JAVA
export JAVA_HOME=/usr/local/jdk1.7
# Hadoop/Mahout
export HADOOP_HOME=/usr/local/hadoop
export MAHOUT_HOME=/usr/local/mahout
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
---
$ source ~/.bashrc
$ /usr/local/mahout/bin/mahout
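Before running any Mahout job, it can help to confirm that the variables exported in ~/.bashrc are actually visible in the current shell and point at real directories. The following is a minimal sketch, not part of Mahout or Hadoop; the function name check_env_dirs is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper (not shipped with Mahout or Hadoop):
# for each variable name given, report whether it is set and
# whether its value is an existing directory.
check_env_dirs() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name; empty if unset.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            status=1
        else
            echo "$name=$value"
        fi
    done
    return $status
}

# Check the four variables exported in ~/.bashrc above.
check_env_dirs JAVA_HOME HADOOP_HOME MAHOUT_HOME HADOOP_CONF_DIR \
    || echo "Fix ~/.bashrc, then run: source ~/.bashrc"
```

If every variable is reported with its path, the mahout launcher script should be able to find both Java and the Hadoop configuration.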
Collaborative Filtering
Prepare the input data, upload it to HDFS, and then run the mahout command with options. The input data, the commands, and the result are shown below.

# Input file format: User ID,Item ID,Rating
$ vim input.txt
1,100,5
1,200,3
1,300,3
2,100,5
2,200,5
2,400,3
2,900,4
3,200,4
3,300,5
3,700,4
4,200,5
4,400,4
4,600,5
4,900,5
5,100,5
5,200,3
5,400,4
5,500,6
5,700,5
6,100,3
6,200,4
6,400,2
6,700,3
7,100,4
7,700,5
7,800,5
7,900,5

$ /usr/local/hadoop/bin/hadoop fs -mkdir /user/test/mahout
$ /usr/local/hadoop/bin/hadoop fs -mkdir /user/test/mahout/input
$ /usr/local/hadoop/bin/hadoop fs -ls /user/test/mahout
$ /usr/local/hadoop/bin/hadoop fs -put input.txt /user/test/mahout/input
$ /usr/local/hadoop/bin/hadoop fs -ls /user/test/mahout/input

$ /usr/local/mahout/bin/mahout recommenditembased --help
$ /usr/local/mahout/bin/mahout recommenditembased --input /user/test/mahout/input --tempDir /user/test/mahout/temp --output /user/test/mahout/output --similarityClassname SIMILARITY_PEARSON_CORRELATION
Running on hadoop, using HADOOP_HOME=xxx
HADOOP_CONF_DIR=xxx
MAHOUT-JOB: xxx
15/06/16 16:40:06 INFO common.AbstractJob: Command line arguments: {--booleanData=false, --endPhase=2147483647, --input=/user/test/tmp/mahout/input, --maxCooccurrencesPerItem=100, --maxPrefsPerUser=10, --maxSimilaritiesPerItem=100, --minPrefsPerUser=1, --numRecommendations=2, --output=/usr/test/tmp/mahout/output, --similarityClassname=SIMILARITY_PEARSON_CORRELATION, --startPhase=0, --tempDir=/user/test/tmp/mahout/temp}
15/06/16 16:40:08 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 4383157 for test on 100.67.80.17:58020
15/06/16 16:40:08 INFO security.TokenCache: Got dt for xxx
15/06/16 16:40:08 INFO input.FileInputFormat: Total input paths to process : 1
15/06/16 16:40:08 WARN snappy.LoadSnappy: Snappy native library is available
15/06/16 16:40:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/06/16 16:40:08 INFO snappy.LoadSnappy: Snappy native library loaded
15/06/16 16:40:08 INFO mapred.JobClient: Running job: job_201502041505_1343701
15/06/16 16:40:09 INFO mapred.JobClient: map 0% reduce 0%
15/06/16 16:40:20 INFO mapred.JobClient: map 100% reduce 0%
...

# Output format: User ID\t[Item ID:recommendation score, ...]
$ /usr/local/hadoop/bin/hadoop fs -text /user/test/mahout/output/part-r-00000
2    [700:3.9172406]
4    [700:4.19286]
5    [900:4.375]
6    [900:2.375]
7    [400:4.6099067]
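The bracketed output format is compact but awkward to load into other tools. As a rough sketch, assuming each output line is a user ID, a tab, and a bracketed comma-separated list of item:score pairs as described above, the result can be flattened into one user,item,score row per recommendation with awk. The function name flatten_recommendations is a hypothetical helper, not a Mahout command.

```shell
# Hypothetical helper (not a Mahout command):
# turn "user<TAB>[item:score,item:score]" lines read from stdin
# into "user,item,score" rows, one row per recommended item.
flatten_recommendations() {
    awk -F'\t' '
        {
            list = $2
            gsub(/\[/, "", list)             # drop the opening bracket
            gsub(/\]/, "", list)             # drop the closing bracket
            n = split(list, pairs, ",")      # one element per item:score pair
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, ":")     # kv[1] = item ID, kv[2] = score
                print $1 "," kv[1] "," kv[2]
            }
        }
    '
}

# Two lines taken from the result above, with the tab written explicitly.
printf '2\t[700:3.9172406]\n4\t[700:4.19286]\n' | flatten_recommendations
```

On a real cluster the same function could be fed from HDFS, for example by piping the output of hadoop fs -text /user/test/mahout/output/part-r-00000 into flatten_recommendations.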