zuqqhi2のIT日記

プログラミング + アカデミック + 何か面白いこと

   Jun 17

Collaborative Filtering by Mahout

by zuqqhi2 at 2015年6月17日
Pocket

Try to do collaborative Filtering by Mahout.
Mahout provides machine learning functions and we can use them without any knowledge about it.
In this article, hadoop is already installed.

Environment

  • OS
    • Linux version 2.6.32-279.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Jun 22 12:19:21 UTC 2012
  • Mahout
    • mahout-0.5-cdh3u6
  • Hadoop
    • hadoop-0.20.2-cdh3u5

Install Mahout

$ cd /usr/local
$ wget http://archive-primary.cloudera.com/cdh/3/mahout-0.5-cdh3u6.tar.gz
$ tar zxvf mahout-0.5-cdh3u6.tar.gz
$ ln -s mahout-0.5-cdh3u6 mahout
$ rm mahout-0.5-cdh3u6.tar.gz
$ vim ~/.bashrc
---
# JAVA
export JAVA_HOME=/usr/local/jdk1.7

# Hadoop/Mahout
export HADOOP_HOME=/usr/local/hadoop
export MAHOUT_HOME=/usr/local/mahout
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
---
$ source ~/.bashrc
$ /usr/local/mahout/bin/mahout

Collaborative Filterling

Run mahout command with options after preparing input data.

Input data is following.
input table

# Input File Format User ID,Item Id,Rating
$ vim input.txt
1,100,5
1,200,3
1,300,3
2,100,5
2,200,5
2,400,3
2,900,4
3,200,4
3,300,5
3,700,4
4,200,5
4,400,4
4,600,5
4,900,5
5,100,5
5,200,3
5,400,4
5,500,6
5,700,5
6,100,3
6,200,4
6,400,2
6,700,3
7,100,4
7,700,5
7,800,5
7,900,5
$ /usr/local/hadoop/bin/hadoop fs -mkdir /user/test/mahout
$ /usr/local/hadoop/bin/hadoop fs -mkdir /user/test/mahout/input
$ /usr/local/hadoop/bin/hadoop fs -ls /user/test/mahout
$ /usr/local/hadoop/bin/hadoop fs -put input.txt /user/test/mahout/input
$ /usr/local/hadoop/bin/hadoop fs -ls /user/test/mahout/input
$ /usr/local/mahout/bin/mahout recommenditembased --help
$ /usr/local/mahout/bin/mahout recommenditembased --input /user/test/mahout/input --tempDir /user/test/mahout/temp --output /usr/test/mahout/output --similarityClassname SIMILARITY_PEARSON_CORRELATION

Running on hadoop, using HADOOP_HOME=xxx
HADOOP_CONF_DIR=xxx
MAHOUT-JOB: xxx
15/06/16 16:40:06 INFO common.AbstractJob: Command line arguments: {--booleanData=false, --endPhase=2147483647, --input=/user/test/tmp/mahout/input, --maxCooccurrencesPerItem=100, --maxPrefsPerUser=10, --maxSimilaritiesPerItem=100, --minPrefsPerUser=1, --numRecommendations=2, --output=/usr/test/tmp/mahout/output, --similarityClassname=SIMILARITY_PEARSON_CORRELATION, --startPhase=0, --tempDir=/user/test/tmp/mahout/temp}
15/06/16 16:40:08 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 4383157 for test on 100.67.80.17:58020
15/06/16 16:40:08 INFO security.TokenCache: Got dt for xxx
15/06/16 16:40:08 INFO input.FileInputFormat: Total input paths to process : 1
15/06/16 16:40:08 WARN snappy.LoadSnappy: Snappy native library is available
15/06/16 16:40:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/06/16 16:40:08 INFO snappy.LoadSnappy: Snappy native library loaded
15/06/16 16:40:08 INFO mapred.JobClient: Running job: job_201502041505_1343701
15/06/16 16:40:09 INFO mapred.JobClient:  map 0% reduce 0%
15/06/16 16:40:20 INFO mapred.JobClient:  map 100% reduce 0%
...


# Output format is User Id\t[Item Id:how good to recommend, ...]
$ /usr/local/hadoop/bin/hadoop fs -text /user/test/mahout/output/part-r-00000
2     [700:3.9172406]
4     [700:4.19286]
5     [900:4.375]
6     [900:2.375]
7     [400:4.6099067]

Result is following.
result table

Related Posts

Pocket

You can follow any responses to this entry through the RSS 2.0 feed. Both comments and pings are currently closed.