zuqqhi2's IT Diary

Programming + Academic + Something Interesting

Collaborative Filtering by Mahout

by zuqqhi2 on June 17, 2015

This post tries out collaborative filtering with Mahout.
Mahout provides ready-made machine learning functions, so we can use them without detailed knowledge of the underlying algorithms.
This article assumes that Hadoop is already installed.

Environment

  • OS
    • Linux version 2.6.32-279.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Jun 22 12:19:21 UTC 2012
  • Mahout
    • mahout-0.5-cdh3u6
  • Hadoop
    • hadoop-0.20.2-cdh3u5

Install Mahout

$ cd /usr/local
$ wget http://archive-primary.cloudera.com/cdh/3/mahout-0.5-cdh3u6.tar.gz
$ tar zxvf mahout-0.5-cdh3u6.tar.gz
$ ln -s mahout-0.5-cdh3u6 mahout
$ rm mahout-0.5-cdh3u6.tar.gz
$ vim ~/.bashrc
---
# JAVA
export JAVA_HOME=/usr/local/jdk1.7

# Hadoop/Mahout
export HADOOP_HOME=/usr/local/hadoop
export MAHOUT_HOME=/usr/local/mahout
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
---
$ source ~/.bashrc
$ /usr/local/mahout/bin/mahout
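
If the last command fails, a common cause is an environment variable that points to the wrong place. The following is a minimal sketch for checking that, not part of Mahout or Hadoop: `check_env_dirs` is a hypothetical helper name, and it only verifies that each variable is set and refers to an existing directory.

```shell
# Sketch: report whether each named environment variable points to an existing directory.
# check_env_dirs is a hypothetical helper, not a Hadoop or Mahout command.
check_env_dirs() {
    rc=0
    for name in "$@"; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory"
            rc=1
        else
            echo "$name=$value OK"
        fi
    done
    return $rc
}

# Check the variables exported in ~/.bashrc above.
check_env_dirs JAVA_HOME HADOOP_HOME MAHOUT_HOME HADOOP_CONF_DIR \
    || echo "Check the exports in ~/.bashrc"
```

The function returns a non-zero status when any variable fails the check, so it can also be used as a guard at the top of a script.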

Collaborative Filtering

Prepare the input data, then run the mahout command with the options shown below.

The input data is as follows.
[Figure: input table]
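
Each line of the input file is expected to be a `User ID,Item ID,Rating` triple. Before uploading the file to HDFS, a quick local check can catch malformed lines. The following is only a minimal sketch: `validate_ratings` is a hypothetical helper name, and the pattern assumes integer IDs and non-negative numeric ratings, which matches the sample data in this post.

```shell
# Sketch: verify that every line of a ratings file looks like "integer,integer,number".
# validate_ratings is a hypothetical helper; the accepted format is an assumption
# based on the sample data (integer user/item IDs, non-negative ratings).
# Usage: validate_ratings input.txt
validate_ratings() {
    awk -F',' '
        NF != 3 || $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+(\.[0-9]+)?$/ {
            # Report the line number and content of each malformed line.
            printf "line %d is malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

The function prints nothing and returns 0 when every line passes, so `validate_ratings input.txt && echo OK` is enough for a quick check.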

# Input file format: User ID,Item ID,Rating
$ vim input.txt
1,100,5
1,200,3
1,300,3
2,100,5
2,200,5
2,400,3
2,900,4
3,200,4
3,300,5
3,700,4
4,200,5
4,400,4
4,600,5
4,900,5
5,100,5
5,200,3
5,400,4
5,500,6
5,700,5
6,100,3
6,200,4
6,400,2
6,700,3
7,100,4
7,700,5
7,800,5
7,900,5
$ /usr/local/hadoop/bin/hadoop fs -mkdir /user/test/mahout
$ /usr/local/hadoop/bin/hadoop fs -mkdir /user/test/mahout/input
$ /usr/local/hadoop/bin/hadoop fs -ls /user/test/mahout
$ /usr/local/hadoop/bin/hadoop fs -put input.txt /user/test/mahout/input
$ /usr/local/hadoop/bin/hadoop fs -ls /user/test/mahout/input
$ /usr/local/mahout/bin/mahout recommenditembased --help
$ /usr/local/mahout/bin/mahout recommenditembased --input /user/test/mahout/input --tempDir /user/test/mahout/temp --output /user/test/mahout/output --similarityClassname SIMILARITY_PEARSON_CORRELATION

Running on hadoop, using HADOOP_HOME=xxx
HADOOP_CONF_DIR=xxx
MAHOUT-JOB: xxx
15/06/16 16:40:06 INFO common.AbstractJob: Command line arguments: {--booleanData=false, --endPhase=2147483647, --input=/user/test/tmp/mahout/input, --maxCooccurrencesPerItem=100, --maxPrefsPerUser=10, --maxSimilaritiesPerItem=100, --minPrefsPerUser=1, --numRecommendations=2, --output=/usr/test/tmp/mahout/output, --similarityClassname=SIMILARITY_PEARSON_CORRELATION, --startPhase=0, --tempDir=/user/test/tmp/mahout/temp}
15/06/16 16:40:08 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 4383157 for test on 100.67.80.17:58020
15/06/16 16:40:08 INFO security.TokenCache: Got dt for xxx
15/06/16 16:40:08 INFO input.FileInputFormat: Total input paths to process : 1
15/06/16 16:40:08 WARN snappy.LoadSnappy: Snappy native library is available
15/06/16 16:40:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/06/16 16:40:08 INFO snappy.LoadSnappy: Snappy native library loaded
15/06/16 16:40:08 INFO mapred.JobClient: Running job: job_201502041505_1343701
15/06/16 16:40:09 INFO mapred.JobClient:  map 0% reduce 0%
15/06/16 16:40:20 INFO mapred.JobClient:  map 100% reduce 0%
...


# Output format: User ID\t[Item ID:recommendation score, ...]
$ /usr/local/hadoop/bin/hadoop fs -text /user/test/mahout/output/part-r-00000
2     [700:3.9172406]
4     [700:4.19286]
5     [900:4.375]
6     [900:2.375]
7     [400:4.6099067]

The result is as follows.
[Figure: result table]
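
The bracketed output is compact but awkward to sort or join with other data. As a minimal sketch, the lines can be flattened into one `user, item, score` row per recommendation. `flatten_recommendations` is a hypothetical helper name, and the sketch assumes the user ID and the bracketed list are separated by a single tab, as described in the output format comment above.

```shell
# Sketch: turn "user<TAB>[item:score,item:score,...]" lines into
# one tab-separated "user item score" row per recommendation.
# flatten_recommendations is a hypothetical helper, not a Mahout command.
flatten_recommendations() {
    awk -F'\t' '
        {
            list = $2
            gsub(/^\[|\]$/, "", list)        # drop the surrounding brackets
            n = split(list, pairs, ",")      # one element per item:score pair
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, ":")     # kv[1] = item ID, kv[2] = score
                printf "%s\t%s\t%s\n", $1, kv[1], kv[2]
            }
        }
    '
}
```

It reads from standard input, so it can be placed after the `hadoop fs -text` command shown above with a pipe, for example `hadoop fs -text /user/test/mahout/output/part-r-00000 | flatten_recommendations`.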
