Collaborative Filtering by Mahout

This article tries out collaborative filtering with Mahout.
Mahout provides ready-made machine learning functions, so we can use them without detailed knowledge of the underlying algorithms.
This article assumes that Hadoop is already installed.

Environment

  • OS
    • Linux version 2.6.32-279.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Jun 22 12:19:21 UTC 2012
  • Mahout
    • mahout-0.5-cdh3u6
  • Hadoop
    • hadoop-0.20.2-cdh3u5

Install Mahout

$ cd /usr/local
$ wget http://archive-primary.cloudera.com/cdh/3/mahout-0.5-cdh3u6.tar.gz
$ tar zxvf mahout-0.5-cdh3u6.tar.gz
$ ln -s mahout-0.5-cdh3u6 mahout
$ rm mahout-0.5-cdh3u6.tar.gz
$ vim ~/.bashrc
---
# JAVA
export JAVA_HOME=/usr/local/jdk1.7

# Hadoop/Mahout
export HADOOP_HOME=/usr/local/hadoop
export MAHOUT_HOME=/usr/local/mahout
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
---
$ source ~/.bashrc
$ /usr/local/mahout/bin/mahout
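
Before running any job, it can help to confirm that the variables exported in ~/.bashrc are actually visible in the current shell. The following is a minimal sketch of such a check. The function name check_mahout_env is hypothetical and is not part of Mahout or Hadoop.

```shell
# Hypothetical helper (not part of Mahout or Hadoop): confirm that the
# variables exported in ~/.bashrc are set and point at existing directories.
check_mahout_env() {
    status=0
    for name in JAVA_HOME HADOOP_HOME MAHOUT_HOME HADOOP_CONF_DIR; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            status=1
        else
            echo "$name=$value"
        fi
    done
    # Non-zero when at least one variable is unset or invalid.
    return $status
}
```

It can be called as `check_mahout_env || echo "fix ~/.bashrc and source it again"`.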

Collaborative Filtering

Prepare the input data, then run the mahout command with options.
The input file is CSV with one preference per line.
# Input file format: User ID,Item ID,Rating
$ vim input.txt
1,100,5
1,200,3
1,300,3
2,100,5
2,200,5
2,400,3
2,900,4
3,200,4
3,300,5
3,700,4
4,200,5
4,400,4
4,600,5
4,900,5
5,100,5
5,200,3
5,400,4
5,500,6
5,700,5
6,100,3
6,200,4
6,400,2
6,700,3
7,100,4
7,700,5
7,800,5
7,900,5
$ /usr/local/hadoop/bin/hadoop fs -mkdir /user/test/mahout
$ /usr/local/hadoop/bin/hadoop fs -mkdir /user/test/mahout/input
$ /usr/local/hadoop/bin/hadoop fs -ls /user/test/mahout
$ /usr/local/hadoop/bin/hadoop fs -put input.txt /user/test/mahout/input
$ /usr/local/hadoop/bin/hadoop fs -ls /user/test/mahout/input
$ /usr/local/mahout/bin/mahout recommenditembased --help
$ /usr/local/mahout/bin/mahout recommenditembased --input /user/test/mahout/input --tempDir /user/test/mahout/temp --output /user/test/mahout/output --similarityClassname SIMILARITY_PEARSON_CORRELATION

Running on hadoop, using HADOOP_HOME=xxx
HADOOP_CONF_DIR=xxx
MAHOUT-JOB: xxx
15/06/16 16:40:06 INFO common.AbstractJob: Command line arguments: {--booleanData=false, --endPhase=2147483647, --input=/user/test/tmp/mahout/input, --maxCooccurrencesPerItem=100, --maxPrefsPerUser=10, --maxSimilaritiesPerItem=100, --minPrefsPerUser=1, --numRecommendations=2, --output=/usr/test/tmp/mahout/output, --similarityClassname=SIMILARITY_PEARSON_CORRELATION, --startPhase=0, --tempDir=/user/test/tmp/mahout/temp}
15/06/16 16:40:08 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 4383157 for test on 100.67.80.17:58020
15/06/16 16:40:08 INFO security.TokenCache: Got dt for xxx
15/06/16 16:40:08 INFO input.FileInputFormat: Total input paths to process : 1
15/06/16 16:40:08 WARN snappy.LoadSnappy: Snappy native library is available
15/06/16 16:40:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/06/16 16:40:08 INFO snappy.LoadSnappy: Snappy native library loaded
15/06/16 16:40:08 INFO mapred.JobClient: Running job: job_201502041505_1343701
15/06/16 16:40:09 INFO mapred.JobClient:  map 0% reduce 0%
15/06/16 16:40:20 INFO mapred.JobClient:  map 100% reduce 0%
...
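
Malformed lines in the input file are easier to catch locally than from a failed Hadoop job. As a minimal sketch, the check below reports every line of input.txt that does not look like User ID,Item ID,Rating. It assumes integer IDs and non-negative numeric ratings, as in the sample data. The function name validate_prefs is hypothetical.

```shell
# Hypothetical helper: report lines that are not "integer,integer,number".
# Assumption: IDs are plain integers and ratings are non-negative numbers,
# matching the sample input in this article.
validate_prefs() {
    awk -F',' '
        NF != 3 || $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+(\.[0-9]+)?$/ {
            printf "line %d is malformed: %s\n", NR, $0
            bad++
        }
        # Exit with status 1 if any malformed line was seen.
        END { if (bad > 0) exit 1 }
    ' "$1"
}
```

Running `validate_prefs input.txt` before `hadoop fs -put` prints nothing and returns 0 when every line matches the expected format.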

# Output format: User ID\t[Item ID:estimated preference, ...]
$ /usr/local/hadoop/bin/hadoop fs -text /user/test/mahout/output/part-r-00000
2     [700:3.9172406]
4     [700:4.19286]
5     [900:4.375]
6     [900:2.375]
7     [400:4.6099067]
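
The bracketed list is awkward to load into other tools, so it may be convenient to flatten it into one recommendation per line. The following is a minimal sketch that assumes the format shown above: a user ID, whitespace, and a bracketed, comma-separated list of item:score pairs with no spaces inside the brackets. The function name flatten_recs is hypothetical.

```shell
# Hypothetical helper: turn "user<TAB>[item:score,item:score]" into
# one "user,item,score" line per recommendation.
flatten_recs() {
    awk '
        {
            user = $1
            list = $2
            sub(/^\[/, "", list)    # drop the opening bracket
            sub(/\]$/, "", list)    # drop the closing bracket
            n = split(list, pairs, ",")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, ":")
                printf "%s,%s,%s\n", user, kv[1], kv[2]
            }
        }
    '
}
```

For example, `/usr/local/hadoop/bin/hadoop fs -text /user/test/mahout/output/part-r-00000 | flatten_recs` would turn the first line above into `2,700,3.9172406`.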
