Collaborative Filtering by Mahout

This article tries collaborative filtering with Mahout.
Mahout provides ready-made machine learning functions, so we can use them without detailed knowledge of the underlying algorithms.
This article assumes Hadoop is already installed.

Environment

OS
- Linux version 2.6.32-279.el6.x86_64 (mockbuild@c6b9.bsys.dev.centos.org) (gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) ) #1 SMP Fri Jun 22 12:19:21 UTC 2012
Mahout
- mahout-0.5-cdh3u6
Hadoop
- hadoop-0.20.2-cdh3u5

Install Mahout

$ cd /usr/local
$ wget http://archive-primary.cloudera.com/cdh/3/mahout-0.5-cdh3u6.tar.gz
$ tar zxvf mahout-0.5-cdh3u6.tar.gz
$ ln -s mahout-0.5-cdh3u6 mahout
$ rm mahout-0.5-cdh3u6.tar.gz
$ vim ~/.bashrc
---
# JAVA
export JAVA_HOME=/usr/local/jdk1.7

# Hadoop/Mahout
export HADOOP_HOME=/usr/local/hadoop
export MAHOUT_HOME=/usr/local/mahout
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
---
$ source ~/.bashrc
$ /usr/local/mahout/bin/mahout
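
If the last command does not start, it may be worth checking the variables exported in ~/.bashrc first. The following is only a minimal sketch: check_env_dirs is a hypothetical helper name, not part of Mahout or Hadoop, and it only tests that each variable is set and points to an existing directory.

```shell
# Hypothetical helper (not part of Mahout or Hadoop).
# For each variable name given, report whether it is set and
# whether its value is an existing directory.
check_env_dirs() {
  rc=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name
    eval "val=\${$name:-}"
    if [ -z "$val" ]; then
      echo "$name is not set"
      rc=1
    elif [ ! -d "$val" ]; then
      echo "$name=$val is not a directory"
      rc=1
    else
      echo "$name=$val OK"
    fi
  done
  return $rc
}

# Usage:
#   check_env_dirs JAVA_HOME HADOOP_HOME MAHOUT_HOME HADOOP_CONF_DIR
```

The function returns a non-zero status when any variable is missing or wrong, so it can also be used in a script before calling mahout.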

Collaborative Filtering

Prepare the input data, then run the mahout command with options.

The input data is as follows.

# Input file format: User ID,Item ID,Rating
$ vim input.txt
1,100,5
1,200,3
1,300,3
2,100,5
2,200,5
2,400,3
2,900,4
3,200,4
3,300,5
3,700,4
4,200,5
4,400,4
4,600,5
4,900,5
5,100,5
5,200,3
5,400,4
5,500,6
5,700,5
6,100,3
6,200,4
6,400,2
6,700,3
7,100,4
7,700,5
7,800,5
7,900,5
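
Before uploading, it can help to check that every line of input.txt really has the three comma-separated fields described above. The following is a minimal sketch using awk. check_prefs_file is a hypothetical helper name, and the check assumes integer user and item IDs and a numeric rating, as in the sample data.

```shell
# Hypothetical helper (not part of Mahout).
# Print every line that is not "integer,integer,number" and
# return a non-zero status if at least one such line exists.
check_prefs_file() {
  awk -F',' '
    NF != 3 || $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+$/ || $3 !~ /^-?[0-9]+(\.[0-9]+)?$/ {
      printf "line %d: %s\n", NR, $0
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Usage:
#   check_prefs_file input.txt && echo "input.txt looks fine"
```
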
$ /usr/local/hadoop/bin/hadoop fs -mkdir /user/test/mahout
$ /usr/local/hadoop/bin/hadoop fs -mkdir /user/test/mahout/input
$ /usr/local/hadoop/bin/hadoop fs -ls /user/test/mahout
$ /usr/local/hadoop/bin/hadoop fs -put input.txt /user/test/mahout/input
$ /usr/local/hadoop/bin/hadoop fs -ls /user/test/mahout/input
$ /usr/local/mahout/bin/mahout recommenditembased --help
$ /usr/local/mahout/bin/mahout recommenditembased --input /user/test/mahout/input --tempDir /user/test/mahout/temp --output /user/test/mahout/output --similarityClassname SIMILARITY_PEARSON_CORRELATION

Running on hadoop, using HADOOP_HOME=xxx
HADOOP_CONF_DIR=xxx
MAHOUT-JOB: xxx
15/06/16 16:40:06 INFO common.AbstractJob: Command line arguments: {--booleanData=false, --endPhase=2147483647, --input=/user/test/tmp/mahout/input, --maxCooccurrencesPerItem=100, --maxPrefsPerUser=10, --maxSimilaritiesPerItem=100, --minPrefsPerUser=1, --numRecommendations=2, --output=/usr/test/tmp/mahout/output, --similarityClassname=SIMILARITY_PEARSON_CORRELATION, --startPhase=0, --tempDir=/user/test/tmp/mahout/temp}
15/06/16 16:40:08 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 4383157 for test on 100.67.80.17:58020
15/06/16 16:40:08 INFO security.TokenCache: Got dt for xxx
15/06/16 16:40:08 INFO input.FileInputFormat: Total input paths to process : 1
15/06/16 16:40:08 WARN snappy.LoadSnappy: Snappy native library is available
15/06/16 16:40:08 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/06/16 16:40:08 INFO snappy.LoadSnappy: Snappy native library loaded
15/06/16 16:40:08 INFO mapred.JobClient: Running job: job_201502041505_1343701
15/06/16 16:40:09 INFO mapred.JobClient:  map 0% reduce 0%
15/06/16 16:40:20 INFO mapred.JobClient:  map 100% reduce 0%
...

# Output format: User ID\t[Item ID:estimated preference, ...]
$ /usr/local/hadoop/bin/hadoop fs -text /user/test/mahout/output/part-r-00000
2     [700:3.9172406]
4     [700:4.19286]
5     [900:4.375]
6     [900:2.375]
7     [400:4.6099067]
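
To use the recommendations in another tool, the bracketed list can be flattened into one "user,item,score" line per recommendation. The following is a minimal sketch with awk. flatten_recs is a hypothetical helper name, and it assumes the output format shown above, with no spaces inside the brackets.

```shell
# Hypothetical helper (not part of Mahout).
# Read lines like "2<TAB>[700:3.9,900:4.1]" from standard input and
# print one "user,item,score" line per recommended item.
flatten_recs() {
  awk '
    {
      user = $1
      list = $2
      gsub(/\[|\]/, "", list)        # drop the surrounding brackets
      n = split(list, recs, ",")     # one entry per recommended item
      for (i = 1; i <= n; i++) {
        split(recs[i], pair, ":")    # pair[1] = item ID, pair[2] = score
        printf "%s,%s,%s\n", user, pair[1], pair[2]
      }
    }
  '
}

# Usage:
#   /usr/local/hadoop/bin/hadoop fs -text /user/test/mahout/output/part-r-00000 | flatten_recs
```

The scores are passed through as text, so they keep the same digits as in the Mahout output.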

The result is shown above: for each user, Mahout lists the recommended item IDs with their estimated preference values.

Author: zuqqhi2