First, let's look at the input: a file of user IDs, one per line.

$ cat uu_input/input/user_id | head -30
65762
64935
89528
21825
82598
39593
35551
54719
22605
19995
26569
48185
13155
57038
20898
29540
10589
69593
90652
75378
49446
70353
24496
63605
95314
4112
86155
27084
55029
39381
・・・

Next is the mapper, uu_mapper.rb. It simply emits each user ID as a key with a count of 1.
# uu_mapper.rb: emit "user_id<TAB>1" for every input line
ARGF.each do |user_id|
  user_id.chomp!
  puts "#{user_id}\t1"
end
Next is the reducer, uu_reducer.rb. Since Hadoop sorts the mapper output by key before the reduce phase, identical user IDs arrive on consecutive lines, so the reducer only has to emit each ID once.
# uu_reducer.rb: input is sorted by key, so duplicate user IDs are
# adjacent; print each distinct user ID exactly once
pre_user_id = nil
ARGF.each do |log|
  log.chomp!
  user_id = log.split(/\t/)[0]
  next if user_id == pre_user_id  # skip repeats of the same key
  puts "#{user_id}\t1"
  pre_user_id = user_id
end
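To see why this reducer works, it helps to walk the whole map → shuffle/sort → reduce pipeline in a single Ruby script. This is only a sketch: the sample IDs below are made up, and the sort stands in for Hadoop's shuffle phase.

```ruby
# A minimal in-process simulation of the pipeline. Sample IDs are
# hypothetical and include duplicates on purpose.
input = %w[65762 64935 65762 21825 64935 65762]

# Map phase: emit "user_id\t1" for every record (same as uu_mapper.rb).
mapped = input.map { |id| "#{id}\t1" }

# Shuffle/sort phase: Hadoop sorts the mapper output by key.
sorted = mapped.sort

# Reduce phase: same idea as uu_reducer.rb -- duplicates are now
# adjacent, so each distinct key is printed exactly once.
pre_user_id = nil
reduced = []
sorted.each do |line|
  user_id = line.split(/\t/)[0]
  next if user_id == pre_user_id
  reduced << "#{user_id}\t1"
  pre_user_id = user_id
end

puts reduced       # one line per unique user
puts reduced.size  # the UU count for this sample
```

The key point is the middle step: without the sort, duplicates would not be adjacent and the one-line-of-state reducer would over-count.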
Check the scripts locally before sending them to Hadoop.

$ cat input/user_id | ruby uu_mapper.rb | ruby uu_reducer.rb

It seems to be OK. (Strictly speaking, to mimic the shuffle phase locally you would insert a sort between the mapper and the reducer, since the reducer assumes sorted input.)
Let's run Hadoop.

$ hadoop fs -put keyword.log
$ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.1.2.jar \
    -D mapred.child.env='PATH=$PATH:/home/hadoop/.rvm/bin' \
    -input input \
    -output output \
    -mapper 'ruby uu_mapper.rb' \
    -reducer 'ruby uu_reducer.rb' \
    -file uu_mapper.rb \
    -file uu_reducer.rb
$ hadoop fs -cat output/part-00000
99836	1
99837	1
99838	1
99839	1
9984	1
99840	1
99841	1
99842	1
99843	1
99844	1
99845	1
99846	1
99847	1
99848	1
99849	1
9985	1
99850	1
99851	1
99852	1
99853	1
・・・

Good!