The input is a list of user_id values, one per line:

$ cat uu_input/input | head -30
65762
64935
89528
21825
82598
39593
35551
54719
22605
19995
26569
48185
13155
57038
20898
29540
10589
69593
90652
75378
49446
70353
24496
63605
95314
4112
86155
27084
55029
39381
...

Next is the mapper.
# uu_mapper.rb
# Tag each user_id with a count of 1.
ARGF.each do |user_id|
  user_id.chomp!
  puts "#{user_id}\t1"   # emit: user_id<TAB>1
end

Next is the reducer.
# uu_reducer.rb
# Input arrives sorted by key, so duplicate user_ids are adjacent;
# emit each user_id exactly once.
pre_user_id = nil
ARGF.each do |log|
  log.chomp!
  user_id = log.split(/\t/)[0]
  if pre_user_id.nil? || pre_user_id != user_id
    puts "#{user_id}\t1"
    pre_user_id = user_id
  end
end

Check the scripts before using Hadoop.
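Before the Hadoop run it is worth checking the reducer's one assumption: it only collapses *adjacent* duplicates, which is safe on the cluster because Hadoop Streaming sorts map output by key before it reaches the reducer. A minimal sketch of the same logic on in-memory sample lines (the IDs here are made up for illustration):

```ruby
# Same consecutive-dedupe logic as uu_reducer.rb, run on an in-memory
# array instead of ARGF. The input is already sorted, as Hadoop guarantees.
sorted_lines = ["100\t1", "100\t1", "200\t1", "300\t1", "300\t1"]

pre_user_id = nil
unique_ids = []
sorted_lines.each do |log|
  user_id = log.split(/\t/)[0]
  if pre_user_id.nil? || pre_user_id != user_id
    unique_ids << user_id     # each user_id is emitted once
    pre_user_id = user_id
  end
end

puts unique_ids.size   # => 3
```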
$ cat input | ruby uu_mapper.rb | ruby uu_reducer.rb

It seems to be OK.
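One caveat with the local check: uu_reducer.rb only collapses duplicates that sit next to each other. Hadoop sorts the mapper output before the reduce phase, but a plain local pipe does not, so if the input file can contain the same user_id on non-adjacent lines, a `sort` belongs between the two scripts (`... | ruby uu_mapper.rb | sort | ruby uu_reducer.rb`). A small sketch with made-up IDs showing why:

```ruby
# Counts user_ids the way uu_reducer.rb does: a new id is counted
# whenever it differs from the previous line's id.
def count_like_reducer(ids)
  pre_user_id = nil
  count = 0
  ids.each do |user_id|
    if pre_user_id.nil? || pre_user_id != user_id
      count += 1
      pre_user_id = user_id
    end
  end
  count
end

ids = ["100", "200", "100"]            # "100" appears twice, not adjacent
puts count_like_reducer(ids)           # => 3  (overcounts)
puts count_like_reducer(ids.sort)      # => 2  (correct after sorting)
```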
Let’s run Hadoop.
$ hadoop fs -put keyword.log
$ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.1.2.jar \
    -D mapred.child.env='PATH=$PATH:/home/hadoop/.rvm/bin' \
    -input input -output output \
    -mapper 'ruby uu_mapper.rb' -reducer 'ruby uu_reducer.rb' \
    -file uu_mapper.rb -file uu_reducer.rb
$ hadoop fs -cat output/part-00000
99836	1
99837	1
99838	1
99839	1
9984	1
99840	1
99841	1
99842	1
99843	1
99844	1
99845	1
99846	1
99847	1
99848	1
99849	1
9985	1
99850	1
99851	1
99852	1
99853	1
...

Good! (The keys are sorted lexicographically as strings, which is why 9984 appears between 99839 and 99840.)
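Since the reducer emits exactly one line per user, the final UU number is simply the line count of the job output (e.g. `hadoop fs -cat output/part-* | wc -l`), or equivalently the sum of the second column. A tiny Ruby sketch over a few sample lines copied from the output above:

```ruby
# Sum the counts in reducer output lines of the form "user_id\t1".
sample = ["99836\t1", "99837\t1", "99838\t1", "99839\t1"]
uu = sample.sum { |line| line.split(/\t/)[1].to_i }
puts uu   # => 4
```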