Programming + Academic + Something Interesting


[Hadoop][Ruby]Hadoop Streaming Training2

Generate 30,000,000 user ID records, then output the unique users.
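The post does not show how the input file was created, so the following is only a minimal sketch of one way to do it. The method name `write_user_ids` and the ID range of 1 to 100,000 are assumptions (the range is guessed from the five-digit sample values), not the original setup.

```ruby
# Hypothetical generator: writes `count` random user IDs, one per line, to `io`.
# The default max_id of 100_000 is an assumption based on the sample values.
def write_user_ids(io, count, max_id: 100_000, rng: Random.new)
  count.times { io.puts rng.rand(1..max_id) }
end

# Small demo on standard output.
write_user_ids($stdout, 5)
```

For the full data set, something like `File.open("inputuser_id", "w") { |f| write_user_ids(f, 30_000_000) }` should work under the same assumptions.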

$ cat uu_input/inputuser_id | head -30
65762
64935
89528
21825
82598
39593
35551
54719
22605
19995
26569
48185
13155
57038
20898
29540
10589
69593
90652
75378
49446
70353
24496
63605
95314
4112
86155
27084
55029
39381
・・・

Next is the mapper.

# uu_mapper.rb: emit each user ID as the key with a dummy value of 1.
ARGF.each do |user_id|
  user_id.chomp!
  puts "#{user_id}\t1"
end

Next is the reducer.

# uu_reducer.rb: the input arrives sorted by key, so print a user ID
# only when it differs from the previous one.
pre_user_id = nil

ARGF.each do |log|
  log.chomp!
  user_id = log.split(/\t/)[0]
  # Skip repeats of the key that was just printed.
  next if pre_user_id && pre_user_id == user_id

  puts "#{user_id}\t1"
  pre_user_id = user_id
end
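The reducer only compares each key with the one right before it, so it depends on equal keys arriving next to each other. Here is a small sketch of the same logic wrapped in a method (the name `emit_on_key_change` is made up for illustration) so it can be tried without Hadoop:

```ruby
# Same idea as uu_reducer.rb: keep a key only when it differs from the previous one.
def emit_on_key_change(lines)
  previous = nil
  lines.each_with_object([]) do |line, out|
    user_id = line.chomp.split("\t")[0]
    next if user_id == previous

    out << user_id
    previous = user_id
  end
end

# Mapper-style lines built from two IDs taken from the sample input.
mapped = %w[65762 64935 65762 64935].map { |id| "#{id}\t1" }

p emit_on_key_change(mapped.sort) # sorted input: ["64935", "65762"]
p emit_on_key_change(mapped)      # unsorted input: ["65762", "64935", "65762", "64935"]
```

With unsorted input the duplicates slip through, which is why the sort step between map and reduce matters.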

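Test the scripts locally before running them on Hadoop. Hadoop sorts the mapper output by key before it reaches the reducer, so put sort between the two scripts to reproduce that.

$ cat inputuser_id | ruby uu_mapper.rb | sort | ruby uu_reducer.rb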

The output looks OK.
Let’s run it on Hadoop.

$ hadoop fs -put inputuser_id input
$ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-streaming-1.1.2.jar \
    -D mapred.child.env='PATH=$PATH:/home/hadoop/.rvm/bin' \
    -input input -output output \
    -mapper 'ruby uu_mapper.rb' -reducer 'ruby uu_reducer.rb' \
    -file uu_mapper.rb -file uu_reducer.rb
$ hadoop fs -cat output/part-00000
99836   1
99837   1
99838   1
99839   1
9984    1
99840   1
99841   1
99842   1
99843   1
99844   1
99845   1
99846   1
99847   1
99848   1
99849   1
9985    1
99850   1
99851   1
99852   1
99853   1
・・・
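Note that 9984 appears between 99839 and 99840. By default Hadoop Streaming compares keys as text, not as numbers, so the IDs come out in string order. A short sketch with a few IDs from the output above shows the difference:

```ruby
ids = %w[99839 99840 9984 9985 99850]

p ids.sort            # string order:  ["99839", "9984", "99840", "9985", "99850"]
p ids.sort_by(&:to_i) # numeric order: ["9984", "9985", "99839", "99840", "99850"]
```

For counting unique users the order does not matter, as long as equal keys end up next to each other.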

Good!
