The Fluentd Twitter plugin didn't work for me, so I wrote a Ruby program that reads tweets from the Twitter Streaming API and inserts them into MongoDB.
Japanese tweets gave me encoding errors in Ruby, so I collected English tweets instead.
# -*- coding: utf-8 -*-
require 'tweetstream'
require 'mongo'

# Connect to MongoDB
connection = Mongo::Connection.new('localhost')
db = connection.db('tweets')
collection = db.collection('timeline_log')

# Twitter Streaming API settings
TweetStream.configure do |config|
  config.consumer_key       = '***'
  config.consumer_secret    = '***'
  config.oauth_token        = '***'
  config.oauth_token_secret = '***'
  config.auth_method        = :oauth
end

# Insert English tweets into MongoDB
client = TweetStream::Client.new
client.sample do |status|
  # Only users whose language is English and who have at least 1000 followers
  if status.user.lang == 'en' && status.user.followers_count >= 1000
    name = status.user.name
    text = status.text
    # Allow only letters, digits, spaces, commas, and periods
    r = /^[a-zA-Z0-9 ,.]*$/
    if r =~ text && r =~ name
      doc = {'name' => name, 'tweet' => text, 'time' => Time.now}
      id = collection.insert(doc)
    end
  end
end
Symbols also caused encoding errors, so I only allowed letters, digits, spaces, commas, and periods.
The tweets are inserted into MongoDB as shown below.
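One detail of this filter is worth checking. In Ruby, `^` and `$` anchor at the start and end of each line, not of the whole string, so a multi-line tweet passes as soon as any one of its lines (even an empty one) consists only of allowed characters, and an empty name passes too. The sketch below compares the script's pattern with a stricter variant; the constant names and the `\A...\z` variant are my own illustration and are not part of the original script.

```ruby
# Pattern used in the script: ^ and $ match at line boundaries.
LINE_ANCHORED = /^[a-zA-Z0-9 ,.]*$/

# Hypothetical stricter variant (an assumption, not the original code):
# \A and \z anchor the whole string, and + rejects empty strings.
STRING_ANCHORED = /\A[a-zA-Z0-9 ,.]+\z/

samples = [
  "turn down for what",                      # only allowed characters
  "I don't spit game.\n\nI'm just myself.",  # apostrophes, plus a blank line
  ""                                         # empty string, e.g. an empty name
]

samples.each do |text|
  loose  = LINE_ANCHORED.match?(text)
  strict = STRING_ANCHORED.match?(text)
  puts "#{text.inspect}: line-anchored=#{loose}, string-anchored=#{strict}"
end
```

With these samples, the multi-line text and the empty string are accepted by the line-anchored pattern and rejected by the string-anchored one. If symbols and empty names should be excluded entirely, the stricter anchors are one possible option.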
{ "_id" : ObjectId("528d9aa7b7178f4c26088e74"), "name" : "ayy lmao", "tweet" : "IM SO TIRED OF THIS BULLSHIT ASS HISTORY CLASS UGHHHHHH", "time" : ISODate("2013-11-21T05:31:19.765Z") }
{ "_id" : ObjectId("528d9ab6b7178f4c26088e75"), "name" : "4pound", "tweet" : "I have this IDGAFness about me", "time" : ISODate("2013-11-21T05:31:34.740Z") }
{ "_id" : ObjectId("528d9ab6b7178f4c26088e76"), "name" : "khaki ramirez", "tweet" : "so please", "time" : ISODate("2013-11-21T05:31:34.741Z") }
{ "_id" : ObjectId("528d9ab6b7178f4c26088e77"), "name" : "", "tweet" : "RT @Dashokayyo: RT\"@MrSmoothNerd: I don't spit game at the ladies.\n\nI'm just myself with the ladies.\n\nAnd that's all the game\na real nigga …", "time" : ISODate("2013-11-21T05:31:34.757Z") }
{ "_id" : ObjectId("528d9ab6b7178f4c26088e78"), "name" : "ReinDeiR", "tweet" : "oH OHH\nOH \nOoOOOHH \nTHE SHOW WAS CALLED \"BETWEEN THE LIONS\"\nBC YOU /READ BETWEEEN THE LINES/\nOH", "time" : ISODate("2013-11-21T05:31:34.793Z") }
{ "_id" : ObjectId("528d9ac7b7178f4c26088e79"), "name" : "Cutter Smith", "tweet" : "turn down for what", "time" : ISODate("2013-11-21T05:31:51.722Z") }
{ "_id" : ObjectId("528d9ac8b7178f4c26088e7a"), "name" : "Lil Haiti ", "tweet" : "Dream big", "time" : ISODate("2013-11-21T05:31:52.728Z") }
{ "_id" : ObjectId("528d9acdb7178f4c26088e7b"), "name" : "Do It For Me", "tweet" : "i laugh a lot but i didn mean to hurt ya feelins.", "time" : ISODate("2013-11-21T05:31:57.751Z") }