{"id":126,"date":"2012-03-23T09:34:03","date_gmt":"2012-03-23T09:34:03","guid":{"rendered":"http:\/\/technicalelvis.com\/blog\/?p=126"},"modified":"2012-04-26T02:49:04","modified_gmt":"2012-04-26T02:49:04","slug":"twitter-mining-top-tweets-by-follower-count","status":"publish","type":"post","link":"https:\/\/technicalelvis.com\/blog\/2012\/03\/23\/twitter-mining-top-tweets-by-follower-count\/","title":{"rendered":"twitter mining: top tweets by follower count"},"content":{"rendered":"<p>We can find interesting tweets using the author's follower count and the tweet timestamp. We store tweets in CouchDB and search for tweets using <a href=\"https:\/\/github.com\/tweepy\/tweepy\">tweepy streaming<\/a>. With these tools we can find the top N tweets per day. The code below uses the couchpy view server to write a view in Python. The steps to set up couchpy are found <a href=\"http:\/\/packages.python.org\/CouchDB\/views.html\">here<\/a>. In short, you install couchpy and add the following to \/etc\/couchdb\/local.ini.<\/p>\n<p>Install couchpy and couchdb-python with the following command.<\/p>\n<pre>\r\npip install couchdb\r\n<\/pre>\n<p>Check that couchpy is installed.<\/p>\n<pre>\r\n$ which couchpy\r\n\/usr\/bin\/couchpy\r\n<\/pre>\n<p>Edit \/etc\/couchdb\/local.ini.<\/p>\n<pre>\r\n[query_servers]\r\npython=\/usr\/bin\/couchpy\r\n<\/pre>\n<p>This is a simple view mapper that maps each tweet to its timestamp so we can query by start and end time.<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\n\r\nimport couchdb\r\nfrom couchdb.design import ViewDefinition\r\nimport sys\r\n\r\nserver = couchdb.Server('http:\/\/localhost:5984')\r\ndb = sys.argv&#x5B;1]\r\ndb = server&#x5B;db]\r\n\r\ndef tweets_by_created_at(doc):\r\n    if doc.get('created_at'):\r\n        _date = doc&#x5B;'created_at']\r\n    else:\r\n        _date = 0 # Jan 1 1970\r\n    \r\n    if doc.get('user'):\r\n        yield (_date, doc) \r\n        \r\nview = 
ViewDefinition('index', 'daily_tweets', tweets_by_created_at, language='python')\r\nview.sync(db)\r\n<\/pre>\n<p>The code below queries the view for all tweets posted on a given day, then sorts the tweeters in memory by follower count.<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nimport couchdb\r\nimport time\r\nfrom datetime import datetime\r\n\r\ndef run(db, date, limit=10):\r\n    &quot;&quot;&quot;Query a couchdb view for tweets. Sort in memory by follower count.\r\n    Return the top `limit` tweeters and their tweets&quot;&quot;&quot;\r\n    print(&quot;Finding top %d tweeters&quot; % limit)\r\n        \r\n    # the day runs from local midnight to one second before the next midnight\r\n    dt = datetime.strptime(date,&quot;%Y-%m-%d&quot;)\r\n    stime=int(time.mktime(dt.timetuple()))\r\n    etime=stime+86400-1\r\n    tweeters = {}\r\n    tweets = {}\r\n    for row in db.view('index\/daily_tweets', startkey=stime, endkey=etime):\r\n        status = row.value\r\n        screen_name = status&#x5B;'user']&#x5B;'screen_name']\r\n        followers_count = status&#x5B;'user']&#x5B;'followers_count']\r\n        tweeters&#x5B;screen_name] = int(followers_count)\r\n        if screen_name not in tweets:\r\n            tweets&#x5B;screen_name] = &#x5B;]\r\n        tweets&#x5B;screen_name].append(status&#x5B;'id_str'])\r\n        \r\n    # sort tweeters by follower count, highest first\r\n    di = sorted(tweeters.items(), key=lambda x: x&#x5B;1], reverse=True)\r\n    out = {}\r\n    # slicing avoids an IndexError when fewer than `limit` tweeters were found\r\n    for screen_name, followers_count in di&#x5B;:limit]:\r\n        out&#x5B;screen_name] = {}\r\n        out&#x5B;screen_name]&#x5B;'follower_count'] = followers_count\r\n        out&#x5B;screen_name]&#x5B;'tweets'] = {}\r\n        # print screen_name,followers_count\r\n        for tweetid in tweets&#x5B;screen_name]:\r\n            orig_text = db&#x5B;tweetid]&#x5B;'orig_text']\r\n            # print tweetid,orig_text\r\n            out&#x5B;screen_name]&#x5B;'tweets']&#x5B;tweetid] = orig_text\r\n\r\n    
return out\r\n\r\nserver = couchdb.Server('http:\/\/localhost:5984')\r\ndb = server&#x5B;dbname]\r\ndate = '2012-03-05'\r\noutput = run(db, date)\r\n<\/pre>\n<p>Find the complete codebase on github at: <a title=\"twitter_mining github\" href=\"https:\/\/github.com\/telvis07\/twitter_mining\">https:\/\/github.com\/telvis07\/twitter_mining<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>We can find interesting tweets using the author&#8217;s follower count and tweet timestamp. We store tweets using CouchDB and search for tweets using tweepy streaming. With these tools we can find the top N tweets per day. The code below uses the couchpy view server to write a view in python. The steps to setup &hellip; <a href=\"https:\/\/technicalelvis.com\/blog\/2012\/03\/23\/twitter-mining-top-tweets-by-follower-count\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">twitter mining: top tweets by follower count<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[7,11,12],"tags":[],"class_list":["post-126","post","type-post","status-publish","format-standard","hentry","category-python","category-tech","category-twitter_mining"],"_links":{"self":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts\/126","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/
comments?post=126"}],"version-history":[{"count":25,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts\/126\/revisions"}],"predecessor-version":[{"id":236,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts\/126\/revisions\/236"}],"wp:attachment":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/media?parent=126"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/categories?post=126"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/tags?post=126"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}