{"id":153,"date":"2012-03-30T09:43:47","date_gmt":"2012-03-30T09:43:47","guid":{"rendered":"http:\/\/technicalelvis.com\/blog\/?p=153"},"modified":"2012-03-30T09:43:47","modified_gmt":"2012-03-30T09:43:47","slug":"twitter-mining-top-tweets-with-links","status":"publish","type":"post","link":"https:\/\/technicalelvis.com\/blog\/2012\/03\/30\/twitter-mining-top-tweets-with-links\/","title":{"rendered":"twitter mining: top tweets with links"},"content":{"rendered":"<p>It's useful to filter out \"conversational\" tweets and look only at tweets that link to another page, a picture, etc.<\/p>\n<p>We create a CouchDB view that maps only tweets with link entities.<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nimport couchdb\r\nfrom couchdb.design import ViewDefinition\r\nimport sys\r\n\r\ndef url_tweets_by_created_at(doc):\r\n    # key each tweet by its creation time; fall back to the epoch if missing\r\n    if doc.get('created_at'):\r\n        _date = doc&#x5B;'created_at']\r\n    else:\r\n        _date = 0 # Jan 1 1970\r\n\r\n    # emit only tweets that carry at least one url entity and a user\r\n    if (doc.get('entities') and doc&#x5B;'entities'].get('urls')\r\n            and len(doc&#x5B;'entities']&#x5B;'urls'])):\r\n        if doc.get('user'):\r\n            yield (_date, doc)\r\n\r\nview = ViewDefinition('index', 'daily_url_tweets', \r\n                      url_tweets_by_created_at, language='python')\r\nview.sync(db)\r\n<\/pre>\n<p>Next, we create an app that reads from this view and returns the results.<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nimport couchdb\r\nimport time\r\nfrom datetime import datetime\r\n\r\ndef run(db, date, limit=10):\r\n    &quot;&quot;&quot;Query a couchdb view for tweets. 
Sort in memory by follower count.\r\n    Return the top `limit` tweeters and their tweets&quot;&quot;&quot;\r\n    print &quot;Finding top %d tweeters&quot;%limit\r\n\r\n    dt = datetime.strptime(date,&quot;%Y-%m-%d&quot;)\r\n    stime=int(time.mktime(dt.timetuple()))\r\n    etime=stime+86400-1\r\n    tweeters = {}\r\n    tweets = {}\r\n    # get screen_name, follower_counts and tweet ids for looking up later\r\n    for row in db.view('index\/daily_url_tweets', startkey=stime, endkey=etime):\r\n        status = row.value\r\n        screen_name = status&#x5B;'user']&#x5B;'screen_name']\r\n        followers_count = status&#x5B;'user']&#x5B;'followers_count']\r\n        tweeters&#x5B;screen_name] = int(followers_count)\r\n        if not tweets.has_key(screen_name):\r\n            tweets&#x5B;screen_name] = &#x5B;]\r\n        tweets&#x5B;screen_name].append(status&#x5B;'id_str'])\r\n\r\n    # sort tweeters by follower count, highest first\r\n    print len(tweeters.keys())\r\n    di = tweeters.items()\r\n    di.sort(key=lambda x: x&#x5B;1], reverse=True)\r\n    out = {}\r\n    # there may be fewer than `limit` tweeters on a given day\r\n    for i in range(min(limit, len(di))):\r\n        screen_name = di&#x5B;i]&#x5B;0]\r\n        followers_count = di&#x5B;i]&#x5B;1]\r\n        out&#x5B;screen_name] = {}\r\n        out&#x5B;screen_name]&#x5B;'follower_count'] = followers_count\r\n        out&#x5B;screen_name]&#x5B;'tweets'] = {}\r\n        # print i,screen_name,followers_count\r\n        for tweetid in tweets&#x5B;screen_name]:\r\n            status = db&#x5B;tweetid]\r\n            text = status&#x5B;'orig_text']\r\n            # print tweetid,orig_text\r\n            urls = status&#x5B;'entities']&#x5B;'urls']\r\n            #name = status&#x5B;'user']&#x5B;'name']\r\n            for url in urls:\r\n                # swap each shortened url for its expanded form, when one is present\r\n                if url.get('expanded_url'):\r\n                    text = text.replace(url&#x5B;'url'],url&#x5B;'expanded_url'])\r\n            out&#x5B;screen_name]&#x5B;'tweets']&#x5B;tweetid] = text\r\n\r\n    return out\r\n\r\nserver = couchdb.Server('http:\/\/localhost:5984')\r\ndb = server&#x5B;dbname]\r\ndate = '2012-03-05'\r\noutput = 
run(db, date)\r\n<\/pre>\n<p>Find the complete codebase on github at: <a title=\"twitter_mining github\" href=\"https:\/\/github.com\/telvis07\/twitter_mining\">https:\/\/github.com\/telvis07\/twitter_mining<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s useful to filter out &#8220;conversational&#8221; tweets and look for tweets with links to another page or picture, etc. We create a view that only map tweets with link entities. import couchdb from couchdb.design import ViewDefinition import sys def url_tweets_by_created_at(doc): if doc.get(&#8216;created_at&#8217;): _date = doc&#x5B;&#8217;created_at&#8217;] else: _date = 0 # Jan 1 1970 if doc.get(&#8216;entities&#8217;) &hellip; <a href=\"https:\/\/technicalelvis.com\/blog\/2012\/03\/30\/twitter-mining-top-tweets-with-links\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">twitter mining: top tweets with links<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[7,11,12],"tags":[],"class_list":["post-153","post","type-post","status-publish","format-standard","hentry","category-python","category-tech","category-twitter_mining"],"_links":{"self":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts\/153","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/comments?post=153"}],
"version-history":[{"count":10,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts\/153\/revisions"}],"predecessor-version":[{"id":163,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts\/153\/revisions\/163"}],"wp:attachment":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/media?parent=153"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/categories?post=153"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/tags?post=153"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}