{"id":197,"date":"2012-04-13T09:53:58","date_gmt":"2012-04-13T09:53:58","guid":{"rendered":"http:\/\/technicalelvis.com\/blog\/?p=197"},"modified":"2012-04-13T09:53:58","modified_gmt":"2012-04-13T09:53:58","slug":"twitter-mining-count-hashtags-per-day","status":"publish","type":"post","link":"https:\/\/technicalelvis.com\/blog\/2012\/04\/13\/twitter-mining-count-hashtags-per-day\/","title":{"rendered":"twitter mining: count hashtags per day"},"content":{"rendered":"<p>We can use <a href=\"http:\/\/wiki.apache.org\/couchdb\/Introduction_to_CouchDB_views\">CouchDB views<\/a> to count Twitter hashtags per day. I've used two views. The first view uses a mapper to key each hashtag by a [YEAR, MONTH, DAY] tuple. The view can then be queried for the hashtags used on a given date.<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nimport couchdb\r\nfrom couchdb.design import ViewDefinition\r\n\r\ndef time_hashtag_mapper(doc):\r\n    &quot;&quot;&quot;Hash tag by timestamp&quot;&quot;&quot;\r\n    from datetime import datetime\r\n    if doc.get('created_at'):\r\n        _date = doc&#x5B;'created_at']\r\n    else:\r\n        _date = 0 # Jan 1 1970\r\n\r\n    if doc.get('entities') and doc&#x5B;'entities'].get('hashtags'):\r\n        dt = datetime.fromtimestamp(_date).utctimetuple()\r\n        for hashtag in (doc&#x5B;'entities']&#x5B;'hashtags']):\r\n            yield(&#x5B;dt.tm_year, dt.tm_mon, dt.tm_mday], \r\n                   hashtag&#x5B;'text'].lower())\r\n\r\nview = ViewDefinition('index',\r\n                      'time_hashtags',\r\n                      time_hashtag_mapper,\r\n                      language='python')\r\nview.sync(db)\r\n<\/pre>\n<p>The second view maps each tweet to a tuple containing the [YEAR, MONTH, DAY, HASHTAG]. 
Then a reducer is used to count the tweets matching the tuple.<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nimport couchdb\r\nfrom couchdb.design import ViewDefinition\r\n\r\ndef date_hashtag_mapper(doc):\r\n    &quot;&quot;&quot;tweet by date+hashtag&quot;&quot;&quot;\r\n    from datetime import datetime\r\n    if doc.get('created_at'):\r\n        _date = doc&#x5B;'created_at']\r\n    else:\r\n        _date = 0 # Jan 1 1970\r\n\r\n    dt = datetime.fromtimestamp(_date).utctimetuple()\r\n    if doc.get('entities') and doc&#x5B;'entities'].get('hashtags'):\r\n        for hashtag in (doc&#x5B;'entities']&#x5B;'hashtags']):\r\n            yield (&#x5B;dt.tm_year, dt.tm_mon, dt.tm_mday, \r\n                    hashtag&#x5B;'text'].lower()], \r\n                   doc&#x5B;'_id'])\r\n\r\ndef sumreducer(keys, values, rereduce):\r\n    &quot;&quot;&quot;count then sum&quot;&quot;&quot;\r\n    if rereduce:\r\n        return sum(values)\r\n    else:\r\n        return len(values)\r\n\r\nview = ViewDefinition('index',\r\n                      'daily_tagcount',\r\n                      date_hashtag_mapper,\r\n                      reduce_fun=sumreducer,\r\n                      language='python')\r\nview.sync(db)\r\n<\/pre>\n<p>Finally, query the first view to find tags for the day and then query the second view for tweet counts per tag for the day.<\/p>\n<pre class=\"brush: python; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\nimport sys\r\nimport couchdb\r\nimport time\r\nfrom datetime import date, datetime\r\n\r\nserver = couchdb.Server('http:\/\/localhost:5984')\r\ndbname = sys.argv&#x5B;1]\r\ndb = server&#x5B;dbname]\r\n\r\n_date  = sys.argv&#x5B;2]\r\ndt = datetime.strptime(_date,&quot;%Y-%m-%d&quot;).utctimetuple()\r\n\r\n# get tags for this time interval\r\n_key = &#x5B;dt.tm_year, dt.tm_mon, dt.tm_mday]\r\ntags = &#x5B;row.value for row in db.view('index\/time_hashtags', 
key=_key)]\r\ntags = list(set(tags))\r\nprint &quot;Tags today&quot;,len(tags)\r\nprint &quot;&quot;\r\n\r\n# get count for date and hashtag\r\nfor tag in sorted(tags):\r\n    _key = &#x5B;dt.tm_year, dt.tm_mon, dt.tm_mday, tag]\r\n    tag_count = \\\r\n      &#x5B; (row.value) for row in db.view('index\/daily_tagcount', key=_key) ]\r\n    print &quot;Found %d %s on %s-%s-%s &quot;%\\\r\n      (tag_count&#x5B;0],tag,_key&#x5B;0],_key&#x5B;1],_key&#x5B;2])\r\n<\/pre>\n<p>This code will evolve over time.<br \/>\nFind the complete codebase on GitHub at: <a title=\"twitter_mining github\" href=\"https:\/\/github.com\/telvis07\/twitter_mining\">https:\/\/github.com\/telvis07\/twitter_mining<\/a>. The develop branch has the latest stuff.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We can use CouchDB views to count Twitter hashtags per day. I&#8217;ve used two views. The first view uses a mapper to key each hashtag by a [YEAR, MONTH, DAY] tuple. The view can then be queried for the hashtags used on a given date. 
import couchdb from couchdb.design import ViewDefinition def time_hashtag_mapper(doc): &quot;&quot;&quot;Hash tag by timestamp&quot;&quot;&quot; from datetime &hellip; <a href=\"https:\/\/technicalelvis.com\/blog\/2012\/04\/13\/twitter-mining-count-hashtags-per-day\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">twitter mining: count hashtags per day<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[8,7,11,12],"tags":[],"class_list":["post-197","post","type-post","status-publish","format-standard","hentry","category-nosql","category-python","category-tech","category-twitter_mining"],"_links":{"self":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts\/197","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/comments?post=197"}],"version-history":[{"count":26,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts\/197\/revisions"}],"predecessor-version":[{"id":228,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts\/197\/revisions\/228"}],"wp:attachment":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/media?parent=197"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/categories?post=197"},{"taxonomy":"post_tag","embeddable":true,"href":"
https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/tags?post=197"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}