{"id":589,"date":"2013-10-02T02:16:17","date_gmt":"2013-10-02T02:16:17","guid":{"rendered":"http:\/\/technicalelvis.com\/blog\/?p=589"},"modified":"2013-10-02T02:18:51","modified_gmt":"2013-10-02T02:18:51","slug":"json-output-mahout-clusterdump","status":"publish","type":"post","link":"https:\/\/technicalelvis.com\/blog\/2013\/10\/02\/json-output-mahout-clusterdump\/","title":{"rendered":"Added JSON output to mahout clusterdump"},"content":{"rendered":"<p><a href=\"http:\/\/technicalelvis.com\/blog\/2013\/03\/28\/mahout-twitter-1\/\">In a prior post<\/a>, I used <a href=\"http:\/\/mahout.apache.org\/\">Mahout<\/a> to cluster religious tweeters by the Bible books mentioned in their tweets. The <em>clusterdump<\/em> utility prints the k-means cluster output in a free-text format. <a href=\"https:\/\/issues.apache.org\/jira\/browse\/MAHOUT-1343\">I've submitted a patch<\/a> to Mahout that adds a JSON output format to <em>clusterdump<\/em>. JSON is machine-readable, which makes it easy for an application built with another framework (such as Django) to read the clusters.<\/p>\n<p>The code lives in my <a href=\"https:\/\/github.com\/telvis07\/mahout\/tree\/mahout_clusterdump_json\">Mahout fork on GitHub<\/a>. 
Run the commands below to build it and export the clusters as JSON.<\/p>\n<pre class=\"brush: bash; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\n# clone the fork, switch to the patch branch, and build without running tests\r\ngit clone git@github.com:telvis07\/mahout.git\r\ncd mahout\r\ngit checkout mahout_clusterdump_json\r\nmvn compile package -DskipTests\r\n\r\n# (optional) run the unit test for this feature\r\nmvn -pl integration \\\r\n  -Dtest='*.TestClusterDumper#testJsonClusterDumper' test\r\n\r\n# dump the final k-means clusters as JSON\r\n.\/bin\/mahout clusterdump -d dictionary -dt text \\\r\n  -i clusters\/clusters-*-final -p clusters\/clusteredPoints \\\r\n  -n 10 -o clusterdump.json -of JSON\r\n<\/pre>\n<p>The <em>clusterdump<\/em> command produces output similar to this:<\/p>\n<pre class=\"brush: jscript; collapse: false; title: ; wrap-lines: false; notranslate\" title=\"\">\r\n{\r\n  &quot;top_terms&quot;: &#x5B;\r\n    {\r\n      &quot;term&quot;: &quot;proverbs&quot;,\r\n      &quot;weight&quot;: 0.19125590817015531\r\n    },\r\n    {\r\n      &quot;term&quot;: &quot;romans&quot;,\r\n      &quot;weight&quot;: 0.16306549628629305\r\n    }\r\n  ],\r\n  &quot;points&quot;: &#x5B;\r\n    {\r\n      &quot;vector_name&quot;: &quot;ssbo&quot;,\r\n      &quot;weight&quot;: &quot;1.0&quot;,\r\n      &quot;point&quot;: &quot;ssbo = &#x5B;proverbs:1.000]&quot;\r\n    },\r\n    {\r\n      &quot;vector_name&quot;: &quot;37_DC&quot;,\r\n      &quot;weight&quot;: &quot;1.0&quot;,\r\n      &quot;point&quot;: &quot;37_DC = &#x5B;proverbs:1.000]&quot;\r\n    },\r\n    {\r\n      &quot;vector_name&quot;: &quot;3HHHs&quot;,\r\n      &quot;weight&quot;: &quot;1.0&quot;,\r\n      &quot;point&quot;: &quot;3HHHs = &#x5B;proverbs:1.000]&quot;\r\n    },\r\n    {\r\n      &quot;vector_name&quot;: &quot;EPUBC&quot;,\r\n      &quot;weight&quot;: &quot;1.0&quot;,\r\n      &quot;point&quot;: &quot;EPUBC = &#x5B;proverbs:1.000]&quot;\r\n    },\r\n    {\r\n      &quot;vector_name&quot;: &quot;ILJ_4&quot;,\r\n      &quot;weight&quot;: &quot;1.0&quot;,\r\n      &quot;point&quot;: &quot;ILJ_4 = &#x5B;romans:1.000]&quot;\r\n    }\r\n  ],\r\n  &quot;cluster_id&quot;: 
10515,\r\n  &quot;cluster&quot;: &quot;VL-10515{n=5924 c=&#x5B;genesis:0.000, exodus:0.009, ...]}&quot;\r\n}\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>In a prior post, I used Mahout to cluster religious tweeters by the Bible books mentioned in their tweets. The clusterdump utility prints the k-means cluster output in a free-text format. I&#8217;ve submitted a patch to Mahout that adds a JSON output format to clusterdump. JSON is machine-readable, which makes it easy for an application built with another &hellip; <a href=\"https:\/\/technicalelvis.com\/blog\/2013\/10\/02\/json-output-mahout-clusterdump\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Added JSON output to mahout clusterdump<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[12],"tags":[],"class_list":["post-589","post","type-post","status-publish","format-standard","hentry","category-twitter_mining"],"_links":{"self":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts\/589","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/comments?post=589"}],"version-history":[{"count":33,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts\/589\/revisions"}],"predecessor-version":[{"id":624,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/posts\/589
\/revisions\/624"}],"wp:attachment":[{"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/media?parent=589"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/categories?post=589"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/technicalelvis.com\/blog\/wp-json\/wp\/v2\/tags?post=589"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}