twitter mining: count hashtags per day

We can use CouchDB views to count Twitter hashtags per day. I've used two views. The first view uses a mapper that maps each hashtag to a [YEAR, MONTH, DAY] key. The view can subsequently be queried for the hashtags seen on a given date.

import couchdb
from couchdb.design import ViewDefinition

def time_hashtag_mapper(doc):
    """Hash tag by timestamp"""
    from datetime import datetime
    if doc.get('created_at'):
        _date = doc['created_at']
    else:
        _date = 0 # Jan 1 1970

    if doc.get('entities') and doc['entities'].get('hashtags'):
        # created_at stores an epoch timestamp; take the UTC date so it
        # matches the keys we build at query time
        dt = datetime.utcfromtimestamp(_date).timetuple()
        for hashtag in doc['entities']['hashtags']:
            yield ([dt.tm_year, dt.tm_mon, dt.tm_mday],
                   hashtag['text'].lower())

# "db" is an open couchdb.Database handle (created as in the query script below)
view = ViewDefinition('index',
                      'time_hashtags',
                      time_hashtag_mapper,
                      language='python')
view.sync(db)
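
Since the view function is a plain Python generator, we can sanity-check it outside CouchDB. The sample document below is a minimal sketch matching the fields the mapper reads (created_at holds an epoch timestamp, entities.hashtags holds the parsed hashtag entities):

# minimal fake tweet document to exercise time_hashtag_mapper directly
doc = {
    'created_at': 1330948800, # 2012-03-05 12:00:00 UTC
    'entities': {'hashtags': [{'text': 'Python'}, {'text': 'CouchDB'}]},
}

for key, value in time_hashtag_mapper(doc):
    print key, value
# each row pairs a [year, month, day] key with a lowercased hashtag, e.g.
# [2012, 3, 5] python
# [2012, 3, 5] couchdb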

The second view maps each tweet to a [YEAR, MONTH, DAY, HASHTAG] key. A reducer is then used to count the tweets matching each key.

import couchdb
from couchdb.design import ViewDefinition

def date_hashtag_mapper(doc):
    """tweet by date+hashtag"""
    from datetime import datetime
    if doc.get('created_at'):
        _date = doc['created_at']
    else:
        _date = 0 # Jan 1 1970

    if doc.get('entities') and doc['entities'].get('hashtags'):
        # UTC date, consistent with time_hashtag_mapper above
        dt = datetime.utcfromtimestamp(_date).timetuple()
        for hashtag in doc['entities']['hashtags']:
            yield ([dt.tm_year, dt.tm_mon, dt.tm_mday,
                    hashtag['text'].lower()],
                   doc['_id'])

def sumreducer(keys, values, rereduce):
    """count then sum"""
    if rereduce:
        return sum(values)
    else:
        return len(values)

view = ViewDefinition('index',
                      'daily_tagcount',
                      date_hashtag_mapper,
                      reduce_fun=sumreducer,
                      language='python')
view.sync(db)
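
CouchDB may invoke the reduce function in two phases: first over the mapped values for a key (here, tweet ids), then again with rereduce=True over partial counts from earlier reduce calls. A quick sketch of both phases, using hypothetical tweet ids:

print sumreducer([], ['id1', 'id2', 'id3'], False) # first pass: counts ids -> 3
print sumreducer([], [3, 2], True)                 # rereduce: sums partial counts -> 5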

Finally, we query the first view to find the tags used on a given day, then query the second view for the tweet count per tag on that day.

import sys
import couchdb
from datetime import datetime

server = couchdb.Server('http://localhost:5984')
dbname = sys.argv[1]
db = server[dbname]

_date = sys.argv[2]
dt = datetime.strptime(_date, "%Y-%m-%d").utctimetuple()

# get the tags seen on this date
_key = [dt.tm_year, dt.tm_mon, dt.tm_mday]
tags = [row.value for row in db.view('index/time_hashtags', key=_key)]
tags = list(set(tags))
print "Tags today:", len(tags)
print

# get the count for each (date, hashtag) key
for tag in sorted(tags):
    _key = [dt.tm_year, dt.tm_mon, dt.tm_mday, tag]
    tag_count = \
      [row.value for row in db.view('index/daily_tagcount', key=_key)]
    print "Found %d %s on %s-%s-%s"%\
      (tag_count[0], tag, _key[0], _key[1], _key[2])
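
The script takes the database name and a date. Assuming it is saved as daily_tags.py (a name I've made up, as is the database name), usage looks like:

python daily_tags.py tweets_db 2012-03-05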

This code will evolve over time. Find the complete codebase on GitHub at https://github.com/telvis07/twitter_mining. The develop branch has the latest code.

twitter mining by geolocation

Twitter's streaming API permits filtering tweets by geolocation. According to the API documentation, only tweets created with the Geotagging API can be filtered. The code below uses tweepy to filter tweets for the San Francisco area.

#!/usr/bin/env python
import tweepy
import ConfigParser
import os, sys

class Listener(tweepy.StreamListener):
    def on_status(self, status):
        print "screen_name='%s' tweet='%s'"%(status.author.screen_name, status.text)

def login(config):
    """Tweepy oauth dance
    The config file should contain:

    [auth]
    CONSUMER_KEY = ...
    CONSUMER_SECRET = ...
    ACCESS_TOKEN = ...
    ACCESS_TOKEN_SECRET = ...
    """     
    CONSUMER_KEY = config.get('auth','CONSUMER_KEY')
    CONSUMER_SECRET = config.get('auth','CONSUMER_SECRET')
    ACCESS_TOKEN = config.get('auth','ACCESS_TOKEN')
    ACCESS_TOKEN_SECRET = config.get('auth','ACCESS_TOKEN_SECRET')
    
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    return auth


fn = sys.argv[1] # path to the oauth config file
config = ConfigParser.RawConfigParser()
config.read(fn)
try:
    auth = login(config)
    streaming_api = tweepy.streaming.Stream(auth, Listener(), timeout=60)
    # San Francisco area: bounding box given as [sw_lon, sw_lat, ne_lon, ne_lat]
    streaming_api.filter(follow=None, locations=[-122.75,36.8,-121.75,37.8])
except KeyboardInterrupt:
    print "got keyboardinterrupt"

Find the complete codebase on GitHub at https://github.com/telvis07/twitter_mining

Get up!

If we are down and out for a long time, we must stop making excuses and "get up and walk".

"One who was there had been an invalid for thirty-eight years. When Jesus saw him lying there ... he asked him, "Do you want to get well?" Then Jesus said to him, "Get up! Pick up your mat and walk." (John 5:5, 6, 8 NIV)

twitter mining: top tweets with links

It's useful to filter out "conversational" tweets and keep only tweets that link to another page, a picture, etc.

We create a view that maps only tweets with link entities.

import couchdb
from couchdb.design import ViewDefinition

def url_tweets_by_created_at(doc):
    """Map tweets that have url entities, keyed by created_at"""
    if doc.get('created_at'):
        _date = doc['created_at']
    else:
        _date = 0 # Jan 1 1970

    if (doc.get('entities') and doc['entities'].get('urls')
        and len(doc['entities']['urls'])):
        if doc.get('user'):
            yield (_date, doc)

view = ViewDefinition('index', 'daily_url_tweets', 
                      url_tweets_by_created_at, language='python')
view.sync(db)

Next we create an app that reads from this view and displays the results.

import sys
import couchdb
import calendar
from datetime import datetime

def run(db, date, limit=10):
    """Query a couchdb view for tweets. Sort in memory by follower count.
    Return the top `limit` tweeters and their tweets"""
    print "Finding top %d tweeters"%limit

    dt = datetime.strptime(date, "%Y-%m-%d")
    # UTC day boundaries, matching the epoch timestamps stored in created_at
    stime = int(calendar.timegm(dt.timetuple()))
    etime = stime + 86400 - 1
    tweeters = {}
    tweets = {}
    # get screen_name, follower_counts and tweet ids for looking up later
    for row in db.view('index/daily_url_tweets', startkey=stime, endkey=etime):
        status = row.value
        screen_name = status['user']['screen_name']
        followers_count = status['user']['followers_count']
        tweeters[screen_name] = int(followers_count)
        if screen_name not in tweets:
            tweets[screen_name] = []
        tweets[screen_name].append(status['id_str'])

    # sort tweeters by follower count, descending
    print "found %d tweeters"%len(tweeters)
    di = sorted(tweeters.items(), key=lambda x: x[1], reverse=True)
    out = {}
    for i in range(min(limit, len(di))):
        screen_name = di[i][0]
        followers_count = di[i][1]
        out[screen_name] = {}
        out[screen_name]['follower_count'] = followers_count
        out[screen_name]['tweets'] = {}
        for tweetid in tweets[screen_name]:
            status = db[tweetid]
            text = status['orig_text']
            # swap each shortened url for its expanded url
            for url in status['entities']['urls']:
                text = text.replace(url['url'], url['expanded_url'])
            out[screen_name]['tweets'][tweetid] = text

    return out

server = couchdb.Server('http://localhost:5984')
dbname = sys.argv[1]
db = server[dbname]
date = '2012-03-05'
output = run(db, date)
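
run() returns the results rather than printing them. A minimal sketch of a display loop over the returned dict (field names follow the structure built above):

# print tweeters by follower count, with their expanded-url tweets
for screen_name, data in sorted(output.items(),
                                key=lambda x: x[1]['follower_count'],
                                reverse=True):
    print "%s (%d followers)"%(screen_name, data['follower_count'])
    for tweetid, text in data['tweets'].items():
        print "  %s: %s"%(tweetid, text)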

Find the complete codebase on GitHub at https://github.com/telvis07/twitter_mining