Dustin Davis
Developer, Entrepreneur

Lately I was getting this error frequently while using Django’s built-in cache_page decorator to cache some views.

memcache in check_key
MemcachedKeyLengthError: Key length is > 250

Basically the problem is that Memcached only allows keys up to 250 characters, and some of my view names were long enough that the generated cache keys exceeded that limit.

I found a quick fix: hash the key with MD5 whenever it would be over 250 characters. You can do this by modifying the function that creates the key.

In my settings file I added the following:

import hashlib

...

def hash_key(key, key_prefix, version):
    new_key = ':'.join([key_prefix, str(version), key])
    # memcached rejects keys longer than 250 chars, so fall back to a
    # 32-character MD5 hexdigest when the composed key is too long
    if len(new_key) > 250:
        m = hashlib.md5()
        m.update(new_key)
        new_key = m.hexdigest()
    return new_key

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
        'KEY_FUNCTION': hash_key,
    }
}

The reason I only hash when the key would exceed 250 characters is: 1) hashing is CPU intensive and I only want to do it when I have to; 2) I prefer to have my memcached keys human readable when possible; and 3) hashing fewer keys means fewer chances of collision problems with duplicate hashes.
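To illustrate the behavior, here is a quick sketch using the function above (the 300-character key is a made-up stand-in):

short_key = hash_key('short-key', 'prefix', 1)
# 'prefix:1:short-key' – untouched and human readable

long_key = hash_key('x' * 300, 'prefix', 1)
# 32-character MD5 hexdigest, safely under memcached's limit
assert len(long_key) == 32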

I thank Russell Keith-Magee for these tips.

Even after coding in Python for the past five years, I’ve never really considered myself an expert in the language because I find the more I know, the more I know I don’t know. I generally keep my code simple on purpose until I have a good reason to be complex – and for most Django sites, I haven’t had one.

Today I had a good reason. I’m currently building a number of key performance indicator (KPI) stats for Neutron. There are currently 46 different stats that I need to calculate for 5 different time periods.

For each stat I need:

  • Stats from the start of the current day to the current time, compared to yesterday over the same span
  • The delta between this week and last week
  • The delta between this month and last month
  • The delta between this quarter and last quarter
  • The delta between this year and last year

I will be building a view for each stat and associated time period to return these values in JSON format. So as it stands, there will be 230 views (46 stats × 5 periods). I needed to come up with something clever to save myself some lines of code. I opted for class-based views.

First I built a base class that will return the JSON data in a consistent format (the imports used by the snippets in this views.py are shown at the top):

import datetime
import json
import sys

from django import http
from django.utils import timezone as djtz
from django.views.generic.detail import BaseDetailView


class StatWithDelta(BaseDetailView):
    start = None
    end = None
    delta_start = None
    delta_end = None
    title = None
    subtitle = None

    def __init__(self):
        super(StatWithDelta, self).__init__()
        self.end = djtz.localtime(djtz.now())

    def value(self):
        raise NotImplementedError

    def delta(self):
        raise NotImplementedError

    def get(self, request, *args, **kwargs):
        value = self.value()
        delta_value = self.delta()
        try:
            delta_percent = round((((delta_value - value) / value) * 100), 2)
        except ZeroDivisionError:
            delta_percent = 0
        payload = {
            'value': value,
            'delta': delta_percent,
            'title': self.title,
            'subtitle': self.subtitle,
        }
        return self.render_to_response(payload)

    def render_to_response(self, context):
        return self.get_json_response(self.convert_context_to_json(context))

    def get_json_response(self, content, **httpresponse_kwargs):
        return http.HttpResponse(content,
                                 content_type='application/json',
                                 **httpresponse_kwargs)

    def convert_context_to_json(self, context):
        return json.dumps(context)

Next I built classes for each required time range. Here is my class for today compared to yesterday:

class TodayYesterday(StatWithDelta):
    subtitle = 'Today vs. Yesterday'

    def __init__(self):
        super(TodayYesterday, self).__init__()
        self.start = self.end.replace(hour=0, minute=0, second=0, microsecond=0)
        self.delta_start = self.start - datetime.timedelta(days=1)
        self.delta_end = self.end - datetime.timedelta(days=1)
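The remaining period classes (ThisWeekLastWeek, ThisMonthLastMonth, ThisQuarterLastQuarter, ThisYearLastYear) follow the same pattern; I won’t list them all, but as a sketch, the week version could look like this (assuming weeks start on Monday, which is what weekday() counts from):

class ThisWeekLastWeek(StatWithDelta):
    subtitle = 'This Week vs. Last Week'

    def __init__(self):
        super(ThisWeekLastWeek, self).__init__()
        midnight = self.end.replace(hour=0, minute=0, second=0, microsecond=0)
        # back up from today to Monday of the current week
        self.start = midnight - datetime.timedelta(days=self.end.weekday())
        self.delta_start = self.start - datetime.timedelta(days=7)
        self.delta_end = self.end - datetime.timedelta(days=7)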

Now for each stat I create a class that gets the main value and its delta value. Here is one example:

class GrossMarginPercent(StatWithDelta):
    title = 'Gross Margin Percent'

    def value(self):
        return functions.gross_margin_percent_within(self.start, self.end)

    def delta(self):
        return functions.gross_margin_percent_within(
            self.delta_start, self.delta_end)

I thought this was clever, but then I found myself writing a lot of similar code. I would create a class-based view for each stat class and time period, then an associated URL mapping. So for the stat class above I would have these five classes:

class GrossMarginPercentDay(GrossMarginPercent, TodayYesterday):
    pass


class GrossMarginPercentWeek(GrossMarginPercent, ThisWeekLastWeek):
    pass


class GrossMarginPercentMonth(GrossMarginPercent, ThisMonthLastMonth):
    pass


class GrossMarginPercentQuarter(GrossMarginPercent, ThisQuarterLastQuarter):
    pass


class GrossMarginPercentYear(GrossMarginPercent, ThisYearLastYear):
    pass

… and these urls:

    url(r'^edu/gmp-dtd/$', GrossMarginPercentDay.as_view()),
    url(r'^edu/gmp-wtd/$', GrossMarginPercentWeek.as_view()),
    url(r'^edu/gmp-mtd/$', GrossMarginPercentMonth.as_view()),
    url(r'^edu/gmp-qtd/$', GrossMarginPercentQuarter.as_view()),
    url(r'^edu/gmp-ytd/$', GrossMarginPercentYear.as_view()),

You can see the lines of code adding up. I was going to add 230+ lines of code to my urls.py file and 4,600 lines of code to my views.py file (20 lines × 230 views) following PEP8 guidelines.

So I decided to use one url pattern to send to one view function to dynamically create each of the stat-period classes. Here is my new url pattern:

    url(r'^(?P<category>[\w\-]+)/(?P<period>day|week|month|quarter|year)/'
        r'(?P<base_class_name>\w+)/$', 'magic_view'),

And here is my “magic_view” function where the *magic* happens:

def magic_view(request, category, period, base_class_name):
    """
    Builds a dynamic class subclassing the named base class and a time
    period class, then returns the response from its as_view() view.

    URL structure: /category/period/KPI_Class/

    category: KPI category (edu, conversion, etc.) not really used at this point
    period: day, week, month, quarter, year
    KPI Class: One of the class names in this file
    """
    class_name = '{}{}'.format(base_class_name, period.capitalize())
    _module = sys.modules[__name__]
    base_cls = getattr(_module, base_class_name)
    if period == 'day':
        period_name = 'TodayYesterday'
    else:
        period_name = 'This{0}Last{0}'.format(period.capitalize())
    period_cls = getattr(_module, period_name)
    
    # Create a dynamic class based on the base class and time period class
    cls = type(class_name, (base_cls, period_cls), dict())
    return cls.as_view()(request)
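To make the dynamic part concrete, here is what gets built for a request to /edu/day/GrossMarginPercent/ – it is equivalent to the hand-written class from earlier:

# what magic_view assembles for /edu/day/GrossMarginPercent/
cls = type('GrossMarginPercentDay', (GrossMarginPercent, TodayYesterday), dict())

# which is the same as writing:
# class GrossMarginPercentDay(GrossMarginPercent, TodayYesterday):
#     pass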

So even including all the comment lines explaining what I did, I’m only using 25 lines of code to save 4,830 lines. That’s a lot of typing saved. Python, my fingers thank you!

A friend pointed me to this simple yet humorous website yesterday which essentially gives a new lazy coder excuse whenever the page is refreshed.

I couldn’t help but whip out a bot to plug in to our IRC channel. My lazy coder bot will give a random excuse whenever someone mentions the word “why”.

I used my Rollbot script as a base to write this up quickly.

requirements.txt

Twisted==13.1.0
beautifulsoup4==4.2.1
requests==1.2.3

because.py

from bs4 import BeautifulSoup
import requests
from twisted.words.protocols import irc
from twisted.internet import protocol, reactor

NICK = '_lazy_coder_'
CHANNEL = '#yourchannel'
PASSWORD = 'channel_password'

class MyBot(irc.IRCClient):
    def _get_nickname(self):
        return self.factory.nickname
    nickname = property(_get_nickname)

    def signedOn(self):
        self.join(self.factory.channel)
        print "Signed on as {}.".format(self.nickname)

    def joined(self, channel):
        print "Joined %s." % channel

    def privmsg(self, user, channel, msg):
        """
        Whenever someone says "why" give a lazy programmer response
        """
        if 'why' in msg.lower():
            # get lazy response
            because = self._get_because()

            # post message
            self.msg(CHANNEL, because)

    def _get_because(self):
        req = requests.get('http://developerexcuses.com/')
        soup = BeautifulSoup(req.text)
        elem = soup.find('a')
        return elem.text.encode('ascii', 'ignore')

class MyBotFactory(protocol.ClientFactory):
    protocol = MyBot

    def __init__(self, channel, nickname=NICK):
        self.channel = channel
        self.nickname = nickname

    def clientConnectionLost(self, connector, reason):
        print "Lost connection (%s), reconnecting." % reason
        connector.connect()

    def clientConnectionFailed(self, connector, reason):
        print "Could not connect: %s" % reason

if __name__ == "__main__":
    channel = CHANNEL
    if PASSWORD:
        channel += ' {}'.format(PASSWORD)
    reactor.connectTCP('irc.freenode.net', 6667, MyBotFactory(channel))
    reactor.run()
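Run it with python because.py and the bot connects to Freenode, joins your channel, and chimes in whenever “why” comes up.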

*UPDATE: I’ve made some minor modifications and posted the project on Github

I’m not afraid to admit, I’m a visual guy. I like GUI interfaces. Sequel Pro makes it very easy to SSH tunnel into a server and connect to MySQL, but there is nothing I have found built into pgAdmin3 to use SSH tunneling for connections.

Luckily I found it is simple enough to do.

First, open an ssh tunnel:

ssh -fNg -L 5555:localhost:5432 {your_username}@{yourdomain.com}

This opens an SSH connection in the background, mapping your local port 5555 to your server’s port 5432 (Postgres’ default port). Type “man ssh” to see the details of each of these flags.
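For quick reference, here is what those flags do:

  • -f – drop into the background after authenticating
  • -N – don’t run a remote command; just forward ports
  • -g – allow other hosts to connect to the forwarded local port
  • -L 5555:localhost:5432 – forward local port 5555 to port 5432 on the remote host (“localhost” is relative to the server)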

Now, create a new connection in pgAdmin using localhost as your host and port 5555.

New pgAdmin Connection

Have you ever wanted to give your model some month choices relating to the integers 1–12? I would guess it’s pretty common – common enough to be included in Django itself. Well, it is. Here is a quick tip on how to include it in a model:

from django.db import models
from django.utils.dates import MONTHS


class RevenueGoal(models.Model):
    month = models.PositiveSmallIntegerField(choices=MONTHS.items())
    year = models.PositiveIntegerField()
    goal = models.DecimalField('Revenue Goal', max_digits=8, decimal_places=2)
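Because the field has choices, Django auto-generates a get_month_display() method that returns the month name (the values below are made up):

goal = RevenueGoal(month=3, year=2013, goal='5000.00')
print goal.get_month_display()  # March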

Disclaimer: I am not a sysadmin. I’m just a developer. I welcome and encourage comments to improve this process!

I have set up a couple of Django servers lately and taken copious notes that I have extracted from various sources. Below are the commands I issue to a fresh Ubuntu server install to get Django up and running. This puts everything on one server (PostgreSQL, Celery, RabbitMQ, etc) so it’s nice for a small starter project but don’t expect it to scale.

Log in as root and add a non-root user. Add the user to the sudoers group. Log out and log back in as ‘username’.

adduser username
adduser username sudo
exit

Update the local package index. Upgrade all the packages that can be upgraded. Remove packages that are no longer needed and then reboot for good measure.

sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get autoremove
sudo reboot

Install libraries for Python, PIP, PIL/Pillow, PostgreSQL, libevent for gevent, memcached server and library, RabbitMQ, git, nginx, & supervisor

sudo apt-get install build-essential python-dev python-pip libjpeg8-dev libfreetype6-dev zlib1g-dev postgresql postgresql-contrib libpq-dev libevent-dev memcached libmemcached-dev rabbitmq-server git nginx supervisor

Install virtualenv and virtualenvwrapper. To enable virtualenvwrapper, we add a line to our .bashrc file and reload it.

sudo pip install virtualenv virtualenvwrapper
echo "" >> .bashrc
echo "source /usr/local/bin/virtualenvwrapper.sh" >> .bashrc
source .bashrc

Make a virtualenv

mkvirtualenv project_env

Install postgres adminpack

sudo -u postgres psql
CREATE EXTENSION "adminpack";
\q

Change postgres password & create database

sudo passwd postgres
sudo su - postgres
psql -d template1 -c "ALTER USER postgres WITH PASSWORD 'changeme';"
createdb projectdb
createuser username --pwprompt
psql -d template1 -U postgres
GRANT ALL PRIVILEGES ON DATABASE projectdb to username;
\q
exit
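Your Django settings can then point at this database – a minimal sketch, assuming the names used above:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'projectdb',
        'USER': 'username',
        'PASSWORD': 'changeme',  # whatever you chose above
        'HOST': 'localhost',
    }
}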

Configure RabbitMQ (the server itself was installed earlier)

sudo rabbitmqctl add_user username username_pw
sudo rabbitmqctl add_vhost username_vhost
sudo rabbitmqctl set_permissions -p username_vhost username ".*" ".*" ".*"
sudo rabbitmqctl clear_permissions -p username_vhost guest
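Celery then reaches the broker with a URL assembled from the user, password, and vhost created above (a sketch – match it to your own names):

BROKER_URL = 'amqp://username:username_pw@localhost:5672/username_vhost'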

Generate ssh key to upload to Github, Bitbucket, or wherever you host your code.

ssh-keygen -t rsa -C you@sample.com
cat ~/.ssh/id_rsa.pub

Create some /var/www dirs & set permissions on these directories.

sudo mkdir -p /var/www/static
sudo mkdir /var/www/media
sudo chown -R username:www-data /var/www

Clone your repository to your home directory and install the packages in your requirements file.

git clone git@bitbucket.org:yourusername/project.git
cd project/requirements
pip install -r prod.txt

Remove the default symbolic link for Nginx. Create a new blank config, and make a symlink to it. Edit the new configuration file.

sudo rm /etc/nginx/sites-enabled/default
sudo touch /etc/nginx/sites-available/project
cd /etc/nginx/sites-enabled
sudo ln -s ../sites-available/project
sudo vim /etc/nginx/sites-available/project

Add the following content to nginx config:

# define an upstream server named gunicorn on localhost port 8000
upstream gunicorn {
    server localhost:8000;
}

# make an nginx server
server {
    # listen on port 80
    listen 80;

    # for requests to these domains
    server_name yourdomain.com www.yourdomain.com;

    # look in this directory for files to serve
    root /var/www/;

    # keep logs in these files
    access_log /var/log/nginx/project.access.log;
    error_log /var/log/nginx/project.error.log;

    # You need this to allow users to upload large files
    # See http://wiki.nginx.org/HttpCoreModule#client_max_body_size
    # I'm not sure where it goes, so I put it in twice. It works.
    client_max_body_size 0;

    # THIS IS THE IMPORTANT LINE
    # this tries to serve a static file at the requested url
    # if no static file is found, it passes the url to gunicorn
    try_files $uri @gunicorn;

    # define rules for gunicorn
    location @gunicorn {
        # repeated just in case
        client_max_body_size 0;

        # proxy to the gunicorn upstream defined above
        proxy_pass http://gunicorn;

        # makes sure the URLs don't actually say http://gunicorn
        proxy_redirect off;

        # If gunicorn takes > 5 minutes to respond, give up
        # Feel free to change the time on this
        proxy_read_timeout 5m;

        # make sure these HTTP headers are set properly
        proxy_set_header Host            $host;
        proxy_set_header X-Real-IP       $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
server {
    listen  443 ssl;

    # start mine
    ssl on;
    ssl_certificate /etc/ssl/localcerts/yourdomain_com.crt;
    ssl_certificate_key /etc/ssl/localcerts/yourdomain.com.key;
    ssl_protocols        SSLv3 TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers          HIGH:!aNULL:!MD5:!kEDH;
    server_name yourdomain.com www.yourdomain.com;

    # look in this directory for files to serve
    root /var/www/;

    # keep logs in these files
    access_log /var/log/nginx/project.access.log;
    error_log /var/log/nginx/project.error.log;

    # You need this to allow users to upload large files
    # See http://wiki.nginx.org/HttpCoreModule#client_max_body_size
    # I'm not sure where it goes, so I put it in twice. It works.
    client_max_body_size 0;

    # THIS IS THE IMPORTANT LINE
    # this tries to serve a static file at the requested url
    # if no static file is found, it passes the url to gunicorn
    try_files $uri @gunicorn;

    # define rules for gunicorn
    location @gunicorn {
        # repeated just in case
        client_max_body_size 0;

        # proxy to the gunicorn upstream defined above
        proxy_pass http://gunicorn;

        # makes sure the URLs don't actually say http://gunicorn
        proxy_redirect off;

        # If gunicorn takes > 5 minutes to respond, give up
        # Feel free to change the time on this
        proxy_read_timeout 5m;

        # make sure these HTTP headers are set properly
        proxy_set_header Host            $host;
        proxy_set_header X-Real-IP       $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Restart nginx

sudo service nginx restart

Set up database

cd /home/username/project
python manage.py syncdb --settings=project.settings.prod
python manage.py migrate --settings=project.settings.prod

Run collectstatic command

python manage.py collectstatic -l --noinput --settings=project.settings.prod
sudo /etc/init.d/nginx restart

Configure supervisor

Add the following contents to /etc/supervisor/conf.d/celeryd.conf

sudo vim /etc/supervisor/conf.d/celeryd.conf

Contents:

# the name of this service as far as supervisor is concerned
[program:celeryd]

# the command to start celery
command = /home/username/.virtualenvs/project_env/bin/python /home/username/project/manage.py celeryd -B -E --settings=project.settings.prod

# the directory to be in while running this
directory = /home/username/project

# the user to run this service as
user = username

# start this at boot, and restart it if it fails
autostart = true
autorestart = true

# take stdout and stderr of celery and write to these log files
stdout_logfile = /var/log/supervisor/celeryd.log
stderr_logfile = /var/log/supervisor/celeryd_err.log

Now we will create CeleryCam in /etc/supervisor/conf.d/celerycam.conf

sudo vim /etc/supervisor/conf.d/celerycam.conf

Contents:

[program:celerycam]
command = /home/username/.virtualenvs/project_env/bin/python /home/username/project/manage.py celerycam --settings=project.settings.prod
directory = /home/username/project
user = username
autostart = true
autorestart = true
stdout_logfile = /var/log/supervisor/celerycam.log
stderr_logfile = /var/log/supervisor/celerycam_err.log

Create Gunicorn script in /etc/supervisor/conf.d/gunicorn.conf

sudo vim /etc/supervisor/conf.d/gunicorn.conf

Contents:

[program:gunicorn]
command = /home/username/.virtualenvs/project_env/bin/python /home/username/project/manage.py run_gunicorn -w 4 -k gevent --settings=project.settings.prod
directory = /home/username/project
user = username
autostart = true
autorestart = true
stdout_logfile = /var/log/supervisor/gunicorn.log
stderr_logfile = /var/log/supervisor/gunicorn_err.log

Restart supervisor

sudo service supervisor restart

Restart/stop/start all services managed by supervisor

sudo supervisorctl restart all
sudo supervisorctl stop all
sudo supervisorctl start all

Or restart just celeryd

sudo supervisorctl restart celeryd

Or, start just gunicorn

sudo supervisorctl start gunicorn

Reboot and make sure everything starts up

sudo reboot

Bonus: set up SSL (the .crt referenced below is the signed certificate your CA returns after you submit the .csr)

sudo mkdir /etc/ssl/localcerts
cd /etc/ssl/localcerts
sudo openssl req -new -nodes -days 365 -keyout yourdomain.com.key -out yourdomain.com.csr
sudo chmod 400 /etc/ssl/localcerts/yourdomain.com.key
sudo chmod 400 /etc/ssl/localcerts/yourdomain.com.crt

I have been tasked with updating our real-time revenue stats at Neutron. After spending about a week going through and updating our PHP scripts, I finally decided it would be worth my time and sanity to start from scratch with Python. I’m building a Django application that will store revenue stats from different sources, which I can then use to build views and an API for stat tools.

So for the past few days I’ve been writing scripts that log in to other websites and scrape data, or access the sites’ APIs if they have one. I’ve learned a few things.

  1. requests > httplib2
  2. SOAP is the suck, but at least it’s an API. Suds makes SOAP suck less. I get that SOAP is basically all .NET developers know as far as APIs. ;)
  3. Beautiful Soup is a nice last resort.
  4. I’ve actually surprised how many businesses can survive on such crappy technology.

I saved Google Adsense for last, figuring they would have the best API and it would therefore be the easiest to implement. It turned out to be more challenging than I anticipated. Apparently you can’t just plug in a username/password or API key; you have to go through the whole OAuth2 handshake to gain access to the API.

Unfortunately, documentation was not as easy to find as I had hoped, and I found many broken links. Of all people, I thought Google would be better at this. For example, the most up-to-date developer docs I could find point to this broken link to read more about authentication and authorization. (OK, that was weird – as soon as I posted it here, the link started working. I guess you can all thank me for that. ;))

So this blog post is an attempt to document the process of getting reports out of Adsense and into my Django application.

In order to use Google’s API for accessing Adsense reports, you need to use the Adsense Management API. This API only support OAuth so you have to do the authentication flow in the browser at least once in order to get your credentials, then you can save these credentials so you have access going forward. To be honest, while I’ve heard about OAuth many times, I have never actually had a need to use it until now. So I’m learning as I go and feel free to leave a comment and point any misunderstandings I might have.

As I understand it, Google has one large API for their various products. Before you can talk to Adsense, you have to register your application through the Google API console. I registered my application. Since I don’t have a live URL yet, I used my development URL for now (localhost:8000). It seemed to work just fine. Download the JSON file with the link provided.

Also, while you’re managing your APIs, you will need to go to the Services tab and turn on the AdSense Management API if you have not already done so. Otherwise, when you try to make a request you will just get an error message that says “Access Not Configured”.

Google has created a client library for Python, which is easily installed with pip. They also have a Django sample project that uses this library to go through the OAuth2 handshake. I think it was written for Django 1.1 (Django 1.5 was just released as of this writing), so it is a bit out of date, but it helps greatly as a starting point.
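(The client library lives on PyPI as google-api-python-client, so a plain pip install google-api-python-client pulls it in.)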

My app is simple. I just want to read in the amount of revenue on a given day and store it in my local database.

I created a new app in my Django project called ‘adsense’, with a models.py file to store credentials.

from django.contrib.auth.models import User
from django.db import models
from oauth2client.django_orm import CredentialsField

class Credential(models.Model):
    id = models.ForeignKey(User, primary_key=True)
    credential = CredentialsField()

class Revenue(models.Model):
    date = models.DateField(unique=True)
    revenue = models.DecimalField(max_digits=7, decimal_places=2)

    def __unicode__(self):
        return '{0} ${1}'.format(self.date, self.revenue)

I put the JSON file I downloaded from the API console in my app folder and created the following views.py.

import os

from django.conf import settings
from django.contrib.auth.decorators import login_required
from django.contrib.sites.models import Site
from django.http import HttpResponseBadRequest, HttpResponse
from django.http import HttpResponseRedirect
from oauth2client import xsrfutil
from oauth2client.client import flow_from_clientsecrets
from oauth2client.django_orm import Storage

from .models import Credential


CLIENT_SECRETS = os.path.join(os.path.dirname(__file__), 'client_secrets.json')

FLOW = flow_from_clientsecrets(
    CLIENT_SECRETS,
    scope='https://www.googleapis.com/auth/adsense.readonly',
    redirect_uri='http://{0}/adsense/oauth2callback/'.format(
        Site.objects.get_current().domain))


@login_required
def index(request):
    storage = Storage(Credential, 'id', request.user, 'credential')
    credential = storage.get()
    if credential is None or credential.invalid is True:
        FLOW.params['state'] = xsrfutil.generate_token(
            settings.SECRET_KEY, request.user)
        # force approval prompt in order to get refresh_token
        FLOW.params['approval_prompt'] = 'force'
        authorize_url = FLOW.step1_get_authorize_url()
        return HttpResponseRedirect(authorize_url)
    else:
        return HttpResponse('Validated.')


@login_required
def auth_return(request):
    if not xsrfutil.validate_token(
            settings.SECRET_KEY, request.REQUEST['state'], request.user):
        return HttpResponseBadRequest()
    credential = FLOW.step2_exchange(request.REQUEST)
    storage = Storage(Credential, 'id', request.user, 'credential')
    storage.put(credential)
    return HttpResponseRedirect("/adsense/")

Note the FLOW.params['approval_prompt'] = 'force' line in the index view above. I was having problems getting “invalid_grant” errors because it seemed my credentials would expire; I’d have to go through the OAuth2 handshake every morning. After much research, I learned that I wasn’t getting a refresh_token back. I found this tip on StackOverflow explaining how to get it. This line seemed to fix that problem.

In my main urls.py file I include a link to my app’s urls file:

main urls.py:

from django.conf.urls import patterns, include, url
from django.contrib import admin

admin.autodiscover()

urlpatterns = patterns(
    '',
    url(r'^adsense/', include('adsense.urls', namespace='adsense')),

    url(r'^admin/doc/', include('django.contrib.admindocs.urls')),
    url(r'^admin/', include(admin.site.urls)),
)

adsense/urls.py:

from django.conf.urls import patterns, url

urlpatterns = patterns(
    'adsense.views',
    url(r'^$', 'index', name='index'),
    url(r'^oauth2callback/$', 'auth_return', name='auth_return'),
)

Lastly, I have a class that makes the call to the API to get revenue for given dates. This is located in adsense/tasks.py as I will likely hook this up soon to run as a task with Celery/RabbitMQ.

import datetime
import httplib2

from apiclient.discovery import build
from celery.task import PeriodicTask
from django.contrib.auth.models import User
from oauth2client.django_orm import Storage

from .models import Credential, Revenue


# note: these are evaluated once at import time, so a long-running
# worker will keep using the same dates until it is restarted
TODAY = datetime.date.today()
YESTERDAY = TODAY - datetime.timedelta(days=1)


class GetReportTask(PeriodicTask):
    run_every = datetime.timedelta(minutes=2)

    def run(self, *args, **kwargs):
        scraper = Scraper()
        scraper.get_report()


class Scraper(object):
    def get_report(self, start_date=YESTERDAY, end_date=TODAY):
        user = User.objects.get(pk=1)
        storage = Storage(Credential, 'id', user, 'credential')
        credential = storage.get()
        if credential is not None and not credential.invalid:
            http = httplib2.Http()
            http = credential.authorize(http)
            service = build('adsense', 'v1.2', http=http)
            reports = service.reports()
            report = reports.generate(
                startDate=start_date.strftime('%Y-%m-%d'),
                endDate=end_date.strftime('%Y-%m-%d'),
                dimension='DATE',
                metric='EARNINGS',
            )
            data = report.execute()
            for row in data['rows']:
                date = row[0]
                revenue = row[1]

                try:
                    record = Revenue.objects.get(date=date)
                except Revenue.DoesNotExist:
                    record = Revenue()
                record.date = date
                record.revenue = revenue
                record.save()
        else:
            print 'Invalid Adsense Credentials'

To make it work, I go to http://localhost:8000/adsense/. I’m then prompted to log in to my Google account. I authorize my app to allow Adsense access. The credentials are then stored in my local database and I can call my Scraper’s get_report() method. Congratulations to me, it worked!

I’ve been putting some time into updating an old site this weekend. I noticed that the homepage was taking a long time to load – around 5 to 8 seconds. Not good.

I tried caching queries but it didn’t help at all. Then I realized it was most likely due to my decision long ago to use textile to render text to html.

The site is located at direct-vs-dish.com. It essentially compares DIRECTV to DISH Network. On the home page are a number of features. Each feature represents a database record. Here is my original model for the features:

class Feature(models.Model):
    category = models.CharField(max_length=255)
    slug = models.SlugField()
    overview = models.TextField(blank=True, null=True)
    dish = models.TextField(blank=True, null=True)
    directv = models.TextField(blank=True, null=True)
    dish_link = models.URLField(blank=True, null=True)
    directv_link = models.URLField(blank=True, null=True)
    order = models.PositiveSmallIntegerField()

    def __unicode__(self):
        return self.category

    class Meta:
        ordering = ['order']

Three of the above fields use textile: overview, dish, & directv. I currently have 14 feature records, so that is a potential 42 textile conversions for the home page.

In order to cache these textile conversions, I added three new fields. I then added a save method to populate the cached html fields. My model now looks like this:

from django.contrib.markup.templatetags.markup import textile


class Feature(models.Model):
    category = models.CharField(max_length=255)
    slug = models.SlugField()
    overview = models.TextField(blank=True, null=True)
    overview_html = models.TextField(blank=True)
    dish = models.TextField(blank=True, null=True)
    dish_html = models.TextField(blank=True)
    directv = models.TextField(blank=True, null=True)
    directv_html = models.TextField(blank=True)
    dish_link = models.URLField(blank=True, null=True)
    directv_link = models.URLField(blank=True, null=True)
    order = models.PositiveSmallIntegerField()
    
    def __unicode__(self):
        return self.category

    def save(self, **kwargs):
        # cache the rendered HTML (the source fields may be None)
        self.overview_html = textile(self.overview or '')
        self.dish_html = textile(self.dish or '')
        self.directv_html = textile(self.directv or '')
        return super(Feature, self).save(**kwargs)
        
    class Meta:
        ordering = ['order']
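One gotcha: existing rows won’t have their *_html fields populated until they are saved again, so after deploying the new model I re-save everything once (a quick one-off):

# run once in `python manage.py shell` to backfill the cached HTML
for feature in Feature.objects.all():
    feature.save()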

I use the Django admin to edit features, so I collapsed the cached HTML fields by default, with an option to expand them if you want to see what has been converted and cached.

class FeatureAdmin(admin.ModelAdmin):
    list_display = ('category', 'order')
    prepopulated_fields = {"slug": ("category",)}
    fieldsets = (
        (None, {
            'fields': ('category', 'slug', 'overview', 'dish', 'dish_link',
                       'directv', 'directv_link', 'order')
        }),
        ('Auto Generated', {
            'classes': ('collapse',),
            'fields': ('overview_html', 'dish_html', 'directv_html'),
        }),
    )
admin.site.register(Feature, FeatureAdmin)

My template code went from this:

{{ feature.overview|textile }}

To this:

{{ feature.overview_html|safe }}

This has dropped my homepage rendering time to about 750ms. This is without any caching of queries. Huge win!

If you are hosting a Django site, Sentry will make your life easier.

After my review of various hosting companies I decided to put EnvelopeBudget.com on Webfaction. But I was still impressed with DigitalOcean, so I kept my virtual server. Why not? It’s only $5 per month for full root access! Because all their servers have SSDs, I’ve never seen a virtual server boot so fast. Soon will be the day when you hear someone say, “remember when computers had moving parts?” I kept it because I figured I’d find a use for it eventually. Well, I found a use for it.

I love Sentry. We used it at SendOutCards to help us better manage our server errors. I think we were running a pre-1.0 release, back when it was just called django-sentry. It has come a long way. I set up an account on GetSentry.com and loved it. Since I’m bootstrapping a start-up, I decided to set up my own Sentry server on my DigitalOcean account.

I documented the process I went through setting up the server.

Create an Ubuntu 12.10 x32 server droplet & SSH into it as root

# add non-root user
adduser sentry

# add to sudoers
adduser sentry sudo

# log out of root and log in as sentry
exit

# update the local package index
sudo apt-get update

# actually upgrade all packages that can be upgraded
sudo apt-get dist-upgrade

# remove any packages that are no longer needed
sudo apt-get autoremove

# reboot the machine, which is only necessary for some updates
sudo reboot

# install python-dev
sudo apt-get install build-essential python-dev

# download distribute
curl -O http://python-distribute.org/distribute_setup.py

# install distribute
sudo python distribute_setup.py

# remove installation files
rm distribute*

# use distribute to install pip
sudo easy_install pip

# install virtualenv and virtualenvwrapper
sudo pip install virtualenv virtualenvwrapper

# to enable virtualenvwrapper add this line to the end of the .bashrc file
echo "" >> .bashrc
echo "source /usr/local/bin/virtualenvwrapper.sh" >> .bashrc

# exit and log back in to restart your shell
exit

# make virtualenv
mkvirtualenv sentry_env

# install sentry
pip install sentry

# create settings file (file will be located in ~/.sentry/sentry.conf.py)
sentry init

# install postgres
sudo apt-get install postgresql postgresql-contrib libpq-dev

# install postgres adminpack
sudo -u postgres psql
CREATE EXTENSION "adminpack";
\q

# change postgres password & create database
sudo passwd postgres
sudo su - postgres
psql -d template1 -c "ALTER USER postgres WITH PASSWORD 'changeme';"
createdb your_sentry_db_name
createuser your_sentry_user --pwprompt
psql -d template1 -U postgres
GRANT ALL PRIVILEGES ON DATABASE your_sentry_db_name to your_sentry_user;
\q
exit

# update config file to use postgres & host (with vim or your editor of choice)
sudo apt-get install vim
vim .sentry/sentry.conf.py

The following are the contents of my sentry.conf.py file

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'your_sentry_db_name',
        'USER': 'your_sentry_user',
        'PASSWORD': 'your_password',
        'HOST': 'localhost',
    }
}

You will also want to configure your SMTP mail account. I just used my gmail account.

# going to need psycopg2
workon sentry_env
pip install psycopg2

# set up database
sentry upgrade

# let's try it out!
sentry start

# install nginx
sudo apt-get install nginx

# remove the default symbolic link
sudo rm /etc/nginx/sites-enabled/default

# create a new blank config, and make a symlink to it
sudo touch /etc/nginx/sites-available/sentry
cd /etc/nginx/sites-enabled
sudo ln -s ../sites-available/sentry

# edit the nginx configuration file
sudo vim /etc/nginx/sites-available/sentry

Here are the contents of my nginx file:

server {
    # listen on port 80
    listen 80;

    # for requests to these domains
    server_name yourdomain.com www.yourdomain.com;

    # keep logs in these files
    access_log /var/log/nginx/sentry.access.log;
    error_log /var/log/nginx/sentry.error.log;

    # You need this to allow users to upload large files
    # See http://wiki.nginx.org/HttpCoreModule#client_max_body_size
    # I'm not sure where it goes, so I put it in twice. It works.
    client_max_body_size 0;

    location / {
        proxy_pass http://localhost:9000;
        proxy_redirect off;

        proxy_read_timeout 5m;

        # make sure these HTTP headers are set properly
        proxy_set_header Host            $host;
        proxy_set_header X-Real-IP       $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

That’s about it.

# restart nginx
sudo service nginx restart

I’m not really sure of the proper way to keep sentry going. But I just run:

sentry start &

Perhaps someone more knowledgeable can leave a comment and suggest the best way to start the service automatically on reboot.

Oh, I also just moved my ZNC bouncer to the same server as it is much more reliable than connecting to my Mac Mini at home.

Update

I set up supervisor as recommended in the comments and the docs to keep sentry running (though it has never crashed, it does make restarting easier).

sudo apt-get install supervisor
sudo vim /etc/supervisor/conf.d/sentry.conf

Add the following to the sentry.conf file:

[program:sentry-web]
directory=/home/sentry/
command=/home/sentry/.virtualenvs/sentry_env/bin/sentry start http
autostart=true
autorestart=true
redirect_stderr=true

Restart supervisord

sudo killall supervisord
sudo supervisord

Upgrading Sentry:

I’ve upgraded twice. It was a painless process…

workon sentry_env
pip install sentry --upgrade
sentry upgrade
sudo supervisorctl restart sentry-web

Hosting Decisions

Note: Since writing this I have become more comfortable with managing my own servers. The pricing point of DigitalOcean moved me in that direction to be honest. Since writing this, DO has created a referral program that allows you to get a $10 credit when you sign up and I in turn get a referral credit. Use this link to sign up at DigitalOcean and get a $10 credit.

Where do you fit on this scale?

Sysadmin -> DBA -> Backend Programmer -> Frontend Programmer -> Designer

I have a range in the middle, but lack on each end of the spectrum. So when it comes to setting up a hosting server, I’d rather turn it over to someone more experienced in the sysadmin realm. But, when bootstrapping a startup, you find yourself becoming a jack of all trades (and master of none).

I’ve been in the process of re-writing Inzolo and re-launching as Envelope Budget. It recently came time to launch (ready or not). I spent way more time than I intended setting up a hosting account. I have been hosting Inzolo on Webfaction since its inception. Overall I’ve been quite pleased. I don’t really have any performance or downtime issues that I can remember, Webfaction has a nice interface to set up everything I need. I’ve actually been pleasantly surprised in how it has met my needs.

I’ve been hearing a lot of buzz about Heroku though. And so, I thought I’d try deploying there before I went live. First of all, let me explain my stack. EnvelopeBudget.com is written in Django and I’m using PostgreSQL as my database. I’m making use of johnny-cache and using Memcached to speed up the site a bit. I wrote a utility to import Inzolo accounts into Envelope Budget and found that I finally had a real need for asynchronous processing, so I implemented Celery and RabbitMQ to process the import and return status updates to the browser.

I was impressed after doing the Getting Started with Django tutorial on Heroku. What kind of magic is this? So I attempted to get my EnvelopeBudget stack up and running next. I modified my Django project structure to be more Heroku friendly. I probably spent a good 8 hours learning how Heroku makes deployment so simple, though it never really seemed simple. I got it up and running, but in the end I decided it wasn’t for me (at least for this project), mainly due to the price. Minimally it would cost me $55 per month because I needed two dynos (one web and one worker) and the SSL add-on. Seriously, why do they charge $20 per month to use SSL? SSL setup is free on the other three hosting plans I’m reviewing here. That was probably the biggest deal breaker. Also, this price was for using the dev PostgreSQL add-on, which wouldn’t last long. Soon I’d need to upgrade to the Basic ($9/mo) or Crane ($50/mo) package, so now my hosting was looking more like $105 per month. On top of that, you deploy by pushing to git (‘git push heroku master’). This is cool, but it seemed to take forever each time, which was annoying since I had to keep committing and pushing to troubleshoot problems. Deploying with fabric is much faster for me on the other three servers. Time to move on.

So at this point I’ve decided I’ll just go back to Webfaction. Then, as I’m riding the train home from work and reading through my Twitter feed, I come across a link to a Complete Single Server Django Stack Tutorial. I read through it and suddenly setting up my own server didn’t seem so scary – I’ve done pretty much all of this before in my own development environment. So I go to the best place I know to spin up a new server fast – Linode. It probably took me about 2 hours to get everything up and running, and I took copious notes along the way. After getting it to work on the 512 plan ($20 per month), I destroyed that Linode and set it up again on a 1 GB plan ($40/month). It took about 40 minutes the second time (setting it up twice was faster than figuring out Heroku). I was surprised at how much faster the performance was on Linode. Webfaction & Heroku felt about the same, but Linode felt significantly faster.

After getting it all set up I got a tweet from a friend recommending I try out DigitalOcean while I’m at it. After looking at the prices and specs, I could get a 1 GB server for half the price, and it had an SSD to make it faster – but only one core instead of four. I took the time to set it up. The process was pretty much the same as with Linode; it only took about 30 minutes this time. Overall the site felt slower than Linode though. I’m guessing it was due to having only one core, and because I’m located in Utah: my Linode was in Texas and DigitalOcean is in New York. Still, installing packages seemed to take a lot longer, so I’m thinking their data center’s internet speed was the source of the slower speeds. Sorry, I don’t have any benchmarks so I can’t really give real numbers. One thing that really impressed me, though, was the reboot time of the server. It seemed about 5 times faster than my Linode, likely due to the SSD.

So now it was time to make a choice. I had a launch counter ticking down on my homepage and I had to decide NOW. I had already spent 3 days making a decision. I finally decided to go with Webfaction’s 1 GB plan, which is $40 per month (or $30 per month if paid yearly). I like the idea of having a managed plan. The biggest downside for me is that I don’t have root or sudo access. They don’t use virtualenv for their application setup, and setting up projects feels a bit kludgy because of it. Also, setting up Celery & RabbitMQ isn’t as painless, but I managed it thanks to Mark Liu’s tutorial. I know there is a way to use virtualenv and gunicorn on Webfaction, but I doubt I’ll take the time to set my project up that way.

There was a snag though. I had originally set up my account on their most basic plan, which only has 256 MB of RAM. My site was quickly killed for using 2x that amount. I needed to upgrade ASAP, but I needed someone there to set up the new account and migrate my existing account. So I actually ended up launching on Linode. The site is up now and hosting performance is great, but I will likely move back to Webfaction because I soon started to realize there is always something else to set up. I have a git repo, a Trac system, email, & FTP already set up on Webfaction, and I would likely want to put a WordPress blog at /blog. All of this is easy with Webfaction, and it’s more research I would have to do to set all of this up on Linode.

TL;DR

So here is my tl;dr version in alphabetical order:

DigitalOcean: I love their pricing. For as little as $5 per month I can spin up a Linux server. This would be great for a ZNC IRC bouncer, for example. They seem fairly new still, so time will tell how they compete with Linode. Their internet connection seemed a bit slow, but for root access to a server, it can be overlooked.

Heroku: If I were a hipster I’d bite the bullet and host here to get in with the cool crowd. Overall it was just too expensive for a bootstrapped startup project. The biggest benefit I see with Heroku is the ability to scale fast, both forwards and backwards when you need to. Scaling is a good problem to have. If I get to that point, money won’t be an issue and I will revisit Heroku. I would probably also use it if I built a very small site where the specs fit within their free model, or if I was in the middle of a hack-a-thon and needed to get online fast.

Linode: This seems to be the standard for spinning up your own dedicated server with root access. If I need root access, performance, and a history of good support, I’ll go here.

Webfaction: I’ve been around the block and learned that the grass is not really greener on the other side. Although I don’t have root access and it’s hosted on CentOS rather than Debian/Ubuntu which I’m more familiar with, it has so many features for making it easy to set up email, multiple domains, SSL, different types of apps (Django + PHP + Ruby on Rails anyone?), Trac, Git, etc. The price is competitive, the support is good, the uptime and performance is good – I haven’t found sufficient reason to leave.

UPDATE

After doing a number of installs at work I got more comfortable with deploying on gunicorn & nginx, so I ended up switching to DigitalOcean. This is where EnvelopeBudget.com is currently hosted, and I have a couple other droplets hosting YouTrack & Sentry. The main reason I left Webfaction was that I needed to update my SSL certificate ASAP, and there is a slight lag time with Webfaction because you have to submit a ticket to complete your SSL setup.

Reasons for leaving Webfaction:

  • Total control of SSL setup
  • Performance – I wanted SSDs
  • Price – more computing power for the price
  • Virtualenv – Upgrading is a lot easier when using virtualenv

Things to consider before leaving Webfaction:

  • Webfaction comes with email. I’m now using Zoho for free email.
  • Easier to configure – It took a while to figure out how to run WordPress on /blog with nginx. Also, I had to learn the whole process of configuring an SSL certificate.
  • I didn’t bother migrating Trac. Webfaction had a nice one click installer. I’ve moved to YouTrack instead.
  • There are a number of other one-click install solutions available on Webfaction. Be sure you know what you are leaving.
