Nerdy Dork
Dustin Davis reviews… the internet.

There are pros and cons on both sides of website development. Whether you use an “out of the box” PHP platform such as WordPress or hire a developer to build a custom website, there are elements to examine before making your decision.

WordPress Design Features:

WordPress is free, but offers advanced customization features for a small extra fee. You have full access to create a custom website with user-friendly building tools for web pages and blogs. WordPress is an open-source blogging tool and CMS (content management system) based on PHP and MySQL. If the opinion of the masses can sway your decision, there are 60 million websites using WordPress, including 18.9 percent of the top ten million websites online today. As of February 3, 2014, WordPress version 3.8 (released December 12, 2013) has been downloaded over 16 million times.

WordPress is regarded as an efficient and versatile platform. Website templates use a template processor, and thousands of themes built with PHP, CSS, and HTML are available. Advanced features can be added or edited during or after the build, and the 26,000-plus available plug-ins let you customize and tailor the site to your specific needs.

With the development of WordPress versions 3.0 and 3.8 and WordPress MU (“Multi-user”), website owners can host their own blogging communities. New features include a central dashboard and eight new data tables for each blog, so the administrator can control and moderate all blogs from a single place.

WordPress’s most recent release, version 3.8 (December 2013), has an improved administrative interface. The main dashboard has been simplified and responds better on mobile devices.

WordPress Challenges:

Vulnerabilities addressed recently in WordPress 3.0 involved security issues detected in systems that had not been upgraded. In June 2013, 50 of the most-downloaded WordPress plug-ins were found vulnerable to common SQL injection and XSS web attacks. Seven of the top ten e-commerce plug-ins were also vulnerable.

There are remedies for the identified vulnerabilities, such as editing the site’s .htaccess file to help prevent SQL injection, which also blocks sensitive files from being accessed.

Web Design Comparative:

On affordability, WordPress wins: the basic cost is free, and even with additional customization the cost is much less than the standard fee for a professional web designer’s setup. As the most-used website and blogging platform on the web today, WordPress is a proven platform for reliability and trustworthy service.

On custom web design and individuality of presentation, professional web designers offer custom looks that can surpass the somewhat “cookie-cutter” look of WordPress. No other website will have the custom look a professional web designer can build to your needs and specifications. Custom layouts, sizing of design elements, custom fonts, colors, highlights and shades, plus custom-designed graphics will stand out and impress your customers.

Challenges of Using a Professional Website Developer:

The trouble with using a web designer is that you really don’t know who is writing your code. They may exaggerate their experience level, and you won’t know until problems develop. Even at the end of the build, you may not have what you asked for. Sometimes they know all the right words to say, but the finished product is lacking in functionality.

For some people, taking a chance on a web designer is worth the price of a more unique web design. In that case, try to get referrals from people who have used the developer for their sites. Even with a recommendation there is still some risk, as you have no way to confirm who wrote the code for each application on your site.

Ultimately, WordPress serves millions of website owners and takes responsibility for providing quick fixes when problems develop. With a large company backing the product, this serves as insurance that if your website has a problem you will experience the least downtime possible.

Show Your Work

My junior year of high school I took a calculus class from the local college extension. I happened to get a nice Casio calculator (circa 1994) that allowed me to write programs. Since I didn’t really have a computer, this was my first experience programming.

Each time we learned a new algorithm, I would figure out how to program that algorithm into my calculator. The problem was that once I did, I would promptly forget how to work the problem by hand.

I mostly got B’s on my tests. I would get all the answers right, but I would often get docked for not showing my work. How do you show your work when the only way you remember how to solve a problem is to plug numbers into the program you wrote on your calculator?

I was thinking about this recently. Today I might be accused of the opposite. When I leave a comment in a bug or feature ticket, I might be accused of sharing too much information. The business person that submitted the bug in our ticket system doesn’t really need to see all the queries I ran, the results of those queries and a documented process of how I came to find my solution. They just wanted it fixed.

But here’s the deal: I don’t write all that information for them. I write it for me.

Like the algorithms I programmed into my calculator in high school, I will promptly forget what I worked on two days ago and the process I followed. If I run into a similar problem, finding the ticket where I documented my process of finding a solution in the past is often much faster than coming up with that solution (or a new solution) again. This is also why I blog.

So, if you are going to err, err on the side of sharing too much information. Even if nobody cares, you will thank yourself later. On top of that, the business person requesting the bug fix might just learn a little more about your system in the process.

Lately I was getting this error frequently while using Django’s built-in cache_page decorator to cache some views:

memcache in check_key
MemcachedKeyLengthError: Key length is > 250

Basically the problem is that memcached only allows keys up to 250 characters, and some of my view names were pretty long, so the generated cache keys exceeded that limit.

I found a quick fix: hash the key with MD5 whenever it would be over 250 characters. You can override the function that creates the key.

In my settings file I added the following:

import hashlib

...

def hash_key(key, key_prefix, version):
    # build the key the same way Django's default key function does
    new_key = ':'.join([key_prefix, str(version), key])
    # memcached rejects keys longer than 250 chars, so hash long ones
    if len(new_key) > 250:
        m = hashlib.md5()
        m.update(new_key)
        new_key = m.hexdigest()
    return new_key

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
        'KEY_FUNCTION': hash_key,
    }
}

I only hash when the key would exceed 250 characters because 1) hashing is CPU intensive and I only want to do it when I have to; 2) I prefer my memcached keys human-readable when possible; and 3) it leaves fewer opportunities for collisions between duplicate hashes.
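As a quick sanity check, here is how the key function behaves with a short key and a long one (hypothetical values):

short = hash_key('views.decorators.cache.cache_page.home', 'prefix', 1)
# -> 'prefix:1:views.decorators.cache.cache_page.home' (left human-readable)

hashed = hash_key('x' * 300, 'prefix', 1)
# -> a 32-character md5 hexdigest, safely under the 250-char limit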

I thank Russell Keith-Magee for these tips.

Even after coding in Python for the past five years, I’ve never really considered myself an expert in the language because I find the more I know, the more I know I don’t know. I generally keep my code simple on purpose until I have a good reason to be complex – and for most Django sites, I haven’t had a good reason.

Today I had good reason. I’m currently building a number of key performance indicator (KPI) stats for Neutron. There are currently 46 different stats that I need to calculate for 5 different time periods.

For each stat I need:

  • Stats from the start of the current day to the current time, compared to yesterday over the same window
  • This week compared to last week delta
  • This month compared to last month delta
  • This quarter compared to last quarter delta
  • This year compared to last year delta

I will be building a view for each stat and associated time period to return these values in JSON format. So as it stands, there will be 230 views. I needed to come up with something clever to save myself some lines of code. I opted for class-based views.

First I built a base class that will return the JSON data in a consistent format (imports shown for context):

import datetime  # used by the period subclasses below
import json
import sys  # used by magic_view further down

from django import http
from django.utils import timezone as djtz
from django.views.generic.detail import BaseDetailView


class StatWithDelta(BaseDetailView):
    start = None
    end = None
    delta_start = None
    delta_end = None
    title = None
    subtitle = None

    def __init__(self):
        super(StatWithDelta, self).__init__()
        self.end = djtz.localtime(djtz.now())

    def value(self):
        raise NotImplementedError

    def delta(self):
        raise NotImplementedError

    def get(self, request, *args, **kwargs):
        value = self.value()
        delta_value = self.delta()
        try:
            delta_percent = round((((delta_value - value) / value) * 100), 2)
        except ZeroDivisionError:
            delta_percent = 0
        payload = {
            'value': value,
            'delta': delta_percent,
            'title': self.title,
            'subtitle': self.subtitle,
        }
        return self.render_to_response(payload)

    def render_to_response(self, context):
        return self.get_json_response(self.convert_context_to_json(context))

    def get_json_response(self, content, **httpresponse_kwargs):
        return http.HttpResponse(content,
                                 content_type='application/json',
                                 **httpresponse_kwargs)

    def convert_context_to_json(self, context):
        return json.dumps(context)

Next I built classes for each required time range. Here is my class for today compared to yesterday:

class TodayYesterday(StatWithDelta):
    subtitle = 'Today vs. Yesterday'

    def __init__(self):
        super(TodayYesterday, self).__init__()
        self.start = self.end.replace(hour=0, minute=0, second=0, microsecond=0)
        self.delta_start = self.start - datetime.timedelta(days=1)
        self.delta_end = self.end - datetime.timedelta(days=1)
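The remaining period classes (ThisWeekLastWeek, ThisMonthLastMonth, and so on) follow the same pattern: set start, delta_start, and delta_end relative to self.end. I won’t show them all, but a sketch of the week version might look like this (my assumption here: weeks start on Monday, which is what weekday() measures from):

class ThisWeekLastWeek(StatWithDelta):
    subtitle = 'This Week vs. Last Week'

    def __init__(self):
        super(ThisWeekLastWeek, self).__init__()
        # midnight at the start of the current week (Monday, per weekday())
        midnight = self.end.replace(hour=0, minute=0, second=0, microsecond=0)
        self.start = midnight - datetime.timedelta(days=self.end.weekday())
        # the same window shifted back one week
        self.delta_start = self.start - datetime.timedelta(weeks=1)
        self.delta_end = self.end - datetime.timedelta(weeks=1)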

Now for each stat I create a class that gets the main value and its delta value. Here is one example:

class GrossMarginPercent(StatWithDelta):
    title = 'Gross Margin Percent'

    def value(self):
        return functions.gross_margin_percent_within(self.start, self.end)

    def delta(self):
        return functions.gross_margin_percent_within(
            self.delta_start, self.delta_end)

I thought this was clever, but then I found myself writing a lot of similar code. I would create a class-based view for each stat class and time period, then an associated URL mapping. So for the stat class above I would have these five classes:

class GrossMarginPercentDay(GrossMarginPercent, TodayYesterday):
    pass


class GrossMarginPercentWeek(GrossMarginPercent, ThisWeekLastWeek):
    pass


class GrossMarginPercentMonth(GrossMarginPercent, ThisMonthLastMonth):
    pass


class GrossMarginPercentQuarter(GrossMarginPercent, ThisQuarterLastQuarter):
    pass


class GrossMarginPercentYear(GrossMarginPercent, ThisYearLastYear):
    pass

… and these urls:

    url(r'^edu/gmp-dtd/$', GrossMarginPercentDay.as_view()),
    url(r'^edu/gmp-wtd/$', GrossMarginPercentWeek.as_view()),
    url(r'^edu/gmp-mtd/$', GrossMarginPercentMonth.as_view()),
    url(r'^edu/gmp-qtd/$', GrossMarginPercentQuarter.as_view()),
    url(r'^edu/gmp-ytd/$', GrossMarginPercentYear.as_view()),

You can see the lines of code adding up. Following PEP8 guidelines, I was going to add 230+ lines of code to my urls.py file and 4,600 lines to my views.py file (20 × 230).

So I decided to use a single URL pattern routed to a single view function that dynamically creates each of the stat-period classes. Here is my new URL pattern:

    url(r'^(?P<category>[\w\-]+)/(?P<period>day|week|month|quarter|year)/'
        r'(?P<base_class_name>\w+)/$', 'magic_view'),

And here is my “magic_view” function, where the *magic* happens:

def magic_view(request, category, period, base_class_name):
    """
    Builds a dynamic class subclassing the base class name passed in and a time 
    period class. It will return its as_view() method.

    URL structure: /category/period/KPI_Class/

    category: KPI category (edu, conversion, etc.) not really used at this point
    period: day, week, month, quarter, year
    KPI Class: One of the class names in this file
    """
    class_name = '{}{}'.format(base_class_name, period.capitalize())
    _module = sys.modules[__name__]
    base_cls = getattr(_module, base_class_name)
    if period == 'day':
        period_name = 'TodayYesterday'
    else:
        period_name = 'This{0}Last{0}'.format(period.capitalize())
    period_cls = getattr(_module, period_name)
    
    # Create a dynamic class based on the base class and time period class
    cls = type(class_name, (base_cls, period_cls), dict())
    return cls.as_view()(request)
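With this in place, a request like /edu/day/GrossMarginPercent/ builds and serves what would have been the hand-written GrossMarginPercentDay view, and likewise for every other stat and period combination.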

So, counting all the comment lines that explain what I did, I’m only using about 25 lines of code to save 4,830 lines. That’s a lot of typing. Python, my fingers thank you!

A friend pointed me to this simple yet humorous website yesterday which essentially gives a new lazy coder excuse whenever the page is refreshed.

I couldn’t help but whip out a bot to plug in to our IRC channel. My lazy coder bot will give a random excuse whenever someone mentions the word “why”.

I used my Rollbot script as a base to write this up quickly.

requirements.txt

Twisted==13.1.0
beautifulsoup4==4.2.1
requests==1.2.3

because.py

from bs4 import BeautifulSoup
import requests
from twisted.words.protocols import irc
from twisted.internet import protocol, reactor

NICK = '_lazy_coder_'
CHANNEL = '#yourchannel'
PASSWORD = 'channel_password'

class MyBot(irc.IRCClient):
    def _get_nickname(self):
        return self.factory.nickname
    nickname = property(_get_nickname)

    def signedOn(self):
        self.join(self.factory.channel)
        print "Signed on as {}.".format(self.nickname)

    def joined(self, channel):
        print "Joined %s." % channel

    def privmsg(self, user, channel, msg):
        """
        Whenever someone says "why" give a lazy programmer response
        """
        if 'why' in msg.lower():
            # get lazy response
            because = self._get_because()

            # post message
            self.msg(CHANNEL, because)

    def _get_because(self):
        req = requests.get('http://developerexcuses.com/')
        soup = BeautifulSoup(req.text)
        elem = soup.find('a')
        return elem.text.encode('ascii', 'ignore')

class MyBotFactory(protocol.ClientFactory):
    protocol = MyBot

    def __init__(self, channel, nickname=NICK):
        self.channel = channel
        self.nickname = nickname

    def clientConnectionLost(self, connector, reason):
        print "Lost connection (%s), reconnecting." % reason
        connector.connect()

    def clientConnectionFailed(self, connector, reason):
        print "Could not connect: %s" % reason

if __name__ == "__main__":
    channel = CHANNEL
    if PASSWORD:
        channel += ' {}'.format(PASSWORD)
    reactor.connectTCP('irc.freenode.net', 6667, MyBotFactory(channel))
    reactor.run()
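Run it with python because.py (after filling in your nick, channel, and password at the top) and the bot connects to Freenode and starts listening for “why”.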

UPDATE: I’ve made some minor modifications and posted the project on GitHub.

I’m not afraid to admit it: I’m a visual guy. I like GUI interfaces. Sequel Pro makes it very easy to SSH tunnel into a server and connect to MySQL, but I have found nothing built into pgAdmin3 for SSH-tunneled connections.

Luckily I found it is simple enough to do.

First, open an ssh tunnel:

ssh -fNg -L 5555:localhost:5432 {your_username}@{yourdomain.com}

This opens an SSH connection in the background, mapping your local port 5555 to the server’s port 5432 (Postgres’ default port). Run “man ssh” to see what each of these flags does.

Now, create a new connection in pgAdmin using localhost as your host and port 5555.

(Screenshot: the new pgAdmin connection settings)

Have you ever wanted to give your model month choices mapping to the integers 1–12? I would guess it’s pretty common – common enough to be included in Django itself. Well, it is. Here is a quick tip on how to include it in a model:

from django.db import models
from django.utils.dates import MONTHS


class RevenueGoal(models.Model):
    month = models.PositiveSmallIntegerField(choices=MONTHS.items())
    year = models.PositiveIntegerField()
    goal = models.DecimalField('Revenue Goal', max_digits=8, decimal_places=2)
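With choices set this way, Django’s standard get_FOO_display() method returns the month name. A quick example (hypothetical values, nothing saved):

goal = RevenueGoal(month=3, year=2014, goal='5000.00')
print goal.get_month_display()  # March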

Disclaimer: I am not a sysadmin. I’m just a developer. I welcome and encourage comments to improve this process!

I have set up a couple of Django servers lately and taken copious notes that I have extracted from various sources. Below are the commands I issue to a fresh Ubuntu server install to get Django up and running. This puts everything on one server (PostgreSQL, Celery, RabbitMQ, etc) so it’s nice for a small starter project but don’t expect it to scale.

Log in as root and add a non-root user. Add the user to the sudoers group. Log out and log back in as ‘username’.

adduser username
adduser username sudo
exit

Update the local package index. Upgrade all the packages that can be upgraded. Remove packages that are no longer needed and then reboot for good measure.

sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get autoremove
sudo reboot

Install libraries for Python, PIP, PIL/Pillow, PostgreSQL, libevent for gevent, memcached server and library, RabbitMQ, git, nginx, & supervisor

sudo apt-get install build-essential python-dev python-pip libjpeg8-dev libfreetype6-dev zlib1g-dev postgresql postgresql-contrib libpq-dev libevent-dev memcached libmemcached-dev rabbitmq-server git nginx supervisor

Install virtualenv and virtualenvwrapper. To enable it, we need to add a line to our .bashrc file and log out and back in.

sudo pip install virtualenv virtualenvwrapper
echo "" >> .bashrc
echo "source /usr/local/bin/virtualenvwrapper.sh" >> .bashrc
exit

Make a virtualenv

mkvirtualenv project_env

Install postgres adminpack

sudo -u postgres psql
CREATE EXTENSION "adminpack";
\q

Change postgres password & create database

sudo passwd postgres
sudo su - postgres
psql -d template1 -c "ALTER USER postgres WITH PASSWORD 'changeme';"
createdb projectdb
createuser username --pwprompt
psql -d template1 -U postgres
GRANT ALL PRIVILEGES ON DATABASE projectdb to username;
\q
exit

Configure RabbitMQ (installed with the packages above)

sudo rabbitmqctl add_user username username_pw
sudo rabbitmqctl add_vhost username_vhost
sudo rabbitmqctl set_permissions -p username_vhost username ".*" ".*" ".*"
sudo rabbitmqctl clear_permissions -p username_vhost guest

Generate ssh key to upload to Github, Bitbucket, or wherever you host your code.

ssh-keygen -t rsa -C you@sample.com
cat ~/.ssh/id_rsa.pub

Create some /var/www dirs & set permissions on these directories.

sudo mkdir -p /var/www/static
sudo mkdir /var/www/media
sudo chown -R username:www-data /var/www

Clone your repository to your home directory and install the packages in your requirements file.

git clone git@bitbucket.org:yourusername/project.git
cd project/requirements
pip install -r prod.txt

Remove the default symbolic link for Nginx. Create a new blank config, and make a symlink to it. Edit the new configuration file.

sudo rm /etc/nginx/sites-enabled/default
sudo touch /etc/nginx/sites-available/project
cd /etc/nginx/sites-enabled
sudo ln -s ../sites-available/project
sudo vim /etc/nginx/sites-available/project

Add the following content to nginx config:

# define an upstream server named gunicorn on localhost port 8000
upstream gunicorn {
    server localhost:8000;
}

# make an nginx server
server {
    # listen on port 80
    listen 80;

    # for requests to these domains
    server_name yourdomain.com www.yourdomain.com;

    # look in this directory for files to serve
    root /var/www/;

    # keep logs in these files
    access_log /var/log/nginx/project.access.log;
    error_log /var/log/nginx/project.error.log;

    # You need this to allow users to upload large files
    # See http://wiki.nginx.org/HttpCoreModule#client_max_body_size
    # I'm not sure where it goes, so I put it in twice. It works.
    client_max_body_size 0;

    # THIS IS THE IMPORTANT LINE
    # this tries to serve a static file at the requested url
    # if no static file is found, it passes the url to gunicorn
    try_files $uri @gunicorn;

    # define rules for gunicorn
    location @gunicorn {
        # repeated just in case
        client_max_body_size 0;

        # proxy to the gunicorn upstream defined above
        proxy_pass http://gunicorn;

        # makes sure the URLs don't actually say http://gunicorn 
        proxy_redirect off;

        # If gunicorn takes > 5 minutes to respond, give up
        # Feel free to change the time on this
        proxy_read_timeout 5m;

        # make sure these HTTP headers are set properly
        proxy_set_header Host            $host;
        proxy_set_header X-Real-IP       $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
server {
    listen 443 ssl;

    ssl on;
    ssl_certificate /etc/ssl/localcerts/yourdomain.com.crt;
    ssl_certificate_key /etc/ssl/localcerts/yourdomain.com.key;
    ssl_protocols        SSLv3 TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers          HIGH:!aNULL:!MD5:!kEDH;
    server_name  yourdomain.com www.yourdomain.com;

    # look in this directory for files to serve
    root /var/www/;

    # keep logs in these files
    access_log /var/log/nginx/project.access.log;
    error_log /var/log/nginx/project.error.log;

    # You need this to allow users to upload large files
    # See http://wiki.nginx.org/HttpCoreModule#client_max_body_size
    # I'm not sure where it goes, so I put it in twice. It works.
    client_max_body_size 0;

    # THIS IS THE IMPORTANT LINE
    # this tries to serve a static file at the requested url
    # if no static file is found, it passes the url to gunicorn
    try_files $uri @gunicorn;

    # define rules for gunicorn
    location @gunicorn {
        # repeated just in case
        client_max_body_size 0;

        # proxy to the gunicorn upstream defined above
        proxy_pass http://gunicorn;

        # makes sure the URLs don't actually say http://gunicorn 
        proxy_redirect off;

        # If gunicorn takes > 5 minutes to respond, give up
        # Feel free to change the time on this
        proxy_read_timeout 5m;

        # make sure these HTTP headers are set properly
        proxy_set_header Host            $host;
        proxy_set_header X-Real-IP       $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
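Before restarting, you can validate the configuration with sudo nginx -t to catch syntax errors.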

Restart nginx

sudo service nginx restart

Set up database

cd /home/username/project
python manage.py syncdb --settings=project.settings.prod
python manage.py migrate --settings=project.settings.prod

Run collectstatic command

python manage.py collectstatic -l --noinput --settings=project.settings.prod
sudo /etc/init.d/nginx restart

Configure supervisor

Add the following contents to /etc/supervisor/conf.d/celeryd.conf

sudo vim /etc/supervisor/conf.d/celeryd.conf

Contents:

# the name of this service as far as supervisor is concerned
[program:celeryd]

# the command to start celery
command = /home/username/.virtualenvs/project_env/bin/python /home/username/project/manage.py celeryd -B -E --settings=project.settings.prod

# the directory to be in while running this
directory = /home/username/project

# the user to run this service as
user = username

# start this at boot, and restart it if it fails
autostart = true
autorestart = true

# take stdout and stderr of celery and write to these log files
stdout_logfile = /var/log/supervisor/celeryd.log
stderr_logfile = /var/log/supervisor/celeryd_err.log

Now we will create CeleryCam in /etc/supervisor/conf.d/celerycam.conf

sudo vim /etc/supervisor/conf.d/celerycam.conf

Contents:

[program:celerycam]
command = /home/username/.virtualenvs/project_env/bin/python /home/username/project/manage.py celerycam --settings=project.settings.prod
directory = /home/username/project
user = username
autostart = true
autorestart = true
stdout_logfile = /var/log/supervisor/celerycam.log
stderr_logfile = /var/log/supervisor/celerycam_err.log

Create Gunicorn script in /etc/supervisor/conf.d/gunicorn.conf

sudo vim /etc/supervisor/conf.d/gunicorn.conf

Contents:

[program:gunicorn]
command = /home/username/.virtualenvs/project_env/bin/python /home/username/project/manage.py run_gunicorn -w 4 -k gevent --settings=project.settings.prod
directory = /home/username/project
user = username
autostart = true
autorestart = true
stdout_logfile = /var/log/supervisor/gunicorn.log
stderr_logfile = /var/log/supervisor/gunicorn_err.log

Restart supervisor

sudo service supervisor restart

Restart/stop/start all services managed by supervisor

sudo supervisorctl restart all
sudo supervisorctl stop all
sudo supervisorctl start all

Or restart just celeryd

sudo supervisorctl restart celeryd

Or, start just gunicorn

sudo supervisorctl start gunicorn

Reboot and make sure everything starts up

sudo reboot

Bonus: set up SSL

sudo mkdir /etc/ssl/localcerts
cd /etc/ssl/localcerts
sudo openssl req -new -nodes -days 365 -keyout yourdomain.com.key -out yourdomain.com.csr
sudo chmod 400 /etc/ssl/localcerts/yourdomain.com.key
sudo chmod 400 /etc/ssl/localcerts/yourdomain.com.crt

Note that yourdomain.com.crt is the signed certificate your certificate authority sends back after you submit the .csr; place it in this directory before running that last chmod.

I have been tasked with updating our real-time revenue stats at Neutron. After spending about a week going through and updating our PHP scripts, I finally decided it would be worth my time and sanity to start from scratch with Python. I’m building a Django application that will store revenue stats from different sources, which I can then use to build views and an API for stat tools.

So for the past few days I’ve been writing scripts that log in to other websites and scrape data, or access a site’s API if it has one. I’ve learned a few things:

  1. requests > httplib2
  2. SOAP is the suck, but at least it’s an API. Suds makes SOAP suck less. I get it that SOAP is basically all .NET developers know as far as APIs. ;)
  3. Beautiful Soup is a nice last resort.
  4. I’m actually surprised how many businesses can survive on such crappy technology.

I saved Google AdSense for last, figuring they would have the best API and it would therefore be the easiest to implement. It turned out to be more challenging than I anticipated. Apparently you can’t just plug in a username/password or API key; you have to go through the whole OAuth2 handshake to gain access to the API.

Unfortunately, documentation was not as easy to find as I had hoped, and I found many broken links. Of all people, I thought Google would be better at this. For example, their most up-to-date developer docs point to this broken link to read more about authentication and authorization. (OK, that was weird – as soon as I posted it here, the link started working. I guess you can all thank me for that. ;))

So this blog post is an attempt to document the process of getting reports out of Adsense and into my Django application.

In order to use Google’s API for accessing AdSense reports, you need the AdSense Management API. This API only supports OAuth, so you have to do the authentication flow in the browser at least once to get your credentials; you can then save those credentials so you have access going forward. To be honest, while I’ve heard about OAuth many times, I had never actually needed to use it until now. I’m learning as I go, so feel free to leave a comment and point out any misunderstandings I might have.

As I understand it, Google has one large API for their various products. Before you can talk to AdSense, you have to register your application through the Google API console. I registered my application; since I don’t have a live URL yet, I used my development URL (localhost:8000) for now, and it seemed to work just fine. Download the JSON file with the link provided.
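The downloaded file (typically named client_secrets.json) follows oauth2client’s client secrets format. Trimmed to the relevant fields, it looks roughly like this – all values here are placeholders:

{
  "web": {
    "client_id": "your-client-id.apps.googleusercontent.com",
    "client_secret": "your-client-secret",
    "redirect_uris": ["http://localhost:8000/adsense/oauth2callback/"],
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://accounts.google.com/o/oauth2/token"
  }
}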

Also, while you’re managing your APIs, go to the Services tab and turn on the AdSense Management API if you have not already done so. Otherwise, when you try to make a request you will just get an error message that says “Access Not Configured”.

Google has created a client library for Python, which is easily installed with pip. They also have a Django sample project that uses this library to go through the OAuth2 handshake. I think it was written for Django 1.1 (Django 1.5 was just released as of this writing), so it is a bit out of date, but it helps greatly as a starting point.

My app is simple. I just want to read in the amount of revenue on a given day and store it in my local database.

I created a new app in my Django project called ‘adsense’, with a models.py file to store credentials:

from django.contrib.auth.models import User
from django.db import models
from oauth2client.django_orm import CredentialsField

class Credential(models.Model):
    id = models.ForeignKey(User, primary_key=True)
    credential = CredentialsField()

class Revenue(models.Model):
    date = models.DateField(unique=True)
    revenue = models.DecimalField(max_digits=7, decimal_places=2)

    def __unicode__(self):
        return '{0} ${1}'.format(self.date, self.revenue)

I put the JSON file I downloaded from the API console in my app folder and created the following views.py:

import os

from django.conf import settings
from django.contrib.auth.decorators import login_required
from django.contrib.sites.models import Site
from django.http import HttpResponseBadRequest, HttpResponse
from django.http import HttpResponseRedirect
from oauth2client import xsrfutil
from oauth2client.client import flow_from_clientsecrets
from oauth2client.django_orm import Storage

from .models import Credential


CLIENT_SECRETS = os.path.join(os.path.dirname(__file__), 'client_secrets.json')

FLOW = flow_from_clientsecrets(
    CLIENT_SECRETS,
    scope='https://www.googleapis.com/auth/adsense.readonly',
    redirect_uri='http://{0}/adsense/oauth2callback/'.format(
        Site.objects.get_current().domain))


@login_required
def index(request):
    storage = Storage(Credential, 'id', request.user, 'credential')
    credential = storage.get()
    if credential is None or credential.invalid is True:
        FLOW.params['state'] = xsrfutil.generate_token(
            settings.SECRET_KEY, request.user)
        # force approval prompt in order to get refresh_token
        FLOW.params['approval_prompt'] = 'force'
        authorize_url = FLOW.step1_get_authorize_url()
        return HttpResponseRedirect(authorize_url)
    else:
        return HttpResponse('Validated.')


@login_required
def auth_return(request):
    if not xsrfutil.validate_token(
            settings.SECRET_KEY, request.REQUEST['state'], request.user):
        return HttpResponseBadRequest()
    credential = FLOW.step2_exchange(request.REQUEST)
    storage = Storage(Credential, 'id', request.user, 'credential')
    storage.put(credential)
    return HttpResponseRedirect("/adsense/")

Note the line in index() that sets FLOW.params['approval_prompt'] = 'force'. I was having problems with “invalid_grant” errors because it seemed my credentials would expire, and I’d have to go through the OAuth2 handshake every morning. I learned after much research that I wasn’t getting a refresh_token back. I found this tip on StackOverflow explaining how to get it, and that line seemed to fix the problem.

In my main urls.py file I include a link to my app urls file:

main urls.py:

from django.conf.urls import patterns, include, url
from django.contrib import admin

admin.autodiscover()

urlpatterns = patterns(
    '',
    url(r'^adsense/', include('adsense.urls', namespace='adsense')),

    url(r'^admin/doc/', include('django.contrib.admindocs.urls')),
    url(r'^admin/', include(admin.site.urls)),
)

adsense/urls.py:

from django.conf.urls import patterns, url

urlpatterns = patterns(
    'adsense.views',
    url(r'^$', 'index', name='index'),
    url(r'^oauth2callback/$', 'auth_return', name='auth_return'),
)

Lastly, I have a class that makes the call to the API to get revenue for given dates. This is located in adsense/tasks.py as I will likely hook this up soon to run as a task with Celery/RabbitMQ.

import datetime
import httplib2

from apiclient.discovery import build
from celery.task import PeriodicTask
from django.contrib.auth.models import User
from oauth2client.django_orm import Storage

from .models import Credential, Revenue


TODAY = datetime.date.today()
YESTERDAY = TODAY - datetime.timedelta(days=1)


class GetReportTask(PeriodicTask):
    run_every = datetime.timedelta(minutes=2)

    def run(self, *args, **kwargs):
        scraper = Scraper()
        scraper.get_report()


class Scraper(object):
    def get_report(self, start_date=YESTERDAY, end_date=TODAY):
        user = User.objects.get(pk=1)
        storage = Storage(Credential, 'id', user, 'credential')
        credential = storage.get()
        if credential is not None and not credential.invalid:
            http = httplib2.Http()
            http = credential.authorize(http)
            service = build('adsense', 'v1.2', http=http)
            reports = service.reports()
            report = reports.generate(
                startDate=start_date.strftime('%Y-%m-%d'),
                endDate=end_date.strftime('%Y-%m-%d'),
                dimension='DATE',
                metric='EARNINGS',
            )
            data = report.execute()
            for row in data['rows']:
                date = row[0]
                revenue = row[1]

                try:
                    record = Revenue.objects.get(date=date)
                except Revenue.DoesNotExist:
                    record = Revenue()
                record.date = date
                record.revenue = revenue
                record.save()
        else:
            print 'Invalid Adsense Credentials'

To make it work, I go to http://localhost:8000/adsense/. I’m prompted to log in to my Google account and to authorize my app for AdSense access. The credentials are then stored in my local database, and I can call my Scraper’s get_report() method. Congratulations to me, it worked!

I’ve been putting some time into updating an old site this weekend. I noticed that the homepage was taking a long time to load – around 5 to 8 seconds. Not good.

I tried caching queries but it didn’t help at all. Then I realized it was most likely due to my decision long ago to use textile to render text to html.

The site is located at direct-vs-dish.com; it essentially compares DIRECTV to DISH Network. The home page lists a number of features, and each feature is a database record. Here is my original model for the features:

class Feature(models.Model):
    category = models.CharField(max_length=255)
    slug = models.SlugField()
    overview = models.TextField(blank=True, null=True)
    dish = models.TextField(blank=True, null=True)
    directv = models.TextField(blank=True, null=True)
    dish_link = models.URLField(blank=True, null=True)
    directv_link = models.URLField(blank=True, null=True)
    order = models.PositiveSmallIntegerField()

    def __unicode__(self):
        return self.category

    class Meta:
        ordering = ['order']

Three of the above fields use textile: overview, dish, & directv. I currently have 14 feature records. So that is a potential of 42 textile conversions for the home page.

In order to cache these textile conversions, I added three new fields. I then added a save method to populate the cached html fields. My model now looks like this:

from django.contrib.markup.templatetags.markup import textile


class Feature(models.Model):
    category = models.CharField(max_length=255)
    slug = models.SlugField()
    overview = models.TextField(blank=True, null=True)
    overview_html = models.TextField(blank=True)
    dish = models.TextField(blank=True, null=True)
    dish_html = models.TextField(blank=True)
    directv = models.TextField(blank=True, null=True)
    directv_html = models.TextField(blank=True)
    dish_link = models.URLField(blank=True, null=True)
    directv_link = models.URLField(blank=True, null=True)
    order = models.PositiveSmallIntegerField()
    
    def __unicode__(self):
        return self.category

    def save(self, **kwargs):
        # render and cache the HTML so templates don't re-run textile
        self.overview_html = textile(self.overview)
        self.dish_html = textile(self.dish)
        self.directv_html = textile(self.directv)
        return super(Feature, self).save(**kwargs)
        
    class Meta:
        ordering = ['order']
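One gotcha with this approach: existing records won’t have the cached HTML until they are saved again. A one-time backfill from the Django shell takes care of that (the import path here is hypothetical; use your project’s):

from comparison.models import Feature  # hypothetical app path

for feature in Feature.objects.all():
    feature.save()  # runs the textile conversions and stores the *_html fields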

I use the Django admin to edit features so I added some styling to hide the cached html fields with an option to show them if you want to see what has been converted and cached.

class FeatureAdmin(admin.ModelAdmin):
    list_display = ('category', 'order')
    prepopulated_fields = {"slug": ("category",)}
    fieldsets = (
        (None, {
            'fields': ('category', 'slug', 'overview', 'dish', 'dish_link',
                       'directv', 'directv_link', 'order')
        }),
        ('Auto Generated', {
            'classes': ('collapse',),
            'fields': ('overview_html', 'dish_html', 'directv_html'),
        }),
    )
admin.site.register(Feature, FeatureAdmin)

My template tags went from this:

{{ feature.overview|textile }}

To this:

{{ feature.overview_html|safe }}

This has dropped my homepage rendering time to about 750ms, and that is without any caching of queries. Huge win!
