Bitrot

My blog had a little crash, and is being restored from “backup”…

Posted in Uncategorized | Leave a comment

django :: Http404 handling

Just like “everyone” else I’ve written a custom CMS (Content Management System) that we use internally. One can discuss the merits of writing your own vs. using something like Plone, but given a boss who sometimes wants it “just so” I felt more comfortable with a system I wrote myself. Creating custom “portals” is one of the services we offer, so it’s also part of our core business.

Today I wanted to learn more about pages that we don’t serve but that someone is nevertheless looking for. The usual causes are dead links in bookmarks or emails, mistyped URLs, and search engines running amok. Sometimes there is an obvious alternative page that could be displayed, e.g. when a page has moved.

Our urls.py file has this as its last rule

[sourcecode language=”Python”]
(r'^(?P<path>.+)$', views.page),
[/sourcecode]

and the page view starts by finding the correct page (we use separate settings files for the various websites we run)

[sourcecode language=”Python”]
webpage = get_object_or_404(Page, site=settings.WEBSITE, path=path)
[/sourcecode]

The Page model keeps track of which version of resources should be displayed, which permissions (if any) are required to view the page, keywords, description, title, etc.
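
To make the catch-all rule concrete, here is a minimal sketch that uses only the standard re module. It assumes that the leading slash has already been removed before the pattern is tried, which is how Django’s resolver treats a urls.py rule like this one; the helper name captured_path is made up for this illustration and is not part of the CMS.

```python
import re

# The same pattern as the last rule in urls.py.
CATCH_ALL = re.compile(r'^(?P<path>.+)$')


def captured_path(request_path):
    """Return what the page view would receive as `path`, or None if the rule does not match."""
    # Assumption: the resolver drops one leading slash before matching.
    if request_path.startswith('/'):
        request_path = request_path[1:]
    match = CATCH_ALL.match(request_path)
    return match.group('path') if match else None
```

Since .+ needs at least one character, the root URL never reaches this rule, and the query string is not part of the matched path.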

Changing get_object_or_404 to our own function, we get

[sourcecode language=”Python”]
webpage = select_page(request, path)
[/sourcecode]

The settings are global, so we don’t need to pass settings.WEBSITE as an argument. The select_page function uses the Redirect model:

[sourcecode language=”Python”]
from django.db import models

class Redirect(models.Model):
    website = models.PositiveIntegerField()
    path = models.CharField(max_length=240)
    # null=True because select_page() stores None for "no target chosen yet"
    redirect = models.CharField(max_length=240, default='/', null=True, blank=True)
    keywords = models.CharField(max_length=240, null=True, blank=True)
    counter = models.PositiveIntegerField(default=0, blank=True)

    class Admin:
        list_display = 'id website path redirect counter keywords'.split()
        search_fields = 'path redirect keywords'.split()
        list_filter = ['website']
[/sourcecode]

Finally, the select_page function:

[sourcecode language=”Python”]
import time

from django import http

# Page, Redirect, normpath and bjorn are project-specific and imported elsewhere.

def select_page(request, path):
    orig_path = path
    path = normpath(request, path)  # make sure there is a slash at the end

    try:
        webpage = Page.objects.get(site=settings.WEBSITE, path=path)

    except Page.DoesNotExist:
        referer = request.META.get('HTTP_REFERER')

        # to handle paths like /page.asp?page=454 we need to make the query string significant…
        q = request.META.get('QUERY_STRING')
        url = orig_path
        if q:
            url += '?' + q

        if not referer:
            # probably search engine…
            r, created = Redirect.objects.get_or_create(website=settings.WEBSITE, path=url)
            if created:
                r.redirect = None  # default is "/" which might have been a mistake…
                r.save()
            else:
                if r.redirect is not None:
                    return select_page(request, r.redirect)

            r.counter += 1
            r.save()
            time.sleep(min(9, 2 * r.counter))  # bad spider!

            raise http.Http404

        # someone probably linked to our old site… try to find some
        # useful content for 'em…
        try:
            r = Redirect.objects.get(website=settings.WEBSITE, path=url,
                                     redirect__isnull=False)
            return select_page(request, r.redirect)

        except Redirect.DoesNotExist:
            # we could use the same strategy as above (simply inserting into the
            # Redirect table), however I like to handle these manually since they
            # are usually a sign that something is very wrong…
            # The bjorn (I'm bjorn, btw.) module makes it trivially easy to send
            # email to myself; in its simplest form: bjorn.email('hello world')
            bjorn.email(repr(request),
                        subject="missing-page[linked]: %s" % url,
                        use_snippet=False)
            raise http.Http404

    return webpage
[/sourcecode]
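
Two details of select_page are easy to miss: the lookup key for the Redirect table includes the query string, and the delay for requests without a referer grows with the hit counter but is capped at 9 seconds. Below is a minimal sketch of just those two computations, free of Django so it can be tried in a plain interpreter. The function names redirect_key and spider_delay are mine for this illustration and do not exist in the CMS.

```python
def redirect_key(path, query_string):
    """Build the Redirect.path lookup key; the query string is significant."""
    if query_string:
        return path + '?' + query_string
    return path


def spider_delay(counter):
    """Seconds to sleep before answering 404: 2 s per recorded hit, at most 9 s."""
    return min(9, 2 * counter)
```

For the first six hits on the same missing URL the delays come out as 2, 4, 6, 8, 9 and 9 seconds.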

The astute reader will notice that we’re in fact not redirecting. If a visitor goes to http://example.com/foo.html and the Redirect table contains an entry with “foo.html” -> “/”, then the visitor’s browser will show the url he typed and the content from ‘/’. While not necessarily a bug, it doesn’t help your page statistics, and I’ll probably end up changing it later…
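
If a real redirect is wanted, one option is to resolve the chain of Redirect entries first and then answer with an HTTP redirect to the final target, instead of rendering that target in place. Here is a rough sketch of the resolution step, assuming the table has been loaded into a plain dict; the names resolve_redirect and max_hops are hypothetical. In the view, a non-None result would be handed to HttpResponseRedirect.

```python
def resolve_redirect(redirects, url, max_hops=10):
    """Follow redirect entries starting at `url`.

    `redirects` maps a requested path to its target, or to None when no
    target has been chosen yet. Returns the final target, or None when
    there is no usable redirect or the chain does not end within
    `max_hops` steps (most likely a loop).
    """
    current = url
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None:
            # No further entry: `current` is final, unless we never moved.
            return current if current != url else None
        current = target
    return None
```

The hop limit also guards against the endless recursion that two entries pointing at each other would otherwise cause.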


django :: implementing reCAPTCHA in 5 easy steps

It turns out reCAPTCHA is very easy to implement 🙂

Step #1: install the Python client:

[sourcecode language=”ps”]
easy_install recaptcha-client
[/sourcecode]

Step #2: obtain public and private keys here: http://recaptcha.net/whyrecaptcha.html.

Step #3: Then create a small helper module, mycaptcha.py, for convenience and to keep your keys in one place…

[sourcecode language=”Python”]
import recaptcha.client.captcha as rc

public_key = 'your-public-key-from-step-2-goes-here'
private_key = 'your-private-key-from-step-2-goes-here'

def displayhtml(use_ssl=False, error=None):
    # HTML snippet for the reCAPTCHA widget
    return rc.displayhtml(public_key, use_ssl, error)

def submit(request):
    # the form is submitted with POST, so the fields are read from request.POST
    return rc.submit(
        request.POST.get('recaptcha_challenge_field', ''),
        request.POST.get('recaptcha_response_field', ''),
        private_key,
        request.META.get('REMOTE_ADDR', ''))
[/sourcecode]

Step #4: create your Django view

[sourcecode language=”Python”]
from django.http import HttpResponseRedirect
from django.shortcuts import render_to_response

def captcha_test_view(request):
    import mycaptcha
    error = None

    if request.method == 'POST':
        response = mycaptcha.submit(request)
        if response.is_valid:
            return HttpResponseRedirect('success-url/')
        else:
            error = response.error_code

    # pass error by keyword; the first positional argument is use_ssl
    recaptcha = mycaptcha.displayhtml(error=error)

    return render_to_response('recaptcha_test.html', {'recaptcha': recaptcha})
[/sourcecode]

Step #5: include the following inside the form element of your html…

[sourcecode language=”Python”]
<form method=POST …>
{{ recaptcha|safe }}

<input type=submit>
</form>
[/sourcecode]

Posted in django | Leave a comment

django :: date filter cheat sheet

I can’t seem to memorize the list of format characters for the date filter, maybe because the documentation lists them in alphabetical order. The cheat sheet below is my attempt at a semantic grouping…

Time             seconds               s  Seconds, 2 digits (with leading zeros).
                 minutes               i  Minutes (with leading zeros).
                 hours (24 hr clock)   G  Hour, 24-hour format without leading zeros.
                                       H  Hour, 24-hour format with leading zeros.
                 hours (12 hr clock)   g  Hour, 12-hour format without leading zeros.
                                       h  Hour, 12-hour format with leading zeros.
                                       a  a.m. or p.m.
                                       A  AM or PM.
Date             year                  y  Year, 2 digits.
                                       Y  Year, 4 digits.
                                       z  Day of the year (0 – 365).
                 week                  W  ISO-8601 week number.
                                       w  Day number of the week (0=Sunday, 6=Saturday).
                 month                 n  Month number.
                                       m  Month number, 2 digits (leading zeros).
                                       b  Month name, lower case, 3 characters.
                                       M  Month name, 3 characters.
                                       F  Month name.
                 day                   j  Day of the month.
                                       d  Day of the month, 2 digits (leading zeros).
                                       D  Day name, 3 characters.
                                       l  Day name.
                                       z  Day of the year (0 – 365).
                                       w  Day number of the week (0=Sunday, 6=Saturday).
misc. metainfo                         t  Number of days in the given month.
                                       L  Leap year? (bool).
special formats                        r  RFC 2822 formatted date.
                                       U  Seconds since the Unix Epoch (1970-01-01 00:00:00).
                                       z  Day of the year (0 – 365).
time zone                              O  Difference to Greenwich time in hours.
                                       Z  Time zone offset in seconds. The offset for timezones west of UTC is always negative, and for those east of UTC is always positive.
                                       T  Time zone of this machine (‘EST’, ‘MDT’).
English only                           S  English ordinal suffix for day of the month (1st, 2nd).
                                       P  Time, in 12-hour hours, minutes and ‘a.m.’/‘p.m.’, with minutes left off if they’re zero and the special-case strings ‘midnight’ and ‘noon’ if appropriate. (Proprietary extension.)
                                       f  Time, in 12-hour hours and minutes, with minutes left off if they’re zero. (Proprietary extension.)
                                       N  Month abbreviation in Associated Press style. (Proprietary extension.)
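
For trying out a format string outside a template, a handful of these characters have direct strftime equivalents. The sketch below translates that subset only. The mapping table and the function name to_strftime are my own illustration and not part of Django; it ignores Django’s backslash escaping, rejects letters outside the subset rather than guessing, and the month and day names produced by strftime depend on the locale.

```python
from datetime import datetime

# Django date-filter character -> strftime directive (subset only).
DJANGO_TO_STRFTIME = {
    's': '%S', 'i': '%M', 'H': '%H', 'h': '%I',
    'y': '%y', 'Y': '%Y',
    'm': '%m', 'M': '%b', 'F': '%B',
    'd': '%d', 'D': '%a', 'l': '%A',
}


def to_strftime(django_format):
    """Translate a date-filter format string into a strftime format string."""
    parts = []
    for ch in django_format:
        if ch in DJANGO_TO_STRFTIME:
            parts.append(DJANGO_TO_STRFTIME[ch])
        elif ch.isalpha():
            # A format character this sketch has no equivalent for.
            raise ValueError('unsupported format character: %r' % ch)
        elif ch == '%':
            parts.append('%%')  # a literal percent sign must be escaped for strftime
        else:
            parts.append(ch)  # punctuation and spaces are copied as-is
    return ''.join(parts)


if __name__ == '__main__':
    print(datetime(2009, 3, 7, 14, 5, 9).strftime(to_strftime('Y-m-d H:i:s')))
```

Characters such as j, G or g have no portable strftime counterpart, which is why the sketch refuses them.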