tkbe

December 17, 2009

django :: Http404 handling

Filed under: django — tb @ 5:12 am

Just like "everyone" else I've written a custom CMS (Content Management System) that we use internally. One can discuss the merits of writing your own vs. using something like Plone, but given a boss who sometimes wants it "just so" I felt more comfortable with a system I wrote myself. Creating custom "portals" is one of the services we offer, so it's also part of our core business.

Today I wanted to learn more about the pages that people request but that we don't serve. The usual causes are dead links in bookmarks or emails, mistyped URLs, and search engines running amok. Sometimes there is an obvious alternative page that could be displayed, e.g. if a page has moved.

Our urls.py file has this as its last rule:

PYTHON:
    (r'^(?P<path>.+)$', views.page),

The page view starts by finding the correct page (we use separate settings files for the various websites we run):

PYTHON:
    webpage = get_object_or_404(Page, site=settings.WEBSITE, path=path)

The Page model keeps track of which version of resources should be displayed, which permissions (if any) are required to view the page, keywords, description, title, etc.
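As a quick check of what the catch-all rule at the end of urls.py hands to the view, here is a minimal sketch using only the standard re module. The helper name captured_path is hypothetical; the pattern is the one from the rule above.

```python
import re

# The same pattern as the last urls.py rule. Django matches it against the
# request path with the leading slash already stripped, and the query string
# is not part of what gets matched.
CATCH_ALL = re.compile(r'^(?P<path>.+)$')

def captured_path(url_path):
    """Return what the catch-all rule would pass to the view as `path`."""
    match = CATCH_ALL.match(url_path)
    return match.group('path') if match else None

print(captured_path('foo.html'))      # foo.html
print(captured_path('old/page.asp'))  # old/page.asp
print(captured_path(''))              # None (the empty path needs its own rule)
```

Because the query string never reaches the pattern, a view that cares about it has to read it from the request itself.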

Changing get_object_or_404 to our own function, we get:

PYTHON:
    webpage = select_page(request, path)

The settings are global, so we don't need to pass settings.WEBSITE as an argument. The select_page function uses the Redirect model:

PYTHON:
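    from django.contrib import admin
    from django.db import models

    class Redirect(models.Model):
        website = models.PositiveIntegerField()
        path = models.CharField(max_length=240)
        # null=True because select_page stores None for "no target chosen yet"
        redirect = models.CharField(max_length=240, default='/', null=True, blank=True)
        keywords = models.CharField(max_length=240, null=True, blank=True)
        counter = models.PositiveIntegerField(default=0, blank=True)

    # Django 1.0 and later ignore an inner "class Admin"; options go on a ModelAdmin.
    class RedirectAdmin(admin.ModelAdmin):
        list_display = 'id website path redirect counter keywords'.split()
        search_fields = 'path redirect keywords'.split()
        list_filter = ['website']

    admin.site.register(Redirect, RedirectAdmin)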

Finally, the select_page function:

PYTHON:
    import time

    from django import http
    from django.conf import settings

    # Page, Redirect, normpath and bjorn come from our own code base.

    def select_page(request, path):
        orig_path = path
        path = normpath(request, path)  # make sure there is a slash at the end

        try:
            webpage = Page.objects.get(site=settings.WEBSITE, path=path)

        except Page.DoesNotExist:
            referer = request.META.get('HTTP_REFERER')

            # to handle paths like /page.asp?page=454 we need to make the query string significant...
            q = request.META.get('QUERY_STRING')
            url = orig_path
            if q:
                url += '?' + q

            if not referer:
                # probably search engine...
                # new rows start with redirect=None: the model default is "/",
                # which might have been a mistake...
                r, created = Redirect.objects.get_or_create(
                    website=settings.WEBSITE, path=url,
                    defaults={'redirect': None})
                if not created:
                    if r.redirect is not None:
                        return select_page(request, r.redirect)

                    r.counter += 1
                    r.save()
                    time.sleep(min(9, 2 * r.counter))  # bad spider!

                raise http.Http404

            # someone probably linked to our old site... try to find some
            # useful content for 'em...
            try:
                r = Redirect.objects.get(website=settings.WEBSITE, path=url,
                                         redirect__isnull=False)
                return select_page(request, r.redirect)

            except Redirect.DoesNotExist:
                # we could use the same strategy as above (simply inserting into the
                # Redirect table), however I like to handle these manually since they
                # are usually a sign that something is very wrong...
                # The bjorn (I'm bjorn, btw.) module makes it trivially easy to send
                # email to myself -- in its simplest form: bjorn.email('hello world')
                bjorn.email(repr(request),
                            subject="missing-page[linked]: %s" % url,
                            use_snippet=False)
                raise http.Http404

        return webpage
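The two small calculations inside select_page, the lookup key that includes the query string and the capped delay for repeat misses, can be tried in isolation. This is a minimal sketch; the helper names lookup_url and spider_delay are hypothetical and not part of the code above.

```python
def lookup_url(path, query_string):
    """Build the Redirect lookup key: the path plus any query string."""
    if query_string:
        return path + '?' + query_string
    return path

def spider_delay(counter):
    """Seconds to sleep after the counter-th repeat miss, capped at 9."""
    return min(9, 2 * counter)

print(lookup_url('page.asp', 'page=454'))      # page.asp?page=454
print(lookup_url('foo.html', ''))              # foo.html
print([spider_delay(n) for n in range(1, 7)])  # [2, 4, 6, 8, 9, 9]
```

The delay grows by two seconds per repeat miss and levels off at nine seconds from the fifth miss onwards.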

The astute reader will notice that we're in fact not redirecting. If a visitor goes to http://example.com/foo.html and the Redirect table contains an entry with "foo.html" -> "/", then the visitor's browser will still show the URL they typed, but with the content from '/'. While not necessarily a bug, it doesn't help your page statistics, and I'll probably end up changing it later.
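For comparison, here is a minimal, framework-free sketch of what a real redirect would look like. The dictionaries PAGES and REDIRECTS are hypothetical in-memory stand-ins for the Page and Redirect tables, and resolve is an illustrative name, not the function used on our site.

```python
# Hypothetical in-memory stand-ins for the Page and Redirect tables.
PAGES = {'/': 'front page', '/about/': 'about us'}
REDIRECTS = {'foo.html': '/', 'old-about.html': '/about/'}

def resolve(path):
    """Return (status, value): 200 with content, 301 with a target, or 404."""
    if path in PAGES:
        return 200, PAGES[path]
    target = REDIRECTS.get(path)
    if target is not None:
        # Send the browser to the new address instead of serving the
        # target's content under the old URL.
        return 301, target
    return 404, None

print(resolve('foo.html'))   # (301, '/')
print(resolve('/about/'))    # (200, 'about us')
print(resolve('nope.html'))  # (404, None)
```

In a Django view, the 301 branch would roughly correspond to returning django.http.HttpResponsePermanentRedirect(target), so the address bar and the page statistics both end up with the new URL.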

December 13, 2009

Microformats: Boon or Bane?

Filed under: html — tb @ 4:25 am

Jeff Atwood recently wrote a blog post with the title "Microformats: Boon or Bane?", which inspired me to implement the hCard microformat on our pages.

As Jeff mentions, it's not entirely trivial to get right, but it's not rocket surgery either.

The main feeling I'm left with is that it is very US-centric, even as it strives not to be. I was left doing hacks like these:

HTML:
    mobile:
      <span class="tel">
        <span class="type" style="display:none">cell</span>
        <span class="value">+47 12 34 56 78</span>
      </span>

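Notice the "mobile:" and the "display:none": hCard's type vocabulary only knows "cell", so the label our visitors expect goes in plain text while the machine-readable type is hidden.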
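If the same hack is needed in several places, it could be generated by a small helper. This is only a sketch: render_tel and the VISIBLE_LABELS mapping are hypothetical names, and the mapping contains just the one label used above.

```python
from html import escape

# Assumed mapping from hCard "type" values to the label shown to visitors.
VISIBLE_LABELS = {'cell': 'mobile'}

def render_tel(tel_type, number):
    """Render one hCard tel entry with a local label and a hidden type."""
    label = VISIBLE_LABELS.get(tel_type, tel_type)
    return (
        '%s:\n'
        '<span class="tel">\n'
        '  <span class="type" style="display:none">%s</span>\n'
        '  <span class="value">%s</span>\n'
        '</span>' % (escape(label), escape(tel_type), escape(number))
    )

print(render_tel('cell', '+47 12 34 56 78'))
```

Types without an entry in the mapping fall back to showing the hCard type name itself as the label.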

With the Operator extension loaded in Firefox, the result is definitely 1337.
