django :: Http404 handling

Just like “everyone” else, I’ve written a custom CMS (Content Management System) that we use internally. One can debate the merits of writing your own vs. using something like Plone, but given a boss who sometimes wants it “just so”, I felt more comfortable with a system I wrote myself. Creating custom “portals” is one of the services we offer, so it’s also part of our core business.

Today I wanted to find out which pages people are looking for that we don’t serve. The usual causes are dead links in bookmarks or emails, mistyped URLs, and search engines running amok. Sometimes there is an obvious alternative page that could be displayed, e.g. if a page has moved.

Our urls.py file has this as its last rule:

      (r'^(?P<path>.+)$', views.page),
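
As a quick sanity check of what this rule hands to the view, the pattern can be tried outside Django with the standard re module. This is only a sketch: captured_path is a hypothetical helper, and it assumes the leading slash has already been stripped before matching, as Django does for URL patterns.

```python
import re

# The same pattern as the last rule in urls.py.
CATCH_ALL = re.compile(r'^(?P<path>.+)$')


def captured_path(url_path):
    """Return the value the rule would pass as `path`, or None if it does not match."""
    match = CATCH_ALL.match(url_path)
    return match.group('path') if match else None


if __name__ == '__main__':
    for candidate in ['foo.html', 'old/page.asp', '']:
        print(repr(candidate), '->', repr(captured_path(candidate)))
```

Note that the empty string (the site root) does not match, and that the query string never reaches the pattern, which is why select_page reads it from request.META.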

and the page view starts by finding the correct page (we use a separate settings file for each website we run):

      webpage = get_object_or_404(Page, site=settings.WEBSITE, path=path)

The Page model keeps track of which version of resources should be displayed, which permissions (if any) are required to view the page, keywords, description, title, etc.

Replacing get_object_or_404 with our own function, we get:

      webpage = select_page(request, path)

The settings are global, so we don’t need to pass settings.WEBSITE as an argument. The select_page function uses the Redirect model:

      class Redirect(models.Model):
          website = models.PositiveIntegerField()
          path = models.CharField(max_length=240)
          # nullable, because select_page stores None for "no target chosen yet"
          redirect = models.CharField(max_length=240, default='/', null=True, blank=True)
          keywords = models.CharField(max_length=240, null=True, blank=True)
          counter = models.PositiveIntegerField(default=0, blank=True)
       
          class Admin:
              list_display = 'id website path redirect counter keywords'.split()
              search_fields = 'path redirect keywords'.split()
              list_filter = ['website']

Finally, the select_page function:

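      import time

      from django import http
      from django.conf import settings

      # Page and Redirect are the models described above; normpath and bjorn
      # are our own helpers and are not shown here.

      def select_page(request, path):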
          orig_path = path
          path = normpath(request, path)  # make sure there is a slash at the end
       
          try:
              webpage = Page.objects.get(site=settings.WEBSITE, path=path)
             
          except Page.DoesNotExist:
              referer = request.META.get('HTTP_REFERER')
       
              # to handle paths like /page.asp?page=454 we need to make the query string significant...
              q = request.META.get('QUERY_STRING')
              url = orig_path
              if q:
                  url += '?' + q
             
              if not referer:
                  # probably search engine...
                  r, created = Redirect.objects.get_or_create(website=settings.WEBSITE, path=url)
                  if created:
                      r.redirect = None  # default is "/" which might have been a mistake...
                      r.save()
                  else:
                      if r.redirect is not None:
                          return select_page(request, r.redirect)
                     
                      r.counter += 1
                      r.save()
                      time.sleep(min(9, 2*r.counter)) # bad spider!
       
                  raise http.Http404
       
              # someone probably linked to our old site... try to find some
              # useful content for 'em...
              try:
                  r = Redirect.objects.get(website=settings.WEBSITE, path=url,
                                           redirect__isnull=False)
                  return select_page(request, r.redirect)
             
              except Redirect.DoesNotExist:
                  # we could use the same strategy as above (simply inserting into the
                  # Redirect table), however I like to handle these manually since they
                  # are usually a sign that something is very wrong...
                  # The bjorn (I'm bjorn, btw.) module makes it trivially easy to send
                  # email to myself -- in its simplest form: bjorn.email('hello world')
                  bjorn.email(repr(request),
                              subject="missing-page[linked]: %s" % url,
                              use_snippet=False)
                  raise http.Http404
             
          return webpage
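
One thing to watch in the recursion: if the Redirect table ever contains a loop (say “a” pointing to “b” and “b” back to “a”, with neither being a real page), select_page would keep calling itself until Python gives up. Below is a minimal sketch of a guard, using a plain set and dict in place of the Page and Redirect tables. The names follow_redirects, RedirectLoop and max_hops are made up for illustration and are not part of the code above.

```python
class RedirectLoop(Exception):
    """Raised when the redirect table sends us in circles or too far."""


def follow_redirects(path, pages, redirects, max_hops=5):
    """Follow `redirects` from `path` until a known page is reached.

    `pages` is a set of paths that exist; `redirects` maps a missing path
    to its replacement (or to None when no replacement has been chosen).
    Returns the final path, or None if the chain ends without a page.
    """
    seen = set()
    while path not in pages:
        # Stop on a repeated path (a loop) or after max_hops hops.
        if path in seen or len(seen) >= max_hops:
            raise RedirectLoop(path)
        seen.add(path)
        path = redirects.get(path)
        if path is None:
            return None
    return path
```

In the real view, the equivalent would presumably be an extra argument to select_page that counts hops, with a 404 raised once the limit is hit.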

The astute reader will notice that we’re not actually redirecting. If a visitor goes to http://example.com/foo.html and the Redirect table contains an entry mapping “foo.html” to “/”, the visitor’s browser still shows the URL they typed, but with the content from “/”. While not necessarily a bug, it doesn’t help your page statistics, and I’ll probably end up changing it later…
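
If real redirects are wanted, one option is to have the lookup report what it found and let the view decide whether to render a page or send the browser elsewhere. The sketch below is framework-free so it can be run as is. The function resolve and its three outcome labels are hypothetical, and the set and dict stand in for the Page and Redirect tables. In a Django view the 'redirect' outcome would presumably be turned into an http.HttpResponseRedirect (or HttpResponsePermanentRedirect) response.

```python
def resolve(path, pages, redirects):
    """Classify a requested path.

    Returns ('page', path) when the page exists, ('redirect', target) when
    the redirect table names a replacement, and ('missing', None) otherwise.
    """
    if path in pages:
        return ('page', path)
    target = redirects.get(path)
    if target is not None:
        return ('redirect', target)
    return ('missing', None)


if __name__ == '__main__':
    pages = {'/', 'about/'}
    redirects = {'foo.html': '/', 'page.asp?page=454': 'about/'}
    for requested in ['about/', 'foo.html', 'page.asp?page=454', 'nope']:
        print(requested, '->', resolve(requested, pages, redirects))
```

With an actual HTTP redirect, both the browser’s address bar and the access logs show the new URL, which takes care of the statistics problem.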
