Django :: International characters in urls
I remember looking at putting Norwegian characters in urls a while ago, and giving up in disgust. I'm not talking about hostnames here; those actually work relatively well (e.g. http://båtførerprøven.norsktest.no/). What I'm talking about is something like http://www.norsktest.no/båtførerprøven. I haven't found a particularly pretty solution, but the one below seems to work on IE6, IE7, FF1.5, FF3, Opera9, and Safari3.1...
The solution is for a cms I'm writing for internal use, so almost all urls need to be looked up in a database. Here's the relevant line from urls.py:

    (r'^(?P<path>(/[^/]+)+)', views.view_page),

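To see what that pattern actually captures, here is a small sketch that runs the same regex through Python's re module, outside Django. It is written for Python 3, and it assumes the string handed to the pattern starts with a slash (how that comes about depends on how this urls.py is included, which isn't shown here). The helper name captured_path is made up for the illustration.

```python
import re

# The pattern from urls.py, compiled on its own (no Django involved).
PATTERN = re.compile(r'^(?P<path>(/[^/]+)+)')


def captured_path(url_path):
    """Return what the 'path' group captures, or None if nothing matches."""
    match = PATTERN.match(url_path)
    return match.group('path') if match else None


for candidate in ('/båtførerprøven', '/kurs/båtførerprøven/', 'no-leading-slash'):
    print(candidate, '->', captured_path(candidate))
```

Each segment has to be a slash followed by at least one non-slash character, so a trailing / is left out of the captured group, and a string without a leading slash doesn't match at all.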
And here's the view code:

    from django import http
    from django import shortcuts as dj  # assumption: 'dj' is an alias for django.shortcuts
    # Page is the cms's page model; its import isn't shown here.

    def normpath(path):
        # path data in the database is lowercase, with hyphens, and without a trailing /
        path = path.lower().replace(' ', '-')
        if path.endswith('/'):
            path = path[:-1]

        # encoding machinery needed to support FF1.5
        # We're taking advantage of the fact that non-ascii characters
        # in the Latin-1 encoding are (almost always) not valid UTF-8
        try:
            # if we just have ascii, or we have extended urls already
            # encoded in utf-8...
            tmp = path.decode('u8')
            # ... then all is well in the universe (skip to end).
        except UnicodeError:
            # if we get here we could have garbage, or we could
            # be on FF1.5 with Norwegian characters...
            try:  # check for possible latin-1 encoding
                tmp = path.decode('l1')
                path = tmp.encode('u8')
                # whoo!
            except UnicodeError:
                raise http.Http404  # Garbage not found on our site.

        return path


    def view_page(request, path):
        path = normpath(path)
        webpage = dj.get_object_or_404(Page, path=path)
        ...
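The view code above is Python 2, where the path arrives as a byte string. For anyone who wants to poke at the decode trick in isolation, here is a rough Python 3 sketch of the same idea working on bytes. The function name to_utf8 is mine, not part of the cms, and it skips the lowercasing and the 404 handling.

```python
def to_utf8(raw):
    """Return raw as UTF-8 bytes, treating input that isn't valid UTF-8 as Latin-1."""
    try:
        raw.decode('utf-8')  # plain ascii or already UTF-8: leave it alone
        return raw
    except UnicodeDecodeError:
        # Latin-1 assigns a character to every byte value, so this decode
        # cannot fail for byte input.
        return raw.decode('latin-1').encode('utf-8')


word = 'båtførerprøven'
as_utf8 = word.encode('utf-8')
as_latin1 = word.encode('latin-1')

print(to_utf8(as_utf8) == as_utf8)    # UTF-8 input comes back unchanged
print(to_utf8(as_latin1) == as_utf8)  # Latin-1 input is re-encoded as UTF-8
```

The guess isn't watertight: a Latin-1 sequence such as 'Ã¥' produces the bytes C3 A5, which are also valid UTF-8 (for 'å'), so it would be taken as UTF-8. For ordinary Norwegian words that kind of collision seems unlikely.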
Lots of fun urls are possible with this scheme. However, IE7 seems to require a trailing / in the url to keep it from mangling the entered text into %XX codes (the look-up works regardless).
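For reference, this is roughly what those %XX codes look like for the same word in the two encodings discussed above. It's a Python 3 sketch using the standard urllib.parse module; which form a particular browser actually puts on the wire is something I'm assuming from the behaviour described here, not something the snippet checks.

```python
from urllib.parse import quote, unquote_to_bytes

word = 'båtførerprøven'

utf8_form = quote(word)                        # quote() encodes as UTF-8 by default
latin1_form = quote(word, encoding='latin-1')  # what a Latin-1 client would produce

print(utf8_form)    # b%C3%A5tf%C3%B8rerpr%C3%B8ven
print(latin1_form)  # b%E5tf%F8rerpr%F8ven

# Undoing the percent-encoding gives back the raw bytes in each encoding.
print(unquote_to_bytes(utf8_form) == word.encode('utf-8'))
print(unquote_to_bytes(latin1_form) == word.encode('latin-1'))
```

Either way the server ends up with raw bytes once the %XX codes are undone, which is why the look-up can work regardless of how the address bar displays the url.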