tkbe

April 18, 2007

Python :: property set

Filed under: python — tb @ 11:11 am

It looks like Python is getting tuples with named members in 2.6 (http://www.oluyede.org/blog/2007/03/11/updates-from-python-svn-part-2/ and http://docs.python.org/dev/lib/named-tuple-factory.html). I suspect many of us have implemented similar functionality ourselves; for example, Shannon -jj Behrens describes how he sometimes uses dictionaries to return composite polymorphic values (http://jjinux.blogspot.com/2007/03/python-returning-multiple-things-of.html). The problem with dictionaries, of course, is that they require too much exercise of your little finger typing ['xx']. That's even worse for me, since I use a keyboard layout that puts national characters on those keys when I tap the caps-lock key, so I can use my American-keyboard touch-typing skillz and eat my national characters too (I'm looking forward to the Metaphor-off!)
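
For comparison, here is a small sketch in current Python of the two styles mentioned above (the function names are made up for illustration): a dict return value, which callers have to subscript with quoted keys, and a collections.namedtuple return value, which gives attribute access but a fixed set of fields.

```python
from collections import namedtuple


def stats_as_dict(values):
    # dict-based composite return value: callers write result['mean']
    return {'count': len(values), 'mean': sum(values) / float(len(values))}


# namedtuple-based return value: callers write result.mean
Stats = namedtuple('Stats', ['count', 'mean'])


def stats_as_namedtuple(values):
    return Stats(count=len(values), mean=sum(values) / float(len(values)))


if __name__ == '__main__':
    d = stats_as_dict([1, 2, 3])
    s = stats_as_namedtuple([1, 2, 3])
    print(d['mean'], s.mean)   # both are 2.0
    try:
        s.median = 2           # namedtuple fields are fixed at creation time
    except AttributeError:
        print('cannot add a field to a namedtuple')
```

The namedtuple version reads nicely, but trying to attach an extra field after creation raises AttributeError, which is exactly the restriction discussed next.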

It looks like the new Python NamedTuple type is going to limit the fields to those defined at creation time. It's based on a tuple, so I suppose that follows naturally; however, it doesn't seem natural for the abstract data type of a container of named fields with iteration and indexing. I've called my implementation of this ADT a property set, since most of the motivating use cases were functions returning values that had properties attached to them. It's used as follows...

You can assign to arbitrary fields. The only limitation is that field names cannot start with an underscore, but public fields wouldn't anyway, so it's not really a limitation (the limitation comes from the fact that the implementation overrides __setattr__, and treating names that start with an underscore as internal to the implementation simplifies things quite a bit):

PYTHON:
    >>> p = pset()
    >>> p.a = 42
    >>> p.b = 'hello'
    >>> p.c = [p.a, p.b]
    >>> p
    pset(a=42, b='hello', c=[42, 'hello'])
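
The underscore rule is easiest to see in isolation. Here is a stripped-down sketch of the same kind of __setattr__ dispatch (FieldBag is a made-up name, and unlike pset it keeps its fields in a separate dict instead of subclassing dict): names starting with an underscore go through the normal object machinery, and everything else is stored as a field. Without such a rule, even the class's own bookkeeping attributes would end up stored as fields.

```python
class FieldBag(object):
    """Minimal illustration of underscore-based __setattr__ dispatch."""

    def __init__(self):
        # passes through __setattr__ below and is treated as internal state
        self._fields = {}

    def __setattr__(self, name, value):
        if name.startswith('_'):
            # internal attribute: store it on the instance as usual
            object.__setattr__(self, name, value)
        else:
            # public attribute: store it as a field
            self._fields[name] = value

    def __getattr__(self, name):
        # only called when normal lookup fails, i.e. for fields
        if name.startswith('_'):
            raise AttributeError(name)   # avoids recursion before _fields exists
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name)


bag = FieldBag()
bag.a = 42
print(bag._fields)   # {'a': 42}
```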

You can iterate over the values:

PYTHON:
    >>> for key, value in p:
    ...     print key, value
    ...
    a 42
    b hello
    c [42, 'hello']

Notice that it maintains the insertion order, and you can also access by index:

PYTHON:
    >>> p[1]
    'hello'

For technical reasons it is not possible to maintain the order when creating a pset from keyword arguments, since they reach the constructor as a plain dict, which doesn't remember order. (I hesitated to put this functionality in, but practicality beats purity, and it has turned out to be very practical.) Equality ignores order: as long as two psets have the same fields with equal values, they compare equal:

PYTHON:
    >>> q = pset(a=42, b='hello', c=[42, 'hello'])
    >>> q
    pset(a=42, c=[42, 'hello'], b='hello')
    >>> p == q
    True
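
For readers on a modern Python: since Python 3.7 the built-in dict also remembers insertion order, and its equality follows the same rule as pset (same keys, equal values, order ignored), whereas collections.OrderedDict compares order-sensitively against another OrderedDict. A quick check:

```python
from collections import OrderedDict

a = {'a': 42, 'b': 'hello'}
b = {'b': 'hello', 'a': 42}
print(a == b)    # True: dict equality ignores order
print(list(a))   # ['a', 'b']: iteration follows insertion order (3.7+)

oa = OrderedDict(a)
ob = OrderedDict(b)
print(oa == ob)  # False: OrderedDict vs OrderedDict is order-sensitive
print(oa == b)   # True: against a plain dict, order is ignored again
```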

You can keep the order given to the constructor by initializing with a list of tuples:

PYTHON:
    >>> list(p.items())
    [('a', 42), ('b', 'hello'), ('c', [42, 'hello'])]
    >>> r = pset(p.items())
    >>> r
    pset(a=42, b='hello', c=[42, 'hello'])

Of course, you can also create a pset directly from another pset (this preserves order too):

PYTHON:
    >>> s = pset(p)
    >>> s
    pset(a=42, b='hello', c=[42, 'hello'])

It's also extremely useful to be able to index by field name as well as by position, and to assign through either:

PYTHON:
    >>> p
    pset(a=42, b='hello', c=[42, 'hello'])
    >>> p.b
    'hello'
    >>> p[1]
    'hello'
    >>> p['b']
    'hello'
    >>> p['b'] = 'world'
    >>> p
    pset(a=42, b='world', c=[42, 'hello'])
    >>> p[1] = 'foo'
    >>> p
    pset(a=42, b='foo', c=[42, 'hello'])

Here's the code:

PYTHON:
    try:
        _int_types = (int, long)   # Python 2: positions may be int or long
    except NameError:
        _int_types = (int,)        # Python 3 has no separate long type

    class pset(dict):
        """This code is placed in the Public Domain.

           Property Set class.
           A property set is an object where values are attached to attributes,
           but can still be iterated over as key/value pairs.
           The order of assignment is maintained during iteration.
           Only one value allowed per key.

           >>> x = pset()
           >>> x.a = 42
           >>> x.b = 'foo'
           >>> x.a = 314
           >>> x
           pset(a=314, b='foo')
        """
        def __init__(self, items=(), **attrs):
            # bypass our own __setattr__, which would store _order as a field
            object.__setattr__(self, '_order', [])
            super(pset, self).__init__()
            for k, v in items:
                self.add(k, v)
            for k, v in attrs.items():   # keyword order is arbitrary (plain dict)
                self.add(k, v)

        def add(self, key, value):
            # integer keys are positions in insertion order, others are names
            if isinstance(key, _int_types):
                key = self._order[key]
            elif key not in self._order:
                self._order.append(key)
            dict.__setitem__(self, key, value)

        def __eq__(self, other):
            """Equal iff they have the same set of keys, and the values for
               each key are equal. Key order is not considered for equality.
            """
            if not isinstance(other, pset):
                return NotImplemented
            return dict.__eq__(self, other)   # same keys, equal values

        def __ne__(self, other):
            # Python 2 does not derive != from __eq__
            result = self.__eq__(other)
            return result if result is NotImplemented else not result

        def __iadd__(self, other):
            for k, v in other:
                self.add(k, v)
            return self   # += rebinds the name to whatever this returns

        # should probably have an __radd__ method too...
        def __add__(self, other):
            tmp = self.__class__()
            tmp += self
            tmp += other
            return tmp

        def __repr__(self):
            vals = ', '.join('%s=%s' % (k, repr(v)) for (k, v) in self)
            return '%s(%s)' % (self.__class__.__name__, vals)

        __str__ = __repr__

        def __getattr__(self, key):
            # only called when normal attribute lookup fails
            try:
                return self[key]
            except KeyError:
                raise AttributeError(key)

        def __getitem__(self, key):
            if isinstance(key, _int_types):
                key = self._order[key]
            return dict.__getitem__(self, key)   # missing keys raise KeyError

        def __delitem__(self, key):
            if isinstance(key, _int_types):
                key = self._order[key]
            dict.__delitem__(self, key)
            self._order.remove(key)   # keep iteration order in sync

        def __iter__(self):
            return ((k, dict.__getitem__(self, k)) for k in self._order)

        def items(self):
            return iter(self)

        def __setattr__(self, key, val):
            # names starting with '_' are implementation attributes
            if key.startswith('_'):
                object.__setattr__(self, key, val)
            else:
                self.add(key, val)

        def __setitem__(self, key, val):
            self.add(key, val)
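
On Python 3.7 and later the same ADT can be sketched more compactly, because a plain dict already preserves insertion order (and, since 3.6, so do keyword arguments), so the separate _order list is no longer needed. What follows is my own hedged modernization, not a drop-in replacement: PropertySet is a made-up name, it wraps a dict instead of subclassing one, and positional access builds a list of keys, so it costs O(n).

```python
class PropertySet(object):
    """Attribute-style record with insertion-ordered iteration (sketch)."""

    def __init__(self, items=(), **attrs):
        object.__setattr__(self, '_data', {})   # dicts keep insertion order (3.7+)
        for k, v in items:
            self[k] = v
        for k, v in attrs.items():              # **attrs is ordered since 3.6
            self[k] = v

    def _key(self, key):
        # integers index by position, everything else is a field name
        if isinstance(key, int):
            return list(self._data)[key]
        return key

    def __getitem__(self, key):
        return self._data[self._key(key)]

    def __setitem__(self, key, value):
        self._data[self._key(key)] = value

    def __delitem__(self, key):
        del self._data[self._key(key)]

    def __getattr__(self, name):
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        if name.startswith('_'):
            object.__setattr__(self, name, value)
        else:
            self._data[name] = value

    def __iter__(self):
        return iter(self._data.items())         # (key, value) pairs, like pset

    def items(self):
        return iter(self)

    def __len__(self):
        return len(self._data)

    def __eq__(self, other):
        if not isinstance(other, PropertySet):
            return NotImplemented
        return self._data == other._data        # order ignored, as in pset

    def __repr__(self):
        vals = ', '.join('%s=%r' % kv for kv in self)
        return '%s(%s)' % (type(self).__name__, vals)
```

Usage mirrors the pset examples: p = PropertySet(); p.a = 42; p[0] and p['a'] both return 42, and PropertySet(p.items()) copies in order.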

March 25, 2007

Django :: admin search functionality

Filed under: django — tb @ 7:59 am

In-house staff has been getting spoiled lately by the simplicity of the search box in the Django admin interface. My favorite comment from last week was "the stupid thing doesn't even search the middle name field when I enter a name, I wish it could be more like the Django", in reference to an application for which we're paying a significant amount in licensing fees. (On a side note, I'd suggest everyone re-brand the admin pages; my users are now convinced Django is the stuff I've put on their admin page...)

For a number of reasons, however, I'm starting to replace parts of the admin interface with my own home-made pages. While the admin pages have served us well so far, we've now reached the point where feature requests require more custom work than can be integrated into the admin interface. It's a good thing. It means users are thinking about how things can be done better, and it's also the way the Django admin interface is supposed to be used: it lets everyone focus on something besides basic admin functionality until later in a project.

It seems important, though, not to take a step backward and lose functionality such as the simplicity of a single search box. I was too lazy to go look at the source, but a quick googling turned up only an entry from Petro Verkhogliad (http://petro.tanreisoftware.com/?p=22) that deals with searching a single field (plus a suggested algorithm for searching multiple fields that materializes way too much data to be practical). The second entry I found was from Steven Ametjan (http://www2.wolfsreign.com/archives/2007/01/22/writing-search-view-django/). Unfortunately, his solution is buggy when there is more than one search term.
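
It may help to pin down what a multi-term search box should mean: every term has to match somewhere (AND across terms), but each term may match any of the searched fields (OR across fields). Here is that rule as a plain-Python sketch over in-memory records; the record layout, field names, and sample data are made up for illustration.

```python
def matches(record, terms, fields):
    """True if every term occurs (case-insensitively) in at least one field."""
    for term in terms:
        term = term.lower()
        if not any(term in str(record.get(f, '')).lower() for f in fields):
            return False          # this term matched no field: reject the record
    return True


def search_records(records, terms, fields=('fname', 'lname', 'email', 'zipcode')):
    return [r for r in records if matches(r, terms, fields)]


customers = [
    {'fname': 'Ola', 'lname': 'Nordmann', 'email': 'ola@example.com', 'zipcode': '0150'},
    {'fname': 'Kari', 'lname': 'Nordmann', 'email': 'kari@example.com', 'zipcode': '5003'},
]
print(search_records(customers, ['nordmann', 'kari']))   # only Kari's record
```

In the ORM this translates to one filter() call per term, each containing an OR of Q objects, since chained filter() calls are ANDed together.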

A correct version should look something like this:

PYTHON:
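    from django.db.models import Q

    # Customer is your model class (import it from your app's models module)

    def search(terms=None):
        if terms is None:
            return Customer.objects.all()

        query = Customer.objects.all()
        for term in terms:
            # AND across terms (chained filters), OR across fields per term.
            # birthdate only matches yyyy-mm-dd terms, and some Django
            # versions reject other strings for a date field.
            query = query.filter(
                  Q(fname__icontains=term)
                | Q(lname__icontains=term)
                | Q(email__icontains=term)
                | Q(zipcode__icontains=term)
                | Q(birthdate=term))
        return query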

That works ok, but searching the birthdate field requires using database syntax to enter dates (2007-03-25). People get very upset when they can't enter dates in their local format (I can't emphasize this too much). Where I'm sitting right now, dates are always entered as dd.mm.yyyy. We can fix this problem though, and at the same time make our searches faster by utilizing some domain knowledge. In this case we know that we don't need to search in non-date fields for terms that are dates (nor for non-date data in date fields). The only numeric column we're searching is the zipcode column, so we can limit terms that are matched against this column as well, and we end up with something like this:

PYTHON:
    import datetime

    from django.db.models import Q

    # Customer is your model class (import it from your app's models module)

    def possible_zipcode(v):
        try:
            int(v)
            return True
        except ValueError:
            return False

    def local_date_format(v):
        """True if v is a valid date written as dd.mm.yyyy."""
        if len(v) == 10 and len(v.split('.')) == 3:
            try:
                day, month, year = map(int, v.split('.'))
                datetime.date(year, month, day)   # rejects e.g. 31.02.2007
                return True
            except ValueError:
                pass
        return False


    def search(terms=None):
        if terms is None:
            return Customer.objects.all()

        query = Customer.objects.all()
        for term in terms:
            if possible_zipcode(term):
                query = query.filter(zipcode=term)
            elif local_date_format(term):
                day, month, year = map(int, term.split('.'))
                query = query.filter(birthdate=datetime.date(year, month, day))
            elif '@' in term:
                query = query.filter(email__icontains=term)
            else:
                query = query.filter(
                      Q(fname__icontains=term)
                    | Q(lname__icontains=term))
        return query
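
Since the dispatch on term type is where the domain knowledge lives, it can be worth pulling it out of the view so it can be unit tested without a database. A possible sketch (classify_term and its return values are my own invention, not part of Django): it maps each term to the field it should be matched against, parsing dd.mm.yyyy dates into datetime.date objects, with the same precedence as the search() function above.

```python
import datetime


def classify_term(term):
    """Return (field, value) describing how a search term should be matched."""
    try:
        int(term)
        return ('zipcode', term)              # numeric: only a zipcode can match
    except ValueError:
        pass
    parts = term.split('.')
    if len(term) == 10 and len(parts) == 3:
        try:
            day, month, year = map(int, parts)
            return ('birthdate', datetime.date(year, month, day))
        except ValueError:
            pass                              # e.g. 31.02.2007 is not a real date
    if '@' in term:
        return ('email', term)
    return ('name', term)                     # fall back to first/last name


for t in ['0150', '25.03.2007', 'ola@example.com', 'Nordmann']:
    print(t, classify_term(t))
```

The view then only has to translate each (field, value) pair into the corresponding filter() call.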

For my real code running against real data, this approach turned out to be an order of magnitude faster than Django's generic algorithm... (In all fairness to Django, I should probably mention that the only time Django's search box isn't almost instantaneous is when I'm running the admin interface on my personal machine against our production database over a VPN connection from home ;-)

March 8, 2007

python :: how old are you?

Filed under: python — tb @ 5:06 pm

It's a simple question any ~5 y/o can answer: "I'm five years, four months, and two days". So, how do we calculate that using Python? Subtracting date objects only gives you the difference in days, which is very exact and very unhelpful; still, it's the only sane option for a datetime library...
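
To make the problem concrete: the difference between two dates is a datetime.timedelta, which stores only days, seconds and microseconds, so months and years simply aren't available from it.

```python
from datetime import date

delta = date(2003, 3, 1) - date(2000, 2, 29)
print(repr(delta))   # a datetime.timedelta
print(delta.days)    # 1096
print(hasattr(delta, 'months'), hasattr(delta, 'years'))   # False False
```

Dividing the days by 365.25 gives an approximate number of years, but no sensible answer for the months and days parts.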

One problem is how old a person born on leap day, date(2000, 2, 29), is if today is date(2003, 2, 28). I decided they are 2 years, 11 months, and 30 days old, and that on date(2003, 3, 1) they are 3 years and 1 day old. That seems very logical to me, although I know there are lots of people who disagree with me... Feel free to come up with code that works for you ;-)
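
If you prefer a different convention, it helps to make the choice explicit. Here is a small hypothetical helper (birthday_in_year is my name, not from the post) that returns the date someone's birthday falls on in a given year, with a parameter choosing whether a 29 February birthday moves to 28 February or 1 March in non-leap years.

```python
import calendar
from datetime import date


def birthday_in_year(dob, year, leap_day_fallback='mar1'):
    """Date on which dob's birthday falls in `year`.

    In non-leap years a 29 February birthday moves to 1 March
    (leap_day_fallback='mar1') or to 28 February ('feb28').
    """
    if (dob.month, dob.day) == (2, 29) and not calendar.isleap(year):
        if leap_day_fallback == 'feb28':
            return date(year, 2, 28)
        return date(year, 3, 1)
    return date(year, dob.month, dob.day)


dob = date(2000, 2, 29)
print(birthday_in_year(dob, 2003))            # 2003-03-01
print(birthday_in_year(dob, 2003, 'feb28'))   # 2003-02-28
print(birthday_in_year(dob, 2004))            # 2004-02-29
```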

The code turned out to be relatively short and simple, and works on the principle "I'm one year older if my birthday is today or was earlier this year".

PYTHON:
    from datetime import date as _date
    from calendar import monthrange as _monthrange

    def age(dob, today=None):
        """Return (years, months, days) elapsed from dob to today."""
        if today is None:
            today = _date.today()   # evaluated per call, not once at import time
        y = today.year - dob.year
        m = today.month - dob.month
        d = today.day - dob.day

        while m < 0 or d < 0:
            while m < 0:
                y -= 1
                m = 12 + m  # m is negative
            if d < 0:
                # borrow a month: count the remaining days of the previous month
                m -= 1
                days = days_previous_month(today.year, today.month)
                d = max(0, days - dob.day) + today.day

        return y, m, d

    def days_previous_month(y, m):
        """Number of days in the month before month m of year y."""
        m -= 1
        if m == 0:
            y -= 1
            m = 12
        _, days = _monthrange(y, m)
        return days

The calendar module is way underutilized most of the time, so it was nice to get to use it...
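
If you only need whole years, the birthday principle above fits in a single comparison of (month, day) tuples. A sketch (age_in_years is a made-up name): with plain tuple comparison, a 29 February birthday counts as reached on 1 March in non-leap years, which agrees with the convention chosen above.

```python
from datetime import date


def age_in_years(dob, today=None):
    """Whole years: one more once this year's birthday is today or has passed."""
    if today is None:
        today = date.today()   # evaluated per call, not at definition time
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


dob = date(2000, 2, 29)
print(age_in_years(dob, date(2003, 2, 28)))   # 2
print(age_in_years(dob, date(2003, 3, 1)))    # 3
```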
