Python :: flatten

Although it can seem like a CS excercise, everyone will sooner or later have to write a “flatten” function — a function that takes a nested “list” and makes it one-dimensional:

      >>> flatten([1,2, [3,4], 5])
      [1, 2, 3, 4, 5]

it should also handle Python data types in a reasonable manner:

      >>> flatten(['hello', 'world'])
      ['hello', 'world']
      >>> flatten(i**2 for i in range(10))
      [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The trick to the implementation is to realize that everything that is iterable should be flattened… except strings and unicode. To test if something is iterable, call iter on it and be prepared to catch a TypeError: iterator over non-sequence. To test if something is string or unicode use isinstance(item, basestring). That results in something like this:

      def flatten(lst):
          def _flatten(lst, res):
              for item in lst:
                  if isinstance(item, basestring):
                      res.append(item)
                  else:
                      try:
                          _flatten(iter(item), res)
                      except TypeError: # iterator over nonsequence
                          res.append(item)
              return res
          return _flatten(lst, [])

That generates the entire list in the res variable of the inner function as it’s recursing through the datastructure. That could be a very long list, so perhaps a generator works better?:

      def flatten(lst):
          for item in lst:
              if isinstance(item, basestring):
                  yield item
              else:
                  try:
                      flatten(iter(item))
                  except TypeError: # iterator over nonsequence
                      yield item
          return

That looks pretty good 🙂

So what was the reason I had to write flatten this time?…

      >>> from html import table, tr, td
      >>> td_num = html.mktag('td', width="20%", style="background:aqua")
      >>> table(tr(td_num(x), td(x**2)) for x in range(6))

Which gives

0 0
1 1
2 4
3 9
4 16
5 25

the __str__ for all the classes is only defined for the base class, which needed to flatten its contents…

This entry was posted in Python. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *