Thursday, January 03, 2008

Django on Jython: Minding the Gap

Summary

The most important thing to know about Django on Jython is that we are almost there, and with clean code. End-to-end functionality is demonstrated by syncdb, a substantial number of passing unit tests, and the admin tool running full CRUD. So far this has required changing only 6 lines of code in Django trunk. (There will be more, however; see below.)

Running on Jython

To run Django on Jython, with a PostgreSQL backend, the following steps are necessary:

  • Use the Modern branch of Jython. This branch consolidates the bug fixes, workarounds, and patches of numerous people (plus a bunch more) into a stable, almost-ready-to-be-merged-into-trunk version of Jython. The most important aspect is that we have tried to make Jython conform more closely to CPython, using Django as our guide, although some gaps remain, especially where Django had already incorporated fixes. Our driving goal is to close these gaps over time. Please note that this is intended to be stable, performant code.

  • Use the Django trunk (tested with rev 6992, later should be OK too).
  • Apply these two patches from Leo Soto, to django.dispatch.robustapply (diff) and django.views.debug (diff). I would imagine these will be in Django trunk soon.

  • Copy these three files from CPythonLib to Lib: gettext.py, locale.py, optparse.py. Please note that these files only partially work on Jython, which is why they haven't been promoted yet (gettext.py actually works, as verified by test_gettext.py, but it depends on locale.py, which still fails). But they are very close, and they appear to be fine for Django. Certainly fine for this round of development!
  • Use the database backend zxjdbc_postgresql, which was contributed by Leo Soto. Frank Wierzbicki has an experimental backend for MySQL, which should be incorporated soon.
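The file-copying step is easy to script. Here is a minimal sketch, assuming a Jython checkout where CPythonLib and Lib sit side by side under one directory; `copy_cpython_modules` and its `jython_home` parameter are hypothetical names, not part of Jython.

```python
import os
import shutil

# The three modules Django needs that Jython has not yet promoted to Lib.
MODULES = ("gettext.py", "locale.py", "optparse.py")


def copy_cpython_modules(jython_home, modules=MODULES):
    """Copy modules from <jython_home>/CPythonLib into <jython_home>/Lib.

    Hypothetical helper: assumes both directories live directly under
    jython_home. Returns the destination paths that were written.
    """
    src_dir = os.path.join(jython_home, "CPythonLib")
    dst_dir = os.path.join(jython_home, "Lib")
    copied = []
    for name in modules:
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            raise IOError("missing %s; is this a Jython checkout?" % src)
        dst = os.path.join(dst_dir, name)
        # copy2 also preserves file metadata such as modification times.
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

Run it once against your checkout, e.g. `copy_cpython_modules("/path/to/jython")`. It overwrites any existing copies in Lib, so rerunning it after updating CPythonLib picks up the newer versions.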

Status

Here's what works:

syncdb and the very cool Django admin run, and many unit tests pass. You can run with internationalization enabled. For now, you do need to run the dev server with --noreload. We still need to document how to run with modjy, Alan Kennedy's WSGI gateway for Java servlet containers.

In running the model unit tests, here are the things we seem to be missing, accounting for most of the approximately 75 failures:

  • Many doctests are fragile because they depend on dict traversal ordering; in Jython this ordering differs from CPython's, and if we adopt ConcurrentHashMap, it's not even repeatable. This would seem to be a pervasive bug in Django.

  • We still have some encoding problems, again seen in doctests. For example, some expected output uses lowercase hex escapes, but Jython produced uppercase. I fixed that problem in PyUnicode, but there are more places to fix.

  • Problem with the ManagerDescriptor handling, in django.db.models.manager.

  • No decorators yet! (But they are coming soon, and are now available experimentally for Jython in the newcompiler work I have been leading.)
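Decorator syntax is only shorthand for rebinding a function's name to the decorator's result. The sketch below shows both spellings with a hypothetical `trace` decorator. The `@` form is Python 2.4+ syntax, so a compiler without decorator support would reject it at parse time, while the explicit form needs nothing new.

```python
def trace(func):
    """Hypothetical decorator: records the positional arguments of each call."""
    def wrapper(*args):
        wrapper.calls.append(args)
        return func(*args)
    wrapper.calls = []
    return wrapper


# Decorator syntax (Python 2.4+).
@trace
def add(a, b):
    return a + b


# The same thing spelled out: define the function, then rebind its name.
def mul(a, b):
    return a * b
mul = trace(mul)
```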

There may be other rough categories; we need to look at the failures more systematically. All that doctest noise is certainly annoying!
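To make the dict-ordering problem concrete, here is a minimal sketch of a fragile doctest next to two order-independent rewrites. `field_values` is a hypothetical helper, not Django code. The fragile example is marked `+SKIP` so that the rest of the docstring still passes.

```python
import doctest


def field_values(values):
    """Return a copy of a model instance's field values (hypothetical helper).

    Fragile: this expected output bakes in one dict traversal order, so it
    fails wherever keys come back in a different order. It is marked +SKIP
    so that the examples below still pass.

    >>> field_values({'name': 'Jython', 'id': 1})  # doctest: +SKIP
    {'id': 1, 'name': 'Jython'}

    Robust: sort the items before printing, or compare with == instead.

    >>> sorted(field_values({'name': 'Jython', 'id': 1}).items())
    [('id', 1), ('name', 'Jython')]
    >>> field_values({'name': 'Jython', 'id': 1}) == {'id': 1, 'name': 'Jython'}
    True
    """
    return dict(values)


if __name__ == "__main__":
    doctest.testmod()
```

Either rewrite gives the same result on any dict implementation, including one backed by a map with no repeatable order.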

Next Steps

On the Django front, get more of the unit tests running!

Before we can push modern into trunk, the following needs to be done:

  • The test_extcall unit test currently fails. This appears to be due to a dependency on dict traversal being repeatable, a bad assumption. However, it's a mind-bending test, and the 2.3 version is particularly problematic because it's not modular at all. Google's GHOP has just produced an improved version for Python 2.6; we will look at this as a starting point.

  • Tristan King provided a nearly complete subset of the functionality of time.strptime, implemented in org.python.modules.time.Time. This needs to be finished. I just tested this, and all unit tests in the CPythonLib version of test_time now pass except for strptime with the '%c' conversion specifier. Once that is fixed, we can move to the CPythonLib version and discard our Jython version. That should be soon!

  • Decide whether or not to use ConcurrentHashMap as the backing hash map for dict and __dict__. CHM introduces creation overhead, but it should prove far more scalable on multicore systems. The programming model is also far nicer with respect to Jython.
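The remaining strptime gap is the '%c' specifier. Here is a minimal round-trip check in that spirit (not the actual test_time code): render a time with the locale's '%c' format, then parse it back with '%c'. It assumes the default C locale for LC_TIME, as in a program that has not called `locale.setlocale`. `c_roundtrip_ok` is a hypothetical name.

```python
import time


def c_roundtrip_ok(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Hypothetical check: does strptime('%c') parse what strftime('%c') writes?

    Parses text with fmt, renders it with the locale's '%c' representation,
    parses that back with '%c', and compares year through second.
    """
    original = time.strptime(text, fmt)
    rendered = time.strftime("%c", original)
    parsed = time.strptime(rendered, "%c")
    # Only year through second are compared: weekday and day of year are
    # derived from the date anyway, and tm_isdst depends on whether the
    # locale's '%c' includes a time zone.
    return tuple(parsed[:6]) == tuple(original[:6])
```

In the C locale, '%c' renders as something like `Thu Jan  3 14:05:09 2008`, so a correct strptime must cope with abbreviated day and month names and a space-padded day.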

Saturday, December 16, 2006

Pythoneers Monthly Meeting: This Wednesday, December 20, in Boulder, Colorado

Important Change: we will be meeting at bivio Software instead of Jill's to better accommodate this month's demos.

This coming Wednesday (December 20) we are having our monthly meeting for the Front Range Pythoneers. Come join a lively discussion of Python demos, features, tips & techniques, and directions, both for fun and professional development.

Here are the meeting specifics:

  • Date/time: Wednesday, December 20, 6-8 PM
  • Location: bivio Software, Inc., 28th and Iris. Above Hair Elite in Suite S.
  • Tom Churchill and Vinny will demo Churchill Navigation's earth-rendering engine (which looks like Google Earth, only apparently even better and faster ;) ). Vinny (their main Python guy) will explain how they built the glue logic (and why they decided against SWIG), and perhaps some of the implications of using Python as a scripting language in a real-time (60 fps) environment, including the techniques they employed to keep the graphics pipeline from stalling when making an expensive call into their engine from Python.
  • Brian Granger from Tech-X will help us think more deeply about concurrent Python programming, especially as seen in a new version of IPython he has been working on.
  • BoulderSprint. Eric Dobbs proposed we adopt Jython, and it looks like we have enough momentum to actually get some useful work done. We will talk about the upcoming sprint to be held on Saturday, January 6.
We will have food & drink available. Did I mention the free beer? Hope to see you there.

- Jim


Friday, November 10, 2006

Pythoneers Monthly Meeting - This Wednesday, Nov 15, in Boulder, Colorado

This coming Wednesday (November 15) we are having our monthly meeting for the Front Range Pythoneers. Please note, we have moved to Jill's at the St Julien Hotel. Come join a lively discussion of Python features, tips & techniques, and directions, both for fun and professional development.

Here are the meeting specifics:
  • Status: Always on
  • Date/time: Every 3rd Wednesday, 6-8 PM
  • Location: Jill's at the St Julien Hotel (9th and Walnut), the bar area. Jill's combines a beautiful room, great food and beverages, and happy-hour pricing. And for this meeting: your first draft beer (up to 20 people) is free!
Hope to see you there.

We've recently set up a wiki. There, you can help us expand the Guide to Front Range Pythoneering. Or contribute ideas to future sprints and jams.

Monday, November 06, 2006

Boulder Sprint: Adding Oracle support to Django

We had a great turnout on Saturday for the first Boulder Sprint held by the Front Range Pythoneers. Our goal was to provide production-level support of Oracle in Django 1.0. I'm glad to report that we made a strong start on this goal.

Django developer Jacob Kaplan-Moss flew out to Boulder from Lawrence, Kansas, providing us with both leadership and guidance on Django internals. (Next time I hope Jacob doesn't have to fly out the next day on a 6 AM flight!) From Array Biopharma, we had five developers: Ian Kelly, Matt Boersma, Matt Drew, Michelle Cyr, and Mitch Smith. Eric Dobbs, of Bivio, contributed both the space and his seasoned Python skills. And there was me (Jim Baker). Thanks to everyone for your hard work!

We worked on a number of key issues for supporting Oracle:
  • Perhaps most important, Jacob split Oracle-specific functionality out into the Oracle backend, allowing for more modularity. Django uses quite portable code, in conjunction with the Python DB-API 2.0, but Oracle has its peculiarities. Being pragmatic, we had to work through them.
  • Mapping Django's TextFields to Oracle's CLOBs, not LONGs, which are pretty much deprecated. (Remember Django's origins; we certainly need support for text!) However, supporting CLOBs required some changes: no buffering in the Python layer, just iterating directly over the cursor; explicitly reading in data from the LOB reference; and preparing the OCI by giving cx_Oracle explicit type information (also necessary for timestamps with sub-second precision).
  • Pagination queries. Django's ORM grew out of supporting PostgreSQL, which has OFFSET and LIMIT clauses, useful for the pagination queries often seen in stateless web apps. Oracle actually has quite good support for this type of query, but this is not well known. And frankly it's a bit clumsy to use, requiring doubly nested subqueries. See Oracle guru Tom Kyte's article in Oracle Magazine for more details. I made some progress on this front, but I still need to integrate it into the new django.db.backends.oracle.OracleQuerySet class added by Jacob.
  • Test schema support. Oracle uses the concept of "user schema" where other databases might use "database". There's a bit of trickiness in working appropriately with this, especially if there are tablespaces being set up for this test. Eric took the lead on this.
  • Mitch Smith wrote two gnarly Oracle-specific queries that have almost got introspection and Django's "inspectdb" command working correctly.
  • Part of our goal was to get all existing tests to pass from runtests.py, and we're about 70% there.
  • Array Biopharma now has their test web app, donuts, running.
And this is just what I saw from my side of the conference room table! We do have a photostream for the Boulder Sprint. Check out Matt B. and Michelle contemplating at the Oracle. Or Matt B. and Ian asking rhetorically, "How can we screw up a 3-line function?" There is also a wiki.
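The pagination item above refers to the doubly nested ROWNUM subquery that Oracle needs in place of LIMIT/OFFSET. Here is a minimal sketch of that pattern. `paginate_oracle` is a hypothetical name, not the code in django.db.backends.oracle.OracleQuerySet, and it builds SQL by string interpolation only for readability.

```python
def paginate_oracle(inner_sql, offset, limit):
    """Emulate PostgreSQL's "LIMIT limit OFFSET offset" on Oracle (hypothetical helper).

    Wraps inner_sql (which should carry its own ORDER BY) in a doubly nested
    ROWNUM subquery: the middle level numbers rows and stops at the last row
    wanted, and the outer level drops the rows before the requested page.
    """
    offset, limit = int(offset), int(limit)
    if offset < 0 or limit < 1:
        raise ValueError("need offset >= 0 and limit >= 1")
    last_row = offset + limit
    # A condition like "ROWNUM > n" (n >= 1) is never true at the level where
    # ROWNUM is assigned, so the lower bound is applied one level further out,
    # against the alias rn computed in the middle level.
    return (
        "SELECT * FROM ("
        "SELECT q.*, ROWNUM AS rn FROM (%s) q WHERE ROWNUM <= %d"
        ") WHERE rn > %d" % (inner_sql, last_row, offset)
    )
```

For example, `paginate_oracle("SELECT id FROM polls_poll ORDER BY id", 20, 10)` asks for rows 21 through 30. A real backend would also have to strip the extra `rn` column from results and would prefer bind variables to interpolated values.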

For the moment, until we get the work integrated into the Django trunk, you can check out the Boulder Sprint branch here with Subversion:

svn co http://code.djangoproject.com/svn/django/branches/boulder-oracle-sprint

Tuesday, October 31, 2006

Boulder Sprint this Saturday

On November 4, 9 AM-6 PM, the Front Range Pythoneers is holding a sprint to complete the support for Oracle in Django. Why might you want to attend? Whether you're interested in Django, portable object-relational mapping code, how to optimize an Oracle execution plan for generated SQL, or just doing some intensive coding in Python, this should be a great opportunity to learn and contribute.

See the wiki for more details: http://wiki.python.org/moin/BoulderSprint