Code review comment for lp://staging/~gary/launchpad/buildout-py

Revision history for this message
Gary Poster (gary) wrote :

The main goal of this branch was to get the build working when lazr packages were installed into the system. When you have the same namespace ("lazr," for instance) in site-packages and in eggs, the site-package namespace package would mask all of the eggs in the same namespace.

Fixing this happened first in our upstream zc.buildout branch. Then we also have the same kind of problem when we start subprocesses; and when we start standalone scripts in the "scripts," "utilities," and "cron" directories. For those, we use PYTHONPATH and _pythonpath.py, respectively. The branch needed to make these approaches also work around the namespace package problem.

This work happened on Karmic, because that's where the problem was first discovered, because that's what we are supposed to be dogfooding, and because some lazr packages are installed by default in Karmic because of apport (and maybe other things) so the problem happens consistently there. Trying to get tests to pass in Karmic discovered a few other issues, which I addressed here. I should have been using a pipeline and then put this work in an earlier branch, but I did not. If you want me to divide this work up after the fact, that will take some time, but I will completely understand the request. For now, though, we have a diff that is almost 1300 lines long.

When reviewing, in addition to looking at the diff in Launchpad, you should probably also look at the changes in the scripts that buildout is producing. The ``bin/py`` script, for instance, is wildly different now: it used to use PYTHONPATH, and now it uses the faux-interpreter that buildout produces.

The ``bin/py`` script is worth calling out for another reason. Codehosting noticed a marked increase in slowness when 3.0 launched, and investigated to find that at least some of the problem may be put on setuptools' door. Michael Hudson reported that the speed to start a Python interpreter increased quadratically based on the length of sys.path. One of the codehosting tools relies on starting an Python process for every operation, so this start-up time is noticeable. I was a bit worried that my changes in this branch would exacerbate the problem, because ``bin/py`` has moved from a bash script relying on PYTHONPATH to pass its configuration to a buildout-generated faux interpreter. According to tests of ``time bin/py -c ''`` in the old and new versions, the speed is about the same (between .2 and .3 seconds on my machine, about a factor of ten slower than ``time python2.4 -c ''``). Therefore, at least I have not made the problem worse.

Now I'll start describing the changes to this branch in more detail.

Makefile:

Since bin/py is now a faux-interpreter, it does not support the -t option ("issue warnings about inconsistent tab usage"). We use that option a few times in the Makefile, so I call the faux Python interpreter with the actual Python executable, and give the -t flag to the real executable.

buildmailman.py:

Thanks to Barry Warsaw's inspiration, I was able to have Mailman use the correct paths (and get the fix for namespace packages described at the start of this comment) by simply telling it to configure using bin/py rather than the actual Python executable.

buildout-templates/_pythonpath.py.in:

This script template changed significantly.

Much of the changes are very similar to the tricks our buildout branch now does with its scripts. It cleans out sys.modules of the packages that are not present when the executable is started with -S ("don't imply 'import site' on initialization"). This means that namespace packages from site-packages, which insert dumb namespace modules that block the eggs in their .pth files, are cleaned out. Then it sets up the path, imports pkg_resources before reprocessing .pth files to get the eggs into the story, and then processes .pth files. Then, it tries to inject a bit of sanity: messing with sys.modules is scary, and if we did anything other than what we expected, we bail out. As the error message describes, the general solution to this error is to import _pythonpath earlier, where "first" is an ideal definition of "earlier." It then sets up Storm and registers a warning filter, as we do elsewhere. We only do all of this work if it is necessary: we use importing "canonical" as a smoke test.

buildout-templates/bin/py.in, buildout-templates/bin/pythonpath-py.in:

We now use a buildout-generated faux-interpreter for bin/py, rather than the PYTHONPATH approach. The PYTHONPATH approach is still available via ``pythonpath-py`` though. Operations sometimes do unexpected things, and I want to have the old approach easily available for emergency changes.

buildout-templates/bin/test.in:

This has the same sort of changes as described in _pythonpath.py. One difference worth highlighting is that we are now using a new feature of z3c.recipe.buildout to produce indented lists of paths, rather than one huge long list of paths. This can make the file easier to read. There are a few fly-bys of fixing comment capitalization and punctuation. It also mentions the site.py fork, which I will discuss below.

buildout.cfg:

This specifies that we want buildout to generate bin/py (``interpreter = py``). It also does another comment fly-by for capitalization and punctuation.

cronscripts/check-teamparticipation.py, cronscripts/create-debwatches.py, cronscripts/update-debwatches.py, cronscripts/update-remote-product.py, cronscripts/update-sourceforge-remote-products.py, scripts/bug-export.py, scripts/bug-import.py, scripts/cache-country-mirrors.py, scripts/entitlements-to-lp.py, scripts/find-email-clusters.py, scripts/ftpmaster-tools/remove-package.py, scripts/ftpmaster-tools/sync-source.py, scripts/import-packagenames.py, scripts/merge-email-clusters.py, scripts/mlist-import.py, scripts/mlist-sync.py, scripts/rosetta/remove-upstream-translations.py, scripts/sourceforge-import.py, scripts/upload2librarian.py, utilities/check-scripts.py, utilities/lsconf.py, utilities/paste:

Per the discussion of _pythonpath.py.in, the ``import _pythonpath`` line needs to happen first.

lib/canonical/launchpad/rest/configuration.py:

Karmic uses a newer version of libxml2 than Jaunty and Hardy. This version tests that ids are unique. This exposed a problem in the lazr.restful package (thank you to flacoste for the pointer!). I fixed the lazr.restful package and made a new release. This new release incorporated another change that Leonard had made for ISD, which we needed to accommodate by adding the set_hop_by_hop_headers attribute. Here is the description of hop_by_hop_headers from lazr.restful.interfaces._rest.py:

    set_hop_by_hop_headers = Bool(
        title=u"Whether your application server will allow lazr.restful "
        "to set hop-by-hop headers.",
        default=True,
        description=u"Ordinarily, lazr.restful automatically compresses "
        "outgoing representations and sets the Transfer-Encoding "
        "header accordingly. (Apache's mod_compress does something "
        "similar, but it sets the Content-Encoding header, which means "
        "it has to munge the ETag header, which is no good for our purposes.) "
        "But WSGI application servers prohibit intermediaries from "
        "setting hop-by-hop headers like Transfer-Encoding. The best you can "
        "do in that situation is avoid setting those headers and use a "
        "web server that does compression using Transfer-Encoding.")

Zope's WSGI implementation is apparently not strict enough to cause the problem that the interface describes, which is good news for us practically, because that means we can use lazr.restful's better handling.

lib/lp/codehosting/tests/test_lpserve.py:

Hopefully the new comment describes the situation clearly. Michael Hudson was willing to go along with this solution, given that we will be able to rip it out soon.

lib/lp/services/scripts/tests/test_all_scripts.py:

I had to fix a lot of scripts (moving _pythonpath to be the first import) so I changed the script tester to report all the problems at once. This saved me time, and I thought it was a reasonable change generally, so I left it in.

lib/site.py:

This is the hack--forking the Python standard library site.py--that I am saddest about in this branch. I talked about it with Francis Lacoste and he blessed the approach, under the circumstances. I also have described it to Barry Warsaw and Björn Tillenius, so they are at least aware, if not necessarily accepting.

The comment at the top describes the rationale, and highlights the change. If PEP 382 (namespace packages, http://www.python.org/dev/peps/pep-0382/) fixes the underlying problem then hopefully we can get rid of this fork, as well as some of the other more unpleasant bits in this branch. That won't be till Python 2.7 at the soonest, though.

You should not have to review the file after this block, near the very top:

+try:
+ __import__('pkg_resources') # Use __import__ to not pollute the namespace.
+except ImportError:
+ pass

Everything else is from the Python (2.5) standard library.

scripts/update-bzr-version-info.sh:

Setting the PYTHONPATH to lib confused bzr, so that the bzr-version-info.py was empty. Removing the PYTHONPATH made it work again.

utilities/windmill.py:

This was blocking the import of the actual windmill package by some scripts. Francis Lacoste told me that we no longer needed it.

versions.cfg:

We're using the newest lazr.restful package, for the fix I described above relating to libxml2 versions. We're using the newest versions of my three buildout-related branches to address the namespace issue, as well get a few other cleanups and features (like the indented paths).

« Back to merge proposal