On Oct 16, 2009, at 02:30 PM, Gary Poster wrote: >Gary Poster has proposed merging lp:~gary/launchpad/buildout-py into lp:launchpad/devel. > > Requested reviews: > Barry Warsaw (barry) > > >The main goal of this branch was to get the build working when lazr packages were installed into the system. When you have the same namespace ("lazr," for instance) in site-packages and in eggs, the site-package namespace package would mask all of the eggs in the same namespace. Hi Gary, Thanks so much for working on this branch. From all the discussions we had about it, I'm sure it was "fun". :) >When reviewing, in addition to looking at the diff in Launchpad, you should probably also look at the changes in the scripts that buildout is producing. The ``bin/py`` script, for instance, is wildly different now: it used to use PYTHONPATH, and now it uses the faux-interpreter that buildout produces. I'm going to try the branch out as part of the review, though I might do that in two separate responses. >The ``bin/py`` script is worth calling out for another reason. Codehosting noticed a marked increase in slowness when 3.0 launched, and investigated to find that at least some of the problem may be put on setuptools' door. Michael Hudson reported that the speed to start a Python interpreter increased quadratically based on the length of sys.path. One of the codehosting tools relies on starting an Python process for every operation, so this start-up time is noticeable. I was a bit worried that my changes in this branch would exacerbate the problem, because ``bin/py`` has moved from a bash script relying on PYTHONPATH to pass its configuration to a buildout-generated faux interpreter. According to tests of ``time bin/py -c ''`` in the old and new versions, the speed is about the same (between .2 and .3 seconds on my machine, about a factor of ten slower than ``time python2.4 -c ''``). Therefore, at least I have not made the problem worse. That's good. I'm fairly concerned about the quadratic impact of long sys.path entries. This is something we should pursue in upstream, especially if super long sys.paths are going to be a common occurrence with buildout (or virtualenv or pip) based development. >Since bin/py is now a faux-interpreter, it does not support the -t option ("issue warnings about inconsistent tab usage"). We use that option a few times in the Makefile, so I call the faux Python interpreter with the actual Python executable, and give the -t flag to the real executable. Does it make sense to try to extend bin/py to understand -t? That's probably an upstream change, and if so, out of scope right now for this branch. IWBNI bin/py understood all the command line options that `python` understood. >buildmailman.py: > >Thanks to Barry Warsaw's inspiration, I was able to have Mailman use the correct paths (and get the fix for namespace packages described at the start of this comment) by simply telling it to configure using bin/py rather than the actual Python executable. And you trusted that Barry guy? You're brave and gullible. But okay, I'll let this one slide. :) >We now use a buildout-generated faux-interpreter for bin/py, rather than the PYTHONPATH approach. The PYTHONPATH approach is still available via ``pythonpath-py`` though. Operations sometimes do unexpected things, and I want to have the old approach easily available for emergency changes. Sounds good to me. >buildout-templates/bin/test.in: > >This has the same sort of changes as described in _pythonpath.py. One difference worth highlighting is that we are now using a new feature of z3c.recipe.buildout to produce indented lists of paths, rather than one huge long list of paths. This can make the file easier to read. There are a few fly-bys of fixing comment capitalization and punctuation. It also mentions the site.py fork, which I will discuss below. Nice (well the indented list bit...) >lib/canonical/launchpad/rest/configuration.py: > >Karmic uses a newer version of libxml2 than Jaunty and Hardy. This version tests that ids are unique. This exposed a problem in the lazr.restful package (thank you to flacoste for the pointer!). I fixed the lazr.restful package and made a new release. This new release incorporated another change that Leonard had made for ISD, which we needed to accommodate by adding the set_hop_by_hop_headers attribute. Here is the description of hop_by_hop_headers from lazr.restful.interfaces._rest.py: > > set_hop_by_hop_headers = Bool( > title=u"Whether your application server will allow lazr.restful " > "to set hop-by-hop headers.", > default=True, > description=u"Ordinarily, lazr.restful automatically compresses " > "outgoing representations and sets the Transfer-Encoding " > "header accordingly. (Apache's mod_compress does something " > "similar, but it sets the Content-Encoding header, which means " > "it has to munge the ETag header, which is no good for our purposes.) " > "But WSGI application servers prohibit intermediaries from " > "setting hop-by-hop headers like Transfer-Encoding. The best you can " > "do in that situation is avoid setting those headers and use a " > "web server that does compression using Transfer-Encoding.") > >Zope's WSGI implementation is apparently not strict enough to cause the problem that the interface describes, which is good news for us practically, because that means we can use lazr.restful's better handling. I submitted this bug report on lazr.restful last night on this very issue. https://bugs.edge.launchpad.net/lazr.restful/+bug/452750 It broke Mailman 3. Now, I fully understand that a pre-1.0 release makes no guarantees of API stability, but it was still disconcerting, especially because the NEWS file didn't really describe the change's impact on existing applications. >lib/lp/codehosting/tests/test_lpserve.py: > >Hopefully the new comment describes the situation clearly. Michael Hudson was willing to go along with this solution, given that we will be able to rip it out soon. When you say "rip it out soon" do you mean when we move to Python 2.6? Or is that to do some changes coming soon in codehosting? Maybe I'll find out when I get to the code. ;) >lib/site.py: > >This is the hack--forking the Python standard library site.py--that I am saddest about in this branch. I talked about it with Francis Lacoste and he blessed the approach, under the circumstances. I also have described it to Barry Warsaw and Björn Tillenius, so they are at least aware, if not necessarily accepting. > >The comment at the top describes the rationale, and highlights the change. If PEP 382 (namespace packages, http://www.python.org/dev/peps/pep-0382/) fixes the underlying problem then hopefully we can get rid of this fork, as well as some of the other more unpleasant bits in this branch. That won't be till Python 2.7 at the soonest, though. > >You should not have to review the file after this block, near the very top: > >+try: >+ __import__('pkg_resources') # Use __import__ to not pollute the namespace. >+except ImportError: >+ pass > >Everything else is from the Python (2.5) standard library. I suggested an experiment on IRC. Instead of copying the whole site.py file to add this little bit at the top, why not make lib/site.py execfile the real site.py after it does the minimal import of pkg_resources? That way you don't have to copy the whole site.py file to here, or track changes to Python's version of the file. You should be able to calculate the path to the real site.py from sys attributes. I suggested a 15m experiment to see if this approach can work. If not, feel free to punt and keep what you've got. Practicality beats purity. On to the code! Because of the size, I'll delete anything that seems uncontroversial to me. === modified file 'buildmailman.py' --- buildmailman.py 2009-09-11 02:17:29 +0000 +++ buildmailman.py 2009-10-16 14:30:32 +0000 > @@ -88,7 +88,7 @@ > './configure', > '--prefix', mailman_path, > '--with-var-prefix=' + var_dir, > - '--with-python=' + sys.executable, > + '--with-python=' + os.path.abspath('bin/py'), > '--with-username=' + user, > '--with-groupname=' + group, > '--with-mail-gid=' + group, Man I am so happy this works. The alternative was horrendous. === modified file 'buildout-templates/_pythonpath.py.in' --- buildout-templates/_pythonpath.py.in 2009-08-21 19:13:05 +0000 +++ buildout-templates/_pythonpath.py.in 2009-10-16 14:30:32 +0000 > @@ -6,15 +6,71 @@ > > __metaclass__ = type > > -import sys, os > - > -sys.path[0:0] = [${string-paths}] > -# Enable Storm's C extensions > -os.environ['STORM_CEXTENSIONS'] = '1' > - > -# We don't want to bother tests or logs with these. > -import warnings > -warnings.filterwarnings( > - 'ignore', > - 'Module .+ was already imported from .+, but .+ is being added.*', > - UserWarning) > +import sys > +import os > +import site Alphabetize? > + > +stdlib_paths = [ > + ${indented-stdlib-paths} > + ] > +egg_paths = [ > + ${indented-egg-paths} > + ] > +site_dirs = [ > + ${indented-dir-paths} > + ] > +clean_modules = [ > + 'os', > + '_pythonpath', > + 'site', > + 'sitecustomize', > + 'sys', > + ${clean-sys-modules} > + ] > + > +def set_path(): > + previously_imported = {} > + for k in sys.modules.keys(): Please add a comment above this line explaining why you're using .keys() here. It's not completely obvious at first reading that it's because you're mutating sys.modules inside the loop. > + if k not in clean_modules: > + previously_imported[k] = sys.modules.pop(k) > + > + # We keep the very first path because that is typically the directory > + # of the file that imported us, which should continue to have precedence. > + sys.path[1:] = egg_paths > + sys.path.extend(stdlib_paths) > + # Add the site_dirs before `addsitedir` in case it has setuptools. > + sys.path.extend(site_dirs) > + # Process all buildout-controlled eggs before site-packages by importing > + # pkg_resources. This is only important for namespace packages, so it may > + # not have been added, so ignore import errors. > + try: > + import pkg_resources > + except ImportError: > + pass > + # Process .pth files. > + for p in site_dirs: > + site.addsitedir(p) > + > + # We don't want to have dropped any packages that weren't already added > + # back by what we just did. If we did, there's a good chance that the > + # world will now be insane. Quit now, and let's fix it. > + unexpected = set(previously_imported).difference(sys.modules) > + if unexpected: > + raise RuntimeError( > + 'Found unexpected module(s): %s\n\nImport _pythonpath earlier!' % > + (', '.join(sorted(unexpected)),)) > + > + # Enable Storm's C extensions > + os.environ['STORM_CEXTENSIONS'] = '1' > + > + # We don't want to bother tests or logs with these. > + import warnings > + warnings.filterwarnings( > + 'ignore', > + 'Module .+ was already imported from .+, but .+ is being added.*', > + UserWarning) > + > +try: > + import canonical > +except ImportError: > + set_path() === modified file 'buildout-templates/bin/test.in' --- buildout-templates/bin/test.in 2009-09-17 17:42:25 +0000 +++ buildout-templates/bin/test.in 2009-10-16 14:30:32 +0000 > @@ -14,7 +14,49 @@ > ############################################################################## > """Test script > """ > -import sys, os, time, logging, warnings, re > + > +# NOTE: This is a generated file. The original is in > +# buildout-templates/bin/test.in > + > +import site > +import sys etc... This looks a lot like _pythonpath.py.in. Is it at all possible to share the code somehow? If not, I'm afraid that if/when you have to patch one you'll have to remember to patch them both. === modified file 'lib/lp/codehosting/tests/test_lpserve.py' --- lib/lp/codehosting/tests/test_lpserve.py 2009-07-17 02:25:09 +0000 +++ lib/lp/codehosting/tests/test_lpserve.py 2009-10-16 14:30:32 +0000 > @@ -6,6 +6,7 @@ > __metaclass__ = type > > import os > +import re > from subprocess import PIPE > import unittest > > @@ -30,7 +31,35 @@ > > def assertFinishedCleanly(self, result): > """Assert that a server process finished cleanly.""" > - self.assertEqual((0, '', ''), tuple(result)) > + # XXX gary 2009-10-15 bug 316192 Can you add a comment to that bug explaining that this test method should be cleaned up once Launchpad is updated? === added file 'lib/site.py' --- lib/site.py 1970-01-01 00:00:00 +0000 +++ lib/site.py 2009-10-16 14:30:32 +0000 > @@ -0,0 +1,432 @@ > +# LAUNCHPAD FORKED VERSION OF STDLIB SITE.PY > +# Why are we forking site.py? > +# The short answer is that namespace packages in setuptools have problems. > +# A longer answer is that we have to import pkg_resources before namespace > +# package .pth files are processed or else the distribution's namespace > +# packages will mask all of the egg-based packages in the same namespace > +# package. Normally, we handle that in bin/py or _pythonpath. but sometimes > +# we do subprocess calls, relying on the PYTHONPATH to set the eggs > +# correctly. It is for this situation that we have hacked site.py. Here > +# is the change: > +try: > + __import__('pkg_resources') # Use __import__ to not pollute the namespace. > +except ImportError: > + pass You followed up on IRC that the experiment with execfile worked. Yay! merge-conditional, r=me with consideration of the above. review approve status approve