On Oct 16, 2009, at 1:08 PM, Barry Warsaw wrote: > On Oct 16, 2009, at 02:30 PM, Gary Poster wrote: > >> Gary Poster has proposed merging lp:~gary/launchpad/buildout-py >> into lp:launchpad/devel. >> >> Requested reviews: >> Barry Warsaw (barry) >> >> >> The main goal of this branch was to get the build working when lazr >> packages were installed into the system. When you have the same >> namespace ("lazr," for instance) in site-packages and in eggs, the >> site-package namespace package would mask all of the eggs in the >> same namespace. > > Hi Gary, > > Thanks so much for working on this branch. From all the discussions > we had about it, I'm sure it was "fun". :) :-) yup. >> When reviewing, in addition to looking at the diff in Launchpad, >> you should probably also look at the changes in the scripts that >> buildout is producing. The ``bin/py`` script, for instance, is >> wildly different now: it used to use PYTHONPATH, and now it uses >> the faux-interpreter that buildout produces. > > I'm going to try the branch out as part of the review, though I > might do that in two separate responses. > >> The ``bin/py`` script is worth calling out for another reason. >> Codehosting noticed a marked increase in slowness when 3.0 >> launched, and investigated to find that at least some of the >> problem may be put on setuptools' door. Michael Hudson reported >> that the speed to start a Python interpreter increased >> quadratically based on the length of sys.path. One of the >> codehosting tools relies on starting an Python process for every >> operation, so this start-up time is noticeable. I was a bit >> worried that my changes in this branch would exacerbate the >> problem, because ``bin/py`` has moved from a bash script relying on >> PYTHONPATH to pass its configuration to a buildout-generated faux >> interpreter. According to tests of ``time bin/py -c ''`` in the >> old and new versions, the speed is about the same (between .2 and . >> 3 seconds on my machine, about a factor of ten slower than ``time >> python2.4 -c ''``). Therefore, at least I have not made the >> problem worse. > > That's good. I'm fairly concerned about the quadratic impact of > long sys.path entries. This is something we should pursue in > upstream, especially if super long sys.paths are going to be a > common occurrence with buildout (or virtualenv or pip) based > development. Agreed. Francis has already mentioned maybe pursuing Distribute. Maybe we should be working with them and trying to fix this there, >> Since bin/py is now a faux-interpreter, it does not support the -t >> option ("issue warnings about inconsistent tab usage"). We use >> that option a few times in the Makefile, so I call the faux Python >> interpreter with the actual Python executable, and give the -t flag >> to the real executable. > > Does it make sense to try to extend bin/py to understand -t? That's > probably an upstream change, and if so, out of scope right now for > this branch. IWBNI bin/py understood all the command line options > that `python` understood. It would be nice if it understood all of the normal flags. I actually did do some work upstream to make buildout's faux interpreter closer to the real one, but I couldn't see how to make -t work, or I would have done it. >> buildmailman.py: >> >> Thanks to Barry Warsaw's inspiration, I was able to have Mailman >> use the correct paths (and get the fix for namespace packages >> described at the start of this comment) by simply telling it to >> configure using bin/py rather than the actual Python executable. > > And you trusted that Barry guy? You're brave and gullible. But > okay, I'll let this one slide. > > :) :-) >> We now use a buildout-generated faux-interpreter for bin/py, rather >> than the PYTHONPATH approach. The PYTHONPATH approach is still >> available via ``pythonpath-py`` though. Operations sometimes do >> unexpected things, and I want to have the old approach easily >> available for emergency changes. > > Sounds good to me. > >> buildout-templates/bin/test.in: >> >> This has the same sort of changes as described in _pythonpath.py. >> One difference worth highlighting is that we are now using a new >> feature of z3c.recipe.buildout to produce indented lists of paths, >> rather than one huge long list of paths. This can make the file >> easier to read. There are a few fly-bys of fixing comment >> capitalization and punctuation. It also mentions the site.py fork, >> which I will discuss below. > > Nice (well the indented list bit...) heh >> lib/canonical/launchpad/rest/configuration.py: >> >> Karmic uses a newer version of libxml2 than Jaunty and Hardy. This >> version tests that ids are unique. This exposed a problem in the >> lazr.restful package (thank you to flacoste for the pointer!). I >> fixed the lazr.restful package and made a new release. This new >> release incorporated another change that Leonard had made for ISD, >> which we needed to accommodate by adding the set_hop_by_hop_headers >> attribute. Here is the description of hop_by_hop_headers from >> lazr.restful.interfaces._rest.py: >> >> set_hop_by_hop_headers = Bool( >> title=u"Whether your application server will allow >> lazr.restful " >> "to set hop-by-hop headers.", >> default=True, >> description=u"Ordinarily, lazr.restful automatically >> compresses " >> "outgoing representations and sets the Transfer-Encoding " >> "header accordingly. (Apache's mod_compress does something " >> "similar, but it sets the Content-Encoding header, which >> means " >> "it has to munge the ETag header, which is no good for our >> purposes.) " >> "But WSGI application servers prohibit intermediaries from " >> "setting hop-by-hop headers like Transfer-Encoding. The best >> you can " >> "do in that situation is avoid setting those headers and use >> a " >> "web server that does compression using Transfer-Encoding.") >> >> Zope's WSGI implementation is apparently not strict enough to cause >> the problem that the interface describes, which is good news for us >> practically, because that means we can use lazr.restful's better >> handling. > > I submitted this bug report on lazr.restful last night on this very > issue. > > https://bugs.edge.launchpad.net/lazr.restful/+bug/452750 > > It broke Mailman 3. Now, I fully understand that a pre-1.0 release > makes no guarantees of API stability, but it was still > disconcerting, especially because the NEWS file didn't really > describe the change's impact on existing applications. Good point. Leonard and I both should have thought of this. Hopefully we will both be more aware of this going forward. >> lib/lp/codehosting/tests/test_lpserve.py: >> >> Hopefully the new comment describes the situation clearly. Michael >> Hudson was willing to go along with this solution, given that we >> will be able to rip it out soon. > > When you say "rip it out soon" do you mean when we move to Python > 2.6? Or is that to do some changes coming soon in codehosting? > Maybe I'll find out when I get to the code. ;) codehosting: when they switch to the next bzr release, we should get the bug fix that makes this go away. >> lib/site.py: >> >> This is the hack--forking the Python standard library site.py--that >> I am saddest about in this branch. I talked about it with Francis >> Lacoste and he blessed the approach, under the circumstances. I >> also have described it to Barry Warsaw and Björn Tillenius, so they >> are at least aware, if not necessarily accepting. >> >> The comment at the top describes the rationale, and highlights the >> change. If PEP 382 (namespace packages, http://www.python.org/dev/peps/pep-0382/ >> ) fixes the underlying problem then hopefully we can get rid of >> this fork, as well as some of the other more unpleasant bits in >> this branch. That won't be till Python 2.7 at the soonest, though. >> >> You should not have to review the file after this block, near the >> very top: >> >> +try: >> + __import__('pkg_resources') # Use __import__ to not pollute >> the namespace. >> +except ImportError: >> + pass >> >> Everything else is from the Python (2.5) standard library. > > > I suggested an experiment on IRC. Instead of copying the whole > site.py file to add this little bit at the top, why not make lib/ > site.py execfile the real site.py after it does the minimal import > of pkg_resources? That way you don't have to copy the whole site.py > file to here, or track changes to Python's version of the file. You > should be able to calculate the path to the real site.py from sys > attributes. > > I suggested a 15m experiment to see if this approach can work. If > not, feel free to punt and keep what you've got. Practicality beats > purity. As you say later in the email, your idea bore fruit. Yay! > On to the code! Because of the size, I'll delete anything that > seems uncontroversial to me. > > === modified file 'buildmailman.py' > --- buildmailman.py 2009-09-11 02:17:29 +0000 > +++ buildmailman.py 2009-10-16 14:30:32 +0000 >> @@ -88,7 +88,7 @@ >> './configure', >> '--prefix', mailman_path, >> '--with-var-prefix=' + var_dir, >> - '--with-python=' + sys.executable, >> + '--with-python=' + os.path.abspath('bin/py'), >> '--with-username=' + user, >> '--with-groupname=' + group, >> '--with-mail-gid=' + group, > > Man I am so happy this works. The alternative was horrendous. :-) > === modified file 'buildout-templates/_pythonpath.py.in' > --- buildout-templates/_pythonpath.py.in 2009-08-21 19:13:05 +0000 > +++ buildout-templates/_pythonpath.py.in 2009-10-16 14:30:32 +0000 >> @@ -6,15 +6,71 @@ >> >> __metaclass__ = type >> >> -import sys, os >> - >> -sys.path[0:0] = [${string-paths}] >> -# Enable Storm's C extensions >> -os.environ['STORM_CEXTENSIONS'] = '1' >> - >> -# We don't want to bother tests or logs with these. >> -import warnings >> -warnings.filterwarnings( >> - 'ignore', >> - 'Module .+ was already imported from .+, but .+ is being >> added.*', >> - UserWarning) >> +import sys >> +import os >> +import site > > Alphabetize? Done. >> + >> +stdlib_paths = [ >> + ${indented-stdlib-paths} >> + ] >> +egg_paths = [ >> + ${indented-egg-paths} >> + ] >> +site_dirs = [ >> + ${indented-dir-paths} >> + ] >> +clean_modules = [ >> + 'os', >> + '_pythonpath', >> + 'site', >> + 'sitecustomize', >> + 'sys', >> + ${clean-sys-modules} >> + ] >> + >> +def set_path(): >> + previously_imported = {} >> + for k in sys.modules.keys(): > > Please add a comment above this line explaining why you're > using .keys() here. It's not completely obvious at first reading > that it's because you're mutating sys.modules inside the loop. Done. >> + if k not in clean_modules: >> + previously_imported[k] = sys.modules.pop(k) >> + >> + # We keep the very first path because that is typically the >> directory >> + # of the file that imported us, which should continue to have >> precedence. >> + sys.path[1:] = egg_paths >> + sys.path.extend(stdlib_paths) >> + # Add the site_dirs before `addsitedir` in case it has >> setuptools. >> + sys.path.extend(site_dirs) >> + # Process all buildout-controlled eggs before site-packages by >> importing >> + # pkg_resources. This is only important for namespace >> packages, so it may >> + # not have been added, so ignore import errors. >> + try: >> + import pkg_resources >> + except ImportError: >> + pass >> + # Process .pth files. >> + for p in site_dirs: >> + site.addsitedir(p) >> + >> + # We don't want to have dropped any packages that weren't >> already added >> + # back by what we just did. If we did, there's a good chance >> that the >> + # world will now be insane. Quit now, and let's fix it. >> + unexpected = set(previously_imported).difference(sys.modules) >> + if unexpected: >> + raise RuntimeError( >> + 'Found unexpected module(s): %s\n\nImport _pythonpath >> earlier!' % >> + (', '.join(sorted(unexpected)),)) >> + >> + # Enable Storm's C extensions >> + os.environ['STORM_CEXTENSIONS'] = '1' >> + >> + # We don't want to bother tests or logs with these. >> + import warnings >> + warnings.filterwarnings( >> + 'ignore', >> + 'Module .+ was already imported from .+, but .+ is being >> added.*', >> + UserWarning) >> + >> +try: >> + import canonical >> +except ImportError: >> + set_path() > > === modified file 'buildout-templates/bin/test.in' > --- buildout-templates/bin/test.in 2009-09-17 17:42:25 +0000 > +++ buildout-templates/bin/test.in 2009-10-16 14:30:32 +0000 >> @@ -14,7 +14,49 @@ >> ############################################################################## >> """Test script >> """ >> -import sys, os, time, logging, warnings, re >> + >> +# NOTE: This is a generated file. The original is in >> +# buildout-templates/bin/test.in >> + >> +import site >> +import sys > > etc... > > This looks a lot like _pythonpath.py.in. Is it at all possible to > share the code somehow? If not, I'm afraid that if/when you have to > patch one you'll have to remember to patch them both. The problem is actually slightly bigger. This code is very similar to the code generated by zc.buildout for its scripts and its faux- interpreter. It's possible I'm being lazy or simply not being creative enough, but I don't see how to share the buildout code with these files. If I am right, I will always have to patch zc.buildout and our own code when something changes. But anyway, at your urging, I did figure out a way to eliminate one duplication of the code. I made test.in use bin/py as the executable, and then it can get rid of the sys.path hacks itself. > > === modified file 'lib/lp/codehosting/tests/test_lpserve.py' > --- lib/lp/codehosting/tests/test_lpserve.py 2009-07-17 02:25:09 +0000 > +++ lib/lp/codehosting/tests/test_lpserve.py 2009-10-16 14:30:32 +0000 >> @@ -6,6 +6,7 @@ >> __metaclass__ = type >> >> import os >> +import re >> from subprocess import PIPE >> import unittest >> >> @@ -30,7 +31,35 @@ >> >> def assertFinishedCleanly(self, result): >> """Assert that a server process finished cleanly.""" >> - self.assertEqual((0, '', ''), tuple(result)) >> + # XXX gary 2009-10-15 bug 316192 > > Can you add a comment to that bug explaining that this test method > should be cleaned up once Launchpad is updated? Good idea. Done. https://bugs.edge.launchpad.net/bzr/+bug/316192/comments/8 > === added file 'lib/site.py' > --- lib/site.py 1970-01-01 00:00:00 +0000 > +++ lib/site.py 2009-10-16 14:30:32 +0000 >> @@ -0,0 +1,432 @@ >> +# LAUNCHPAD FORKED VERSION OF STDLIB SITE.PY >> +# Why are we forking site.py? >> +# The short answer is that namespace packages in setuptools have >> problems. >> +# A longer answer is that we have to import pkg_resources before >> namespace >> +# package .pth files are processed or else the distribution's >> namespace >> +# packages will mask all of the egg-based packages in the same >> namespace >> +# package. Normally, we handle that in bin/py or _pythonpath. but >> sometimes >> +# we do subprocess calls, relying on the PYTHONPATH to set the eggs >> +# correctly. It is for this situation that we have hacked >> site.py. Here >> +# is the change: >> +try: >> + __import__('pkg_resources') # Use __import__ to not pollute >> the namespace. >> +except ImportError: >> + pass > > You followed up on IRC that the experiment with execfile worked. Yay! Yes. This is what site.py looks like now: # LAUNCHPAD HACK OF STDLIB SITE.PY # Why are we hacking site.py? # The short answer is that namespace packages in setuptools have problems. # A longer answer is that we have to import pkg_resources before namespace # package .pth files are processed or else the distribution's namespace # packages will mask all of the egg-based packages in the same namespace # package. Normally, we handle that in bin/py or _pythonpath. but sometimes # we do subprocess calls, relying on the PYTHONPATH to set the eggs # correctly. It is for this situation that we have hacked site.py. Here # is the important part: try: __import__('pkg_resources') # Use __import__ to not pollute the namespace. except ImportError: pass # Now, we want to get the usual site.py behavior. import os import sys execfile( os.path.join(sys.prefix, 'lib', 'python' + sys.version[:3], 'site.py')) > merge-conditional, r=me with consideration of the above. Cool. Thank you very much for the review! Here's the incremental diff, not including the site.py change. === modified file 'buildout-templates/_pythonpath.py.in' --- buildout-templates/_pythonpath.py.in 2009-10-16 03:18:11 +0000 +++ buildout-templates/_pythonpath.py.in 2009-10-16 19:58:03 +0000 @@ -6,9 +6,9 @@ __metaclass__ = type -import sys import os import site +import sys stdlib_paths = [ ${indented-stdlib-paths} @@ -30,6 +30,8 @@ def set_path(): previously_imported = {} + # We get a copy of the keys rather than simply iterating because we will + # be mutating the dictionary. for k in sys.modules.keys(): if k not in clean_modules: previously_imported[k] = sys.modules.pop(k) === modified file 'buildout-templates/bin/test.in' --- buildout-templates/bin/test.in 2009-10-15 04:19:54 +0000 +++ buildout-templates/bin/test.in 2009-10-16 19:53:05 +0000 @@ -1,4 +1,4 @@ -#!${buildout:executable} +#!${buildout:directory}/bin/py ############################################################################## # # Copyright (c) 2004 Zope Corporation and Contributors. @@ -18,45 +18,7 @@ # NOTE: This is a generated file. The original is in # buildout-templates/bin/test.in -import site -import sys - -stdlib_paths = [ - ${indented-stdlib-paths} - ] -egg_paths = [ - ${indented-egg-paths} - ] -site_dirs = [ - ${indented-dir-paths} - ] -clean_modules = [ - 'site', - 'sitecustomize', - 'sys', - ${clean-sys-modules} - ] - -# Clean out sys.modules from site's processing of .pth files. -for k in sys.modules.keys(): - if k not in clean_modules: - del sys.modules[k] - -sys.path[:] = egg_paths -sys.path.extend(stdlib_paths) -# Add the site_dirs before `addsitedir` in case it has setuptools. -sys.path.extend(site_dirs) -# Process all buildout-controlled eggs before site-packages by importing -# pkg_resources. -try: - import pkg_resources -except ImportError: - pass -# Process .pth files. -for p in site_dirs: - site.addsitedir(p) - -import os, time, logging, warnings, re +import logging, os, re, sys, time, warnings BUILD_DIR = '${buildout:directory}'