On Jul 13, 2009, at 5:06 PM, Francis J. Lacoste wrote: > On July 11, 2009, Gary Poster wrote: >> Gary Poster has proposed merging lp:~gary/zc.buildout/support- >> system-python >> into lp:~gary/zc.buildout/trunk. >> >> Requested reviews: >> Francis J. Lacoste (flacoste) >> >> I have made a number of changes to zc.buildout in the >> gary-support-system-python branch in zope.org's svn repository. >> They all >> support Launchpad or Landscape's use of Buildout with a system >> Python, >> directly or indirectly. >> > > Wow, this was a lot of changes! Yeah, sorry. >> From my cursory understanding, these changes looks good. I have a >> bunch of > questions / suggestions. So from the Launchpad point of view, it > seems to > satisfy our use case. > > Thanks a lot, this will make zc.buildout a lot more robust for our > use-case. > >> +# We have to manually parse our options rather than using one of >> the stdlib >> +# tools because we want to pass the ones we don't recognize along to >> +# zc.buildout.buildout.main. > > Wouldn't it have been possible to use a '--' marker to delimit the > bootstrap > options from the buildout ones: > > ./bootstrap.py --eggs eggs -- ZC.buildout options follows. > > That's a common idiom I think. It is, but it would break backwards compatibility of the usage of this script. This was a design decision from before I came a long. If Jim were OK with breaking backwards compatibility I'd be happy to go this way. >> +match_equals = re.compile(r'(%s)=(\S*)' % >> ('|'.join(configuration),)).match > > By using this regex, you preclude any paths with spaces in them. I > know many > people avoid them because a lot of UNIX programs break on them, but > I'm sure > you'll encounter a lot of Mac OS X users that have spaces in some of > their > paths :-) Yes, that was silly, thanks. (.*) should be sufficient. See incremental diff. > >> +# defaults >> +tmpeggs = None > > The above comment doesn't make sense to me. Agreed. It was a bit of a legacy comment--at one point in development I had set a lot of default values there. Now there is only one. I removed "default" and added a more pertinent comment. See incremental diff. > >> +if (configuration['--download-base'] and >> + not configuration['--download-base'].endswith('/')): >> + # download base needs a trailing slash to make the world happy >> + configuration['--download-base'] += '/' >> + > > I know this is something specific to our coding conventions, but > comments are > sentences needing a capital and a period ;-) Ack, see diff. > >> +if configuration['--version']: >> + configuration['--version'] = '==' + configuration['--version'] > > I don't understand the reason for that code? I'm guessing it's > because you are > going to pass it over to setuptools? Maybe add a comment? This is legacy code to some degree, but yes. I changed this approach entirely to try to clarify, and added a comment. > === modified file 'src/zc/buildout/buildout.py' >> - prefer_final = options.get('prefer-final', 'false') >> + prefer_final = options.setdefault('prefer-final', 'false') > > Is there any reason for the switch to setdefault? If it's because > you are > reusing options in other places, then I guess that's fine. > Otherwise, I know > some people find using setdefault() with it's return value a little > confusing > to read. It is because when buildout uses -vv, it is supposed to show you all configuration values, as shown in buildout.txt in the "Predefined buildout options" section. To appear in this output, they need to be in the options dict. Therefore, when appropriate, I switched from .get to .setdefault so that the options would get the default. It seemed like a reasonable, concise, and idiomatic way of accomplishing this. Otherwise, we'd need something like this. prefer_final = options.get('prefer-final', 'false') options['prefer-final'] = prefer_final I prefer the setdefault approach, but I'm happy to change. >> + options = self['buildout'] >> + >> + # Get a base working set for our distributions that >> corresponds to > the >> + # stated desires in the configuration. >> + distributions = ['setuptools', 'zc.buildout'] > > You have trailing white space on that previous line. Ow, I sure do, and a lot of it. Thanks. Changed in diff. > > >> === modified file 'src/zc/buildout/buildout.txt' > > I like the merging of the sections into the general option list. It > might make > them harder to find from the table of contents though. True. It makes it consistent with the other options though. I don't know why those would have been treated differently from the rest. I went for consistency, since I couldn't determine a rationale for putting an option in one place or the other. I consolidated into one of the places. >> === modified file 'src/zc/buildout/easy_install.py' >> + >> +def _get_system_packages(executable): >> + """return a pair of the standard lib and site packages for the > executable. >> + """ >> + # We want to get a list of the site packages, which is not >> easy. The >> + # canonical way to do this is to use > distutils.sysconfig.get_python_lib(), >> + # but that only returns a single path, which does not reflect >> reality > for >> + # many system Pythons, which have multiple additions. >> Instead, we > start >> + # Python with -S, which does not import site.py and set up the >> extra > paths >> + # like site-packages or (Ubuntu/Debian) dist-packages and >> python- > support. >> + # We then compare that sys.path with the normal one. The set >> of the > normal >> + # one minus the set of the ones in ``python -S`` is the set of >> packages >> + # that are effectively site-packages. >> + def get_sys_path(clean=False): >> + cmd = [executable, "-c", >> + "import sys, os;" >> + "print repr([os.path.normpath(p) for p in >> sys.path])"] >> + if clean: >> + cmd.insert(1, '-S') >> + _proc = subprocess.Popen( >> + cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) >> + stdout, stderr = _proc.communicate(); >> + if _proc.returncode: >> + raise RuntimeError( >> + 'error trying to get system packages:\n%s' % >> (stderr,)) >> + res = eval(stdout) > > Woa, eval! You like playing with fire! I would simply have use > writelines(), > readlines()... but heh, if the executable is compromised... all > things are > loose. I guess there is no danger here. That would be nice though, you're right. this was taken from my quick and dirty proof of concept. If you want me to change it, let me know. > >> @@ -650,34 +732,52 @@ >> self._maybe_add_setuptools(ws, dist) >> >> # OK, we have the requested distributions and they're in the > working >> - # set, but they may have unmet requirements. We'll simply >> keep >> - # trying to resolve requirements, adding missing >> requirements as > they >> - # are reported. >> - # >> - # Note that we don't pass in the environment, because we >> want >> + # set, but they may have unmet requirements. We'll >> resolve these >> + # requirements. This is code modified from >> + # pkg_resources.WorkingSet.resolve. We can't reuse that >> code > directly >> + # because we have to constrain our requirements (see >> + # >> versions_section_ignored_for_dependency_in_favor_of_site_packages > in >> + # zc.buildout.tests). > > Wouldn't it make tracking divergence easier if you moved the "code > modified > from" into a separate function? It would, but instead I opted to simplify the function that had used the ``resolve`` function. I thought it was much nicer. Here's the difference. First, this is the excerpt of what I chose to do (so, this is what you've already seen): -------------------- # OK, we have the requested distributions and they're in the working # set, but they may have unmet requirements. We'll resolve these # requirements. This is code modified from # pkg_resources.WorkingSet.resolve. We can't reuse that code directly # because we have to constrain our requirements (see # versions_section_ignored_for_dependency_in_favor_of_site_packages in # zc.buildout.tests). # requirements.reverse() # set up the stack processed = {} # set of processed requirements best = {} # key -> dist # # Note that we don't use the existing environment, because we want # to look for new eggs unless what we have is the best that # matches the requirement. env = pkg_resources.Environment(ws.entries) while requirements: # process dependencies breadth-first req = self._constrain(requirements.pop(0)) if req in processed: # Ignore cyclic or redundant dependencies continue dist = best.get(req.key) if dist is None: # Find the best distribution and add it to the map dist = ws.by_key.get(req.key) if dist is None: try: dist = best[req.key] = env.best_match(req, ws) except pkg_resources.VersionConflict, err: raise VersionConflict(err, ws) if dist is None: if dest: logger.debug('Getting required %r', str(req)) else: logger.debug('Adding required %r', str(req)) _log_requirement(ws, req) for dist in self._get_dist(req, ws, self._always_unzip): ws.add(dist) self._maybe_add_setuptools(ws, dist) if dist not in req: # Oops, the "best" so far conflicts with a dependency raise VersionConflict( pkg_resources.VersionConflict(dist, req), ws) requirements.extend(dist.requires(req.extras)[::-1]) processed[req] = True return ws -------------------- Now, this is what it would look like for what you propose. -------------------- # OK, we have the requested distributions and they're in the working # set, but they may have unmet requirements. We'll simply keep # trying to resolve requirements, adding missing requirements as they # are reported. # # Note that we don't pass in the environment, because we want # to look for new eggs unless what we have is the best that # matches the requirement. while 1: try: self._resolve(ws, requirements) except pkg_resources.DistributionNotFound, err: [requirement] = err requirement = self._constrain(requirement) if dest: logger.debug('Getting required %r', str(requirement)) else: logger.debug('Adding required %r', str(requirement)) _log_requirement(ws, requirement) for dist in self._get_dist(requirement, ws, self._always_unzip ): ws.add(dist) self._maybe_add_setuptools(ws, dist) except pkg_resources.VersionConflict, err: raise VersionConflict(err, ws) else: break return ws def _resolve(installer, self, requirements, env=None, installer=None): # This is a fork of the pkg_resources.WorkingSet.resolve code. The # only important difference is the line marked with "FORK." Otherwise, # the only other difference is that the installer is the first # argument. The odd choice of having "self" be the second argument is # there to make the differences between this code and the original be # as few as possible. requirements = list(requirements)[::-1] # set up the stack processed = {} # set of processed requirements best = {} # key -> dist to_activate = [] while requirements: # process dependencies breadth-first req = installer._constrain(requirements.pop(0)) # FORK if req in processed: # Ignore cyclic or redundant dependencies continue dist = best.get(req.key) if dist is None: # Find the best distribution and add it to the map dist = self.by_key.get(req.key) if dist is None: if env is None: env = Environment(self.entries) dist = best[req.key] = env.best_match(req, self, installer) if dist is None: raise DistributionNotFound(req) # XXX put more info here to_activate.append(dist) if dist not in req: # Oops, the "best" so far conflicts with a dependency raise VersionConflict(dist,req) # XXX put more info here requirements.extend(dist.requires(req.extras)[::-1]) processed[req] = True return to_activate # return list of distros to activate -------------------- My approach (the first) simplifies the code path quite a bit, I think. If you follow what's going on in the second code path, I think you might agree. That said, if you and Jim agree on changing to the second approach, obviously it's easy enough. > >> + # >> + requirements.reverse() # set up the stack >> + processed = {} # set of processed requirements >> + best = {} # key -> dist >> + # >> + # Note that we don't use the existing environment, because >> we want >> # to look for new eggs unless what we have is the best that >> # matches the requirement. >> - while 1: >> - try: >> - ws.resolve(requirements) >> - except pkg_resources.DistributionNotFound, err: >> - [requirement] = err >> - requirement = self._constrain(requirement) >> - if dest: >> - logger.debug('Getting required %r', >> str(requirement)) >> - else: >> - logger.debug('Adding required %r', >> str(requirement)) >> - _log_requirement(ws, requirement) >> - >> - for dist in self._get_dist(requirement, ws, > self._always_unzip >> - ): >> - >> - ws.add(dist) >> - self._maybe_add_setuptools(ws, dist) >> - except pkg_resources.VersionConflict, err: >> - raise VersionConflict(err, ws) >> - else: >> - break >> + env = pkg_resources.Environment(ws.entries) >> + while requirements: >> + # process dependencies breadth-first >> + req = self._constrain(requirements.pop(0)) >> + if req in processed: >> + # Ignore cyclic or redundant dependencies >> + continue >> + dist = best.get(req.key) >> + if dist is None: >> + # Find the best distribution and add it to the map >> + dist = ws.by_key.get(req.key) >> + if dist is None: >> + try: >> + dist = best[req.key] = env.best_match(req, >> ws) >> + except pkg_resources.VersionConflict, err: >> + raise VersionConflict(err, ws) >> + if dist is None: >> + if dest: > dest? Is it a typo (of dist) or something else? (I don't see a dest > in the > context.) Not a typo. It's farther up (``dest = self._dest``). Since dist and dest do look awfully similar, I changed this to ``destination = self._dest``. I also tried changing dist -> distribution but I didn't like it, so I didn't go that far. You can see what I did in the incremental diff. >> + logger.debug('Getting required %r', >> str(req)) >> + else: >> + logger.debug('Adding required %r', >> str(req)) >> + _log_requirement(ws, req) >> + for dist in self._get_dist(req, >> + ws, >> self._always_unzip): >> + ws.add(dist) >> + self._maybe_add_setuptools(ws, dist) >> + if dist not in req: >> + # Oops, the "best" so far conflicts with a >> dependency >> + raise VersionConflict( >> + pkg_resources.VersionConflict(dist, req), ws) >> + requirements.extend(dist.requires(req.extras)[::-1]) >> + processed[req] = True > >> @@ -969,7 +1121,7 @@ >> if relative_paths: >> relative_paths = os.path.normcase(relative_paths) >> sname = os.path.normcase(os.path.abspath(sname)) >> - spath = ',\n '.join( >> + spath = ',\n '.join( >> [_relativitize(os.path.normcase(path_item), sname, > relative_paths) >> for path_item in path] >> ) >> @@ -977,7 +1129,7 @@ >> for i in range(_relative_depth(relative_paths, sname)): >> rpsetup += "base = os.path.dirname(base)\n" >> else: >> - spath = repr(path)[1:-1].replace(', ', ',\n ') >> + spath = repr(path)[1:-1].replace(', ', ',\n ') >> rpsetup = '' >> return spath, rpsetup > > Is the change in indentation intentional? Yes. In the interpreter script, the paths are within an indented code block, while in a normal script, the paths are not part of indentation. Four spaces was a compromise that to my eyes looked fine in both uses. You can see the results in the easy_install.txt document. > > >> === modified file 'src/zc/buildout/tests.py' > >> @@ -2533,6 +2789,8 @@ >> def create_sample_eggs(test, executable=sys.executable): >> write = test.globs['write'] >> dest = test.globs['sample_eggs'] >> + site_packages = test.globs['tmpdir']('site_packages') >> + test.globs['site_packages'] = site_packages >> tmp = tempfile.mkdtemp() >> try: >> write(tmp, 'README.txt', '') >> @@ -2549,6 +2807,8 @@ >> % (i, c1) >> ) >> zc.buildout.testing.sdist(tmp, dest) >> + if i==1: >> + zc.buildout.testing.sys_install(tmp, site_packages) >> >> write( >> tmp, 'setup.py', >> @@ -2578,6 +2838,8 @@ >> " zip_safe=True, version='0.%s%s')\n" % (i, c1) >> ) >> zc.buildout.testing.bdist_egg(tmp, executable, dest) >> + if i==3: >> + zc.buildout.testing.sys_install(tmp, site_packages) >> > > I don't understand the above changes. This function is all about setting up bits and bobs to use in tests. Before my changes, the function created and added some sdists and bdist_eggs for use in the tests. After my changes, the function also installed a couple of packages in a directory, using the same basic approach that Ubuntu and Mac OS X use to install eggs in their system Pythons (which is different than other approaches to installing eggs, unfortunately). I'm able to use this faux-site-packages in the new bugfix test. I added a doctstring to the function which tries to put this in perspective (see incremental diff). > >> @@ -2652,6 +2914,49 @@ >> test.globs['sample_eggs']) >> test.globs['update_extdemo'] = lambda : add_source_dist(test, >> 1.5) >> zc.buildout.testing.install_develop('zc.recipe.egg', test) >> + # most tests don't need this set up, and it takes some time, >> so we just >> + # make it available as a convenience. > > Sentence starts with a capital :-) Ack. :-) Incremental diff follows. Thanks again. Gary Index: src/zc/buildout/tests.py =================================================================== --- src/zc/buildout/tests.py (revision 101878) +++ src/zc/buildout/tests.py (revision 101879) @@ -2787,6 +2787,13 @@ ###################################################################### def create_sample_eggs(test, executable=sys.executable): + """Creates sample eggs, source distributions, and a faux site- packages." + + Unlike the faux site-packages created by + ``get_executable_with_site_packages``, this one has packages installed the + way distributions often install eggs in system Pythons (via + zc.buildout.testing.sys_install). + """ write = test.globs['write'] dest = test.globs['sample_eggs'] site_packages = test.globs['tmpdir']('site_packages') @@ -2914,7 +2921,7 @@ test.globs['sample_eggs']) test.globs['update_extdemo'] = lambda : add_source_dist(test, 1.5) zc.buildout.testing.install_develop('zc.recipe.egg', test) - # most tests don't need this set up, and it takes some time, so we just + # Most tests don't need this set up, and it takes some time, so we just # make it available as a convenience. def get_executable_with_site_packages(requirements=None): executable_buildout = test.globs['tmpdir']('executable') Index: src/zc/buildout/easy_install.py =================================================================== --- src/zc/buildout/easy_install.py (revision 101878) +++ src/zc/buildout/easy_install.py (revision 101879) @@ -712,9 +712,9 @@ logger.debug('Installing %s.', repr(specs)[1:-1]) path = self._path - dest = self._dest - if dest is not None and dest not in path: - path.insert(0, dest) + destination = self._dest + if destination is not None and destination not in path: + path.insert(0, destination) requirements = [self._constrain(pkg_resources.Requirement.parse(spec)) for spec in specs] @@ -739,23 +739,23 @@ # versions_section_ignored_for_dependency_in_favor_of_site_packages in # zc.buildout.tests). # - requirements.reverse() # set up the stack - processed = {} # set of processed requirements - best = {} # key -> dist + requirements.reverse() # Set up the stack. + processed = {} # This is a set of processed requirements. + best = {} # This is a mapping of key -> dist. # # Note that we don't use the existing environment, because we want # to look for new eggs unless what we have is the best that # matches the requirement. env = pkg_resources.Environment(ws.entries) while requirements: - # process dependencies breadth-first + # Process dependencies breadth-first. req = self._constrain(requirements.pop(0)) if req in processed: - # Ignore cyclic or redundant dependencies + # Ignore cyclic or redundant dependencies. continue dist = best.get(req.key) if dist is None: - # Find the best distribution and add it to the map + # Find the best distribution and add it to the map. dist = ws.by_key.get(req.key) if dist is None: try: @@ -763,7 +763,7 @@ except pkg_resources.VersionConflict, err: raise VersionConflict(err, ws) if dist is None: - if dest: + if destination: logger.debug('Getting required %r', str(req)) else: logger.debug('Adding required %r', str(req)) @@ -773,7 +773,7 @@ ws.add(dist) self._maybe_add_setuptools(ws, dist) if dist not in req: - # Oops, the "best" so far conflicts with a dependency + # Oops, the "best" so far conflicts with a dependency. raise VersionConflict( pkg_resources.VersionConflict(dist, req), ws) requirements.extend(dist.requires(req.extras)[::-1]) Index: src/zc/buildout/buildout.py =================================================================== --- src/zc/buildout/buildout.py (revision 101878) +++ src/zc/buildout/buildout.py (revision 101879) @@ -290,7 +290,7 @@ # Get a base working set for our distributions that corresponds to the # stated desires in the configuration. - distributions = ['setuptools', 'zc.buildout'] + distributions = ['setuptools', 'zc.buildout'] if options.get('offline') == 'true': ws = zc.buildout.easy_install.working_set( distributions, options['executable'], Index: bootstrap/bootstrap.py =================================================================== --- bootstrap/bootstrap.py (revision 101878) +++ bootstrap/bootstrap.py (revision 101879) @@ -56,13 +56,15 @@ By using --ez_setup-source and --download-base to point to local resources, you can keep this script from going over the network. ''' % configuration) -match_equals = re.compile(r'(%s)=(\S*)' % ('|'.join(configuration),)).match +match_equals = re.compile(r'(%s)=(.*)' % ('|'.join(configuration),)).match args = sys.argv[1:] if args == ['--help']: print helpstring sys.exit(0) -# defaults +# If we end up using a temporary directory for storing our eggs, this will +# hold the path of that directory. On the other hand, if an explicit directory +# is specified in the argv, this will remain None. tmpeggs = None while args: @@ -84,13 +86,13 @@ for name in ('--ez_setup-source', '--download-base'): val = configuration[name] - if val is not None and '://' not in val: # we're being lazy. + if val is not None and '://' not in val: # We're being lazy. configuration[name] = 'file://%s' % ( urllib.pathname2url(os.path.abspath(os.path.expanduser(val))),) if (configuration['--download-base'] and not configuration['--download-base'].endswith('/')): - # download base needs a trailing slash to make the world happy + # Download base needs a trailing slash to make the world happy. configuration['--download-base'] += '/' if not configuration['--eggs']: @@ -99,8 +101,10 @@ configuration['--eggs'] = os.path.abspath( os.path.expanduser(configuration['--eggs'])) +# The requirement is what we will pass to setuptools to specify zc.buildout. +requirement = 'zc.buildout' if configuration['--version']: - configuration['--version'] = '==' + configuration['--version'] + requirement += '==' + configuration['--version'] try: import pkg_resources @@ -132,7 +136,7 @@ if configuration['--download-base']: cmd.extend(['-f', quote(configuration['--download-base'])]) -cmd.append('zc.buildout' + configuration['--version']) +cmd.append(requirement) ws = pkg_resources.working_set env = dict( @@ -153,7 +157,7 @@ sys.exit(exitcode) ws.add_entry(configuration['--eggs']) -ws.require('zc.buildout' + configuration['--version']) +ws.require(requirement) import zc.buildout.buildout args.append('bootstrap') zc.buildout.buildout.main(args)