Merge lp://staging/~xaav/bzr/tarball-generator into lp://staging/bzr
Status: Superseded
Proposed branch: lp://staging/~xaav/bzr/tarball-generator
Merge into: lp://staging/bzr
Diff against target: 536 lines (+248/-100), 4 files modified
  - bzrlib/export/__init__.py (+55/-21)
  - bzrlib/export/dir_exporter.py (+16/-2)
  - bzrlib/export/tar_exporter.py (+163/-75)
  - bzrlib/export/zip_exporter.py (+14/-2)
To merge this branch: bzr merge lp://staging/~xaav/bzr/tarball-generator
Related bugs:

Reviewer | Review Type | Date Requested | Status
---|---|---|---
xaav (community) | Abstain | |
Martin Pool | Pending | |
Robert Collins | Pending | |

Review via email: mp+63319@code.staging.launchpad.net
This proposal supersedes a proposal from 2011-06-01.
This proposal has been superseded by a proposal from 2011-06-06.
Commit message
Description of the change
This change fixes the related bug:
- yield after ball.addfile is called.
- export_tarball function wrapper.
Jelmer Vernooij (jelmer) wrote: Posted in a previous version of this proposal
Robert Collins (lifeless) wrote: Posted in a previous version of this proposal
On Thu, Jun 2, 2011 at 11:01 AM, Jelmer Vernooij <email address hidden> wrote:
> Hi Geoff,
>
> On Wed, 2011-06-01 at 22:36 +0000, Geoff wrote:
>> Geoff has proposed merging lp:~xaav/bzr/tarball-generator into lp:bzr.
>>
>> Requested reviews:
>> bzr-core (bzr-core)
>> Related bugs:
>> Bug #791005 in Bazaar: "Impossible to get data from export during progress."
>> https:/
>>
>> For more details, see:
>> https:/
>>
>> This change fixes the related bug:
>> - yeild after ball.addfile is called.
>> - export_tarball function wrapper.
> Thanks for working on this... it'd be really nice to have export support
> in Loggerhead.
>
> I think rather than making export_tarball a generator, it would make
> more sense to factor out the body of the loop in export_tarball, which
> could then be used by your function and the existing export_tarball
> function. I.e. a export_tarball_item function that just returns a (item,
> fileobj) tuple.
>
> Can you add a trivial test function for the new method you expose,
> whatever it is? That way we won't accidentally change it, and break
> loggerhead.
That would still mean duplicating code in loggerhead vs bzrlib, which
frankly sucks.
Whats the concern over this being a generator in bzrlib?
-Rob
Jelmer Vernooij (jelmer) wrote: Posted in a previous version of this proposal
On Wed, 2011-06-01 at 23:07 +0000, Robert Collins wrote:
> On Thu, Jun 2, 2011 at 11:01 AM, Jelmer Vernooij <email address hidden> wrote:
> > Hi Geoff,
> >
> > On Wed, 2011-06-01 at 22:36 +0000, Geoff wrote:
> >> Geoff has proposed merging lp:~xaav/bzr/tarball-generator into lp:bzr.
> >>
> >> Requested reviews:
> >> bzr-core (bzr-core)
> >> Related bugs:
> >> Bug #791005 in Bazaar: "Impossible to get data from export during progress."
> >> https:/
> >>
> >> For more details, see:
> >> https:/
> >>
> >> This change fixes the related bug:
> >> - yeild after ball.addfile is called.
> >> - export_tarball function wrapper.
> > Thanks for working on this... it'd be really nice to have export support
> > in Loggerhead.
> >
> > I think rather than making export_tarball a generator, it would make
> > more sense to factor out the body of the loop in export_tarball, which
> > could then be used by your function and the existing export_tarball
> > function. I.e. a export_tarball_item function that just returns a (item,
> > fileobj) tuple.
> >
> > Can you add a trivial test function for the new method you expose,
> > whatever it is? That way we won't accidentally change it, and break
> > loggerhead.
> That would still mean duplicating code in loggerhead vs bzrlib, which
> frankly sucks.
>
> Whats the concern over this being a generator in bzrlib?
It's a generator, but it still takes a tarball object that it will add
entries to even though it also yields them.
export_tarball would roughly be:
def export_tarball(...):
    for ie, path in tree.iter_...:
        (item, fileobj) = export_tarball_item(...)
        ball.add(item, fileobj)
and loggerhead could do roughly the same thing, except it wouldn't have
to add the items returned to the tarfile object but it could stream them
directly.
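A minimal runnable sketch of the split being proposed here; all names and signatures are illustrative stand-ins, not bzrlib's actual API:

```python
import io
import tarfile

def export_tarball_item(path, data, mtime=0):
    """Pack one entry into an (item, fileobj) pair without touching the
    tarball itself.  Illustrative stand-in for the proposed helper."""
    item = tarfile.TarInfo(path)
    item.size = len(data)
    item.mtime = mtime
    return item, io.BytesIO(data)

def export_tarball(entries, ball):
    """The existing export path: add every entry to an open tarfile."""
    for path, data in entries:
        item, fileobj = export_tarball_item(path, data)
        ball.addfile(item, fileobj)

# Loggerhead could call export_tarball_item directly and stream each
# (item, fileobj) pair instead of adding it to a tarfile object.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as ball:
    export_tarball([("hello.txt", b"hello world\n")], ball)
```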
Having a separate export_tarball_item function also gives us freedom to
select a different set of files to export - for example, there is a bug
open to support a --exclude option to "bzr export", and there is a bug
about exporting .bzrignore.
Cheers,
Jelmer
Robert Collins (lifeless) wrote: Posted in a previous version of this proposal
On Thu, Jun 2, 2011 at 11:40 AM, Jelmer Vernooij <email address hidden> wrote:
>> Whats the concern over this being a generator in bzrlib?
> It's a generator, but it still takes a tarball object that it will add
> entries to even though it also yields them.
>
> export_tarball would roughly be:
>
> def export_tarball(...):
>     for ie, path in tree.iter_...:
>         (item, fileobj) = export_tarball_item(...)
>         ball.add(item, fileobj)
>
> and loggerhead could do roughly the same thing, except it wouldn't have
> to add the items returned to the tarfile object but it could stream them
> directly.
That means loggerhead is duplicating code. I think the yield of the
ball was incidental and not the point of the patch.
> Having a separate export_tarball_item function also gives us freedom to
> select a different set of files to export - for example, there is a bug
> open to support a --exclude option to "bzr export", and there is a bug
> about exporting .bzrignore.
Likewise here - a separate inner function may be a useful thing to do
but it doesn't help loggerhead.
xaav (xaav) wrote: Posted in a previous version of this proposal
The yield of the ball is not really helpful to anything - I figured it would make more sense to yield the tarball than yield nothing.
Alternatively, we could move the for-in loop to the tgz, tbz, plain tar, etc. exporters.
All I really care about here is fixing the bug in loggerhead. Please make up your mind about what I'm supposed to do - duplicate code in loggerhead or factor it out here - and inform me of that decision.
xaav (xaav) wrote: Posted in a previous version of this proposal
Okay, I've updated the code to use a different method. Please tell me which method you prefer.
Robert Collins (lifeless) wrote: Posted in a previous version of this proposal
Hi Geoff, sorry you are getting torn in different directions. I've chatted with Jelmer on IRC to avoid this for you.
I think this is pretty close. We may want a test that it works in bzrlib's test suite, to show the contract loggerhead needs.
I think what we want is the following:
* in loggerhead a *single* code path that can handle:
- tar.gz
- zip
- tar.lzma
- etc
exports. It would be something like you wrote.
output_stream = LoggerheadFilel
exporter = bzrlib.
for _ in exporter:
    yield output_
* in bzrlib a get_export_… function that sets up the right GZipFile etc and raises an exception if it can't stream.
That way we can fixup bzrlib to support zip etc in the future and only have to make cosmetic changes to loggerhead.
What do you think?
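A sketch of what such an interface could look like; the name `get_export_generator` and the signature are guesses at the intended API (the real bzrlib function would take a tree and revision_id rather than (path, data) pairs):

```python
import io
import tarfile

def get_export_generator(entries, output_stream, fmt="tgz"):
    """Write the export to output_stream, yielding after each member so
    the caller can drain the stream between iterations.  Raises if the
    requested format cannot be streamed."""
    mode = {"tar": "w|", "tgz": "w|gz", "tbz2": "w|bz2"}.get(fmt)
    if mode is None:
        raise ValueError("format %r cannot be streamed" % (fmt,))
    ball = tarfile.open(fileobj=output_stream, mode=mode)
    try:
        for path, data in entries:
            item = tarfile.TarInfo(path)
            item.size = len(data)
            ball.addfile(item, io.BytesIO(data))
            yield  # caller reads whatever output has accumulated so far
    finally:
        ball.close()
```

Loggerhead's single code path would then be the two-line loop sketched above: drain `output_stream` after each iteration and hand the chunk to WSGI.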
xaav (xaav) wrote: Posted in a previous version of this proposal
> Hi Geoff, sorry you are getting torn in different directions. I've chatted
> with Jelmer on IRC to avoid this for you.
>
> I think this is pretty close. We may want a test that it works in bzrlib's
> test suite, to show the contract loggerhead needs.
>
> I think what we want is the following:
>
> * in loggerhead a *single* code path that can handle:
> - tar.gz
> - zip
> - tar.lzma
> - etc
>
> exports. It would be something like you wrote.
>
> output_stream = LoggerheadFilel
> exporter = bzrlib.
> revision_id)
> for _ in exporter:
>     yield output_
>
> * in bzrlib get_export_
> the right GZipFile etc and raises an exception if it can't stream.
>
> That way we can fixup bzrlib to support zip etc in the future and only have to
> make cosmetic changes to loggerhead.
>
> What do you think?
Sounds good. I should have it coded up by tomorrow.
xaav (xaav) wrote: Posted in a previous version of this proposal
Okay, I've reworked this to meet the specifications, but it may need some cleaning-up.
You are welcome to commend in the meantime.
Martin Pool (mbp) wrote: Posted in a previous version of this proposal
I commend this patch ;-)
[question] It seems to me that you can get streaming by just observing the chunks being written into the file object, and having a generator as well is redundant. I see this is basically needed because PEP333 strongly deprecates having streaming data sent to a write function returned by a generator.
Perhaps it would at least be good to explain that it is done this way for the sake of PEP333.
[question] Given that wsgi does actually support having write() called repeatedly for streaming data, why not use it? I realize it's not the recommended interface, but it is the interface that does fit most naturally with the lower level tarball code we're using. It seems a bit like we're reimplementing something that could be better done in the wsgi framework -- specifically, turning push commands into pull. That would have the concrete benefit of allowing streaming just exactly as data is produced by the compressor, on finer granularity than exporting a single file.
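Concretely, the push-to-pull inversion means wrapping a small file-like buffer and yielding its contents from a generator, which is the iterable interface PEP 333 prefers over the `write()` callable. The class and function names below are hypothetical, not loggerhead's actual helpers:

```python
class ChunkBuffer(object):
    """File-like object that collects writes so a wrapping generator
    can hand them out as iterable WSGI response chunks."""
    def __init__(self):
        self._chunks = []

    def write(self, data):
        self._chunks.append(data)

    def pop(self):
        """Return and clear everything written since the last pop()."""
        data = b"".join(self._chunks)
        del self._chunks[:]
        return data

def stream_chunks(export_gen, buf):
    """Turn a side-effect export generator into a chunk iterator."""
    for _ in export_gen:
        chunk = buf.pop()
        if chunk:
            yield chunk
    tail = buf.pop()  # anything written while the generator finished
    if tail:
        yield tail
```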
    +def get_export_
    +        per_file_
    +
    +    """Returns a generator that exports the given Tree to the specific destination.
    ...
    +    :param fileobj: Optional file object to use
Factoring out this function makes sense.
[fix] Please make the docstring explain what the generator actually produces. I am confused whether the content is written to the file object or returned in the generator; also how fileobj interacts with dest.
[idea] Actually, perhaps it would be better to do away with 'dest' entirely at this layer? I think it is only there for the sake of heuristics about choosing the format, and then for opening the file, and
[fix] I see all the actual 'exporters', and also 'export' itself now return generators which must be consumed for them to work. Generally if we change an api in such a way that existing callers will be wrong (rather than getting an error), we ought to rename the api too, and of course add a news entry. Another option would be to keep the existing function names doing the whole job, and add new ones that are generators.
    +def export_
    +        force_mtime=None):
Lifting this out is good.
[tweak] Per pep8, we indent continuation lines either by 4 spaces, or to be just after the preceding parenthesis.
Martin Pool (mbp) wrote: Posted in a previous version of this proposal
Just for the sake of being clear: as the comment tags are meant to indicate, I'd _like_ to see if there's a simpler way, but I'd be ok with just fixing the docs and the API stability.
Jelmer Vernooij (jelmer) wrote: Posted in a previous version of this proposal
per_file_timestamps determines whether or not force_mtime is set; passing down per_file_timestamps to the actual exporters is not necessary since it won't be used.
xaav (xaav) wrote: Posted in a previous version of this proposal
[question] Given that wsgi does actually support having write() called repeatedly for streaming data, why not use it? I realize it's not the recommended interface, but it is the interface that does fit most naturally with the lower level tarball code we're using. It seems a bit like we're reimplementing something that could be better done in the wsgi framework -- specifically, turning push commands into pull. That would have the concrete benefit of allowing streaming just exactly as data is produced by the compressor, on finer granularity than exporting a single file.
It seemed like calling that method wouldn't support chunked encoding - it would instead store it in a buffer and calculate the content length. Also the documentation regarding chunked encoding isn't good, and this seemed the most appropriate way to do this.
xaav (xaav) wrote: Posted in a previous version of this proposal
[fix] Please make the docstring explain what the generator actually produces. I am confused whether the content is written to the file object or returned in the generator; also how fileobj interacts with dest.
The generator does the same thing that was previously done, just in chunks.
xaav (xaav) wrote: Posted in a previous version of this proposal
[fix] I see all the actual 'exporters', and also 'export' itself now return generators which must be consumed for them to work. Generally if we change an api in such a way that existing callers will be wrong (rather than getting an error), we ought to rename the api too, and of course add a news entry. Another option would be to keep the existing function names doing the whole job, and add new ones that are generators.
I changed how this was done, because I didn't think the code in the finally statement would be executed properly. Please re-review this.
[tweak] Per pep8, we indent continuation lines either by 4 spaces, or to be just after the preceding parenthesis.
Where is this a problem?
xaav (xaav): Posted in a previous version of this proposal
Martin Pool (mbp) wrote: Posted in a previous version of this proposal
On 3 June 2011 10:38, Geoff <email address hidden> wrote:
> [fix] Please make the docstring explain what the generator actually produces. I am confused whether the content is written to the file object or returned in the generator; also how fileobj interacts with dest.
>
> The generator does the same thing that had been previously been done, just in chunks.
Normally Python generators return something on each iteration;
typically the main result of a generator function will be produced
through the generator. This one returns None repeatedly, and the
caller is required to consume the whole thing. This is unusual enough
that it should be clearly documented.
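The pitfall Martin describes, in miniature — calling the function does nothing until the generator is consumed, and each iteration yields None (illustrative code, not the patch itself):

```python
import io

def do_export(out):
    # All the real work happens as side effects on out;
    # the yields carry no values.
    for chunk in (b"first,", b"second"):
        out.write(chunk)
        yield

out = io.BytesIO()
gen = do_export(out)            # nothing has been written yet
assert out.getvalue() == b""
for _ in gen:                   # caller must drive it to completion
    pass
assert out.getvalue() == b"first,second"
```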
Martin Pool (mbp) wrote: Posted in a previous version of this proposal
On 3 June 2011 10:41, Geoff <email address hidden> wrote:
> [fix] I see all the actual 'exporters', and also 'export' itself now return generators which must be consumed for them to work. Generally if we change an api in such a way that existing callers will be wrong (rather than getting an error), we ought to rename the api too, and of course add a news entry. Another option would be to keep the existing function names doing the whole job, and add new ones that are generators.
>
> I changed how this was done, because I didn't think the code in the finally statement would be executed properly. Please re-review this.
It looks like you did change export() back into being a non-generator
function, but the others still are. I am concerned there will be
functions that count on the current behaviour so we ought to fix this.
I see::

    -        try:
    -            return _exporters[
    -                force_mtime=
    -        finally:
    -            tree.unlock()
    +        for _ in _exporters[
    +                force_mtime, per_file_
    +            yield
    +        tree.unlock()
[fix] It's almost always a bug, and it is in this case, to have the
unlock code not in either a finally block, or addCleanup, or some
other equivalent. If the export is interrupted or errors, the tree
won't be unlocked.
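A sketch of the pattern being asked for: the `yield` sits inside the `try`, so the `finally` runs whether the generator is exhausted, closed early, or the consumer raises (`FakeTree` is a stand-in for a real locked tree):

```python
def export_generator(tree, entries):
    """Hold the tree lock for the life of the generator."""
    tree.lock_read()
    try:
        for entry in entries:
            yield entry  # if the consumer aborts here, finally still runs
    finally:
        tree.unlock()

class FakeTree(object):
    """Minimal lockable object, for illustration only."""
    def __init__(self):
        self.locked = False
    def lock_read(self):
        self.locked = True
    def unlock(self):
        self.locked = False
```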
    +def export_
    +        filtered=False, force_mtime=None):
[tweak] So this doesn't actually export it, it just packs it up ready
to be exported. Please consider renaming it, and document what is
returned.
[aside] We could probably make this faster by avoiding using a
StringIO in the common case, but that's a separate existing issue.
> [tweak] Per pep8, we indent continuation lines either by 4 spaces, or to be just after the preceding parenthesis.
>
> Where is this a problem?
[tweak] Actually, that specific problem isn't there, it just looked
like it to me because some lines are very long. Our coding
convention, following <http://…>, is that they should be wrapped before 80 columns.
hth, thanks
xaav (xaav) wrote:
On 3 June 2011 10:41, Geoff <email address hidden> wrote:
> [fix] I see all the actual 'exporters', and also 'export' itself now return generators which must be consumed for them to work. Generally if we change an api in such a way that existing callers will be wrong (rather than getting an error), we ought to rename the api too, and of course add a news entry. Another option would be to keep the existing function names doing the whole job, and add new ones that are generators.
>
> I changed how this was done, because I didn't think the code in the finally statement would be executed properly. Please re-review this.
>
> It looks like you did change export() back into being a non-generator
> function, but the others still are. I am concerned there will be
> functions that count on the current behaviour so we ought to fix this.
The way the code base is currently set up, there is no need to use the other functions. They should just be passing in the information to the export() function.
I see::

    -        try:
    -            return _exporters[
    -                force_mtime=
    -        finally:
    -            tree.unlock()
    +        for _ in _exporters[
    +                force_mtime, per_file_
    +            yield
    +        tree.unlock()
> [fix] It's almost always a bug, and it is in this case, to have the
> unlock code not in either a finally block, or addCleanup, or some
> other equivalent. If the export is interrupted or errors, the tree
> won't be unlocked.
Fixed and will show up on the next push.
xaav (xaav) wrote:
    +def export_
    +        filtered=False, force_mtime=None):
[tweak] So this doesn't actually export it, it just packs it up ready
to be exported. Please consider renaming it, and document what is
returned.
Fixed and will show up on the next push.
xaav (xaav) wrote:
[aside] We could probably make this faster by avoiding using a
StringIO in the common case, but that's a separate existing issue.
> [tweak] Per pep8, we indent continuation lines either by 4 spaces, or to be just after the preceding parenthesis.
>
> Where is this a problem?
[tweak] Actually, that specific problem isn't there, it just looked
like it to me because some lines are very long. Our coding
convention, following <http://…>, is that they should be wrapped before 80 columns.
Okay, I'll be sure and fix this.
xaav (xaav) wrote:
Fixed. Are there any more issues?
Martin Pool (mbp) wrote:
On 3 June 2011 13:40, Geoff <email address hidden> wrote:
> On 3 June 2011 10:41, Geoff <email address hidden> wrote:
>> [fix] I see all the actual 'exporters', and also 'export' itself now return generators which must be consumed for them to work. Generally if we change an api in such a way that existing callers will be wrong (rather than getting an error), we ought to rename the api too, and of course add a news entry. Another option would be to keep the existing function names doing the whole job, and add new ones that are generators.
>>
>> I changed how this was done, because I didn't think the code in the finally statement would be executed properly. Please re-review this.
>>
>> It looks like you did change export() back into being a non-generator
>> function, but the others still are. I am concerned there will be
>> functions that count on the current behaviour so we ought to fix this.
>
> The way the code base is currently set up, there is no need to use the other functions. They should just be passing in the information to the export() function.
If you're reasoning that therefore nobody actually will be using them,
you're probably right, but I wouldn't bet on it. Renaming them is
cheap.
Thanks
Martin Pool (mbp) wrote:
... so the only remaining thing I would like is to rename the per-format exporters to say tar_lzma_
Jelmer Vernooij (jelmer) wrote:
On Fri, 2011-06-03 at 03:56 +0000, Geoff wrote:
> Review: Abstain
> Fixed. Are there any more issues?
per_file_timestamps and force_mtime do the same thing, no function
should take both.
Cheers,
Jelmer
xaav (xaav) wrote:
Okay, I'll rename them and then create wrapper functions that return the original contents.
xaav (xaav) wrote:
> per_file_timestamps and force_mtime do the same thing, no function
> should take both.
Oops! I'll look into this.
xaav (xaav) wrote:
Okay, this should fix all the issues pointed out so far.
xaav (xaav) wrote:
If there are no further issues then I would appreciate if someone would merge this.