Merge lp://staging/~leonidborisenko/bzr-hg/hacking into lp://staging/bzr-hg

Proposed by Leonid Borisenko
Status: Merged
Merged at revision: 417
Proposed branch: lp://staging/~leonidborisenko/bzr-hg/hacking
Merge into: lp://staging/bzr-hg
Diff against target: 240 lines (+177/-8)
3 files modified
fetch.py (+16/-3)
overlay.py (+1/-1)
tests/test_fetch.py (+160/-4)
To merge this branch: bzr merge lp://staging/~leonidborisenko/bzr-hg/hacking
Reviewer Review Type Date Requested Status
Jelmer Vernooij (community) Needs Fixing
Review via email: mp+44319@code.staging.launchpad.net
To post a comment you must log in.
Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Hi Leonid,

Thanks for the MP, and for adding a test along with your changes. I
don't the fix is quite right yet though:

  review needsfixing

On Tue, 2010-12-21 at 09:17 +0000, Leonid Borisenko wrote:
> Leonid Borisenko has proposed merging lp:~leonidborisenko/bzr-hg/hacking into lp:bzr-hg.
>
> Requested reviews:
> bzr-hg developers (bzr-hg)
> Related bugs:
> #692901 Crash with KeyError in incremental mirroring
> https://bugs.launchpad.net/bugs/692901
>
> differences between files attachment (review-diff.txt)
> === modified file 'fetch.py'
> --- fetch.py 2010-12-18 18:28:46 +0000
> +++ fetch.py 2010-12-21 09:17:08 +0000
> + def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
> + pb):
> i = 0
> # Texts
> while 1:
> @@ -361,7 +370,11 @@
> fileid = mapping.generate_file_id(f)
> chunkiter = mercurial.changegroup.chunkiter(cg)
> def get_text(node):
> - key = iter(filetext_map[fileid][node]).next()
> + if node in filetext_map[fileid]:
> + key = iter(filetext_map[fileid][node]).next()
> + else:
> + key = self._get_target_fulltext_key_from_revision_ancestry(
> + fileid, first_imported_revid)
This assumes there is some significance in the first revision that's
imported. This happens to be the case in your test case, but is not
necessarily true.

Perhaps we can pass csid to get_text and then use that to look up the
matching revision id and use that to find the proper full text?

> +
> + # Pull commited changeset to Bazaar branch.
> + #
> + # Prefer named function instead lambda to slightly more informative
> + # fail message.
> + def pull_to_bzr_repo_with_existing_text_metadata():
> + bzrtree.pull(hgbranch)
> +
> + # Not expected KeyError looked like:
> + #
> + # <...full traceback skipped...>
> + # File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
> + # (fileid, ie.revision))
> + # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
> + self.assertThat(pull_to_bzr_repo_with_existing_text_metadata,
> + Not(raises(KeyError)))
I think we should simply just call bzrtree.pull(hgbranch) here. We don't
have a pattern of using testtools anywhere else in bzr-hg, and the error
message it prints in this case is very unhelpful (it eats the
backtrace).

Cheers,

Jelmer

review: Needs Fixing
Revision history for this message
Leonid Borisenko (leonidborisenko) wrote :
Download full text (4.0 KiB)

Hello Jelmer,

On 21.12.2010 13:49, Jelmer Vernooij wrote:
> On Tue, 2010-12-21 at 09:17 +0000, Leonid Borisenko wrote:
>> differences between files attachment (review-diff.txt)
>> === modified file 'fetch.py'
>> --- fetch.py 2010-12-18 18:28:46 +0000
>> +++ fetch.py 2010-12-21 09:17:08 +0000
>> + def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
>> + pb):
>> i = 0
>> # Texts
>> while 1:
>> @@ -361,7 +370,11 @@
>> fileid = mapping.generate_file_id(f)
>> chunkiter = mercurial.changegroup.chunkiter(cg)
>> def get_text(node):
>> - key = iter(filetext_map[fileid][node]).next()
>> + if node in filetext_map[fileid]:
>> + key = iter(filetext_map[fileid][node]).next()
>> + else:
>> + key = self._get_target_fulltext_key_from_revision_ancestry(
>> + fileid, first_imported_revid)
> This assumes there is some significance in the first revision that's
> imported. This happens to be the case in your test case, but is not
> necessarily true.
>
> Perhaps we can pass csid to get_text and then use that to look up the
> matching revision id and use that to find the proper full text?

  I think this is don't improve the solution.

  Any cases of getting full file text that are satsified by knowledge of
current revision id or revision ids of it's parents in boundaries of
current fetching session are already handled right.

  There is variable fulltext_cache in function
<email address hidden> and this variable (in phase of importing
texts) is the mapping from file parent nodeid to file text.

  All texts in boundaries of fetching session are imported strictly
sequencely in 'for chunk in chunk_iter: ...' cycle of unpack_chunk_iter
function. After importing in target repository full file text is
remembered in fulltext_cache.

  So if revision with required file parent nodeid was fetched in current
session, then text will be taken directly from fulltext_cache.

  So the only cases, that are handled by my patch, are ones that needs
looking into revisions that are in target repository, but was imported
in any previous fetching sessions. These revisions are exactly ancestry
of first revision imported in current fetching session.

  Am I missing some important point here? I really can't imagine the
case in which fulltext_cache miss means that:
    * full file text can be restored with knowledge of csid (assuming
that csid is the equivalent of the id of second or third or ... imported
revision)
    * AND in process of restoring full text it doesn't getting to id of
first imported revision.

>> +
>> + # Pull commited changeset to Bazaar branch.
>> + #
>> + # Prefer named function instead lambda to slightly more informative
>> + # fail message.
>> + def pull_to_bzr_repo_with_existing_text_metadata():
>> + bzrtree.pull(hgbranch)
>> +
>> + # Not expected KeyError looked like:
>> + #
>> + # <...full traceback skipped...>
>> + # File "/tmp/hg/fetch.py", line 208, in manifest...

Read more...

Revision history for this message
Jelmer Vernooij (jelmer) wrote :
Download full text (5.0 KiB)

Hi Leonid,

On Tue, 2010-12-21 at 21:43 +0000, Leonid Borisenko wrote:

> On 21.12.2010 13:49, Jelmer Vernooij wrote:
> > On Tue, 2010-12-21 at 09:17 +0000, Leonid Borisenko wrote:
> >> differences between files attachment (review-diff.txt)
> >> === modified file 'fetch.py'
> >> --- fetch.py 2010-12-18 18:28:46 +0000
> >> +++ fetch.py 2010-12-21 09:17:08 +0000
> >> + def _unpack_texts(self, cg, mapping, filetext_map, first_imported_revid,
> >> + pb):
> >> i = 0
> >> # Texts
> >> while 1:
> >> @@ -361,7 +370,11 @@
> >> fileid = mapping.generate_file_id(f)
> >> chunkiter = mercurial.changegroup.chunkiter(cg)
> >> def get_text(node):
> >> - key = iter(filetext_map[fileid][node]).next()
> >> + if node in filetext_map[fileid]:
> >> + key = iter(filetext_map[fileid][node]).next()
> >> + else:
> >> + key = self._get_target_fulltext_key_from_revision_ancestry(
> >> + fileid, first_imported_revid)
> > This assumes there is some significance in the first revision that's
> > imported. This happens to be the case in your test case, but is not
> > necessarily true.
> >
> > Perhaps we can pass csid to get_text and then use that to look up the
> > matching revision id and use that to find the proper full text?
> I think this is don't improve the solution.

> Any cases of getting full file text that are satsified by knowledge of
> current revision id or revision ids of it's parents in boundaries of
> current fetching session are already handled right.

> There is variable fulltext_cache in function
> <email address hidden> and this variable (in phase of importing
> texts) is the mapping from file parent nodeid to file text.
>
> All texts in boundaries of fetching session are imported strictly
> sequencely in 'for chunk in chunk_iter: ...' cycle of unpack_chunk_iter
> function. After importing in target repository full file text is
> remembered in fulltext_cache.

> So if revision with required file parent nodeid was fetched in current
> session, then text will be taken directly from fulltext_cache.

> So the only cases, that are handled by my patch, are ones that needs
> looking into revisions that are in target repository, but was imported
> in any previous fetching sessions. These revisions are exactly ancestry
> of first revision imported in current fetching session.
That's not necessarily true. Assume the following graph:

A
| \
B C
| |
D F
|/
E

If the first fetch pulls in up to C and D, and the second fetch pulls in
E then the first imported revision will probably be D but the parent of
F will be C - which is not the first imported revision, nor will it be
in its ancestry.

> Am I missing some important point here? I really can't imagine the
> case in which fulltext_cache miss means that:
> * full file text can be restored with knowledge of csid (assuming
> that csid is the equivalent of the id of second or third or ... imported
> revision)
I specifically meant csid of one of the texts that was /not/ fetched
during this run but a...

Read more...

331. By Leonid Borisenko

Remove testtools import from test.

332. By Leonid Borisenko

Cosmetic fix of tests.

Make use of uniformly quoted strings.

333. By Leonid Borisenko

Remove unnecessary creating of directory in tests.

mercurial.localrepo.localrepository class creates unexisted directory by
itself in constructor.

334. By Leonid Borisenko

Add tests for incremental fetching of non-linear history.

Revision history for this message
Leonid Borisenko (leonidborisenko) wrote :

Hello Jelmer,

On 22.12.2010 20:41, Jelmer Vernooij wrote:
> On Tue, 2010-12-21 at 21:43 +0000, Leonid Borisenko wrote:
> [skipped]
>> So the only cases, that are handled by my patch, are ones that needs
>> looking into revisions that are in target repository, but was imported
>> in any previous fetching sessions. These revisions are exactly ancestry
>> of first revision imported in current fetching session.
> That's not necessarily true. Assume the following graph:
>
> A
> | \
> B C
> | |
> D F
> |/
> E
>
> If the first fetch pulls in up to C and D, and the second fetch pulls in
> E then the first imported revision will probably be D but the parent of
> F will be C - which is not the first imported revision, nor will it be
> in its ancestry.

  Thanks for example.
  I've tried to reproduce this graph in test cases (look at my updated
branch) and one of two cases revealed new problem. It crashes at
importing manifests (i.e. before importing texts), but root of bug seems
the same (it concerns with filetext_map variable).
  So my solution should be rethinked and reworked.

> [skipped]
>>>> + # Not expected KeyError looked like:
>>>> + #
>>>> + # <...full traceback skipped...>
>>>> + # File "/tmp/hg/fetch.py", line 208, in manifest_to_inventory_delta
>>>> + # (fileid, ie.revision))
>>>> + # KeyError: ('hg:f1', 'hg-v1:97562cfbcf3b26e7eacf17ca9b6f742f98bd0719')
>>>> + self.assertThat(pull_to_bzr_repo_with_existing_text_metadata,
>>>> + Not(raises(KeyError)))
>>> I think we should simply just call bzrtree.pull(hgbranch) here. We don't
>>> have a pattern of using testtools anywhere else in bzr-hg, and the error
>>> message it prints in this case is very unhelpful (it eats the
>>> backtrace).
>>
>> Ok, I'll get rid of testtools import and using of imported matchers.
>>
>> But I think that just calling 'bzrtree.pull(hgbranch)' making this
>> test case too generous (so I've intentionally set assertThat expression
>> to express strict expectation of error and narrow the test case).
>>
>> Does it make any sense? Or just leaving the comment about not expected
>> KeyError near calling 'bzrtree.pull' is sufficient?
> I don't see how not explicitly checking for that exception is too
> "generous". If that function does raise an exception then the test
> runner will report it. What does explicitly checking for that exception
> add?
>
> IMHO just leaving the comment is sufficient.

  OK, I've implemented this.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches