Merge lp://staging/~jameinel/bzr-fastimport/less-sticky into lp://staging/bzr-fastimport

Proposed by John A Meinel
Status: Merged
Approved by: Ian Clatworthy
Approved revision: not available
Merged at revision: not available
Proposed branch: lp://staging/~jameinel/bzr-fastimport/less-sticky
Merge into: lp://staging/bzr-fastimport
Diff against target: 595 lines (+414/-45)
6 files modified
bzr_commit_handler.py (+20/-16)
cache_manager.py (+28/-0)
revision_store.py (+157/-19)
tests/__init__.py (+11/-10)
tests/test_generic_processor.py (+51/-0)
tests/test_revision_store.py (+147/-0)
To merge this branch: bzr merge lp://staging/~jameinel/bzr-fastimport/less-sticky
Reviewer: Ian Clatworthy
Review status: Approve
Review via email: mp+15458@code.staging.launchpad.net
Revision history for this message
John A Meinel (jameinel) wrote :

This is my earlier branch that I proposed. I just wasn't sure if it was properly put up for review.

There is also one very small tweak, because the way the inv_sha1 fix landed in bzr is slightly different from how I originally envisioned it.

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

Thanks a bucket-load for these changes. I'll take a detailed look next Monday when my brain is a little more awake.

FWIW, my main concern is bzr version compatibility. Until now, fastimport has supported early versions of bzr: 1.1 according to README.txt. I'm happy to bump that to 2.0.0 (say) but I don't want to depend on any APIs only in the 2.1.0 branch yet. Off the top of your head, do we need to tweak any of this patch accordingly?

review: Needs Information
Revision history for this message
John A Meinel (jameinel) wrote :

Ian Clatworthy wrote:
> Review: Needs Information
> Thanks a bucket-load for these changes. I'll take a detailed look next Monday when my brain is a little more awake.
>
> FWIW, my main concern is bzr version compatibility. Until now, fastimport has supported early versions of bzr: 1.1 according to README.txt. I'm happy to bump that to 2.0.0 (say) but I don't want to depend on any APIs only in the 2.1.0 branch yet. Off the top of your head, do we need to tweak any of this patch accordingly?

I don't know of anything, but it is best to run an import with an older
bzr to make sure :).

There is at least one bugfix that landed in trunk, but the code already
checks for that:

if isinstance(builder.inv_sha1, tuple):
...

KnownGraph was introduced in bzr 1.17, and though 'add_node' is new, the
code also has a 'getattr()' check around that (though, come to think of
it, I believe that is in a different patch).
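
For illustration only, the two guards in question look roughly like this.
The isinstance() check is taken from revision_store.py in this patch; the
getattr() guard is an assumed sketch of how code can cope with a bzr that
lacks KnownGraph.add_node() (known_graph, revision_id and parent_ids are
placeholder names):

    # From this patch: some bzr versions leave (sha1, inv) in
    # builder.inv_sha1 after finish_inventory(), so unpack it if needed
    if isinstance(builder.inv_sha1, tuple):
        builder.inv_sha1, builder.new_inventory = builder.inv_sha1

    # Assumed shape of the other guard: only call add_node() when the
    # running bzr provides it
    add_node = getattr(known_graph, 'add_node', None)
    if add_node is not None:
        add_node(revision_id, parent_ids)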

I think that if the switch to CommitBuilder is finalized, then we should
look into cleaning up the code, as you have a lot of duplicate logic in
there.

John
=:->

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

I've been experimenting with this branch on numerous different inputs and the results need some further digging. On bzr itself for example, fastimport time increases from 24 minutes to 33 minutes and the memory consumption increases a little as well. On Firefox, fastimport time increases from 52 minutes to 79 minutes (I didn't check memory). Hmm.

My initial reaction was that it might be the disk blob caching. However, disabling that only drops the time for bzr to 32 minutes so I'm suspecting it's more related to using CommitBuilder. At least in that case.

I'm certainly happy with most of the code I've checked and I fully trust you to do a better job of writing it than me. Even so, I might proceed by merging this in a few different pieces.

review: Approve
Revision history for this message
John A Meinel (jameinel) wrote :

Ian Clatworthy wrote:
> Review: Approve
> I've been experimenting with this branch on numerous different inputs and the results need some further digging. On bzr itself for example, fastimport time increases from 24 minutes to 33 minutes and the memory consumption increases a little as well. On Firefox, fastimport time increases from 52 minutes to 79 minutes (I didn't check memory). Hmm.
>
> My initial reaction was that it might be the disk blob caching. However, disabling that only drops the time for bzr to 32 minutes so I'm suspecting it's more related to using CommitBuilder. At least in that case.
>
> I'm certainly happy with most of the code I've checked and I fully trust you to do a better job of writing it than me. Even so, I might proceed by merging this in a few different pieces.
>
>

Does this include the KnownGraph code changes? I'm wondering what sort
of result you would get from that. I would guess that this code is,
indeed, slower than what you had, as it is doing more work. (At merge
time it takes the diff against all parents not just the left-hand one.)

However, if we can make parts faster, then 'bzr commit' will also
benefit. Adding KnownGraph to handle heads() should be quite a bit
better, though the gain depends on small things like how many actual
merges there are in the codebase.
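
Purely as a sketch of that idea, assuming bzr 1.17 or later where
KnownGraph is available from bzrlib.graph (the history below is made up):

    from bzrlib import graph

    # Tiny made-up linear history: rev-A <- rev-B <- rev-C
    parent_map = {
        'rev-A': (),
        'rev-B': ('rev-A',),
        'rev-C': ('rev-B',),
    }
    kg = graph.KnownGraph(parent_map)
    # heads() drops any candidate that is an ancestor of another, so the
    # redundant parent rev-A disappears here
    print kg.heads(['rev-C', 'rev-A'])   # expected: frozenset(['rev-C'])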

xserver is probably not a great example, as it is quite linear for most
of the history. I should probably try a different one going forward.

John
=:->

Revision history for this message
Ian Clatworthy (ian-clatworthy) wrote :

John A Meinel wrote:

> Does this include the KnownGraph code changes?

Not yet. That's a separate review.

> xserver is probably not a great example, as it is quite linear for most
> of the history. I should probably try a different one going forward.

It's worth testing on a few projects from different sources, e.g.
xserver from git, bzr from bzr and Thunderbird from hg, say. All of
those are medium size but should complete quickly enough. Every now and
then, try the really big stuff like the kernel, mysql and OOo.

Some questions about TreeShim now that I'm looking at the code harder:

1. id2path() looks wrong. It should return newpath if it's not None?

2. What value does get_reference_revision() add?

Ian C.

Revision history for this message
John A Meinel (jameinel) wrote :

Ian Clatworthy wrote:
> John A Meinel wrote:
>
>> Does this include the KnownGraph code changes?
>
> Not yet. That's a separate review.
>
>> xserver is probably not a great example, as it is quite linear for most
>> of the history. I should probably try a different one going forward.
>
> It's worth testing on a few projects from different sources, e.g.
> xserver from git, bzr from bzr and Thunderbird from hg, say. All of
> those are medium size but should complete quickly enough. Every now and
> then, try the really big stuff like the kernel, mysql and OOo.
>
> Some questions about TreeShim now that I'm looking at the code harder:
>
> 1. id2path() looks wrong. It should return newpath if it's not None?

I think you are correct. I certainly did the lookup, so it would make
sense to return that value.

The code as written seems to only trap the case where an object is deleted.

That said, I would hope we don't use id2path very often in the commit
code. I'm pretty sure it is done by some code paths (as I added it
because I was auditing the commit code). It seems to be called in the
"unchanged_merged" code path.

That path is 'things different from a right-hand parent'. So I *think*
for it to fail, you would need to add a file relative to both parents,
which means that at merge time you 'bzr add' a new file. That is
probably a rather rare condition, since doing a lot of "unrelated"
changes during a merge is not recommended.

However, it certainly sounds like something worth adding a test case for. :)
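
As a sketch only, that scenario could be written with the same command
helpers the tests in this patch use (the import path, file names and
contents are assumptions, and whether this really exercises id2path is
exactly the open question):

    import time
    from bzrlib.plugins.fastimport import commands

    # A adds 'foo'; B modifies 'foo'; the merge commit C *also* adds a
    # brand new file 'bar' ("bzr add during a merge")
    def command_list():
        committer = ['', 'a@example.com', time.time(), time.timezone]
        def files_one():
            yield commands.FileModifyCommand('foo', 'file', False,
                None, "content A\n")
        yield commands.CommitCommand('head', '1', None,
            committer, "commit 1", None, [], files_one)
        def files_two():
            yield commands.FileModifyCommand('foo', 'file', False,
                None, "content B\n")
        yield commands.CommitCommand('head', '2', None,
            committer, "commit 2", ":1", [], files_two)
        def files_merge():
            yield commands.FileModifyCommand('bar', 'file', False,
                None, "added at merge time\n")
        yield commands.CommitCommand('head', '3', None,
            committer, "commit 3", ":1", [':2'], files_merge)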

>
> 2. What value does get_reference_revision() add?

Not much yet, but it is one of the functions that CommitBuilder would
look at if we had trees with references. I don't have any idea what that
would look like for a fast-import stream, but since git has submodule
support, I would assume there is *something* it can put in the
fast-import stream, which we could then map to a tree_reference.

If you want to take it out, that is fine. But I figured a
NotImplementedError would probably be better than an AttributeError.

>
> Ian C.

John
=:->

Revision history for this message
John A Meinel (jameinel) wrote :

...
>> 1. id2path() looks wrong. It should return newpath if it's not None?
>
> I think you are correct. I certainly did the lookup, so it would make
> sense to return that value.
>
> The code as written seems to only trap the case where an object is deleted.
>
> That said, I would hope we don't use id2path very often in the commit
> code. I'm pretty sure it is done by some code paths (as I added it
> because I was auditing the commit code). It seems to be called in the
> "unchanged_merged" code path.
>
> That path is 'things different from a right-hand parent'. So I *think*
> for it to fail, you would need to add a file relative to both parents,
> which means that at merge time you 'bzr add' a new file. That is
> probably a rather rare condition, since doing a lot of "unrelated"
> changes during a merge is not recommended.
>
> However, it certainly sounds like something worth adding a test case for. :)
>

So I think I'm wrong about how to trigger this. I'll try to track it
down. As it stands today, only one test case triggers id2path, and it is
the one I added for merge+revert. The conditions are:

1) The file must be considered modified in a merged revision relative to
   BASE. This puts it into the 'merged_ids' list.

2) The file must not be considered modified in THIS versus BASE. This
   puts it into the 'unchanged_merged' set.

Unless both (1) and (2) are true, id2path is not called for the file.
However, as an interesting side effect:

3) The file must be in the THIS vs BASE delta for it to show up in
   _new_info_by_id, and this negates (2).

   Which sounds like what we really need is just an assertion that
   file_id is not in _new_info_by_id (see the sketch below).

Anyway, it doesn't actually matter that it is broken. We could add
some direct tests for it, but I'm inclined to keep _TreeShim as minimal
as possible, and only implement stuff that can be tested during the
import. I suppose CommitBuilder could change in the future...
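
A minimal sketch of the assertion variant mentioned above (hypothetical;
not what ended up in the patch):

    def id2path(self, file_id):
        # Hypothetical: per the reasoning above, anything reaching
        # id2path via the 'unchanged_merged' path cannot also be in the
        # delta, so an assertion would document that invariant
        assert file_id not in self._new_info_by_id
        return self._basis_inv.id2path(file_id)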

John
=:->

272. By John A Meinel

Add a bunch of direct tests for the _TreeShim interface.

Revision history for this message
John A Meinel (jameinel) wrote :

...

> Some questions about TreeShim now that I'm looking at the code harder:
>
> 1. id2path() looks wrong. It should return newpath if it's not None?

I did end up adding some direct tests to _TreeShim, and correcting that code path.
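
For reference, the corrected id2path() as it now appears in the preview
diff below:

    def id2path(self, file_id):
        if file_id in self._new_info_by_id:
            new_path = self._new_info_by_id[file_id][0]
            if new_path is None:
                raise errors.NoSuchId(self, file_id)
            return new_path
        return self._basis_inv.id2path(file_id)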

>
> 2. What value does get_reference_revision() add?
>
> Ian C.

As mentioned, it means that if we ever have tree references, you'll get a NotImplementedError rather than an AttributeError. Otherwise, it doesn't add much of anything.

Preview Diff

1=== modified file 'bzr_commit_handler.py'
2--- bzr_commit_handler.py 2009-11-30 21:31:31 +0000
3+++ bzr_commit_handler.py 2009-12-09 20:15:27 +0000
4@@ -71,9 +71,9 @@
5 """Prepare for committing."""
6 self.revision_id = self.gen_revision_id()
7 # cache of texts for this commit, indexed by file-id
8- self.lines_for_commit = {}
9+ self.data_for_commit = {}
10 #if self.rev_store.expects_rich_root():
11- self.lines_for_commit[inventory.ROOT_ID] = []
12+ self.data_for_commit[inventory.ROOT_ID] = []
13
14 # Track the heads and get the real parent list
15 parents = self.cache_mgr.track_heads(self.command)
16@@ -126,9 +126,13 @@
17 self.cache_mgr.inventories[revision_id] = inv
18 return inv
19
20+ def _get_data(self, file_id):
21+ """Get the data bytes for a file-id."""
22+ return self.data_for_commit[file_id]
23+
24 def _get_lines(self, file_id):
25 """Get the lines for a file-id."""
26- return self.lines_for_commit[file_id]
27+ return osutils.split_lines(self._get_data(file_id))
28
29 def _get_per_file_parents(self, file_id):
30 """Get the lines for a file-id."""
31@@ -288,20 +292,20 @@
32 ie.revision = self.revision_id
33 if kind == 'file':
34 ie.executable = is_executable
35- lines = osutils.split_lines(data)
36- ie.text_sha1 = osutils.sha_strings(lines)
37- ie.text_size = sum(map(len, lines))
38- self.lines_for_commit[file_id] = lines
39+ # lines = osutils.split_lines(data)
40+ ie.text_sha1 = osutils.sha_string(data)
41+ ie.text_size = len(data)
42+ self.data_for_commit[file_id] = data
43 elif kind == 'directory':
44 self.directory_entries[path] = ie
45 # There are no lines stored for a directory so
46 # make sure the cache used by get_lines knows that
47- self.lines_for_commit[file_id] = []
48+ self.data_for_commit[file_id] = ''
49 elif kind == 'symlink':
50 ie.symlink_target = data.encode('utf8')
51 # There are no lines stored for a symlink so
52 # make sure the cache used by get_lines knows that
53- self.lines_for_commit[file_id] = []
54+ self.data_for_commit[file_id] = ''
55 else:
56 self.warning("Cannot import items of kind '%s' yet - ignoring '%s'"
57 % (kind, path))
58@@ -345,7 +349,7 @@
59 self.directory_entries[dirname] = ie
60 # There are no lines stored for a directory so
61 # make sure the cache used by get_lines knows that
62- self.lines_for_commit[dir_file_id] = []
63+ self.data_for_commit[dir_file_id] = ''
64
65 # It's possible that a file or symlink with that file-id
66 # already exists. If it does, we need to delete it.
67@@ -415,7 +419,7 @@
68 kind = ie.kind
69 if kind == 'file':
70 if newly_changed:
71- content = ''.join(self.lines_for_commit[file_id])
72+ content = self.data_for_commit[file_id]
73 else:
74 content = self.rev_store.get_file_text(self.parents[0], file_id)
75 self._modify_item(dest_path, kind, ie.executable, content, inv)
76@@ -451,7 +455,7 @@
77 # that means the loader then needs to know what the "new" text is.
78 # We therefore must go back to the revision store to get it.
79 lines = self.rev_store.get_file_lines(rev_id, file_id)
80- self.lines_for_commit[file_id] = lines
81+ self.data_for_commit[file_id] = ''.join(lines)
82
83 def _delete_all_items(self, inv):
84 for name, root_item in inv.root.children.iteritems():
85@@ -499,7 +503,7 @@
86 """Save the revision."""
87 self.cache_mgr.inventories[self.revision_id] = self.inventory
88 self.rev_store.load(self.revision, self.inventory, None,
89- lambda file_id: self._get_lines(file_id),
90+ lambda file_id: self._get_data(file_id),
91 lambda file_id: self._get_per_file_parents(file_id),
92 lambda revision_ids: self._get_inventories(revision_ids))
93
94@@ -598,9 +602,9 @@
95 delta = self._get_final_delta()
96 inv = self.rev_store.load_using_delta(self.revision,
97 self.basis_inventory, delta, None,
98- lambda file_id: self._get_lines(file_id),
99- lambda file_id: self._get_per_file_parents(file_id),
100- lambda revision_ids: self._get_inventories(revision_ids))
101+ self._get_data,
102+ self._get_per_file_parents,
103+ self._get_inventories)
104 self.cache_mgr.inventories[self.revision_id] = inv
105 #print "committed %s" % self.revision_id
106
107
108=== modified file 'cache_manager.py'
109--- cache_manager.py 2009-12-08 06:26:34 +0000
110+++ cache_manager.py 2009-12-09 20:15:26 +0000
111@@ -54,6 +54,34 @@
112 shutils.rmtree(self.tempdir)
113
114
115+
116+class _Cleanup(object):
117+ """This class makes sure we clean up when CacheManager goes away.
118+
119+ We use a helper class to ensure that we are never in a refcycle.
120+ """
121+
122+ def __init__(self, disk_blobs):
123+ self.disk_blobs = disk_blobs
124+ self.tempdir = None
125+ self.small_blobs = None
126+
127+ def __del__(self):
128+ self.finalize()
129+
130+ def finalize(self):
131+ if self.disk_blobs is not None:
132+ for info in self.disk_blobs.itervalues():
133+ if info[-1] is not None:
134+ os.unlink(info[-1])
135+ self.disk_blobs = None
136+ if self.small_blobs is not None:
137+ self.small_blobs.close()
138+ self.small_blobs = None
139+ if self.tempdir is not None:
140+ shutils.rmtree(self.tempdir)
141+
142+
143 class CacheManager(object):
144
145 _small_blob_threshold = 25*1024
146
147=== modified file 'revision_store.py'
148--- revision_store.py 2009-10-25 22:05:48 +0000
149+++ revision_store.py 2009-12-09 20:15:27 +0000
150@@ -1,4 +1,4 @@
151-# Copyright (C) 2008 Canonical Ltd
152+# Copyright (C) 2008, 2009 Canonical Ltd
153 #
154 # This program is free software; you can redistribute it and/or modify
155 # it under the terms of the GNU General Public License as published by
156@@ -16,11 +16,137 @@
157
158 """An abstraction of a repository providing just the bits importing needs."""
159
160+import cStringIO
161
162 from bzrlib import errors, inventory, knit, lru_cache, osutils, trace
163 from bzrlib import revision as _mod_revision
164
165
166+class _TreeShim(object):
167+ """Fake a Tree implementation.
168+
169+ This implements just enough of the tree api to make commit builder happy.
170+ """
171+
172+ def __init__(self, repo, basis_inv, inv_delta, content_provider):
173+ self._repo = repo
174+ self._content_provider = content_provider
175+ self._basis_inv = basis_inv
176+ self._inv_delta = inv_delta
177+ self._new_info_by_id = dict([(file_id, (new_path, ie))
178+ for _, new_path, file_id, ie in inv_delta])
179+
180+ def id2path(self, file_id):
181+ if file_id in self._new_info_by_id:
182+ new_path = self._new_info_by_id[file_id][0]
183+ if new_path is None:
184+ raise errors.NoSuchId(self, file_id)
185+ return new_path
186+ return self._basis_inv.id2path(file_id)
187+
188+ def path2id(self, path):
189+ # CommitBuilder currently only requires access to the root id. We don't
190+ # build a map of renamed files, etc. One possibility if we ever *do*
191+ # need more than just root, is to defer to basis_inv.path2id() and then
192+ # check if the file_id is in our _new_info_by_id dict. And in that
193+ # case, return _new_info_by_id[file_id][0]
194+ if path != '':
195+ raise NotImplementedError(_TreeShim.path2id)
196+ # TODO: Handle root renames?
197+ return self._basis_inv.root.file_id
198+
199+ def get_file_with_stat(self, file_id, path=None):
200+ try:
201+ content = self._content_provider(file_id)
202+ except KeyError:
203+ # The content wasn't shown as 'new'. Just validate this fact
204+ assert file_id not in self._new_info_by_id
205+ old_ie = self._basis_inv[file_id]
206+ old_text_key = (file_id, old_ie.revision)
207+ stream = self._repo.texts.get_record_stream([old_text_key],
208+ 'unordered', True)
209+ content = stream.next().get_bytes_as('fulltext')
210+ sio = cStringIO.StringIO(content)
211+ return sio, None
212+
213+ def get_symlink_target(self, file_id):
214+ if file_id in self._new_info_by_id:
215+ ie = self._new_info_by_id[file_id][1]
216+ return ie.symlink_target
217+ return self._basis_inv[file_id].symlink_target
218+
219+ def get_reference_revision(self, file_id, path=None):
220+ raise NotImplementedError(_TreeShim.get_reference_revision)
221+
222+ def _delta_to_iter_changes(self):
223+ """Convert the inv_delta into an iter_changes repr."""
224+ # iter_changes is:
225+ # (file_id,
226+ # (old_path, new_path),
227+ # content_changed,
228+ # (old_versioned, new_versioned),
229+ # (old_parent_id, new_parent_id),
230+ # (old_name, new_name),
231+ # (old_kind, new_kind),
232+ # (old_exec, new_exec),
233+ # )
234+ basis_inv = self._basis_inv
235+ for old_path, new_path, file_id, ie in self._inv_delta:
236+ # Perf: Would this be faster if we did 'if file_id in basis_inv'?
237+ # Since the *very* common case is that the file already exists, it
238+ # probably is better to optimize for that
239+ try:
240+ old_ie = basis_inv[file_id]
241+ except errors.NoSuchId:
242+ old_ie = None
243+ if ie is None:
244+ raise AssertionError('How is both old and new None?')
245+ change = (file_id,
246+ (old_path, new_path),
247+ False,
248+ (False, False),
249+ (None, None),
250+ (None, None),
251+ (None, None),
252+ (None, None),
253+ )
254+ change = (file_id,
255+ (old_path, new_path),
256+ True,
257+ (False, True),
258+ (None, ie.parent_id),
259+ (None, ie.name),
260+ (None, ie.kind),
261+ (None, ie.executable),
262+ )
263+ else:
264+ if ie is None:
265+ change = (file_id,
266+ (old_path, new_path),
267+ True,
268+ (True, False),
269+ (old_ie.parent_id, None),
270+ (old_ie.name, None),
271+ (old_ie.kind, None),
272+ (old_ie.executable, None),
273+ )
274+ else:
275+ content_modified = (ie.text_sha1 != old_ie.text_sha1
276+ or ie.text_size != old_ie.text_size)
277+ # TODO: ie.kind != old_ie.kind
278+ # TODO: symlinks changing targets, content_modified?
279+ change = (file_id,
280+ (old_path, new_path),
281+ content_modified,
282+ (True, True),
283+ (old_ie.parent_id, ie.parent_id),
284+ (old_ie.name, ie.name),
285+ (old_ie.kind, ie.kind),
286+ (old_ie.executable, ie.executable),
287+ )
288+ yield change
289+
290+
291 class AbstractRevisionStore(object):
292
293 def __init__(self, repo):
294@@ -224,29 +350,41 @@
295 including an empty inventory for the missing revisions
296 If None, a default implementation is provided.
297 """
298- # Get the non-ghost parents and their inventories
299- if inventories_provider is None:
300- inventories_provider = self._default_inventories_provider
301- present_parents, parent_invs = inventories_provider(rev.parent_ids)
302+ # TODO: set revision_id = rev.revision_id
303+ builder = self.repo._commit_builder_class(self.repo,
304+ parents=rev.parent_ids, config=None, timestamp=rev.timestamp,
305+ timezone=rev.timezone, committer=rev.committer,
306+ revprops=rev.properties, revision_id=rev.revision_id)
307
308- # Load the inventory
309- try:
310- rev_id = rev.revision_id
311- rev.inventory_sha1, inv = self._add_inventory_by_delta(
312- rev_id, basis_inv, inv_delta, present_parents, parent_invs)
313- except errors.RevisionAlreadyPresent:
314+ if rev.parent_ids:
315+ basis_rev_id = rev.parent_ids[0]
316+ else:
317+ basis_rev_id = _mod_revision.NULL_REVISION
318+ tree = _TreeShim(self.repo, basis_inv, inv_delta, text_provider)
319+ changes = tree._delta_to_iter_changes()
320+ for (file_id, path, fs_hash) in builder.record_iter_changes(
321+ tree, basis_rev_id, changes):
322+ # So far, we don't *do* anything with the result
323 pass
324+ builder.finish_inventory()
325+ # TODO: This is working around a bug in the bzrlib code base.
326+ # 'builder.finish_inventory()' ends up doing:
327+ # self.inv_sha1 = self.repository.add_inventory_by_delta(...)
328+ # However, add_inventory_by_delta returns (sha1, inv)
329+ # And we *want* to keep a handle on both of those objects
330+ if isinstance(builder.inv_sha1, tuple):
331+ builder.inv_sha1, builder.new_inventory = builder.inv_sha1
332+ # This is a duplicate of Builder.commit() since we already have the
333+ # Revision object, and we *don't* want to call commit_write_group()
334+ rev.inv_sha1 = builder.inv_sha1
335+ builder.repository.add_revision(builder._new_revision_id, rev,
336+ builder.new_inventory, builder._config)
337
338- # Load the texts, signature and revision
339- file_rev_ids_needing_texts = [(id, ie.revision)
340- for _, n, id, ie in inv_delta
341- if n is not None and ie.revision == rev_id]
342- self._load_texts_for_file_rev_ids(file_rev_ids_needing_texts,
343- text_provider, parents_provider)
344 if signature is not None:
345+ raise AssertionError('signatures not guaranteed yet')
346 self.repo.add_signature_text(rev_id, signature)
347- self._add_revision(rev, inv)
348- return inv
349+ # self._add_revision(rev, inv)
350+ return builder.revision_tree().inventory
351
352 def _non_root_entries_iter(self, inv, revision_id):
353 if hasattr(inv, 'iter_non_root_entries'):
354
355=== modified file 'tests/__init__.py'
356--- tests/__init__.py 2009-02-25 11:33:28 +0000
357+++ tests/__init__.py 2009-12-09 20:15:27 +0000
358@@ -21,15 +21,16 @@
359
360
361 def test_suite():
362- module_names = [
363- 'bzrlib.plugins.fastimport.tests.test_branch_mapper',
364- 'bzrlib.plugins.fastimport.tests.test_commands',
365- 'bzrlib.plugins.fastimport.tests.test_errors',
366- 'bzrlib.plugins.fastimport.tests.test_filter_processor',
367- 'bzrlib.plugins.fastimport.tests.test_generic_processor',
368- 'bzrlib.plugins.fastimport.tests.test_head_tracking',
369- 'bzrlib.plugins.fastimport.tests.test_helpers',
370- 'bzrlib.plugins.fastimport.tests.test_parser',
371- ]
372+ module_names = [__name__ + '.' + x for x in [
373+ 'test_branch_mapper',
374+ 'test_commands',
375+ 'test_errors',
376+ 'test_filter_processor',
377+ 'test_generic_processor',
378+ 'test_head_tracking',
379+ 'test_helpers',
380+ 'test_parser',
381+ 'test_revision_store',
382+ ]]
383 loader = TestLoader()
384 return loader.loadTestsFromModuleNames(module_names)
385
386=== modified file 'tests/test_generic_processor.py'
387--- tests/test_generic_processor.py 2009-11-12 07:51:21 +0000
388+++ tests/test_generic_processor.py 2009-12-09 20:15:26 +0000
389@@ -1832,3 +1832,54 @@
390 def test_import_symlink(self):
391 handler, branch = self.get_handler()
392 handler.process(self.get_command_iter('foo', 'symlink', 'bar'))
393+
394+
395+class TestModifyRevertInBranch(TestCaseForGenericProcessor):
396+
397+ def file_command_iter(self):
398+ # A add 'foo'
399+ # |\
400+ # | B modify 'foo'
401+ # | |
402+ # | C revert 'foo' back to A
403+ # |/
404+ # D merge 'foo'
405+ def command_list():
406+ committer_a = ['', 'a@elmer.com', time.time(), time.timezone]
407+ committer_b = ['', 'b@elmer.com', time.time(), time.timezone]
408+ committer_c = ['', 'c@elmer.com', time.time(), time.timezone]
409+ committer_d = ['', 'd@elmer.com', time.time(), time.timezone]
410+ def files_one():
411+ yield commands.FileModifyCommand('foo', 'file', False,
412+ None, "content A\n")
413+ yield commands.CommitCommand('head', '1', None,
414+ committer_a, "commit 1", None, [], files_one)
415+ def files_two():
416+ yield commands.FileModifyCommand('foo', 'file', False,
417+ None, "content B\n")
418+ yield commands.CommitCommand('head', '2', None,
419+ committer_b, "commit 2", ":1", [], files_two)
420+ def files_three():
421+ yield commands.FileModifyCommand('foo', 'file', False,
422+ None, "content A\n")
423+ yield commands.CommitCommand('head', '3', None,
424+ committer_c, "commit 3", ":2", [], files_three)
425+ yield commands.CommitCommand('head', '4', None,
426+ committer_d, "commit 4", ":1", [':3'], lambda: [])
427+ return command_list
428+
429+ def test_modify_revert(self):
430+ handler, branch = self.get_handler()
431+ handler.process(self.file_command_iter())
432+ branch.lock_read()
433+ self.addCleanup(branch.unlock)
434+ rev_d = branch.last_revision()
435+ rev_a, rev_c = branch.repository.get_parent_map([rev_d])[rev_d]
436+ rev_b = branch.repository.get_parent_map([rev_c])[rev_c][0]
437+ rtree_a, rtree_b, rtree_c, rtree_d = branch.repository.revision_trees([
438+ rev_a, rev_b, rev_c, rev_d])
439+ foo_id = rtree_a.path2id('foo')
440+ self.assertEqual(rev_a, rtree_a.inventory[foo_id].revision)
441+ self.assertEqual(rev_b, rtree_b.inventory[foo_id].revision)
442+ self.assertEqual(rev_c, rtree_c.inventory[foo_id].revision)
443+ self.assertEqual(rev_c, rtree_d.inventory[foo_id].revision)
444
445=== added file 'tests/test_revision_store.py'
446--- tests/test_revision_store.py 1970-01-01 00:00:00 +0000
447+++ tests/test_revision_store.py 2009-12-09 20:15:27 +0000
448@@ -0,0 +1,147 @@
449+# Copyright (C) 2008, 2009 Canonical Ltd
450+#
451+# This program is free software; you can redistribute it and/or modify
452+# it under the terms of the GNU General Public License as published by
453+# the Free Software Foundation; either version 2 of the License, or
454+# (at your option) any later version.
455+#
456+# This program is distributed in the hope that it will be useful,
457+# but WITHOUT ANY WARRANTY; without even the implied warranty of
458+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
459+# GNU General Public License for more details.
460+#
461+# You should have received a copy of the GNU General Public License
462+# along with this program; if not, write to the Free Software
463+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
464+
465+"""Direct tests of the revision_store classes."""
466+
467+from bzrlib import (
468+ branch,
469+ errors,
470+ inventory,
471+ osutils,
472+ tests,
473+ )
474+
475+from bzrlib.plugins.fastimport import (
476+ revision_store,
477+ )
478+
479+
480+class Test_TreeShim(tests.TestCase):
481+
482+ def invAddEntry(self, inv, path, file_id=None):
483+ if path.endswith('/'):
484+ path = path[:-1]
485+ kind = 'directory'
486+ else:
487+ kind = 'file'
488+ parent_path, basename = osutils.split(path)
489+ parent_id = inv.path2id(parent_path)
490+ inv.add(inventory.make_entry(kind, basename, parent_id, file_id))
491+
492+ def make_trivial_basis_inv(self):
493+ basis_inv = inventory.Inventory('TREE_ROOT')
494+ self.invAddEntry(basis_inv, 'foo', 'foo-id')
495+ self.invAddEntry(basis_inv, 'bar/', 'bar-id')
496+ self.invAddEntry(basis_inv, 'bar/baz', 'baz-id')
497+ return basis_inv
498+
499+ def test_id2path_no_delta(self):
500+ basis_inv = self.make_trivial_basis_inv()
501+ shim = revision_store._TreeShim(repo=None, basis_inv=basis_inv,
502+ inv_delta=[], content_provider=None)
503+ self.assertEqual('', shim.id2path('TREE_ROOT'))
504+ self.assertEqual('foo', shim.id2path('foo-id'))
505+ self.assertEqual('bar', shim.id2path('bar-id'))
506+ self.assertEqual('bar/baz', shim.id2path('baz-id'))
507+ self.assertRaises(errors.NoSuchId, shim.id2path, 'qux-id')
508+
509+ def test_id2path_with_delta(self):
510+ basis_inv = self.make_trivial_basis_inv()
511+ foo_entry = inventory.make_entry('file', 'foo2', 'TREE_ROOT', 'foo-id')
512+ inv_delta = [('foo', 'foo2', 'foo-id', foo_entry),
513+ ('bar/baz', None, 'baz-id', None),
514+ ]
515+
516+ shim = revision_store._TreeShim(repo=None, basis_inv=basis_inv,
517+ inv_delta=inv_delta,
518+ content_provider=None)
519+ self.assertEqual('', shim.id2path('TREE_ROOT'))
520+ self.assertEqual('foo2', shim.id2path('foo-id'))
521+ self.assertEqual('bar', shim.id2path('bar-id'))
522+ self.assertRaises(errors.NoSuchId, shim.id2path, 'baz-id')
523+
524+ def test_path2id(self):
525+ basis_inv = self.make_trivial_basis_inv()
526+ shim = revision_store._TreeShim(repo=None, basis_inv=basis_inv,
527+ inv_delta=[], content_provider=None)
528+ self.assertEqual('TREE_ROOT', shim.path2id(''))
529+ # We don't want to ever give a wrong value, so for now we just raise
530+ # NotImplementedError
531+ self.assertRaises(NotImplementedError, shim.path2id, 'bar')
532+
533+ def test_get_file_with_stat_content_in_stream(self):
534+ basis_inv = self.make_trivial_basis_inv()
535+
536+ def content_provider(file_id):
537+ return 'content of\n' + file_id + '\n'
538+
539+ shim = revision_store._TreeShim(repo=None, basis_inv=basis_inv,
540+ inv_delta=[],
541+ content_provider=content_provider)
542+ f_obj, stat_val = shim.get_file_with_stat('baz-id')
543+ self.assertIs(None, stat_val)
544+ self.assertEqualDiff('content of\nbaz-id\n', f_obj.read())
545+
546+ # TODO: Test when the content isn't in the stream, and we fall back to the
547+ # repository that was passed in
548+
549+ def test_get_symlink_target(self):
550+ basis_inv = self.make_trivial_basis_inv()
551+ ie = inventory.make_entry('symlink', 'link', 'TREE_ROOT', 'link-id')
552+ ie.symlink_target = u'link-target'
553+ basis_inv.add(ie)
554+ shim = revision_store._TreeShim(repo=None, basis_inv=basis_inv,
555+ inv_delta=[], content_provider=None)
556+ self.assertEqual(u'link-target', shim.get_symlink_target('link-id'))
557+
558+ def test_get_symlink_target_from_delta(self):
559+ basis_inv = self.make_trivial_basis_inv()
560+ ie = inventory.make_entry('symlink', 'link', 'TREE_ROOT', 'link-id')
561+ ie.symlink_target = u'link-target'
562+ inv_delta = [(None, 'link', 'link-id', ie)]
563+ shim = revision_store._TreeShim(repo=None, basis_inv=basis_inv,
564+ inv_delta=inv_delta,
565+ content_provider=None)
566+ self.assertEqual(u'link-target', shim.get_symlink_target('link-id'))
567+
568+ def test__delta_to_iter_changes(self):
569+ basis_inv = self.make_trivial_basis_inv()
570+ foo_entry = inventory.make_entry('file', 'foo2', 'bar-id', 'foo-id')
571+ link_entry = inventory.make_entry('symlink', 'link', 'TREE_ROOT',
572+ 'link-id')
573+ link_entry.symlink_target = u'link-target'
574+ inv_delta = [('foo', 'bar/foo2', 'foo-id', foo_entry),
575+ ('bar/baz', None, 'baz-id', None),
576+ (None, 'link', 'link-id', link_entry),
577+ ]
578+ shim = revision_store._TreeShim(repo=None, basis_inv=basis_inv,
579+ inv_delta=inv_delta,
580+ content_provider=None)
581+ changes = list(shim._delta_to_iter_changes())
582+ expected = [('foo-id', ('foo', 'bar/foo2'), False, (True, True),
583+ ('TREE_ROOT', 'bar-id'), ('foo', 'foo2'),
584+ ('file', 'file'), (False, False)),
585+ ('baz-id', ('bar/baz', None), True, (True, False),
586+ ('bar-id', None), ('baz', None),
587+ ('file', None), (False, None)),
588+ ('link-id', (None, 'link'), True, (False, True),
589+ (None, 'TREE_ROOT'), (None, 'link'),
590+ (None, 'symlink'), (None, False)),
591+ ]
592+ # from pprint import pformat
593+ # self.assertEqualDiff(pformat(expected), pformat(changes))
594+ self.assertEqual(expected, changes)
595+
