Merge lp://staging/~mnordhoff/loggerhead/statictuples into lp://staging/loggerhead
Status: Work in progress
Proposed branch: lp://staging/~mnordhoff/loggerhead/statictuples
Merge into: lp://staging/loggerhead
Diff against target: 253 lines (+134/-12), 6 files modified
  loggerhead/_static_tuple_py.py (+80/-0)
  loggerhead/changecache.py (+11/-2)
  loggerhead/controllers/filediff_ui.py (+3/-0)
  loggerhead/history.py (+3/-2)
  loggerhead/util.py (+7/-0)
  loggerhead/wholehistory.py (+30/-8)
To merge this branch: bzr merge lp://staging/~mnordhoff/loggerhead/statictuples
Related bugs: (none)
Reviewer: Loggerhead Team (review requested, pending)
Review via email: mp+13706@code.staging.launchpad.net
Commit message
Description of the change
Matt Nordhoff (mnordhoff) wrote:
> It's not possible to pickle them either, so we can't switch to that.
If we decide to go that way, it is, of course, possible to add pickling
support to StaticTuple. I just meant that it isn't supported now.
Turns out it's really trivial to implement, too. (Well, as trivial as
anything can be in C. It's a one-liner in Python. So I can probably hack
a C version together in...a few weeks? :-P )
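The "one-liner in Python" can be sketched with a toy stand-in class (this is not bzrlib's actual implementation, just an illustration of the pickling hook a tuple subclass needs):

```python
import pickle

# Toy stand-in for bzrlib's StaticTuple, used only to illustrate the
# "one-liner"; the real class lives in bzrlib._static_tuple_py and its
# C counterpart, and its constructor takes the items as *args.
class StaticTuple(tuple):
    def __new__(cls, *args):
        return tuple.__new__(cls, args)

    # The one-liner: tell pickle to rebuild via the constructor, since a
    # tuple subclass with a custom __new__ can't use tuple's default path.
    def __reduce__(self):
        return (self.__class__, tuple(self))

st = StaticTuple('revid-1', 'revid-2')
copy = pickle.loads(pickle.dumps(st, protocol=2))
print(type(copy).__name__, tuple(copy))
# prints: StaticTuple ('revid-1', 'revid-2')
```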
- 401. By Matt Nordhoff: Merge trunk, fixing conflicts.
- 402. By Matt Nordhoff: Merge trunk, fixing conflict.
Matt Nordhoff (mnordhoff) wrote:
Matt Nordhoff wrote:
> Also, you need to run it against lp:~jameinel/bzr/2.1-st-concat,
> which has a proposal up for merging to bzr.dev.
FYI, this has landed now. Pickle support has been approved, so it will
hopefully land soon as well.
Michael Hudson-Doyle (mwhudson) wrote:
Matt Nordhoff wrote:
> Matt Nordhoff has proposed merging lp:~mnordhoff/loggerhead/statictuples into lp:loggerhead.
>
> Requested reviews:
> Loggerhead-team (loggerhead-team)
>
>
> This totally isn't ready to merge (though it does work 100%, AFAICT), but I wanted to bring it to your attention.
>
> This branch changes Loggerhead to make use of StaticTuples, jam's recent addition to bzrlib. (If you don't know, they're very similar to tuples, but they are faster and save memory by only holding certain types of objects that can't get in reference cycles, so they can avoid the GC overhead.)
Thanks for poking at this.
> I'm bundling a copy of the Python version of StaticTuple for compatibility with older bzr versions. This has a performance cost, but hopefully it's minor, and worth it.
>
> I don't really have information on how much of a savings this is; I haven't been running it long enough for memory usage to stabilize. But I figure it should reduce the number of tuples by 50-75%, which is 100,000-300,000 for me.
>
> BTW, I fixed a missing import since the last time Launchpad mirrored this branch.
>
> Also, you need to run it against lp:~jameinel/bzr/2.1-st-concat, which has a proposal up for merging to bzr.dev.
>
> What's left to do:
>
> 1.) It's not possible to marshal StaticTuples, so RevInfoDiskCache currently uses listcomps to convert to/from regular tuples when serializing/deserializing.
I'd definitely want to measure the performance impact of this on a
launchpad-sized history before landing this.
> It's not possible to pickle them either, so we can't switch to that. Besides, pickling uses more disk space than marshal. (As a quick test, I took a random row from revinfo.sql. Without changing from tuple to StaticTuple, changing it from marshal+zlib to pickle+zlib (using pickle protocol version 2) went from 39 KB to 54 KB.)
FWIW, the reason I went with marshal is that it's so much quicker than
anything else I tried. I have this recollection that cPickle was a
bunch slower.
Really though, we need to find a tech that allows us to not deserialize
the whole graph all the time -- a more sophisticated schema or
something. I have other reasons to want to use a cache that can work
over the network -- maybe couch? -- but I don't know when I'll get time
to work on it :(
> 2.) Currently I work around StaticTuple.from_sequence() only supporting sequences, not generators.
>
> 3.) There weren't very many of them, but I saw some tuples like ('replace', 0, 1, 0, 1) in memory, and want to StaticTupleify them too, but don't know where they come from.
I'd guess that's to do with diff generation.
> 4.) I convert the results of iter_ancestry to StaticTuples. I should probably send a patch to bzr.dev, or maybe just leave them as normal tuples, to save the cost of converting them (though it's probably very minor).
>
> 4.) Maybe some things should be interned? I dunno.
I guess this might help if you are looking at branches that share ancestry.
> 5.) Some of the things I StaticTupled probably have a negligible impact on memory usage, but hey, why not?
Matt Nordhoff (mnordhoff) wrote:
Michael Hudson wrote:
> Matt Nordhoff wrote:
>> Matt Nordhoff has proposed merging lp:~mnordhoff/loggerhead/statictuples into lp:loggerhead.
>>
>> Requested reviews:
>> Loggerhead-team (loggerhead-team)
>>
>>
>> This totally isn't ready to merge (though it does work 100%, AFAICT), but I wanted to bring it to your attention.
>>
>> This branch changes Loggerhead to make use of StaticTuples, jam's recent addition to bzrlib. (If you don't know, they're very similar to tuples, but they are faster and save memory by only holding certain types of objects that can't get in reference cycles, so they can avoid the GC overhead.)
>
> Thanks for poking at this.
>
>> I'm bundling a copy of the Python version of StaticTuple for compatibility with older bzr versions. This has a performance cost, but hopefully it's minor, and worth it.
>>
>> I don't really have information on how much of a savings this is; I haven't been running it long enough for memory usage to stabilize. But I figure it should reduce the number of tuples by 50-75%, which is 100,000-300,000 for me.
>>
>> BTW, I fixed a missing import since the last time Launchpad mirrored this branch.
>>
>> Also, you need to run it against lp:~jameinel/bzr/2.1-st-concat, which has a proposal up for merging to bzr.dev.
>>
>> What's left to do:
>>
>> 1.) It's not possible to marshal StaticTuples, so RevInfoDiskCache currently uses listcomps to convert to/from regular tuples when serializing/deserializing.
>
> I'd definitely want to measure the performance impact of this on a
> launchpad-sized history before landing this.
I don't have a copy of Launchpad, but I'd be happy to do some sort of
simple, mostly-good-enough test (:-P) on something less scary-big like
bzr.dev or Python.
>> It's not possible to pickle them either, so we can't switch to that. Besides, pickling uses more disk space than marshal. (As a quick test, I took a random row from revinfo.sql. Without changing from tuple to StaticTuple, changing it from marshal+zlib to pickle+zlib (using pickle protocol version 2) went from 39 KB to 54 KB.)
>
> FWIW, the reason I went with marshal is that it's so much quicker than
> anything else I tried. I have this recollection that cPickle was a
> bunch slower.
Oh, thanks for the explanation.
> Really though, we need to find a tech that allows us to not deserialize
> the whole graph all the time -- a more sophisticated schema or
> something. I have other reasons to want to use a cache that can work
> over the network -- maybe couch? -- but I don't know when I'll get time
> to work on it :(
That sounds good, but it's over my head. It's definitely not going to be
a part of this patch. :-P
>> 2.) Currently I work around StaticTuple.from_sequence() only supporting sequences, not generators.
>>
>> 3.) There weren't very many of them, but I saw some tuples like ('replace', 0, 1, 0, 1) in memory, and want to StaticTupleify them too, but don't know where they come from.
>
> I'd guess that's to do with diff generation.
I just tracked it down. It comes f...
- 403. By Matt Nordhoff: Update _static_tuple_py.py to lp:~mnordhoff/bzr/statictuple-pickling r4779, and update the TODO that says pickling is not supported. (revid: <email address hidden>)
John A Meinel (jameinel) wrote:
I'm obviously not an official reviewer, but I figured I'd poke my head in here.
I'm a bit surprised at some of the changes that seem pretty unrelated. Such as:
195 if (base < 100) and (dot != 0):
196 - out += '.%d' % (dot,)
197 + out += '.%d' % (dot)
(I realize that the latter probably works, but if 'dot' ever ends up as a tuple, you'll get a rather cryptic failure mode.)
Also, I would be quite skeptical about this change:
@classmethod
223 - def is_installed(cls):
224 - return os.environ.
225 + def is_installed(self):
226 + return os.environ.
227
^- You are changing a "classmethod" to use the "self" attribute, which is, at best, confusing. You don't have an instance here, you just have the Class object, hence why the original author used "cls" to clarify that.
279 - revision_graph[key] = tuple(parent for parent in parents if parent
280 - in revision_graph)
281 + # TODO: It'd be nice if from_sequence supported generators.
282 + revision_graph[key] = StaticTuple.from_sequence(tuple(
283 + parent for parent in parents if parent in revision_graph))
^- It probably wouldn't be hard to add this sort of thing to FromSequence. Namely change these lines:
    if (!PySequence_Check(sequence)) {
        return NULL;
    }
to
    PyObject *as_tuple = NULL;
    if (!PySequence_Check(sequence)) {
        as_tuple = PySequence_Tuple(sequence);
        sequence = as_tuple;
    }
    ...
    Py_XDECREF(as_tuple);
Here are some performance numbers for a Launchpad tree. This is ~66k revisions.
61 msec TIMEIT "zlib.decompress(...)"
139 msec TIMEIT "o = marshal.loads(...)"
(78 msec to marshal.loads())
76 msec TIMEIT "zlib.decompress(...)"
273 msec TIMEIT "o = cPickle.loads(...)"
(196 msec to cPickle.loads())
So marshal is *almost* as fast as zlib on the same data.
97 msec TIMEIT "zlib.decompress(...)"
356 msec TIMEIT "simplejson.loads(...)"
(the raw json is 16MB)
94 msec TIMEIT "zlib.decompress(...)"
271 msec TIMEIT "o = _bencode_..."
So bdecode is as fast as cPickle, and we can probably teach bdecode(...
306 msec TIMEIT "o = _bencode_..."
92 msec TIMEIT "zlib.decompress(...)"
335 msec TIMEIT "o = cPickle.loads(...)"
So the StaticTuple form is slightly slower than the tuple form. My guess is that cPickle has a bit of extra overhead when you don't use built-in types. (Like it reads in as a list/tuple, and then passes the data to the constructor.)
Now, what about using marshal plus casting to StaticTuples:
o = marshal.loads(zlib.decompress(data))
st = static_tuple.StaticTuple
as_st = static_tuple.StaticTuple.from_sequence
o2 = [st(as_st(x[0]), as_st(x[1]), as_st(x[2])) for x in o]
256 msec TIMEIT "o2 = ..."
So it turns out that using 'marshal' + casting the data into StaticTuple instances is faster than using cPickle.
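The marshal-vs-pickle decode comparison above can be sketched with the stdlib alone. This uses synthetic data of roughly the revinfo shape ([(merge-info, where-merged, parents)]); the shape and size are invented, so the absolute numbers won't match the ones quoted:

```python
import marshal
import pickle
import timeit
import zlib

# Synthetic stand-in for the revinfo cache rows; purely illustrative.
rows = [(('rev-%d' % i, i, '1.%d' % i, i % 2 == 0),
         ('rev-%d' % (i + 1),),
         ('rev-%d' % (i - 1),))
        for i in range(20000)]

marshal_blob = zlib.compress(marshal.dumps(rows))
pickle_blob = zlib.compress(pickle.dumps(rows, 2))

# Time decompress+deserialize for each format.
t_marshal = timeit.timeit(
    lambda: marshal.loads(zlib.decompress(marshal_blob)), number=10)
t_pickle = timeit.timeit(
    lambda: pickle.loads(zlib.decompress(pickle_blob)), number=10)
print('marshal %.3fs  pickle %.3fs' % (t_marshal, t_pickle))
```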
John A Meinel (jameinel) wrote:
I should also note. I tried hard to make a pure-python StaticTuple just inherit from a regular tuple, so that they would hopefully not have any overhead versus a normal tuple. However, note that we're currently working out some details, and it may have a small amount of overhead.
John A Meinel (jameinel) wrote:
John A Meinel wrote:
> I should also note. I tried hard to make a pure-python StaticTuple just inherit from a regular tuple, so that they would hopefully not have any overhead versus a normal tuple. However, note that we're currently working out some details, and it may have a small amount of overhead.
I just added this proposal:
https:/
I'm still seeing weird results on Windows. But Andrew Bennetts clarified
that it does, indeed, save him 8 bytes per object on Linux w/ python
2.6.4rc2. (for me, they are always 8-bytes bigger on python 2.6.2 and
2.6.4.)
John
=:->
Matt Nordhoff (mnordhoff) wrote:
John A Meinel wrote:
> I'm obviously not an official reviewer, but I figured I'd poke my head in here.
Thank you! :-)
> I'm a bit surprised at some of the changes that seem pretty unrelated. Such as:
> 195 if (base < 100) and (dot != 0):
> 196 - out += '.%d' % (dot,)
> 197 + out += '.%d' % (dot)
>
> (I realize that the latter probably works, but if 'dot' ever ends up as a tuple, you'll get a rather cryptic failure mode.)
>
> Also, I would be quite skeptical about this change:
> @classmethod
> 223 - def is_installed(cls):
> 224 - return os.environ.
> 225 + def is_installed(self):
> 226 + return os.environ.
> 227
>
> ^- You are changing a "classmethod" to use the "self" attribute, which is, at best, confusing. You don't have an instance here, you just have the Class object, hence why the original author used "cls" to clarify that.
I didn't make those changes intentionally. Looks like I screwed up when
resolving merge conflicts in that file and accidentally reverted some
changes. I'll fix it now.
Thanks for mentioning it; I thought it was just LP being funny so I
hadn't checked it out.
> 279 - revision_graph[key] = tuple(parent for parent in parents if parent
> 280 - in revision_graph)
> 281 + # TODO: It'd be nice if from_sequence supported generators.
> 282 + revision_graph[key] = StaticTuple.from_sequence(tuple(
> 283 + parent for parent in parents if parent in revision_graph))
>
> ^- It probably wouldn't be hard to add this sort of thing to FromSequence. Namely change these lines:
> if (!PySequence_Check(sequence)) {
>     PyErr_Format(...
>     Py_TYPE(...
>     return NULL;
> }
>
> to
>
> PyObject *as_tuple = NULL;
>
> if (!PySequence_Check(sequence)) {
>     as_tuple = PySequence_Tuple(sequence);
>     sequence = as_tuple;
> }
>
> ...
>
> Py_XDECREF(as_tuple);
Oh, cool!
(Are you suggesting you'll do it or I do it? Cuz I don't mind, but if
you were going to...)
(If it supports arbitrary iterables, it could be renamed to
"from_iterable", though I guess it makes sense to keep it
"from_sequence" for backwards compatibility and to discourage passing
arbitrary iterables since it's less efficient...)
> Here are some performance numbers for a Launchpad tree. This is ~66k revisions.
[snip lots of numbers]
> So it turns out that using 'marshal' + casting the data into StaticTuple instances is faster than using cPickle.
Thanks so much for doing this analysis!
I'm not awake enough to read it yet, but I *can* understand your summary
at the end. :-P
> As for whether things should be interned... I would say that you probably only benefit if you are sharing between branches. Namely, the entries up there are:
> [(merge-info, where-merged, parents)].
> `merge-info` is (seq, revid, merge_depth, revno_str, end_of_merge) --
> like a merged sorted list, but the revno is stringified.
> `where-merged` is a tuple of revisions that have this revision as a
> non-lefthand parent. Finally, `parents` is just the usual list of
> parents of this revis...
- 404. By Matt Nordhoff: Fix some changes I accidentally reverted when merging the trunk.
- 405. By Matt Nordhoff: Bind StaticTuple.from_sequence to "as_st" when using it in loops.
- 406. By Matt Nordhoff: Make each...item in the revgraph cache a StaticTuple instead of a list.
  This is a little ugly because the items actually get mutated while the cache is being generated, but it's not a big deal.
  Even if we decide to back this out later for performance, we should keep the changes to changecache.py: there's no reason they have to stay lists after being roundtripped through marshal.
  I benchmarked compute_whole_history_data(bzr.dev) and any difference was within the normal variation, which was only 3%. So it's probably okay. I didn't actually check what the memory difference is.
John A Meinel (jameinel) wrote:
...
>
>>
>> PyObject *as_tuple = NULL;
>>
>> if (!PySequence_Check(sequence)) {
>>     as_tuple = PySequence_Tuple(sequence);
>>     sequence = as_tuple;
>> }
>>
>> ...
>>
>> Py_XDECREF(as_tuple);
>
> Oh, cool!
>
> (Are you suggesting you'll do it or I do it? Cuz I don't mind, but if
> you were going to...)
>
> (If it supports arbitrary iterables, it could be renamed to
> "from_iterable", though I guess it makes sense to keep it
> "from_sequence" for backwards compatibility and to discourage passing
> arbitrary iterables since it's less efficient...)
So I should comment a bit more. The code you have is something like:
for r, parents in graph.iteritems():
val = tuple(p for p in parents if p in graph)
However, it turns out that list comprehensions are faster than
generators. Mostly because a generator has a 'context' that list
comprehensions don't have.
In essence if you use a generator, it adds the overhead of 1 function
call. So doing the above as
for r, parents in graph.iteritems():
val = tuple([p for p in parents if p in graph])
Should actually be faster. And has the distinct advantage that it
generates a sequence rather than an iterable, so
for r, parents in graph.iteritems():
    val = StaticTuple.from_sequence([p for p in parents if p in graph])
just works :)
The only reason not to use a list comprehension over a generator is when
the number of items would cause significant memory overhead. That
certainly isn't true in the case of parent graphs, where you only have a
small handful of keys.
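The genexp-vs-listcomp point can be checked with stdlib types alone (no bzrlib needed). The toy graph below is invented, and the size of the gap varies by Python version:

```python
import timeit

# Toy parent graph: key -> tuple of parent keys, standing in for the
# revision graph discussed above. Keys >= 100000 are deliberately absent.
graph = {i: (i - 1, i - 7, i + 100000) for i in range(10000)}

def via_genexp():
    # tuple(...) fed by a generator expression: one extra frame per call.
    return [tuple(p for p in parents if p in graph)
            for parents in graph.values()]

def via_listcomp():
    # tuple(...) fed by a list comprehension: builds the list eagerly.
    return [tuple([p for p in parents if p in graph])
            for parents in graph.values()]

assert via_genexp() == via_listcomp()  # same results either way
print('genexp   %.3fs' % timeit.timeit(via_genexp, number=20))
print('listcomp %.3fs' % timeit.timeit(via_listcomp, number=20))
```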
...
> If we intern things, it probably makes sense to intern the whole 3-tuple.
>
> Interning things bothers me, though, since _static_tuple_py interns
> forever and I don't want to increase memory usage that much for people
> with older bzrlibs.
>
> (Also, interning hurts my brain a bit. :-P )
It is up to you, but it saves *tons* of memory otherwise. You could use
a flag based on whether you have the compiled extensions...
>
>> I should also note that here:
>> 104 + data_statictuples = [[StaticTuple.
>> 105 + l in data_tuples]
>>
>> ^- You also almost definitely want to use a StaticTuple instead of that inner list. The cost of the list will otherwise be most of your savings from using ST. If it is always a 3-sequence list, you can use my form, or you could use another ".from_sequence()".
>
> Yeah, it uses the inner list because `where-merged` is populated in a
> second loop. I pretty much don't understand that code
> (loggerhead/
> it, so I haven't looked into it yet.
>
> Hmm, a quick hack is easier than I thought. I'll check it out right now.
>
>> I'll also note that you really do want to create a local variable for "StaticTuple.from_sequence".
>
> Ehh, didn't figure that was significant. OK, will do.
>
>> In short, using StaticTuple.
>>
>> The bigger question is whether it is worthwhile or no...
- 407. By Matt Nordhoff: Update _static_tuple_py.py to r4772 of bzr.dev
- 408. By Matt Nordhoff: Update/remove TODOs.
- 409. By Matt Nordhoff: Ah, iter_ancestry does return StaticTuples now. No need to convert them.
  (Well, it likely doesn't on pre-CHK repo formats, but I don't care much.)
Matt Nordhoff (mnordhoff) wrote:
(Oh, forgot about replying to this again.)
John A Meinel wrote:
> So I should comment a bit more. The code you have is something like:
>
> for r, parents in graph.iteritems():
> val = tuple(p for p in parents if p in graph)
>
> However, it turns out that list comprehensions are faster than
> generators. Mostly because a generator has a 'context' that list
> comprehensions don't have.
>
> In essence if you use a generator, it adds the overhead of 1 function
> call. So doing the above as
>
> for r, parents in graph.iteritems():
> val = tuple([p for p in parents if p in graph])
>
> Should actually be faster. And has the distinct advantage that it
> generates a sequence rather than an iterable
>
> for r, parents in graph.iteritems():
>     val = StaticTuple.from_sequence([p for p in parents if p in graph])
>
> just works :)
>
> The only reason not to use a list comprehension over a generator is when
> the number of items would cause significant memory overhead. That
> certainly isn't true in the case of parent graphs, where you only have a
> small handful of keys.
> ...
OK. Done.
Although I didn't benchmark it; my system is rather overloaded right
now. And I'm lazy. :-P
>> If we intern things, it probably makes sense to intern the whole 3-tuple.
>
>> Interning things bothers me, though, since _static_tuple_py interns
>> forever and I don't want to increase memory usage that much for people
>> with older bzrlibs.
>
>> (Also, interning hurts my brain a bit. :-P )
>
> It is up to you, but it saves *tons* of memory otherwise. You could use
> a flag based on whether you have the compiled extensions...
Eh. I'll work on interning next.
I actually wrote the code for using a flag based on whether the compiled
extensions are available, but I'm not sure I'll use it yet. I don't like
the overhead. :-P
I wonder how much of an impact interning the stringified revnos would
have? They're tiny, but they add up, and most of them are duplicates...
>> Keeping a bunch of free tuples around is probably acceptable, since they
>> will be reused next time a cache gets loaded, but it's still rather
>> unfortunate. Huh.
>
> It is only 2000, though. Versus the 60,000 or so that are created. So it
> isn't a huge overhead.
2000 tuples is _nothing_. There are already 50-60k tuples in memory
before Loggerhead gets a single request, from all of the libraries and
everything.
> John
> =:->
--
Matt Nordhoff (mnordhoff) wrote:
Matt Nordhoff wrote:
> I wonder how much of an impact interning the stringified revnos would
> have? They're tiny, but they add up, and most of them are duplicates...
Although interning them when they're unmarshalled will lead to one ugly
list comprehension...
--
John A Meinel (jameinel) wrote:
Matt Nordhoff wrote:
> Matt Nordhoff wrote:
>> I wonder how much of an impact interning the stringified revnos would
>> have? They're tiny, but they add up, and most of them are duplicates...
>
> Although interning them when they're unmarshalled will lead to one ugly
> list comprehension...
as_st(x).intern()
isn't all that bad.
st(as_st(
as_st(
Isn't great, but it isn't terrible, either.
John
=:->
Matt Nordhoff (mnordhoff) wrote:
John A Meinel wrote:
> Matt Nordhoff wrote:
>> Matt Nordhoff wrote:
>>> I wonder how much of an impact interning the stringified revnos would
>>> have? They're tiny, but they add up, and most of them are duplicates...
>> Although interning them when they're unmarshalled will lead to one ugly
>> list comprehension...
>
> as_st(x).intern()
>
> isn't all that bad.
>
> st(as_st(
> as_st(x[
>
> Isn't great, but it isn't terrible, either.
>
> John
> =:->
Yeah, but interning the individual items in the tuples would be.
--
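The revno-interning idea above can be sketched with plain strings. (bzr-era code used the intern() builtin and StaticTuple.intern(); on Python 3 the string equivalent is sys.intern. The data below is invented.)

```python
import sys

# Many cache rows repeat the same stringified dotted revnos; interning
# makes the duplicates share a single string object.
revnos = ['%d.%d' % (i % 40, i % 3) for i in range(12000)]
interned = [sys.intern(s) for s in revnos]

# Equal revno strings now point at the identical object, so the list
# holds only as many string objects as there are distinct values.
distinct_objects = len({id(s) for s in interned})
print('values: %d, objects after interning: %d'
      % (len(set(interned)), distinct_objects))
```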
- 410. By Matt Nordhoff: Replace some genexps with listcomps. Hopefully faster.
  I haven't tested it (my system is way too bogged down for accurate benchmarks right now), but I have it lying around, and constantly (un)shelving it is inconvenient. :P
- 411. By Matt Nordhoff: Go back to converting iter_ancestry to ST to be safe.
- 412. By Matt Nordhoff: Ehh, convert the iter_ancestry thing back to a generator expression.
  It's a lot of data, and besides, one little function call is nothing compared to the rest of what iter_ancestry is doing.
- 413. By Matt Nordhoff: Fix some 3-space indentation
- 414. By Matt Nordhoff: StaticTuple interning!
  I also made some of the code longer but hopefully more readable.
  Testing compute_whole_history_data(bzr.dev), this does help, but it's not as much as I was expecting. OTOH, a list and dict each 28k items long is not free.
Matt Nordhoff (mnordhoff) wrote:
Comparing the speed of RevInfoDiskCache on lp:launchpad/devel in the trunk and r415 of my branch, in seconds:
        trunk   statictuples
get     0.28    1.33
set     1.25    1.40
That's unfortunate. :\
Matt Nordhoff (mnordhoff) wrote:
- 415. By Matt Nordhoff: Style tweak.
  This either has a negligible impact on performance or a very slight positive impact.
  Curiously, making the same change to set() has a slight negative impact (~1.39 -> ~1.58 secs for LP).
  (Boy, I've had this sitting on my hard drive since 2009-11-05T21:01:27Z without fully testing it...)
- 416. By Matt Nordhoff: Pull in _static_tuple_py.py from lp:bzr r4842.
- 417. By Matt Nordhoff: Merge lp:loggerhead
- 418. By Matt Nordhoff: Pull in _static_tuple_py.py from lp:bzr r5055
- 419. By Matt Nordhoff: Merge lp:loggerhead
Matt Nordhoff (mnordhoff) wrote:
So, I haven't worked on this in a long time. In my opinion, this is
what's blocking it (I'm probably forgetting some things, though):
* The disk caching code has to convert between StaticTuples and normal
tuples, which is ugly and quite slow.
* For people without C extensions, or with an old version of bzr,
_static_tuple_py adds some overhead. It's not much, but Loggerhead has a
frightening number of tuples, so it could be a problem.
* I'm not 100% sure StaticTuple's intern cache (a SimpleSet or dict) is
entirely thread-safe. IIRC the answer was "the GIL should handle it",
but I _did_ once have a crash from some sort of bizarre corruption.
* The interning. Honestly, interning hurts my brain. I'm not sure what
all of the consequences are. In particular, _static_tuple_py's cache
never, ever removes things, so it could get quite large. Plus, some
other things could probably be interned (e.g. revno strings).
* There are some TODO comments about minor things that should either be
removed or actually fixed.
--
Matt Nordhoff
- 420. By Matt Nordhoff: Merge lp:loggerhead
- 421. By Matt Nordhoff: Merge lp:loggerhead
Unmerged revisions
- 421. By Matt Nordhoff: Merge lp:loggerhead
- 420. By Matt Nordhoff: Merge lp:loggerhead
- 419. By Matt Nordhoff: Merge lp:loggerhead
- 418. By Matt Nordhoff: Pull in _static_tuple_py.py from lp:bzr r5055
- 417. By Matt Nordhoff: Merge lp:loggerhead
- 416. By Matt Nordhoff: Pull in _static_tuple_py.py from lp:bzr r4842.
- 415. By Matt Nordhoff: Style tweak.
  This either has a negligible impact on performance or a very slight positive impact.
  Curiously, making the same change to set() has a slight negative impact (~1.39 -> ~1.58 secs for LP).
  (Boy, I've had this sitting on my hard drive since 2009-11-05T21:01:27Z without fully testing it...)
- 414. By Matt Nordhoff: StaticTuple interning!
  I also made some of the code longer but hopefully more readable.
  Testing compute_whole_history_data(bzr.dev), this does help, but it's not as much as I was expecting. OTOH, a list and dict each 28k items long is not free.
- 413. By Matt Nordhoff: Fix some 3-space indentation
- 412. By Matt Nordhoff: Ehh, convert the iter_ancestry thing back to a generator expression.
  It's a lot of data, and besides, one little function call is nothing compared to the rest of what iter_ancestry is doing.
This totally isn't ready to merge (though it does work 100%, AFAICT), but I wanted to bring it to your attention.
This branch changes Loggerhead to make use of StaticTuples, jam's recent addition to bzrlib. (If you don't know, they're very similar to tuples, but they are faster and save memory by only holding certain types of objects that can't get in reference cycles, so they can avoid the GC overhead.)
I'm bundling a copy of the Python version of StaticTuple for compatibility with older bzr versions. This has a performance cost, but hopefully it's minor, and worth it.
I don't really have information on how much of a savings this is; I haven't been running it long enough for memory usage to stabilize. But I figure it should reduce the number of tuples by 50-75%, which is 100,000-300,000 for me.
BTW, I fixed a missing import since the last time Launchpad mirrored this branch.
Also, you need to run it against lp:~jameinel/bzr/2.1-st-concat, which has a proposal up for merging to bzr.dev.
What's left to do:
1.) It's not possible to marshal StaticTuples, so RevInfoDiskCache currently uses listcomps to convert to/from regular tuples when serializing/deserializing. This probably has a performance impact, but all I cared about at the time was getting it working.
It's not possible to pickle them either, so we can't switch to that. Besides, pickling uses more disk space than marshal. (As a quick test, I took a random row from revinfo.sql. Without changing from tuple to StaticTuple, changing it from marshal+zlib to pickle+zlib (using pickle protocol version 2) went from 39 KB to 54 KB.)
2.) Currently I work around StaticTuple.from_sequence() only supporting sequences, not generators. I'll poke jam about that next time I see him.
3.) There weren't very many of them, but I saw some tuples like ('replace', 0, 1, 0, 1) in memory, and want to StaticTupleify them too, but don't know where they come from.
4.) I convert the results of iter_ancestry to StaticTuples. I should probably send a patch to bzr.dev, or maybe just leave them as normal tuples, to save the cost of converting them (though it's probably very minor).
5.) Maybe some things should be interned? I dunno.
6.) Some of the things I StaticTupled probably have a negligible impact on memory usage, but hey, why not?
7.) After this, the only other interesting tuples I see in Dozer are related to Subversion exceptions (bug #436406) and a few like ('sha1:...',), which I haven't investigated yet. Plus a couple ('$some_revid', [$graph_cache]).
8.) It might be possible for some of Loggerhead's many lists to be turned into StaticTuples. In particular, the graph cache is a list of lists of 3 tuples. (They actually are sometimes modified where they're built, in wholehistory.compute_whole_history_data(), but it may be worth the inconvenience of turning them into StaticTuples anyway.)
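The quick marshal+zlib vs pickle+zlib size test mentioned above can be reproduced with synthetic data. The real 39 KB vs 54 KB figures came from an actual revinfo.sql row, so the data shape below is invented and the numbers will differ:

```python
import marshal
import pickle
import zlib

# Synthetic rows roughly shaped like [(merge-info, where-merged, parents)].
row = [(('rev-%d' % i, i, '1.%d' % (i % 97), False),
        ('rev-%d' % (i + 1), 'rev-%d' % (i + 2)),
        ('rev-%d' % (i - 1),))
       for i in range(5000)]

# Compare on-disk sizes of the two serializations after zlib compression.
marshal_size = len(zlib.compress(marshal.dumps(row)))
pickle_size = len(zlib.compress(pickle.dumps(row, 2)))
print('marshal+zlib: %d bytes  pickle+zlib: %d bytes'
      % (marshal_size, pickle_size))
```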