>> PyObject *as_tuple = NULL;
>>
>> if (!PySequence_Check(sequence)) {
>>     as_tuple = PySequence_Tuple(sequence);
>>     sequence = as_tuple;
>> }
>>
>> ...
>>
>> Py_XDECREF(as_tuple);
>
> Oh, cool!
>
> (Are you suggesting you'll do it or I do it? Cuz I don't mind, but if
> you were going to...)
>
> (If it supports arbitrary iterables, it could be renamed to
> "from_iterable", though I guess it makes sense to keep it
> "from_sequence" for backwards compatibility, and to discourage passing
> arbitrary iterables since that's less efficient...)

So I should comment a bit more. The code you have is something like:

    for r, parents in graph.iteritems():
        val = tuple(p for p in parents if p in graph)

However, it turns out that list comprehensions are faster than generator
expressions, mostly because a generator carries a 'context' (its own
frame) that a list comprehension doesn't have. In essence, using a
generator adds the overhead of one function call. So writing the above
as:

    for r, parents in graph.iteritems():
        val = tuple([p for p in parents if p in graph])

should actually be faster. It also has the distinct advantage that it
produces a sequence rather than an iterable, so:

    for r, parents in graph.iteritems():
        val = StaticTuple.from_sequence([p for p in parents if p in graph])

just works. :)

The only reason not to use a list comprehension over a generator is when
the number of items would cause significant memory overhead. That
certainly isn't true in the case of parent graphs, where you only have a
small handful of keys.

...

> If we intern things, it probably makes sense to intern the whole 3-tuple.
>
> Interning things bothers me, though, since _static_tuple_py interns
> forever, and I don't want to increase memory usage that much for people
> with older bzrlibs.
>
> (Also, interning hurts my brain a bit. :-P )

It is up to you, but it saves *tons* of memory otherwise. You could use
a flag based on whether you have the compiled extensions...
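The speed claim is easy to check by timing both forms on a synthetic
parent graph. This is a sketch only: `graph` below is made up, and
Python 3's `.items()` stands in for the `iteritems()` used in the
thread.

```python
import timeit

# Hypothetical parent graph: revision -> tuple of parent revisions.
graph = {i: (i - 1, i - 2, i - 3) for i in range(1000)}

def with_genexp():
    # tuple() consuming a generator expression: one extra frame
    # (and effectively one extra function call) per tuple built.
    return {r: tuple(p for p in parents if p in graph)
            for r, parents in graph.items()}

def with_listcomp():
    # tuple() consuming a list comprehension: the list is built
    # eagerly, with no generator frame.
    return {r: tuple([p for p in parents if p in graph])
            for r, parents in graph.items()}

# Both forms produce identical results.
assert with_genexp() == with_listcomp()

gen_time = timeit.timeit(with_genexp, number=100)
list_time = timeit.timeit(with_listcomp, number=100)
print('genexp: %.3fs  listcomp: %.3fs' % (gen_time, list_time))
```

On small per-key sequences like parent lists, the list-comprehension
form is typically the faster of the two, at the cost of materializing
the intermediate list.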
>> I should also note that here:
>>
>>     data_statictuples = [[StaticTuple.from_sequence(t) for t in l]
>>                          for l in data_tuples]
>>
>> ^- You also almost definitely want to use a StaticTuple instead of that
>> inner list. Otherwise the cost of the list will eat most of your
>> savings from using ST. If it is always a 3-sequence list, you can use
>> my form, or you could use another ".from_sequence()".
>
> Yeah, it uses the inner list because `where-merged` is populated in a
> second loop. I pretty much don't understand that code
> (loggerhead/wholehistory.py) or the performance implications of
> changing it, so I haven't looked into it yet.
>
> Hmm, a quick hack is easier than I thought. I'll check it out right now.
>
>> I'll also note that you really do want to create a local variable for
>> "StaticTuple.from_sequence"; otherwise you pay an attribute lookup for
>> every call.
>
> Ehh, I didn't figure that was significant. OK, will do.
>
>> In short, using StaticTuple.from_sequence(marshal.loads()) costs (on my
>> machine, with Launchpad's repo) 256ms, up from 138ms for just marshal,
>> up from 61ms for just zlib.decompress.
>>
>> The bigger question is whether it is worthwhile or not. My initial
>> testing doesn't show a significant memory saving *just* from the
>> marshal change. In fact, it may even show an increase. However, this
>> was on a single branch. Also, tuples don't free memory: CPython keeps a
>> free list of up to 2000 tuples for every size from 1-20. Which means
>> that doing:
>>
>>     x = (a, b, c)
>>     x = StaticTuple.from_sequence(x)
>>
>> will actually keep the memory allocated as both a StaticTuple *and* a
>> tuple.
>>
>> However, eventually that tuple is likely to be re-used. But
>> specifically, 'marshal.loads()' is going to create all 60k+ (*3)
>> tuples, and then you cast them into StaticTuples, leaving a lot of
>> unused tuples around.
>
> Keeping a bunch of free tuples around is probably acceptable, since they
> will be reused next time a cache gets loaded, but it's still rather
> unfortunate.
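The local-variable trick is easy to sketch. `StaticTuple` itself lives
in bzrlib, so the class below is a hypothetical stand-in (assumption:
`from_sequence` simply builds a plain tuple); only the lookup-hoisting
pattern is the point.

```python
class StaticTupleStandIn:
    """Hypothetical stand-in for bzrlib's StaticTuple."""

    @staticmethod
    def from_sequence(seq):
        # Assumption: a real StaticTuple would build an immutable,
        # internable tuple type; a plain tuple suffices for the sketch.
        return tuple(seq)

# Synthetic data shaped like data_tuples: lists of 3-tuples.
data_tuples = [[(i, i + 1, i + 2)] * 3 for i in range(2000)]

def convert_with_attr_lookup():
    # Pays the .from_sequence attribute lookup on every inner iteration.
    return [[StaticTupleStandIn.from_sequence(t) for t in l]
            for l in data_tuples]

def convert_with_local_name():
    # Hoist the lookup into a fast local name once, before the loops.
    from_sequence = StaticTupleStandIn.from_sequence
    return [[from_sequence(t) for t in l] for l in data_tuples]

# Both conversions produce identical results.
assert convert_with_attr_lookup() == convert_with_local_name()
```

Local-name access compiles to a single fast bytecode, while the
attribute form repeats a dictionary lookup per call, which adds up over
tens of thousands of conversions.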
Huh. It is only 2000, though, versus the 60,000 or so that are created.
So it isn't a huge overhead.

John =:->
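The free-list behaviour under discussion can be observed from Python,
though it is a CPython implementation detail and not guaranteed; the
helper below is purely illustrative.

```python
def observe_reuse():
    # Build the tuples dynamically so they are real heap allocations
    # rather than constants folded into the code object.
    a = tuple([1, 2, 3])
    freed = id(a)
    del a                 # refcount hits 0: parked on the size-3 free list
    b = tuple([4, 5, 6])  # next size-3 allocation pops that free list
    # On CPython the freed slot is usually handed straight back,
    # so the two tuples often share an address (not guaranteed).
    return id(b) == freed

print(observe_reuse())
```

This usually prints True on CPython, which is exactly why the 60k+
tuples created by `marshal.loads()` don't all stay resident: only up to
2000 per size are kept for reuse, and the rest are genuinely freed.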