Matt Nordhoff wrote: > Matt Nordhoff has proposed merging lp:~mnordhoff/loggerhead/statictuples into lp:loggerhead. > > Requested reviews: > Loggerhead-team (loggerhead-team) > > > This totally isn't ready to merge (though it does work 100%, AFAICT), but I wanted to bring it to your attention. > > This branch changes Loggerhead to make use of StaticTuples, jam's recent addition to bzrlib. (If you don't know, they're very similar to tuples, but they are faster and save memory by only holding certain types of objects that can't get in reference cycles, so they can avoid the GC overhead.) Thanks for poking at this. > I'm bundling a copy of the Python version of StaticTuple for compatibility with older bzr versions. This has a performance cost, but hopefully it's minor, and worth it. > > I don't really have information on how much of a savings this is; I haven't been running it long enough for memory usage to stabilize. But I figure it should reduce the number of tuples by 50-75%, which is 100,000-300,000 for me. > > BTW, I fixed a missing import since the last time Launchpad mirrored this branch. > > Also, you need to run it against lp:~jameinel/bzr/2.1-st-concat, which has a proposal up for merging to bzr.dev. > > What's left to do: > > 1.) It's not possible to marshal StaticTuples, so RevInfoDiskCache currently uses listcomps to convert to/from regular tuples when serializing/deserializing. This probably has a performance impact, but all I cared about at the time was getting it working. I'd definitely want to measure the performance impact of this on a launchpad-sized history before landing this. > It's not possible to pickle them either, so we can't switch to that. Besides, pickling uses more disk space than marshal. (As a quick test, I took a random row from revinfo.sql. Without changing from tuple to StaticTuple, changing it from marshal+zlib to pickle+zlib (using pickle protocol version 2) went from 39 KB to 54 KB.) FWIW, the reason I went with marshal is that it's so much quicker than anything else I tried. I have this recollection that cPickle was a bunch slower. Really though, we need to find a tech that allows us to not deserialize the whole graph all the time -- a more sophisticated schema or something. I have other reasons to want to use a cache that can work over the network -- maybe couch? -- but I don't know when I'll get time to work on it :( > 2.) Currently I work around StaticTuple.from_sequence() only supporting sequences, not generators. I'll poke jam about that next time I see him. > > 3.) There weren't very many of them, but I saw some tuples like ('replace', 0, 1, 0, 1) in memory, and want to StaticTupleify them too, but don't know where they come from. I'd guess that's to do with diff generation. > 4.) I convert the results of iter_ancestry to StaticTuples. I should probably send a patch to bzr.dev, or maybe just leave them as normal tuples, to save the cost of converting them (though it's probably very minor). > > 4.) Maybe some things should be interned? I dunno. I guess this might help if you are looking at branches that share ancestry. > 5.) Some of the things I StaticTupled probably have a negligible impact on memory usage, but hey, why not? > > 6.) After this, the only other interesting tuples I see in Dozer are related to Subversion exceptions (bug #436406) and a few like ('sha1:...',), which I haven't investigated yet. Plus a couple ('$some_revid', [$graph_cache]). The latter are probably to do with the sqlite cache? > 7.) It might be possible for some of Loggerhead's many lists to be turned into StaticTuples. In particular, the graph cache is a list of lists of 3 tuples. (They actually are sometimes modified where they're built, wholehistory.compute_whole_history_data(), but it may be worth the inconvenience of turning them into StaticTuples anyway.) In general I think this should land -- but only if it doesn't hurt the loading of a Launchpad-sized history from the cache too much. Cheers, mwh