Merge lp://staging/~jameinel/loggerhead/history_db into lp://staging/loggerhead
Status: | Merged |
---|---|
Merged at revision: | 424 |
Proposed branch: | lp://staging/~jameinel/loggerhead/history_db |
Merge into: | lp://staging/loggerhead |
Diff against target: | 0 lines |
To merge this branch: | bzr merge lp://staging/~jameinel/loggerhead/history_db |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Michael Hudson-Doyle | Approve | ||
Review via email: mp+24637@code.staging.launchpad.net |
This proposal supersedes a proposal from 2010-05-03.
Description of the change
This makes bzr-history-db a strict dependency for Loggerhead. However, it does result in some nice benefits.
The key points are:
1) Strict dependency. However, bzr-history-db is pretty small. If people prefer, I can merge the code into the loggerhead codebase. (I can preserve file-ids so that bringing in any new changes should be trivially easy.)
2) The very first view of a branch that has never been seen before, and is unrelated to any other branch, is slower. For example[1]:
version | uncached | disk cached | memory cached | peak mem | path |
---|---|---|---|---|---|
old | 12.5s | 0.618s | 0.126s | 77MB | /emacs/ |
new | 28.5s | 0.064s | 0.047s | 24MB[2] | /emacs/ |
old | 12.9s | 0.895s | 0.475s | 93.6MB | /emacs/trunk/files |
new | 28.8s | 0.944s | 0.359s | 45MB[3] | /emacs/trunk/files |
3) The 'first time' is a lot better when the second branch is related to the first. In my testing with a branch of emacs that is 100 revisions old, importing it into history-db takes about 200ms, so the 64ms to display from disk becomes only about 260ms.
version | uncached | path |
---|---|---|
old | 12.5s | /emacs/ |
new | 0.254s | /emacs/ |
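The cheap import of a related branch follows from the disk cache being shared across branches. A minimal sketch of that idea, assuming revisions are keyed by revision id so that a related branch only pays for revisions the store has not seen yet. `SharedRevisionStore` and `import_branch` are hypothetical names, not bzr-history-db's API, and the real schema stores far more than a set of ids.

```python
from typing import Dict, List, Set


class SharedRevisionStore:
    """Hypothetical shared cache: one set of revisions for all branches."""

    def __init__(self) -> None:
        self.known: Set[str] = set()            # revision ids imported so far
        self.branch_tips: Dict[str, str] = {}   # branch path -> tip revision id

    def import_branch(self, path: str, ancestry: List[str]) -> int:
        """Record a branch and return how many revisions were newly imported."""
        new_revs = [rev for rev in ancestry if rev not in self.known]
        self.known.update(new_revs)
        self.branch_tips[path] = ancestry[-1]
        return len(new_revs)


store = SharedRevisionStore()
trunk = [f"rev-{i}" for i in range(1000)]
print(store.import_branch("/emacs/trunk", trunk))        # 1000
# A related branch: shares 900 revisions with trunk, plus 5 of its own.
related = trunk[:900] + [f"side-{i}" for i in range(5)]
print(store.import_branch("/emacs/related", related))    # 5
```

Only the unseen revisions cost import time, which is why an unrelated first import is slow while a related one is cheap.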
Also note that because the memory cache is gone and the disk cache is shared, loading both trunk/changes and yamaoka/changes uses:
version | memory |
---|---|
old | 128MB |
new | 24MB |
Loggerhead seems to use an LRUCache with a max of 10 entries. So peak memory is potentially ~50MB*10=500MB, vs a flat 24MB.
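The ~500MB figure is the per-branch history size multiplied by the number of cache slots. A minimal sketch of an LRU cache bounded by entry count rather than bytes, assuming roughly 50MB per cached branch history; this is a generic illustration built on `collections.OrderedDict`, not Loggerhead's actual `LRUCache` class.

```python
from collections import OrderedDict
from typing import Any, Hashable


class LRUCache:
    """Evicts the least recently used entry once max_entries is exceeded."""

    def __init__(self, max_entries: int = 10) -> None:
        self.max_entries = max_entries
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Any:
        value = self._data[key]        # raises KeyError on a miss
        self._data.move_to_end(key)    # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the least recently used entry

    def __len__(self) -> int:
        return len(self._data)


def worst_case_mb(per_entry_mb: float, max_entries: int) -> float:
    """Upper bound on cache memory if every slot holds a full history object."""
    return per_entry_mb * max_entries


print(worst_case_mb(50.0, 10))  # 500.0
```

Because the bound counts entries, not bytes, ten large branches can fill it just as easily as ten small ones.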
4) The size of the disk cache increases. So it is even more important that we delete any temp dirs we create. Size with one branch:
version | disk cache |
---|---|
old | 2.9MB |
new | 31.0MB |
However, the new cache shares data better. With 2 branches imported:
version | disk cache |
---|---|
old | 5.7MB |
new | 31.0MB |
At 10x the size, after ~10 branches of the same project, the new form will actually save disk space. (After just 8 branches of bzr, I was at 8.7MB new vs 9.7MB old.)
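The break-even point can be reproduced with a rough linear model. This sketch assumes the old cache costs about 2.9MB per emacs branch (the one-branch measurement) and that the new shared cache stays near 31.0MB, as it did for one and two branches; real growth will not be perfectly linear.

```python
def break_even_branches(old_per_branch_mb: float, new_total_mb: float) -> int:
    """Smallest branch count n where n * old_per_branch_mb exceeds new_total_mb.

    Assumes the old cache grows linearly per branch and the new shared cache
    stays roughly flat for branches of the same project.
    """
    n = 1
    while n * old_per_branch_mb <= new_total_mb:
        n += 1
    return n


print(break_even_branches(2.9, 31.0))  # 11
```

This lands in line with the ~10 branches estimated above.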
5) The code uses a threading.Lock during import, but the existing code already did that. Reads should still proceed in parallel across threads. It is possible that we'll see more concurrent SQLite access than before. For something like codebrowse, we may want to switch the back end to Postgres.
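A minimal sketch of that locking split, assuming one SQLite file shared by all request threads: imports are serialized behind a single `threading.Lock`, while reads open their own connection and take no Python-level lock. `HistoryDB`, `import_revisions` and `count_revisions` are hypothetical names, not bzr-history-db's API, and the one-table schema is a stand-in.

```python
import os
import sqlite3
import tempfile
import threading
from contextlib import closing


class HistoryDB:
    """Hypothetical wrapper: imports are serialized, reads are not."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._import_lock = threading.Lock()
        with closing(self._connect()) as conn:
            with conn:  # commits the schema change
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS revision (revid TEXT PRIMARY KEY)")

    def _connect(self) -> sqlite3.Connection:
        # A fresh connection per call, so each thread uses its own connection.
        # SQLite still takes its own file locks; the timeout lets a caller wait
        # out a concurrent write instead of failing immediately.
        return sqlite3.connect(self._path, timeout=30)

    def import_revisions(self, revids):
        # Only one import runs at a time.
        with self._import_lock:
            with closing(self._connect()) as conn:
                with conn:  # commits on success, rolls back on error
                    conn.executemany(
                        "INSERT OR IGNORE INTO revision (revid) VALUES (?)",
                        [(r,) for r in revids])

    def count_revisions(self) -> int:
        # Reads take no Python-level lock.
        with closing(self._connect()) as conn:
            return conn.execute("SELECT COUNT(*) FROM revision").fetchone()[0]


with tempfile.TemporaryDirectory() as tmp:
    db = HistoryDB(os.path.join(tmp, "history.db"))
    workers = [
        threading.Thread(target=db.import_revisions,
                         args=([f"rev-{i}-{j}" for j in range(100)],))
        for i in range(4)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    print(db.count_revisions())  # 400
```

Opening a connection per call sidesteps `sqlite3`'s default `check_same_thread` restriction at the cost of reconnect overhead; a busier server would likely pool connections or use a different back end.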
Overall, I think this can be a big win for loggerhead, but I expect it to need a little polishing before it can land. (Given that my other 4 branches haven't landed yet, either...)
[1] Times cover only getting the changes, without rendering. They were also measured on my 'integration' branch, which has all of the performance improvements (tag cache, loading only 2 pages, etc.)
[2] Peak memory for initial import is 120MB, and there seems to be about 80MB that is 'left' in memory. I need to debug that part of bzr-history-db.
[3] There seems to be a 'soft' memory leak on the 'files' page. Each reload raises memory by about 30MB, until after a few reloads it drops dramatically. My guess is that there is a reference cycle delaying garbage collection. And my guess is the 'History.
Sorry about the noise. I must have somehow hit submit twice, and it kicked it back at me, so I thought it was an old merge proposal.