Merge lp://staging/~jtv/storm/profile-fetches into lp://staging/storm

Proposed by Jeroen T. Vermeulen
Status: Needs review
Proposed branch: lp://staging/~jtv/storm/profile-fetches
Merge into: lp://staging/storm
Diff against target: 1109 lines (+903/-7) (has conflicts)
8 files modified
storm/database.py (+1/-0)
storm/fetch_profile.py (+255/-0)
storm/references.py (+12/-4)
storm/store.py (+97/-3)
tests/fetch_context.py (+164/-0)
tests/fetch_profile.py (+64/-0)
tests/fetch_statistics.py (+99/-0)
tests/store/base.py (+211/-0)
Text conflict in tests/store/base.py
To merge this branch: bzr merge lp://staging/~jtv/storm/profile-fetches
Reviewer Review Type Date Requested Status
Storm Developers Pending
Review via email: mp+43323@code.staging.launchpad.net

Description of the change

= Fetch-Profiling =

Profile dependencies between object fetches from the database.

This work was raised on the launchpad-dev mailing list and, at a later stage, discussed with Jamu Kakar, Stuart Bishop, and others.

== The problem ==

Profiling will help map out and optimize data needs. For instance, consider this loop:

    def get_x_for(item):
        return item.other_object.x

    # ...

    total = 0
    for item in query_items():
        total += get_x_for(item)

This fetches item.other_object individually for each item coming out of query_items(). It's a common ORM performance anti-pattern, and easy enough to optimize: just outer-join item.other_object inside query_items, so that it is already in cache by the time get_x_for needs it.
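For illustration, here is a minimal sketch of that prejoin in plain Storm. The Item and OtherObject class definitions are hypothetical stand-ins, but the LeftJoin-through-using() technique is standard Storm:

    from storm.expr import LeftJoin
    from storm.locals import Int, Reference, Storm

    class OtherObject(Storm):
        __storm_table__ = "other_object"
        id = Int(primary=True)
        x = Int()

    class Item(Storm):
        __storm_table__ = "item"
        id = Int(primary=True)
        other_object_id = Int()
        other_object = Reference(other_object_id, OtherObject.id)

    def query_items(store):
        # The outer join pulls each referenced OtherObject along in the
        # same query; Storm caches the joined objects, so the later
        # item.other_object dereference is a cache hit, not a new fetch.
        origin = [Item, LeftJoin(OtherObject,
                                 Item.other_object_id == OtherObject.id)]
        return [item for item, other in
                store.using(*origin).find((Item, OtherObject))]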

Keeping track of all such dependencies is tedious, brittle, and a source of major abstraction leaks. Conventional ways of dealing with them involve profiling, analysis, matching query patterns to code paths, mapping out data requirements, and identifying the downside risks of optimization. The result is a single "point-solution" fix. The effort produces experience as a side effect, but transferring that experience from one human to another is relatively ineffective.

Then, after that's all done and the code has been optimized, it's difficult to keep track of which optimizations are still relevant. Code is alive, and the more intricate and beautiful an optimization is, the easier it is to break. Testing for such "semantically-neutral" breakage is often difficult, and monitoring the relevance of the tests themselves can be costly.

Refactorings in particular raise questions: will I need to port this optimization to the new structure? Will I still be hitting the right indexes? Could the new structure make the optimization unnecessary or less relevant? Am I prefetching a lot of objects that I don't need? And after I've made all those choices, how can I compare the new code's performance to the old code's performance as it's been tuned over time?

Fetch-profiling brings us closer to solving all those problems, but be patient. For now, it simplifies the mapping of data requirements by eliminating the tracing and the matching of query patterns to code paths. Read on for where we go next.

== Visibility improvements ==

Profiling exposes the problem pattern in the example very clearly. After running through the loop a few times, you inspect the store's statistics. These tell you how many item.other_object instances were loaded from the database (for "item"s returned by query_items), and how that number compares, as a percentage, to the number of "item"s loaded by query_items.

The highest "item" numbers identify the places most in need of pre-fetching optimization. Among those, the highest percentages of item.other_object loads identify the places where simple prejoins are most likely to be beneficial. Lower percentages may indicate that many of the foreign-key references are null, that most of the referenced objects are already covered by other caching, or that most of the objects you might prefetch in the query would be irrelevant.
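As a sketch of the kind of report this enables (the accessor and attribute names below are illustrative guesses, not the actual API in storm/fetch_profile.py):

    # Hypothetical post-run inspection; all names here are illustrative.
    for origin in store.iter_fetch_origins():
        for reference, derived_count in origin.derived_fetches.items():
            share = 100.0 * derived_count / origin.fetch_count
            print("%s in %s: %d fetches; %s loaded for %.0f%% of them" % (
                origin.cls.__name__, origin.context, origin.fetch_count,
                reference, share))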

== Future improvements ==

This is phase 1 of a proposed multi-phase development. For phase 2 I'd like to automate the prefetching so that code like this (using features layered on top of the existing Storm API) will optimize itself, without requiring manual tweaking. After that come policy tuning and automated context definition (see below).

With automated prefetching, the most basic optimizations will no longer be specified in the application. They will work themselves out automatically based on feedback from a real, running production system.

It is at that point that the big problems resolve themselves. After a code change, the individual optimizations will re-apply themselves as appropriate. There is no need to track their relevance manually.

== Concepts ==

Cast of characters:
 • A "fetch" is the retrieval and de-marshalling of an object from the database. Reading just a foreign key (e.g. to check for NULL) is not a fetch from the table it refers to; neither is retrieving an object from cache.
 • A "fetch context" is some indicator of what part of the application is executing a query. To the application, this is managed as a "call stack" of strings.
 • An "original fetch" is a free-form query, as performed using Store.find or Store.execute.
 • A "fetch origin" (within a context) is a class participating in an original fetch.
 • A "derived fetch" is the retrieval of objects that are clearly derived (directly or indirectly) from an original fetch through a chain of reference.

In the example loop, query_items() would contain at least an original fetch. The reference to item.other_object inside get_x_for is a derived fetch (derived directly from the original fetch, as it happens). Derived fetches can also be tracked across stores.

There's only room for one fetch context in this example, since derived fetches are associated with the same context as their original fetches. In a web application, the most useful context would probably be the request type, but for detailed optimization you'll want more fine-grained contexts. The typical ideal granularity for automated optimization would be just one original query per context.

Contexts form a hierarchy, to suit all these use cases as well as "drilling down" during analysis of data requirements. A context manager helps mark regions of code as being a specific context. Another idea would be a decorator (probably at the application level, though, where it's easier to find the right store) and an optional argument to find() that selects a context for just one query.
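To make that concrete, here is a hypothetical sketch of the context manager in use; the fetch_context method name and the context strings are illustrative, not necessarily the branch's exact API:

    # Bug is assumed to be an ordinary Storm class; names are illustrative.
    def render_bug_page(store, bug_id):
        with store.fetch_context("bug-page"):
            bug = store.get(Bug, bug_id)
            # Nested contexts stack like a call stack of strings,
            # yielding the hierarchical context "bug-page/comments"
            # for queries issued inside.
            with store.fetch_context("bug-page/comments"):
                return [comment.text for comment in bug.comments]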

Original fetches are identified by the fetched class as well as the context. This makes it possible to associate derived fetches with individual classes in a join, and track their dependent fetches separately.

== Implementation notes ==

I'm not planning to map out full dependency chains from fetch origins to derived fetches for now; that would probably become too costly. We'd have to see how useful that information is in practice.

You may note that fetch_context is tracked in Stores, in Results, and in ObjectInfos. The reason is that objects may be fetched from a result set long after the store has moved on to a different context. An object fetch should be associated with the context in which the result set was produced, which in turn is the context the store was in at the time.
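The distinction matters in code like this sketch (again with illustrative names; process is a hypothetical helper):

    with store.fetch_context("listing"):
        result = store.find(Item)    # the Result is tagged "listing"

    with store.fetch_context("details"):
        for item in result:
            # Each Item materialized here counts against "listing", the
            # context the result set was produced in, not "details".
            process(item)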

All interesting analysis and optimization work will be done outside the performance-critical path of query execution. Profiling costs should be minimal, limited to simple dict lookups and counter increments.

Jeroen

Robert Collins (lifeless) wrote:

Hi Jeroen, thanks for pointing me at this.

I think this is a very interesting project. I think the stats will be useful for manual optimisation in the short term in Launchpad.

As far as the autotuning goes, long term the JIT-VM-style learn-and-improve approach doesn't interest me for Launchpad: https://dev.launchpad.net/LEP/PersistenceLayer contains my broad plans for addressing systematic performance issues in Launchpad. I think a JIT-VM auto-tuning layer would be a fascinating project, but the warm-up time in many JITs can be substantial, is only ameliorated by loops running hundreds or thousands of times, and still at best only approaches the efficiency available by writing in a more efficient language. Hence my interest in providing a more efficient DSL than Storm's bound-object approach. I'd love to see Storm become radically simpler in aid of that: faster short circuits in object caching, or optionally no object caching at all. Constrained and parameterised references would be awesome too.

Cheers,
Rob

Jeroen T. Vermeulen (jtv) wrote:

I'd be careful about assuming similarity with a JIT VM. Differences in relative overhead aside, JIT compilers don't specialize method calls much. There is a JVM that combines JIT with inlining of all code, and from what I hear that yields fantastic results. A lot of the startup overhead would be in the inlining, which doesn't come up per se in fetch profiling.

As an example of specialization: if Launchpad used automatic optimization based on this profiler, a given query method somewhere deep down the call stack would get optimized separately for use from the web service API and for use from the web UI. The two calls have very different needs, and we don't have any decent solution for that at the moment.

If you're willing to be aggressive enough in fetching objects to do it "statically," then you might as well use automatic optimization with a warm-up time of 1: generate optimization advice after a first pass through a stretch of code, then repeat periodically to cover any objects that may also be needed but weren't referenced in that first run. To amortize the startup cost over more requests, pickle the optimization choices and presto: profile-driven optimizations get reused across restarts.

Separately from that, the term "efficient" is treacherous. A static approach is almost guaranteed to be more efficient in terms of computing power, yes, but less efficient when it comes to the human factors: flexibility, legibility, conceptual cleanliness. (Isn't that why we're using Python in the first place?) A dynamic approach, on the other hand, can narrow the gap in computational efficiency without sacrificing any of those other efficiencies. It also makes it easier to deploy and fine-tune optimizations across the entire application.

Jeroen

Unmerged revisions

422. By Jeroen T. Vermeulen

Documentation; made is_root a @property.

421. By Jeroen T. Vermeulen

Cosmetic.

420. By Jeroen T. Vermeulen

Record origin, source, and reference; ignore cross-store dependencies.

419. By Jeroen T. Vermeulen

Don't support derived_from without a known reference.

418. By Jeroen T. Vermeulen

Aggregate stats by context name; nicer cumulate API.

417. By Jeroen T. Vermeulen

Context iteration.

416. By Jeroen T. Vermeulen

Test add_to_dict separately.

415. By Jeroen T. Vermeulen

Move profiling functions out of Store.

414. By Jeroen T. Vermeulen

Context manager.

413. By Jeroen T. Vermeulen

Track derived fetches across stores.
