Merge lp://staging/~iffy/storm/lazycolumns into lp://staging/storm

Proposed by Matt Haggard
Status: Needs review
Proposed branch: lp://staging/~iffy/storm/lazycolumns
Merge into: lp://staging/storm
Diff against target: 516 lines (+203/-44)
10 files modified
TODO (+0/-26)
storm/expr.py (+8/-2)
storm/info.py (+22/-0)
storm/properties.py (+20/-6)
storm/store.py (+37/-8)
tests/info.py (+35/-1)
tests/store/base.py (+70/-1)
tests/store/mysql.py (+5/-0)
tests/store/postgres.py (+3/-0)
tests/store/sqlite.py (+3/-0)
To merge this branch: bzr merge lp://staging/~iffy/storm/lazycolumns
Reviewer Review Type Date Requested Status
Storm Developers Pending
Review via email: mp+87092@code.staging.launchpad.net

Description of the change

Implemented the lazy-loading feature as described in the TODO file. Instead of having separate "lazy" and "lazy_group" arguments, however, I opted for just requiring "lazy" which can be one of the following:
 - True: The column should be lazy all by itself.
 - True-ish values: The column should be grouped with all other columns using this same value. If any of them are accessed, all of them are loaded.

Rationale: We need this to speed up access of our bad tables with huge text fields that are rarely used.

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote :

I'm glad you are working on this.

I have not done a full review at this point; I'd like to clarify
something though - it looks like you are making it so that all queries
will assume these columns are lazy. It is more efficient to grab the
column if needed in a single query rather than one-per-object. It
would be nice if users had the ability to control that.

For instance, say you have a changelog field in a table, which is a
big text field. Most pages in the website may not need it, but some
do. If its marked lazy, the pages that do need it will end up doing
(number of changelogs shown) separate queries to lazy-load the
changelog. If there is someway to say 'for this query, the changelog
field is needed', then that overhead can be eliminated.

And, if we have such a way to force laziness-into-eagerness, we
probably can do the reverse trivially - force eagerness into laziness
to prevent a bunch of fields being loaded on a query by query basis.

What do you think?

-Rob

Revision history for this message
Matt Haggard (iffy) wrote :

Rob,

I know what you mean. I'm working on this to fix a problem we currently have with a big table (126 fields). 90% of the time, we need about 4 fields; 10% of the time we need some combination of the others.

My current patch allows you to define column laziness globally, which improves 90% of our code. What you're suggesting is that in addition to (instead of?) defining laziness globally, you can override laziness when using a store.find, store.get, Reference or ReferenceSet? This would allow one to avoid double queries for the other 10%.

What would the syntax look like?

store.find(Foo, eager=(Foo.name, Foo.kind))
bar = Reference(bar_id, 'Bar.id', eager=('Bar.name', 'Bar.kind'))

Or would you do some kind of state thing (this tastes bad)?

Bar.eagers.push('name', 'kind')
store.find(Bar)
Bar.eagers.pop()

Maybe this is better:

store.eager_load(Bar.name, Bar.kind)
store.find(Bar)
...
store.lazy_load(Bar.name, Bar.kind)

Also, is laziness overriding (your suggestion) something that should be built in from the beginning, or can it be added as a new feature after global laziness (my patch) is added? Maybe the answer depends on the implementation.

Thanks,

Matt

Revision history for this message
Jamu Kakar (jkakar) wrote :

There are some notes in the TODO file that you might draw inspiration
from.

Revision history for this message
Matt Haggard (iffy) wrote :

> There are some notes in the TODO file that you might draw inspiration
> from.

Yes, those are great notes! This branch implements laziness as described there.

Revision history for this message
Robert Collins (lifeless) wrote :

On Wed, Jan 4, 2012 at 5:33 AM, Matt Haggard <email address hidden> wrote:
> Rob,
> Also, is laziness overriding (your suggestion) something that should be built in from the beginning, or can it be added as a new feature after global laziness (my patch) is added?  Maybe the answer depends on the implementation.

I think adding it later is fine, but we should consider what it needs
early, so that we don't conflict with the global laziness thing, which
will itself be useful.

As for syntax, perhaps
# do not load attribute1 or attribute2, and load attribute3 which is
globally set to lazy
store.find(Load(Foo, lazy=['attribute1', 'attribute2'], eager=['attribute3'])) ?

Revision history for this message
Matt Haggard (iffy) wrote :

So what's the next step on this?

Revision history for this message
Robert Collins (lifeless) wrote :

On Fri, Jan 6, 2012 at 11:19 AM, Matt Haggard <email address hidden> wrote:
> So what's the next step on this?

Well, I think for you to spend a little time coming up with an answer to
'does the patch need to change to gracefully permit per-field laziness
in future'

if the answer is no, then we review and land; if it is yes, you make
such changes as you consider necessary to future proof, then we review
and land.

-Rob

Revision history for this message
Matt Haggard (iffy) wrote :

Rob,

No, this patch doesn't need to change to support per-query laziness in the future. Per-query laziness would use some of the stuff in this patch as it is. The overlap point is probably the LazyGroupMember class, which would probably be changed to Unloaded, NotLoadedYet, NotFetchedYet or Unfetched or something like that. But that's used internally and not in the public API.

Thanks for your help on this!

Matt

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Thanks a lot for pushing this forward.

Given the question on how the syntax should look like, I'm concerned the
spirit described in the notes isn't being implemented. It doesn't seem
practical to define per field granularity for what should be loaded.

I'll have a look at the branch, hopefully next week.
On Jan 3, 2012 2:33 PM, "Matt Haggard" <email address hidden> wrote:

> Rob,
>
> I know what you mean. I'm working on this to fix a problem we currently
> have with a big table (126 fields). 90% of the time, we need about 4
> fields; 10% of the time we need some combination of the others.
>
> My current patch allows you to define column laziness globally, which
> improves 90% of our code. What you're suggesting is that in addition to
> (instead of?) defining laziness globally, you can override laziness when
> using a store.find, store.get, Reference or ReferenceSet? This would allow
> one to avoid double queries for the other 10%.
>
> What would the syntax look like?
>
> store.find(Foo, eager=(Foo.name, Foo.kind))
> bar = Reference(bar_id, 'Bar.id', eager=('Bar.name', 'Bar.kind'))
>
> Or would you do some kind of state thing (this tastes bad)?
>
> Bar.eagers.push('name', 'kind')
> store.find(Bar)
> Bar.eagers.pop()
>
> Maybe this is better:
>
> store.eager_load(Bar.name, Bar.kind)
> store.find(Bar)
> ...
> store.lazy_load(Bar.name, Bar.kind)
>
> Also, is laziness overriding (your suggestion) something that should be
> built in from the beginning, or can it be added as a new feature after
> global laziness (my patch) is added? Maybe the answer depends on the
> implementation.
>
>
> Thanks,
>
> Matt
> --
> https://code.launchpad.net/~magmatt/storm/lazycolumns/+merge/87092
> You are subscribed to branch lp:storm.
>

Revision history for this message
Matt Haggard (iffy) wrote :

Gustavo,

I'm excited for your review.

To clarify, this branch implements laziness as described in the TODO. Rob mentioned that it would be nice to sometimes override that behavior per query. That (overriding per query) is not implemented here. So the syntax closely mirrors the TODO.

A single, lazy attr:

    class C(object):
        ...
        attr = Unicode(lazy=True)

or a group of lazy attrs (loaded together if either is accessed):

    class C(object):
        ...
        attr = Unicode(lazy=2)
        attr2 = Unicode(lazy=2)

    class D(object):
        ...
        attr = Int(lazy='group 1')
        attr2 = Int(lazy='group 2')

^ Is the syntax supported by this branch. I opted for a single "lazy" arg instead of having a "lazy" and "lazy_group." It seems simpler to me.

Unmerged revisions

431. By Matt Haggard

Added docstring for lazy attribute in expr.Column and removed laziness from the
TODO file

430. By Matt Haggard

Lazy-loaded columns will not overwrite a previously set value.

429. By Matt Haggard

Finished lazy-loading of columns including some Store tests.

428. By Matt Haggard

Added lazy keyword as __init__ to Property and changed ClassInfo to
gather appropriate laziness information from that.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches

to status/vote changes: