Storm

Merge lp://staging/~iffy/storm/lazycolumns into lp://staging/storm

lazycolumns
Merge into trunk

Proposed by Matt Haggard on 2011-12-29

Status:

Needs review

Proposed branch:

lp://staging/~iffy/storm/lazycolumns

Merge into:

lp://staging/storm

Diff against target:

516 lines (+203/-44)

10 files modified

TODO (+0/-26)
storm/expr.py (+8/-2)
storm/info.py (+22/-0)
storm/properties.py (+20/-6)
storm/store.py (+37/-8)
tests/info.py (+35/-1)
tests/store/base.py (+70/-1)
tests/store/mysql.py (+5/-0)
tests/store/postgres.py (+3/-0)
tests/store/sqlite.py (+3/-0)

To merge this branch:

bzr merge lp://staging/~iffy/storm/lazycolumns

Undecided

New

Link a bug report

Reviewer	Review Type	Date Requested	Status
Storm Developers		2011-12-29	Pending
Review via email: mp+87092@code.staging.launchpad.net

Description of the change

Implemented the lazy-loading feature as described in the TODO file. Instead of having separate "lazy" and "lazy_group" arguments, however, I opted for just requiring "lazy" which can be one of the following:
- True: The column should be lazy all by itself.
- True-ish values: The column should be grouped with all other columns using this same value. If any of them are accessed, all of them are loaded.

Rationale: We need this to speed up access of our bad tables with huge text fields that are rarely used.

Revision history for this message

Robert Collins (lifeless) wrote on 2011-12-31:

I'm glad you are working on this.

I have not done a full review at this point; I'd like to clarify
something though - it looks like you are making it so that all queries
will assume these columns are lazy. It is more efficient to grab the
column if needed in a single query rather than one-per-object. It
would be nice if users had the ability to control that.

For instance, say you have a changelog field in a table, which is a
big text field. Most pages in the website may not need it, but some
do. If its marked lazy, the pages that do need it will end up doing
(number of changelogs shown) separate queries to lazy-load the
changelog. If there is someway to say 'for this query, the changelog
field is needed', then that overhead can be eliminated.

And, if we have such a way to force laziness-into-eagerness, we
probably can do the reverse trivially - force eagerness into laziness
to prevent a bunch of fields being loaded on a query by query basis.

What do you think?

-Rob

Revision history for this message

Matt Haggard (iffy) wrote on 2012-01-03:

Rob,

I know what you mean. I'm working on this to fix a problem we currently have with a big table (126 fields). 90% of the time, we need about 4 fields; 10% of the time we need some combination of the others.

My current patch allows you to define column laziness globally, which improves 90% of our code. What you're suggesting is that in addition to (instead of?) defining laziness globally, you can override laziness when using a store.find, store.get, Reference or ReferenceSet? This would allow one to avoid double queries for the other 10%.

What would the syntax look like?

store.find(Foo, eager=(Foo.name, Foo.kind))
bar = Reference(bar_id, 'Bar.id', eager=('Bar.name', 'Bar.kind'))

Or would you do some kind of state thing (this tastes bad)?

Bar.eagers.push('name', 'kind')
store.find(Bar)
Bar.eagers.pop()

Maybe this is better:

store.eager_load(Bar.name, Bar.kind)
store.find(Bar)
...
store.lazy_load(Bar.name, Bar.kind)

Also, is laziness overriding (your suggestion) something that should be built in from the beginning, or can it be added as a new feature after global laziness (my patch) is added? Maybe the answer depends on the implementation.

Thanks,

Matt

Revision history for this message

Jamu Kakar (jkakar) wrote on 2012-01-03:

There are some notes in the TODO file that you might draw inspiration
from.

Revision history for this message

Matt Haggard (iffy) wrote on 2012-01-03:

> There are some notes in the TODO file that you might draw inspiration
> from.

Yes, those are great notes! This branch implements laziness as described there.

Revision history for this message

Robert Collins (lifeless) wrote on 2012-01-04:

On Wed, Jan 4, 2012 at 5:33 AM, Matt Haggard <email address hidden> wrote:
> Rob,
> Also, is laziness overriding (your suggestion) something that should be built in from the beginning, or can it be added as a new feature after global laziness (my patch) is added? Maybe the answer depends on the implementation.

I think adding it later is fine, but we should consider what it needs
early, so that we don't conflict with the global laziness thing, which
will itself be useful.

As for syntax, perhaps
# do not load attribute1 or attribute2, and load attribute3 which is
globally set to lazy
store.find(Load(Foo, lazy=['attribute1', 'attribute2'], eager=['attribute3'])) ?

Revision history for this message

Matt Haggard (iffy) wrote on 2012-01-05:

So what's the next step on this?

Revision history for this message

Robert Collins (lifeless) wrote on 2012-01-06:

On Fri, Jan 6, 2012 at 11:19 AM, Matt Haggard <email address hidden> wrote:
> So what's the next step on this?

Well, I think for you to spend a little time coming up with an answer to
'does the patch need to change to gracefully permit per-field laziness
in future'

if the answer is no, then we review and land; if it is yes, you make
such changes as you consider necessary to future proof, then we review
and land.

-Rob

Revision history for this message

Matt Haggard (iffy) wrote on 2012-01-07:

Rob,

No, this patch doesn't need to change to support per-query laziness in the future. Per-query laziness would use some of the stuff in this patch as it is. The overlap point is probably the LazyGroupMember class, which would probably be changed to Unloaded, NotLoadedYet, NotFetchedYet or Unfetched or something like that. But that's used internally and not in the public API.

Thanks for your help on this!

Matt

Revision history for this message

Gustavo Niemeyer (niemeyer) wrote on 2012-01-07:

Thanks a lot for pushing this forward.

Given the question on how the syntax should look like, I'm concerned the
spirit described in the notes isn't being implemented. It doesn't seem
practical to define per field granularity for what should be loaded.

I'll have a look at the branch, hopefully next week.
On Jan 3, 2012 2:33 PM, "Matt Haggard" <email address hidden> wrote:

> Rob,
>
> I know what you mean. I'm working on this to fix a problem we currently
> have with a big table (126 fields). 90% of the time, we need about 4
> fields; 10% of the time we need some combination of the others.
>
> My current patch allows you to define column laziness globally, which
> improves 90% of our code. What you're suggesting is that in addition to
> (instead of?) defining laziness globally, you can override laziness when
> using a store.find, store.get, Reference or ReferenceSet? This would allow
> one to avoid double queries for the other 10%.
>
> What would the syntax look like?
>
> store.find(Foo, eager=(Foo.name, Foo.kind))
> bar = Reference(bar_id, 'Bar.id', eager=('Bar.name', 'Bar.kind'))
>
> Or would you do some kind of state thing (this tastes bad)?
>
> Bar.eagers.push('name', 'kind')
> store.find(Bar)
> Bar.eagers.pop()
>
> Maybe this is better:
>
> store.eager_load(Bar.name, Bar.kind)
> store.find(Bar)
> ...
> store.lazy_load(Bar.name, Bar.kind)
>
> Also, is laziness overriding (your suggestion) something that should be
> built in from the beginning, or can it be added as a new feature after
> global laziness (my patch) is added? Maybe the answer depends on the
> implementation.
>
>
> Thanks,
>
> Matt
> --
> https://code.launchpad.net/~magmatt/storm/lazycolumns/+merge/87092
> You are subscribed to branch lp:storm.
>

Revision history for this message

Matt Haggard (iffy) wrote on 2012-01-13:

Gustavo,

I'm excited for your review.

To clarify, this branch implements laziness as described in the TODO. Rob mentioned that it would be nice to sometimes override that behavior per query. That (overriding per query) is not implemented here. So the syntax closely mirrors the TODO.

A single, lazy attr:

    class C(object):
        ...
        attr = Unicode(lazy=True)

or a group of lazy attrs (loaded together if either is accessed):

    class C(object):
        ...
        attr = Unicode(lazy=2)
        attr2 = Unicode(lazy=2)

    class D(object):
        ...
        attr = Int(lazy='group 1')
        attr2 = Int(lazy='group 2')

^ Is the syntax supported by this branch. I opted for a single "lazy" arg instead of having a "lazy" and "lazy_group." It seems simpler to me.

Unmerged revisions

431. By Matt Haggard on 2011-12-29: Added docstring for lazy attribute in expr.Column and removed laziness from the
TODO file
430. By Matt Haggard on 2011-12-29: Lazy-loaded columns will not overwrite a previously set value.
429. By Matt Haggard on 2011-12-29: Finished lazy-loading of columns including some Store tests.
428. By Matt Haggard on 2011-12-28: Added lazy keyword as __init__ to Property and changed ClassInfo to
gather appropriate laziness information from that.