Merge lp://staging/~adeuring/launchpad/bug-594247 into lp://staging/launchpad

Proposed by Abel Deuring
Status: Merged
Approved by: Gavin Panella
Approved revision: no longer in the source branch.
Merged at revision: 11933
Proposed branch: lp://staging/~adeuring/launchpad/bug-594247
Merge into: lp://staging/launchpad
Diff against target: 495 lines (+233/-121)
2 files modified
lib/lp/bugs/model/bugtask.py (+166/-117)
lib/lp/bugs/tests/test_bugtask_search.py (+67/-4)
To merge this branch: bzr merge lp://staging/~adeuring/launchpad/bug-594247
Reviewer Review Type Date Requested Status
Gavin Panella (community) code Approve
Graham Binns (community) code Approve
Review via email: mp+40713@code.staging.launchpad.net

Commit message

[r=allenap,gmb][ui=none][bug=594247] use JOIN and LEFT JOIN to link tables needed to search for bugtask releated to a structural subscriber; refactoring of BugTaskSet.search() and BugTaskSet.serachBugIds(); more tests for BugTaskSet.search()

Description of the change

This branch fixes bug 594247: searchTasks with structural_subscriber
times out regularly.

The fix uses stub's suggestion from a comment in the bug report
to (LEFT) JOIN the tables Product and StructuralSubscription and
to define the filter clause "StructuralSubscriber=somebody"
directly in the main query.

This required some larger modification of the method
BugTaskSet.buildQuery(): Up to now, it returned four variables,
query, clauseTables, orderby_arg, decorator. query contains a string
with the WHERE clause, clauseTables contains a list of tables
used in the WHERE clause.

Simply adding Storm Join instances for

  LEFT OUTER JOIN Product ON BugTask.product=Product.id

and

  JOIN StructuralSubscription ON (some_longer_expression)

to clauseTables has at least one problem: The method BugTaskSet.search()
optionally joins some tables, including Product, in order to pre-load
some objects that will be needed later to process a request.

So it could happen that the table Product is JOINed twice: The first time
in clauseTables because it is needed in the WHERE expression for
structural subscription, the second time to pre-load Storm Product
instances.

Hence I did not add the Storm Join instances to clauseTables; instead,
BugTaskSet.buildQuery() now returns another parameter join_tables,
which contains a sequence of tuples (storm_table, join_expression).

A new method BugTaskSet.buildOrigin() now processes join_expression and
clauseTables together with a sequence of Join instances needed to
pre-load Product, Bug, SourcePackageName instances, so that each
table appears exactly once in the WHERE clause of the SQL query.

The new way to find bugtasks related to a structural subscription
can return bug tasks twice, for example, if somebody has
a structural subscription to a distribution and to a distro source
package. So BugTaskSet.buildQuery() now returns one more parameter,
has_duplicate_results.

BugTaskSet.buildQuery() was called by BugTaskSet.search() and
by BugTaskSet.searchBugIds(). Processing the data returned by
buildQuery() became a bit more complicated, and BugTaskSet.search()
could already create two variants of the SQL query: one to retrieve
only BugTask instances and another to retrieve related Product,
Bug, SourcePackageName instances. We can consider searchBugIds()
to be variant of search() which just defines another type of data to
retrieve, so the SQL query is now built in a new method _search()
which is called by search() and by searchBugIds().

This required another tweak: BugTaskSet.buildQuery() returns a
decorator for the result set whichs avoids later SQL queries
ensuring that the current user is allowed to view a bug. (see
lp.bugs.model.bugtask.get_bug_privacy_filter_with_decorator())

This decorator assumes that the Storm query returns BugTask
instances, which is not the case for BugTaskSet.searchBugIds().
So BugTaskSet._search() has an optional parameter "undecorated".
If it is True, _search() returns a plain Storm result set,
otherwise, it returns a DecoratedResultSet.

Treatment of queries which may have duplicate result rows
("if has_duplicate_result_rows:" in _search()): We cannot simply
call resultset.config(distinct=True) because we may get two rows
for one bugtask if somebody is subscribed to a distribution and
to a distro source package and if we preload SourcePackageName.

Using a DISTINCT ON BugTask.id query would require sorting by
BugTask.id, so I simply used a sub-query.

The part of BugTaskSet.search() (now in _search()) which handles
queries with more that one BugTaskSearchParams instance now
preloads/prejoins Bug, Product etc only if _noprejoins is False.

Finally, I added some more tests to test_bugtask_search.py

no lint.

test: ./bin/test -vvt test_bugtask_search

It might make sense to change BugTaskSet.getAssignedMilestonesFromSearch()
in a follow-up brnach so that it calls _search() directly,
instead of calling search(), extracting milestone IDs
from the result set and running another SQL query to retrieve
these milestones.

To post a comment you must log in.
Revision history for this message
Graham Binns (gmb) wrote :

Hi Abel,

As we discussed on IRC, I think you should get a review from someone more SQL / Storm savvy than I, since my brain seems to just skip off those parts of the code.

I'm happy with the rest of the diff, with just two comments:

> 94 + # Circular.

Please expand this to something less terse ;). ("Prevent circular
import problems." or something similar.)

> 319 + # Circular.

Here too.

review: Approve (code)
Revision history for this message
Gavin Panella (allenap) wrote :
Download full text (5.8 KiB)

Cool :) Lots of comments, but none of them are very important.

[1]

+ Or(BugTask.sourcepackagenameID ==
+ StructuralSubscription.sourcepackagenameID,
+ StructuralSubscription.sourcepackagename == None)))

Indentation is a little confusing there.

[2]

+ join_clause = (
+ BugTask.productID == StructuralSubscription.productID)
+ join_clause = Or(
+ join_clause,
+ (BugTask.productseriesID ==
+ StructuralSubscription.productseriesID))
+ # Circular.
+ from lp.registry.model.product import Product
+ join_clause = Or(
+ join_clause,
+ And(Product.projectID == StructuralSubscription.projectID,
+ BugTask.product == Product.id))
+ join_clause = Or(
+ join_clause,
+ And((BugTask.distributionID ==
+ StructuralSubscription.distributionID),
+ Or(BugTask.sourcepackagenameID ==
+ StructuralSubscription.sourcepackagenameID,
+ StructuralSubscription.sourcepackagename == None)))
+ join_clause = Or(
+ join_clause,
+ (BugTask.distroseriesID ==
+ StructuralSubscription.distroseriesID))
+ join_clause = Or(
+ join_clause,
+ BugTask.milestoneID == StructuralSubscription.milestoneID)

I had trouble reading this, so I refactored it to use more descriptive
names. If you like what I've done then I think it would be a good
change to make:

    ssub_match_product = (
        BugTask.productID ==
        StructuralSubscription.productID)
    ssub_match_productseries = (
        BugTask.productseriesID ==
        StructuralSubscription.productseriesID)
    ssub_match_project = And(
        Product.projectID ==
        StructuralSubscription.projectID,
        BugTask.product == Product.id)
    ssub_match_distribution = (
        BugTask.distributionID ==
        StructuralSubscription.distributionID)
    ssub_match_sourcepackagename = (
        BugTask.sourcepackagenameID ==
        StructuralSubscription.sourcepackagenameID)
    ssub_match_null_sourcepackagename = (
        StructuralSubscription.sourcepackagename == None)
    ssub_match_distribution_with_optional_package = And(
        ssub_match_distribution, Or(
            ssub_match_sourcepackagename,
            ssub_match_null_sourcepackagename))
    ssub_match_distribution_series = (
        BugTask.distroseriesID ==
        StructuralSubscription.distroseriesID)
    ssub_match_milestone = (
        BugTask.milestoneID ==
        StructuralSubscription.milestoneID)

    join_clause = Or(
        ssub_match_product,
        ssub_match_productseries,
        ssub_match_project,
        ssub_match_distribution_with_optional_package,
        ssub_match_distribution_series,
        ssub_match_milestone)

[3]

+ query, clauseTables, orderby_arg, decorator, join_tables,

Just because everything else seems to not be camel-case any more,
consider doing s/clauseTables/clause_...

Read more...

review: Approve (code)
Revision history for this message
Abel Deuring (adeuring) wrote :
Download full text (7.7 KiB)

Hi Gavin,

thanks for your thorough review!

On 12.11.2010 17:29, Gavin Panella wrote:
> Review: Approve code
> Cool :) Lots of comments, but none of them are very important.
>
>
> [1]
>
> + Or(BugTask.sourcepackagenameID ==
> + StructuralSubscription.sourcepackagenameID,
> + StructuralSubscription.sourcepackagename == None)))
>
> Indentation is a little confusing there.

yeah... But your proposal below to make the huge join clause more
readable fixes it already :)

>
>
> [2]
>
> + join_clause = (
> + BugTask.productID == StructuralSubscription.productID)
> + join_clause = Or(
> + join_clause,
> + (BugTask.productseriesID ==
> + StructuralSubscription.productseriesID))
> + # Circular.
> + from lp.registry.model.product import Product
> + join_clause = Or(
> + join_clause,
> + And(Product.projectID == StructuralSubscription.projectID,
> + BugTask.product == Product.id))
> + join_clause = Or(
> + join_clause,
> + And((BugTask.distributionID ==
> + StructuralSubscription.distributionID),
> + Or(BugTask.sourcepackagenameID ==
> + StructuralSubscription.sourcepackagenameID,
> + StructuralSubscription.sourcepackagename == None)))
> + join_clause = Or(
> + join_clause,
> + (BugTask.distroseriesID ==
> + StructuralSubscription.distroseriesID))
> + join_clause = Or(
> + join_clause,
> + BugTask.milestoneID == StructuralSubscription.milestoneID)
>
> I had trouble reading this, so I refactored it to use more descriptive
> names. If you like what I've done then I think it would be a good
> change to make:
>
> ssub_match_product = (
> BugTask.productID ==
> StructuralSubscription.productID)
> ssub_match_productseries = (
> BugTask.productseriesID ==
> StructuralSubscription.productseriesID)
> ssub_match_project = And(
> Product.projectID ==
> StructuralSubscription.projectID,
> BugTask.product == Product.id)
> ssub_match_distribution = (
> BugTask.distributionID ==
> StructuralSubscription.distributionID)
> ssub_match_sourcepackagename = (
> BugTask.sourcepackagenameID ==
> StructuralSubscription.sourcepackagenameID)
> ssub_match_null_sourcepackagename = (
> StructuralSubscription.sourcepackagename == None)
> ssub_match_distribution_with_optional_package = And(
> ssub_match_distribution, Or(
> ssub_match_sourcepackagename,
> ssub_match_null_sourcepackagename))
> ssub_match_distribution_series = (
> BugTask.distroseriesID ==
> StructuralSubscription.distroseriesID)
> ssub_match_milestone = (
> BugTask.milestoneID ==
> StructuralSubscription.milestoneID)
>
> join_clause = Or(
> ssub_match_product,
> ...

Read more...

Revision history for this message
Robert Collins (lifeless) wrote :

"
This decorator assumes that the Storm query returns BugTask
instances, which is not the case for BugTaskSet.searchBugIds().
So BugTaskSet._search() has an optional parameter "undecorated".
If it is True, _search() returns a plain Storm result set,
otherwise, it returns a DecoratedResultSet."

I have a thought here: the decorator is meant to be opaque. So why not
just return an appropriate one for the query being constructed?

I guess I'm saying that 'undecorated' is the what, not the why, and
some other approach might be clearer.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.