Merge lp://staging/~gary/launchpad/bug723999 into lp://staging/launchpad

Proposed by Gary Poster
Status: Merged
Approved by: Gary Poster
Approved revision: no longer in the source branch.
Merged at revision: 12526
Proposed branch: lp://staging/~gary/launchpad/bug723999
Merge into: lp://staging/launchpad
Diff against target: 1058 lines (+474/-351)
5 files modified
lib/lp/bugs/model/bug.py (+15/-37)
lib/lp/bugs/model/bugtask.py (+14/-69)
lib/lp/bugs/model/structuralsubscription.py (+418/-237)
lib/lp/bugs/model/tests/test_bugtask.py (+5/-3)
lib/lp/bugs/tests/test_structuralsubscriptiontarget.py (+22/-5)
To merge this branch: bzr merge lp://staging/~gary/launchpad/bug723999
Reviewer Review Type Date Requested Status
j.c.sackett (community) Approve
Review via email: mp+52133@code.staging.launchpad.net

Commit message

[r=jcsackett][bug=723999] Reduce the SQL size and expense of calculating the structural subscribers for a given change.

Description of the change

This branch makes significant changes to how we calculate the structural subscribers for a given notification. The previous set-based approach was elegant and may have had similar or even slightly better performance characteristics in some simple cases; however it created gigantic SQL queries and performed poorly when there were a few bugtasks for a bug, and filters for every subscription.

This branch takes a more classic approach to building a SQL query. In tests, this brought a query that was taking 1.2-1.4 seconds to taking 300 milliseconds or less.

It also significantly reduced the SQL needed. SQL is 2/3 size in the simplest case with a single bugtask and no tags; 1/2 size in a more complex case with a single bugtask with tags. Once you have two tasks and on a bug with tags, the new approach's SQL is 1/3 the size, and as you add bugtasks, the ratio gets smaller and smaller.

It's worth noting that Storm support for the WITH statement would really help things. Then I would do one WITH table for the first query, and in the case of the UNION for the tags, I might do another WITH query for the shared bits. As it is, I'm trying to compromise between the cost of duplicating work in the database and the cost of making a separate Storm query. The WITH statement support would also shorten the size of the SQL. I'll be tempted to try and add that to Storm later, now that Postgres supports it.

I ended up having to use some somewhat esoteric SQL. In addition to having a pre-implementation call with Danilo about this, I also ran it past him after I was finished. He seemed pleased with explanations I gave verbally. I have copiously commented the core function, so I hope that you will be able to follow along there as well.

This is a big branch--noticeably bigger than our 800 line goal. "make lint" forced some of the size on me--it's happy, but at a cost. However, even before linting, I was still over the limit. I tried to keep it as small as possible, and have left out some clean-ups that need to happen. In particular, I tried to make minimal changes to the tests, so some methods that no longer serve any purpose in the codebase are left because they are exercised by valuable tests. I will follow up with one or more later branches that clean these things up and move tests around appropriately.

Danilo and I agreed that we might want to remove the features of "include_any_tags" (the filter only accepts any bug that has one or more tag) and "exclude_any_tags" (the filter only accepts bugs that have no tags); and we felt that it might be worth reconsidering how the excluded tags work. However, I kept those out of this branch, as out of scope.

That said, I did make one semantic change to an edge case. Before, NOT find_all_tags meant (in part) that excluded tags (==NOT include) must *all* match. I thought that was opposite what I would expect (and in fact initially implemented the other approach without realizing I was contradicting the tests). After my changes, NOT find_all_tags means that *any* excluded tag must match. For example, if a bug has the tags "foo" and "bar", a filter with find_all_tags = False and "-foo" (NOT include "foo") will now match because "bar" is not "foo". (If find_all_tags = True, the filter will not match. If find_all_tags = False and the filters tags are "-foo" and "-bar", it will not match). Again, I might like to reconsider the broader approach to this feature in the future, but this is not the branch, or the time for that.

I won't say more here, in the hopes that the comments will guide you.

Thank you for looking at this over-sized branch.

To post a comment you must log in.
Revision history for this message
j.c.sackett (jcsackett) wrote :
Download full text (32.4 KiB)

Gary--

This all largely looks good; I've got a couple of comments/questions in the diff below.

> === modified file 'lib/lp/bugs/model/bugtask.py'
> --- lib/lp/bugs/model/bugtask.py 2011-02-27 19:45:44 +0000
> +++ lib/lp/bugs/model/bugtask.py 2011-03-03 21:04:20 +0000
>
 <snip>
>
 For some reason, having one method just wrap a function from another module bugs me. I see no cleaner way of doing this though, so consider this useless kvetching.
>
> === modified file 'lib/lp/bugs/model/structuralsubscription.py'
> --- lib/lp/bugs/model/structuralsubscription.py 2011-02-21 15:55:48 +0000
> +++ lib/lp/bugs/model/structuralsubscription.py 2011-03-03 21:04:20 +0000
> @@ -471,243 +478,412 @@
>
> def getSubscriptionsForBugTask(self, bugtask, level):
> """See `IStructuralSubscriptionTarget`."""
> - set_builder = BugFilterSetBuilder(
> - bugtask, level, self.__helper.join)
> - return Store.of(self.__helper.pillar).find(
> - StructuralSubscription, In(
> - StructuralSubscription.id,
> - set_builder.subscriptions))
> + # Note that this method does not take into account
> + # structural subscriptions without filters. Since it is only
> + # used for tests at this point, that's not a problem; moreover,
> + # we intend all structural subscriptions to have filters.
> + candidates, filter_id_query = (
> + _get_structural_subscription_filter_id_query(
> + bugtask.bug, [bugtask], level))
> + if not candidates:
> + return EmptyResultSet()
> + return IStore(StructuralSubscription).find(
> + StructuralSubscription,
> + BugSubscriptionFilter.structural_subscription_id ==
> + StructuralSubscription.id,
> + In(BugSubscriptionFilter.id,
> + filter_id_query)).config(distinct=True)
> +
> +
> +def get_structural_subscription_targets(bugtasks):
> + """Return (bugtask, target) pairs for each target of the bugtasks.
> +
> + Each bugtask may be responsible theoretically for 0 or more targets.
> + In practice, each generates one, two or three.
> + """
> + for bugtask in bugtasks:
> + if IStructuralSubscriptionTarget.providedBy(bugtask.target):
> + yield (bugtask, bugtask.target)
> + if bugtask.target.parent_subscription_target is not None:
> + yield (bugtask, bugtask.target.parent_subscription_target)
> + if ISourcePackage.providedBy(bugtask.target):
> + # Distribution series bug tasks with a package have the source
> + # package set as their target, so we add the distroseries
> + # explicitly to the set of subscription targets.
> + yield (bugtask, bugtask.distroseries)
> + if bugtask.milestone is not None:
> + yield (bugtask, bugtask.milestone)
>
 So, if this is only one to three returns per bugtask, is there a combination of yields here that can't happen?
>
> +def _get_all_structural_subscriptions(find, targets, *conditions):
> + """Find the structural subscriptions for the given targets.
> +
> + "find" should be w...

review: Needs Information
Revision history for this message
Gary Poster (gary) wrote :
Download full text (36.2 KiB)

On Mar 3, 2011, at 6:14 PM, j.c.sackett wrote:

> Review: Needs Information
> Gary--
>
> This all largely looks good; I've got a couple of comments/questions in the diff below.
>
>
>> === modified file 'lib/lp/bugs/model/bugtask.py'
>> --- lib/lp/bugs/model/bugtask.py 2011-02-27 19:45:44 +0000
>> +++ lib/lp/bugs/model/bugtask.py 2011-03-03 21:04:20 +0000
>>
> <snip>
>>
> For some reason, having one method just wrap a function from another module bugs me. I see no cleaner way of doing this though, so consider this useless kvetching.

I completely agree, on both counts.

I have an idea on how to make it better for the second branch, but no guarantees.

>>
>> === modified file 'lib/lp/bugs/model/structuralsubscription.py'
>> --- lib/lp/bugs/model/structuralsubscription.py 2011-02-21 15:55:48 +0000
>> +++ lib/lp/bugs/model/structuralsubscription.py 2011-03-03 21:04:20 +0000
>> @@ -471,243 +478,412 @@
>>
>> def getSubscriptionsForBugTask(self, bugtask, level):
>> """See `IStructuralSubscriptionTarget`."""
>> - set_builder = BugFilterSetBuilder(
>> - bugtask, level, self.__helper.join)
>> - return Store.of(self.__helper.pillar).find(
>> - StructuralSubscription, In(
>> - StructuralSubscription.id,
>> - set_builder.subscriptions))
>> + # Note that this method does not take into account
>> + # structural subscriptions without filters. Since it is only
>> + # used for tests at this point, that's not a problem; moreover,
>> + # we intend all structural subscriptions to have filters.
>> + candidates, filter_id_query = (
>> + _get_structural_subscription_filter_id_query(
>> + bugtask.bug, [bugtask], level))
>> + if not candidates:
>> + return EmptyResultSet()
>> + return IStore(StructuralSubscription).find(
>> + StructuralSubscription,
>> + BugSubscriptionFilter.structural_subscription_id ==
>> + StructuralSubscription.id,
>> + In(BugSubscriptionFilter.id,
>> + filter_id_query)).config(distinct=True)
>> +
>> +
>> +def get_structural_subscription_targets(bugtasks):
>> + """Return (bugtask, target) pairs for each target of the bugtasks.
>> +
>> + Each bugtask may be responsible theoretically for 0 or more targets.
>> + In practice, each generates one, two or three.
>> + """
>> + for bugtask in bugtasks:
>> + if IStructuralSubscriptionTarget.providedBy(bugtask.target):
>> + yield (bugtask, bugtask.target)
>> + if bugtask.target.parent_subscription_target is not None:
>> + yield (bugtask, bugtask.target.parent_subscription_target)
>> + if ISourcePackage.providedBy(bugtask.target):
>> + # Distribution series bug tasks with a package have the source
>> + # package set as their target, so we add the distroseries
>> + # explicitly to the set of subscription targets.
>> + yield (bugtask, bugtask.distroseries)
>> + if bugtask.milestone is not None:
>> + yield (bugtask, bugtask.milestone)
>>
> So, if this is on...

Revision history for this message
j.c.sackett (jcsackett) wrote :

Re: the three vs four, I'm fine leaving it as is, I was just confused. A quick comment wouldn't suck, but I'm not a stickler.

Re: breaking the huge function into little bits. I think you're probably right about it not being doable. I might agree with you on the ORM vs raw SQL design patterns, but let's take that discussion out of this medium and to irc or email.

For everything else, thanks for agreeing to the changes and thanks for this impressive branch.

review: Approve
Revision history for this message
Gary Poster (gary) wrote :

Thank you for the review!

bug 728818

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.