Launchpad itself

Merge lp://staging/~gary/launchpad/bug723999 into lp://staging/launchpad

bug723999
Merge into devel

Proposed by Gary Poster on 2011-03-03

Status:

Merged

Approved by:

Gary Poster on 2011-03-04

Approved revision:

no longer in the source branch.

Merged at revision:

12526

Proposed branch:

lp://staging/~gary/launchpad/bug723999

Merge into:

lp://staging/launchpad

Diff against target:

1058 lines (+474/-351)

5 files modified

lib/lp/bugs/model/bug.py (+15/-37)
lib/lp/bugs/model/bugtask.py (+14/-69)
lib/lp/bugs/model/structuralsubscription.py (+418/-237)
lib/lp/bugs/model/tests/test_bugtask.py (+5/-3)
lib/lp/bugs/tests/test_structuralsubscriptiontarget.py (+22/-5)

To merge this branch:

bzr merge lp://staging/~gary/launchpad/bug723999

Critical

Fix Released

Link a bug report

Reviewer	Review Type	Date Requested	Status
j.c.sackett (community)		2011-03-03	Approve on 2011-03-04
Review via email: mp+52133@code.staging.launchpad.net

Commit message

[r=jcsackett][bug=723999] Reduce the SQL size and expense of calculating the structural subscribers for a given change.

Description of the change

This branch makes significant changes to how we calculate the structural subscribers for a given notification. The previous set-based approach was elegant and may have had similar or even slightly better performance characteristics in some simple cases; however it created gigantic SQL queries and performed poorly when there were a few bugtasks for a bug, and filters for every subscription.

This branch takes a more classic approach to building a SQL query. In tests, this brought a query that was taking 1.2-1.4 seconds to taking 300 milliseconds or less.

It also significantly reduced the SQL needed. SQL is 2/3 size in the simplest case with a single bugtask and no tags; 1/2 size in a more complex case with a single bugtask with tags. Once you have two tasks and on a bug with tags, the new approach's SQL is 1/3 the size, and as you add bugtasks, the ratio gets smaller and smaller.

It's worth noting that Storm support for the WITH statement would really help things. Then I would do one WITH table for the first query, and in the case of the UNION for the tags, I might do another WITH query for the shared bits. As it is, I'm trying to compromise between the cost of duplicating work in the database and the cost of making a separate Storm query. The WITH statement support would also shorten the size of the SQL. I'll be tempted to try and add that to Storm later, now that Postgres supports it.

I ended up having to use some somewhat esoteric SQL. In addition to having a pre-implementation call with Danilo about this, I also ran it past him after I was finished. He seemed pleased with explanations I gave verbally. I have copiously commented the core function, so I hope that you will be able to follow along there as well.

This is a big branch--noticeably bigger than our 800 line goal. "make lint" forced some of the size on me--it's happy, but at a cost. However, even before linting, I was still over the limit. I tried to keep it as small as possible, and have left out some clean-ups that need to happen. In particular, I tried to make minimal changes to the tests, so some methods that no longer serve any purpose in the codebase are left because they are exercised by valuable tests. I will follow up with one or more later branches that clean these things up and move tests around appropriately.

Danilo and I agreed that we might want to remove the features of "include_any_tags" (the filter only accepts any bug that has one or more tag) and "exclude_any_tags" (the filter only accepts bugs that have no tags); and we felt that it might be worth reconsidering how the excluded tags work. However, I kept those out of this branch, as out of scope.

That said, I did make one semantic change to an edge case. Before, NOT find_all_tags meant (in part) that excluded tags (==NOT include) must *all* match. I thought that was opposite what I would expect (and in fact initially implemented the other approach without realizing I was contradicting the tests). After my changes, NOT find_all_tags means that *any* excluded tag must match. For example, if a bug has the tags "foo" and "bar", a filter with find_all_tags = False and "-foo" (NOT include "foo") will now match because "bar" is not "foo". (If find_all_tags = True, the filter will not match. If find_all_tags = False and the filters tags are "-foo" and "-bar", it will not match). Again, I might like to reconsider the broader approach to this feature in the future, but this is not the branch, or the time for that.

I won't say more here, in the hopes that the comments will guide you.

Thank you for looking at this over-sized branch.

Revision history for this message

j.c.sackett (jcsackett) wrote on 2011-03-03:

Download full text (32.4 KiB)

Gary--

This all largely looks good; I've got a couple of comments/questions in the diff below.

Gary--

This all largely looks good; I've got a couple of comments/questions in the diff below.

> === modified file 'lib/lp/bugs/model/bugtask.py'
> --- lib/lp/bugs/model/bugtask.py	2011-02-27 19:45:44 +0000
> +++ lib/lp/bugs/model/bugtask.py	2011-03-03 21:04:20 +0000
> 
 <snip>
> 
 For some reason, having one method just wrap a function from another module bugs me. I see no cleaner way of doing this though, so consider this useless kvetching. 
> 
> === modified file 'lib/lp/bugs/model/structuralsubscription.py'
> --- lib/lp/bugs/model/structuralsubscription.py	2011-02-21 15:55:48 +0000
> +++ lib/lp/bugs/model/structuralsubscription.py	2011-03-03 21:04:20 +0000
> @@ -471,243 +478,412 @@
>  
>      def getSubscriptionsForBugTask(self, bugtask, level):
>          """See `IStructuralSubscriptionTarget`."""
> -        set_builder = BugFilterSetBuilder(
> -            bugtask, level, self.__helper.join)
> -        return Store.of(self.__helper.pillar).find(
> -            StructuralSubscription, In(
> -                StructuralSubscription.id,
> -                set_builder.subscriptions))
> +        # Note that this method does not take into account
> +        # structural subscriptions without filters.  Since it is only
> +        # used for tests at this point, that's not a problem; moreover,
> +        # we intend all structural subscriptions to have filters.
> +        candidates, filter_id_query = (
> +            _get_structural_subscription_filter_id_query(
> +                bugtask.bug, [bugtask], level))
> +        if not candidates:
> +            return EmptyResultSet()
> +        return IStore(StructuralSubscription).find(
> +            StructuralSubscription,
> +            BugSubscriptionFilter.structural_subscription_id ==
> +            StructuralSubscription.id,
> +            In(BugSubscriptionFilter.id,
> +               filter_id_query)).config(distinct=True)
> +
> +
> +def get_structural_subscription_targets(bugtasks):
> +    """Return (bugtask, target) pairs for each target of the bugtasks.
> +
> +    Each bugtask may be responsible theoretically for 0 or more targets.
> +    In practice, each generates one, two or three.
> +    """
> +    for bugtask in bugtasks:
> +        if IStructuralSubscriptionTarget.providedBy(bugtask.target):
> +            yield (bugtask, bugtask.target)
> +            if bugtask.target.parent_subscription_target is not None:
> +                yield (bugtask, bugtask.target.parent_subscription_target)
> +        if ISourcePackage.providedBy(bugtask.target):
> +            # Distribution series bug tasks with a package have the source
> +            # package set as their target, so we add the distroseries
> +            # explicitly to the set of subscription targets.
> +            yield (bugtask, bugtask.distroseries)
> +        if bugtask.milestone is not None:
> +            yield (bugtask, bugtask.milestone)
> 
 So, if this is only one to three returns per bugtask, is there a combination of yields here that can't happen?
> 
> +def _get_all_structural_subscriptions(find, targets, *conditions):
> +    """Find the structural subscriptions for the given targets.
> +
> +    "find" should be what to find (typically StructuralSubscription or
> +    StructuralSubscription.id).  "targets" is an iterable of (bugtask,
> +    target) pairs, as returned by get_structural_subscription_targets.
> +    You can add in zero or more additional conditions to filter the
> +    results.
> +    """
> +    return list(
> +        IStore(StructuralSubscription).find(
> +            find,
> +            Or(*[IStructuralSubscriptionTargetHelper(target).join
> +                 for target
> +                 in set(target for bugtask, target
> +                        in targets)]),
> +            *conditions))
> 
 This is sort of hard to parse; maybe:
 
 +    target_set = set(target for bugtask, target in targets)
 +    or_list = [IStructuralSubscriptionTargetHelper(target).join
 +                   for target in target_set]
 +    return list(
 +        IStore(StructuralSubscription).find(
 +            find,
 +            Or(*or_list), *conditions))
 
 This is just a suggestion; I think that's easier to read, you might disagree.
> 
> +def get_all_structural_subscriptions(bugtasks, person=None):
> +    if not bugtasks:
> +        return EmptyResultSet()
> +    conditions = []
> +    if person is not None:
> +        conditions.append(
> +            StructuralSubscription.subscriber == person)
> +    return _get_all_structural_subscriptions(
> +        StructuralSubscription,
> +        get_structural_subscription_targets(bugtasks),
> +        *conditions)
> +
> +
> +def _get_structural_subscribers(candidates, filter_id_query, recipients):
> +    if not candidates:
> +        return EmptyResultSet()
> +    # This is here because of a circular import.
> +    from lp.registry.model.person import Person
> +    source = IStore(StructuralSubscription).using(
> +        StructuralSubscription,
> +        # We need to do this LeftJoin because we still have structural
> +        # subscriptions without filters in qastaging and production.
> +        # Once we do not, we can just use a Join.  Also see constraints
> +        # below.
> 
 Could you leave an XXX comment regarding this and file a bug?
> 
> +        LeftJoin(BugSubscriptionFilter,
> +             BugSubscriptionFilter.structural_subscription_id ==
> +             StructuralSubscription.id),
> +        Join(Person,
> +             Person.id == StructuralSubscription.subscriberID),
> +        )
> +    constraints = [
> +        # We need to do this Or because we still have structural
> +        # subscriptions without filters in qastaging and production.
> +        # Once we do not, we can simplify this to just
> +        # "In(BugSubscriptionFilter.id, filter_id_query)".  Also see
> +        # LeftJoin above.
> 
 Same XXX for this.
> 
> +        Or(In(BugSubscriptionFilter.id, filter_id_query),
> +           And(In(StructuralSubscription.id, candidates),
> +               BugSubscriptionFilter.id == None))]
> +    if recipients is None:
> +        return source.find(
> +            Person, *constraints).config(distinct=True).order_by()
> +    else:
> +        subscribers = []
> +        query_results = source.find(
> +            (Person, StructuralSubscription),
> +            *constraints).config(distinct=True)
> +        for person, subscription in query_results:
> +            # Set up results.
> +            if person not in recipients:
> +                subscribers.append(person)
> +                recipients.addStructuralSubscriber(
> +                    person, subscription.target)
> +        return subscribers
> +
> +
> +def get_structural_subscribers_for_bugtasks(bugtasks,
> +                                            recipients=None,
> +                                            level=None):
> +    """Return subscribers for structural filters for the bugtasks at "level".
> +
> +    All bugtasks must be for the same bug.
> +
> +    Returns None if we already know that the result is empty.
> +
> +    Excludes structural subscriptions for people who are directly subscribed
> +    to the bug.
> +
> +    Populates "recipients" (a
> +    lp.bugs.mail.bugnotificationrecipients.BugNotificationRecipients) if
> +    given."""
> +    if not bugtasks:
> +        return EmptyResultSet()
> +    bugs = set(bugtask.bug for bugtask in bugtasks)
> +    if len(bugs) > 1:
> +        raise NotImplementedError('Each bugtask must be from the same bug.')
> +    bug = bugs.pop()
> +    candidates, query = _get_structural_subscription_filter_id_query(
> +        bug, bugtasks, level)
> +    return _get_structural_subscribers(candidates, query, recipients)
>  
>  
>  class ArrayAgg(NamedFunc):
> +    "Aggregate values (within a GROUP BY) into an array."
>      __slots__ = ()
>      name = "ARRAY_AGG"
>  
>  
>  class ArrayContains(CompoundOper):
> +    "True iff the left side is a superset of the right side."
>      __slots__ = ()
>      oper = "@>"
>  
>  
> -class BugFilterSetBuilder:
> -    """A convenience class to build queries for getSubscriptionsForBugTask."""
> -
> -    def __init__(self, bugtask, level, join_condition):
> -        """Initialize a new set builder for bug filters.
> -
> -        :param bugtask: The `IBugTask` to match against.
> -        :param level: A member of `BugNotificationLevel`.
> -        :param join_condition: A condition for selecting structural
> -            subscriptions. Generally this should limit the subscriptions to a
> -            particular target (i.e. project or distribution).
> -        """
> -        self.status = bugtask.status
> -        self.importance = bugtask.importance
> -        # The list() gets around some weirdness with security proxies; Storm
> -        # does not know how to compile an expression with a proxied list.
> -        self.tags = list(bugtask.bug.tags)
> -        # Set up common conditions.
> -        self.base_conditions = join_condition
> -        # Set up common filter conditions.
> -        self.filter_conditions = And(
> -            BugSubscriptionFilter.bug_notification_level >= level,
> -            self.base_conditions)
> -        if len(self.tags) == 0:
> -            self.filter_conditions = And(
> -                # When the bug has no tags, filters with include_any_tags set
> -                # can never match.
> -                Not(BugSubscriptionFilter.include_any_tags),
> -                self.filter_conditions)
> -        else:
> -            self.filter_conditions = And(
> -                # When the bug has tags, filters with exclude_any_tags set can
> -                # never match.
> -                Not(BugSubscriptionFilter.exclude_any_tags),
> -                self.filter_conditions)
> -
> -    @property
> -    def subscriptions_without_filters(self):
> -        """Subscriptions without filters."""
> -        return Select(
> -            StructuralSubscription.id,
> -            tables=(
> -                StructuralSubscription,
> -                LeftJoin(
> -                    BugSubscriptionFilter,
> -                    BugSubscriptionFilter.structural_subscription_id == (
> -                        StructuralSubscription.id))),
> -            where=And(
> -                BugSubscriptionFilter.id == None,
> -                self.base_conditions))
> -
> -    def _filters_matching_x(self, join, where_condition, **extra):
> -        """Return an expression yielding `(subscription_id, filter_id)` rows.
> -
> -        The expressions returned by this function are used in set (union,
> -        intersect, except) operations at the *filter* level. However, the
> -        interesting result of these set operations is the structural
> -        subscription, hence both columns are included in the expressions
> -        generated. Since a structural subscription can have zero or more
> -        filters, and a filter can never be associated with more than one
> -        subscription, the set operations are unaffected.
> -        """
> -        return Select(
> -            columns=(
> -                # Alias this column so it can be selected in
> -                # subscriptions_matching.
> -                Alias(
> -                    BugSubscriptionFilter.structural_subscription_id,
> -                    "structural_subscription_id"),
> -                BugSubscriptionFilter.id),
> -            tables=(
> -                StructuralSubscription, BugSubscriptionFilter, join),
> -            where=And(
> -                BugSubscriptionFilter.structural_subscription_id == (
> -                    StructuralSubscription.id),
> -                self.filter_conditions,
> -                where_condition),
> -            **extra)
> -
> -    @property
> -    def filters_matching_status(self):
> -        """Filters with the given bugtask's status."""
> -        join = LeftJoin(
> -            BugSubscriptionFilterStatus,
> -            BugSubscriptionFilterStatus.filter_id == (
> -                BugSubscriptionFilter.id))
> -        condition = Or(
> -            BugSubscriptionFilterStatus.id == None,
> -            BugSubscriptionFilterStatus.status == self.status)
> -        return self._filters_matching_x(join, condition)
> -
> -    @property
> -    def filters_matching_importance(self):
> -        """Filters with the given bugtask's importance."""
> -        join = LeftJoin(
> -            BugSubscriptionFilterImportance,
> -            BugSubscriptionFilterImportance.filter_id == (
> -                BugSubscriptionFilter.id))
> -        condition = Or(
> -            BugSubscriptionFilterImportance.id == None,
> -            BugSubscriptionFilterImportance.importance == self.importance)
> -        return self._filters_matching_x(join, condition)
> -
> -    @property
> -    def filters_without_include_tags(self):
> -        """Filters with no tags required."""
> -        join = LeftJoin(
> -            BugSubscriptionFilterTag,
> -            And(BugSubscriptionFilterTag.filter_id == (
> -                    BugSubscriptionFilter.id),
> -                BugSubscriptionFilterTag.include))
> -        return self._filters_matching_x(
> -            join, BugSubscriptionFilterTag.id == None)
> -
> -    @property
> -    def filters_matching_any_include_tags(self):
> -        """Filters including any of the bug's tags."""
> -        condition = And(
> -            BugSubscriptionFilterTag.filter_id == (
> -                BugSubscriptionFilter.id),
> -            BugSubscriptionFilterTag.include,
> -            Not(BugSubscriptionFilter.find_all_tags),
> -            In(BugSubscriptionFilterTag.tag, self.tags))
> -        return self._filters_matching_x(
> -            BugSubscriptionFilterTag, condition)
> -
> -    @property
> -    def filters_matching_any_exclude_tags(self):
> -        """Filters excluding any of the bug's tags."""
> -        condition = And(
> -            BugSubscriptionFilterTag.filter_id == (
> -                BugSubscriptionFilter.id),
> -            Not(BugSubscriptionFilterTag.include),
> -            Not(BugSubscriptionFilter.find_all_tags),
> -            In(BugSubscriptionFilterTag.tag, self.tags))
> -        return self._filters_matching_x(
> -            BugSubscriptionFilterTag, condition)
> -
> -    def _filters_matching_all_x_tags(self, where_condition):
> -        """Return an expression yielding `(subscription_id, filter_id)` rows.
> -
> -        This joins to `BugSubscriptionFilterTag` and calls up to
> -        `_filters_matching_x`, and groups by filter. Conditions are added to
> -        ensure that all rows in each group are a subset of the bug's tags.
> -        """
> -        tags_array = "ARRAY[%s]::TEXT[]" % ",".join(
> -            quote(tag) for tag in self.tags)
> -        return self._filters_matching_x(
> -            BugSubscriptionFilterTag,
> -            And(
> -                BugSubscriptionFilterTag.filter_id == (
> -                    BugSubscriptionFilter.id),
> -                BugSubscriptionFilter.find_all_tags,
> -                self.filter_conditions,
> -                where_condition),
> -            group_by=(
> -                BugSubscriptionFilter.structural_subscription_id,
> -                BugSubscriptionFilter.id),
> -            having=ArrayContains(
> -                SQL(tags_array), ArrayAgg(
> -                    BugSubscriptionFilterTag.tag)))
> -
> -    @property
> -    def filters_matching_all_include_tags(self):
> -        """Filters including the bug's tags."""
> -        return self._filters_matching_all_x_tags(
> -            BugSubscriptionFilterTag.include)
> -
> -    @property
> -    def filters_matching_all_exclude_tags(self):
> -        """Filters excluding the bug's tags."""
> -        return self._filters_matching_all_x_tags(
> -            Not(BugSubscriptionFilterTag.include))
> -
> -    @property
> -    def filters_matching_include_tags(self):
> -        """Filters with tag filters including the bug."""
> -        return Union(
> -            self.filters_matching_any_include_tags,
> -            self.filters_matching_all_include_tags)
> -
> -    @property
> -    def filters_matching_exclude_tags(self):
> -        """Filters with tag filters excluding the bug."""
> -        return Union(
> -            self.filters_matching_any_exclude_tags,
> -            self.filters_matching_all_exclude_tags)
> -
> -    @property
> -    def filters_matching_tags(self):
> -        """Filters with tag filters matching the bug."""
> -        if len(self.tags) == 0:
> -            # The filter's required tags must be an empty set. The filter's
> -            # excluded tags can be anything so no condition is needed.
> -            return self.filters_without_include_tags
> -        else:
> -            return Except(
> -                Union(self.filters_without_include_tags,
> -                      self.filters_matching_include_tags),
> -                self.filters_matching_exclude_tags)
> -
> -    @property
> -    def filters_matching(self):
> -        """Filters matching the bug."""
> -        return Intersect(
> -            self.filters_matching_status,
> -            self.filters_matching_importance,
> -            self.filters_matching_tags)
> -
> -    @property
> -    def subscriptions_with_matching_filters(self):
> -        """Subscriptions with one or more filters matching the bug."""
> -        return Select(
> -            # I don't know of a more Storm-like way of doing this.
> -            SQL("filters_matching.structural_subscription_id"),
> -            tables=Alias(self.filters_matching, "filters_matching"))
> -
> -    @property
> -    def subscriptions(self):
> -        return Union(
> -            self.subscriptions_without_filters,
> -            self.subscriptions_with_matching_filters)
> +class ArrayIntersects(CompoundOper):
> +    "True iff the left side shares at least one element with the right side."
> +    __slots__ = ()
> +    oper = "&&"
> +
> +
> +def _get_structural_subscription_filter_id_query(bug, bugtasks, level):
> +    """Helper function.
> +
> +    This provides the core implementation for
> +    get_structural_subscribers_for_bug and
> +    get_structural_subscribers_for_bugtask.  "bug" should
> +    ba a bug.  "bugtasks" is an iterable of one or more bugtasks of the bug.
> +    "level" is a notification level.
> +    """
> 
 I think for our sphinx docs, this would be better using the ":param" notation. Feel free to double check with someone who knows sphinx though--if it makes no difference, I'm good with this format.
> 
> +    # We get the ids because we need to use group by in order to
> +    # look at the filters' tags in aggregate.  Once we have the ids,
> +    # we can get the full set of what we need in subsuming or
> +    # subsequent SQL calls.
> +    # (Aside 1: We could in theory get all the fields we wanted with
> +    # a hack--we could use an aggregrate function like max to get
> +    # fields that we know will be unique--but Storm would not like
> +    # it.)
> +    # (Aside 2: IMO Postgres should allow getting other fields if
> +    # the group-by key is a primary key and the other fields desired
> +    # are other values from the same table as the group-by key, or
> +    # values of a table linked by a foreign key from the same table
> +    # as the group-by key...but that's dreaming.)
> +    # See the docstring of get_structural_subscription_targets.
> +    query_arguments = list(
> +        get_structural_subscription_targets(bugtasks))
> +    assert len(query_arguments) > 0, (
> +        'Programmer error: expected query arguments')
> +    # With large numbers of filters in the system, it's fastest in our
> +    # tests if we get a set of structural subscriptions pertinent to the
> +    # given targets, and then work with that.  It also comes in handy
> +    # when we have to do a union, because we can share the work across
> +    # the two queries.
> +    # We will exclude people who have a direct subscription to the bug.
> +    filters = [
> +        Not(In(StructuralSubscription.subscriberID,
> +               Select(BugSubscription.person_id,
> +                      BugSubscription.bug == bug)))]
> +    candidates = _get_all_structural_subscriptions(
> +        StructuralSubscription.id, query_arguments, *filters)
> +    if not candidates:
> +        # If there are no structural subscriptions for these targets,
> +        # then we don't need to look at the importance, status, and
> +        # tags.  We're done.
> +        return None, None
> +    # The "select_args" dictionary holds the arguments that we will
> +    # pass to one or more SELECT clauses.  We start with what will
> +    # become the FROM clause.  We always want the following Joins,
> +    # so we can add them here at the beginning.
> +    select_args = {
> +        'tables': [
> +            StructuralSubscription,
> +            Join(BugSubscriptionFilter,
> +                 BugSubscriptionFilter.structural_subscription_id ==
> +                 StructuralSubscription.id),
> +            LeftJoin(BugSubscriptionFilterStatus,
> +                     BugSubscriptionFilterStatus.filter_id ==
> +                     BugSubscriptionFilter.id),
> +            LeftJoin(BugSubscriptionFilterImportance,
> +                     BugSubscriptionFilterImportance.filter_id ==
> +                     BugSubscriptionFilter.id),
> +            LeftJoin(BugSubscriptionFilterTag,
> +                     BugSubscriptionFilterTag.filter_id ==
> +                     BugSubscriptionFilter.id)]}
> +    # The "conditions" list will eventually be passed to a Storm
> +    # "And" function, and then become the WHERE clause of our SELECT.
> +    conditions = [In(StructuralSubscription.id, candidates)]
> +    # Handling notification level is trivial, so we include that first.
> +    if level is not None:
> +        conditions.append(
> +            BugSubscriptionFilter.bug_notification_level >= level)
> +    # Now we handle importance and status, which are per bugtask.
> +    # What we do is loop through the collection of bugtask, target
> +    # in query_arguments.  Each bugtask will have one or more
> +    # targets that we have to check.  We figure out how to describe each
> +    # target using the useful IStructuralSubscriptionTargetHelper
> +    # adapter, which has a "join" attribute on it that tells us how
> +    # to distinguish that target.  Once we have all of the target
> +    # descriptins, we OR those together, and say that the filters
> +    # for those targets must either have no importance or match the
> +    # associated bugtask's importance; and have no status or match
> +    # the bugtask's status.  Once we have looked at all of the
> +    # bugtasks, we OR all of those per-bugtask target comparisons
> +    # together, and we are done with the status and importance.
> +    # The "outer_or_conditions" list holds the full clauses for each
> +    # bugtask.
> +    outer_or_conditions = []
> +    # The "or_target_conditions" list holds the clauses for each target,
> +    # and is reset for each new bugtask.
> +    or_target_conditions = []
> +
> +    def handle_bugtask_conditions(bugtask):
> +        """Helper function for building status and importance clauses.
> +
> +        Call with the previous bugtask when the bugtask changes in
> +        the iteration of query_arguments, and call with the last
> +        bugtask when the iteration is complete."""
> +        if or_target_conditions:
> +            outer_or_conditions.append(
> +                And(Or(*or_target_conditions),
> +                    Or(BugSubscriptionFilterImportance.importance ==
> +                       bugtask.importance,
> +                       BugSubscriptionFilterImportance.importance == None),
> +                    Or(BugSubscriptionFilterStatus.status == bugtask.status,
> +                       BugSubscriptionFilterStatus.status == None)))
> +            del or_target_conditions[:]
> +    last_bugtask = None
> +    for bugtask, target in query_arguments:
> +        if last_bugtask is not bugtask:
> +            handle_bugtask_conditions(last_bugtask)
> +        last_bugtask = bugtask
> +        or_target_conditions.append(
> +            IStructuralSubscriptionTargetHelper(target).join)
> +    # We know there will be at least one bugtask, because we already
> +    # escaped early "if len(query_arguments) == 0".
> +    handle_bugtask_conditions(bugtask)
> +    conditions.append(Or(*outer_or_conditions))
> +    # Now we handle tags.  If the bug has no tags, this is
> +    # relatively easy. Otherwise, not so much.
> +    tags = list(bug.tags) # This subtly removes the security proxy on
> +    # the list.  Strings are never security-proxied, so we don't have
> +    # to worry about them.
> +    if len(tags) == 0:
> +        # The bug has no tags.  We should leave out filters that
> +        # require any generic non-empty set of tags
> +        # (BugSubscriptionFilter.include_any_tags), which we do with
> +        # the conditions.  Then we can finish up the WHERE clause.
> +        # Then we have to make sure that the filter does not require
> +        # any *specific* tags. We do that with a GROUP BY on the
> +        # filters, and then a HAVING clause that aggregates the
> +        # BugSubscriptionFilterTags that are set to "include" the
> +        # tag.  (If it is not an include, that is an exclude, and a
> +        # bug without tags will not have a particular tag, so we can
> +        # ignore those in this case.)  This requires a CASE
> +        # statement within the COUNT.  After this, we are done, and
> +        # we return the fully formed SELECT query object.
> +        conditions.append(Not(BugSubscriptionFilter.include_any_tags))
> +        select_args['where'] = And(*conditions)
> +        select_args['group_by'] = (BugSubscriptionFilter.id,)
> +        select_args['having'] = Count(
> +            SQL('CASE WHEN BugSubscriptionFilterTag.include '
> +                'THEN BugSubscriptionFilterTag.tag END'))==0
> +        return candidates, Select(BugSubscriptionFilter.id, **select_args)
> +    else:
> +        # The bug has some tags.  This will require a bit of fancy
> +        # footwork. First, though, we will simply want to leave out
> +        # filters that should only match bugs without tags.
> +        conditions.append(Not(BugSubscriptionFilter.exclude_any_tags))
> +        # We're going to have to do a union with another query.  One
> +        # query will handle filters that are marked to include *any*
> +        # of the filter's selected tags, and the other query will
> +        # handle filters that include *all* of the filter's selected
> +        # tags (as determined by BugSubscriptionFilter.find_all_tags).
> +        # Every aspect of the unioned queries' WHERE clauses *other
> +        # than tags* will need to be the same. We could try making a
> +        # temporary table for the shared results, but that would
> +        # involve another separate Postgres call, and I think that
> +        # we've already gotten the big win by separating out the
> +        # structural subscriptions into "candidates," above.
> +        #
> +        # So, up to now we've been assembling the things that are shared
> +        # between the two queries, but now we start working on the
> +        # differences between the two unioned queries. "first_select"
> +        # will hold one set of arguments, and select_args will hold the
> +        # other.
> +        first_select = select_args.copy()
> +        # As mentioned, in this first SELECT we handle filters that
> +        # match any of the filter's tags.  This can be a relatively
> +        # straightforward query--we just need a bit more added to
> +        # our WHERE clause, and we don't need a GROUP BY/HAVING.
> +        first_select['where'] = And(
> +            Or(# We want filters that proclaim they simply want any tags.
> +               BugSubscriptionFilter.include_any_tags,
> +               # Also include filters that match any tag...
> +               And(Not(BugSubscriptionFilter.find_all_tags),
> +                   Or(# ...with a positive match...
> +                      And(BugSubscriptionFilterTag.include,
> +                          In(BugSubscriptionFilterTag.tag, tags)),
> +                      # ...or with a negative match...
> +                      And(Not(BugSubscriptionFilterTag.include),
> +                          Not(In(BugSubscriptionFilterTag.tag, tags))),
> +                      # ...or if the filter does not specify any tags.
> +                      BugSubscriptionFilterTag.tag == None))),
> +            *conditions)
> +        first_select = Select(BugSubscriptionFilter.id, **first_select)
> +        # We have our first clause.  Now we start on the second one:
> +        # handling filters that match *all* tags. Our WHERE clause
> +        # is straightforward and, it should be clear that we are
> +        # simply focusing on BugSubscriptionFilter.find_all_tags,
> +        # when the first SELECT did not consider it.
> +        select_args['where'] = And(
> +            BugSubscriptionFilter.find_all_tags, *conditions)
> +        # The GROUP BY collects the filters together.
> +        select_args['group_by'] = (BugSubscriptionFilter.id,)
> +        # Now it is time for the HAVING clause, which is where some
> +        # tricky bits happen. We first make a SQL snippet that
> +        # represents the tags on this bug.  It is straightforward
> +        # except for one subtle hack: the addition of the empty
> +        # space in the array.  This is because we are going to be
> +        # aggregating the tags on the filters using ARRAY_AGG, which
> +        # includes NULLs (unlike most other aggregators).  That
> +        # is an issue here because we use CASE statements to divide
> +        # up the set of tags that are supposed to be included and
> +        # supposed to be excluded.  This means that if we aggregate
> +        # "CASE WHEN BugSubscriptionFilterTag.include THEN
> +        # BugSubscriptionFilterTag.tag END" then that array will
> +        # include NULL.  SQL treats NULLs as unknowns that can never
> +        # be matched, so the array of ['foo', 'bar', NULL] does not
> +        # contain the array of ['foo', NULL] ("SELECT
> +        # ARRAY['foo','bar',NULL]::TEXT[] @>
> +        # ARRAY['foo',NULL]::TEXT[];" is false).  Therefore, so we
> +        # can make the HAVING statement we want to make without
> +        # defining a custom Postgres aggregator, we use a single
> +        # space as, effectively, NULL.  This is safe because a
> +        # single space is not an acceptable tag.  Again, the
> +        # clearest alternative is defining a custom Postgres aggregator.
> +        tags_array = "ARRAY[%s,' ']::TEXT[]" % ",".join(
> +            quote(tag) for tag in tags)
> +        # Now comes the HAVING clause itself.
> +        select_args['having'] = And(
> +            # The list of tags should be a superset of the filter tags to
> +            # be included.
> +            ArrayContains(
> +                SQL(tags_array),
> +                # This next line gives us an array of the tags that the
> +                # filter wants to include.  Notice that it includes the
> +                # empty string when the condition does not match, per the
> +                # discussion above.
> +                ArrayAgg(
> +                   SQL("CASE WHEN BugSubscriptionFilterTag.include "
> +                       "THEN BugSubscriptionFilterTag.tag "
> +                       "ELSE ' '::TEXT END"))),
> +            # The list of tags should also not intersect with the
> +            # tags that the filter wants to exclude.
> +            Not(
> +                ArrayIntersects(
> +                    SQL(tags_array),
> +                    # This next line gives us an array of the tags
> +                    # that the filter wants to exclude.  We do not bother
> > +                    # with the empty string, and therefore allow NULLs
> +                    # into the array, because in this case we are
> +                    # determining whether the sets intersect, not if the
> +                    # first set subsumes the second.
> +                    ArrayAgg(
> +                       SQL('CASE WHEN '
> +                           'NOT BugSubscriptionFilterTag.include '
> +                           'THEN BugSubscriptionFilterTag.tag END')))))
> +        # Everything is ready.  Make our second SELECT statement, UNION
> +        # it, and return it.
> +        return candidates, Union(
> +            first_select,
> +            Select(
> +                BugSubscriptionFilter.id,
> +                **select_args))
>

Okay, I *think* I get this, but man is this a hard one to parse. It may, in subsequent branches, be worth trying to break this into smaller bits. It may not be a bad idea to get someone with stronger storm/sql fu to take a look at just this one function.

review: Needs Information

Revision history for this message

Gary Poster (gary) wrote on 2011-03-04:

Download full text (36.2 KiB)

On Mar 3, 2011, at 6:14 PM, j.c.sackett wrote:

> Review: Needs Information
> Gary--
>
> This all largely looks good; I've got a couple of comments/questions in the diff below.
>
>
>> === modified file 'lib/lp/bugs/model/bugtask.py'
>> --- lib/lp/bugs/model/bugtask.py 2011-02-27 19:45:44 +0000
>> +++ lib/lp/bugs/model/bugtask.py 2011-03-03 21:04:20 +0000
>>
> <snip>
>>
> For some reason, having one method just wrap a function from another module bugs me. I see no cleaner way of doing this though, so consider this useless kvetching.

I completely agree, on both counts.

I have an idea on how to make it better for the second branch, but no guarantees.

>>
>> === modified file 'lib/lp/bugs/model/structuralsubscription.py'
>> --- lib/lp/bugs/model/structuralsubscription.py 2011-02-21 15:55:48 +0000
>> +++ lib/lp/bugs/model/structuralsubscription.py 2011-03-03 21:04:20 +0000
>> @@ -471,243 +478,412 @@
>>
>> def getSubscriptionsForBugTask(self, bugtask, level):
>> """See `IStructuralSubscriptionTarget`."""
>> - set_builder = BugFilterSetBuilder(
>> - bugtask, level, self.__helper.join)
>> - return Store.of(self.__helper.pillar).find(
>> - StructuralSubscription, In(
>> - StructuralSubscription.id,
>> - set_builder.subscriptions))
>> + # Note that this method does not take into account
>> + # structural subscriptions without filters. Since it is only
>> + # used for tests at this point, that's not a problem; moreover,
>> + # we intend all structural subscriptions to have filters.
>> + candidates, filter_id_query = (
>> + _get_structural_subscription_filter_id_query(
>> + bugtask.bug, [bugtask], level))
>> + if not candidates:
>> + return EmptyResultSet()
>> + return IStore(StructuralSubscription).find(
>> + StructuralSubscription,
>> + BugSubscriptionFilter.structural_subscription_id ==
>> + StructuralSubscription.id,
>> + In(BugSubscriptionFilter.id,
>> + filter_id_query)).config(distinct=True)
>> +
>> +
>> +def get_structural_subscription_targets(bugtasks):
>> + """Return (bugtask, target) pairs for each target of the bugtasks.
>> +
>> + Each bugtask may be responsible theoretically for 0 or more targets.
>> + In practice, each generates one, two or three.
>> + """
>> + for bugtask in bugtasks:
>> + if IStructuralSubscriptionTarget.providedBy(bugtask.target):
>> + yield (bugtask, bugtask.target)
>> + if bugtask.target.parent_subscription_target is not None:
>> + yield (bugtask, bugtask.target.parent_subscription_target)
>> + if ISourcePackage.providedBy(bugtask.target):
>> + # Distribution series bug tasks with a package have the source
>> + # package set as their target, so we add the distroseries
>> + # explicitly to the set of subscription targets.
>> + yield (bugtask, bugtask.distroseries)
>> + if bugtask.milestone is not None:
>> + yield (bugtask, bugtask.milestone)
>>
> So, if this is on...

On Mar 3, 2011, at 6:14 PM, j.c.sackett wrote:

> Review: Needs Information
> Gary--
> 
> This all largely looks good; I've got a couple of comments/questions in the diff below.
> 
> 
>> === modified file 'lib/lp/bugs/model/bugtask.py'
>> --- lib/lp/bugs/model/bugtask.py	2011-02-27 19:45:44 +0000
>> +++ lib/lp/bugs/model/bugtask.py	2011-03-03 21:04:20 +0000
>> 
> <snip>
>> 
> For some reason, having one method just wrap a function from another module bugs me. I see no cleaner way of doing this though, so consider this useless kvetching.

I completely agree, on both counts.

I have an idea on how to make it better for the second branch, but no guarantees.

>> 
>> === modified file 'lib/lp/bugs/model/structuralsubscription.py'
>> --- lib/lp/bugs/model/structuralsubscription.py	2011-02-21 15:55:48 +0000
>> +++ lib/lp/bugs/model/structuralsubscription.py	2011-03-03 21:04:20 +0000
>> @@ -471,243 +478,412 @@
>> 
>>     def getSubscriptionsForBugTask(self, bugtask, level):
>>         """See `IStructuralSubscriptionTarget`."""
>> -        set_builder = BugFilterSetBuilder(
>> -            bugtask, level, self.__helper.join)
>> -        return Store.of(self.__helper.pillar).find(
>> -            StructuralSubscription, In(
>> -                StructuralSubscription.id,
>> -                set_builder.subscriptions))
>> +        # Note that this method does not take into account
>> +        # structural subscriptions without filters.  Since it is only
>> +        # used for tests at this point, that's not a problem; moreover,
>> +        # we intend all structural subscriptions to have filters.
>> +        candidates, filter_id_query = (
>> +            _get_structural_subscription_filter_id_query(
>> +                bugtask.bug, [bugtask], level))
>> +        if not candidates:
>> +            return EmptyResultSet()
>> +        return IStore(StructuralSubscription).find(
>> +            StructuralSubscription,
>> +            BugSubscriptionFilter.structural_subscription_id ==
>> +            StructuralSubscription.id,
>> +            In(BugSubscriptionFilter.id,
>> +               filter_id_query)).config(distinct=True)
>> +
>> +
>> +def get_structural_subscription_targets(bugtasks):
>> +    """Return (bugtask, target) pairs for each target of the bugtasks.
>> +
>> +    Each bugtask may be responsible theoretically for 0 or more targets.
>> +    In practice, each generates one, two or three.
>> +    """
>> +    for bugtask in bugtasks:
>> +        if IStructuralSubscriptionTarget.providedBy(bugtask.target):
>> +            yield (bugtask, bugtask.target)
>> +            if bugtask.target.parent_subscription_target is not None:
>> +                yield (bugtask, bugtask.target.parent_subscription_target)
>> +        if ISourcePackage.providedBy(bugtask.target):
>> +            # Distribution series bug tasks with a package have the source
>> +            # package set as their target, so we add the distroseries
>> +            # explicitly to the set of subscription targets.
>> +            yield (bugtask, bugtask.distroseries)
>> +        if bugtask.milestone is not None:
>> +            yield (bugtask, bugtask.milestone)
>> 
> So, if this is only one to three returns per bugtask, is there a combination of yields here that can't happen?

Well...realize that this is my attempt to backwards engineer code that already existed (in at least two places).  So, yes, the pre-existing code path that I moved here suggests that up to four could be yielded.  However, I *think* that you could change the second if to an elif--that is, either it is an IStructuralSubscriptionTarget or it is an ISourcePackage.  However, I'm kind of loathe to commit to that in code, to be honest.

These are the options I would be happiest for you to choose from:

- Leave as is.
- Leave as is, but explain that I count three because I don't think that ISourcePackage is an IStructuralSubscriptionTarget.
- Change the docstring to say up to four, and forget about my belief that it will really only ever turn out to be up to three.

I would be decreasingly less happy with these.

- Make a bug and an XXX about the fact that I think ISourcePackage can be an elif.
- Promise to determine the answer in a subsequent branch.
- Figure it out now and either fix the docstring or add the elif.

>> 
>> +def _get_all_structural_subscriptions(find, targets, *conditions):
>> +    """Find the structural subscriptions for the given targets.
>> +
>> +    "find" should be what to find (typically StructuralSubscription or
>> +    StructuralSubscription.id).  "targets" is an iterable of (bugtask,
>> +    target) pairs, as returned by get_structural_subscription_targets.
>> +    You can add in zero or more additional conditions to filter the
>> +    results.
>> +    """
>> +    return list(
>> +        IStore(StructuralSubscription).find(
>> +            find,
>> +            Or(*[IStructuralSubscriptionTargetHelper(target).join
>> +                 for target
>> +                 in set(target for bugtask, target
>> +                        in targets)]),
>> +            *conditions))
>> 
> This is sort of hard to parse; maybe:
> 
> +    target_set = set(target for bugtask, target in targets)
> +    or_list = [IStructuralSubscriptionTargetHelper(target).join
> +                   for target in target_set]
> +    return list(
> +        IStore(StructuralSubscription).find(
> +            find,
> +            Or(*or_list), *conditions))
> 
> This is just a suggestion; I think that's easier to read, you might disagree.

I definitely agree with the readability of the target_set.  I'm less confident about the or_list but will prefer your opinion to mine, since I'm too close to it to be trusted.  IOW, I'll do it.

>> 
>> +def get_all_structural_subscriptions(bugtasks, person=None):
>> +    if not bugtasks:
>> +        return EmptyResultSet()
>> +    conditions = []
>> +    if person is not None:
>> +        conditions.append(
>> +            StructuralSubscription.subscriber == person)
>> +    return _get_all_structural_subscriptions(
>> +        StructuralSubscription,
>> +        get_structural_subscription_targets(bugtasks),
>> +        *conditions)
>> +
>> +
>> +def _get_structural_subscribers(candidates, filter_id_query, recipients):
>> +    if not candidates:
>> +        return EmptyResultSet()
>> +    # This is here because of a circular import.
>> +    from lp.registry.model.person import Person
>> +    source = IStore(StructuralSubscription).using(
>> +        StructuralSubscription,
>> +        # We need to do this LeftJoin because we still have structural
>> +        # subscriptions without filters in qastaging and production.
>> +        # Once we do not, we can just use a Join.  Also see constraints
>> +        # below.
>> 
> Could you leave an XXX comment regarding this and file a bug?

Yes, agreed.  (We need this for our feature work, and it is done on staging, and it was done on production until we had to roll it back because of this bug, so it will be done and I didn't feel the need for a bug; but it is the proper thing to do.)

>> 
>> +        LeftJoin(BugSubscriptionFilter,
>> +             BugSubscriptionFilter.structural_subscription_id ==
>> +             StructuralSubscription.id),
>> +        Join(Person,
>> +             Person.id == StructuralSubscription.subscriberID),
>> +        )
>> +    constraints = [
>> +        # We need to do this Or because we still have structural
>> +        # subscriptions without filters in qastaging and production.
>> +        # Once we do not, we can simplify this to just
>> +        # "In(BugSubscriptionFilter.id, filter_id_query)".  Also see
>> +        # LeftJoin above.
>> 
> Same XXX for this.

Ack.

>> 
>> +        Or(In(BugSubscriptionFilter.id, filter_id_query),
>> +           And(In(StructuralSubscription.id, candidates),
>> +               BugSubscriptionFilter.id == None))]
>> +    if recipients is None:
>> +        return source.find(
>> +            Person, *constraints).config(distinct=True).order_by()
>> +    else:
>> +        subscribers = []
>> +        query_results = source.find(
>> +            (Person, StructuralSubscription),
>> +            *constraints).config(distinct=True)
>> +        for person, subscription in query_results:
>> +            # Set up results.
>> +            if person not in recipients:
>> +                subscribers.append(person)
>> +                recipients.addStructuralSubscriber(
>> +                    person, subscription.target)
>> +        return subscribers
>> +
>> +
>> +def get_structural_subscribers_for_bugtasks(bugtasks,
>> +                                            recipients=None,
>> +                                            level=None):
>> +    """Return subscribers for structural filters for the bugtasks at "level".
>> +
>> +    All bugtasks must be for the same bug.
>> +
>> +    Returns None if we already know that the result is empty.
>> +
>> +    Excludes structural subscriptions for people who are directly subscribed
>> +    to the bug.
>> +
>> +    Populates "recipients" (a
>> +    lp.bugs.mail.bugnotificationrecipients.BugNotificationRecipients) if
>> +    given."""
>> +    if not bugtasks:
>> +        return EmptyResultSet()
>> +    bugs = set(bugtask.bug for bugtask in bugtasks)
>> +    if len(bugs) > 1:
>> +        raise NotImplementedError('Each bugtask must be from the same bug.')
>> +    bug = bugs.pop()
>> +    candidates, query = _get_structural_subscription_filter_id_query(
>> +        bug, bugtasks, level)
>> +    return _get_structural_subscribers(candidates, query, recipients)
>> 
>> 
>> class ArrayAgg(NamedFunc):
>> +    "Aggregate values (within a GROUP BY) into an array."
>>     __slots__ = ()
>>     name = "ARRAY_AGG"
>> 
>> 
>> class ArrayContains(CompoundOper):
>> +    "True iff the left side is a superset of the right side."
>>     __slots__ = ()
>>     oper = "@>"
>> 
>> 
>> -class BugFilterSetBuilder:
>> -    """A convenience class to build queries for getSubscriptionsForBugTask."""
>> -
>> -    def __init__(self, bugtask, level, join_condition):
>> -        """Initialize a new set builder for bug filters.
>> -
>> -        :param bugtask: The `IBugTask` to match against.
>> -        :param level: A member of `BugNotificationLevel`.
>> -        :param join_condition: A condition for selecting structural
>> -            subscriptions. Generally this should limit the subscriptions to a
>> -            particular target (i.e. project or distribution).
>> -        """
>> -        self.status = bugtask.status
>> -        self.importance = bugtask.importance
>> -        # The list() gets around some weirdness with security proxies; Storm
>> -        # does not know how to compile an expression with a proxied list.
>> -        self.tags = list(bugtask.bug.tags)
>> -        # Set up common conditions.
>> -        self.base_conditions = join_condition
>> -        # Set up common filter conditions.
>> -        self.filter_conditions = And(
>> -            BugSubscriptionFilter.bug_notification_level >= level,
>> -            self.base_conditions)
>> -        if len(self.tags) == 0:
>> -            self.filter_conditions = And(
>> -                # When the bug has no tags, filters with include_any_tags set
>> -                # can never match.
>> -                Not(BugSubscriptionFilter.include_any_tags),
>> -                self.filter_conditions)
>> -        else:
>> -            self.filter_conditions = And(
>> -                # When the bug has tags, filters with exclude_any_tags set can
>> -                # never match.
>> -                Not(BugSubscriptionFilter.exclude_any_tags),
>> -                self.filter_conditions)
>> -
>> -    @property
>> -    def subscriptions_without_filters(self):
>> -        """Subscriptions without filters."""
>> -        return Select(
>> -            StructuralSubscription.id,
>> -            tables=(
>> -                StructuralSubscription,
>> -                LeftJoin(
>> -                    BugSubscriptionFilter,
>> -                    BugSubscriptionFilter.structural_subscription_id == (
>> -                        StructuralSubscription.id))),
>> -            where=And(
>> -                BugSubscriptionFilter.id == None,
>> -                self.base_conditions))
>> -
>> -    def _filters_matching_x(self, join, where_condition, **extra):
>> -        """Return an expression yielding `(subscription_id, filter_id)` rows.
>> -
>> -        The expressions returned by this function are used in set (union,
>> -        intersect, except) operations at the *filter* level. However, the
>> -        interesting result of these set operations is the structural
>> -        subscription, hence both columns are included in the expressions
>> -        generated. Since a structural subscription can have zero or more
>> -        filters, and a filter can never be associated with more than one
>> -        subscription, the set operations are unaffected.
>> -        """
>> -        return Select(
>> -            columns=(
>> -                # Alias this column so it can be selected in
>> -                # subscriptions_matching.
>> -                Alias(
>> -                    BugSubscriptionFilter.structural_subscription_id,
>> -                    "structural_subscription_id"),
>> -                BugSubscriptionFilter.id),
>> -            tables=(
>> -                StructuralSubscription, BugSubscriptionFilter, join),
>> -            where=And(
>> -                BugSubscriptionFilter.structural_subscription_id == (
>> -                    StructuralSubscription.id),
>> -                self.filter_conditions,
>> -                where_condition),
>> -            **extra)
>> -
>> -    @property
>> -    def filters_matching_status(self):
>> -        """Filters with the given bugtask's status."""
>> -        join = LeftJoin(
>> -            BugSubscriptionFilterStatus,
>> -            BugSubscriptionFilterStatus.filter_id == (
>> -                BugSubscriptionFilter.id))
>> -        condition = Or(
>> -            BugSubscriptionFilterStatus.id == None,
>> -            BugSubscriptionFilterStatus.status == self.status)
>> -        return self._filters_matching_x(join, condition)
>> -
>> -    @property
>> -    def filters_matching_importance(self):
>> -        """Filters with the given bugtask's importance."""
>> -        join = LeftJoin(
>> -            BugSubscriptionFilterImportance,
>> -            BugSubscriptionFilterImportance.filter_id == (
>> -                BugSubscriptionFilter.id))
>> -        condition = Or(
>> -            BugSubscriptionFilterImportance.id == None,
>> -            BugSubscriptionFilterImportance.importance == self.importance)
>> -        return self._filters_matching_x(join, condition)
>> -
>> -    @property
>> -    def filters_without_include_tags(self):
>> -        """Filters with no tags required."""
>> -        join = LeftJoin(
>> -            BugSubscriptionFilterTag,
>> -            And(BugSubscriptionFilterTag.filter_id == (
>> -                    BugSubscriptionFilter.id),
>> -                BugSubscriptionFilterTag.include))
>> -        return self._filters_matching_x(
>> -            join, BugSubscriptionFilterTag.id == None)
>> -
>> -    @property
>> -    def filters_matching_any_include_tags(self):
>> -        """Filters including any of the bug's tags."""
>> -        condition = And(
>> -            BugSubscriptionFilterTag.filter_id == (
>> -                BugSubscriptionFilter.id),
>> -            BugSubscriptionFilterTag.include,
>> -            Not(BugSubscriptionFilter.find_all_tags),
>> -            In(BugSubscriptionFilterTag.tag, self.tags))
>> -        return self._filters_matching_x(
>> -            BugSubscriptionFilterTag, condition)
>> -
>> -    @property
>> -    def filters_matching_any_exclude_tags(self):
>> -        """Filters excluding any of the bug's tags."""
>> -        condition = And(
>> -            BugSubscriptionFilterTag.filter_id == (
>> -                BugSubscriptionFilter.id),
>> -            Not(BugSubscriptionFilterTag.include),
>> -            Not(BugSubscriptionFilter.find_all_tags),
>> -            In(BugSubscriptionFilterTag.tag, self.tags))
>> -        return self._filters_matching_x(
>> -            BugSubscriptionFilterTag, condition)
>> -
>> -    def _filters_matching_all_x_tags(self, where_condition):
>> -        """Return an expression yielding `(subscription_id, filter_id)` rows.
>> -
>> -        This joins to `BugSubscriptionFilterTag` and calls up to
>> -        `_filters_matching_x`, and groups by filter. Conditions are added to
>> -        ensure that all rows in each group are a subset of the bug's tags.
>> -        """
>> -        tags_array = "ARRAY[%s]::TEXT[]" % ",".join(
>> -            quote(tag) for tag in self.tags)
>> -        return self._filters_matching_x(
>> -            BugSubscriptionFilterTag,
>> -            And(
>> -                BugSubscriptionFilterTag.filter_id == (
>> -                    BugSubscriptionFilter.id),
>> -                BugSubscriptionFilter.find_all_tags,
>> -                self.filter_conditions,
>> -                where_condition),
>> -            group_by=(
>> -                BugSubscriptionFilter.structural_subscription_id,
>> -                BugSubscriptionFilter.id),
>> -            having=ArrayContains(
>> -                SQL(tags_array), ArrayAgg(
>> -                    BugSubscriptionFilterTag.tag)))
>> -
>> -    @property
>> -    def filters_matching_all_include_tags(self):
>> -        """Filters including the bug's tags."""
>> -        return self._filters_matching_all_x_tags(
>> -            BugSubscriptionFilterTag.include)
>> -
>> -    @property
>> -    def filters_matching_all_exclude_tags(self):
>> -        """Filters excluding the bug's tags."""
>> -        return self._filters_matching_all_x_tags(
>> -            Not(BugSubscriptionFilterTag.include))
>> -
>> -    @property
>> -    def filters_matching_include_tags(self):
>> -        """Filters with tag filters including the bug."""
>> -        return Union(
>> -            self.filters_matching_any_include_tags,
>> -            self.filters_matching_all_include_tags)
>> -
>> -    @property
>> -    def filters_matching_exclude_tags(self):
>> -        """Filters with tag filters excluding the bug."""
>> -        return Union(
>> -            self.filters_matching_any_exclude_tags,
>> -            self.filters_matching_all_exclude_tags)
>> -
>> -    @property
>> -    def filters_matching_tags(self):
>> -        """Filters with tag filters matching the bug."""
>> -        if len(self.tags) == 0:
>> -            # The filter's required tags must be an empty set. The filter's
>> -            # excluded tags can be anything so no condition is needed.
>> -            return self.filters_without_include_tags
>> -        else:
>> -            return Except(
>> -                Union(self.filters_without_include_tags,
>> -                      self.filters_matching_include_tags),
>> -                self.filters_matching_exclude_tags)
>> -
>> -    @property
>> -    def filters_matching(self):
>> -        """Filters matching the bug."""
>> -        return Intersect(
>> -            self.filters_matching_status,
>> -            self.filters_matching_importance,
>> -            self.filters_matching_tags)
>> -
>> -    @property
>> -    def subscriptions_with_matching_filters(self):
>> -        """Subscriptions with one or more filters matching the bug."""
>> -        return Select(
>> -            # I don't know of a more Storm-like way of doing this.
>> -            SQL("filters_matching.structural_subscription_id"),
>> -            tables=Alias(self.filters_matching, "filters_matching"))
>> -
>> -    @property
>> -    def subscriptions(self):
>> -        return Union(
>> -            self.subscriptions_without_filters,
>> -            self.subscriptions_with_matching_filters)
>> +class ArrayIntersects(CompoundOper):
>> +    "True iff the left side shares at least one element with the right side."
>> +    __slots__ = ()
>> +    oper = "&&"
>> +
>> +
>> +def _get_structural_subscription_filter_id_query(bug, bugtasks, level):
>> +    """Helper function.
>> +
>> +    This provides the core implementation for
>> +    get_structural_subscribers_for_bug and
>> +    get_structural_subscribers_for_bugtask.  "bug" should
>> +    ba a bug.  "bugtasks" is an iterable of one or more bugtasks of the bug.
>> +    "level" is a notification level.
>> +    """
>> 
> I think for our sphinx docs, this would be better using the ":param" notation. Feel free to double check with someone who knows sphinx though--if it makes no difference, I'm good with this format.

Ah right.  I used to know Sphinx. :-P  I'll change it, good catch.

>> 
>> +    # We get the ids because we need to use group by in order to
>> +    # look at the filters' tags in aggregate.  Once we have the ids,
>> +    # we can get the full set of what we need in subsuming or
>> +    # subsequent SQL calls.
>> +    # (Aside 1: We could in theory get all the fields we wanted with
>> +    # a hack--we could use an aggregrate function like max to get
>> +    # fields that we know will be unique--but Storm would not like
>> +    # it.)
>> +    # (Aside 2: IMO Postgres should allow getting other fields if
>> +    # the group-by key is a primary key and the other fields desired
>> +    # are other values from the same table as the group-by key, or
>> +    # values of a table linked by a foreign key from the same table
>> +    # as the group-by key...but that's dreaming.)
>> +    # See the docstring of get_structural_subscription_targets.
>> +    query_arguments = list(
>> +        get_structural_subscription_targets(bugtasks))
>> +    assert len(query_arguments) > 0, (
>> +        'Programmer error: expected query arguments')
>> +    # With large numbers of filters in the system, it's fastest in our
>> +    # tests if we get a set of structural subscriptions pertinent to the
>> +    # given targets, and then work with that.  It also comes in handy
>> +    # when we have to do a union, because we can share the work across
>> +    # the two queries.
>> +    # We will exclude people who have a direct subscription to the bug.
>> +    filters = [
>> +        Not(In(StructuralSubscription.subscriberID,
>> +               Select(BugSubscription.person_id,
>> +                      BugSubscription.bug == bug)))]
>> +    candidates = _get_all_structural_subscriptions(
>> +        StructuralSubscription.id, query_arguments, *filters)
>> +    if not candidates:
>> +        # If there are no structural subscriptions for these targets,
>> +        # then we don't need to look at the importance, status, and
>> +        # tags.  We're done.
>> +        return None, None
>> +    # The "select_args" dictionary holds the arguments that we will
>> +    # pass to one or more SELECT clauses.  We start with what will
>> +    # become the FROM clause.  We always want the following Joins,
>> +    # so we can add them here at the beginning.
>> +    select_args = {
>> +        'tables': [
>> +            StructuralSubscription,
>> +            Join(BugSubscriptionFilter,
>> +                 BugSubscriptionFilter.structural_subscription_id ==
>> +                 StructuralSubscription.id),
>> +            LeftJoin(BugSubscriptionFilterStatus,
>> +                     BugSubscriptionFilterStatus.filter_id ==
>> +                     BugSubscriptionFilter.id),
>> +            LeftJoin(BugSubscriptionFilterImportance,
>> +                     BugSubscriptionFilterImportance.filter_id ==
>> +                     BugSubscriptionFilter.id),
>> +            LeftJoin(BugSubscriptionFilterTag,
>> +                     BugSubscriptionFilterTag.filter_id ==
>> +                     BugSubscriptionFilter.id)]}
>> +    # The "conditions" list will eventually be passed to a Storm
>> +    # "And" function, and then become the WHERE clause of our SELECT.
>> +    conditions = [In(StructuralSubscription.id, candidates)]
>> +    # Handling notification level is trivial, so we include that first.
>> +    if level is not None:
>> +        conditions.append(
>> +            BugSubscriptionFilter.bug_notification_level >= level)
>> +    # Now we handle importance and status, which are per bugtask.
>> +    # What we do is loop through the collection of bugtask, target
>> +    # in query_arguments.  Each bugtask will have one or more
>> +    # targets that we have to check.  We figure out how to describe each
>> +    # target using the useful IStructuralSubscriptionTargetHelper
>> +    # adapter, which has a "join" attribute on it that tells us how
>> +    # to distinguish that target.  Once we have all of the target
>> +    # descriptins, we OR those together, and say that the filters
>> +    # for those targets must either have no importance or match the
>> +    # associated bugtask's importance; and have no status or match
>> +    # the bugtask's status.  Once we have looked at all of the
>> +    # bugtasks, we OR all of those per-bugtask target comparisons
>> +    # together, and we are done with the status and importance.
>> +    # The "outer_or_conditions" list holds the full clauses for each
>> +    # bugtask.
>> +    outer_or_conditions = []
>> +    # The "or_target_conditions" list holds the clauses for each target,
>> +    # and is reset for each new bugtask.
>> +    or_target_conditions = []
>> +
>> +    def handle_bugtask_conditions(bugtask):
>> +        """Helper function for building status and importance clauses.
>> +
>> +        Call with the previous bugtask when the bugtask changes in
>> +        the iteration of query_arguments, and call with the last
>> +        bugtask when the iteration is complete."""
>> +        if or_target_conditions:
>> +            outer_or_conditions.append(
>> +                And(Or(*or_target_conditions),
>> +                    Or(BugSubscriptionFilterImportance.importance ==
>> +                       bugtask.importance,
>> +                       BugSubscriptionFilterImportance.importance == None),
>> +                    Or(BugSubscriptionFilterStatus.status == bugtask.status,
>> +                       BugSubscriptionFilterStatus.status == None)))
>> +            del or_target_conditions[:]
>> +    last_bugtask = None
>> +    for bugtask, target in query_arguments:
>> +        if last_bugtask is not bugtask:
>> +            handle_bugtask_conditions(last_bugtask)
>> +        last_bugtask = bugtask
>> +        or_target_conditions.append(
>> +            IStructuralSubscriptionTargetHelper(target).join)
>> +    # We know there will be at least one bugtask, because we already
>> +    # escaped early "if len(query_arguments) == 0".
>> +    handle_bugtask_conditions(bugtask)
>> +    conditions.append(Or(*outer_or_conditions))
>> +    # Now we handle tags.  If the bug has no tags, this is
>> +    # relatively easy. Otherwise, not so much.
>> +    tags = list(bug.tags) # This subtly removes the security proxy on
>> +    # the list.  Strings are never security-proxied, so we don't have
>> +    # to worry about them.
>> +    if len(tags) == 0:
>> +        # The bug has no tags.  We should leave out filters that
>> +        # require any generic non-empty set of tags
>> +        # (BugSubscriptionFilter.include_any_tags), which we do with
>> +        # the conditions.  Then we can finish up the WHERE clause.
>> +        # Then we have to make sure that the filter does not require
>> +        # any *specific* tags. We do that with a GROUP BY on the
>> +        # filters, and then a HAVING clause that aggregates the
>> +        # BugSubscriptionFilterTags that are set to "include" the
>> +        # tag.  (If it is not an include, that is an exclude, and a
>> +        # bug without tags will not have a particular tag, so we can
>> +        # ignore those in this case.)  This requires a CASE
>> +        # statement within the COUNT.  After this, we are done, and
>> +        # we return the fully formed SELECT query object.
>> +        conditions.append(Not(BugSubscriptionFilter.include_any_tags))
>> +        select_args['where'] = And(*conditions)
>> +        select_args['group_by'] = (BugSubscriptionFilter.id,)
>> +        select_args['having'] = Count(
>> +            SQL('CASE WHEN BugSubscriptionFilterTag.include '
>> +                'THEN BugSubscriptionFilterTag.tag END'))==0
>> +        return candidates, Select(BugSubscriptionFilter.id, **select_args)
>> +    else:
>> +        # The bug has some tags.  This will require a bit of fancy
>> +        # footwork. First, though, we will simply want to leave out
>> +        # filters that should only match bugs without tags.
>> +        conditions.append(Not(BugSubscriptionFilter.exclude_any_tags))
>> +        # We're going to have to do a union with another query.  One
>> +        # query will handle filters that are marked to include *any*
>> +        # of the filter's selected tags, and the other query will
>> +        # handle filters that include *all* of the filter's selected
>> +        # tags (as determined by BugSubscriptionFilter.find_all_tags).
>> +        # Every aspect of the unioned queries' WHERE clauses *other
>> +        # than tags* will need to be the same. We could try making a
>> +        # temporary table for the shared results, but that would
>> +        # involve another separate Postgres call, and I think that
>> +        # we've already gotten the big win by separating out the
>> +        # structural subscriptions into "candidates," above.
>> +        #
>> +        # So, up to now we've been assembling the things that are shared
>> +        # between the two queries, but now we start working on the
>> +        # differences between the two unioned queries. "first_select"
>> +        # will hold one set of arguments, and select_args will hold the
>> +        # other.
>> +        first_select = select_args.copy()
>> +        # As mentioned, in this first SELECT we handle filters that
>> +        # match any of the filter's tags.  This can be a relatively
>> +        # straightforward query--we just need a bit more added to
>> +        # our WHERE clause, and we don't need a GROUP BY/HAVING.
>> +        first_select['where'] = And(
>> +            Or(# We want filters that proclaim they simply want any tags.
>> +               BugSubscriptionFilter.include_any_tags,
>> +               # Also include filters that match any tag...
>> +               And(Not(BugSubscriptionFilter.find_all_tags),
>> +                   Or(# ...with a positive match...
>> +                      And(BugSubscriptionFilterTag.include,
>> +                          In(BugSubscriptionFilterTag.tag, tags)),
>> +                      # ...or with a negative match...
>> +                      And(Not(BugSubscriptionFilterTag.include),
>> +                          Not(In(BugSubscriptionFilterTag.tag, tags))),
>> +                      # ...or if the filter does not specify any tags.
>> +                      BugSubscriptionFilterTag.tag == None))),
>> +            *conditions)
>> +        first_select = Select(BugSubscriptionFilter.id, **first_select)
>> +        # We have our first clause.  Now we start on the second one:
>> +        # handling filters that match *all* tags. Our WHERE clause
>> +        # is straightforward and, it should be clear that we are
>> +        # simply focusing on BugSubscriptionFilter.find_all_tags,
>> +        # when the first SELECT did not consider it.
>> +        select_args['where'] = And(
>> +            BugSubscriptionFilter.find_all_tags, *conditions)
>> +        # The GROUP BY collects the filters together.
>> +        select_args['group_by'] = (BugSubscriptionFilter.id,)
>> +        # Now it is time for the HAVING clause, which is where some
>> +        # tricky bits happen. We first make a SQL snippet that
>> +        # represents the tags on this bug.  It is straightforward
>> +        # except for one subtle hack: the addition of the empty
>> +        # space in the array.  This is because we are going to be
>> +        # aggregating the tags on the filters using ARRAY_AGG, which
>> +        # includes NULLs (unlike most other aggregators).  That
>> +        # is an issue here because we use CASE statements to divide
>> +        # up the set of tags that are supposed to be included and
>> +        # supposed to be excluded.  This means that if we aggregate
>> +        # "CASE WHEN BugSubscriptionFilterTag.include THEN
>> +        # BugSubscriptionFilterTag.tag END" then that array will
>> +        # include NULL.  SQL treats NULLs as unknowns that can never
>> +        # be matched, so the array of ['foo', 'bar', NULL] does not
>> +        # contain the array of ['foo', NULL] ("SELECT
>> +        # ARRAY['foo','bar',NULL]::TEXT[] @>
>> +        # ARRAY['foo',NULL]::TEXT[];" is false).  Therefore, so we
>> +        # can make the HAVING statement we want to make without
>> +        # defining a custom Postgres aggregator, we use a single
>> +        # space as, effectively, NULL.  This is safe because a
>> +        # single space is not an acceptable tag.  Again, the
>> +        # clearest alternative is defining a custom Postgres aggregator.
>> +        tags_array = "ARRAY[%s,' ']::TEXT[]" % ",".join(
>> +            quote(tag) for tag in tags)
>> +        # Now comes the HAVING clause itself.
>> +        select_args['having'] = And(
>> +            # The list of tags should be a superset of the filter tags to
>> +            # be included.
>> +            ArrayContains(
>> +                SQL(tags_array),
>> +                # This next line gives us an array of the tags that the
>> +                # filter wants to include.  Notice that it includes the
>> +                # empty string when the condition does not match, per the
>> +                # discussion above.
>> +                ArrayAgg(
>> +                   SQL("CASE WHEN BugSubscriptionFilterTag.include "
>> +                       "THEN BugSubscriptionFilterTag.tag "
>> +                       "ELSE ' '::TEXT END"))),
>> +            # The list of tags should also not intersect with the
>> +            # tags that the filter wants to exclude.
>> +            Not(
>> +                ArrayIntersects(
>> +                    SQL(tags_array),
>> +                    # This next line gives us an array of the tags
>> +                    # that the filter wants to exclude.  We do not bother
>>> +                    # with the empty string, and therefore allow NULLs
>> +                    # into the array, because in this case we are
>> +                    # determining whether the sets intersect, not if the
>> +                    # first set subsumes the second.
>> +                    ArrayAgg(
>> +                       SQL('CASE WHEN '
>> +                           'NOT BugSubscriptionFilterTag.include '
>> +                           'THEN BugSubscriptionFilterTag.tag END')))))
>> +        # Everything is ready.  Make our second SELECT statement, UNION
>> +        # it, and return it.
>> +        return candidates, Union(
>> +            first_select,
>> +            Select(
>> +                BugSubscriptionFilter.id,
>> +                **select_args))
>> 
> 
> Okay, I *think* I get this, but man is this a hard one to parse.

Yeah. :-/  I hope the comments helped a bit, at least.

> It may, in subsequent branches, be worth trying to break this into smaller bits.

It's not immediately obvious to me how to do this with the underlying approach.  Moreover, when I search the web for SQL optimization techniques, in general, readability usually seems to be lost...maybe even at a faster subjective rate than when other languages are optimized. But yes, I will try to figure out a way to come up with smaller semantic bits for a subsequent branch.  No promises.

I'm starting to wonder if we should be doing more "raw" SQL like this, as the replacement for ORM-y stuff or any other higher-level solutions.  ORMs seem more and more like poisoned candy the more I use them.  If that's a tenable practical position, it will require readable patterns for doing complex things like this.  Please let me know, here or over some other channel, if you have any thoughts, ideas or suggestions in this regard.

> It may not be a bad idea to get someone with stronger storm/sql fu to take a look at just this one function.

I think Danilo's review earlier today might be sufficient in that regard, at least to get this first cut in the tree (which is important because it is blocking some of my squad's work).  The tests pass, and the SQL has been reviewed by Danilo, the SQL is demonstrably significantly shorter than what we had before, and the proof will be in the pudding  when we see how this performs on staging (which is where we see the performance problem right now, because there every structural subscription has at least one filter, as described by the comment where you requested I make a bug and an XXX.

Thank you

Gary

Revision history for this message

j.c.sackett (jcsackett) wrote on 2011-03-04:

Re: the three vs four, I'm fine leaving it as is, I was just confused. A quick comment wouldn't suck, but I'm not a stickler.

Re: breaking the huge function into little bits. I think you're probably right about it not being doable. I might agree with you on the ORM vs raw SQL design patterns, but let's take that discussion out of this medium and to irc or email.

For everything else, thanks for agreeing to the changes and thanks for this impressive branch.

review: Approve

Revision history for this message

Gary Poster (gary) wrote on 2011-03-04:

Thank you for the review!

bug 728818