Merge lp://staging/~gmb/launchpad/stored-proc-for-bug-heat-bug-582195 into lp://staging/launchpad/db-devel

Proposed by Graham Binns
Status: Merged
Merged at revision: 9414
Proposed branch: lp://staging/~gmb/launchpad/stored-proc-for-bug-heat-bug-582195
Merge into: lp://staging/launchpad/db-devel
Diff against target: 387 lines (+211/-33)
6 files modified
database/schema/security.cfg (+1/-0)
database/schema/trusted.sql (+115/-0)
lib/lp/bugs/configure.zcml (+4/-2)
lib/lp/bugs/doc/bug-heat.txt (+66/-20)
lib/lp/bugs/interfaces/bug.py (+3/-0)
lib/lp/bugs/model/bug.py (+22/-11)
To merge this branch: bzr merge lp://staging/~gmb/launchpad/stored-proc-for-bug-heat-bug-582195
Reviewer Review Type Date Requested Status
Curtis Hovey (community) rc Approve
Björn Tillenius (community) db Approve
Stuart Bishop (community) db Approve
Canonical Launchpad Engineering code Pending
Review via email: mp+26064@code.staging.launchpad.net

This proposal supersedes a proposal from 2010-05-26.

Commit message

Add a stored procedure to update bug heat.

Description of the change

This branch takes the code in lib/lp/bugs/scripts/bugheat.py and converts it into a plpython stored procedure.

The idea behind this is that we need to do bug heat calculations in the DB rather than in code, since the setup and teardown surrounding doing the calculations in code is far to slow and cumbersome.

I've added an updateHeat() method to IBug, which calls the new stored procedure. I've used this in setPrivate() and setSecurity() to replace the manual adding-and-subtracting of bug heat that we currently do at those callsites.

In psql:

  SELECT calculate_bug_heat(1);

In iharness:

  >>> from lp.bugs.scripts.bugheat import BugHeatCalculator
  >>> bug_1 = getUtility(IBugSet).get(1)
  >>> calculator = BugHeatCalculator(bug_1)
  >>> calculator.getBugHeat()

To post a comment you must log in.
Revision history for this message
Björn Tillenius (bjornt) wrote : Posted in a previous version of this proposal

On Wed, May 26, 2010 at 10:50:49AM -0000, Graham Binns wrote:
> The stored procedure is currently not hooked up to anything, but once
> the patch is applied it's easy to check the results of the stored
> procedure against the existing code using psql and an iharness session,
> thus:

Why not hook it up directly? It would be useful to have this tested,
since it's hard to check that it's doing what it should be doing.

--
Björn Tillenius | https://launchpad.net/~bjornt

Revision history for this message
Graham Binns (gmb) wrote : Posted in a previous version of this proposal

2010/5/26 Björn Tillenius <email address hidden>:
> On Wed, May 26, 2010 at 10:50:49AM -0000, Graham Binns wrote:
>> The stored procedure is currently not hooked up to anything, but once
>> the patch is applied it's easy to check the results of the stored
>> procedure against the existing code using psql and an iharness session,
>> thus:
>
> Why not hook it up directly? It would be useful to have this tested,
> since it's hard to check that it's doing what it should be doing.
>

I was going to do it in a subsequent branch, but since it's now late
in stub's day anyway I'll do it in this one and resubmit.

--
Graham Binns | PGP Key: EC66FA7D

Revision history for this message
Graham Binns (gmb) wrote :

Resubmitted. I've created an updateHeat() method on IBug which calls the stored procedure to update the bug's heat. I've hooked that up to all the sites which currently updated heat immediately (i.e. Bug.setPrivate() and setSecurity()). I'll start another branch to completely replace all the uses of CalculateBugHeatJob since that's likely to be several hundred lines long and would almost definitely exceed the limit were I to do it here.

Revision history for this message
Stuart Bishop (stub) wrote :
Download full text (7.2 KiB)

On Wed, May 26, 2010 at 10:26 PM, Graham Binns <email address hidden> wrote:

> === added file 'database/schema/patch-2207-99-0.sql'

There is no need for a database patch - the stored procedure should be
added to trusted.sql (which is becoming unwieldy - I think I need to
split it up).

> +CREATE OR REPLACE FUNCTION calculate_bug_heat(bug_id integer) RETURNS integer
> +LANGUAGE plpythonu AS $$
> +    from datetime import datetime
> +
> +    class BugHeatConstants:
> +        PRIVACY = 150
> +        SECURITY = 250
> +        DUPLICATE = 6
> +        AFFECTED_USER = 4
> +        SUBSCRIBER = 2
> +
> +
> +    def get_max_heat_for_bug(bug_id):
> +        bug_tasks = plpy.execute(
> +            "SELECT * FROM BugTask WHERE bug = %s" % bug_id)
> +
> +        max_heats = []
> +        for bug_task in bug_tasks:
> +            if bug_task['product'] is not None:
> +                product = plpy.execute(
> +                    "SELECT max_bug_heat FROM Product WHERE id = %s" %
> +                    bug_task['product'])[0]
> +                max_heats.append(product['max_bug_heat'])
> +            elif bug_task['distribution']:
> +                distribution = plpy.execute(
> +                    "SELECT max_bug_heat FROM Distribution WHERE id = %s" %
> +                    bug_task['distribution'])[0]
> +                max_heats.append(distribution['max_bug_heat'])
> +            elif bug_task['productseries'] is not None:
> +                product_series = plpy.execute("""
> +                    SELECT Product.max_bug_heat
> +                      FROM ProductSeries, Product
> +                     WHERE ProductSeries.Product = Product.id
> +                       AND ProductSeries.id = %s"""%
> +                    bug_task['productseries'])[0]
> +                max_heats.append(product_series['max_bug_heat'])
> +            elif bug_task['distroseries']:
> +                distro_series = plpy.execute("""
> +                    SELECT Distribution.max_bug_heat
> +                      FROM DistroSeries, Distribution
> +                     WHERE DistroSeries.Distribution = Distribution.id
> +                       AND DistroSeries.id = %s"""%
> +                    bug_task['distroseries'])[0]
> +                max_heats.append(distro_series['max_bug_heat'])
> +            else:
> +                pass
> +
> +        return max(max_heats)

You can do this in a single query rather than one per task:

SELECT MAX(GREATEST(Product.max_bug_heat, Distribution.max_bug_heat))
FROM BugTask
LEFT OUTER JOIN ProductSeries ON BugTask.productseries = ProductSeries.id
LEFT OUTER JOIN Product ON (
    BugTask.product = Product.id
    OR ProductSeries.product = Product.id)
LEFT OUTER JOIN DistroSeries ON BugTask.distroseries = DistroSeries.id
LEFT OUTER JOIN Distribution ON (
    BugTask.distribution = Distribution.id
    OR DistroSeries.distribution = Distribution.id)
WHERE
    BugTask.bug = 1

> +    # It would be nice to be able to just SELECT * here, but we need to
> +    # format the timestamps so that datetime.strptime() won't choke on
> +    # them.

Using epoch time would avoid the need for parsing strings (use
'EXTRACT(epoch FROM datecreated)' i...

Read more...

Revision history for this message
Stuart Bishop (stub) wrote :
Download full text (7.2 KiB)

On Wed, May 26, 2010 at 10:26 PM, Graham Binns <email address hidden> wrote:

> === added file 'database/schema/patch-2207-99-0.sql'

There is no need for a database patch - the stored procedure should be added to trusted.sql (which is becoming unwieldy - I think I need to split it up).

> +CREATE OR REPLACE FUNCTION calculate_bug_heat(bug_id integer) RETURNS integer
> +LANGUAGE plpythonu AS $$
> + from datetime import datetime
> +
> + class BugHeatConstants:
> + PRIVACY = 150
> + SECURITY = 250
> + DUPLICATE = 6
> + AFFECTED_USER = 4
> + SUBSCRIBER = 2
> +
> +
> + def get_max_heat_for_bug(bug_id):
> + bug_tasks = plpy.execute(
> + "SELECT * FROM BugTask WHERE bug = %s" % bug_id)
> +
> + max_heats = []
> + for bug_task in bug_tasks:
> + if bug_task['product'] is not None:
> + product = plpy.execute(
> + "SELECT max_bug_heat FROM Product WHERE id = %s" %
> + bug_task['product'])[0]
> + max_heats.append(product['max_bug_heat'])
> + elif bug_task['distribution']:
> + distribution = plpy.execute(
> + "SELECT max_bug_heat FROM Distribution WHERE id = %s" %
> + bug_task['distribution'])[0]
> + max_heats.append(distribution['max_bug_heat'])
> + elif bug_task['productseries'] is not None:
> + product_series = plpy.execute("""
> + SELECT Product.max_bug_heat
> + FROM ProductSeries, Product
> + WHERE ProductSeries.Product = Product.id
> + AND ProductSeries.id = %s"""%
> + bug_task['productseries'])[0]
> + max_heats.append(product_series['max_bug_heat'])
> + elif bug_task['distroseries']:
> + distro_series = plpy.execute("""
> + SELECT Distribution.max_bug_heat
> + FROM DistroSeries, Distribution
> + WHERE DistroSeries.Distribution = Distribution.id
> + AND DistroSeries.id = %s"""%
> + bug_task['distroseries'])[0]
> + max_heats.append(distro_series['max_bug_heat'])
> + else:
> + pass
> +
> + return max(max_heats)

You can do this in a single query rather than one per task:

SELECT MAX(GREATEST(Product.max_bug_heat, Distribution.max_bug_heat))
FROM BugTask
LEFT OUTER JOIN ProductSeries ON BugTask.productseries = ProductSeries.id
LEFT OUTER JOIN Product ON (
    BugTask.product = Product.id
    OR ProductSeries.product = Product.id)
LEFT OUTER JOIN DistroSeries ON BugTask.distroseries = DistroSeries.id
LEFT OUTER JOIN Distribution ON (
    BugTask.distribution = Distribution.id
    OR DistroSeries.distribution = Distribution.id)
WHERE
    BugTask.bug = 1

> + # It would be nice to be able to just SELECT * here, but we need to
> + # format the timestamps so that datetime.strptime() won't choke on
> + # them.

Using epoch time would avoid the need for parsing strings (use 'EXTRACT(epoch FROM datecreated)' i...

Read more...

review: Needs Fixing (db)
Revision history for this message
Graham Binns (gmb) wrote :
Download full text (6.9 KiB)

On Thu, May 27, 2010 at 05:13:27AM -0000, Stuart Bishop wrote:
> Review: Needs Fixing db
> On Wed, May 26, 2010 at 10:26 PM, Graham Binns <email address hidden> wrote:
>
> > === added file 'database/schema/patch-2207-99-0.sql'
>
> There is no need for a database patch - the stored procedure should be added to trusted.sql (which is becoming unwieldy - I think I need to split it up).

Righto, I've moved it.

>[snip]
>
> You can do this in a single query rather than one per task:
>
> SELECT MAX(GREATEST(Product.max_bug_heat, Distribution.max_bug_heat))
> FROM BugTask
> LEFT OUTER JOIN ProductSeries ON BugTask.productseries = ProductSeries.id
> LEFT OUTER JOIN Product ON (
> BugTask.product = Product.id
> OR ProductSeries.product = Product.id)
> LEFT OUTER JOIN DistroSeries ON BugTask.distroseries = DistroSeries.id
> LEFT OUTER JOIN Distribution ON (
> BugTask.distribution = Distribution.id
> OR DistroSeries.distribution = Distribution.id)
> WHERE
> BugTask.bug = 1
>

Sweet; fixed.

> > + # It would be nice to be able to just SELECT * here, but we need to
> > + # format the timestamps so that datetime.strptime() won't choke on
> > + # them.
>
> Using epoch time would avoid the need for parsing strings (use
> 'EXTRACT(epoch FROM datecreated)' instead of 'TO_CHAR(...)'). It
> probably doesn't save any significant time though.

I've used epoch time instead (see below).

>
> > + if bug_data.nrows() == 0:
> > + return 0
>
> I'd return None or raise an exception here - the bug doesn't exist so
> the heat is undefined.

I've gone for raising an exception; is it worth defining an exception
subclass to use here or is Exception() okay?

> > + bug = bug_data[0]
> > + if bug['duplicateof'] is not None:
> > + return 0
>
> Again, None or the heat of the master bug. We should catch cases where
> we are attempting to calculate the bug heat of non-existent or
> duplicate bugs.
>

Okay. I'll update Bug.updateHeat() so that it doesn't try calling this
function if the bug is a duplicate and have this return None.

> > + if bug['private']:
> > + heat['privacy'] = BugHeatConstants.PRIVACY
> > + if bug['security_related']:
> > + heat['security'] = BugHeatConstants.SECURITY
> > +
> > + # Get the heat from subscribers.
> > + sub_count = plpy.execute("""
> > + SELECT bug, COUNT(id) AS sub_count
> > + FROM BugSubscription
> > + WHERE bug = %s
> > + GROUP BY bug
> > + """ % bug_id)
>
> This is unnecessarily complex. The simpler query is just "SELECT COUNT(*) FROM BugSubscription WHERE bug=%s".
>

Right; this is because I ripped it off from the query we were using to
update the bugs' heat manually :). Fixed now.

> > + if sub_count.nrows() > 0:
> > + heat['subscribers'] = (
> > + BugHeatConstants.SUBSCRIBER * sub_count[0]['sub_count'])
>
> You don't need the nrows() guard, as the query will always return one row.

Ok.

> > + # Get the heat from subscribers via duplicates.
> > + subs_from_dupes = plpy.execute("""
> > + SELECT bug.duplicateof AS dupe_of,
> > + COUNT(bugsubscription.id) AS...

Read more...

Revision history for this message
Stuart Bishop (stub) wrote :

On Thu, May 27, 2010 at 6:42 PM, Graham Binns <email address hidden> wrote:

> I've gone for raising an exception; is it worth defining an exception
> subclass to use here or is Exception() okay?

Exception() is fine. Nothing outside this function would know about
your custom exception anyway.

>> You sure about that? Storm should be applying updates in order, so
>> when calculate_bug_heat is actually executed (at commit time or when
>> heat is next accessed) changes will be implicitly flushed.
>
> Hmm. I put the flush() in there because in testing it seemed like heat
> lagged behind where it should be. If I remove this, I get this result:

> So I think this is definitely necessary (or maybe it's just necessary
> for testing; I don't know. What are your recommendations?)

Your empirical results beat my fantasy, so leave it in.

Please change 'LANGUAGE plpythonu' to 'LANGUAGE plpythonu STABLE
RETURNS NULL ON NULL INPUT'

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

Revision history for this message
Stuart Bishop (stub) :
review: Approve (db)
Revision history for this message
Graham Binns (gmb) wrote :

On 27 May 2010 15:33, Stuart Bishop <email address hidden> wrote:
> On Thu, May 27, 2010 at 6:42 PM, Graham Binns <email address hidden> wrote:
>
>> I've gone for raising an exception; is it worth defining an exception
>> subclass to use here or is Exception() okay?
>
> Exception() is fine. Nothing outside this function would know about
> your custom exception anyway.

Cool.

>>> You sure about that? Storm should be applying updates in order, so
>>> when calculate_bug_heat is actually executed (at commit time or when
>>> heat is next accessed) changes will be implicitly flushed.
>>
>> Hmm. I put the flush() in there because in testing it seemed like heat
>> lagged behind where it should be. If I remove this, I get this result:
>
>> So I think this is definitely necessary (or maybe it's just necessary
>> for testing; I don't know. What are your recommendations?)
>
> Your empirical results beat my fantasy, so leave it in.

Fair enough.

> Please change 'LANGUAGE plpythonu' to 'LANGUAGE plpythonu STABLE
> RETURNS NULL ON NULL INPUT'

Done.

Revision history for this message
Björn Tillenius (bjornt) :
review: Approve (db)
Revision history for this message
Curtis Hovey (sinzui) wrote :

Thanks for asking for the RC before PQM closes. I do not want to delay QA of this.

review: Approve (rc)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches

to status/vote changes: