Code review comment for lp://staging/~gmb/launchpad/stored-proc-for-bug-heat-bug-582195

Revision history for this message
Graham Binns (gmb) wrote :

On Thu, May 27, 2010 at 05:13:27AM -0000, Stuart Bishop wrote:
> Review: Needs Fixing db
> On Wed, May 26, 2010 at 10:26 PM, Graham Binns <email address hidden> wrote:
>
> > === added file 'database/schema/patch-2207-99-0.sql'
>
> There is no need for a database patch - the stored procedure should be added to trusted.sql (which is becoming unwieldy - I think I need to split it up).

Righto, I've moved it.

>[snip]
>
> You can do this in a single query rather than one per task:
>
> SELECT MAX(GREATEST(Product.max_bug_heat, Distribution.max_bug_heat))
> FROM BugTask
> LEFT OUTER JOIN ProductSeries ON BugTask.productseries = ProductSeries.id
> LEFT OUTER JOIN Product ON (
> BugTask.product = Product.id
> OR ProductSeries.product = Product.id)
> LEFT OUTER JOIN DistroSeries ON BugTask.distroseries = DistroSeries.id
> LEFT OUTER JOIN Distribution ON (
> BugTask.distribution = Distribution.id
> OR DistroSeries.distribution = Distribution.id)
> WHERE
> BugTask.bug = 1
>

Sweet; fixed.

> > + # It would be nice to be able to just SELECT * here, but we need to
> > + # format the timestamps so that datetime.strptime() won't choke on
> > + # them.
>
> Using epoch time would avoid the need for parsing strings (use
> 'EXTRACT(epoch FROM datecreated)' instead of 'TO_CHAR(...)'). It
> probably doesn't save any significant time though.

I've used epoch time instead (see below).

>
> > + if bug_data.nrows() == 0:
> > + return 0
>
> I'd return None or raise an exception here - the bug doesn't exist so
> the heat is undefined.

I've gone for raising an exception; is it worth defining an exception
subclass to use here or is Exception() okay?

> > + bug = bug_data[0]
> > + if bug['duplicateof'] is not None:
> > + return 0
>
> Again, None or the heat of the master bug. We should catch cases where
> we are attempting to calculate the bug heat of non-existent or
> duplicate bugs.
>

Okay. I'll update Bug.updateHeat() so that it doesn't try calling this
function if the bug is a duplicate and have this return None.

> > + if bug['private']:
> > + heat['privacy'] = BugHeatConstants.PRIVACY
> > + if bug['security_related']:
> > + heat['security'] = BugHeatConstants.SECURITY
> > +
> > + # Get the heat from subscribers.
> > + sub_count = plpy.execute("""
> > + SELECT bug, COUNT(id) AS sub_count
> > + FROM BugSubscription
> > + WHERE bug = %s
> > + GROUP BY bug
> > + """ % bug_id)
>
> This is unnecessarily complex. The simpler query is just "SELECT COUNT(*) FROM BugSubscription WHERE bug=%s".
>

Right; this is because I ripped it off from the query we were using to
update the bugs' heat manually :). Fixed now.

> > + if sub_count.nrows() > 0:
> > + heat['subscribers'] = (
> > + BugHeatConstants.SUBSCRIBER * sub_count[0]['sub_count'])
>
> You don't need the nrows() guard, as the query will always return one row.

Ok.

> > + # Get the heat from subscribers via duplicates.
> > + subs_from_dupes = plpy.execute("""
> > + SELECT bug.duplicateof AS dupe_of,
> > + COUNT(bugsubscription.id) AS dupe_sub_count
> > + FROM bugsubscription, bug
> > + WHERE bug.duplicateof = %s
> > + AND Bugsubscription.bug = bug.id
> > + GROUP BY Bug.duplicateof""" % bug_id)
> > + if subs_from_dupes.nrows() > 0:
> > + heat['subcribers_from_dupes'] = (
> > + BugHeatConstants.SUBSCRIBER * sub_count[0]['dupe_sub_count'])
>
> Similar here. nrows() guard is unnecessary. The query is also overly complex:
>
> SELECT COUNT(*) FROM BugSubscription, Bug WHERE Bug.duplicateof = %s AND BugSubscription.bug = Bug.id
>
> I notice that if a person is subscribed to both a duplicate and the master job, they are counted twice. If this is not correct, we can collapse the above two sections into one using the query:
>
> SELECT COUNT(DISTINCT BugSubscription.person)
> FROM BugSubscription, Bug
> WHERE Bug.id = BugSubscription.bug
> AND (Bug.id = %s OR Bug.duplicateof = %s)

Thanks, I'll use this.

> If you want to expand teams to get the genuine number of people subscribed:
>
> SELECT COUNT(DISTINCT TeamParticipation.person)
> FROM TeamParticipation, BugSubscription, Bug
> WHERE
> TeamParticipation.team = BugSubscription.person
> AND Bug.id = BugSubscription.bug
> AND (Bug.id = %s OR Bug.duplicateof = %s)
>
>
> Bug #1 for instance has 313 subscribers without expanding the teams,
> or 678 if you do expand the teams.

Having talked with Deryck, we don't want to do this; it would inflate
the heat counts by far too much.

> > + # Bugs decay over time. Every day the bug isn't touched its heat
> > + # decreases by 1%.
> > + pg_datetime_fmt = "%Y-%m-%d %H:%M:%S"
> > + date_last_updated = datetime.strptime(
> > + bug['formatted_date_last_updated'], pg_datetime_fmt)
> > + days_since_last_update = (datetime.utcnow() - date_last_updated).days
> > + total_heat = int(total_heat * (0.99 ** days_since_last_update))
>
> Yer... this would be simpler and clearer using epoch times.

Right, I've converted it to use epoch time and fromtimestamp().

> > +INSERT INTO LaunchpadDatabaseRevision VALUES (2207, 99, 0);
>
> As before, this should go into trusted.sql rather than a DB patch. We get to treat stored procedures as any other code, unlike database patches which cannot be reverted once applied.

Right.

> > -Sometimes, when a bug changes, we want to see the changes reflected in the bug's
> > -heat value immidiately, without waiting for heat to be recalculated. Currently
> > -we adjust heat immidiately for bug privacy and security.
>
> immediately, not immidiately.

I've already fixed that (you're reading the removal part of the diff
;)).

> > + def updateHeat(self):
> > + """See `IBug`."""
> > + # We need to flush the store first to ensure that changes are
> > + # reflected in the new bug heat total.
> > + store = Store.of(self)
> > + store.flush()
>
> You sure about that? Storm should be applying updates in order, so
> when calculate_bug_heat is actually executed (at commit time or when
> heat is next accessed) changes will be implicitly flushed.

Hmm. I put the flush() in there because in testing it seemed like heat
lagged behind where it should be. If I remove this, I get this result:

    >>> bug = factory.makeBug()
    >>> bug.heat
    8

    >>> changed = bug.setPrivate(True, bug.owner)
    >>> bug.heat
    8

    >>> changed = bug.setSecurityRelated(True, bug.owner)
    >>> bug.heat
    158

Where what I should be seeing is

    >>> bug = factory.makeBug()
    >>> bug.heat
    8

    >>> changed = bug.setPrivate(True, bug.owner)
    >>> bug.heat
    158

    >>> changed = bug.setSecurityRelated(True, bug.owner)
    >>> bug.heat
    408

So I think this is definitely necessary (or maybe it's just necessary
for testing; I don't know. What are your recommendations?)

« Back to merge proposal