Merge lp://staging/~rvb/maas/nonce-bug-1190986 into lp://staging/~maas-committers/maas/trunk

Proposed by Raphaël Badin
Status: Merged
Approved by: Raphaël Badin
Approved revision: no longer in the source branch.
Merged at revision: 1697
Proposed branch: lp://staging/~rvb/maas/nonce-bug-1190986
Merge into: lp://staging/~maas-committers/maas/trunk
Diff against target: 222 lines (+213/-0)
2 files modified
src/maasserver/nonces_cleanup.py (+94/-0)
src/maasserver/tests/test_nonces_cleanup.py (+119/-0)
To merge this branch: bzr merge lp://staging/~rvb/maas/nonce-bug-1190986
Reviewer Review Type Date Requested Status
Gavin Panella (community) Approve
Review via email: mp+189790@code.staging.launchpad.net

Commit message

Add utility to remove old nonces.

Description of the change

Note that this is *just* the utility itself (function cleanup_old_nonces). Now this needs to be "croned" using either a cronjob or — preferably — a celery beat tasks + an API call.

To post a comment you must log in.
Revision history for this message
Gavin Panella (allenap) wrote :

Looks good, and easy to understand!

[1]

+    create_checkpoint_nonce()
+    # Delete old nonces.
+    checkpoint = find_checkpoint_nonce()
+    if checkpoint is None:
+        return 0
+    return delete_old_nonces(checkpoint)

Seems odd to create the new nonce before deleting; there is a very
slow race condition there.

[2]

+def delete_old_nonces(checkpoint):
+    """Delete nonces older than the given nonce."""
+    nonce_to_delete = Nonce.objects.filter(id__lte=checkpoint.id)
+    count = nonce_to_delete.count()
+    nonce_to_delete.delete()
+    return count

Do we need the count?

[3]

+from __future__ import (
+    absolute_import,
+    print_function,
+    unicode_literals,
+    )
+
+__metaclass__ = type
+__all__ = []

Please merge trunk and update this to the newest template; there's a
'str = None' missing from there.

[4]

+        self.patch(module_time, "time").return_value = (
+            now - timestamp_threshold)

I'd hesitate from patching time.time() - it may be used by nose before
the patching machinery is invoked to repair its work. How about:

        from maasserver import nonces_cleanup

        timemod = self.patch(nonces_cleanup, "time")
        timemod.time.return_value = now - timestamp_threshold
        ...
        timemod.time.return_value = now

?

[5]

+        self.assertLessEqual(before, float(checkpoint.key.strip(key_suffix)))
+        self.assertGreaterEqual(
+            after, float(checkpoint.key.strip(key_suffix)))

strip() doesn't work like that; key_suffix will be used as
*characters* to strip. In this instance it works, but perhaps a
different mechanism ought to be used, e.g.:

        checkpoint_time = float(re.search(r'[\d.]+', checkpoint.key).group(0))

Also, with testtools you can do the above check in one go:

        self.assertThat(checkpoint_time, MatchesAll(
            MatchesAny(Equals(before), GreaterThan(before)),
            MatchesAny(Equals(after), LessThan(after))))

Though it's not really any clearer.

review: Approve
Revision history for this message
Raphaël Badin (rvb) wrote :

Thanks for the review!
>
>
> [1]
>
> +    create_checkpoint_nonce()
> +    # Delete old nonces.
> +    checkpoint = find_checkpoint_nonce()
> +    if checkpoint is None:
> +        return 0
> +    return delete_old_nonces(checkpoint)
>
> Seems odd to create the new nonce before deleting; there is a very
> slow race condition there.

Where do you see the race condition exactly?

> [2]
>
> +def delete_old_nonces(checkpoint):
> +    """Delete nonces older than the given nonce."""
> +    nonce_to_delete = Nonce.objects.filter(id__lte=checkpoint.id)
> +    count = nonce_to_delete.count()
> +    nonce_to_delete.delete()
> +    return count
>
> Do we need the count?

Well, we could do without, but it's pretty cheap, and I think it's pretty useful to give the caller some feedback about the actual clean-up (i.e. did it really happen or not); especially considering the two-phase strategy.

> [3]
>
> +from __future__ import (
> +    absolute_import,
> +    print_function,
> +    unicode_literals,
> +    )
> +
> +__metaclass__ = type
> +__all__ = []
>
> Please merge trunk and update this to the newest template; there's a
> 'str = None' missing from there.

All right, done.

> [4]
>
> +        self.patch(module_time, "time").return_value = (
> +            now - timestamp_threshold)
>
> I'd hesitate from patching time.time() - it may be used by nose before
> the patching machinery is invoked to repair its work. How about:
>
>        from maasserver import nonces_cleanup
>
>        timemod = self.patch(nonces_cleanup, "time")
>        timemod.time.return_value = now - timestamp_threshold
>        ...
>        timemod.time.return_value = now

Indeed, it's a tad cleaner, done.

> [5]
>
> +        self.assertLessEqual(before, float(checkpoint.key.strip(key_suffix)))
> +        self.assertGreaterEqual(
> +            after, float(checkpoint.key.strip(key_suffix)))
>
> strip() doesn't work like that; key_suffix will be used as
> *characters* to strip. In this instance it works, but perhaps a
> different mechanism ought to be used, e.g.:
>
>        checkpoint_time = float(re.search(r'[\d.]+', checkpoint.key).group(0))
>
> Also, with testtools you can do the above check in one go:
>
>        self.assertThat(checkpoint_time, MatchesAll(
>            MatchesAny(Equals(before), GreaterThan(before)),
>            MatchesAny(Equals(after), LessThan(after))))
>
> Though it's not really any clearer.

I've refactored that code a bit, instead of the heavy artillery, I've used "checkpoint_time = checkpoint.key[:-len(key_suffix)]".

Revision history for this message
Gavin Panella (allenap) wrote :

> > [1]
> >
> > +    create_checkpoint_nonce()
> > +    # Delete old nonces.
> > +    checkpoint = find_checkpoint_nonce()
> > +    if checkpoint is None:
> > +        return 0
> > +    return delete_old_nonces(checkpoint)
> >
> > Seems odd to create the new nonce before deleting; there is a very
> > slow race condition there.
>
> Where do you see the race condition exactly?

In cleanup_old_nonces():

+    create_checkpoint_nonce()
+    # Delete old nonces.
+    checkpoint = find_checkpoint_nonce()

If there's a delay between create_checkpoint_nonce() and
find_checkpoint_nonce() of more than timestamp_threshold - which I
grant is *extremely* unlikely - then it'll delete the nonce just
created.

However, it doesn't matter anyway: by then it'll be time to delete the
new nonce! So, it's fine to leave it as it is.

>
> > [2]
> >
> > +def delete_old_nonces(checkpoint):
> > +    """Delete nonces older than the given nonce."""
> > +    nonce_to_delete = Nonce.objects.filter(id__lte=checkpoint.id)
> > +    count = nonce_to_delete.count()
> > +    nonce_to_delete.delete()
> > +    return count
> >
> > Do we need the count?
>
> Well, we could do without, but it's pretty cheap, and I think it's pretty
> useful to give the caller some feedback about the actual clean-up (i.e. did it
> really happen or not); especially considering the two-phase strategy.

The count is not too bad as long as indexes are being used -
http://wiki.postgresql.org/wiki/Slow_Counting - but note the caveat:

  PostgreSQL will still need to read the resulting rows to verify that
  they exist; other database systems may only need to reference the
  index in this situation.

I guess it's going to be visiting those rows again very soon to delete
them, so they'll be hot and the delete will be quicker, but that might
be a false assumption. When regularly deleting nonces it probably
won't be a significant overhead, but the first run of this code might
be punishing.

PostgreSQL does return a count of deleted rows, eliminating the need
for a separate count, which might be a way to mitigate this. I wonder
if Django surfaces that in some way?

See http://www.postgresql.org/docs/9.1/interactive/sql-delete.html.

Another thought: key__endswith probably creates a LIKE '%SUFFIX'
clause, which can be very inefficient. I suggest talking to Jeroen or
Stuart Bishop about this. Can this code be changed to use a prefix
match instead? Can you check that an index on Nonce.key exists, and
that it supports prefix (or suffix) matches?

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

> PostgreSQL does return a count of deleted rows, eliminating the need
> for a separate count, which might be a way to mitigate this. I wonder
> if Django surfaces that in some way?

Even if it doesn't, psycopg2 does. It's cursor.rowcount, right after the deletion.

.

> Another thought: key__endswith probably creates a LIKE '%SUFFIX'
> clause, which can be very inefficient. I suggest talking to Jeroen or
> Stuart Bishop about this. Can this code be changed to use a prefix
> match instead? Can you check that an index on Nonce.key exists, and
> that it supports prefix (or suffix) matches?

You can create a function index on the result of "LIKE '%SUFFIX'". But I do wonder: after the initial cleanup, how many nonces can there ever be? Maybe it's worth dealing with the initial cleanup as a separate situation, where we can cut corners and/or just accept a delay.

Revision history for this message
Raphaël Badin (rvb) wrote :

> > > [1]
> > >
> > > +    create_checkpoint_nonce()
> > > +    # Delete old nonces.
> > > +    checkpoint = find_checkpoint_nonce()
> > > +    if checkpoint is None:
> > > +        return 0
> > > +    return delete_old_nonces(checkpoint)
> > >
> > > Seems odd to create the new nonce before deleting; there is a very
> > > slow race condition there.
> >
> > Where do you see the race condition exactly?
>
> In cleanup_old_nonces():
>
> +    create_checkpoint_nonce()
> +    # Delete old nonces.
> +    checkpoint = find_checkpoint_nonce()
>
> If there's a delay between create_checkpoint_nonce() and
> find_checkpoint_nonce() of more than timestamp_threshold - which I
> grant is *extremely* unlikely - then it'll delete the nonce just
> created.
>
> However, it doesn't matter anyway: by then it'll be time to delete the
> new nonce! So, it's fine to leave it as it is.

That's what I had in mind, we can call that a race, because technically, this is a race but it has no bad consequence whatsoever. Plus it's highly unlikely.

> PostgreSQL does return a count of deleted rows, eliminating the need
> for a separate count, which might be a way to mitigate this. I wonder
> if Django surfaces that in some way?

I don't think Django exposes that. There is a still-open tick for it in Django bugtracker: https://code.djangoproject.com/ticket/16891.

I tried to see if I could find the 'rowcount' in Django's guts after the query has been performed but I can't so far…

> Another thought: key__endswith probably creates a LIKE '%SUFFIX'
> clause, which can be very inefficient. I suggest talking to Jeroen or
> Stuart Bishop about this. Can this code be changed to use a prefix
> match instead? Can you check that an index on Nonce.key exists, and
> that it supports prefix (or suffix) matches?

I changed that to using prefixes.

But I'm not really concerned about the efficiency here considering that only the *second* run can be a bit inefficient.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.