Launchpad itself

Merge lp://staging/~sinzui/launchpad/fast-prf into lp://staging/launchpad

fast-prf
Merge into devel

Proposed by Curtis Hovey on 2012-11-21

Status:

Merged

Approved by:

Curtis Hovey on 2012-11-22

Approved revision:

no longer in the source branch.

Merged at revision:

16302

Proposed branch:

lp://staging/~sinzui/launchpad/fast-prf

Merge into:

lp://staging/launchpad

Diff against target:

381 lines (+123/-106)

3 files modified

lib/lp/registry/scripts/productreleasefinder/finder.py (+79/-78)
lib/lp/registry/tests/test_prf_finder.py (+40/-25)
lib/lp/testing/factory.py (+4/-3)

To merge this branch:

bzr merge lp://staging/~sinzui/launchpad/fast-prf

Critical

Fix Released

Link a bug report

Reviewer	Review Type	Date Requested	Status
William Grant (community)	code	2012-11-21	Approve on 2012-11-22
Review via email: mp+135451@code.staging.launchpad.net

Commit message

Do fewer db queries to make the product release finder fast.

Description of the change

The product release finder script works with strings. It looks up
projects, series, releases, etc that match the string. The script
attempts to minimise the number of objects it works with, but the
approach actually makes more queries than are necessary because the PRF
will always walk a directory. While each check of a matched file is
economical, the loop over all files found for a release file glob will
lookup the project, series, milestone and release multiple times to
discover that files was already uploaded.

--------------------------------------------------------------------

RULES

    Pre-implementation: no one
    * The PRF creates about 1 release per run, so we can expect it
      to do a handful of extra db calls while creating the needed objects
      for one product release each run, between 3 and 5 extra queries.
      We want:
      * 1 query to start the PRF
      * 1 query per project to get the needed information to check if a
        found release needs to be uploaded.
      * 3-5 queries to create the upload
    * Refactor ProductReleaseFinder.getFilters to get all the data in
      a single query. the PRF wants strings, so there is no need to
      populate the Storm object cache.
    * Update handleProduct to get a set of all the known release files
      and pass it to the methods that need it so that checking for
      existing files is cheap and fast.

    ADDENDUM
    * The file pointer is not explicit closed in handleRelease(). This
      can be a problem since addReleaseTarball() can raise an exception,
      which will circumvent the rule to automatically close the file
      pointer when the identifier leaves scope.
      * This may have contributed to performance issues in the past.

    * This is untestable on qastaging, but a dev instance can be used
      to crawl an example glob from production.
    * Wow! the dev instance is much faster. I tested using the Rhythmbox
      release file glob. Pulling down the tarballs was about twice
      as fast. The second run which had no work to do was about 8 times
      faster.

LINT

    lib/lp/registry/scripts/productreleasefinder/finder.py
    lib/lp/registry/tests/test_prf_finder.py
    lib/lp/testing/factory.py

LoC:

I have about 3,000 lines of credit this week.

TEST

./bin/test -vv lp.registry.tests.test_prf_finder

IMPLEMENTATION

I refactored getFilters() to query just for the needed strings. This
reduced the startup time and does not require the storm object cache to
be populated. I then added getReleaseFileNames() to get all the known
file names for the project. Again this is string data so the storm
object cache is not involved. I updated handleProduct() to call
getReleaseFileNames(), then pass it to handleRelease(). The signature
change was an invasive change to several methods and tests :/. I placed
the file pointer code into a try-finally block to ensure it is closed if
an exception is (and has been) raised.
lib/lp/registry/scripts/productreleasefinder/finder.py
lib/lp/registry/tests/test_prf_finder.py

I updated the factory to allow me to create distict release file names,
which Lp requires.
lib/lp/testing/factory.py

Revision history for this message

William Grant (wgrant) wrote on 2012-11-22:

Have you considered using the transaction decorators in lp.services.database? read_transaction, write_transaction, and retry_transaction can all be quite handy for replacing manual transaction management.

65 + for product_name, series_name, glob in found_globs:
66 + if last_product and last_product != product_name:

The rows won't necessarily be returned in product order, so this check doesn't make much sense.

114 + def hasReleaseFile(self, filename, file_names):

The new version of this method is the dictionary definition of "not worth it". I'd inline it.

101 + found_names = IStore(Product).find(
102 + (LibraryFileAlias.filename),

Those parens are redundant.

110 + file_names = set(name for name in found_names)

That's just "file_names = set(found_names)"

review: Approve (code)