Merge lp://staging/~sinzui/launchpad/fast-prf into lp://staging/launchpad
Status: | Merged | ||||
---|---|---|---|---|---|
Approved by: | Curtis Hovey | ||||
Approved revision: | no longer in the source branch. | ||||
Merged at revision: | 16302 | ||||
Proposed branch: | lp://staging/~sinzui/launchpad/fast-prf | ||||
Merge into: | lp://staging/launchpad | ||||
Diff against target: |
381 lines (+123/-106) 3 files modified
lib/lp/registry/scripts/productreleasefinder/finder.py (+79/-78) lib/lp/registry/tests/test_prf_finder.py (+40/-25) lib/lp/testing/factory.py (+4/-3) |
||||
To merge this branch: | bzr merge lp://staging/~sinzui/launchpad/fast-prf | ||||
Related bugs: |
|
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
William Grant (community) | code | Approve | |
Review via email: mp+135451@code.staging.launchpad.net |
Commit message
Do fewer db queries to make the product release finder fast.
Description of the change
The product release finder script works with strings. It looks up
projects, series, releases, etc that match the string. The script
attempts to minimise the number of objects it works with, but the
approach actually makes more queries than are necessary because the PRF
will always walk a directory. While each check of a matched file is
economical, the loop over all files found for a release file glob will
lookup the project, series, milestone and release multiple times to
discover that files was already uploaded.
-------
RULES
Pre-
* The PRF creates about 1 release per run, so we can expect it
to do a handful of extra db calls while creating the needed objects
for one product release each run, between 3 and 5 extra queries.
We want:
* 1 query to start the PRF
* 1 query per project to get the needed information to check if a
found release needs to be uploaded.
* 3-5 queries to create the upload
* Refactor ProductReleaseF
a single query. the PRF wants strings, so there is no need to
populate the Storm object cache.
* Update handleProduct to get a set of all the known release files
and pass it to the methods that need it so that checking for
existing files is cheap and fast.
ADDENDUM
* The file pointer is not explicit closed in handleRelease(). This
can be a problem since addReleaseTarball() can raise an exception,
which will circumvent the rule to automatically close the file
pointer when the identifier leaves scope.
* This may have contributed to performance issues in the past.
QA
* This is untestable on qastaging, but a dev instance can be used
to crawl an example glob from production.
* Wow! the dev instance is much faster. I tested using the Rhythmbox
release file glob. Pulling down the tarballs was about twice
as fast. The second run which had no work to do was about 8 times
faster.
LINT
lib/
lib/
lib/
LoC:
I have about 3,000 lines of credit this week.
TEST
./bin/test -vv lp.registry.
IMPLEMENTATION
I refactored getFilters() to query just for the needed strings. This
reduced the startup time and does not require the storm object cache to
be populated. I then added getReleaseFileN
file names for the project. Again this is string data so the storm
object cache is not involved. I updated handleProduct() to call
getReleaseFileN
change was an invasive change to several methods and tests :/. I placed
the file pointer code into a try-finally block to ensure it is closed if
an exception is (and has been) raised.
lib/
lib/
I updated the factory to allow me to create distict release file names,
which Lp requires.
lib/
Have you considered using the transaction decorators in lp.services. database? read_transaction, write_transaction, and retry_transaction can all be quite handy for replacing manual transaction management.
65 + for product_name, series_name, glob in found_globs:
66 + if last_product and last_product != product_name:
The rows won't necessarily be returned in product order, so this check doesn't make much sense.
114 + def hasReleaseFile( self, filename, file_names):
The new version of this method is the dictionary definition of "not worth it". I'd inline it.
101 + found_names = IStore( Product) .find( as.filename) ,
102 + (LibraryFileAli
Those parens are redundant.
110 + file_names = set(name for name in found_names)
That's just "file_names = set(found_names)"