Merge lp://staging/~jtv/launchpad/db-bug-752279 into lp://staging/launchpad/db-devel

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merged at revision: 10430
Proposed branch: lp://staging/~jtv/launchpad/db-bug-752279
Merge into: lp://staging/launchpad/db-devel
Diff against target: 826 lines (+281/-255)
5 files modified
cronscripts/publishing/distro-parts/ubuntu/publish-distro.d/10-sign-releases (+0/-1)
cronscripts/publishing/distro-parts/ubuntu/publish-distro.d/20-remove-uncompressed-listings (+1/-1)
cronscripts/publishing/distro-parts/ubuntu/publish-distro.d/README.txt (+2/-2)
lib/lp/archivepublisher/scripts/publish_ftpmaster.py (+181/-126)
lib/lp/archivepublisher/tests/test_publish_ftpmaster.py (+97/-125)
To merge this branch: bzr merge lp://staging/~jtv/launchpad/db-bug-752279
Reviewer Review Type Date Requested Status
William Grant (community) code* Approve
Review via email: mp+57277@code.staging.launchpad.net

Commit message

[r=wgrant][bug=752279,754453] Simplify and harden directory movements in publish-ftpmaster.

Description of the change

= Summary =

The cron.publish-ftpmaster script, which we're replacing with a python rewrite, handles "dists" directories in quite a complicated way. Trying to mimick this shuffle in the python script led to various bugs. The "happy path" (the least complicated to analyze) shuffles them around like so: http://people.canonical.com/~jtv/publish-ftparchive-dists.png

What actually goes on is not all that difficult. Two directories are perpetually maintained as copies of each other, using hard-linking rsync, with one holding the "currently published" state of a distribution and the other (the "backup" dists directory) a slightly outdated copy. The latter is effectively used as a working directory to create a new version of the current one, but by maintaining it as a reasonably complete copy, the initial rsync that populates the working directory can be kept nice and fast.

With this clearer picture in hand, I simplified the directory dance to this one: http://people.canonical.com/~jtv/simplified-publish-ftparchive-dists.png

Actually that's not entirely truthful: at one point we need the backup dists directory to be in the archive root directory, where it doesn't normally live. So we still need to move it temporarily, but with a few changes:

 * The initial rsync is done before moving anything.

 * Only very small, isolated stretches of code move dists directories to this temporary location.

 * The script recognizes and recovers dists directories in these temporary locations.

 * At most one directory is in a transient state at one time.

 * Re-using the same temporary locations for different needs reduces the recovery to one simple loop.

Lots more has changed. I introduced a --post-rsync option that copies the current dists directory to the backup copy at the end, so that they'll be identical. This will speed up the initial rsync on the next run, and so help reduce the turnaround time for security updates. The cost is more work overall, but at least some of it will be out of the critical path.

== Tests ==

This is a massively slow test, and some day we'll want to try and speed it up. Right now however it's the only test we have for an absolutely vital piece of infrastructure, so I largely ignored performance considerations.

{{{
./bin/test -vvc lp.archivepublisher.tests.test_publish_ftparchive
}}}

== Demo and Q/A ==

With this in place, we can do a new round of full Q/A on the new publish-ftpmaster script. This includes verifying that the run-parts plugin scripts all work properly (insofar that can be verified on test servers).

= Launchpad lint =

Checking for conflicts and issues in changed files.

Linting changed files:
  cronscripts/publishing/distro-parts/ubuntu/publish-distro.d/README.txt
  cronscripts/publishing/distro-parts/ubuntu/publish-distro.d/20-remove-uncompressed-listings
  lib/lp/archivepublisher/scripts/publish_ftpmaster.py
  cronscripts/publishing/distro-parts/ubuntu/publish-distro.d/10-sign-releases
  lib/lp/archivepublisher/tests/test_publish_ftpmaster.py

Jeroen

To post a comment you must log in.
Revision history for this message
William Grant (wgrant) wrote :

Looks like a great simplification. A few nitpicks:

45 +def get_backup_dists(archive_config):
46 + """Return the path of an archive's backup dists directory."""
47 + return os.path.join(archive_config.archiveroot + "-distscopy", "dists")

Is archiveroot guaranteed to not have a trailing separator?

95 + The publishable files are kept in the filesystem. Most of the work
96 + is done in a working "dists" directory in each archive's dists copy
97 + root, which then replaces the current "dists" in the archive root.

Isn't that false now? Work is done in archiveroot, not distscopyroot.

331 + For each archive, this switches the dists directory and the
332 + backup dists directory around.

Lies! It rotates them through the three dirs.

409 + def publish(self, security_only=False):
410 + """Do the main publishing work.
[...]

Could the except/recoverWorkingDists/raise be a finally instead? It also seems like rsyncBackupDists might belong in this method, as installDists is there.

513 + """Write a marker file for checking direction movements.

s/direction/directory/?

review: Approve (code*)
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Thanks! You raise some good questions here.

> 45 +def get_backup_dists(archive_config):
> 46 + """Return the path of an archive's backup dists directory."""
> 47 + return os.path.join(archive_config.archiveroot + "-distscopy",
> "dists")
>
> Is archiveroot guaranteed to not have a trailing separator?

Internally it both constructs them in a way that can't produce one, and relies on there not being one in the same way that this code does.

> 95 + The publishable files are kept in the filesystem. Most of the
> work
> 96 + is done in a working "dists" directory in each archive's dists
> copy
> 97 + root, which then replaces the current "dists" in the archive
> root.
>
> Isn't that false now? Work is done in archiveroot, not distscopyroot.

You are correct. This text predated that change.

> 331 + For each archive, this switches the dists directory and the
> 332 + backup dists directory around.
>
> Lies! It rotates them through the three dirs.

That is implementation, and therefore suitable for a comment. This, however, is docstring and switching the two around is what the method does for the caller.

> 409 + def publish(self, security_only=False):
> 410 + """Do the main publishing work.
> [...]
>
> Could the except/recoverWorkingDists/raise be a finally instead? It also seems
> like rsyncBackupDists might belong in this method, as installDists is there.

It could be, but advantage of these working directories is that they are transient. The recoverBackupDists method is for failure paths only. The normal execution path does have a "finally" cleanup. I considered unifying the two, but it would mean either postponing the cleanup (turning the temporary directories back into state that needs managing) or hiding the "rename" cleanup matching the opening "rename" in a less symmetric, and much less precise, call to recoverBackupDists.

> 513 + """Write a marker file for checking direction movements.
>
> s/direction/directory/?

s.

Jeroen

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches

to status/vote changes: