Merge lp://staging/~jtv/launchpad/bug-544237 into lp://staging/launchpad

Proposed by Jeroen T. Vermeulen
Status: Merged
Approved by: Jeroen T. Vermeulen
Approved revision: no longer in the source branch.
Merged at revision: not available
Proposed branch: lp://staging/~jtv/launchpad/bug-544237
Merge into: lp://staging/launchpad
Diff against target: 888 lines (+177/-326)
19 files modified
lib/lp/buildmaster/doc/builder.txt (+6/-6)
lib/lp/buildmaster/doc/buildfarmjobbehavior.txt (+2/-2)
lib/lp/buildmaster/interfaces/builder.py (+4/-4)
lib/lp/buildmaster/interfaces/buildfarmjob.py (+10/-0)
lib/lp/buildmaster/interfaces/buildfarmjobbehavior.py (+6/-5)
lib/lp/buildmaster/model/builder.py (+6/-5)
lib/lp/buildmaster/model/buildfarmjob.py (+24/-0)
lib/lp/buildmaster/model/buildfarmjobbehavior.py (+8/-105)
lib/lp/buildmaster/tests/test_buildfarmjobbehavior.py (+76/-39)
lib/lp/buildmaster/tests/test_manager.py (+1/-1)
lib/lp/code/model/recipebuilder.py (+2/-10)
lib/lp/code/model/sourcepackagerecipebuild.py (+3/-0)
lib/lp/code/tests/test_recipebuilder.py (+10/-25)
lib/lp/soyuz/doc/buildd-slavescanner.txt (+7/-1)
lib/lp/soyuz/model/binarypackagebuildbehavior.py (+5/-3)
lib/lp/soyuz/tests/soyuzbuilddhelpers.py (+3/-2)
lib/lp/soyuz/tests/test_binarypackagebuildbehavior.py (+0/-66)
lib/lp/translations/model/translationtemplatesbuildbehavior.py (+4/-19)
lib/lp/translations/tests/test_translationtemplatesbuildbehavior.py (+0/-33)
To merge this branch: bzr merge lp://staging/~jtv/launchpad/bug-544237
Reviewer Review Type Date Requested Status
Eleanor Berger (community) code Approve
Review via email: mp+23214@code.staging.launchpad.net

Commit message

Slave build cookies.

Description of the change

= Bug 544237 =

This is an oversized branch. I'm sorry for that; I really am. A lot of the diff is caused by the renaming of an exception class.

The build farm dispatches build jobs to slave machines. To avoid confusing one job running on a slave with another, the slave receives a "build id" that proves to the master that it's been working on that particular job. It's not technically an id; it's not even necessarily associated with a build. It's just there for verification. There's also a security aspect: slaves can be compromised, and in that state, probably should not be able to substitute their own output for that of a legitimate job running on another slave.

A method in IBuildFarmJobBehavior, which was to be overridden in each implementation, tried to verify a slave's build id against expectations. This was fairly complex without offering much security.

In this branch I replace the build id with something that's simpler and more secure at the same time: a build cookie. It hashes a bunch of values that are associated with a job running on a slave, and which don't change as a job progresses. Put together and hashed, these values are not very easily predictable. (The hashing helps by obscuring from the slave various values that it could otherwise use as starting points for its guesses). I removed all attempts to parse and analyze the cookie. All that's really needed for verification is to re-generate the cookie and compare the outcome to the version that the slave provided. The two should be equal. This does away with the multiple implementations of the verification logic.

We've discussed these changes on the mailing list and on IRC, in particular with bigjools, jml, and wgrant. A few design considerations:

 * We haven't actually identified any real risks even if a compromised slave does forge another slave's build cookie. But the purpose of this check has always been a bit obscure so we're just not taking any chances—especially with future code changes that might otherwise rely on false security. If there is a security risk, this branch can be shown to reduce it. If there is none, this branch merely simplifies the code.

 * For now I included the original slave build id, as generated by getName, in the hashed cookie. This means that the cookie will be at least as hard to guess as what we had before.

 * William and I analyzed failures that might conceivably occur while verifying a cookie, and considered re-raising them as verification failures. But all failures we could imagine would be bugs. So the appropriate response was always to let the exception propagate as-is.

 * There are probably "safer" hash algorithms out there than SHA1. But the code never hashes output from the slave, so there is no question of a compromised slave appending arbitrary data to its output in order to get a desired hash value.

 * It's probably still not that hard for a compromised slave, with knowledge of ballpark figures for some of the inputs, to guess the values that went into its own cookie, which it could then use as a starting point for guessing other cookies. We considered including values like the job's start date or builder id. But that may make it easier for future code changes to break the algorithm. For properly secure cookies, we'd need some salt.

There's a bunch of lint output, mostly the usual "unable to import." Also some lint in places I didn't touch; this branch being the size it is I'd rather not mess with that (and risk causing more conflicts for others).

For testing, I ran all Soyuz tests and all tests with "build" in them. But be aware that includes windmill tests, and that when using lucid, you may get "unknown error code" errors from pygpgme. Jelmer has a pygpgme patch underway that fixes those.

To Q/A, verify that Soyuz builds still succeed on dogfood.

Jeroen

To post a comment you must log in.
Revision history for this message
Eleanor Berger (intellectronica) :
review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.