lp://staging/~fwereade/pyjuju/fix-charm-upgrade

Created by William Reade and last modified
Get this branch:
bzr branch lp://staging/~fwereade/pyjuju/fix-charm-upgrade
Only William Reade can upload to this branch. If you are William Reade please log in for upload directions.

Branch merges

Related bugs

Related blueprints

Branch information

Owner:
William Reade
Project:
pyjuju
Status:
Merged

Recent revisions

458. By William Reade

merge parent

457. By William Reade

break up synchronize test to avoid occasional timeout during full test runs

456. By William Reade

WorkflowState now (1) has a synchronize method that re-runs inflight

transitions after restart, and (2) requires external locking on state-
changing methods.

(1) is a required change due to the verdict on fix-charm-upgrade (that is:
we don't want additional states for the upgrade process, and we should use
write-ahead logging to maintain state across process death); (2) is
consequently required because of the interaction between the upgrade flag
watch and the resolved watch, both of which take `state == "started"` to
mean "it's safe to fire a transition". This was a pre-existing bug, but was
IMO exacerbated severely by the fact that we now never leave the "started"
state while upgrading a charm -- enough so that I'm not comfortable
proposing a patch that fails to address this issue.

This diff is *big*, but the vast majority of the changes are as a result of
the WorkflowState locking; in particular, many many tests set workflow state
in one way or another, and the necessary changes add a *lot* of noise. In
hindsight, the locking changes could have been made independently, but I
don't think the resulting pair of branches would actually be significantly
easier to deal with.

Various explanatory notes follow:

* External WorkflowState locking (rather than automagic internal locking) was
  chosen for simplicity's sake on the part of the client as well: when one
  to fire a transition conditional on some state being active, it makes
  sense to grab the lock, check, and fire the transition, in the certain
  knowledge that nobody can have changed state underneath you.

* The obvious problem with DeferredLock (that you can tell it's locked, but
  not who owns the lock) has minimal impact, thanks to the unit tests.
  Consider the following scenarios:

  * Code tries to lock, test not holding lock: we're fine.
  * Code tries to lock, test is holding lock: bad test; deadlocks.
  * Code fails to lock, test not holding lock: bad code: asserts.
  * Code fails to lock, test is holding lock: bad code AND bad test.

  The final scenario is the only dangerous one, but it's just a special
  case of the fact that you can *always* write a bad test that passes bad
  code; I think attempting to solve this is out of scope for this universe.

* WorkflowState.fire_transition sets the inflight transition once it knows
  the requested transition is valid, and only clears it explicitly if the
  transition fails without an error transition to follow up (when a
  transition succeeds, set_state implicitly clears inflight).

* WorkflowState.synchronize (1) detects and re-runs inflight transitions and
  (2) has taken over responsibility for the automatic transitions (eg
  None->installed->started) which had previously been handled in UWS/RWS.
  The overlapping concept of success_transition_id has been dropped;
  transitions can now be marked as "automatic".

* WorkflowState.set_inflight really wants to be private, but is exposed for
  testing's sake; get_inflight needs to be exposed so that UWS.synchronize
  doesn't inappropriately start the executor when recovering from mid-
  upgrade process death.

* Er...

* That's it.

R=
CC=
https://codereview.appspot.com/5647064

455. By William Reade

merge parent

454. By William Reade

remove UL.upgrade_charm's first_attempt kwarg

453. By William Reade

manually back out r448 without screwing up future merges, I hope

452. By William Reade

merge parent

451. By William Reade

merge parent

450. By William Reade

address final review point

449. By William Reade

address review point

Branch metadata

Branch format:
Branch format 7
Repository format:
Bazaar repository format 2a (needs bzr 1.16 or later)
Stacked on:
lp://staging/pyjuju
This branch contains Public information 
Everyone can see this information.

Subscribers