Nice to see this moving forward. Some comments:

https://codereview.appspot.com/5847053/diff/5001/source/drafts/stopping-units.rst
File source/drafts/stopping-units.rst (right):

https://codereview.appspot.com/5847053/diff/5001/source/drafts/stopping-units.rst#newcode4
source/drafts/stopping-units.rst:4: -------------
Sorry for nitpicking, but can we please stick to a consistent convention for headers? I believe that's what we have today:

Top level: ===
Inner level: ---

https://codereview.appspot.com/5847053/diff/5001/source/drafts/stopping-units.rst#newcode30
source/drafts/stopping-units.rst:30: while still allowing for termination of errant children.
This is a very dense paragraph. Can it be simplified to something like:

"""
Juju records the desired state in the topology, and preserves that state while the actions to accomplish it are still taking place.
"""

https://codereview.appspot.com/5847053/diff/5001/source/drafts/stopping-units.rst#newcode32
source/drafts/stopping-units.rst:32: How it works, what follows is a a worked example going through a unit
"How it works, what follows"?

https://codereview.appspot.com/5847053/diff/5001/source/drafts/stopping-units.rst#newcode58
source/drafts/stopping-units.rst:58: actions are recorded in the topology as a key 'action' value 'destroy'
The "action: destroy" terminology feels a bit out of place here. This is pretty much a procedure call, but the topology holds state instead. Something like "destroyed: true" would be more appropriate. It remains true even after the action has taken place.

https://codereview.appspot.com/5847053/diff/5001/source/drafts/stopping-units.rst#newcode65
source/drafts/stopping-units.rst:65: - Machine writes to /units/unit-x/stop / sets watch on /units/unit-x/stop and timer.
See below regarding the watch side of it.

https://codereview.appspot.com/5847053/diff/5001/source/drafts/stopping-units.rst#newcode69
source/drafts/stopping-units.rst:69: - executing stop hook.
This should be done after departing from relations.

https://codereview.appspot.com/5847053/diff/5001/source/drafts/stopping-units.rst#newcode72
source/drafts/stopping-units.rst:72: - unit agent updates /units/unit-x/stop to reflect the action has been performed.
That step doesn't seem necessary. We can reuse the existing liveness mechanism that is already supported by the agent: when the agent dies, the machine will know it.

It also seems like a nice idea to keep the stop ZooKeeper node in place for as long as that's the intended state of the unit. We need that node on startup anyway, in order to watch it.

https://codereview.appspot.com/5847053/diff/5001/source/drafts/stopping-units.rst#newcode75
source/drafts/stopping-units.rst:75: - machine agent watch fires on 'stop' node deletion, shuts down the unit, removing
This also touches on the point above.

https://codereview.appspot.com/5847053/diff/5001/source/drafts/stopping-units.rst#newcode94
source/drafts/stopping-units.rst:94: and garbage collects their topology footprint and zk state.
What about garbage collecting on the next action after the decision that it's fine to GC it? I don't think we need a specific process that has the task of GCing the topology.

In both cases, though, we have an issue: when is it ok to GC it? This proposal does not provide any means of deciding that, which means the data becomes eternal. That hole suggests that postponing this side of the problem may not be a good idea. We can do something simple, but it'd be good to implement the cleanup at the same time.

https://codereview.appspot.com/5847053/
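
To make a few of the comments above more concrete, here is a rough Python sketch of the shape I have in mind. It is illustrative only: the topology is modelled as a plain dict, and every name in it (mark_destroyed, stop_sequence, collect_garbage, live_agents) is hypothetical rather than anything from the juju code base. The GC rule it uses (unit marked destroyed and its agent no longer alive) is just one possible answer to the open "when is it ok to GC it?" question, not a settled decision.

```python
# Hypothetical sketch: the topology is a plain dict and none of these
# names come from the juju code base.


def mark_destroyed(topology, unit_name):
    """Record the desired state. It stays true after teardown completes."""
    topology["units"][unit_name]["destroyed"] = True


def stop_sequence(unit):
    """Return teardown steps: depart all relations first, then the stop hook."""
    steps = [f"depart:{relation}" for relation in unit["relations"]]
    steps.append("hook:stop")
    return steps


def collect_garbage(topology, live_agents):
    """Remove units that are marked destroyed and whose agent is not alive.

    Intended to piggyback on the next topology-changing action instead of
    running as a dedicated GC process. The removal rule is an assumption.
    """
    removed = [
        name
        for name, unit in topology["units"].items()
        if unit.get("destroyed") and name not in live_agents
    ]
    for name in removed:
        del topology["units"][name]
    return sorted(removed)
```

With this shape, a unit that is marked destroyed while its agent is still alive survives a collect_garbage call, and it is dropped by the first call made after the agent is gone.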