Merge lp://staging/~fwereade/pyjuju/always-break-broken-relations into lp://staging/pyjuju

Proposed by William Reade
Status: Merged
Approved by: Kapil Thangavelu
Approved revision: 442
Merged at revision: 450
Proposed branch: lp://staging/~fwereade/pyjuju/always-break-broken-relations
Merge into: lp://staging/pyjuju
Prerequisite: lp://staging/~fwereade/pyjuju/restrict-relation-broken-context
Diff against target: 651 lines (+266/-148)
6 files modified
juju/agents/unit.py (+10/-13)
juju/control/tests/test_resolved.py (+2/-1)
juju/unit/lifecycle.py (+144/-78)
juju/unit/tests/test_lifecycle.py (+58/-22)
juju/unit/tests/test_workflow.py (+47/-32)
juju/unit/workflow.py (+5/-2)
To merge this branch: bzr merge lp://staging/~fwereade/pyjuju/always-break-broken-relations
Reviewer Review Type Date Requested Status
Kapil Thangavelu (community) Approve
Jim Baker (community) Approve
Review via email: mp+85268@code.staging.launchpad.net

Description of the change

Part 2 extracted from the horrifyingly overgrown restart-transitions branch:

UnitLifecycle now (effectively) persists the current RelationWorkflowStates and will, on load, execute the "depart" transition on any restored RelationWorkflowState which is no longer related.

To post a comment you must log in.
Revision history for this message
Kapil Thangavelu (hazmat) wrote :

[0]

Relative offset path usage into a (grand)parent aren't kosher. Child
usage is, or pass the path in. Also the state directory isn't
namespaced, so usage needs qualification among multiple units.

+ self._state_dir_path = os.path.abspath(
+ os.path.join(unit_path, "../../state"))

[1]

+ # Before we get into serially executing all the relation-broken
+ # hooks, stop all the lifecycles, to minimise the chances of them
+ # running redundant hooks between now and the point at which their
+ # own relation-departed hook is fired.
+ # If they do run redundant hooks, nothing *bad* should happen, but
+ # it's a potentially significant waste of time.

This manages to not capture the key part from the original
documentation on why this is code is structured the way it is.

Namely, concurrent events for subsequent watch firings should not
be executed post rel break watch firing and stopping.

[2]

+ # Actually depart relations, one by one; waiting time should be
+ # minimal thanks to the loop above.

That's not true, executing a stop hook is arbitrary time.

[3] _load/store_relations could use doc strings. at minimum noting storage of presence not state of relations.

[4]

+ state = yaml.dump(dict(
+ (relation_id, (workflow.service_id, workflow.relation_name))
+ for (relation_id, workflow) in self._relations.items()))

- # changes relation delta of global zk state with our memory state.
- new_relations = dict([(service_relation.internal_relation_id,
- service_relation) for
- service_relation in new_relations])
+ # Calculate delta between zookeeper state and our stored state.
+ new_relations = dict(
+ (service_relation.internal_relation_id, service_relation)
+ for service_relation in new_relations)

py2.7 specific, i'd like to retain compatibility for 2.6 (lucid).

[6]

(relation_id, (workflow.service_id, workflow.relation_name))

Storing service id is superflous, its in the context of a unit, which
always knows its service.

[7]
nitpick test_lifecycle, unused
from juju.unit.workflow import RelationWorkflowState

[8] untested bits

! lifecycle->_load_relations
                self._relations[relation_id] = workflow

lifecycle->execute_change_hook
        else:
            hook_names = [hook_name]

review: Needs Fixing
Revision history for this message
William Reade (fwereade) wrote :

[0]

Heh, I couldn't understand why it was done like that originally, so I felt it best to let sleeping dogs lie :).

[1]

I don't understand by what mechanism a stopped relation workflow would be executing new hooks in the first place. Either the stop works, in which case the little double-loop dance is only useful in the case described in my comment... or, it doesn't work, in which case it doesn't matter when we stop, because it won't help anyway.

What am I missing?

[2]

...but time spent *waiting* for *other* hooks should be minimised. I'll try to reword for clarity all the same.

[3]

Good plan.

[4]

Dicts can be initialised with generators at least as far back as 2.5...

[5] ? :p

[6]

...whoops, thank you.

[7]

Cheers.

[8i]

Heh, just because I can't externally distinguish between hitting that code path and just filling _relations in later, no excuse not to test it. Nice catch.

[8ii]

Looks like that param is unused anyway. Yay code deletion :).

Revision history for this message
William Reade (fwereade) wrote :

Points addressed, modulo quibbles above.

For future reviewers' reference: HookScheduler persistence comes much later, in resolve-unit-relation-diffs; so, this code can handle relations broken while not running, but won't fire any other relation hooks it missed.

Revision history for this message
Jim Baker (jimbaker) wrote :

+1, just a couple of minors:

[1]

+def unit_path(juju_path, unit_state):
+ return os.path.join(
+ juju_path, "units", unit_state.unit_name.replace("/", "-"))
+
+

Nit: this function looks potentially useful to break out, but not tested/documented if not private
and the name confuses with other unit_path usage. Also it's only used once in the subsequent code.

[2]

+ # This lifecycle is not responding to events; while it's not looking,
+ # trash one of the relations.
+ broken_complete = self.wait_on_hook("app-relation-broken")
+ yield self.relation_manager.remove_relation_state(
+ self.states["relation"])
+ yield self.sleep(0.1)

Seems to be unnecessary when I removed the sleep, fortunately given its general disagreeabilty :)

review: Approve
Revision history for this message
William Reade (fwereade) wrote :

[2] Isn't this one of the acceptable uses of sleep (ie waiting for something bad to maybe happen so we can be relatively sure it doesn't)? I'll add an explanatory comment, but I have a (vague, unsubstantiated) impression that we tend to comment most uses of sleep thus; and that we probably ought to be able to agree on a general assumption that that's what sleep is basically *for*, and therefore shouldn't require commenting. Make any sense?

440. By William Reade

merge parent

441. By William Reade

address review points

442. By William Reade

replace a sleep with a poke_zk

Revision history for this message
Kapil Thangavelu (hazmat) wrote :

LGTM +1, some minors from a second pass.

[8]

             workflow_state = RelationWorkflowState(
- self.client, unit_relation, lifecycle, self.makeDir())
+ self.client, unit_relation, unit.unit_name, lifecycle,
+ self.makeDir())

+ return RelationWorkflowState(
+ self._client, unit_relation, relation_name, lifecycle,
+ self._state_dir)

One of these is wrong.

[9]

+ # If they do run redundant hooks, nothing *bad* should happen, but
+ # it's a potentially significant waste of time.

I'm not entirely comfortable calling these redundant hooks, their
effectively state changes events trigger hook executions against a
dead relation. The fact that its safe is an implementation detail that
may change in the future.

The previous comment posited an example that made this more clear. firing a modified after a broken relation is detected, is a problem imo.

review: Approve
443. By William Reade

revert poke_zk to sleep: we're checking for hook execution

Revision history for this message
William Reade (fwereade) wrote :

[8]

Hmm, clearly so; nice catch.

[9]

Good point; I'll clarify.

444. By William Reade

better explanation of relation depart handling in UnitLifecycle

445. By William Reade

fix missed review point

446. By William Reade

merge parent

447. By William Reade

merge parent

448. By William Reade

fix unit lifecycle state file name (was not unique)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches

to status/vote changes: