Merge lp://staging/~sinzui/juju-ci-tools/lost-controller into lp://staging/juju-ci-tools

Proposed by Curtis Hovey
Status: Merged
Merged at revision: 1727
Proposed branch: lp://staging/~sinzui/juju-ci-tools/lost-controller
Merge into: lp://staging/juju-ci-tools
Diff against target: 208 lines (+99/-17)
2 files modified
deploy_stack.py (+27/-1)
tests/test_deploy_stack.py (+72/-16)
To merge this branch: bzr merge lp://staging/~sinzui/juju-ci-tools/lost-controller
Reviewer: Aaron Bentley (community)
Review status: Approve
Review via email: mp+310803@code.staging.launchpad.net

Description of the change

Do not call safe_print_status if the controller is lost.

This branch introduces BootstrapManager.has_controller = <None | True | False>. The default value is None. While bootstrapped, the value is True. Tests can set the value to False to indicate that the controller is lost. Many juju actions are pointless to attempt when there is no controller.
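As a rough illustration of the tri-state flag described above, here is a minimal sketch. It is not the code in deploy_stack.py: the class name BootstrapManagerSketch, the bootstrap() and report_status() methods, and the injected print_status callable are all hypothetical. It also assumes status is skipped only when the flag is explicitly False, which is one possible reading of "the controller is lost".

```python
class BootstrapManagerSketch:
    """Hypothetical stand-in for BootstrapManager; not the real class."""

    def __init__(self, print_status):
        # None: not bootstrapped yet.
        # True: bootstrapped, controller believed reachable.
        # False: a test has declared the controller lost.
        self.has_controller = None
        # Injected callable standing in for safe_print_status(client).
        self._print_status = print_status

    def bootstrap(self):
        """Pretend to bootstrap; only the flag change matters here."""
        self.has_controller = True

    def report_status(self):
        """Print status unless the controller is known to be lost.

        Returns True if the status callable ran, False if it was skipped.
        """
        if self.has_controller is False:
            # Assumption: only an explicit False suppresses the call.
            return False
        self._print_status()
        return True
```

With a straight boolean, as the review suggests, the check would collapse to a plain truthiness test and the None state would also skip status.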

This change avoids the places where I saw the failing HA tests get stuck. I am unsure of dump_all_logs and friends. They might need to be aware that the controller is lost.

My next branch will update assess_recovery to set has_controller = False when instrumenting failure, and then set it back to True if the controller comes back within 10 minutes.
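The follow-up behaviour described above could look roughly like the sketch below. It is only a guess at the shape of that future change: wait_for_controller, is_controller_up, and the interval, clock and sleep parameters are hypothetical names, and the 600-second default simply restates the 10 minutes mentioned above.

```python
import time


def wait_for_controller(manager, is_controller_up, timeout=600, interval=10,
                        clock=time.monotonic, sleep=time.sleep):
    """Mark the controller lost, then poll until it returns or time runs out.

    manager is any object with a has_controller attribute.
    is_controller_up is a hypothetical zero-argument probe returning a bool.
    Returns True if the controller came back before the timeout.
    """
    # Failure has been instrumented, so the controller is considered lost.
    manager.has_controller = False
    deadline = clock() + timeout
    while clock() < deadline:
        if is_controller_up():
            # The controller is back; juju actions are worth trying again.
            manager.has_controller = True
            return True
        sleep(interval)
    # Still lost: leave has_controller as False for later cleanup code.
    return False
```

The clock and sleep parameters exist only so a test can substitute fakes and avoid waiting in real time.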

Aaron Bentley (abentley) wrote :

I'm not sure that it's the safe_print_status that's getting us. In the logs I've looked at, safe_print_status was ultimately killed by a timeout, and something later got stuck. Still, there's no point waiting for status when there's no controller.

I am a bit skeptical that has_controller=None vs has_controller=False is a useful distinction. I'd favour a straight boolean, but I don't insist.

review: Approve

