Merge lp://staging/~sinzui/juju-ci-tools/lost-controller into lp://staging/juju-ci-tools
Status: | Merged |
---|---|
Merged at revision: | 1727 |
Proposed branch: | lp://staging/~sinzui/juju-ci-tools/lost-controller |
Merge into: | lp://staging/juju-ci-tools |
Diff against target: | 208 lines (+99/-17), 2 files modified: deploy_stack.py (+27/-1), tests/test_deploy_stack.py (+72/-16) |
To merge this branch: | bzr merge lp://staging/~sinzui/juju-ci-tools/lost-controller |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Aaron Bentley (community) | Approve | ||
Review via email: mp+310803@code.staging.launchpad.net |
Description of the change
Do not call safe_print_status if the controller is lost.
This branch introduces BootstrapManage
This change avoids the places where I saw the failing HA tests get stuck. I am unsure about dump_all_logs and friends; they might need to be aware that the controller is lost.
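The guard described above could look roughly like the following. This is only a sketch under assumptions: SketchManager and its print_status argument are hypothetical stand-ins and are not taken from deploy_stack.py; only the has_controller name and the idea of skipping safe_print_status come from this proposal.

```python
class SketchManager:
    """Hypothetical stand-in for the manager touched in deploy_stack.py."""

    def __init__(self, print_status):
        # print_status stands in for safe_print_status (assumed callable).
        self.print_status = print_status
        # Assume a controller exists until something marks it as lost.
        self.has_controller = True

    def report_status(self):
        """Call print_status only while a controller is believed to exist.

        Returns True if status was requested, False if it was skipped.
        """
        if not self.has_controller:
            # With no controller, a status call could only wait for a timeout.
            return False
        self.print_status()
        return True
```

A caller that knows the controller is gone would set has_controller to False before teardown, so the status step is skipped rather than left to time out.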
My next branch will update assess_recovery to set has_controller = False when instrumenting failure, and then set it back to True if the controller comes back in 10 minutes.
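The planned follow-up might be sketched as below. Everything here is an assumption for illustration: wait_for_controller, is_up, and the polling interval are hypothetical and do not describe assess_recovery as written; only the 10-minute window and the has_controller flag come from the description.

```python
import time


def wait_for_controller(is_up, timeout=600, interval=5,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll is_up() until it returns True or timeout seconds have passed.

    is_up is a hypothetical probe supplied by the caller. clock and sleep
    are injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if is_up():
            return True
        if clock() >= deadline:
            # The controller did not come back within the allowed window.
            return False
        sleep(interval)


# Hypothetical usage around an injected failure:
#   manager.has_controller = False
#   ... break the controller ...
#   manager.has_controller = wait_for_controller(probe)
```

The default timeout of 600 seconds matches the 10 minutes mentioned above; the 5-second interval is an arbitrary choice for the sketch.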
I'm not sure that it's the safe_print_status that's getting us. In the logs I've looked at, safe_print_status was ultimately killed by a timeout, and something later got stuck. Still, there's no point waiting for status when there's no controller.
I am a bit skeptical that has_controller=None vs has_controller=False is a useful distinction. I'd favour a straight boolean, but I don't insist.