Merge lp://staging/~doanac/ubuntu-ci-services-itself/rabbit-queue-status into lp://staging/ubuntu-ci-services-itself
Status: | Merged |
---|---|
Approved by: | Andy Doan |
Approved revision: | 335 |
Merged at revision: | 344 |
Proposed branch: | lp://staging/~doanac/ubuntu-ci-services-itself/rabbit-queue-status |
Merge into: | lp://staging/ubuntu-ci-services-itself |
Diff against target: |
270 lines (+147/-3) 9 files modified
branch-source-builder/bsbuilder/resources/v1.py (+3/-1)
branch-source-builder/bsbuilder/tests/test_v1.py (+3/-1)
ci-utils/ci_utils/json_status.py (+35/-0)
ci-utils/ci_utils/tests/test_json_status.py (+96/-0)
image-builder/imagebuilder/resources/v1.py (+3/-1)
juju-deployer/branch-source-builder.yaml.tmpl (+2/-0)
juju-deployer/image-builder.yaml.tmpl (+2/-0)
juju-deployer/test-runner.yaml.tmpl (+2/-0)
test_runner/tstrun/resources/v1.py (+1/-0) |
To merge this branch: | bzr merge lp://staging/~doanac/ubuntu-ci-services-itself/rabbit-queue-status |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Andy Doan (community) | | | Approve |
Vincent Ladeuil (community) | | | Approve |
PS Jenkins bot (community) | continuous-integration | | Approve |
Review via email: mp+209833@code.staging.launchpad.net |
Commit message
run_worker: add minimal monitoring for workers
We've had some periodic issues where a run_worker script
fails to come online.
This adds a simple check for each of our services that use
rabbitmq workers. It checks queue information via the rabbitmq
web API to see how many consumers are subscribed to the queue. This
will let us know when a run_worker script isn't running.
Description of the change
We've had a few bugs lately where one or more of our rabbit-workers weren't online. It's a hard situation to detect, and it requires poking around via "juju-ssh". This adds a simple status check to each of our services that use rabbit workers. Each check reads the "consumer count" of a queue, which essentially indicates whether or not its corresponding runner is online.
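As a rough illustration of the consumer-count check described above, here is a minimal Python 3 sketch. It is not the code in `ci_utils/json_status.py`. It assumes the RabbitMQ management plugin is enabled and serving its HTTP API, where `GET /api/queues/<vhost>/<queue>` returns a JSON object that includes a `consumers` field. The helper names, the host and queue names, the port `15672`, and the `guest` credentials are all assumptions made for the example.

```python
import base64
import json
import urllib.parse
import urllib.request


def queue_api_url(host, queue, vhost="/", port=15672):
    """Build the management-API URL for a single queue.

    The vhost and queue name are percent-encoded, so the default
    vhost "/" becomes "%2F" in the path.
    """
    return "http://{}:{}/api/queues/{}/{}".format(
        host,
        port,
        urllib.parse.quote(vhost, safe=""),
        urllib.parse.quote(queue, safe=""),
    )


def consumer_count(queue_info):
    """Return the consumer count from a queue description, or None."""
    value = queue_info.get("consumers")
    return value if isinstance(value, int) else None


def fetch_queue_info(url, user="guest", password="guest", timeout=5):
    """GET the queue description as a dict (performs network I/O)."""
    credentials = "{}:{}".format(user, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(
        url, headers={"Authorization": "Basic " + token}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would combine these as `consumer_count(fetch_queue_info(queue_api_url(host, queue)))`. A result of `0` suggests that no run_worker script is subscribed to the queue.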
In addition to the unit tests, I ran some manual tests in the cloud.
Turning rabbit off yields a web UI page that looks like:
| imagebuild-
| | workers-online: unable to check
turning off one of the workers yields (with "workers-online" highlighted in red):
| imagebuild-
| | workers-online: 0
and when everything is online:
| imagebuild-
| | workers-online: 1
FAILED: Continuous integration, rev:333
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/322/
Executed test runs:
Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/322/rebuild