Merge lp://staging/~doanac/ubuntu-ci-services-itself/rabbit-queue-status into lp://staging/ubuntu-ci-services-itself

Proposed by Andy Doan
Status: Merged
Approved by: Andy Doan
Approved revision: 335
Merged at revision: 344
Proposed branch: lp://staging/~doanac/ubuntu-ci-services-itself/rabbit-queue-status
Merge into: lp://staging/ubuntu-ci-services-itself
Diff against target: 270 lines (+147/-3)
9 files modified
branch-source-builder/bsbuilder/resources/v1.py (+3/-1)
branch-source-builder/bsbuilder/tests/test_v1.py (+3/-1)
ci-utils/ci_utils/json_status.py (+35/-0)
ci-utils/ci_utils/tests/test_json_status.py (+96/-0)
image-builder/imagebuilder/resources/v1.py (+3/-1)
juju-deployer/branch-source-builder.yaml.tmpl (+2/-0)
juju-deployer/image-builder.yaml.tmpl (+2/-0)
juju-deployer/test-runner.yaml.tmpl (+2/-0)
test_runner/tstrun/resources/v1.py (+1/-0)
To merge this branch: bzr merge lp://staging/~doanac/ubuntu-ci-services-itself/rabbit-queue-status
Reviewer | Review Type | Date Requested | Status
Andy Doan (community) | | | Approve
Vincent Ladeuil (community) | | | Approve
PS Jenkins bot (community) | continuous-integration | | Approve
Review via email: mp+209833@code.staging.launchpad.net

Commit message

run_worker: add minimal monitoring for workers

We've had some periodic issues where a run_worker script
fails to come online.

This adds a simple check for each of our services that use
rabbitmq workers. It checks queue information via the rabbitmq
web API to see how many consumers are subscribed to the queue. This
will let us know when a run_worker script isn't running.

Description of the change

We've had a few bugs lately where one or more of our rabbit-workers weren't online. It's a hard situation to detect and requires poking around via "juju-ssh". This adds a simple status check to each of our services that uses rabbit workers. Each check reads the "consumer count" of the service's queue, which indicates whether or not the queue's corresponding runner is online.
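For readers unfamiliar with the check, here is a minimal sketch of reading a queue's consumer count over HTTP. It assumes the RabbitMQ management plugin is enabled on its usual port 15672, that the queue lives in the default vhost "/", and that HTTP basic auth is used. The helper names (`queue_url`, `consumers_from_json`, `get_consumer_count`) are hypothetical and are not taken from ci_utils/json_status.py.

```python
import base64
import json
import urllib.parse
import urllib.request


def queue_url(host, queue, vhost='/', port=15672):
    # The management API addresses a queue as /api/queues/<vhost>/<name>.
    # Both segments are percent-encoded, so the default vhost "/" becomes "%2F".
    # Port 15672 is an assumption (the usual management plugin port).
    return 'http://%s:%d/api/queues/%s/%s' % (
        host, port,
        urllib.parse.quote(vhost, safe=''),
        urllib.parse.quote(queue, safe=''))


def consumers_from_json(body):
    # The queue object returned by the API has a "consumers" field: the
    # number of consumers (i.e. running workers) subscribed to the queue.
    return json.loads(body)['consumers']


def get_consumer_count(host, queue, user, password, vhost='/', port=15672,
                       timeout=5):
    # Hypothetical helper: one authenticated GET against the queue resource.
    req = urllib.request.Request(queue_url(host, queue, vhost, port))
    token = base64.b64encode(('%s:%s' % (user, password)).encode('utf-8'))
    req.add_header('Authorization', 'Basic ' + token.decode('ascii'))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return consumers_from_json(resp.read().decode('utf-8'))
```

A count of 0 from `get_consumer_count` would mean no run_worker process is currently subscribed to that queue; connection failures surface as `OSError` subclasses from urllib.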

In addition to the unit tests, I ran some manual tests in the cloud:

Turning rabbit off yields a web UI page that looks like:

 | imagebuild-restish/0 | rabbit configured: true
 | | workers-online: unable to check

Turning off one of the workers yields (with "workers-online" highlighted in red):

 | imagebuild-restish/0 | rabbit configured: true
 | | workers-online: 0

And when everything is online:

 | imagebuild-restish/0 | rabbit configured: true
 | | workers-online: 1
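The three outcomes above can be sketched as a small mapping from a consumer-count lookup to the "workers-online" value plus an ok/failed flag. This is a hypothetical helper, not the code in this branch. The page above only says that a count of 0 is highlighted, so also flagging the "unable to check" case as a failure is an assumption.

```python
def workers_online(fetch_count):
    """Return (value, ok) for the "workers-online" status line.

    fetch_count is any zero-argument callable that returns the queue's
    consumer count (hypothetical; e.g. an HTTP lookup against rabbit).
    """
    try:
        count = fetch_count()
    except (OSError, ValueError, KeyError):
        # Rabbit (or its web API) is unreachable or returned something
        # unexpected. Treating this as a failure is an assumption.
        return 'unable to check', False
    # Zero consumers means no run_worker process is attached to the queue.
    return count, count > 0
```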

PS Jenkins bot (ps-jenkins) wrote :

FAILED: Continuous integration, rev:333
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/322/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/322/rebuild

review: Needs Fixing (continuous-integration)
334. By Andy Doan

fix broken test case

PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:334
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/333/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/333/rebuild

review: Approve (continuous-integration)
PS Jenkins bot (ps-jenkins) wrote :

PASSED: Continuous integration, rev:334
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/334/
Executed test runs:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/uci-engine-ci/334/rebuild

review: Approve (continuous-integration)
Vincent Ladeuil (vila) wrote :

62 + '''checks if there any workers attached to the queue.'''

'are' missing ?

66 + # we already report rabbit isn't configured, so no sense adding
67 + # another failure for this

Let's have a common way to report that error then; in the most obscure cases, it's better to have *some* output than none. I know this will never happen... but I know that if it happens, we'll be glad to have an error message ;)

review: Approve
335. By Andy Doan

grammar mistake

Andy Doan (doanac) wrote :

On 03/09/2014 04:43 AM, Vincent Ladeuil wrote:
> 62 + '''checks if there any workers attached to the queue.'''
>
> 'are' missing ?

fixed in revno(335)

> 66 + # we already report rabbit isn't configured, so no sense adding
> 67 + # another failure for this
>
> Let's have a common way to report that error then; in the most obscure cases, it's better to have *some* output than none. I know this will never happen... but I know that if it happens, we'll be glad to have an error message ;)

We are already reporting it via "add_rabbit_configured" in that object. My point was
we'll be reporting a "rabbit isn't configured" message already so
showing another error is just piling on extra stuff that might make it
more confusing to diagnose.
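The ordering described here (report "rabbit configured" first and skip the worker check when it is false) could look roughly like the sketch below. The function name, the dict-based status, and passing the lookup as a callable are all assumptions for illustration; only the "rabbit configured" and "workers-online" labels come from the output shown in the description.

```python
def add_rabbit_checks(status, rabbit_config, consumer_count):
    """Add rabbit-related entries to a service status dict (hypothetical).

    rabbit_config is None when the service has no rabbit configuration;
    consumer_count is a zero-argument callable returning the queue's
    consumer count.
    """
    configured = rabbit_config is not None
    status['rabbit configured'] = configured
    if not configured:
        # "rabbit configured: false" already explains the problem; adding a
        # second "workers-online" failure would only make diagnosis noisier.
        return status
    try:
        status['workers-online'] = consumer_count()
    except (OSError, ValueError, KeyError):
        status['workers-online'] = 'unable to check'
    return status
```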

Andy Doan (doanac) wrote :

self-acking since vila already acked.

review: Approve
Vincent Ladeuil (vila) wrote :

>>>>> Andy Doan <email address hidden> writes:

    > On 03/09/2014 04:43 AM, Vincent Ladeuil wrote:
    >> 62 + '''checks if there any workers attached to the queue.'''
    >>
    >> 'are' missing ?

    > fixed in revno(335)

    >> 66 + # we already report rabbit isn't configured, so no sense adding
    >> 67 + # another failure for this
    >>
    >> Let's have a common way to report that error then; in the most obscure cases, it's better to have *some* output than none. I know this will never happen... but I know that if it happens, we'll be glad to have an error message ;)

    > We are already reporting it via "add_rabbit_configured" in that object. My point was
    > we'll be reporting a "rabbit isn't configured" message already so
    > showing another error is just piling on extra stuff that might make it
    > more confusing to diagnose.

OK, I was mistaken then. I thought the previous error was fatal and that
this one could only trigger if the config had existed before and then
disappeared for unknown reasons.
