Merge lp://staging/~chad.smith/charms/precise/block-storage-broker/bsb-retries-on-volume-create-and-attach into lp://staging/charms/block-storage-broker

Proposed by Chad Smith
Status: Merged
Merged at revision: 56
Proposed branch: lp://staging/~chad.smith/charms/precise/block-storage-broker/bsb-retries-on-volume-create-and-attach
Merge into: lp://staging/charms/block-storage-broker
Diff against target: 217 lines (+78/-34)
3 files modified
hooks/test_hooks.py (+4/-4)
hooks/test_util.py (+38/-13)
hooks/util.py (+36/-17)
To merge this branch: bzr merge lp://staging/~chad.smith/charms/precise/block-storage-broker/bsb-retries-on-volume-create-and-attach
Reviewer Review Type Date Requested Status
Chad Smith (community) Approve
David Britton (community) Needs Fixing
Paul Larson Pending
Review via email: mp+231974@code.staging.launchpad.net

Description of the change

To attempt to avoid spurious errors from openstack on nova volume-create and volume-attach commands, this branch adds a retry mechanism for those two commands run by the block storage broker.

This branch introduces an internal _run_command method to reunse retry and error handling logic around commands that are known to produce intermittent 500 errors.

_run_command attempts 3 retries on command failure and logs WARNINGs with each failed attempt. If the command fails 4 times in a row an ERROR is logged and we exit(1) to break the hook.

To keep this merge proposal smaller, we only use _run_command is only being used for "nova volume-create" and "nova volume-attach" methods. Subsequent branches will pull in other nova and euca2ools commands as we refine the error handling and retries needed.

To post a comment you must log in.
Revision history for this message
David Britton (dpb) wrote :

[0] in _def_run_command(), why split the output into lines in a list? I think doing an rstrip() on the result would be enough..

Revision history for this message
David Britton (dpb) wrote :

[1] I would increase the retry time to 30s, 5s seems too frequent.

Just two minor thing to fix. I tested both the unit tests and deploying on openstack, all worked great.

Thanks for the contribution. Hopefully this will make it more stable. If not, we can easily increase the retry count as well.

I like it.

review: Needs Fixing
Revision history for this message
Paul Larson (pwlars) wrote :

This is working better for me. I don't get the HTTP 500 errors anymore, but I still occasionally hit the other problem I described where it tries to create a new volume when the existing one is already there.

Revision history for this message
Chad Smith (chad.smith) wrote :

> [0] in _def_run_command(), why split the output into lines in a list? I think
> doing an rstrip() on the result would be enough..

The output was being split into lines to pre-enable a follow-on branch which would use _run_command for our "nova volume-show" calls. This volume-show has multi-line output and splitting it in the _run_command call would make parsing of that output a bit simpler. I'll pull it out of this branch though as nothing "currently" uses that functionality.

Revision history for this message
Chad Smith (chad.smith) wrote :

> This is working better for me. I don't get the HTTP 500 errors anymore, but I
> still occasionally hit the other problem I described where it tries to create
> a new volume when the existing one is already there.

Thanks Paul, I think we might have to open a separate bug for that duplicate volume created. I'm trying to understand your redeployment steps so I can see this error with more debug statements added.

It sounded like you deployed successfully once, then tore down your environment or the units, and then redeployed. Did you juju destroy-environment or juju destroy-service postgresql? or just remove-unit postgresql/0?

Thanks for the additional info.

Revision history for this message
David Britton (dpb) :
review: Approve
Revision history for this message
David Britton (dpb) wrote :

Hi Chad -- Thanks for doing this.

Please resolve the merge conflicts and I'll commit up for both trusty and precise.

review: Needs Fixing
Revision history for this message
David Britton (dpb) wrote :

@Paul, @Chad, Please follow on with a separate bug for the other issue you are seeing, Thanks!

Revision history for this message
Chad Smith (chad.smith) wrote :

Just pulled trunk and resolved the test conflict. changes are now pushed thanks David.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches

to all changes: