Merge lp://staging/~freyes/charms/trusty/percona-cluster/lp1426508 into lp://staging/~openstack-charmers-archive/charms/trusty/percona-cluster/next

Proposed by Felipe Reyes
Status: Merged
Merged at revision: 54
Proposed branch: lp://staging/~freyes/charms/trusty/percona-cluster/lp1426508
Merge into: lp://staging/~openstack-charmers-archive/charms/trusty/percona-cluster/next
Diff against target: 2350 lines (+2141/-3)
25 files modified
.bzrignore (+3/-0)
Makefile (+7/-0)
charm-helpers-tests.yaml (+5/-0)
copyright (+22/-0)
hooks/percona_hooks.py (+35/-3)
hooks/percona_utils.py (+17/-0)
ocf/percona/mysql_monitor (+636/-0)
setup.cfg (+6/-0)
templates/my.cnf (+1/-0)
tests/00-setup.sh (+29/-0)
tests/10-deploy_test.py (+29/-0)
tests/20-broken-mysqld.py (+38/-0)
tests/30-kill-9-mysqld.py (+38/-0)
tests/basic_deployment.py (+151/-0)
tests/charmhelpers/__init__.py (+38/-0)
tests/charmhelpers/contrib/__init__.py (+15/-0)
tests/charmhelpers/contrib/amulet/__init__.py (+15/-0)
tests/charmhelpers/contrib/amulet/deployment.py (+93/-0)
tests/charmhelpers/contrib/amulet/utils.py (+316/-0)
tests/charmhelpers/contrib/openstack/__init__.py (+15/-0)
tests/charmhelpers/contrib/openstack/amulet/__init__.py (+15/-0)
tests/charmhelpers/contrib/openstack/amulet/deployment.py (+137/-0)
tests/charmhelpers/contrib/openstack/amulet/utils.py (+294/-0)
unit_tests/test_percona_hooks.py (+65/-0)
unit_tests/test_utils.py (+121/-0)
To merge this branch: bzr merge lp://staging/~freyes/charms/trusty/percona-cluster/lp1426508
Reviewer Review Type Date Requested Status
James Page Pending
Mario Splivalo Pending
OpenStack Charmers Pending
Review via email: mp+256640@code.staging.launchpad.net

This proposal supersedes a proposal from 2015-04-07.

Description of the change

Dear OpenStack Charmers,

This patch configures mysql_monitor[0] to keep two properties (readable and writable) up to date on each member node of the cluster. These properties are used to define a location rule[1][2] that instructs pacemaker to run the vip only on nodes where the writable property is set to 1.
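
As a rough illustration of the location rule described above, the constraint could be assembled as a crmsh string from Python. This is only a sketch: the helper `build_location_rule` and the names `grp_percona_cluster` and `loc_percona_cluster` are hypothetical, and only the `writable` attribute comes from the description. The rule shown gives an infinite score to nodes where `writable` equals 1; whether the charm uses exactly this form is an assumption.

```python
def build_location_rule(resource, attribute="writable", score="inf", value=1):
    """Return the body of a crmsh location constraint that scores nodes
    where `attribute` equals `value`. All names are illustrative."""
    return "{} rule {}: {} eq {}".format(resource, score, attribute, value)


# Hypothetical resource and constraint names, not taken from the charm.
locations = {
    "loc_percona_cluster": build_location_rule("grp_percona_cluster"),
}

for name, rule in sorted(locations.items()):
    # Roughly what would be passed to: crm configure location <name> <rule>
    print("location {} {}".format(name, rule))
```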

This fixes scenarios where mysql is out of sync or stopped (manually or because it crashed).

This MP also adds functional tests that check 2 scenarios: a standard 3-node deployment, and one where the mysql service is stopped on the node where the vip is running and the test checks that the vip migrated to another node (and that connectivity is OK after the migration). To run the functional tests, the AMULET_OS_VIP environment variable has to be defined; for instance, if you're using lxc with the local provider you can run:

$ export AMULET_OS_VIP=10.0.3.2
$ make test
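
On the test side, reading the variable could look roughly like the sketch below. The helper name `get_vip` and its error message are assumptions made for illustration; only the AMULET_OS_VIP variable name comes from this proposal.

```python
import os


def get_vip(environ=None):
    """Return the VIP for the functional tests, or raise if it is unset."""
    environ = os.environ if environ is None else environ
    vip = environ.get("AMULET_OS_VIP", "").strip()
    if not vip:
        raise RuntimeError(
            "AMULET_OS_VIP must be set, e.g. export AMULET_OS_VIP=10.0.3.2")
    return vip
```

Failing early with a clear message avoids deploying a full environment only to discover that no vip was configured.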

Best,

Note0: This patch doesn't take care of starting the mysql service if it's stopped; it only takes care of monitoring the service.
Note1: This patch requires the hacluster MP available at [2] to support the definition of location rules.
Note2: To know whether the node is capable of receiving read/write requests, clustercheck[3] is used.

[0] https://github.com/percona/percona-pacemaker-agents/blob/master/agents/mysql_monitor
[1] http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html/Clusters_from_Scratch/_specifying_a_preferred_location.html
[2] https://code.launchpad.net/~freyes/charms/trusty/hacluster/add-location/+merge/252127
[3] http://www.percona.com/doc/percona-xtradb-cluster/5.5/faq.html#q-how-can-i-check-the-galera-node-health

Revision history for this message
James Page (james-page) wrote : Posted in a previous version of this proposal

Felipe

This looks like a really good start to resolving this challenge. Generally your changes look fine (a few inline comments), but I would really like to see upgrades for existing deployments handled as well.

This would involve re-executing the ha_relation_joined function from the upgrade-charm/config-changed hook so that corosync can reconfigure its resources as required.
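
A minimal sketch of that idea, assuming `ha_relation_joined` can be called with a `relation_id` keyword (an assumption, not confirmed by this thread). The wrapper name `reconfigure_ha` is hypothetical, and `relation_ids` is passed in so the example runs without Juju; in a charm it would typically be the charmhelpers `hookenv.relation_ids` function.

```python
def reconfigure_ha(relation_ids, ha_relation_joined):
    """Re-run the ha relation handler for every established ha relation.

    Both arguments are callables injected by the caller, so this sketch
    does not depend on a running Juju environment.
    """
    handled = []
    for rid in relation_ids("ha"):
        ha_relation_joined(relation_id=rid)
        handled.append(rid)
    return handled
```

The upgrade-charm and config-changed hooks would then call `reconfigure_ha` after their usual work, so existing deployments pick up the new resources.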

review: Needs Fixing
Revision history for this message
Felipe Reyes (freyes) wrote : Posted in a previous version of this proposal

James, this new version of the patch addresses your feedback and adds a couple of unit tests for ha-relation-joined.

Thanks,

Revision history for this message
Felipe Reyes (freyes) wrote : Posted in a previous version of this proposal

Mario was reviewing this patch and found a problem: when mysqld is killed and the pidfile is left behind, the agent (mysql_monitor) doesn't properly detect that mysql is not running. I filed a pull request[0] to address this scenario.

[0] https://github.com/percona/percona-pacemaker-agents/pull/53
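
To illustrate the stale-pidfile scenario (this is not the upstream fix, which lives in the shell agent), a liveness check can avoid trusting the pidfile alone and also confirm that the recorded process still exists. The function name `mysqld_running` is hypothetical and the sketch assumes a POSIX system.

```python
import errno
import os


def mysqld_running(pidfile):
    """Return True only if the pidfile names a process that still exists."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (IOError, OSError, ValueError):
        return False  # missing, unreadable, or malformed pidfile
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except OSError as e:
        # EPERM means the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True
```

Note that a recycled pid can still produce a false positive, so a real agent would likely also check that the process is actually mysqld.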

Revision history for this message
Felipe Reyes (freyes) wrote : Posted in a previous version of this proposal

Mario, I just pushed a new MP; this one includes the PR available at [0].

I'll take care of keeping ocf/percona/mysql_monitor in sync with the upstream version.

Best,

[0] https://github.com/percona/percona-pacemaker-agents/pull/53

Revision history for this message
James Page (james-page) wrote : Posted in a previous version of this proposal

I'm struggling to get the amulet tests to pass:

juju-test.conductor DEBUG : Tearing down devel juju environment
juju-test.conductor DEBUG : Calling "juju destroy-environment -y devel"
WARNING cannot delete security group "juju-devel-0". Used by another environment?
WARNING cannot delete security group "juju-devel". Used by another environment?
WARNING cannot delete security group "juju-devel-0". Used by another environment?
juju-test.conductor DEBUG : Starting a bootstrap for devel, kill after 300
juju-test.conductor DEBUG : Running the following: juju bootstrap -e devel
Bootstrapping environment "devel"
Starting new instance for initial state server
Launching instance
 - 0fcaf736-ca4d-4148-befe-a7fe4f564179
Installing Juju agent on bootstrap instance
Waiting for address
Attempting to connect to 10.5.15.115:22
Warning: Permanently added '10.5.15.115' (ECDSA) to the list of known hosts.
Logging to /var/log/cloud-init-output.log on remote host
Running apt-get update
Running apt-get upgrade
Installing package: curl
Installing package: cpu-checker
Installing package: bridge-utils
Installing package: rsyslog-gnutls
Installing package: cloud-utils
Installing package: cloud-image-utils
Fetching tools: curl -sSfw 'tools from %{url_effective} downloaded: HTTP %{http_code}; time %{time_total}s; size %{size_download} bytes; speed %{speed_download} bytes/s ' --retry 10 -o $bin/tools.tar.gz <[https://streams.canonical.com/juju/tools/devel/juju-1.23-beta4-trusty-amd64.tgz]>
Bootstrapping Juju machine agent
Starting Juju machine agent (jujud-machine-0)
Bootstrap complete
juju-test.conductor DEBUG : Waiting for bootstrap
juju-test.conductor DEBUG : Still not bootstrapped
juju-test.conductor DEBUG : Running the following: juju status -e devel
juju-test.conductor DEBUG : State for 1.23.0: started
juju-test.conductor.10-deploy_test.py DEBUG : Running 10-deploy_test.py (tests/10-deploy_test.py)
2015-04-13 08:46:45 Starting deployment of devel
2015-04-13 08:46:46 Deploying services...
2015-04-13 08:46:46 Deploying service hacluster using cs:trusty/hacluster-18
2015-04-13 08:46:50 Deploying service percona-cluster using local:trusty/percona-cluster
2015-04-13 08:47:05 Config specifies num units for subordinate: hacluster
2015-04-13 08:49:49 Adding relations...
2015-04-13 08:49:49 Adding relation percona-cluster:ha <-> hacluster:ha
2015-04-13 08:51:03 Deployment complete in 257.99 seconds
Traceback (most recent call last):
  File "tests/10-deploy_test.py", line 29, in <module>
    t.run()
  File "tests/10-deploy_test.py", line 13, in run
    super(ThreeNode, self).run()
  File "/home/ubuntu/charms/trusty/percona-cluster/tests/basic_deployment.py", line 70, in run
    assert sorted(self.get_pcmkr_resources()) == sorted(resources)
AssertionError
juju-test.conductor.10-deploy_test.py DEBUG : percona-cluster/0
ERROR subprocess encountered error code 1
percona-cluster/2
inet 10.5.100.1/24 brd 10.5.100.255 scope global eth0
vip(10.5.100.1) running in percona-cluster/2

juju-test.conductor.10-deploy_test.py DEBUG : Got exit code: 1
juju-test.conductor.10-deploy_test.py RESULT : ✘
juju-test.conductor DEBUG : Tearing down devel juju environment...

review: Needs Information
Revision history for this message
James Page (james-page) wrote : Posted in a previous version of this proposal

Please add:

test:
        @echo Starting amulet deployment tests...
        #NOTE(beisner): can remove -v after bug 1320357 is fixed
        # https://bugs.launchpad.net/amulet/+bug/1320357
        @juju test -v -p AMULET_HTTP_PROXY --timeout 900

to the Makefile - this will be picked up by OSCI.

review: Needs Fixing
Revision history for this message
James Page (james-page) wrote : Posted in a previous version of this proposal

Felipe

Thinking about 'local.yaml' - that's a bit tricky for our automated testing tooling - however, using an environment variable is not (I see 'VIP' in the code already).

Please could you scope that to be AMULET_OS_VIP and pass it through in the Makefile:

   @juju test -v -p AMULET_HTTP_PROXY,AMULET_OS_VIP --timeout 900

review: Needs Fixing
Revision history for this message
James Page (james-page) wrote : Posted in a previous version of this proposal

Test failure is being a bit stupid - Monday moment - re-trying now...

Revision history for this message
James Page (james-page) wrote : Posted in a previous version of this proposal

The unit poweroff test works fine, but the mysql shutdown test fails in my test run:

juju-test.conductor.20-broken-mysqld.py DEBUG : Running 20-broken-mysqld.py (tests/20-broken-mysqld.py)
2015-04-13 10:41:54 Starting deployment of devel
2015-04-13 10:41:54 Deploying services...
2015-04-13 10:41:55 Deploying service hacluster using cs:trusty/hacluster-18
2015-04-13 10:41:59 Deploying service percona-cluster using local:trusty/percona-cluster
2015-04-13 10:42:13 Config specifies num units for subordinate: hacluster
2015-04-13 10:44:58 Adding relations...
2015-04-13 10:44:58 Adding relation percona-cluster:ha <-> hacluster:ha
2015-04-13 10:46:07 Deployment complete in 253.46 seconds
Traceback (most recent call last):
  File "tests/20-broken-mysqld.py", line 38, in <module>
    t.run()
  File "tests/20-broken-mysqld.py", line 31, in run
    assert changed, "The master didn't change"
AssertionError: The master didn't change
juju-test.conductor.20-broken-mysqld.py DEBUG : percona-cluster/0
ERROR subprocess encountered error code 1
percona-cluster/1
inet 10.5.100.1/24 brd 10.5.100.255 scope global eth0
vip(10.5.100.1) running in percona-cluster/1
stopping mysql in {'subordinates': {'hacluster/2': {'unit': '2', 'upgrading-from': 'cs:trusty/hacluster-18', 'agent-version': '1.23-beta4', 'service': 'hacluster', 'agent-state': 'started', 'unit_name': 'hacluster/2', 'public-address': '10.5.15.143'}}, 'unit': '1', 'machine': '2', 'agent-version': '1.23-beta4', 'service': 'percona-cluster', 'public-address': '10.5.15.143', 'unit_name': 'percona-cluster/1', 'agent-state': 'started'}
looking for the new master
percona-cluster/0
ERROR subprocess encountered error code 1
percona-cluster/1
inet 10.5.100.1/24 brd 10.5.100.255 scope global eth0
vip(10.5.100.1) running in percona-cluster/1
percona-cluster/0
ERROR subprocess encountered error code 1
percona-cluster/1
inet 10.5.100.1/24 brd 10.5.100.255 scope global eth0
vip(10.5.100.1) running in percona-cluster/1
percona-cluster/0
ERROR subprocess encountered error code 1
percona-cluster/1
inet 10.5.100.1/24 brd 10.5.100.255 scope global eth0
vip(10.5.100.1) running in percona-cluster/1
percona-cluster/0
ERROR subprocess encountered error code 1
percona-cluster/1
inet 10.5.100.1/24 brd 10.5.100.255 scope global eth0
vip(10.5.100.1) running in percona-cluster/1
percona-cluster/0
ERROR subprocess encountered error code 1
percona-cluster/1
inet 10.5.100.1/24 brd 10.5.100.255 scope global eth0
vip(10.5.100.1) running in percona-cluster/1
percona-cluster/0
ERROR subprocess encountered error code 1
percona-cluster/1
inet 10.5.100.1/24 brd 10.5.100.255 scope global eth0
vip(10.5.100.1) running in percona-cluster/1
percona-cluster/0
ERROR subprocess encountered error code 1
percona-cluster/1
inet 10.5.100.1/24 brd 10.5.100.255 scope global eth0
vip(10.5.100.1) running in percona-cluster/1
percona-cluster/0
ERROR subprocess encountered error code 1
percona-cluster/1
inet 10.5.100.1/24 brd 10.5.100.255 scope global eth0
vip(10.5.100.1) running in percona-cluster/1
percona-cluster/0
ERROR subprocess encountered error code 1
percona-cluster/1
inet 10.5.100.1/24 brd 10.5.100.255 scope global eth0
vip(10.5.100.1) runnin...

Revision history for this message
Felipe Reyes (freyes) wrote : Posted in a previous version of this proposal

On Mon, 13 Apr 2015 09:59:31 -0000
James Page <email address hidden> wrote:

> Thinking about 'local.yaml' - that's a bit tricky for our automated
> testing tooling - however using a environment variable is not - (I
> see 'VIP' in the code already).
Yeah, it is; I wasn't proud of it.

> Please could you scope that to be AMULET_OS_VIP and pass it through
> in the Makefile:
>
> @juju test -v -p AMULET_HTTP_PROXY,AMULET_OS_VIP --timeout 900
Good idea, I'll use that approach. I didn't know what '-p' really
did.

Revision history for this message
James Page (james-page) wrote : Posted in a previous version of this proposal

Looks like the check is correctly detecting that mysql is not running; however, it's broken when propagating the response back to pacemaker?

Apr 16 14:11:43 juju-devel2-machine-1 mysql_monitor(res_mysql_monitor)[29199]: MYSQL IS NOT RUNNING:
Apr 16 14:11:43 juju-devel2-machine-1 mysql_monitor(res_mysql_monitor)[29199]: DEBUG: res_mysql_monitor monitor : 0
Apr 16 14:11:44 juju-devel2-machine-1 mysql_monitor(res_mysql_monitor)[29226]: ERROR: Not enough arguments [1] to ocf_log.
Apr 16 14:11:44 juju-devel2-machine-1 mysql_monitor(res_mysql_monitor)[29226]: MYSQL IS NOT RUNNING:
Apr 16 14:11:44 juju-devel2-machine-1 mysql_monitor(res_mysql_monitor)[29226]: DEBUG: res_mysql_monitor monitor : 0

Revision history for this message
Felipe Reyes (freyes) wrote : Posted in a previous version of this proposal

Here is the output of 'make test' after making some changes. The most important one is pulling hacluster from /next; using /trunk was the reason why the vip didn't get migrated when mysqld was stopped (/trunk lacks the ability to define 'location' rules).

http://paste.ubuntu.com/10837670/

Revision history for this message
James Page (james-page) wrote :

I've manually tested the clustering changes, and they are working fine; however I can't get the amulet tests to run reliably, so for now I've landed with tests disabled; we'll need to look at that for 15.07 charm release:

https://bugs.launchpad.net/charms/+source/percona-cluster/+bug/1446169
