Merge lp://staging/~stub/charms/precise/postgresql/enable-integration-tests into lp://staging/charms/postgresql

Proposed by Stuart Bishop
Status: Work in progress
Proposed branch: lp://staging/~stub/charms/precise/postgresql/enable-integration-tests
Merge into: lp://staging/charms/postgresql
Diff against target: 2264 lines (+509/-1039)
17 files modified
.bzrignore (+1/-0)
Makefile (+51/-34)
config.yaml (+65/-63)
hooks/helpers.py (+0/-199)
hooks/hooks.py (+15/-15)
hooks/test_hooks.py (+0/-1)
lib/juju-deployer-wrapper.py (+15/-0)
templates/postgresql.conf.tmpl (+10/-0)
test.py (+149/-354)
testing/README (+0/-36)
testing/amuletfixture.py (+200/-0)
testing/jujufixture.py (+0/-297)
tests/00-setup.sh (+0/-15)
tests/01-lint.sh (+0/-3)
tests/02-unit-tests.sh (+0/-3)
tests/03-basic-amulet.py (+0/-19)
tests/tests.yaml (+3/-0)
To merge this branch: bzr merge lp://staging/~stub/charms/precise/postgresql/enable-integration-tests
Reviewer Review Type Date Requested Status
Review Queue (community) automated testing Needs Fixing
Adam Israel (community) Needs Fixing
charmers Pending
Review via email: mp+238283@code.staging.launchpad.net

Description of the change

Switch to Amulet infrastructure.

Reenable integration tests by default.

Revision history for this message
Review Queue (review-queue) wrote :

This item has failed automated testing! Results available here: http://reports.vapour.ws/charm-tests/charm-bundle-test-11095-results

review: Needs Fixing (automated testing)
Revision history for this message
Adam Israel (aisrael) wrote :

Hi Stuart,

I had the opportunity to review this merge proposal today.

The Makefile target `testdep` isn't being called by `make test`, so my initial bundletester run failed due to missing dependencies. I manually ran that step and was able to get further into the integration tests. Unfortunately, the test eventually hung (after 3 hours of running).

I've posted the full bundletester logs here:
http://pastebin.ubuntu.com/10581906/

There were a couple of integration tests that failed (test_basic_admin, test_roles_granted), but test_upgrade_charm eventually timed out. I suspect that final error, and the length of time the tests took to run, were due to the number of machines spun up and never destroyed during the test -- 29 started, 3 pending.

Bundletester should tear down machines allocated between tests, but that didn't happen in this case. I wonder if it's related at all to the monkeypatched version of Amulet you're using.

How does this line up with your own testing? Is there anything I should do differently to test? Feel free to ping me on IRC or elsewhere if there's more I can do to facilitate getting these tests landed.

I have to NAK this for now, pending further instructions on running tests and/or the resolution of the above issues.

review: Needs Fixing
Revision history for this message
Review Queue (review-queue) wrote :

This item has failed automated testing! Results available here: http://reports.vapour.ws/charm-tests/charm-bundle-test-11097-results

review: Needs Fixing (automated testing)
Revision history for this message
Review Queue (review-queue) wrote :

This item has failed automated testing! Results available here: http://reports.vapour.ws/charm-tests/charm-bundle-test-11098-results

review: Needs Fixing (automated testing)
Revision history for this message
Stuart Bishop (stub) wrote :

I was relying on the newer Juju behavior (or local provider behavior?), where machines are destroyed by default when the service is destroyed. I'll put back the machine harvesting code I thought might not be needed.
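The harvesting pass being described can be sketched roughly as follows (a hypothetical illustration against the juju 1.x CLI; `machines_without_units` and `harvest_machines` are invented names, not the charm's actual helpers):

```python
import json
import subprocess


def machines_without_units(status):
    """Return ids of machines that host no units. Machine 0 is kept,
    as it runs the state server under the local provider."""
    used = set()
    for service in status.get('services', {}).values():
        for unit in service.get('units', {}).values():
            used.add(str(unit.get('machine')))
    return sorted(m for m in status.get('machines', {})
                  if m not in used and m != '0')


def harvest_machines():
    """Force-destroy machines left behind after 'juju destroy-service'."""
    raw = subprocess.check_output(['juju', 'status', '--format', 'json'])
    for machine in machines_without_units(json.loads(raw)):
        subprocess.check_call(['juju', 'destroy-machine', '--force', machine])
```

Run between tests, a pass like this would have reclaimed the 29 machines the earlier run left behind.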

test_basic_admin, test_roles_granted and test_upgrade_charm usually pass, and I suspect their failures were due to the provisioning issues. The more problematic tests are the failover ones, as they sometimes trigger a race condition I need to work out how to fix (if you fail over too often too quickly, you might promote a unit to master that has not reached a recoverable state, and things die).

bundletester is irrelevant here, as it is just the launcher. As far as bundletester is concerned there is just one huge test.

I've upgraded the Makefile rules to install the test dependencies the first time they are needed.

Revision history for this message
Review Queue (review-queue) wrote :

This item has failed automated testing! Results available here: http://reports.vapour.ws/charm-tests/charm-bundle-test-11102-results

review: Needs Fixing (automated testing)
Revision history for this message
Review Queue (review-queue) wrote :

This item has failed automated testing! Results available here: http://reports.vapour.ws/charm-tests/charm-bundle-test-11104-results

review: Needs Fixing (automated testing)
Revision history for this message
Stuart Bishop (stub) wrote :

Tests are failing because the mocker library cannot be found, despite it being installed with 'apt-get install python-mocker'. This is likely because the Python interpreter in play (the charmguardian vm?) isn't searching the system modules.

Revision history for this message
Adam Israel (aisrael) wrote :

Hey stub,

Thanks for pointing that out. I've been working on a review of these integration tests and am tracking a bug in juju core/juju deployer (lp:1421315) that's been causing some of the tests to fail.

Out of curiosity, what version of juju are you running?

Revision history for this message
Stuart Bishop (stub) wrote :

I stick to ppa:juju/stable for writing charms.

Revision history for this message
Adam Israel (aisrael) wrote :

Excellent, thanks for confirming.

I have some initial notes for you.

I started testing using bundletester but ran into some intermittent test failures due to a bug in bundletester (now fixed).

I've continued testing manually, i.e., running `nosetests -sv test:PG91Tests`

There's a bug in juju core (lp:1421315) and juju-deployer. I've patched my local version of juju-deployer to work around the issue (a merge for that is pending).

With the above issues resolved, I'm running into another error that maybe you have some insight about.

test.PG91Tests.test_failover ... ERROR cannot read info: lock timeout exceeded
['juju', 'run', '--timeout=21600s', '--unit', 'postgresql/0', 'sudo tail -1 /var/log/juju/unit-postgresql-0.log'] failed: 1

ERROR subprocess encountered error code 1

I'm investigating why that error is being thrown. I wonder if juju run is executing before the unit is stood up. Any thoughts?

Revision history for this message
Stuart Bishop (stub) wrote :

This particular error is bubbling up from the juju-wait plugin (lp:juju-wait). It will only call `juju run` against units that are alive (do not have life: set to dying or dead in the status) and ready (report an agent-state: of started). The only slightly unusual thing that could be happening here is that it executes the `juju run` calls in parallel. The quickest way to check whether this is the culprit is to hack /usr/bin/juju-wait and temporarily change max_workers=6 to max_workers=1.
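The parallel execution being described can be sketched like this (a simplified illustration, not the actual juju-wait source; `run_on_units` and `runner` are invented names):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 6  # juju-wait's default; drop to 1 to serialise the calls


def run_on_unit(unit, command):
    """Invoke 'juju run' against a single live, started unit."""
    return subprocess.check_output(
        ['juju', 'run', '--unit', unit, command]).decode()


def run_on_units(units, command, runner=run_on_unit,
                 max_workers=MAX_WORKERS):
    """Run the same command on every unit, max_workers at a time.

    With max_workers > 1 the concurrent 'juju run' invocations can
    contend for juju's internal lock ('lock timeout exceeded');
    max_workers=1 serialises them.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda u: runner(u, command), units)
        return dict(zip(units, results))
```

Trading the speed of six workers for one serial worker is exactly the hack suggested above.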

Revision history for this message
Adam Israel (aisrael) wrote :

Ack, thanks, I'll check juju-wait.

Revision history for this message
Adam Israel (aisrael) wrote :

Ok, I've finished a comprehensive review of the integration tests.

There were several tests failing that I've determined to be failures in the test infrastructure. I've landed a patch to bundletester, which was running tests more than once. I'll be sending patches to amulet, juju-deployer, and juju-wait (the change from max_workers=6 to max_workers=1 fixed the lock timeout error).

With the above patches applied locally, I've been able to run the 9.1 integration tests with only one failure, and that failure looks like it's ultimately spurious:

test_syslog initially failed on the deploy command. This line:

    self.deployment.add('rsyslog', 'cs:rsyslog', 2)

should read:

    self.deployment.add('rsyslog', 'rsyslog', 2)

"cs:rsyslog" returns a 404 - see https://manage.jujucharms.com/api/3/charm/trusty/cs:rsyslog

With that fixed, the test runs but fails its assert checks:
_StringException: Traceback (most recent call last):
  File "/charms/trusty/postgresql/test.py", line 697, in test_syslog
    self.failUnless('master {}'.format(token) in out)
  File "/usr/lib/python2.7/unittest/case.py", line 614, in deprecated_func
    return original_func(*args, **kwargs)
  File "/usr/lib/python2.7/unittest/case.py", line 424, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true

I think the failure is wrong, though. I ssh'd into each rsyslog unit after the failure and `tail -n 100 /var/log/syslog | grep`ped for each string and found it.

While running test_explicit_database, I ran into a case where the replication-relation-broken hook failed during tearDown, and reset_environment wasn't cycling through to resolve the failures. The test passed when I flipped the force flag in amuletfixture.py, but that might not be what you intended.

The stack trace of the failure:
2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 Traceback (most recent call last):
2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 File "/var/lib/juju/agents/unit-postgresql-0/charm/hooks/replication-relation-broken", line 2701, in <module>
2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 hooks.execute(sys.argv)
2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 File "/var/lib/juju/agents/unit-postgresql-0/charm/hooks/charmhelpers/core/hookenv.py", line 490, in execute
2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 self._hooks[hook_name]()
2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 File "/var/lib/juju/agents/unit-postgresql-0/charm/hooks/replication-relation-broken", line 2263, in replication_relation_broken
2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 config_changed()
2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 File "/var/lib/juju/agents/unit-postgresql-0/charm/hooks/replication-relation-broken", line 1140, in config_changed
2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 generate_postgresql_hba(postgresql_hba)
2015-03-26 15:1...
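For reference, the "cycling through to resolve failures" behaviour would look roughly like this (a hypothetical sketch against the juju 1.x CLI, not the actual amuletfixture.py code; function names are illustrative):

```python
import json
import subprocess
import time


def units_in_error(status):
    """Return names of units whose agent-state is 'error'."""
    failed = []
    for service in status.get('services', {}).values():
        for name, unit in service.get('units', {}).items():
            if unit.get('agent-state') == 'error':
                failed.append(name)
    return sorted(failed)


def resolve_errors(retries=10, delay=10):
    """Keep marking failed hooks resolved until the environment is clean."""
    failed = []
    for _ in range(retries):
        raw = subprocess.check_output(['juju', 'status', '--format', 'json'])
        failed = units_in_error(json.loads(raw))
        if not failed:
            return
        for unit in failed:
            # '--retry' re-runs the failed hook rather than skipping it.
            subprocess.call(['juju', 'resolved', '--retry', unit])
        time.sleep(delay)
    raise RuntimeError('units still in error state: %s' % failed)
```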


Revision history for this message
Stuart Bishop (stub) wrote :

On 7 March 2015 at 06:36, Adam Israel <email address hidden> wrote:

> There were several tests failing that I've determined to be a failure in the test infrastructure. I've landed a patch to bundletester that was running tests more than once. I'll be sending patches to amulet, juju-deployer, and juju-wait (the change from max_workers=6 to max_workers=1 fixed the lock timeout error).

Ok. The multiple workers are of course there to speed things up, but
correctness is better. I'll sort out juju-wait shortly and kick off the
package build.

> With the above patches implemented locally, I've been able to run the 9.1 integration test with only one failure, that looks like it's ultimately erroneous:
>
> test_syslog initially failed on the deploy command. This line:
>
> self.deployment.add('rsyslog', 'cs:rsyslog', 2)
>
> should read:
>
> self.deployment.add('rsyslog', 'rsyslog', 2)
>
> "cs:rsyslog" returns a 404 - see https://manage.jujucharms.com/api/3/charm/trusty/cs:rsyslog

Aha... when to use prefixes or not has always confused me. I think
this has worked in the past.

> With that fixed, the test runs but fails it's assert checks:
> _StringException: Traceback (most recent call last):
> File "/charms/trusty/postgresql/test.py", line 697, in test_syslog
> self.failUnless('master {}'.format(token) in out)
> File "/usr/lib/python2.7/unittest/case.py", line 614, in deprecated_func
> return original_func(*args, **kwargs)
> File "/usr/lib/python2.7/unittest/case.py", line 424, in assertTrue
> raise self.failureException(msg)
> AssertionError: False is not true
>
> I think the failure is wrong, though. I ssh'd into each rsyslog unit after the failure and `tail -n 100 /var/log/syslog | grep`ped for each string and found it.
>
> While running test_explicit_database, I ran into a case where the replication-relation-broken failed on tearDown, and reset_environment wasn't cycling through to resolve failures. The test passed when I flipped the force flag in amuletfixture.py, but that might not be what you intended.

I consider this a bug, which is why the force flag isn't set.

> The stack trace of the failure:
> 2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 Traceback (most recent call last):
> 2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 File "/var/lib/juju/agents/unit-postgresql-0/charm/hooks/replication-relation-broken", line 2701, in <module>
> 2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 hooks.execute(sys.argv)
> 2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 File "/var/lib/juju/agents/unit-postgresql-0/charm/hooks/charmhelpers/core/hookenv.py", line 490, in execute
> 2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 self._hooks[hook_name]()
> 2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:40 File "/var/lib/juju/agents/unit-postgresql-0/charm/hooks/replication-relation-broken", line 2263, in replication_relation_broken
> 2015-03-26 15:17:01 INFO unit.postgresql/0.replication-relation-broken logger.go:4...


Revision history for this message
Adam Israel (aisrael) wrote :

Sorry, it was the complete PG91Tests test suite that took an hour, not an individual test!

Revision history for this message
Stuart Bishop (stub) wrote :

This branch is ready for another test run.

Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

Started tests.

Unmerged revisions

123. By Stuart Bishop

Missing dependency

122. By Stuart Bishop

test updates

121. By Stuart Bishop

Install test dependencies if required

120. By Stuart Bishop

Remove dead code

119. By Stuart Bishop

silence upgrade-charm test

118. By Stuart Bishop

Newer wait() implementation

117. By Stuart Bishop

missing config

116. By Stuart Bishop

tidy

115. By Stuart Bishop

delint

114. By Stuart Bishop

quiet teardown

