Merge lp://staging/~stub/charms/trusty/postgresql/rewrite into lp://staging/charms/trusty/postgresql

Proposed by Stuart Bishop
Status: Merged
Merged at revision: 131
Proposed branch: lp://staging/~stub/charms/trusty/postgresql/rewrite
Merge into: lp://staging/charms/trusty/postgresql
Prerequisite: lp://staging/~stub/charms/trusty/postgresql/rewrite-charmhelpers
Diff against target: 22814 lines (+17181/-4595)
67 files modified
.bzrignore (+3/-1)
Makefile (+64/-49)
README.md (+52/-100)
TODO (+0/-27)
actions.yaml (+11/-0)
actions/actions.py (+42/-7)
config.yaml (+327/-270)
copyright (+6/-7)
hooks/bootstrap.py (+57/-0)
hooks/client.py (+182/-0)
hooks/coordinator.py (+19/-0)
hooks/data-relation-changed (+23/-0)
hooks/data-relation-departed (+23/-0)
hooks/db-admin-relation-departed (+23/-0)
hooks/db-relation-departed (+23/-0)
hooks/decorators.py (+124/-0)
hooks/definitions.py (+86/-0)
hooks/helpers.py (+0/-197)
hooks/hooks.py (+0/-2820)
hooks/leader-elected (+23/-0)
hooks/leader-settings-changed (+23/-0)
hooks/local-monitors-relation-changed (+23/-0)
hooks/master-relation-changed (+23/-0)
hooks/master-relation-departed (+23/-0)
hooks/metrics.py (+64/-0)
hooks/nagios.py (+81/-0)
hooks/nrpe-external-master-relation-changed (+23/-0)
hooks/postgresql.py (+692/-0)
hooks/replication-relation-changed (+23/-0)
hooks/replication.py (+324/-0)
hooks/service.py (+930/-0)
hooks/start (+23/-0)
hooks/stop (+23/-0)
hooks/storage.py (+88/-0)
hooks/syslog-relation-departed (+23/-0)
hooks/syslogrel.py (+72/-0)
hooks/test_hooks.py (+0/-433)
hooks/upgrade.py (+121/-0)
hooks/wal_e.py (+129/-0)
lib/cache_settings.py (+44/-0)
lib/juju-deployer-wrapper.py (+32/-0)
lib/pg_settings_9.1.json (+2792/-0)
lib/pg_settings_9.2.json (+2858/-0)
lib/pg_settings_9.3.json (+2936/-0)
lib/pg_settings_9.4.json (+3050/-0)
lib/pgclient/metadata.yaml (+3/-4)
metadata.yaml (+44/-9)
templates/pg_backup_job.tmpl (+16/-23)
templates/pg_hba.conf.tmpl (+0/-29)
templates/pg_ident.conf.tmpl (+0/-3)
templates/postgres.cron.tmpl (+6/-6)
templates/postgresql.conf.tmpl (+0/-213)
templates/recovery.conf.tmpl (+1/-1)
templates/rsyslog_forward.conf (+2/-2)
templates/start_conf.tmpl (+0/-13)
templates/swiftwal.conf.tmpl (+0/-6)
testing/README (+0/-36)
testing/amuletfixture.py (+241/-0)
testing/jujufixture.py (+0/-297)
tests/00-setup.sh (+0/-15)
tests/01-lint.sh (+0/-3)
tests/02-unit-tests.sh (+0/-3)
tests/03-basic-amulet.py (+0/-19)
tests/obsolete.py (+2/-2)
tests/test_integration.py (+628/-0)
tests/test_postgresql.py (+711/-0)
tests/tests.yaml (+19/-0)
To merge this branch: bzr merge lp://staging/~stub/charms/trusty/postgresql/rewrite
Reviewer: Matt Bruzek (community), status: Approve
Review via email: mp+267646@code.staging.launchpad.net

Description of the change

The PostgreSQL charm is one of the oldest around, crufty and unmaintainable. It's time for a rewrite to make use of modern Juju and improve its reliability.

Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :
Revision history for this message
Stuart Bishop (stub) wrote :

They were passing on the new system being set up. I haven't been
watching the old runner.

--
Stuart Bishop <email address hidden>

Revision history for this message
Stuart Bishop (stub) wrote :

On 17 September 2015 at 14:49, Stuart Bishop
<email address hidden> wrote:
> They were passing on the new system being set up. I haven't been
> watching the old runner.

Looks like the green runs are lost in the mists of time. I'm not sure
what has changed in the month this has been waiting for review.

Revision history for this message
Stuart Bishop (stub) wrote :

Looking at http://data.vapour.ws/charm-test/charm-bundle-test-azure-651-all-machines-log,
it seems debugging output has been turned up to 11, and I think this
breaks 'juju wait'. Without 'juju wait', it is impossible to get the
tests running reliably.

It's hard to tell through all the noise, but I think this is where the
initial deploy actually terminates:

unit-postgresql-0[21563]: 2015-09-17 01:01:47 INFO
unit.postgresql/0.juju-log server.go:254 db-admin:2: *** End
'db-admin-relation-changed' hook

After this, we keep seeing noise about leadership leases and other
stuff, including the 'juju run' commands that check that the logs are
quiet (thus shooting ourselves in the foot, because the only way to
check that the logs are quiet adds noise to the logs...).
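
To illustrate the trap (a hypothetical sketch, not the CI harness's
actual check; the log path and command are assumptions):

import subprocess
import time

def logs_quiet(window=30):
    # Count log lines on every machine, wait, and count again; if
    # nothing new was logged in between, declare the environment quiet.
    cmd = ['juju', 'run', '--all', 'cat /var/log/juju/unit-*.log | wc -l']
    before = subprocess.check_output(cmd)
    time.sleep(window)
    after = subprocess.check_output(cmd)
    # Both 'juju run' invocations above are themselves logged by the
    # units, so the counts never settle even on an idle deployment.
    return before == after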

This never got picked up locally, as I don't even know how to turn on
this output and don't get it here (using juju stable).

Also see Bug #1496130, where we are fixing things so the OpenStack
mojo tests can use the wait algorithm.

What is confusing me, though, is that the Cassandra tests last passed
on Sep 4th, and I would have expected them to have started failing much
earlier (they too rely on juju wait). What is even more confusing, the
last successful run at
http://reports.vapour.ws/charm-test-details/charm-bundle-test-parent-703
leads me to the AWS log at
http://data.vapour.ws/charm-test/charm-bundle-test-aws-591-all-machines-log
which contains a whole heap of logging information from other tests,
in addition to the Cassandra logs, which means test isolation was
busted and the results bogus.

Revision history for this message
Stuart Bishop (stub) wrote :

I see. Prior to ~Sep 4th, tests were often passing but would also
often fail with various provisioning errors, e.g. this common one,
indicating units took longer than 5 minutes to provision:

amulet.helpers.TimeoutError: public-address not set for all units after 300s

After ~Sep 4th, all integration tests started failing consistently
at 'juju wait', for both Cassandra and the PostgreSQL rewrite.

--
Stuart Bishop <email address hidden>

Revision history for this message
Stuart Bishop (stub) wrote :

I added Juju 1.24-specific behavior to the juju wait plugin, which avoids needing to sniff the logs. I've updated the package in ppa:stub/juju and I think I've queued a retest.
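
For illustration, the agent-status approach amounts to something like
the sketch below (the YAML keys are assumptions based on the Juju 1.24
status format; this is not the actual juju wait source):

import subprocess
import time

import yaml

def wait_until_idle(poll_seconds=10):
    # Poll 'juju status' until every unit agent reports itself idle,
    # instead of sniffing the consolidated logs for silence.
    while True:
        status = yaml.safe_load(
            subprocess.check_output(['juju', 'status', '--format=yaml']))
        units = [unit
                 for service in (status.get('services') or {}).values()
                 for unit in (service.get('units') or {}).values()]
        if units and all(
                unit.get('agent-status', {}).get('current') == 'idle'
                for unit in units):
            return
        time.sleep(poll_seconds)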

Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

> They were passing on the new system being set up. I haven't been
> watching the old runner.
>
Sorry, I'm not familiar with it; where is the new system (or the output from it)?

Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

> Looking at http://data.vapour.ws/charm-test/charm-bundle-test-azure-651-all-
> machines-log,
> it seems debugging output has been turned up to 11, and I think this
> breaks 'juju wait'. Without 'juju wait', it is impossible to get the
> tests running reliably.

I'm not sure why this would be, unless the default log level in juju itself got turned up. CI is currently running 1.24.5.1

> What is confusing me though is that the Cassandra tests last passed
> Sep 4th, and I would have expected them to have started failing much
> earlier (it too relies on juju wait). What is even more confusing, the
> last successful run at
> http://reports.vapour.ws/charm-test-details/charm-bundle-test-parent-703
> leads me to the AWS log at
> http://data.vapour.ws/charm-test/charm-bundle-test-aws-591-all-machines-log
> which contains a whole heap of logging information from other tests,
> in addition to the Cassandra logs, which means test isolation was
> busted and the results bogus.

Wow, that is bizarre. Thanks for pointing this out, I had not seen this before. I'm not convinced that test isolation is busted, but log isolation may be. I'll look into this.

Revision history for this message
Stuart Bishop (stub) wrote :

stub@aargh:~/charms/postgresql/rewrite$ make test
sudo add-apt-repository -y ppa:juju/stable
gpg: keyring `/tmp/tmppv9t1ria/secring.gpg' created
gpg: keyring `/tmp/tmppv9t1ria/pubring.gpg' created
gpg: requesting key C8068B11 from hkp server keyserver.ubuntu.com
gpg: /tmp/tmppv9t1ria/trustdb.gpg: trustdb created
gpg: key C8068B11: public key "Launchpad Ensemble PPA" imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
OK
sudo add-apt-repository -y ppa:stub/juju
gpg: keyring `/tmp/tmpfkk5hwu_/secring.gpg' created
gpg: keyring `/tmp/tmpfkk5hwu_/pubring.gpg' created
gpg: requesting key E4FD7A7A from hkp server keyserver.ubuntu.com
gpg: /tmp/tmpfkk5hwu_/trustdb.gpg: trustdb created
gpg: key E4FD7A7A: public key "Launchpad Stub's Launchpad PPA" imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
OK
[...]
Reading package lists... Done
sudo apt-get install -y \
    python3-psycopg2 python3-nose python3-flake8 amulet \
    python3-jinja2 python3-yaml juju-wait bzr \
    python3-nose-cov python3-nose-timer python-swiftclient
Reading package lists... Done
Building dependency tree
Reading state information... Done
bzr is already the newest version.
python-swiftclient is already the newest version.
python3-flake8 is already the newest version.
python3-jinja2 is already the newest version.
python3-nose is already the newest version.
python3-psycopg2 is already the newest version.
python3-yaml is already the newest version.
python3-nose-cov is already the newest version.
python3-nose-timer is already the newest version.
amulet is already the newest version.
juju-wait is already the newest version.
0 to upgrade, 0 to newly install, 0 to remove and 0 not to upgrade.
Charm Proof
I: metadata name (postgresql) must match directory name (rewrite) exactly for local deployment.
Lint check (flake8)
user configuration: /home/stub/.config/flake8
directory hooks
checking hooks/bootstrap.py
checking hooks/client.py
checking hooks/coordinator.py
checking hooks/decorators.py
checking hooks/definitions.py
checking hooks/helpers.py
checking hooks/metrics.py
checking hooks/nagios.py
checking hooks/postgresql.py
checking hooks/replication.py
checking hooks/service.py
checking hooks/storage.py
checking hooks/syslogrel.py
checking hooks/upgrade.py
checking hooks/wal_e.py
directory actions
checking actions/actions.py
directory testing
checking testing/__init__.py
checking testing/amuletfixture.py
directory tests
checking tests/obsolete.py
checking tests/test_integration.py
checking tests/test_postgresql.py
nosetests3 -sv tests/test_postgresql.py --cover-package=bootstrap,nagios,wal_e,syslogrel,replication,definitions,storage,decorators,metrics,upgrade,postgresql,helpers,coordinator,client,service \
    --with-coverage --cover-branches
test_addr_to_range (test_postgresql.TestPostgresql) ... ok
test_connect (test_postgresql.TestPostgresql) ... ok
test_convert_unit (test_postgresql.TestPostgresql) ... ok
test_create_cluster (test_postgresql.TestPostgresql) ... ok
test_drop_cluster (test_postgresql.TestPostgresql) ... ok
test_ensure_database (test_postgresql.TestPostgresql) ... ok
test_ensure_extensions (test_...

Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

@stub, which cloud was that on?

I still have no idea why it literally never passes on CI. It usually takes forever to run too, and often hangs and needs to be killed. I haven't had time to dig deeper yet, but I'm hoping to soon.

Revision history for this message
Stuart Bishop (stub) wrote :

@tim It was just a different report, so the same clouds. I think I was getting confused with the Cassandra runs I was doing around the same time.

Previously I had been getting tests to pass, but not all the tests. There were always a few that would fail due to provisioning-type errors, the amulet error about failing to list the contents of a remote directory, or a juju-deployer race.

Then around Sept 4th, something changed on your clouds and the juju logging changed. This broke 'juju wait', and thus all the PostgreSQL tests, and the Cassandra tests to boot. I've since been reworking the juju wait plugin to be less fragile with Juju 1.23 and earlier, and to use 'agent-status' with Juju 1.24 and later. This has got the Cassandra tests green again (and the lp:~stub/charms/trusty/cassandra/spike MP green too).

Revision history for this message
Stuart Bishop (stub) wrote :

This round is looking good. Tests at http://juju-ci.vapour.ws:8080/job/charm-bundle-test-joyent/729/console ran smoothly until the env.lock bug kicked in.

Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

I just emailed the juju list about the env.lock problem. If my multiple $JUJU_HOME idea works, it'll be a quick fix. That particular problem is ruining a lot of otherwise green test runs across many charms and bundles.
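
For what it's worth, the multiple-$JUJU_HOME idea amounts to something
like this sketch (illustrative only; the directory handling is an
assumption, not the actual fix):

import os
import subprocess
import tempfile

# Give each concurrent test run its own $JUJU_HOME, so runs no longer
# contend for a single env.lock file.
env = dict(os.environ, JUJU_HOME=tempfile.mkdtemp(prefix='juju-home-'))
subprocess.check_call(['juju', 'status'], env=env)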

Revision history for this message
Stuart Bishop (stub) wrote :

The remaining failure seems genuine (PG93MultiTests.test_failover, which takes test_replication down with it). Given 3 nodes, dropping the master and adding a new node at the same time can cause a situation where there is no master. I have not been able to reproduce this locally. I need to trawl the logs from the CI system and try to reproduce it manually on a cloud.
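
A sketch of the manual reproduction being planned (unit names and the
identity of the master are assumptions for illustration):

import subprocess

# Drop the current master and grow the service at the same time; in
# the failing CI runs, no surviving unit then claimed the master role.
subprocess.check_call(['juju', 'remove-unit', 'postgresql/0'])
subprocess.check_call(['juju', 'add-unit', 'postgresql'])
# After the environment settles (e.g. via 'juju wait'), check which
# unit, if any, advertises itself as master.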

248. By Stuart Bishop

Work around Bug #1510000

Revision history for this message
Stuart Bishop (stub) wrote :

FWIW, this is still waiting for review so I can land it. Despite the failure of one of the tests on the CI system, this branch is still preferable to the existing known broken charm store charm which has its tests disabled.

249. By Stuart Bishop

Add timings

250. By Stuart Bishop

Dependency for timestamps

251. By Stuart Bishop

Skip failover tests until Bug #1511659 can be dealt with

252. By Stuart Bishop

Update tests for Juju 1.25 unit naming changes

Revision history for this message
Matt Bruzek (mbruzek) wrote :

Hello Stuart,

Thanks for the enormous amount of work on this charm! This merge is a huge refactor with a lot of changed files. From what I saw of the code and the extensive tests, it looks great. I see the automated tests are passing on Joyent and POWER8! I also tested this on the KVM local provider and the results were PASS: 13 Total: 13 (7392.816871 sec).

Charm proof now returns no errors or warnings, which was not the case with the previous branch. Thanks for working with Tim and the team to get the automated tests passing. The tests are complex and very thorough. Passing automated tests is very important and will help keep this charm of the highest quality in the future.

+1 LGTM

review: Approve
