Merge lp://staging/~gary/launchpad/bug844631 into lp://staging/launchpad

Proposed by Gary Poster
Status: Merged
Approved by: Gary Poster
Approved revision: no longer in the source branch.
Merged at revision: 14096
Proposed branch: lp://staging/~gary/launchpad/bug844631
Merge into: lp://staging/launchpad
Diff against target: 790 lines (+482/-61)
9 files modified
lib/canonical/config/schema-lazr.conf (+11/-0)
lib/canonical/launchpad/webapp/error.py (+13/-0)
lib/canonical/launchpad/webapp/tests/test_error.py (+91/-1)
lib/canonical/testing/layers.py (+1/-0)
lib/lp/app/browser/configure.zcml (+18/-0)
lib/lp/app/templates/launchpad-databaseunavailable.pt (+293/-0)
lib/lp/services/profile/profile.py (+15/-16)
lib/lp/services/profile/tests.py (+13/-44)
lib/lp/testing/fixture.py (+27/-0)
To merge this branch: bzr merge lp://staging/~gary/launchpad/bug844631
Reviewer Review Type Date Requested Status
Brad Crittenden (community) code Approve
Robert Collins (community) Approve
Review via email: mp+77725@code.staging.launchpad.net

Commit message

[r=bac,lifeless][bug=844631] Add an error page for when the database is down (such as fastdowntime) or has serious issues.

Description of the change

= Overview =

This branch adds an error page for when the database is down (such as fastdowntime) or has serious issues.

It catches both OperationalError and DisconnectionError because, on experimentation, it seems that the database raises OperationalError if it can't connect initially (such as after an app restart when the db is down), and (as expected) DisconnectionError if it was connected but lost the connection.

= UI =

The page design is by Huw, except for the status update section, which I had to rewrite because of technical reasons (see below).

= Implementation =

The registration of the page is straightforward. The branch only has two elements of real interest (and which took up my time).

== Testing ==

The first exciting bit was testing. Stuart suggested that I write integration tests using pgbouncer. This was a bit challenging because it found some edge cases in using the pgbouncer. In particular, you'll see an odd little loop in the test where I get rid of two 500 errors. This was about the best I could do. I talked with him about this for some time and had some alternatives, but we settled on this. I'm not sure what to tell you about it, even though it took a longer time than expected.

== Status updates ==

The second element of this branch that took up my time is the inclusion of the status updates on the page.

=== Rejected ideas ===

(Feel free to skip; I feel compelled to share.)

Huw initially had done this with a Twitter-supplied widget that had a number of problems, such as relying on http. He found another widget, but it relied on jsonp, which would introduce a security hole for our site (http://en.wikipedia.org/wiki/JSONP#Security_concerns). Matthew proposed reusing the same style of memcached RSS loading that we use on the front page for the blog listings, but that was risky for a page that is supposed to be shown when the system is having problems already, and also would have required poking another hole in our firewall. Robert proposed the interesting idea of having a static page hosted in Apache over http which used jsonp, and redirect the appserver error page to that page. That probably would have worked, but I was hoping for a lighter weight solution.

=== Implementation: the story ===

(OK, you can skip this too. This took so long that I feel like talking about it, I guess.)

I next explored the Flash xdr method that YUI provides. Identi.ca does not provide a http://identi.ca/crossdomain.xml , which Flash requires. twitter.com and api.twitter.com explicitly have a crossdomain.xml that only allows their own sites to use it. After a bit of digging, I discovered that search.twitter.com does have a friendly crossdomain.xml. I hooked it up and it was working.

Unfortunately, it used Flash.

Next, I discovered CORS (http://en.wikipedia.org/wiki/Cross-Origin_Resource_Sharing). Yay! This is the answer we want.

=== Implementation: the meat ===

(Read this part, please.)

I try CORS first. That should work on modern Ubuntu installs. If that fails, I fall back to Flash. If that fails, I give up. This seems to work. Yay.

=== Testing the JS out yourself ===

You probably could just shut down PG while you run LP, but I installed a temporary view to look at it.

=== modified file 'lib/lp/app/browser/configure.zcml'
--- lib/lp/app/browser/configure.zcml 2011-09-29 02:32:20 +0000
+++ lib/lp/app/browser/configure.zcml 2011-09-29 18:54:11 +0000
@@ -377,6 +377,15 @@
       class="canonical.launchpad.webapp.error.OperationalErrorView"
       />

+ <!-- XXX Remove -->
+ <browser:page
+ for="*"
+ name="disconnected.html"
+ permission="zope.Public"
+ template="../templates/launchpad-databaseunavailable.pt"
+ class="canonical.launchpad.webapp.error.DisconnectionErrorView"
+ />
+
   <!-- UnsafeFormGetSubmissionError -->
   <browser:page
       for="canonical.launchpad.webapp.interfaces.UnsafeFormGetSubmissionError"

== Lint ==

lint is happy AFAICT except for the usual long line complaints in the lazr config file, but "make lint" is confused and gives me lint on the whole tree, so it's possible I missed something.

That's it.

Thanks!

Gary

To post a comment you must log in.
Revision history for this message
Gary Poster (gary) wrote :

I forgot to mention the following:

- lib/canonical/launchpad/webapp/publication.py has deleted code because Stuart told me to. See comments 14 and 2 of https://bugs.launchpad.net/launchpad/+bug/844631 .

- There are a number of extraneous lint cleanups, because of touching files.

- The change to the profile module is there because I noticed the bad logic while working on the tests. It's otherwise unrelated, and trivial.

Thanks

Revision history for this message
Robert Collins (lifeless) wrote :

+1 with the reinstatement of the publication code - stub didn't mean to remove the reset and rollback calls - they are not things we can push into storm; OTOH the assertion he mentions we would have expected to be fixed. But, we're still seeing it so its clearly not fixed, -> I don't think we can remove it.

We'll want to address that bug promptly so that folk see the right error page, but that can be a different landing.

review: Approve
Revision history for this message
Brad Crittenden (bac) wrote :

Thanks for the nice work Gary. I know the branch had lots of frustrations but I'm glad you got them all tamed.

I wonder if it would've been better to have split this effort into two branches: one for the page and another to add the status messages. Might've helped you with momentum.

On IRC we discussed improvements to the text of the page and I like the final version you wrote. I like the convivial nature of the page but it does need to read smoothly too.

I notice you put up the spinner statically which has the downside of being displayed permanently if JS is not enabled in the browser. I think it should be displayed conditionally. The header "Recent status updates" should also only be showed if there is a chance it will be populated.

Otherwise it looks good. You just mentioned on IRC you'll reinstate the publisher code.

Also, thanks for the lint cleanup.

review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.