Merge lp://staging/~ceronman/txsolr/response-consumer-errback-843429 into lp://staging/txsolr

Proposed by Manuel Cerón
Status: Merged
Merged at revision: 59
Proposed branch: lp://staging/~ceronman/txsolr/response-consumer-errback-843429
Merge into: lp://staging/txsolr
Diff against target: 235 lines (+120/-32)
4 files modified
txsolr/__init__.py (+8/-2)
txsolr/client.py (+16/-9)
txsolr/response.py (+21/-16)
txsolr/test/test_response.py (+75/-5)
To merge this branch: bzr merge lp://staging/~ceronman/txsolr/response-consumer-errback-843429
Reviewer Review Type Date Requested Status
Jamu Kakar (community) Approve
Esteve Fernandez (community) Approve
Terry Jones Pending
Review via email: mp+74325@code.staging.launchpad.net

Description of the change

This branch introduces the following changes:

 - ResponseConsumer now handles correctly errors when decoding a response. The bug was causing memory leaks and unfired deferreds.

 - client._request now adds an extra errback just in case something bad happens in the response callback.

To post a comment you must log in.
62. By Manuel Cerón

Reverted HTTPRequestError

63. By Manuel Cerón

Added an extra errback for the request deferred.

Revision history for this message
Terry Jones (terrycojones) wrote :

[1]

In client.py, this:

        d.addCallbacks(responseCallback, responseErrback)
        d.addErrback(responseErrback)

is the same as

        d.addCallback(responseCallback)
        d.addErrback(responseErrback)

Does it really make sense for responseErrback to handle both kinds of
errors? I.e., if an error happens in agent.request, it will be passed to
responseErrback. But if there's an error in responseCallback, any kind of
error at all - including bad Python syntax, name error, attribute error,
etc, then you're also sending that error to responseErrback, which
uncoditionally errbacks with a HTTPRequestError. That's just wrong :-)

---

OK, I just pushed a branch with what I think is a neater way to do all
this. Sorry, I'm very tired.... I'll try to be more communicative
tomorrow.

bzr+ssh://bazaar.launchpad.net/~terrycojones/txsolr/response-consumer-errback-843429

Revision history for this message
Manuel Cerón (ceronman) wrote :

Hi Terry,

I like the changes you made. The only problem is the errback in ResponseConsumer when ConnectionDone is not given. That would cause that all requests fail if the server doesn't send a Content-Lenght.

I'll merge your changes tomorrow in this branch.

64. By Manuel Cerón

Reverted terrycojones changes and applying only some of them.

Revision history for this message
Manuel Cerón (ceronman) wrote :

Hi Terry.

I merged your changes, but then found that it was a bit weird that the ResponseConsumer was triggering the callbacks but not the errbacks, so I decided to keep the code almost intact. We can push another branch with better thought cleanup.

Revision history for this message
Esteve Fernandez (esteve) wrote :

+1 LGTM

review: Approve
Revision history for this message
Jamu Kakar (jkakar) wrote :

[1]

+ def testResponseConsumerWithGoodResponse(self):
+ """
+ The L{ResponseConsumer} protocol should fire the given L{Deferred} if
+ the given body is a valid Solr response.
+ """

and

+ def testResponseConsumerWithBadResponse(self):
+ """
+ The L{ResponseConsumer} protocol should fire the given L{Deferred} with
+ an C{errback} with L{SolrResponseError} if the decoding fails.
+ """

Please s/L{Deferred}/C{Deferred}/ in these docstrings.

[2]

+ self.assertEqual(solrResponse.header['status'], 0)
+ self.assertEqual(solrResponse.header['QTime'], 2)
+ self.assertEqual(solrResponse.results.numFound, 0)
+ self.assertEqual(len(solrResponse.results.docs), 0)

Please reverse the order of the arguments here.

[3]

One minor thing:

    def __init__(self, deferred, responseClass):
        self.body = ''
        self.deferred = deferred
        self.responseClass = responseClass

    def dataReceived(self, bytes):
        _logger.debug('Consumer data received:\n' + bytes)
        self.body += bytes

It's better to append the bytes to a list and use ''.join(self.bytes)
when you want the body. Appending to the string will use a lot of
memory unnecessarily.

Anyway, this looks good, +1!

review: Approve
Revision history for this message
Terry Jones (terrycojones) wrote :

Hi Manuel

OK.

[1]

You have d.addCallbacks(responseCallback) # s/Callbacks/Callback/

Revision history for this message
Terry Jones (terrycojones) wrote :

Hi Manuel

[1]

In client.py you have

         def responseErrback(failure):
             result.errback(HTTPRequestError(failure))

I'm not sure why you do it that way. Why not just pass the failure to
the errback?

Something went wrong, and you already have a failure, so pass it along.

If you put the failure into an HTTPRequestError instance and pass that
to the errback, the errback will wrap the exception instance in a
failure, which would give a failure containing a exception containing
a failure containing an exception.

By unconditionally wrapping every error into an HTTPRequestError
you're silently turning things that are not HTTP errors (like Python
syntax errors or the self.errback AttributeError - i.e., things that
might be/go wrong in responseCallback) into HTTPRequestErrors.

[2]

In response.py, I think this should be rewritten:

    def connectionLost(self, reason):
        if not reason.check(ResponseDone):
            _logger.warning('Unclean response: ' + repr(reason.value))

        try:
            response = self.responseClass(self.body)
        except SolrResponseError, e:
            _logger.error("Can't decode response body: %r" % self.body)
            self.deferred.errback(e)
        else:
            self.deferred.callback(response)

to

    def connectionLost(self, reason):
        if reason.check(ResponseDone):
            try:
                response = self.responseClass(self.body)
            except SolrResponseError, e: # better: use Exception
                _logger.error("Can't decode response body: %r (%s)" % (
                    self.body, e))
                self.deferred.errback(e)
            else:
                self.deferred.callback(response)
        else:
            _logger.warning('Unclean response: ' + repr(reason.value))
            self.deferred.errback(reason)

The code as it stands will try to decode the response when you get
something other than ResponseDone, and you only want to do that when
the connection is closed properly.

Question: Are you *sure* that the only possible exception from calling
self.responseClass(self.body) is a SolrResponseError? I don't find
that code very reassuring - what happens if some other exception
occurs? I think it would be better to use Exception to catch other
possible exceptions at that point. Anything that goes wrong, you give
it to the errback.

I also added logging of the exception when the response decoding fails
(I mean printing the exception so we can see what happened). Ideally
that kind of error would be translated to an Internal Server Error if
Fluidinfo ran into it.

Revision history for this message
Manuel Cerón (ceronman) wrote :
Download full text (3.9 KiB)

Hi Terry

> [1]
>
> In client.py you have
>
> def responseErrback(failure):
> result.errback(HTTPRequestError(failure))
>
> I'm not sure why you do it that way. Why not just pass the failure to
> the errback?
>
> Something went wrong, and you already have a failure, so pass it along.
>
> If you put the failure into an HTTPRequestError instance and pass that
> to the errback, the errback will wrap the exception instance in a
> failure, which would give a failure containing a exception containing
> a failure containing an exception.
>
> By unconditionally wrapping every error into an HTTPRequestError
> you're silently turning things that are not HTTP errors (like Python
> syntax errors or the self.errback AttributeError - i.e., things that
> might be/go wrong in responseCallback) into HTTPRequestErrors.

The reason why I wrap all the errors in a HTTPRequestError is because I wanted to have only one kind of error raised by a txsolr request. This is handy in some cases, for example, we can catch HTTPRequestError in the transaction module and retry the transaction if anything wrong happens. This would't be possible if raise any exception because then I don't know what exception to catch. For instance, I don't know what kind of exceptions can be raised by the twisted agent request, and that's not properly documented.

Maybe I can change the code to look like this:

         def responseErrback(failure):
             result.errback(HTTPRequestError(failure.value))

> [2]
>
> In response.py, I think this should be rewritten:
>
> def connectionLost(self, reason):
> if not reason.check(ResponseDone):
> _logger.warning('Unclean response: ' + repr(reason.value))
>
> try:
> response = self.responseClass(self.body)
> except SolrResponseError, e:
> _logger.error("Can't decode response body: %r" % self.body)
> self.deferred.errback(e)
> else:
> self.deferred.callback(response)
>
> to
>
> def connectionLost(self, reason):
> if reason.check(ResponseDone):
> try:
> response = self.responseClass(self.body)
> except SolrResponseError, e: # better: use Exception
> _logger.error("Can't decode response body: %r (%s)" % (
> self.body, e))
> self.deferred.errback(e)
> else:
> self.deferred.callback(response)
> else:
> _logger.warning('Unclean response: ' + repr(reason.value))
> self.deferred.errback(reason)
>
> The code as it stands will try to decode the response when you get
> something other than ResponseDone, and you only want to do that when
> the connection is closed properly.

The code I wrote is correct. I want to decode the response even if I don't get a ResponseDone reason. In fact, with our current Solr setup, we never get a ResponseDone reason because the server is using chunked transfer encoding and a Content-Length header is not sent, thus we never get a ResponseDone reason. That's why I try to decode the body first, and only if the procedure fails, I fire the errb...

Read more...

Revision history for this message
Terry Jones (terrycojones) wrote :
Download full text (4.7 KiB)

Hi Manuel

> The reason why I wrap all the errors in a HTTPRequestError is
> because I wanted to have only one kind of error raised by a txsolr
> request. This is handy in some cases, for example, we can catch
> HTTPRequestError in the transaction module and retry the transaction
> if anything wrong happens. This would't be possible if raise any
> exception because then I don't know what exception to catch. For
> instance, I don't know what kind of exceptions can be raised by the
> twisted agent request, and that's not properly documented.

I understand the need to catch HTTPRequestErrors and to retry etc. I
just saying that you're turning *everything* into one of them and
that's not good.

For example, change the first line of to look like this:

  def responseCallback(response):
      if response.poopy
          blah

(notice the missing colon on the if, the non-existent name). Then run
your code and see what kind of error you get.

You should be returning HTTPRequestError failure for HTTP errors and
dealing with them via retry etc. Other errors can bubble through to
higher levels, as is usual with Twisted, and in our case be logged and
turned into 500 errors. Putting failure.trap(HTTPRequestError) into
an errback that calls the client method is perfect. I mean there's
already a very well understood Twisted mechanism for filtering out
different kinds of errors. Your approach is like saying "I am
infallible, nothing can possibly be wrong with my code, the only thing
that could go wrong is an HTTP error" :-)

> Maybe I can change the code to look like this:
>
> def responseErrback(failure):
> result.errback(HTTPRequestError(failure.value))

That's putting an exception in an exception and still turning every
error into an HTTPRequestError (I know that's what you *want*, but it
doesn't correspond to programming reality - as we saw with
self.errback, which was a real error that was being silently
swallowed). How about:

    def responseCallback(response):
        _logger.debug('Received response from ' + url)
        if response.code == 200:
            deliveryProtocol = ResponseConsumer(result, JSONSolrResponse)
            response.deliverBody(deliveryProtocol)
        else:
            deliveryProtocol = DiscardingResponseConsumer()
            response.deliverBody(deliveryProtocol)
            raise HTTPWrongStatus(response.code)

    def responseErrback(failure):
        if failure.check(HTTPWrongStatus):
            result.errback(HTTPRequestError(failure.value))
        else:
            # Whoah, unexpected error in my code!
            _logger.error(failure.value)
            result.errback(failure)

    d.addCallback(responseCallback)
    d.addErrback(responseErrback)
    return result

It would be simpler to drop HTTPWrongStatus as it's not used anywhere
as far as I can see.

The above errback deals with both kinds of errors. If the thing that
calls it attaches its own errback that does a
failure.trap(HTTPRequestError) then any other error is more serious
and will pass through (and presumably be caught by a higher level).

> > [2]
> >
> > In response.py
>
> The code I wrote is correct. I want to decode t...

Read more...

65. By Manuel Cerón

Loggers use a custom format now.

66. By Manuel Cerón

Fixed failing tests.

67. By Manuel Cerón

Fixed pep8 error

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches

to all changes: