Merge into trunk : response-consumer-errback-843429 : Code : txsolr

Reviewer	Date Requested	Status
Jamu Kakar (community)		Approve on 2011-09-07
Esteve Fernandez (community)	2011-09-07	Approve on 2011-09-07
Terry Jones	2011-09-07	Pending
Review via email: mp+74325@code.staging.launchpad.net

lp://staging/~ceronman/txsolr/response-consumer-errback-843429 updated on 2011-09-07

62. By Manuel Cerón on 2011-09-07: Reverted HTTPRequestError
63. By Manuel Cerón on 2011-09-07: Added an extra errback for the request deferred.

Revision history for this message

Terry Jones (terrycojones) wrote on 2011-09-07:

#

[1]

In client.py, this:

d.addCallbacks(responseCallback, responseErrback)
d.addErrback(responseErrback)

is the same as

d.addCallback(responseCallback)
d.addErrback(responseErrback)

Does it really make sense for responseErrback to handle both kinds of
errors? I.e., if an error happens in agent.request, it will be passed to
responseErrback. But if there's an error in responseCallback, any kind of
error at all - including bad Python syntax, name error, attribute error,
etc, then you're also sending that error to responseErrback, which
uncoditionally errbacks with a HTTPRequestError. That's just wrong :-)

---

OK, I just pushed a branch with what I think is a neater way to do all
this. Sorry, I'm very tired.... I'll try to be more communicative
tomorrow.

bzr+ssh://bazaar.launchpad.net/~terrycojones/txsolr/response-consumer-errback-843429

Revision history for this message

Manuel Cerón (ceronman) wrote on 2011-09-07:

#

Hi Terry,

I like the changes you made. The only problem is the errback in ResponseConsumer when ConnectionDone is not given. That would cause that all requests fail if the server doesn't send a Content-Lenght.

I'll merge your changes tomorrow in this branch.

lp://staging/~ceronman/txsolr/response-consumer-errback-843429 updated on 2011-09-07

64. By Manuel Cerón on 2011-09-07: Reverted terrycojones changes and applying only some of them.

Revision history for this message

Manuel Cerón (ceronman) wrote on 2011-09-07:

#

Hi Terry.

I merged your changes, but then found that it was a bit weird that the ResponseConsumer was triggering the callbacks but not the errbacks, so I decided to keep the code almost intact. We can push another branch with better thought cleanup.

Revision history for this message

Esteve Fernandez (esteve) wrote on 2011-09-07:

#

+1 LGTM

review: Approve

Revision history for this message

Jamu Kakar (jkakar) wrote on 2011-09-07:

#

[1]

+ def testResponseConsumerWithGoodResponse(self):
+ """
+ The L{ResponseConsumer} protocol should fire the given L{Deferred} if
+ the given body is a valid Solr response.
+ """

and

+ def testResponseConsumerWithBadResponse(self):
+ """
+ The L{ResponseConsumer} protocol should fire the given L{Deferred} with
+ an C{errback} with L{SolrResponseError} if the decoding fails.
+ """

Please s/L{Deferred}/C{Deferred}/ in these docstrings.

[2]

+ self.assertEqual(solrResponse.header['status'], 0)
+ self.assertEqual(solrResponse.header['QTime'], 2)
+ self.assertEqual(solrResponse.results.numFound, 0)
+ self.assertEqual(len(solrResponse.results.docs), 0)

Please reverse the order of the arguments here.

[3]

One minor thing:

    def __init__(self, deferred, responseClass):
        self.body = ''
        self.deferred = deferred
        self.responseClass = responseClass

    def dataReceived(self, bytes):
        _logger.debug('Consumer data received:\n' + bytes)
        self.body += bytes

It's better to append the bytes to a list and use ''.join(self.bytes)
when you want the body. Appending to the string will use a lot of
memory unnecessarily.

Anyway, this looks good, +1!

review: Approve

Revision history for this message

Terry Jones (terrycojones) wrote on 2011-09-07:

#

Hi Manuel

OK.

[1]

You have d.addCallbacks(responseCallback) # s/Callbacks/Callback/

Revision history for this message

Terry Jones (terrycojones) wrote on 2011-09-07:

#

Hi Manuel

[1]

In client.py you have

def responseErrback(failure):
result.errback(HTTPRequestError(failure))

I'm not sure why you do it that way. Why not just pass the failure to
the errback?

Something went wrong, and you already have a failure, so pass it along.

If you put the failure into an HTTPRequestError instance and pass that
to the errback, the errback will wrap the exception instance in a
failure, which would give a failure containing a exception containing
a failure containing an exception.

By unconditionally wrapping every error into an HTTPRequestError
you're silently turning things that are not HTTP errors (like Python
syntax errors or the self.errback AttributeError - i.e., things that
might be/go wrong in responseCallback) into HTTPRequestErrors.

[2]

In response.py, I think this should be rewritten:

    def connectionLost(self, reason):
        if not reason.check(ResponseDone):
            _logger.warning('Unclean response: ' + repr(reason.value))

        try:
            response = self.responseClass(self.body)
        except SolrResponseError, e:
            _logger.error("Can't decode response body: %r" % self.body)
            self.deferred.errback(e)
        else:
            self.deferred.callback(response)

to

    def connectionLost(self, reason):
        if reason.check(ResponseDone):
            try:
                response = self.responseClass(self.body)
            except SolrResponseError, e: # better: use Exception
                _logger.error("Can't decode response body: %r (%s)" % (
                    self.body, e))
                self.deferred.errback(e)
            else:
                self.deferred.callback(response)
        else:
            _logger.warning('Unclean response: ' + repr(reason.value))
            self.deferred.errback(reason)

The code as it stands will try to decode the response when you get
something other than ResponseDone, and you only want to do that when
the connection is closed properly.

Question: Are you *sure* that the only possible exception from calling
self.responseClass(self.body) is a SolrResponseError? I don't find
that code very reassuring - what happens if some other exception
occurs? I think it would be better to use Exception to catch other
possible exceptions at that point. Anything that goes wrong, you give
it to the errback.

I also added logging of the exception when the response decoding fails
(I mean printing the exception so we can see what happened). Ideally
that kind of error would be translated to an Internal Server Error if
Fluidinfo ran into it.

Hi Manuel

[1]

In client.py you have

def responseErrback(failure):
             result.errback(HTTPRequestError(failure))

I'm not sure why you do it that way. Why not just pass the failure to
the errback?

Something went wrong, and you already have a failure, so pass it along.

If you put the failure into an HTTPRequestError instance and pass that
to the errback, the errback will wrap the exception instance in a
failure, which would give a failure containing a exception containing
a failure containing an exception.

By unconditionally wrapping every error into an HTTPRequestError
you're silently turning things that are not HTTP errors (like Python
syntax errors or the self.errback AttributeError - i.e., things that
might be/go wrong in responseCallback) into HTTPRequestErrors.

[2]

In response.py, I think this should be rewritten:

def connectionLost(self, reason):
        if not reason.check(ResponseDone):
            _logger.warning('Unclean response: ' + repr(reason.value))

try:
            response = self.responseClass(self.body)
        except SolrResponseError, e:
            _logger.error("Can't decode response body: %r" % self.body)
            self.deferred.errback(e)
        else:
            self.deferred.callback(response)

to

def connectionLost(self, reason):
        if reason.check(ResponseDone):
            try:
                response = self.responseClass(self.body)
            except SolrResponseError, e:  # better: use Exception
                _logger.error("Can't decode response body: %r (%s)" % (
                    self.body, e))
                self.deferred.errback(e)
            else:
                self.deferred.callback(response)
        else:
            _logger.warning('Unclean response: ' + repr(reason.value))
            self.deferred.errback(reason)

The code as it stands will try to decode the response when you get
something other than ResponseDone, and you only want to do that when
the connection is closed properly.

Question: Are you *sure* that the only possible exception from calling
self.responseClass(self.body) is a SolrResponseError?  I don't find
that code very reassuring - what happens if some other exception
occurs?  I think it would be better to use Exception to catch other
possible exceptions at that point.  Anything that goes wrong, you give
it to the errback.

I also added logging of the exception when the response decoding fails
(I mean printing the exception so we can see what happened).  Ideally
that kind of error would be translated to an Internal Server Error if
Fluidinfo ran into it.

Revision history for this message

Manuel Cerón (ceronman) wrote on 2011-09-07:

#

Download full text (3.9 KiB)

Hi Terry

> [1]
>
> In client.py you have
>
> def responseErrback(failure):
> result.errback(HTTPRequestError(failure))
>
> I'm not sure why you do it that way. Why not just pass the failure to
> the errback?
>
> Something went wrong, and you already have a failure, so pass it along.
>
> If you put the failure into an HTTPRequestError instance and pass that
> to the errback, the errback will wrap the exception instance in a
> failure, which would give a failure containing a exception containing
> a failure containing an exception.
>
> By unconditionally wrapping every error into an HTTPRequestError
> you're silently turning things that are not HTTP errors (like Python
> syntax errors or the self.errback AttributeError - i.e., things that
> might be/go wrong in responseCallback) into HTTPRequestErrors.

The reason why I wrap all the errors in a HTTPRequestError is because I wanted to have only one kind of error raised by a txsolr request. This is handy in some cases, for example, we can catch HTTPRequestError in the transaction module and retry the transaction if anything wrong happens. This would't be possible if raise any exception because then I don't know what exception to catch. For instance, I don't know what kind of exceptions can be raised by the twisted agent request, and that's not properly documented.

Maybe I can change the code to look like this:

def responseErrback(failure):
result.errback(HTTPRequestError(failure.value))

> [2]
>
> In response.py, I think this should be rewritten:
>
> def connectionLost(self, reason):
> if not reason.check(ResponseDone):
> _logger.warning('Unclean response: ' + repr(reason.value))
>
> try:
> response = self.responseClass(self.body)
> except SolrResponseError, e:
> _logger.error("Can't decode response body: %r" % self.body)
> self.deferred.errback(e)
> else:
> self.deferred.callback(response)
>
> to
>
> def connectionLost(self, reason):
> if reason.check(ResponseDone):
> try:
> response = self.responseClass(self.body)
> except SolrResponseError, e: # better: use Exception
> _logger.error("Can't decode response body: %r (%s)" % (
> self.body, e))
> self.deferred.errback(e)
> else:
> self.deferred.callback(response)
> else:
> _logger.warning('Unclean response: ' + repr(reason.value))
> self.deferred.errback(reason)
>
> The code as it stands will try to decode the response when you get
> something other than ResponseDone, and you only want to do that when
> the connection is closed properly.

The code I wrote is correct. I want to decode the response even if I don't get a ResponseDone reason. In fact, with our current Solr setup, we never get a ResponseDone reason because the server is using chunked transfer encoding and a Content-Length header is not sent, thus we never get a ResponseDone reason. That's why I try to decode the body first, and only if the procedure fails, I fire the errb...

Hi Terry

> [1]
> 
> In client.py you have
> 
>          def responseErrback(failure):
>              result.errback(HTTPRequestError(failure))
> 
> I'm not sure why you do it that way. Why not just pass the failure to
> the errback?
> 
> Something went wrong, and you already have a failure, so pass it along.
> 
> If you put the failure into an HTTPRequestError instance and pass that
> to the errback, the errback will wrap the exception instance in a
> failure, which would give a failure containing a exception containing
> a failure containing an exception.
> 
> By unconditionally wrapping every error into an HTTPRequestError
> you're silently turning things that are not HTTP errors (like Python
> syntax errors or the self.errback AttributeError - i.e., things that
> might be/go wrong in responseCallback) into HTTPRequestErrors.

The reason why I wrap all the errors in a HTTPRequestError is because I wanted to have only one kind of error raised by a txsolr request. This is handy in some cases, for example, we can catch HTTPRequestError in the transaction module and retry the transaction if anything wrong happens. This would't be possible if raise any exception because then I don't know what exception to catch. For instance, I don't know what kind of exceptions can be raised by the twisted agent request, and that's not properly documented.

Maybe I can change the code to look like this:

def responseErrback(failure):
             result.errback(HTTPRequestError(failure.value))

> [2]
> 
> In response.py, I think this should be rewritten:
> 
>     def connectionLost(self, reason):
>         if not reason.check(ResponseDone):
>             _logger.warning('Unclean response: ' + repr(reason.value))
> 
>         try:
>             response = self.responseClass(self.body)
>         except SolrResponseError, e:
>             _logger.error("Can't decode response body: %r" % self.body)
>             self.deferred.errback(e)
>         else:
>             self.deferred.callback(response)
> 
> to
> 
>     def connectionLost(self, reason):
>         if reason.check(ResponseDone):
>             try:
>                 response = self.responseClass(self.body)
>             except SolrResponseError, e:  # better: use Exception
>                 _logger.error("Can't decode response body: %r (%s)" % (
>                     self.body, e))
>                 self.deferred.errback(e)
>             else:
>                 self.deferred.callback(response)
>         else:
>             _logger.warning('Unclean response: ' + repr(reason.value))
>             self.deferred.errback(reason)
> 
> The code as it stands will try to decode the response when you get
> something other than ResponseDone, and you only want to do that when
> the connection is closed properly.

The code I wrote is correct. I want to decode the response even if I don't get a ResponseDone reason. In fact, with our current Solr setup, we never get a ResponseDone reason because the server is using chunked transfer encoding and a Content-Length header is not sent, thus we never get a ResponseDone reason. That's why I try to decode the body first, and only if the procedure fails, I fire the errback, otherwise I'll fire a callback with the decoded response.

> Question: Are you *sure* that the only possible exception from calling
> self.responseClass(self.body) is a SolrResponseError?  I don't find
> that code very reassuring - what happens if some other exception
> occurs?  I think it would be better to use Exception to catch other
> possible exceptions at that point.  Anything that goes wrong, you give
> it to the errback.

Yes, I agree, I will check for all the possible exceptions and fire an errback if any of them is raised.

> 
> I also added logging of the exception when the response decoding fails
> (I mean printing the exception so we can see what happened).  Ideally
> that kind of error would be translated to an Internal Server Error if
> Fluidinfo ran into it.

Yes I will add an error to the log too.

Revision history for this message

Terry Jones (terrycojones) wrote on 2011-09-07:

#

Download full text (4.7 KiB)

Hi Manuel

> The reason why I wrap all the errors in a HTTPRequestError is
> because I wanted to have only one kind of error raised by a txsolr
> request. This is handy in some cases, for example, we can catch
> HTTPRequestError in the transaction module and retry the transaction
> if anything wrong happens. This would't be possible if raise any
> exception because then I don't know what exception to catch. For
> instance, I don't know what kind of exceptions can be raised by the
> twisted agent request, and that's not properly documented.

I understand the need to catch HTTPRequestErrors and to retry etc. I
just saying that you're turning *everything* into one of them and
that's not good.

For example, change the first line of to look like this:

  def responseCallback(response):
      if response.poopy
          blah

(notice the missing colon on the if, the non-existent name). Then run
your code and see what kind of error you get.

You should be returning HTTPRequestError failure for HTTP errors and
dealing with them via retry etc. Other errors can bubble through to
higher levels, as is usual with Twisted, and in our case be logged and
turned into 500 errors. Putting failure.trap(HTTPRequestError) into
an errback that calls the client method is perfect. I mean there's
already a very well understood Twisted mechanism for filtering out
different kinds of errors. Your approach is like saying "I am
infallible, nothing can possibly be wrong with my code, the only thing
that could go wrong is an HTTP error" :-)

> Maybe I can change the code to look like this:
>
> def responseErrback(failure):
> result.errback(HTTPRequestError(failure.value))

That's putting an exception in an exception and still turning every
error into an HTTPRequestError (I know that's what you *want*, but it
doesn't correspond to programming reality - as we saw with
self.errback, which was a real error that was being silently
swallowed). How about:

    def responseCallback(response):
        _logger.debug('Received response from ' + url)
        if response.code == 200:
            deliveryProtocol = ResponseConsumer(result, JSONSolrResponse)
            response.deliverBody(deliveryProtocol)
        else:
            deliveryProtocol = DiscardingResponseConsumer()
            response.deliverBody(deliveryProtocol)
            raise HTTPWrongStatus(response.code)

    def responseErrback(failure):
        if failure.check(HTTPWrongStatus):
            result.errback(HTTPRequestError(failure.value))
        else:
            # Whoah, unexpected error in my code!
            _logger.error(failure.value)
            result.errback(failure)

    d.addCallback(responseCallback)
    d.addErrback(responseErrback)
    return result

It would be simpler to drop HTTPWrongStatus as it's not used anywhere
as far as I can see.

The above errback deals with both kinds of errors. If the thing that
calls it attaches its own errback that does a
failure.trap(HTTPRequestError) then any other error is more serious
and will pass through (and presumably be caught by a higher level).

> > [2]
> >
> > In response.py
>
> The code I wrote is correct. I want to decode t...

Hi Manuel

> The reason why I wrap all the errors in a HTTPRequestError is
> because I wanted to have only one kind of error raised by a txsolr
> request. This is handy in some cases, for example, we can catch
> HTTPRequestError in the transaction module and retry the transaction
> if anything wrong happens. This would't be possible if raise any
> exception because then I don't know what exception to catch. For
> instance, I don't know what kind of exceptions can be raised by the
> twisted agent request, and that's not properly documented.

I understand the need to catch HTTPRequestErrors and to retry etc. I
just saying that you're turning *everything* into one of them and
that's not good.

For example, change the first line of to look like this:

def responseCallback(response):
      if response.poopy
          blah

(notice the missing colon on the if, the non-existent name). Then run
your code and see what kind of error you get.

You should be returning HTTPRequestError failure for HTTP errors and
dealing with them via retry etc.  Other errors can bubble through to
higher levels, as is usual with Twisted, and in our case be logged and
turned into 500 errors.  Putting failure.trap(HTTPRequestError) into
an errback that calls the client method is perfect. I mean there's
already a very well understood Twisted mechanism for filtering out
different kinds of errors.  Your approach is like saying "I am
infallible, nothing can possibly be wrong with my code, the only thing
that could go wrong is an HTTP error" :-)

> Maybe I can change the code to look like this:
> 
>          def responseErrback(failure):
>              result.errback(HTTPRequestError(failure.value))

That's putting an exception in an exception and still turning every
error into an HTTPRequestError (I know that's what you *want*, but it
doesn't correspond to programming reality - as we saw with
self.errback, which was a real error that was being silently
swallowed).  How about:

def responseCallback(response):
        _logger.debug('Received response from ' + url)
        if response.code == 200:
            deliveryProtocol = ResponseConsumer(result, JSONSolrResponse)
            response.deliverBody(deliveryProtocol)
        else:
            deliveryProtocol = DiscardingResponseConsumer()
            response.deliverBody(deliveryProtocol)
            raise HTTPWrongStatus(response.code)

def responseErrback(failure):
        if failure.check(HTTPWrongStatus):
            result.errback(HTTPRequestError(failure.value))
        else:
            # Whoah, unexpected error in my code!
            _logger.error(failure.value)
            result.errback(failure)

d.addCallback(responseCallback)
    d.addErrback(responseErrback)
    return result

It would be simpler to drop HTTPWrongStatus as it's not used anywhere
as far as I can see.

The above errback deals with both kinds of errors.  If the thing that
calls it attaches its own errback that does a
failure.trap(HTTPRequestError) then any other error is more serious
and will pass through (and presumably be caught by a higher level).

> > [2]
> > 
> > In response.py
> 
> The code I wrote is correct. I want to decode the response even if I
> don't get a ResponseDone reason. In fact, with our current Solr
> setup, we never get a ResponseDone reason because the server is
> using chunked transfer encoding and a Content-Length header is not
> sent, thus we never get a ResponseDone reason. That's why I try to
> decode the body first, and only if the procedure fails, I fire the
> errback, otherwise I'll fire a callback with the decoded response.

OK, got it, thanks. One comment is that if we're never getting
ResponseDone then you're going to be writing an "Unclean response:"
line to the log on every single Solr request, right?  Is there any
reason to do that if it's now the normal/expected behavior?

How about:

def connectionLost(self, reason):
        if reason.check(PotentialDataLoss, ResponseDone):
            try:
                response = self.responseClass(self.body)
            except Exception, e:
                _logger.error("Can't decode response body: %r" % self.body)
                self.deferred.errback(e)
            else:
                self.deferred.callback(response)
        else:
            _logger.warning('Unclean response: ' + repr(reason.value))
            self.deferred.errback(reason)

Or do you really want to process the body (and return it) on every
single error?  What does the JSONSolrResponse do if it's passed an
empty string (self.body would presumably be empty for some possible
errors). This also has the advantage that if there *is* some other
kind of unexpected error, you're passing it on in the Twisted way. If
Twisted happens to return a TheMoonExploded failure, you log it and
pass it on.

T

lp://staging/~ceronman/txsolr/response-consumer-errback-843429 updated on 2011-09-12

65. By Manuel Cerón on 2011-09-07: Loggers use a custom format now.
66. By Manuel Cerón on 2011-09-12: Fixed failing tests.
67. By Manuel Cerón on 2011-09-12: Fixed pep8 error

txsolr

Merge lp://staging/~ceronman/txsolr/response-consumer-errback-843429 into lp://staging/txsolr

Commit message

Description of the change

Preview Diff

Subscribers