Merge lp://staging/~jameinel/bzr/2.1-client-reconnect-819604 into lp://staging/bzr/2.1
Status: | Work in progress |
---|---|
Proposed branch: | lp://staging/~jameinel/bzr/2.1-client-reconnect-819604 |
Merge into: | lp://staging/bzr/2.1 |
Diff against target: |
1035 lines (+589/-143) 7 files modified
bzrlib/help_topics/en/debug-flags.txt (+2/-0) bzrlib/osutils.py (+40/-9) bzrlib/smart/client.py (+170/-86) bzrlib/smart/medium.py (+43/-24) bzrlib/smart/protocol.py (+0/-3) bzrlib/tests/test_osutils.py (+39/-0) bzrlib/tests/test_smart_transport.py (+295/-21) |
To merge this branch: | bzr merge lp://staging/~jameinel/bzr/2.1-client-reconnect-819604 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
bzr-core | Pending | ||
Review via email: mp+78597@code.staging.launchpad.net |
Commit message
Start of bug #819604, allow bzr clients to reconnect if the connection is gone when we go to write a new request.
Description of the change
This is some initial work for having clients reconnect when their bzr+ssh (or bzr://) connection gets closed underneath them.
To start with, the trickiest bit is that it is really quite hard to detect when things are closed. Mostly because of buffering, etc, at lots of different levels (internal memory buffering, socket/pipe buffering, 'ssh.exe' buffering, latency of close message from server back to client, etc.)
I'll also clarify that this isn't intended to handle all possible sorts of connection failures. What we really are trying to handle is allowing a server to be upgraded 'live'. Such that the server will gently disconnect us only between complete requests. Versus having requests terminated randomly in the middle of content. It may help with some of those cases, and we certainly don't want to cause corruption, etc if one of those happens. But it isn't the goal of *this* change.
This specific patch has a few aspects, and does help some cases in real-world testing.
1) Backport a minimum amount of the SmartSSHClientM
It also meant that he shared the logic from SmartSimplePipe
This is a pretty small part of the patch.
2) Update some of the lower level code so that we get ConnectionReset rather than various IOError and ValueError when writing to a closed connection.
3) Update the _SmartClient.
4) The one caveat is if there is a 'body_stream'. body_stream is an iterable, so we can't just rewind it and resume it.
On the plus side, there is only one caller RemoteStreamSin
That would allow us to close the gap a little bit. So we can detect ConnectionReset all the way to the point that we actually start streaming the content.
The other bit that would be possible, is to update ProtocolThreeRe
This just closes a window where we won't reconnect during 'bzr push'.
5) The next step is that (especially for sockets), we write out the request successfully, and then notice it is closed when we try to read back the response.
Here I'm a lot more concerned about non-idempotent requests. Because the server might have fully read the request, and sent back a response, but we didn't get to see the response.
However, our RPC design is meant to be stateless (all state is passed in the request, not assumed to be stored on the server side). Which means things *should* be idempotent. For something like 'put_bytes', we may write the same content twice, but we shouldn't corrupt anything.
The only one I can think of is 'append_bytes', and the associated open_write_stream() code. (RemoteTranspor
And that code shouldn't really be used with up-to-date bzr clients and servers. So I'm much more ok with it just dying if it isn't safe to retry.
I should note, I think it would be prudent to land this into bzr.dev first, but I wanted to make sure it was written against 2.1 since that is the oldest client that we want to backport support for.
Also, I expect there will be a modest amount of conflicts bringing this up from bzr-2.1 into bzr.dev. There has been a modest amount of changes inbetween, so I'll try to produce branches for each series.