Merge lp://staging/~gz/bzr/split_segment_params_not_url_842233 into lp://staging/bzr

Proposed by Martin Packman
Status: Merged
Approved by: Vincent Ladeuil
Approved revision: no longer in the source branch.
Merged at revision: 6286
Proposed branch: lp://staging/~gz/bzr/split_segment_params_not_url_842233
Merge into: lp://staging/bzr
Diff against target: 94 lines (+23/-7)
3 files modified
bzrlib/tests/test_urlutils.py (+9/-3)
bzrlib/urlutils.py (+8/-4)
doc/en/release-notes/bzr-2.5.txt (+6/-0)
To merge this branch: bzr merge lp://staging/~gz/bzr/split_segment_params_not_url_842233
Reviewer          Review Type   Date Requested   Status
Vincent Ladeuil                                  Approve
Review via email: mp+82722@code.staging.launchpad.net

Commit message

Avoid using URL-specific join and split when separating segment parameters

Description of the change

Correctly split segment parameters from a url, without depending on url handling functions. This branch keeps the current semantics, except for a couple of assertions on relative paths which are invalid.

For background, see the merge proposal fixing the main initial fallout from the addition of colo branch addressing:

<https://code.launchpad.net/~gz/bzr/root_drive_file_url_841322/+merge/74034>

The salient point is that to make a colo address, the join code roughly uses "%(url)s,branch=name" with whatever the base url is. Blindly sticking a comma and segment params on the end of any old url means the result is no longer a url - depending on the base format it may still be parsable as one, but it may not.

Such as:
    <http://example.com> is valid
    <http://example.com,branch=name> is invalid
    <file:///> is special-cased to valid in bzrlib on windows
    <file:///,branch=name> is invalid on windows

So, either we must be careful to make sure to always remove the segment parameters smuggled on before using any url handling functions or passing anywhere that needs a url, or a different approach is needed.
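
As a rough illustration of the problem described above, here is a minimal sketch. It uses Python's standard `urllib.parse` rather than bzrlib's own url handling, and `naive_colo_address` is a hypothetical helper that only mirrors the "%(url)s,branch=name" pattern, so treat it as an approximation and not the actual bzrlib code.

```python
from urllib.parse import urlsplit


def naive_colo_address(base_url, branch_name):
    """Hypothetical helper: blindly append a segment parameter to a base url."""
    return "%(url)s,branch=%(name)s" % {"url": base_url, "name": branch_name}


address = naive_colo_address("http://example.com", "name")
print(address)  # http://example.com,branch=name

# A generic URL parser has no notion of segment parameters, so the
# appended text is swallowed into the host part of the address.
print(urlsplit(address).netloc)  # example.com,branch=name
```

The parser still accepts the string here, but the comma and parameter end up inside the host, which is why the segment parameters need to come off before anything treats the value as a url.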

This is further complicated by the laxness of the various Transport implementations when it comes to url handling, which involves a lot of plain string operations without checking inputs. In particular, adding terminal slashes in some places and removing them in others is troublesome.

Vincent Ladeuil (vila) wrote :

Thanks for your patience on that.

Overall, I agree that url handling in the various transports classes may receive some love especially about which encoding is used (including url-escaped) is/should/should not be used by which methods.

It's both amazing and scary that you've been able to fix this bug without touching a single transport test... I understand why but it's still amazing.

A couple of nits:

8 + # Check relative references with absolute paths
17 + # Check relative references with relative paths
29 + # TODO: Check full URLs as well as relative references

33 + # Check relative references with absolute paths
42 + # Check relative references with relative paths
51 + # TODO: Check full URLs as well as relative references

Hmm, do I see a pattern there ?

This screams: split these tests to my ears.

70 + # Segements begin at first comma after last forward slash, if one exists

s/Segements/Segments/

71 + segment_start = lurl.find(",", lurl.rfind("/")+1)

'+1'... Cunning ;) and commented !

74 - return (join(parent_url, subsegments[0]), subsegments[1:])
75 + return lurl[:segment_start], lurl[segment_start+1:].split(",")

The comma is easy to miss, the leading paren was clearer to indicate that two values are returned.
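
For readers without the diff to hand, the approach in the two quoted lines can be sketched roughly as follows. This is a reconstruction under assumptions, not the merged code: the function name `split_segment_params_sketch` is hypothetical, and the trailing slash handling is a simplified stand-in for whatever helper the branch actually uses.

```python
def split_segment_params_sketch(url):
    """Split a url-like string into (base, [raw segment parameters]).

    Works on plain strings only, with no url parsing or joining.
    """
    # Simplified stand-in for the trailing slash handling mentioned in
    # this discussion; the real helper may special-case some urls.
    lurl = url[:-1] if len(url) > 1 and url.endswith("/") else url
    # Segments begin at the first comma after the last forward slash.
    # rfind returns -1 when there is no slash, so the "+1" makes the
    # search start at index 0 in that case.
    segment_start = lurl.find(",", lurl.rfind("/") + 1)
    if segment_start == -1:
        return (url, [])
    return (lurl[:segment_start], lurl[segment_start + 1:].split(","))


print(split_segment_params_sketch("http://example.com,branch=name"))
print(split_segment_params_sketch("file:///,branch=name"))
print(split_segment_params_sketch("/a,b/c,x=1,y=2"))
```

Note that commas before the last forward slash are left in the base, and the parenthesised tuple makes the two return values explicit.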

Two final notes:
- there has been discussion in the past about never removing the final path from 't.base' as it *is* a directory anyway (and then stop adding it blindly),
- I'd really like to see your mail summarising your thoughts about url handling across the code base ;-D

.. and one more thing: news entry

review: Approve
Martin Packman (gz) wrote :

> Overall, I agree that url handling in the various transports classes may
> receive some love especially about which encoding is used (including url-
> escaped) is/should/should not be used by which methods.

Yes, Jelmer has gone some way towards sorting this out, but there's still a lot of confusion in the code.

> It's both amazing and scary that you've been able to fix this bug without
> touching a single transport test... I understand why but it's still amazing.

Well, that's due to the cheat with split_trailing_slash which lets me replicate the existing (somewhat bogus) behaviour but without tickling the main problem by trying to split/join as a url.

> Hmm, do I see a pattern there ?
>
> This screams: split these tests to my ears.

Yes, and given it should really cover the case with schemes as well, it almost wants parametrizing. I resisted that as part of this change as I'm unconvinced by the current interfaces.

> Two final notes:
> - there has been discussion in the past about never removing the final path
> from 't.base' as it *is* a directory anyway (and then stop adding it blindly),
> - I'd really like to see your mail summarising your thoughts about url
> handling across the code base ;-D

Being far more careful about what are valid operations on different kinds of url scheme would help, which perhaps the URL class can assist with though I'm a little leery of the added weight there.

> .. and one more thing: news entry

Added, and nits addressed.

Martin Packman (gz) wrote :

sent to pqm by email
