Merge lp://staging/~parthm/bzr/300062-invalid-pattern-warnings into lp://staging/bzr

Proposed by Parth Malwankar
Status: Rejected
Rejected by: Robert Collins
Proposed branch: lp://staging/~parthm/bzr/300062-invalid-pattern-warnings
Merge into: lp://staging/bzr
Diff against target: 680 lines (+383/-39)
12 files modified
NEWS (+5/-0)
bzrlib/builtins.py (+10/-0)
bzrlib/errors.py (+1/-1)
bzrlib/globbing.py (+143/-28)
bzrlib/lazy_regex.py (+6/-6)
bzrlib/tests/blackbox/test_add.py (+28/-0)
bzrlib/tests/blackbox/test_ignore.py (+32/-0)
bzrlib/tests/blackbox/test_ignored.py (+15/-0)
bzrlib/tests/blackbox/test_ls.py (+23/-0)
bzrlib/tests/blackbox/test_status.py (+69/-0)
bzrlib/tests/test_globbing.py (+41/-4)
bzrlib/tests/test_ignores.py (+10/-0)
To merge this branch: bzr merge lp://staging/~parthm/bzr/300062-invalid-pattern-warnings
Reviewer Review Type Date Requested Status
John A Meinel Needs Fixing
Robert Collins Pending
bzr-core Pending
Review via email: mp+26809@code.staging.launchpad.net

Commit message

Better handling for invalid patterns in .bzrignore and user ignore. ignore command rejects bad patterns passed to it.

Description of the change

=== Fixed Bug #300062 ===

Based on the earlier discussion
https://code.edge.launchpad.net/~parthm/bzr/300062-better-handling-for-invalid-ignore-pattern/+merge/26400
this branch make bzr issue a warning if there are invalid patterns in any ignore file. 'ls', 'add', 'ignore', 'ignored' and 'status' work as expected. Below is a sample interaction of these commands.

[aaa]% cat .bzrignore
RE:*.cpp
foo
[aaa]% ~/src/bzr.dev/300062-invalid-pattern-warnings/bzr ignore xyz "RE:[[[[[" pqr
bzr: warning: Skipped invalid pattern argument(s):
  RE:[[[[[
bzr: warning: Skipped invalid pattern(s):
  RE:*.cpp
Please fix above patterns in ".bzrignore" or "/home/parthm/.bazaar/ignore".
Bazaar commands will NOT ignore files meant to match above pattern(s).
[aaa]% cat .bzrignore
RE:*.cpp
foo
xyz
pqr
[aaa]%

=================================================================================
[aaa]% ~/src/bzr.dev/300062-invalid-pattern-warnings/bzr st -S
 M .bzrignore
bzr: warning: Skipped invalid pattern(s):
  RE:*.cpp
Please fix above patterns in ".bzrignore" or "/home/parthm/.bazaar/ignore".
Bazaar commands will NOT ignore files meant to match above pattern(s).
? one.txt
? two.txt
? x
? y
[aaa]% ~/src/bzr.dev/300062-invalid-pattern-warnings/bzr st
bzr: warning: Skipped invalid pattern(s):
  RE:*.cpp
Please fix above patterns in ".bzrignore" or "/home/parthm/.bazaar/ignore".
Bazaar commands will NOT ignore files meant to match above pattern(s).
modified:
  .bzrignore
unknown:
  one.txt
  two.txt
  x
  y

=================================================================================
[aaa]% ~/src/bzr.dev/300062-invalid-pattern-warnings/bzr ls -i
.bzrignore.~1~
bzr: warning: Skipped invalid pattern(s):
  RE:*.cpp
Please fix above patterns in ".bzrignore" or "/home/parthm/.bazaar/ignore".
Bazaar commands will NOT ignore files meant to match above pattern(s).
[aaa]% ~/src/bzr.dev/300062-invalid-pattern-warnings/bzr ls -u
bzr: warning: Skipped invalid pattern(s):
  RE:*.cpp
Please fix above patterns in ".bzrignore" or "/home/parthm/.bazaar/ignore".
Bazaar commands will NOT ignore files meant to match above pattern(s).
one.txt
two.txt
x
y

=================================================================================
[aaa]% ~/src/bzr.dev/300062-invalid-pattern-warnings/bzr ignored
.bzrignore.~1~ *~
bzr: warning: Skipped invalid pattern(s):
  RE:*.cpp
Please fix above patterns in ".bzrignore" or "/home/parthm/.bazaar/ignore".
Bazaar commands will NOT ignore files meant to match above pattern(s).

=================================================================================
[aaa]% ~/src/bzr.dev/300062-invalid-pattern-warnings/bzr add
bzr: warning: Skipped invalid pattern(s):
  RE:*.cpp
Please fix above patterns in ".bzrignore" or "/home/parthm/.bazaar/ignore".
Bazaar commands will NOT ignore files meant to match above pattern(s).
adding one.txt
adding two.txt
adding x
adding y

To post a comment you must log in.
Revision history for this message
John A Meinel (jameinel) wrote :

110 + if validp:
111 + g = Globster(validp)
112 + new_regex_patterns.append(g.regex_patterns[0])

^- This seems odd. A comment here saying that we are collapsing the valid requests into a single compiled regex would be useful.

158 +def is_pattern_valid(pattern):

^- It seems like it would be nice if this could share the code that determines what translation unit to use. Otherwise it seems likely to get skew if one of them is updated. For example, the proposed 'FixedString' prefix. The fallout is that it would normally work fine, but if someone had a *different* invalid regex, then suddenly it would start causing this one to fail as well.

I don't understand why you changed variables from being private. In lazy_regex you renamed _real_re_compile to real_re_compile, and in globbing you changed _regex_patterns.

It is perfectly fine for *tests* to check private variables, it is good to keep them private so that plugins get a feeling that they may not be stable names.

...

342 + def test_invalid_pattern_in_ignore_file(self):
343 + """Existing invalid patterns should be shown.
344 + """
345 + tree = self.make_branch_and_tree('.')
346 + self.build_tree(['file', 'foo.c'])
347 + f = open('.bzrignore', 'w')
348 + try:
349 + f.write(u"*.c\nRE:[\n") # invalid pattern
350 + finally:
351 + f.close()

^- I think you can simplify this to:
self.build_tree_contents([('.bzrignore', '*.c\nRE:[\n')])

It is pretty bad form to write *unicode* strings to f.write() given that in python2 it is an 8-bit request. (It happens to be automatically casting your unicode string down to an ascii string.)

This test might be easier to read if we turned it into a "run_script" test, but that may not be easily available here.

The same comments apply to test_add_with_invalid_ignore_in_global_ignore and
test_invalid_pattern_in_ignore_file in the 2 other places.

review: Needs Fixing
Revision history for this message
Parth Malwankar (parthm) wrote :

Thanks for the review John.

> 110 + if validp:
> 111 + g = Globster(validp)
> 112 + new_regex_patterns.append(g.regex_patterns[0])
>
> ^- This seems odd. A comment here saying that we are collapsing the valid
> requests into a single compiled regex would be useful.
>

Done.

> 158 +def is_pattern_valid(pattern):
>
> ^- It seems like it would be nice if this could share the code that determines
> what translation unit to use. Otherwise it seems likely to get skew if one of
> them is updated. For example, the proposed 'FixedString' prefix. The fallout
> is that it would normally work fine, but if someone had a *different* invalid
> regex, then suddenly it would start causing this one to fail as well.
>

I have pulled out the common code into a class _Pattern. _Pattern has an
identify method that categorizes the pattern as fullpath, extension or
basename. This also provides pattern->translator and pattern->prefix
mapping as there are required in multiple places. Unit test is also added
for this.

>
> I don't understand why you changed variables from being private. In lazy_regex
> you renamed _real_re_compile to real_re_compile, and in globbing you changed
> _regex_patterns.
>

real_re_compile is imported by bzrlib.globbing as is_pattern_valid uses
real_re_compile to check if a pattern is valid. Hence I made that public.

For regex_patterns, in case of a pre-existing bad pattern Globbing.match
creates an instance of Globbing and accesses regex_patterns directly as
instance.regex_patterns. Considering that this uses is within the same
file we could allow it to be private but I thought making it public
may be cleaner.

Let me know if the above seems reasonable to you. I haven't done any
changes for this yet.

> It is perfectly fine for *tests* to check private variables, it is good to
> keep them private so that plugins get a feeling that they may not be stable
> names.
>
>
> ...
>
> 342 + def test_invalid_pattern_in_ignore_file(self):
> 343 + """Existing invalid patterns should be shown.
> 344 + """
> 345 + tree = self.make_branch_and_tree('.')
> 346 + self.build_tree(['file', 'foo.c'])
> 347 + f = open('.bzrignore', 'w')
> 348 + try:
> 349 + f.write(u"*.c\nRE:[\n") # invalid pattern
> 350 + finally:
> 351 + f.close()
>
> ^- I think you can simplify this to:
> self.build_tree_contents([('.bzrignore', '*.c\nRE:[\n')])
>

Thanks. That very handy. I wasn't aware of this method.
The tests have been updated to use this.

> It is pretty bad form to write *unicode* strings to f.write() given that in
> python2 it is an 8-bit request. (It happens to be automatically casting your
> unicode string down to an ascii string.)
>
> This test might be easier to read if we turned it into a "run_script" test, hha
> but that may not be easily available here.
>
> The same comments apply to test_add_with_invalid_ignore_in_global_ignore and
> test_invalid_pattern_in_ignore_file in the 2 other places.

Revision history for this message
Robert Collins (lifeless) wrote :

I have to say, I still think that this would be better solved by just
making the compile error clearer and having it throw an external
BZRError - no expansion of interfaces needed, the improvements can be
used by other things using regexps etc.

-Rob

Revision history for this message
Parth Malwankar (parthm) wrote :

Well, we now have both solutions in place. The warnings based (this) and the error based[1] though the latter may need some cleanup and review.

My personal preference is towards an error based approach. I feel warnings are may be ok for operations like 'status' that don't change anything, but specifically for something like 'add' an error would probably be better so that the user doesn't miss the warning message and go ahead and do the commit. We saw The user could 'revert' but it may be a bit of a pain to do that, or worse, user may miss the message and end up with multiple commits after that containing large binary files :) The 'add' operation with warnings specifically reminds me of bug #549310 where whoami was guessing and I ended up with multiple commits with a bad username[2]. I would prefer a uniform error based approach as Robert suggested.
I don't expect the rollout of an error based approach to be a problem as bzr currently crashes on bad patterns so the existing ignore files will not have bad patterns in them.

That said, I think both warning and error based approach are an improvement over the current situation where bzr shows a stack trace which can scare users. It would be great if we can align on one of these after some more discussion and go ahead.

[1] https://code.edge.launchpad.net/~parthm/bzr/300062-better-handling-for-invalid-ignore-pattern/+merge/26400
[2] https://bugs.launchpad.net/bzr/+bug/549310/comments/3

Revision history for this message
Robert Collins (lifeless) wrote :

So, lets go with an error based approach for now, for the following reasons:
 - its a very small change compared to the current code (wrap the
regex compiler to know its taking a list in and print the errorneous
pattern out)
 - it seems genuinely better for some use cases like add
 - we can add a warning based one later if this feels insufficient.

In general I do think its better to stop early and cleanly in commands
that can output lots of stuff - if we show a lot then users will often
miss the fine print.

I'll review the error based one again to move things forward.

Revision history for this message
Robert Collins (lifeless) wrote :

Marking this one as rejected for now.

Revision history for this message
Martin Pool (mbp) wrote :

On 21 June 2010 16:46, Robert Collins <email address hidden> wrote:
> So, lets go with an error based approach for now, for the following reasons:
>  - its a very small change compared to the current code (wrap the
> regex compiler to know its taking a list in and print the errorneous
> pattern out)
>  - it seems genuinely better for some use cases like add
>  - we can add a warning based one later if this feels insufficient.
>
> In general I do think its better to stop early and cleanly in commands
> that can output lots of stuff - if we show a lot then users will often
> miss the fine print.
>
> I'll review the error based one again to move things forward.

I said previously that I slightly preferred a warning if it was
equally easy to implement. But for the sake of simplicity and of
getting an error in cases where we do need it, an error is fine with
me.

--
Martin

Revision history for this message
John A Meinel (jameinel) wrote :

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> On 21 June 2010 16:46, Robert Collins <email address hidden> wrote:
>> So, lets go with an error based approach for now, for the following reasons:
>> - its a very small change compared to the current code (wrap the
>> regex compiler to know its taking a list in and print the errorneous
>> pattern out)
>> - it seems genuinely better for some use cases like add
>> - we can add a warning based one later if this feels insufficient.
>>
>> In general I do think its better to stop early and cleanly in commands
>> that can output lots of stuff - if we show a lot then users will often
>> miss the fine print.
>>
>> I'll review the error based one again to move things forward.
>
> I said previously that I slightly preferred a warning if it was
> equally easy to implement. But for the sake of simplicity and of
> getting an error in cases where we do need it, an error is fine with
> me.
>

We've had a bad habit in the past of falling over on trivial details and
preventing users from getting stuff done. I'm not saying that is the
case here, and as always there is a balance to be struck.

As mentioned, this can't be considered a regression, since we used to
raise an unhelpful exception, so raising a helpful one is strictly
better than what we had.

So certainly, land the update (I have not reviewed the code). I'm not as
convinced that failing early is always the best answer, it is probably
ok here.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwg10gACgkQJdeBCYSNAAPBLwCgokPBGQCdKuxiom9uOmK2DKmN
Q0UAoKYen2O25wLOuQRdvqyUVRV42iLn
=Qoef
-----END PGP SIGNATURE-----

Unmerged revisions

5292. By Parth Malwankar

added test for _Pattern

5291. By Parth Malwankar

cleanup tests.

5290. By Parth Malwankar

cleanup of pattern categorization code

5289. By Parth Malwankar

merged in changes from trunk

5288. By Parth Malwankar

added _PatternType class to categorize patterns.

5287. By Parth Malwankar

updated NEWS

5286. By Parth Malwankar

merged in changes from trunk.

5285. By Parth Malwankar

added invalid patterns tests for 'ignore'

5284. By Parth Malwankar

added invalid pattern test for ignored.

5283. By Parth Malwankar

added invalid pattern tests for 'add' and 'ls'

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.