Merge lp://staging/~mascha/beeseek/fixesLP505596 into lp://staging/beeseek/1.0

Proposed by Martin Schaaf
Status: Merged
Approved by: Andrea Corbellini
Approved revision: not available
Merged at revision: not available
Proposed branch: lp://staging/~mascha/beeseek/fixesLP505596
Merge into: lp://staging/beeseek/1.0
Diff against target: 21 lines (+2/-2)
1 file modified
beeseek/ui/resultspage.py (+2/-2)
To merge this branch: bzr merge lp://staging/~mascha/beeseek/fixesLP505596
Reviewer Review Type Date Requested Status
Andrea Corbellini Approve
Martin Schaaf (community) Needs Resubmitting
Review via email: mp+17845@code.staging.launchpad.net
To post a comment you must log in.
Revision history for this message
Martin Schaaf (mascha) wrote :

Fixes lp #505596

Revision history for this message
Andrea Corbellini (andrea.corbellini) wrote :

Hi and thanks again for your contribution.

Your branch will work and your patch does its job correctly, but I think that using a regular expression is too much complicated for this bug. I think that the better way to fix it is catching ValueError. The reason why I think this is that catching an exception is faster than compiling a matching a regular expression.

Here's what `timeit` says on my computer:

>>> timeit("""
... try:
... index = int(value)
... except ValueError:
... index = 0
... """, "value = '123'")
0.83609890937805176
>>> timeit("""
... if re.match('\d+$', value):
... index = int(value)
... else:
... index = 0
... """, "import re; value = '123'")
3.7652208805084229

In this case, where the index given by the user is a valid number, the difference is big (~2.929 seconds).

>>> timeit("""
... try:
... index = int(value)
... except ValueError:
... index = 0
... """, "value = 'abc'")
3.6588859558105469
>>> timeit("""
... if re.match('\d+$', value):
... index = int(value)
... else:
... index = 0
... """, "import re; value = 'abc'")
2.3073050975799561

Here the index is not valid, and the faster code is the one that uses the regex (~1.351 seconds). However: 1. the delta is smaller than the first case and 2. this case is more rare (if a non-digit is added to the URL, it's because the user did it intentionally).

Of course, this is my opinion and you are free to reply if you thing I'm wrong or if I missed something.

416. By Martin Schaaf

o for better performance pre-compile the regular expression

Revision history for this message
Martin Schaaf (mascha) wrote :

Fix performance concerns by pre compiling the regular expression.

review: Needs Resubmitting
Revision history for this message
Andrea Corbellini (andrea.corbellini) wrote :

Pre-compiling the regular expression gives a great advantage:

>>> timeit("""
... if NUMBER_MATCH.match(value):
... index = int(value)
... else:
... index = 0
... """, "import re; NUMBER_MATCH = re.compile(r'^\d+$'); value = '123'")
1.5664610862731934
>>> timeit("""
... if NUMBER_MATCH.match(value):
... index = int(value)
... else:
... index = 0
... """, "import re; NUMBER_MATCH = re.compile(r'^\d+$'); value = 'abc'")
0.54535198211669922

It's still slower than the try..except block, but the difference is not as big as before.
Thank you for your fix!

review: Approve
Revision history for this message
Martin Schaaf (mascha) wrote :

Thank you for accepting. The reason why I prefer the solution with the regular expression is, that now the time for the worst case is much lower.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'beeseek/ui/resultspage.py'
2--- beeseek/ui/resultspage.py 2010-01-06 12:51:25 +0000
3+++ beeseek/ui/resultspage.py 2010-01-22 20:10:24 +0000
4@@ -25,6 +25,7 @@
5
6 WORD_SEPARATOR = re.compile(r'\W', re.UNICODE)
7 RESULTS_PER_PAGE = 10
8+NUMBER_MATCH = re.compile('^\d+$')
9
10 def show_results(client, page):
11 """Send to the client an HTML page with the results."""
12@@ -35,8 +36,7 @@
13 url_chunks = page.split('/')
14 query = unquote_plus(url_chunks[2])
15 keywords = WORD_SEPARATOR.split(query.lower())
16- if len(url_chunks) >= 4:
17- # XXX This will raise an exception if the page index is not a number.
18+ if len(url_chunks) >= 4 and NUMBER_MATCH.match(url_chunks[3]):
19 page_index = int(url_chunks[3])
20 else:
21 page_index = 0

Subscribers

People subscribed via source and target branches