Merge lp://staging/~allenap/launchpad/stop-checkwatches-hammering-bug-506158-devel into lp://staging/launchpad

Proposed by Gavin Panella
Status: Merged
Approved by: Graham Binns
Approved revision: not available
Merged at revision: not available
Proposed branch: lp://staging/~allenap/launchpad/stop-checkwatches-hammering-bug-506158-devel
Merge into: lp://staging/launchpad
Diff against target: 225 lines (+168/-17)
3 files modified
lib/lp/bugs/doc/checkwatches-batching.txt (+134/-0)
lib/lp/bugs/doc/externalbugtracker.txt (+3/-0)
lib/lp/bugs/scripts/checkwatches.py (+31/-17)
To merge this branch: bzr merge lp://staging/~allenap/launchpad/stop-checkwatches-hammering-bug-506158-devel
Reviewer Review Type Date Requested Status
Graham Binns (community) code Approve
Review via email: mp+17363@code.staging.launchpad.net

Commit message

Query remote bug trackers about recently modified bugs in limited batches. Previously, queries were made for all bugs, which could result in such large XML-RPC requests that the remote system would become overloaded.

To post a comment you must log in.
Revision history for this message
Gavin Panella (allenap) wrote :

[The same code has already reviewed in https://code.launchpad.net/~allenap/launchpad/stop-checkwatches-hammering-bug-506158/+merge/17306]

This batches up calls to getModifiedRemoteBugs() so that we don't maim remote trackers. I put some tests in a new file, checkwatches-batching.txt, and have added an XXX to externalbugtracker.txt to suggest that some of the tests in there should be moved to this new file.

The code in _getRemoteIdsToCheck() is still a bit back to front - it should probably identify the new bug watches first, then fill with old ones - but that's beyond the scope of this branch.

Revision history for this message
Graham Binns (gmb) :
review: Approve (code)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== added file 'lib/lp/bugs/doc/checkwatches-batching.txt'
2--- lib/lp/bugs/doc/checkwatches-batching.txt 1970-01-01 00:00:00 +0000
3+++ lib/lp/bugs/doc/checkwatches-batching.txt 2010-01-14 09:57:16 +0000
4@@ -0,0 +1,134 @@
5+Batching in checkwatches
6+========================
7+
8+The checkwatches system tries to be sensitive to batching options
9+given. Specifically, the _getRemoteIdsToCheck() method is responsible
10+for batching up operations.
11+
12+ >>> import transaction
13+ >>> from lp.bugs.scripts.checkwatches import BugWatchUpdater
14+ >>> from pprint import pprint
15+
16+ >>> updater = BugWatchUpdater(transaction)
17+
18+
19+Basics
20+------
21+
22+ >>> class BasicRemoteSystem:
23+ ... pass
24+
25+ >>> remote = BasicRemoteSystem()
26+
27+When there are no bug watches to check, the result is empty.
28+
29+ >>> pprint(updater._getRemoteIdsToCheck(
30+ ... remote, [], batch_size=2))
31+ {'all_remote_ids': [],
32+ 'remote_ids_to_check': [],
33+ 'unmodified_remote_ids': []}
34+
35+With up to batch_size watches, it advises us to check all the remote
36+bug IDs given.
37+
38+ >>> bug_watches = [
39+ ... factory.makeBugWatch(remote_bug='a'),
40+ ... factory.makeBugWatch(remote_bug='b'),
41+ ... ]
42+
43+ >>> pprint(updater._getRemoteIdsToCheck(
44+ ... remote, bug_watches, batch_size=2))
45+ {'all_remote_ids': [u'a', u'b'],
46+ 'remote_ids_to_check': [u'a', u'b'],
47+ 'unmodified_remote_ids': []}
48+
49+With more than batch_size watches, it advises to only check a subset
50+of the remote bug IDs given.
51+
52+ >>> bug_watches = [
53+ ... factory.makeBugWatch(remote_bug='a'),
54+ ... factory.makeBugWatch(remote_bug='b'),
55+ ... factory.makeBugWatch(remote_bug='c'),
56+ ... ]
57+
58+ >>> pprint(updater._getRemoteIdsToCheck(
59+ ... remote, bug_watches, batch_size=2))
60+ {'all_remote_ids': [u'a', u'b'],
61+ 'remote_ids_to_check': [u'a', u'b'],
62+ 'unmodified_remote_ids': []}
63+
64+
65+Querying the remote system for modified bugs
66+--------------------------------------------
67+
68+For bug watches that have been checked before, the remote system is
69+asked which of a list of bugs have been modified since a given date.
70+
71+ >>> from datetime import datetime
72+ >>> from pytz import UTC
73+ >>> from zope.security.proxy import removeSecurityProxy
74+
75+ >>> class QueryableRemoteSystem:
76+ ... def getModifiedRemoteBugs(self, remote_bug_ids, timestamp):
77+ ... print "getModifiedRemoteBugs(%r, %r)" % (
78+ ... remote_bug_ids, timestamp)
79+ ... # Return every *other* bug ID for demo purposes.
80+ ... return remote_bug_ids[::2]
81+
82+ >>> remote = QueryableRemoteSystem()
83+ >>> now = datetime(2010, 01, 13, 16, 52, tzinfo=UTC)
84+
85+When there are no bug watches to check, the result is empty, and the
86+remote system is not queried.
87+
88+ >>> ids_to_check = updater._getRemoteIdsToCheck(
89+ ... remote, [], batch_size=2,
90+ ... server_time=now, now=now)
91+
92+ >>> pprint(ids_to_check)
93+ {'all_remote_ids': [],
94+ 'remote_ids_to_check': [],
95+ 'unmodified_remote_ids': []}
96+
97+With up to batch_size previously checked watches, the remote system is
98+queried once, and we are advised to check only one of the watches.
99+
100+ >>> bug_watches = [
101+ ... factory.makeBugWatch(remote_bug='a'),
102+ ... factory.makeBugWatch(remote_bug='b'),
103+ ... ]
104+ >>> for bug_watch in bug_watches:
105+ ... removeSecurityProxy(bug_watch).lastchecked = now
106+
107+ >>> ids_to_check = updater._getRemoteIdsToCheck(
108+ ... remote, bug_watches, batch_size=2,
109+ ... server_time=now, now=now)
110+ getModifiedRemoteBugs([u'a', u'b'], datetime.datetime(...))
111+
112+ >>> pprint(ids_to_check)
113+ {'all_remote_ids': [u'a', u'b'],
114+ 'remote_ids_to_check': [u'a'],
115+ 'unmodified_remote_ids': [u'b']}
116+
117+With just more than batch_size previously checked watches, the remote
118+system is queried twice, and we are advised to check two of the
119+watches.
120+
121+ >>> bug_watches = [
122+ ... factory.makeBugWatch(remote_bug='a'),
123+ ... factory.makeBugWatch(remote_bug='b'),
124+ ... factory.makeBugWatch(remote_bug='c'),
125+ ... ]
126+ >>> for bug_watch in bug_watches:
127+ ... removeSecurityProxy(bug_watch).lastchecked = now
128+
129+ >>> ids_to_check = updater._getRemoteIdsToCheck(
130+ ... remote, bug_watches, batch_size=2,
131+ ... server_time=now, now=now)
132+ getModifiedRemoteBugs([u'a', u'b'], datetime.datetime(...))
133+ getModifiedRemoteBugs([u'c'], datetime.datetime(...))
134+
135+ >>> pprint(ids_to_check)
136+ {'all_remote_ids': [u'a', u'c', u'b'],
137+ 'remote_ids_to_check': [u'a', u'c'],
138+ 'unmodified_remote_ids': [u'b']}
139
140=== modified file 'lib/lp/bugs/doc/externalbugtracker.txt'
141--- lib/lp/bugs/doc/externalbugtracker.txt 2009-12-22 11:48:01 +0000
142+++ lib/lp/bugs/doc/externalbugtracker.txt 2010-01-14 09:57:16 +0000
143@@ -401,6 +401,9 @@
144
145 === Limiting which bug watches to update ===
146
147+XXX: GavinPanella 2010-01-13 bug=507205: Move this section to
148+checkwatches-batching.txt.
149+
150 In order to reduce the amount of data we have to transfer over the
151 network, each IExternalBugTracker has the ability to filter out bugs
152 that haven't been modified. The method responsible for this is
153
154=== modified file 'lib/lp/bugs/scripts/checkwatches.py'
155--- lib/lp/bugs/scripts/checkwatches.py 2010-01-06 14:35:52 +0000
156+++ lib/lp/bugs/scripts/checkwatches.py 2010-01-14 09:57:16 +0000
157@@ -562,6 +562,21 @@
158 abs(server_time - now) > self.ACCEPTABLE_TIME_SKEW):
159 raise TooMuchTimeSkew(abs(server_time - now))
160
161+ # We limit the number of watches we're updating by the
162+ # ExternalBugTracker's batch_size. In an ideal world we'd just
163+ # slice the bug_watches list but for the sake of testing we need
164+ # to ensure that the list of bug watches is ordered by remote
165+ # bug id before we do so.
166+ if batch_size is None:
167+ # If a batch_size hasn't been passed, use the one specified
168+ # by the ExternalBugTracker.
169+ batch_size = remotesystem.batch_size
170+
171+ if batch_size == 0:
172+ # A batch_size of 0 means that there's no batch size limit
173+ # for this bug tracker.
174+ batch_size = None
175+
176 old_bug_watches = [
177 bug_watch for bug_watch in bug_watches
178 if bug_watch.lastchecked is not None]
179@@ -581,8 +596,22 @@
180 # are actually some bugs that we're interested in so as to
181 # avoid unnecessary network traffic.
182 if server_time is not None and len(remote_old_ids) > 0:
183- old_ids_to_check = remotesystem.getModifiedRemoteBugs(
184- remote_old_ids, oldest_lastchecked)
185+ if batch_size is None:
186+ old_ids_to_check = remotesystem.getModifiedRemoteBugs(
187+ remote_old_ids, oldest_lastchecked)
188+ else:
189+ # Don't ask the remote system about more than
190+ # batch_size bugs at once, but keep asking until we
191+ # run out of bugs to ask about or we have batch_size
192+ # bugs to check.
193+ old_ids_to_check = []
194+ for index in xrange(0, len(remote_old_ids), batch_size):
195+ old_ids_to_check.extend(
196+ remotesystem.getModifiedRemoteBugs(
197+ remote_old_ids[index : index + batch_size],
198+ oldest_lastchecked))
199+ if len(old_ids_to_check) >= batch_size:
200+ break
201 else:
202 old_ids_to_check = list(remote_old_ids)
203
204@@ -601,21 +630,6 @@
205 old_ids_to_check = sorted(
206 set(old_ids_to_check).difference(set(remote_ids_to_check)))
207
208- # We limit the number of watches we're updating by the
209- # ExternalBugTracker's batch_size. In an ideal world we'd just
210- # slice the bug_watches list but for the sake of testing we need
211- # to ensure that the list of bug watches is ordered by remote
212- # bug id before we do so.
213- if batch_size is None:
214- # If a batch_size hasn't been passed, use the one specified
215- # by the ExternalBugTracker.
216- batch_size = remotesystem.batch_size
217-
218- if batch_size == 0:
219- # A batch_size of 0 means that there's no batch size limit
220- # for this bug tracker.
221- batch_size = None
222-
223 if batch_size is not None:
224 # We'll recreate our remote_ids_to_check list so that it's
225 # prioritised. We always include remote ids with comments.