Daisy

Merge lp://staging/~ev/daisy/weighted-tests into lp://staging/daisy

weighted-tests
Merge into trunk

Proposed by Evan on 2013-04-19

Status:	Merged
Merged at revision:	327
Proposed branch:	lp://staging/~ev/daisy/weighted-tests
Merge into:	lp://staging/daisy
Diff against target:	426 lines (+283/-15) (has conflicts)
11 files modified
daisy/__init__.py (+1/-1)
daisy/configuration.py (+1/-1)
daisy/constants.py (+34/-0)
daisy/retracer.py (+1/-1)
daisy/schema.py (+13/-1)
daisy/submit.py (+1/-1)
daisy/submit_core.py (+1/-1)
test/test_weighting.py (+125/-0)
tools/build_errors_by_release.py (+17/-9)
tools/unique_systems_for_errors_by_release.py (+44/-0)
tools/weight_errors_per_day.py (+45/-0)
Text conflict in daisy/schema.py
To merge this branch:	bzr merge lp://staging/~ev/daisy/weighted-tests

Reviewer	Date Requested	Status
Brian Murray (community)	2013-04-19	Approve on 2013-04-29
Matthew Paul Thomas (community)	2013-04-19	Needs Fixing on 2013-04-26
Review via email: mp+159795@code.staging.launchpad.net

Description of the change

This branch adds code to use the data found in ErrorsByRelease and UniqueSystemsForErrorsByRelease (to be created) to weight the average errors per calendar day.

It adds a test which confirms that for a system which reported an error in 12.04 a week ago, a day after that, and a day after that, the weightings are 0, 1/90, and 2/90, respectively.
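
A minimal sketch of the arithmetic that test checks, assuming the weight is min(days since the system's first report, 90) / 90, as in the `adj` line quoted in the review. The names `RAMP_UP` and `ramp_up_weight` and the example dates are illustrative, not taken from the branch:

```python
import datetime

RAMP_UP = 90  # days; illustrative name for the 90-day ramp-up window


def ramp_up_weight(first_report, report_day, ramp_up=RAMP_UP):
    """Weight for a report, from whole days since the system's first report."""
    day_difference = (report_day - first_report).days
    # The weight grows linearly and is capped at 1 once ramp_up days have passed.
    return min(day_difference, ramp_up) / float(ramp_up)


# Hypothetical dates: a first report, then one report on each of the next two days.
first = datetime.date(2013, 4, 12)
weights = [ramp_up_weight(first, first + datetime.timedelta(days=n))
           for n in range(3)]
```

Under that assumption, `weights` holds 0, 1/90, and 2/90 for the three consecutive days.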

Finally, it adds a script to calculate the unique systems in SystemsForErrorsByRelease daily. This will replace tools/unique_users_daily_update.py once we've proven the weighting to be accurate.

The SystemsForErrorsByRelease column family, which keeps track of the systems that have been weighted in each calendar day, is necessary because the previous iteration, UniqueUsers90Days, is global. It represents all the systems that report into the error tracker for each release, regardless of whether their reports are for Ubuntu or some derivative.

It's worth noting that this branch doesn't change the back-population code (tools/build_errors_by_release.py) other than making the print statements optional. I therefore think that running it again, both to populate the new system identifiers column family and to address the need to run the script twice that I mention in the unit test below, will fix the inaccuracy of the weighting output.

lp://staging/~ev/daisy/weighted-tests updated on 2013-04-24

320. By Evan on 2013-04-24: Build bigger buffers in get_range.

Matthew Paul Thomas (mpt) wrote on 2013-04-26:

> + # On the first day we had any error reports, the weighting would be 0
> + # because 0 days have past since the first report.

"passed"

> + # The second report is one day after the first and the only report of
> + # the day.
> + self.assertEqual(weights[timestamps[1] / 1e6], 1/90.0)
> +
> + # The third report is two days after the first and the only report of
> + # the day.
> + self.assertEqual(weights[timestamps[2] / 1e6], 2/90.0)

Commas after "first" would make these comments easier to understand.

> + working_date = target_date - datetime.timedelta(days=89)
>...
> + adj = min(day_difference, 90) / 90.0

These lines are far apart from each other and not obviously related. So if later we decide to change the ramp-up to 30 days, for example (improving responsiveness at the expense of spikes), you or someone else might easily change the latter while forgetting the former. I suggest using the same constant in both lines (e.g. datetime.timedelta(days=RAMPUP-1)), and including a comment explaining what it's for.
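
One way the suggestion could look, as a sketch only: both the window start and the adjustment read the same constant, so changing the ramp-up period touches a single line. The function names here are hypothetical and not part of the branch.

```python
import datetime

# Number of days over which a system's weight ramps up from 0 to 1.
# Both calculations below depend on it, so it is defined exactly once.
RAMP_UP = 90


def window_start(target_date):
    """First day of the RAMP_UP-day window that ends on target_date."""
    return target_date - datetime.timedelta(days=RAMP_UP - 1)


def adjustment(day_difference):
    """Fraction of full weight after day_difference days, capped at 1."""
    return min(day_difference, RAMP_UP) / float(RAMP_UP)
```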

review: Needs Fixing

lp://staging/~ev/daisy/weighted-tests updated on 2013-04-26

321. By Evan on 2013-04-26: Address Matthew's concerns from the merge proposal. Move 90 days into a RAMP_UP constant. Fix grammatical errors.

Brian Murray (brian-murray) wrote on 2013-04-29:

+def main(release, start, end, verbose=False):
+    start = start.replace(hour=0, minute=0, second=0, microsecond=0)
+    end = end.replace(hour=0, minute=0, second=0, microsecond=0)
+
+    creds = {'username': config.cassandra_username,
+             'password': config.cassandra_password}
+    pool = pycassa.ConnectionPool(config.cassandra_keyspace,
+                                  config.cassandra_hosts, timeout=600,
+                                  credentials=creds)
+
+    systems = pycassa.ColumnFamily(pool, 'SystemsForErrorsByRelease')
+    uniquesys = pycassa.ColumnFamily(pool, 'UniqueSystemsForErrorsByRelease')
+
+    while start <= end:
+        target_date = start.replace(hour=0, minute=0, second=0, microsecond=0)

hour, minute, second and microsecond are replaced twice for start.
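
A sketch of truncating to midnight only once, assuming the loop advances one day per iteration (the quoted hunk does not show how start is incremented). The helpers `midnight` and `days_between` are hypothetical:

```python
import datetime


def midnight(dt):
    """Return dt truncated to the start of its calendar day."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


def days_between(start, end):
    """Yield each calendar day from start to end inclusive, at midnight."""
    day = midnight(start)
    end = midnight(end)
    while day <= end:
        # day is already at midnight, so no second replace() is needed here.
        yield day
        day += datetime.timedelta(days=1)
```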

RAMP_UP is set to 90 in two separate Python files. Perhaps this should be a config option somewhere instead?

+def weight(release='Ubuntu 12.04'):

Well, we talked about that.

review: Approve

lp://staging/~ev/daisy/weighted-tests updated on 2013-04-30

322. By Evan on 2013-04-30: Redundant.
323. By Evan on 2013-04-30: Add a constants module.

Subscribers

People subscribed via source and target branches

to all changes:

Brian Murray

Evan
