Merge lp://staging/~ev/daisy/weighted-tests into lp://staging/daisy

Proposed by Evan
Status: Merged
Merged at revision: 327
Proposed branch: lp://staging/~ev/daisy/weighted-tests
Merge into: lp://staging/daisy
Diff against target: 426 lines (+283/-15) (has conflicts)
11 files modified
daisy/__init__.py (+1/-1)
daisy/configuration.py (+1/-1)
daisy/constants.py (+34/-0)
daisy/retracer.py (+1/-1)
daisy/schema.py (+13/-1)
daisy/submit.py (+1/-1)
daisy/submit_core.py (+1/-1)
test/test_weighting.py (+125/-0)
tools/build_errors_by_release.py (+17/-9)
tools/unique_systems_for_errors_by_release.py (+44/-0)
tools/weight_errors_per_day.py (+45/-0)
Text conflict in daisy/schema.py
To merge this branch: bzr merge lp://staging/~ev/daisy/weighted-tests
Reviewer Review Type Date Requested Status
Brian Murray (community) Approve
Matthew Paul Thomas (community) Needs Fixing
Review via email: mp+159795@code.staging.launchpad.net

Description of the change

This branch adds code to use the data found in ErrorsByRelease and UniqueSystemsForErrorsByRelease (to be created) to weight the average errors per calendar day.

It adds a test which confirms that for a system which reported an error in 12.04 a week ago, a day after that, and a day after that, the weightings are 0, 1/90, and 2/90, respectively.
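The expected values in that test can be reproduced with a minimal sketch. It assumes the weight is the number of whole days since a system's first report, capped at 90 and divided by 90, which is what the `min(day_difference, 90) / 90.0` line quoted in the review below suggests. The name `ramp_up_weight` and the dates are hypothetical and are not taken from the branch.

```python
import datetime

RAMP_UP = 90  # days; assumed from the 1/90 and 2/90 values the test expects


def ramp_up_weight(first_report, report_day, ramp_up=RAMP_UP):
    """Return the fraction of the ramp-up period elapsed since first_report.

    Hypothetical helper: 0 on the day of the first report, rising by
    1/ramp_up per day, and capped at 1.0 once ramp_up days have passed.
    """
    days = (report_day - first_report).days
    return min(max(days, 0), ramp_up) / float(ramp_up)


# Illustrative dates only: a first report, then one and two days later.
first = datetime.date(2013, 4, 12)
weights = [ramp_up_weight(first, first + datetime.timedelta(days=n))
           for n in range(3)]
```

Under these assumptions the three weights are 0, 1/90, and 2/90, and any report 90 or more days after the first is weighted 1.0.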

Finally, it adds a script to calculate the unique systems in SystemsForErrorsByRelease daily. This will replace tools/unique_users_daily_update.py once we've proven the weighting to be accurate.

The SystemsForErrorsByRelease column family, which keeps track of the systems that have been weighted in each calendar day, is necessary because the previous iteration, UniqueUsers90Days, is global: it represents all the systems that report into the error tracker for each release, regardless of whether their reports are for Ubuntu or some derivative.

It's worth noting that this branch doesn't change the back-population code (tools/build_errors_by_release.py) other than making the print statements optional. I therefore think that running it again, both to populate the new system identifiers column family and to address the need to run the script twice (which I mention in the unit test below), will fix the inaccuracy of the weighting output.

320. By Evan

Build bigger buffers in get_range.

Matthew Paul Thomas (mpt) wrote :

> + # On the first day we had any error reports, the weighting would be 0
> + # because 0 days have past since the first report.

"passed"

> + # The second report is one day after the first and the only report of
> + # the day.
> + self.assertEqual(weights[timestamps[1] / 1e6], 1/90.0)
> +
> + # The third report is two days after the first and the only report of
> + # the day.
> + self.assertEqual(weights[timestamps[2] / 1e6], 2/90.0)

Commas after "first" would make these comments easier to understand.

> + working_date = target_date - datetime.timedelta(days=89)
>...
> + adj = min(day_difference, 90) / 90.0

These lines are far apart from each other and not obviously related. So if later we decide to change the ramp-up to 30 days, for example (improving responsiveness at the expense of spikes), you or someone else might easily change the latter while forgetting the former. I suggest using the same constant in both lines (e.g. datetime.timedelta(days=RAMPUP-1)), and including a comment explaining what it's for.
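A minimal sketch of that suggestion, assuming a single module-level constant shared by both calculations. The helper names `window_start` and `adjustment` are hypothetical and only wrap the two quoted lines:

```python
import datetime

# Number of days over which a system's weight ramps up from 0 to 1.
# Both the look-back window and the adjustment derive from this value,
# so changing it (say, to 30) keeps the two calculations in step.
RAMP_UP = 90


def window_start(target_date, ramp_up=RAMP_UP):
    """First day of the ramp_up-day window that ends on target_date."""
    return target_date - datetime.timedelta(days=ramp_up - 1)


def adjustment(day_difference, ramp_up=RAMP_UP):
    """Weight for a system first seen day_difference days ago."""
    return min(day_difference, ramp_up) / float(ramp_up)
```

With this shape, a shorter ramp-up is a one-line change to `RAMP_UP` rather than two edits in unrelated places.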

review: Needs Fixing
321. By Evan

Address Matthew's concerns from the merge proposal. Move 90 days into a RAMP_UP constant. Fix grammatical errors.

Brian Murray (brian-murray) wrote :

+def main(release, start, end, verbose=False):
+    start = start.replace(hour=0, minute=0, second=0, microsecond=0)
+    end = end.replace(hour=0, minute=0, second=0, microsecond=0)
+
+    creds = {'username': config.cassandra_username,
+             'password': config.cassandra_password}
+    pool = pycassa.ConnectionPool(config.cassandra_keyspace,
+                                  config.cassandra_hosts, timeout=600,
+                                  credentials=creds)
+
+    systems = pycassa.ColumnFamily(pool, 'SystemsForErrorsByRelease')
+    uniquesys = pycassa.ColumnFamily(pool, 'UniqueSystemsForErrorsByRelease')
+
+    while start <= end:
+        target_date = start.replace(hour=0, minute=0, second=0, microsecond=0)

hour, minute, second and microsecond are replaced twice for start.
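One way to avoid the repeated truncation, as a rough sketch: normalise both bounds once and then step through the days. The helpers `midnight` and `days_between` are hypothetical and not part of the branch, and the sketch assumes the loop advances one day at a time, which the quoted hunk does not show.

```python
import datetime


def midnight(dt):
    """Truncate a datetime to the start of its calendar day."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


def days_between(start, end):
    """Yield each calendar day from start to end, inclusive, at midnight.

    Both bounds are truncated once here, so the loop body never needs
    to call replace() again.
    """
    day = midnight(start)
    end = midnight(end)
    while day <= end:
        yield day
        day += datetime.timedelta(days=1)
```

A caller could then write `for target_date in days_between(start, end):` in place of the `while` loop.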

RAMP_UP is set to 90 in two separate Python files. Perhaps this should be a config option somewhere instead?

+def weight(release='Ubuntu 12.04'):

Well, we talked about that.

review: Approve
322. By Evan

Redundant.

323. By Evan

Add a constants module.

324. By Evan

Update copyright dates.
