On Mon, Mar 18, 2013 at 05:52:20PM -0000, Evan Dandrea wrote:
> Evan Dandrea has proposed merging
> lp:~ev/oops-repository/weighted-machines into
> lp:~daisy-pluckers/oops-repository/trunk.
>
> Requested reviews:
>   Daisy Pluckers (daisy-pluckers)
>
> For more details, see:
> https://code.launchpad.net/~ev/oops-repository/weighted-machines/+merge/153887
>
> This adds a new method, update_errors_by_release, which populates the
> FirstError column family with the first occurrence of an error for the
> given system identifier in the given Ubuntu release. It then writes
> this first occurrence into the ErrorsByRelease CF for the given Ubuntu
> release and today's date under the OOPS ID. This allows us to look up
> all the errors that occurred for an Ubuntu release for each day,
> weighting each error in the result set by how many days its been since
> its first error, as discussed in bug 1077122.
>
> More complete details of the implementation can be found in bug
> 1077122. The big difference from that write up is that instead of
> using a new uuid1() for the column name in ErrorsByRelease, we re-use
> the OOPS ID UUID, so that the script in
> https://code.launchpad.net/~ev/daisy/weighted-machines/+merge/153885
> can be idempotent on multiple runs.
> --
> https://code.launchpad.net/~ev/oops-repository/weighted-machines/+merge/153887
> Your team Daisy Pluckers is requested to review the proposed merge of lp:~ev/oops-repository/weighted-machines into lp:~daisy-pluckers/oops-repository/trunk.
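
A minimal in-memory sketch of the bookkeeping described in the proposal
text, with plain dicts standing in for the FirstError and ErrorsByRelease
column families. The (release, day) row key, the record_error and
days_since_first names, and the use of datetime.date rather than a
midnight datetime are assumptions made for illustration; the branch itself
uses pycassa and may lay out its keys differently. The sketch covers only
the "days since first error" input to the weighting, not the weighting
formula from bug 1077122.

```python
import datetime

# Plain dicts stand in for the Cassandra column families (assumption: the
# real code uses pycassa.ColumnFamily objects instead).
first_error = {}        # release -> {system_token: date of first error}
errors_by_release = {}  # (release, day) -> {oops_id: date of first error}


def record_error(oops_id, system_token, release, today):
    """Hypothetical stand-in for update_errors_by_release."""
    # Look up the first error for this system in this release, recording
    # today as the first occurrence if none is stored yet.
    first = first_error.setdefault(release, {}).setdefault(system_token, today)
    # The column is keyed by OOPS ID, so re-running for the same crash
    # overwrites the same column instead of adding a new one.
    errors_by_release.setdefault((release, today), {})[oops_id] = first
    return first


def days_since_first(release, day, oops_id):
    """Days between the row's date and the system's first error."""
    return (day - errors_by_release[(release, day)][oops_id]).days


day1 = datetime.date(2013, 3, 18)
day2 = datetime.date(2013, 3, 20)
record_error("oops-a", "system-1", "Ubuntu 13.04", day1)
record_error("oops-b", "system-1", "Ubuntu 13.04", day2)
# A repeated run for the same crash leaves the row unchanged.
record_error("oops-b", "system-1", "Ubuntu 13.04", day2)
```

Keying the column by OOPS ID rather than a fresh uuid1() is what makes
the repeated call above a no-op, which matches the idempotency argument
in the proposal.
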
> === modified file 'oopsrepository/oopses.py'
> --- oopsrepository/oopses.py	2013-03-12 00:22:41 +0000
> +++ oopsrepository/oopses.py	2013-03-18 17:51:27 +0000
> @@ -9,6 +9,7 @@
>  import json
>  import time
>  import uuid
> +import datetime
>
>  import pycassa
>  from pycassa.cassandra.ttypes import NotFoundException
> @@ -186,6 +187,37 @@
>      except NotFoundException:
>          return None
>
> +def update_errors_by_release(config, oops_id, system_token, release):
> +    release = release.encode('utf8')
> +    pool = connection_pool(config)
> +    firsterror = pycassa.ColumnFamily(pool, 'FirstError')
> +    errorsbyrelease = pycassa.ColumnFamily(pool, 'ErrorsByRelease')
> +
> +    today = datetime.datetime.today()
> +    today = today.replace(hour=0, minute=0, second=0, microsecond=0)
> +    try:
> +        first_error_date = firsterror.get(release, columns=[system_token])
> +        first_error_date = first_error_date[system_token]
> +    except NotFoundException:
> +        firsterror.insert(release, {system_token: today})
> +        first_error_date = today
> +
> +    # We use the OOPS ID rather than the system identifier here because we want
> +    # each crash from a system to take up a new column in this column family.
> +    # Each one of those columns should be associated with the date of the first
> +    # error for the system in this release.
> +    #
> +    # Remember, we're ultimately tracking errors here, not systems, but we need
> +    # the system idnetifier to know the first occurance of an error in the

typo: identifier

> +    # release for that machine.
> +    #
> +    # For the given release for today, the crash should be weighted by the
> +    # first time an error occured in the release for the system this came from.
> +    # Multipled by their weight and summed together, these form the numerator

typo: Multiplied

Otherwise the rest of the merge proposal looks good to me.

--
Brian Murray