On Mon, Mar 18, 2013 at 05:52:20PM -0000, Evan Dandrea wrote:
> Evan Dandrea has proposed merging
> lp:~ev/oops-repository/weighted-machines into
> lp:~daisy-pluckers/oops-repository/trunk.
> 
> Requested reviews:
>   Daisy Pluckers (daisy-pluckers)
> 
> For more details, see:
> https://code.launchpad.net/~ev/oops-repository/weighted-machines/+merge/153887
> 
> This adds a new method, update_errors_by_release, which populates the
> FirstError column family with the first occurrence of an error for the
> given system identifier in the given Ubuntu release. It then writes
> this first occurrence into the ErrorsByRelease CF for the given Ubuntu
> release and today's date under the OOPS ID. This allows us to look up
> all the errors that occurred for an Ubuntu release for each day,
> weighting each error in the result set by how many days its been since
> its first error, as discussed in bug 1077122.
> 
> More complete details of the implementation can be found in bug
> 1077122. The big difference from that write up is that instead of
> using a new uuid1() for the column name in ErrorsByRelease, we re-use
> the OOPS ID UUID, so that the script in
> https://code.launchpad.net/~ev/daisy/weighted-machines/+merge/153885
> can be idempotent on multiple runs.

> -- 
> https://code.launchpad.net/~ev/oops-repository/weighted-machines/+merge/153887
> Your team Daisy Pluckers is requested to review the proposed merge of lp:~ev/oops-repository/weighted-machines into lp:~daisy-pluckers/oops-repository/trunk.

> === modified file 'oopsrepository/oopses.py'
> --- oopsrepository/oopses.py	2013-03-12 00:22:41 +0000
> +++ oopsrepository/oopses.py	2013-03-18 17:51:27 +0000
> @@ -9,6 +9,7 @@
>  import json
>  import time
>  import uuid
> +import datetime
>  
>  import pycassa
>  from pycassa.cassandra.ttypes import NotFoundException
> @@ -186,6 +187,37 @@
>      except NotFoundException:
>          return None 
>  
> +def update_errors_by_release(config, oops_id, system_token, release):
> +    release = release.encode('utf8')
> +    pool = connection_pool(config)
> +    firsterror = pycassa.ColumnFamily(pool, 'FirstError')
> +    errorsbyrelease = pycassa.ColumnFamily(pool, 'ErrorsByRelease')
> +
> +    today = datetime.datetime.today()
> +    today = today.replace(hour=0, minute=0, second=0, microsecond=0)
> +    try:
> +        first_error_date = firsterror.get(release, columns=[system_token])
> +        first_error_date = first_error_date[system_token]
> +    except NotFoundException:
> +        firsterror.insert(release, {system_token: today})
> +        first_error_date = today
> +
> +    # We use the OOPS ID rather than the system identifier here because we want
> +    # each crash from a system to take up a new column in this column family.
> +    # Each one of those columns should be associated with the date of the first
> +    # error for the system in this release.
> +    # 
> +    # Remember, we're ultimately tracking errors here, not systems, but we need
> +    # the system idnetifier to know the first occurance of an error in the

typo: identifier

> +    # release for that machine.
> +    #
> +    # For the given release for today, the crash should be weighted by the
> +    # first time an error occured in the release for the system this came from.
> +    # Multipled by their weight and summed together, these form the numerator

typo: Multiplied

Otherwise the rest of the merge proposal looks good to me.

--
Brian Murray