Merge lp://staging/~qzhang/lava-scheduler/add-labhealth-page into lp://staging/lava-scheduler

Proposed by Paul Larson
Status: Merged
Approved by: Michael Hudson-Doyle
Approved revision: 132
Merged at revision: 132
Proposed branch: lp://staging/~qzhang/lava-scheduler/add-labhealth-page
Merge into: lp://staging/lava-scheduler
Diff against target: 622 lines (+481/-2)
10 files modified
lava_scheduler_app/migrations/0014_auto__add_field_device_health_status__add_field_device_last_health_rep.py (+124/-0)
lava_scheduler_app/migrations/0015_auto__add_field_testjob_health_check.py (+119/-0)
lava_scheduler_app/models.py (+29/-2)
lava_scheduler_app/templates/lava_scheduler_app/device.html (+4/-0)
lava_scheduler_app/templates/lava_scheduler_app/health_jobs.html (+105/-0)
lava_scheduler_app/templates/lava_scheduler_app/index.html (+4/-0)
lava_scheduler_app/templates/lava_scheduler_app/labhealth.html (+43/-0)
lava_scheduler_app/urls.py (+6/-0)
lava_scheduler_app/views.py (+38/-0)
lava_scheduler_daemon/dbjobsource.py (+9/-0)
To merge this branch: bzr merge lp://staging/~qzhang/lava-scheduler/add-labhealth-page
Reviewer Review Type Date Requested Status
Michael Hudson-Doyle (community) Approve
Spring Zhang (community) Needs Resubmitting
Zygmunt Krynicki (community) Needs Fixing
Review via email: mp+89254@code.staging.launchpad.net

Description of the change

Springz's lab health view... proposing so we can see it easier.

To post a comment you must log in.
Revision history for this message
Zygmunt Krynicki (zyga) wrote :
Download full text (5.0 KiB)

General comment: Since I did not realize this was a part of the scheduler app earlier I tend to think (now) that DeviceHealth should be merged with the Device model.

167 + device = models.ForeignKey(Device, verbose_name=_(u"Device"))

Do you expect to have more than one instance of DeviceHealth for a given Device (as in, historic device health). If not then consider changing this to be OneToOneField or at least add unique=True

175 + last_report_time = models.DateTimeField(
176 + verbose_name = _(u"Last Report Time"),
177 + auto_now = False,
178 + null = True,
179 + blank = True,
180 + )

What is this field for? Is it any different to last_report_job.

187 + def put_into_sick(self):
188 + self.status = self.SICK
189 + self.save()
190 +
191 + def put_into_healthy(self):
192 + self.status = self.HEALTHY
193 + self.save()

IMHO there is nothing wrong with just self.status = self.HEALTY (or STATUS_HEALTY). We should definitely avoid calling .save() as that is separate responsibility (and the few methods that really really need to call save have a should_save=True) argument as well to override this.

157 + UNKNOWN = 0
158 + HEALTHY = 1
159 + SICK = 2

I'd prepend STATUS_ to each and use range(3) as in STATUS_UNKNOWN, STATUS_HEALTHY, STATUS_SICK = range(3)

195 + def latest_job(self):
196 + return TestJob.objects.select_related(
197 + "actual_device",
198 + "description",
199 + "status",
200 + "end_time"
201 + ).filter(
202 + actual_device=self.device.hostname,
203 + description__contains="lab health"
204 + ).latest('end_time')

IMHO the condition to filter on description__contains device health is far too loose. It will forever special case two words that may otherwise be used by anything. Doing this (adding hidden foreign keys based on some string value / pattern) is not very good practice. This will also be quite slow as you always need to scan the TestJob table.

I'd suggest an alternative:

1) Use observer pattern to spot bundles as they show up in the system
2) Define a configurable bundle stream for lab health (more on that later)
3) If a bundle is added to that stream update/create DeviceHealth instances

On configuration: Currently it's okay to hardcode certain configuration pieces. This will change in the future. I just wrote and remove a long description of the upcoming changes not to muddy the water here.

221 +<h2><a href="{% url lava.scheduler.labhealth.all %}">Lab Health</a></h2>

282 + url(r'^labhealth/$',
283 + 'lab_health',
284 + name='lava.scheduler.labhealth.all'),

Please use a more descriptive URL name, 'lava.scheduler.lab-health' sounds better to me.

+ {% for devicehealth in labhealth %}

Please use device_health_list instead of labhealth (django naming pattern for templates)

313 +@BreadCrumb("All Device Health", parent=index)
314 +def lab_health(request):
315 + labhealth = DeviceHealth.objects.select_related(
316 + "device", "status").all()
317 + for devicehealth in labhealth:
318 + try:
319 + if devicehealth.device.status != Device.OFFLINE:
320 + latest_job = devicehealth.latest_job()
321 + if latest_job.status == TestJob.COMPLETE:
322 + devicehealth.put_into_healthy()
323 + if latest_job.status == TestJob.INCOMPLET...

Read more...

review: Needs Fixing
116. By Spring Zhang

Move device health to Device model

117. By Spring Zhang

Fix issues introduced by integrating DH to Device

Revision history for this message
Spring Zhang (qzhang) wrote :

According to zyga's comments, move device health to a field in Device model, and fix some issues zyga mentioned, but not all. Still going to apply comments from connect session.

review: Needs Resubmitting
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Much better than before. Good work Spring!

General comment:

I think you and mwhudson are working on a single problem. This is all fine but I think that you _have to_ discuss how this will join. In particular:

1) Lab health jobs will trigger state transitions that are recorded as DeviceStateTransition.
2) The same transition is recorded as device becoming SICK/HEALTHY
Feels like an overlap somewhere. Perhaps it will help you to figure this out if you consider how to answer those questions:

* Is this device working?
* Why is it not working?
* How can I see that it is broken? (hint: job that shows it failed)

Comments below code fragments.

295 + if device_health.status != Device.OFFLINE:
296 + latest_health_job = device_health.latest_health_job()
297 + if latest_health_job.status == TestJob.COMPLETE:
298 + device_health.put_into_healthy()
299 + if latest_health_job.status == TestJob.INCOMPLETE:
300 + device_health.put_into_sick()
301 + device_health.set_last_health_report_job(latest_health_job)

Same comment as before: don't update the DB in a view like that.

185 + actual_device=self.hostname,
186 + description__contains="lab health"

Again, that's not the way I'd like us to detect lab health jobs. I'd suggest marking some jobs as explicit health check jobs at the DB level (adding another column for that).

189 + def set_last_health_report_job(self, job):
190 + self.last_health_report_job = job

This function is pointless. We have properties for a reason. Just assign stuff directly.

161 + last_health_report_job = models.ForeignKey(
162 + "TestJob", blank=True, unique=True, null=True,
163 + related_name=_(u"Health Report Job"))

This feels wrong to me. I don't want to update a field just because we have a new entry in the database. That's what relations are for: Job.filter(device=self, health_check=True).latest(). This also nullifies the need for having latest_health_job()

PS: I did not check the migration code but make sure you read and understand the consequences of having multiple people hacking on a single migration-using codebase: http://south.aeracode.org/docs/tutorial/part5.html

Needs fixing as you know but this is excellent improvement

review: Needs Fixing
118. By Spring Zhang

Remove migration rules to sync with mainline

119. By Spring Zhang

sync with mainline

120. By Spring Zhang

sync with mainline

121. By Spring Zhang

Move health_status changing code from view to TestJob

122. By Spring Zhang

Change some field name

123. By Spring Zhang

Improve individual device health job list

124. By Spring Zhang

update index.html to include device health but the table seems not displayed

125. By Spring Zhang

fix why index page can't work for device health

126. By Spring Zhang

put board to maintenance status when health check job fails

Revision history for this message
Spring Zhang (qzhang) wrote :

1. Add health_check field in TestJob
2. change health_status in jobCompleted_impl()
3. Include lab health in index page
4. Add a trigger to put board offline when a health check job failed
5. Respond to some comments
6. Not apply a singal yet

review: Needs Resubmitting
127. By Spring Zhang

clean not used methods

128. By Spring Zhang

delete an extra space

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :
Download full text (11.1 KiB)

Hi Spring,

Sorry for taking so long to get around to looking at this.

Mostly I really like it. I have a few general comments, which perhaps
can be taken care of in subsequent branches and a few minor little
details.

The general points:

1) I think the view changes make the lab health stuff perhaps a bit
   too obvious. They are mostly views for us, so having a lab health
   table on the default scheduler view is a bit much.

2) The device page should show the job health.

3) This doesn't do anything to help us run the health jobs. I wonder
   if the way we should do this is to implement job resubmission
   (there is a spec about this) and then have a way of resubmitting
   the last known health check for each board ... this is just a
   ramble, nothing in this branch needs to change.

4) I think we need to think a bit about how health and status
   interact. I think maybe it would make sense for a board to run
   health checks only if its health status is HEALTH_UNKNOWN. "Put
   online" when heath is HEALTH_SICK should set the status to
   HEALTH_UNKNOWN. That sort of thing.

I'd like to see some action on 1 & 2 in the branch, 3 & 4 can
definitely wait...

> === modified file 'lava_scheduler_app/models.py'
> --- lava_scheduler_app/models.py 2012-02-15 00:22:12 +0000
> +++ lava_scheduler_app/models.py 2012-02-15 02:14:00 +0000
> @@ -50,6 +50,14 @@
> (OFFLINING, 'Going offline'),
> )
>
> + # A device health shows a device is ready to test or not
> + HEALTH_UNKNOWN, HEALTH_HEALTHY, HEALTH_SICK = range(3)
> + HEALTH_CHOICES = (
> + (HEALTH_UNKNOWN, 'Unknown'),
> + (HEALTH_HEALTHY, 'Healthy'),
> + (HEALTH_SICK, 'Sick'),
> + )
> +
> hostname = models.CharField(
> verbose_name = _(u"Hostname"),
> max_length = 200,
> @@ -60,7 +68,8 @@
> DeviceType, verbose_name=_(u"Device type"))
>
> current_job = models.ForeignKey(
> - "TestJob", blank=True, unique=True, null=True)
> + "TestJob", blank=True, unique=True, null=True,
> + related_name=_(u"Current Job"))

You don't want to set related_name to this. See

https://docs.djangoproject.com/en/dev/ref/models/fields/#django.db.models.ForeignKey.related_name

I think in this case, setting related_name to '+' (which disables the
backwards relation) would be fine.

>
> tags = models.ManyToManyField(Tag, blank=True)
>
> @@ -70,6 +79,16 @@
> verbose_name = _(u"Device status"),
> )
>
> + health_status = models.IntegerField(
> + choices = HEALTH_CHOICES,
> + default = HEALTH_UNKNOWN,
> + verbose_name = _(u"Device Health"),
> + )
> +
> + last_health_report_job = models.ForeignKey(
> + "TestJob", blank=True, unique=True, null=True,
> + related_name=_(u"Health Report Job"))

Again, this is not what related_name is for. I think the backwards
relation can be suppressed here too.

> def __unicode__(self):
> return self.hostname
>
> @@ -77,6 +96,10 @@
> def get_absolute_url(self):
> return ("lava.scheduler.device.detail", [self.pk])
>
> + @models.permalink
> + def get_device_health_url(self):
> + retu...

129. By Spring Zhang

modify to comments

130. By Spring Zhang

Remove detail device health on index.html

131. By Spring Zhang

add health_status in index.html

132. By Spring Zhang

add health status in device page

Revision history for this message
Spring Zhang (qzhang) wrote :

Much appreciate to Michael’s comments, I have done some modifications:
1. remove detail device health on index.html
2. add health status in device page

Yes there is duplicated html code in health_jobs.html and device.html, I'll make a template to reuse common code later, also comment 3&4.

review: Needs Resubmitting
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Looks good, thanks for the changes! Please merge.

review: Approve
133. By Spring Zhang

sync with mainline

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Great work Spring :-)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches