juju-core

Code review comment for lp://staging/~themue/juju-core/055-apiserver-heartbeat-analyzing

055-apiserver-heartbeat-analyzing
Merge into trunk

Revision history for this message

Roger Peppe (rogpeppe) wrote on 2013-11-12:

This is going in a reasonable direction, but I have a few comments to
make. It's a bit of a pity that this adds yet another goroutine per
connection, but I think that can be sorted out at some other time (we
could probably have one goroutine that deals with all the pinging). The
other thing that this CL does not do is actually close the client
connection. That can probably be left for a followup though.

https://codereview.appspot.com/24040044/diff/1/state/apiserver/root.go
File state/apiserver/root.go (right):

https://codereview.appspot.com/24040044/diff/1/state/apiserver/root.go#newcode251
state/apiserver/root.go:251: pinger, err := newSrvPinger(r,
3*time.Minute)
This seems weird to me. For a start, it's racy - technically a client is
allowed to call Ping twice concurrently, and if it does that, we'll have
two srvPingers active. Secondly, it seems odd that if a client never
bothers to ping, they will never get one of these allocated, so we'll
never get the timeout behaviour. Thirdly, just deleting all the
resources without killing the connection seems unlikely to be worth it.

I think you need to set this up inside apiRootForEntity.
Additionally, I don't think you should return the srvPinger directly
from this method - that exposes its Stop method to the API, which you
probably don't want to do. I make a suggestion below.

One other thing: I think the actual value of the duration should perhaps
be tied to the value used by the client (perhaps the client should use
the value divided by 10 or something)

https://codereview.appspot.com/24040044/diff/1/state/apiserver/root.go#newcode312
state/apiserver/root.go:312: reset chan interface{}
s/interface{}/struct{}/

https://codereview.appspot.com/24040044/diff/1/state/apiserver/root.go#newcode315
state/apiserver/root.go:315: // newSrvPinger creates a new pinger with a
running killer.
This type isn't really a pinger - it doesn't do any pinging.

How about something like this?

type resourceTimeout struct {
    tomb tomb.Tomb
    timer *time.Timer
    reset chan struct{}
    resource killer
}

// newResourceTimeout will call resource.Kill if
// Refresh is not repeatedly called on the returned resourceTimeout
// within the given timeout.
func newResourceTimeout(resource killer, timeout time.Duration)
*resourceTimeout

https://codereview.appspot.com/24040044/diff/1/state/apiserver/root.go#newcode332
state/apiserver/root.go:332: p.reset <- struct{}{}
This needs to make sure that the pinger isn't being stopped
by selecting on the tomb's dying chan too.

https://codereview.appspot.com/24040044/diff/1/state/apiserver/root.go#newcode337
state/apiserver/root.go:337: // p.tomb.Kill(nil)
Why is this commented out?

https://codereview.appspot.com/24040044/diff/1/state/apiserver/root.go#newcode345
state/apiserver/root.go:345: defer close(p.reset)
Why close the reset channel?

https://codereview.appspot.com/24040044/diff/1/state/apiserver/root.go#newcode350
state/apiserver/root.go:350: case <-time.After(p.timeout):
I think you should use NewTimer here (and Stop it after the select) to
avoid creating a bit of garbage each time a value is received on reset.
Even better, just use a single Timer instance inside p and call Reset on
that.

https://codereview.appspot.com/24040044/diff/1/state/apiserver/root_test.go
File state/apiserver/root_test.go (right):

https://codereview.appspot.com/24040044/diff/1/state/apiserver/root_test.go#newcode65
state/apiserver/root_test.go:65: time.Sleep(10 * time.Millisecond)
This test will trigger a race detector error - there's no
synchronisation between the Kill call and the read of killed.

Better to make Kill send the time on a channel.

https://codereview.appspot.com/24040044/