Code review comment for lp://staging/~wallyworld/python-jujuclient/retry-on-upgrade

Ian Booth (wallyworld) wrote :

> I don't think the proposed change will fix the bug.
>
> The error in the linked bug occurred when deployer tried to look at the status
> (called .get_stat()), which made a new watcher and tried to start it.
> That's when the error that Juju was upgrading occurred.
>
> This branch would make that RPC call to start a new watcher get retried,
> whilst Juju is upgrading. However, once the upgrade is complete, IIUC one
> would need to re-authenticate to Juju (and probably renegotiate facade
> versioning - we've just got a new version of Juju!).
>
> In other words, the retry is at the wrong level.

There appears to be a little misunderstanding of what this fix does. It does not handle the case where Juju disconnects clients when the agents reboot after an agent upgrade.

What the fix does is this: when the Juju agent starts up, the API is in a limited "upgrading" mode while Juju a) performs any version upgrade steps, and b) checks whether any agent upgrades are required. Until both of these complete, any attempt to do much more than get status results in an "upgrade in progress" error. There are no agent restarts during any of this, so the deployer will not become disconnected. It will, however, get confused in the short window where an "upgrade in progress" error is reported, so this MP teaches it how to deal with that: it simply retries until the full API becomes available, which is what happens 99.9999% of the time, when no agent upgrades are necessary.
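
To make the intent concrete, here is a minimal sketch of that retry behaviour. It is not the code in this branch: the names UpgradeInProgress and call_with_retry, and the retry count and delay, are hypothetical and chosen only for illustration. It assumes the client can tell an "upgrade in progress" error apart from other failures.

```python
import time


class UpgradeInProgress(Exception):
    """Hypothetical stand-in for Juju's "upgrade in progress" error."""


def call_with_retry(rpc, retries=10, delay=1.0, sleep=time.sleep):
    """Call rpc() and retry while the API is still in "upgrading" mode.

    rpc is any zero-argument callable that performs one API request.
    Only UpgradeInProgress triggers a retry; any other exception
    propagates immediately. The last UpgradeInProgress is re-raised
    once the retry budget is used up.
    """
    for attempt in range(retries):
        try:
            return rpc()
        except UpgradeInProgress:
            if attempt == retries - 1:
                raise
            # Wait for the upgrade steps and checks to complete.
            sleep(delay)
```

The connection is left untouched between attempts, which matches the point above: there are no agent restarts in this window, so nothing needs to be re-authenticated.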
