Code review comment for lp://staging/~vila/bzr/525571-lock-bazaar-conf-files

Revision history for this message
Vincent Ladeuil (vila) wrote :

>>>>> Robert Collins <email address hidden> writes:

    > Review: Needs Fixing
    > Vila, I think I need to expand on this:

    >>> Perhaps that would be best tracked - when doing a read, if the object is not
    >>> locked, set self._can_write = False, and if it is, set it to True. Then check
    >>> that as well as whether it is locked, in _write_config.

    >> _can_write == is_locked() I see the idea but it won't change the end result nor the feedback we can give.

    > They are quite different.

    > If when you *read* you are unlocked, and you permit a *write*
    > without forcing a new read,

Which this proposal guards against.

    > changes from other processes are discarded.

Yes, that's a delicate problem. This proposal isn't perfect but good
enough to cover most of our use cases.

    > Here:
    > conf.read_something()
    > --client 2 writes to the file
    > conf.set_something()
    > --*boom* client 2's changes are lost

Yup, I added a test for that and saw it fail.

    > What we need is two fold: detection of this case (which setting
    > 'can write' during *read* if it is write locked at the time will
    > permit),

Right, but what will we gain here ? The ability to *not* force the
refreshing read if someone already did it while write locked ?

Yet, it goes in the same direction as implementing a no-op read lock,
which we may want to add too, so, worth considering.

    > and easy api use (which may be less immediate depending on
    > impact).

    > Alternatively lock_write could possibly trigger a read after
    > locking before returning, which would cause properly lock_write()
    > guarded mutators to work correctly - as they would apply the
    > mutation within the locked context.

Inside the lock scope, a write should always happen *after* a refreshing
read or you'll lose changes for variables you're not changing (the
current API change one variable at a time).

Of course you always lose any change to the variable you're changing and
that in itself could be a problem if processes try to use config
variables for synchronization (a really bad idea), like an incremented
counter.

    > In short - concurrency is hard, lets drink beer.

+1

There are various lock schemes with different scopes that can be tried,
but so far, I haven't found a good way to address:

- client1 read 'var'
- client1 starts a "long" process based on 'var'
- client2 set a new value for 'var'
- client1 finished its process with a now outdated value for 'var'

Short of blocking client2 for the duration of client1 process, I can't
see how to address this.

But do we care ?

It's hard to decide without concrete examples, if 'var' is a parent
branch location and the process is pull, chances are the user really
want its pull to finish and changing 'var' to point to a new branch
should just be considered the next time we try to pull (alternate
scenarios includes: abort the pull (and rollback the already fetched
revisions (eeerk)), block the write (i.e. block the whole config file
for the duration of client1 process (omg are you crazy ?))).

Overall, I agree that we should continue thinking about which lock model
we want here, the proposed one is incrementally better and at least
address the bug better than the minimal one landed for 2.1.

Orthogonal to the lock scope, sharing a config files at the
wt/branch/repo level could at least avoid reading the config file for
each variable which could make read locks less painful, but still, I
don't think it would be enough.

« Back to merge proposal