From: Noveck, Dave (dave.noveck@netapp.com)
Date: 01/11/00-10:28:45 AM Z
Message-ID: <4080CE03B682D311B589009027C2286638D422@tahoe.corp.netapp.com>
Subject: RE: "indivisible lock" semantics
Date: Tue, 11 Jan 2000 08:28:45 -0800
> Spencer and I exchanged some mail last week about the fact that v4
> locks are indivisible (i.e., they follow Windows semantics, rather
> than Unix semantics). The rest of this message contains excerpts from
> two messages that I sent to Spencer.
>
> I'm not proposing a change to the protocol, but I would like to put
> this on the table for discussion, to see if anyone has further
> comments that might impact the protocol or spec. The reason I'm not
> proposing a protocol change is that I think it's rare for Unix
> applications to take advantage of "divisible lock" semantics. Given
> the date, I think it's more important to move forward with what we
> have.
I agree that this is not something that should cause us to delay the
protocol. I'm not even sure that we would need to accommodate this
if we had lots of spare time.
I'm curious about the adjective "rare" as opposed to "non-existent".
I'm not saying that such programs don't exist, but I've never
seen one. Has anybody else?
>
> Excerpt #1:
>
> >>>>> "Mike" == Mike Kupfer <Mike.Kupfer@eng.sun.com> writes:
>
> Mike> Page 58, Section 8.2.3: for a Unix client talking to a Win32
> Mike> server, some entity, either the client or the server, has to
> Mike> map a Posix locking request to (potentially) a series of
> Mike> Win32 locking requests. The series of Win32 requests should
> Mike> be atomic. For example, it is a legal Posix request to
> Mike> unlock a hole out of the middle of an existing lock. The
> Mike> Win32 API expresses this as removing the old lock and
> Mike> creating two new ones. The atomicity requirement is that
> Mike> nobody else create a conflicting lock between the time the
> Mike> first lock is removed and the two new locks are created.
>
> Mike> The I-D places this responsibility on the client, asserting
> Mike> that it is easier for the (Unix) client to do this
> Mike> conversion than it is for the (Win32) server. I think this
> Mike> is false, particularly in light of the atomicity issue.
>
> Mike> I don't see any simple ways to deal with the atomicity
> Mike> issue. Fortunately, it's rare for a Unix application to
> Mike> change subranges of an existing lock, so perhaps the
> Mike> protocol need not provide an atomicity guarantee. If a
> Mike> conflicting lock is created in the middle of a series of
> Mike> "Win32" operations, the client could simply treat the lock
> Mike> as being lost and notify the application.
It wouldn't necessarily know. It could be that another client got
a conflicting lock and gave it up before you got around to getting
the second locked region. The result would be that you wouldn't
have the atomicity guarantee and you wouldn't know about the conflicting
lock.
As you say, the assertion that this allows you to duplicate UNIX
semantics is false. I think the spec should say that the protocol
allows you to provide a useful subset ("the useful subset" would be
more accurate, as far as I can see) of UNIX semantics but that
atomically unlocking a subrange is not possible.
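
To make the missed-conflict case concrete, here is a minimal sketch in Python. It is a toy model, not NFS version 4 or Win32 code: the class IndivisibleLockTable, the helper punch_hole, and the byte offsets are all hypothetical names and values chosen for illustration. The table only releases a lock when the exact range is given, so unlocking a hole has to be emulated as unlock-everything followed by two new locks.

```python
class IndivisibleLockTable:
    """Toy lock server: a lock can only be released whole, never split."""

    def __init__(self):
        # (owner, start, end) -> "read" or "write"; end is exclusive
        self.locks = {}

    def _conflicts(self, owner, start, end, mode):
        for (o, s, e), m in self.locks.items():
            overlaps = s < end and start < e
            if o != owner and overlaps and "write" in (mode, m):
                return True
        return False

    def lock(self, owner, start, end, mode):
        if self._conflicts(owner, start, end, mode):
            return False
        self.locks[(owner, start, end)] = mode
        return True

    def unlock(self, owner, start, end):
        # Only an exact (owner, start, end) match can be released.
        return self.locks.pop((owner, start, end), None) is not None


def punch_hole(table, owner, held, hole, mode, intruder=None):
    """Emulate 'unlock a hole' as unlock-all followed by two new locks.

    `intruder`, if given, is called inside the window between the steps.
    Returns True if both replacement locks were granted.
    """
    (start, end), (h_start, h_end) = held, hole
    table.unlock(owner, start, end)
    if intruder is not None:
        intruder(table)  # another client runs inside the window
    left = table.lock(owner, start, h_start, mode)
    right = table.lock(owner, h_end, end, mode)
    return left and right


table = IndivisibleLockTable()
table.lock("A", 0, 100, "write")
direct = table.unlock("A", 40, 60)  # subrange unlock is refused

seen = []

def intruder(t):
    seen.append(t.lock("B", 10, 20, "write"))  # B gets a conflicting lock...
    t.unlock("B", 10, 20)                      # ...and drops it again

ok = punch_hole(table, "A", (0, 100), (40, 60), "write", intruder)
print(direct, seen, ok)
```

In this run both of A's replacement locks are granted even though B held a conflicting write lock inside the window, so A has no indication that its exclusion was broken.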
>
> Mike> So I guess I'm asking that the last paragraph of Section
> Mike> 8.2.3 be rewritten to clarify this issue. Let me know if
> Mike> you want help with specific verbiage.
>
> Excerpt #2:
>
> >>>>> "Mike" == Mike Kupfer <Mike.Kupfer@eng.sun.com> writes:
>
> Mike> Hmm, the more I look at the indivisible locks semantics, the
> Mike> less I like them. Suppose you have two processes, A and B,
> Mike> each of which has a read lock on a byte range. Now suppose
> Mike> process A submits a blocking request (F_SETLKW) to upgrade
> Mike> half the region to a write lock. You can do this with the
> Mike> current NLM protocol. Process A simply requests the upgrade
> Mike> and goes to sleep until the server grants it. With
> Mike> indivisible locks, process A goes to sleep and wakes up
> Mike> every now and then to see if there are conflicting locks for
> Mike> the range. Once there are no more conflicts, process A can
> Mike> release its lock and obtain the 2 new locks (one for the new
> Mike> write-lock range, one for the remaining read-lock range).
> Mike> Yuck.
I want to be clear about the details here. Let's suppose that
region Z consists of two abutting sub-regions: X and Y.
In your example, process A has a read lock on Z and is attempting
to upgrade its lock on the sub-region X to a write lock.
The overall transition [read(Z)] -> [write(X)+read(Y)] is *not*
atomic in the UNIX locking implementations that I've seen. If
there were two processes trying to do this, they would either
have to wait forever or get EDEADLK and I don't think that
happens.
In the code I've seen, there are two separate atomic state transitions,
one to unlock X and then one to get the write lock on X. Conflicting
locks on the region X are not excluded between the two state transitions.
Region Y does stay locked throughout and no conflicting locks can happen
in this region.
The following are each atomic but the sequence is not atomic (it's
molecular):
1) [read(Z)] -> [read(Y)]
2) [read(Y)] -> [write(X)+read(Y)]
My understanding is based on the flock implementation in Irix.
I did a rewrite for performance and scalability and so I looked
pretty closely at making sure that I was duplicating the existing
semantics, although this was over a year ago and I may have forgotten
some part of this. I didn't do an exhaustive comparison but this
aspect of the locking implementation seemed to come over pretty
directly from BSD.
Does anybody have a different understanding of what UNIX semantics
are in this regard?
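
The two-step upgrade described above can be sketched as follows. This is an illustration only, not the Irix or BSD code: it models one owner's lock list and does no conflict checking against other owners. The function names unlock and lock, and the offsets chosen for X, Y and Z, are assumptions made for the example.

```python
def unlock(held, start, end):
    """Remove [start, end) from one owner's locks, splitting as needed.

    `held` is a list of (start, end, mode) tuples with exclusive ends.
    """
    out = []
    for s, e, m in held:
        if e <= start or end <= s:
            out.append((s, e, m))      # no overlap: keep as is
            continue
        if s < start:
            out.append((s, start, m))  # piece left of the unlocked range
        if end < e:
            out.append((end, e, m))    # piece right of the unlocked range
    return sorted(out)


def lock(held, start, end, mode):
    """Set [start, end) to `mode`, replacing whatever the owner held there."""
    return sorted(unlock(held, start, end) + [(start, end, mode)])


# Z = [0, 100) is made of X = [0, 50) and Y = [50, 100).
state0 = [(0, 100, "read")]           # read(Z)
state1 = unlock(state0, 0, 50)        # transition 1: unlock X
state2 = lock(state1, 0, 50, "write") # transition 2: write-lock X
print(state0, state1, state2)
```

Y appears in every state, which matches the observation that region Y stays locked throughout, while X is unlocked in the intermediate state and so is open to a conflicting lock from another process.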
The question of upgrade poses some issues even apart from the
divisibility issue. Suppose you simplify your example to a single region
and say that two processes, A and B, are each trying to upgrade a read
lock on the region X to a write lock. With the spec as it is, what
happens? A and B both keep sending upgrade requests, which are denied.
They continue to poll, maybe more slowly after a while, but they
don't make any progress.
There is no way that the server can ever grant A's request for
upgrade without revoking B's lock. The conclusion that I draw
is that upgrade should be done by the client using an unlock
followed by getting a write lock. This is not atomic but I
don't think UNIX has atomicity in this regard now (don't know
about Windows). There is also a performance issue in that
this turns an upgrade into a two-RPC operation, but maybe upgrade
is not common enough for that to be a worry.
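
A small simulation of the two approaches, under stated assumptions: a single region, a server that never revokes locks, and hypothetical helpers can_grant, poll_upgrades and unlock_then_lock that stand in for the server check and the client behaviour. It is a sketch of the argument, not of any real implementation.

```python
def can_grant(locks, owner, mode):
    """locks maps owner -> mode held on the single region X."""
    others = [m for o, m in locks.items() if o != owner]
    if mode == "write":
        return not others           # a write lock needs everyone else gone
    return "write" not in others    # a read lock only conflicts with writers


def poll_upgrades(locks, rounds):
    """Each reader keeps its read lock and polls for a read->write upgrade."""
    granted = []
    for _ in range(rounds):
        for owner in list(locks):
            if locks[owner] == "read" and can_grant(locks, owner, "write"):
                locks[owner] = "write"
                granted.append(owner)
    return granted


def unlock_then_lock(locks, order):
    """Each owner first unlocks (one RPC), then requests a write lock (second RPC).

    This models one particular interleaving: all unlocks, then all lock requests.
    """
    for owner in order:
        locks.pop(owner, None)
    results = {}
    for owner in order:
        ok = can_grant(locks, owner, "write")
        if ok:
            locks[owner] = "write"
        results[owner] = ok
    return results


stuck = poll_upgrades({"A": "read", "B": "read"}, rounds=1000)
progress = unlock_then_lock({"A": "read", "B": "read"}, ["A", "B"])
print(stuck, progress)
```

With polling, neither upgrade is ever granted because each reader's own lock blocks the other. With unlock followed by a write-lock request, one process gets the write lock and the other is denied and can retry later, at the cost of the non-atomic window.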
>
> Mike> Of course, a Win32 server would have to do something similar
> Mike> if the protocol supported Posix-style locking. The
> Mike> difference is that with indivisible locks, the client has to
> Mike> poll the server. (Or I suppose it could submit a dummy
> Mike> write lock request and wait for the server to grant it.)
> Mike> With Posix-style locking, the server only has to poll the
> Mike> local locking interface provided by the OS.
>
> Mike> I guess the good news is that this can be considered unusual
> Mike> behavior, and that most applications will not split up
> Mike> existing locks like that.
>
> Mike> Anyway, here's a counter-proposal for the first replacement
> Mike> paragraph that you suggested:
>
> Mike> "The byte range of a lock is indivisible. A range may be
> Mike> locked or unlocked between read and write but may not have
locked or unlocked or changed between read and write [but I am still
unsure about supporting that]
> Mike> subranges unlocked or changed between read and write. These
> Mike> are the semantics provided by the Win32 environment but only
> Mike> a subset of the semantics provided by the Unix environment.
> Mike> This means that Unix clients may need to emulate a single
> Mike> locking request with multiple NFS calls. For example, if a
> Mike> subrange is unlocked, the client will need to unlock the
> Mike> entire range and lock the new regions. This creates a
> Mike> window during which a conflicting lock can be taken by
> Mike> another client. If this happens, the server will reject the
> Mike> client's attempt to create the new regions, and the client
> Mike> will notify the application that the lock has been lost."
I would replace the last sentence with the following:
Thus, atomically unlocking a subrange (as provided locally by
UNIX systems) may not be possible under NFS version 4.