Bake-off Whiteboard

This represents my (Dave Noveck's) attempt to summarize the list of notes scrawled on the whiteboard at the bake-off and the ensuing discussion. This was intended as a list of questions and comments from the assembled developers about things in the spec that are either wrong or not clear. In some cases, there was a consensus about the proper resolution and in other cases there were various views. I have tried my best to accurately represent the discussion but am sure that in many cases I have failed. Let me know of any corrections.

I have used the item numbering from the whiteboard itself. Where it seemed to me that a single item on the board is best represented as multiple items, I have used the original numbers combined with arbitrary letters for each of the separate items.

1) Interaction of Leases, state-id's and seq-id's

There is a need for the spec to clarify the interaction of leases, state-id's and sequence id's, particularly in error situations. Andy's email to the working group and the response to it deal with these issues. There is a need to come to a clear resolution on these so that the spec can be clarified.

There was considerable discussion which helped to clarify the basic approach. There was not enough time to get all the details fleshed out, however.

Andy will provide flow charts that show the desired order of processing, based on our discussion. We'll discuss these over email until we are all happy with them.

1a) Don't increment seq-id on error

Coming out of the previous discussion (for item 1) has been a recognition that the spec is not clear about the situations in which the sequence id for a given lockowner is to be incremented. In some cases, such as when the stateid is bad, the sequence id cannot be incremented. For many errors, you would be able to increment the sequence id, but the spec does not define which is which. Clearly, the client and server must agree on this or they will get out of sync.

After discussion, we decided that the best way out of this would simply be to decide that the sequence id should not be incremented on any error. Since the purpose of the sequence mechanism is to prevent spurious repetition of non-idempotent requests, excluding error cases (which should cause no change of server state) seems safe enough.

Four people have already modified their clients and/or servers to reflect this understanding.

The following example shows that this resolution of the matter is not just a "good idea". Suppose that the server gets an open for a new lockowner that he has no record of after returning some lockowner storage. The normal response to this is to require open-confirm. Now consider what happens if the open has an error. If the server and client were to be obliged to bump the sequence id even in an error case, then open-confirm would be required for a failed open. Even if you were willing to tolerate this weirdness, there is no way in the error response to open to communicate the necessity for an open-confirm. Not bumping sequence on errors saves you from this madness.
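The resolution above (no increment on any error) can be sketched as follows. This is only an illustration of the rule, not text from the spec: the status codes, the `LockOwner` class and the `process` function are hypothetical names, and replay detection and seqid wraparound are left out.

```python
from dataclasses import dataclass

# Hypothetical status codes, for illustration only.
OK = 0
ERR_BAD_SEQID = 1
ERR_ACCESS = 2


@dataclass
class LockOwner:
    """Hypothetical per-lockowner server state."""
    seqid: int  # seqid of the last successfully processed request


def process(owner, request_seqid, operation):
    """Run `operation` and advance the lockowner's seqid only on success."""
    if request_seqid != owner.seqid + 1:
        return ERR_BAD_SEQID
    status = operation()
    if status == OK:
        owner.seqid = request_seqid  # errors leave the seqid untouched
    return status
```

With this rule a client that receives an error simply reuses the same sequence id on its next request for that lockowner, so the two sides cannot drift apart over which errors "count".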

1b) Need for sequence synchronization op

In view of a number of problems in which client and server got their sequence id's out of sync, there was some discussion about whether we need a new op to synchronize sequence id's for a particular lockowner. Without that, there's really no way to recover.

The consensus was that we would see if the clarifications made regarding sequence updating would suffice to make this problem go away. Until we have more experience, we shouldn't pursue additional op's to deal with this problem.

1c) Remembrance of state-id's past

During the discussion of item 1, Dave brought up the issue of a retransmitted CLOSE. To respond correctly, it is necessary to compare the sequence id to the sequence id for the lockowner. This necessitates knowing the lockowner. The problem is that the lockowner can only be found using the stateid passed in the CLOSE. If the CLOSE succeeded, then you would expect the state entry to be deallocated, which means that you would get a BAD_STATEID error and the server will be unable to find the lockowner necessary to treat this as a duplicate.

Dave talked about how this could be dealt with in the server implementation by delaying the deallocation of state entries until it was determined that they would not be needed to resolve duplicates. This description also appeared in a message to the working group. (A link to the archives should be here but this message doesn't seem to be in the archives).

Carl Beame found Dave's solution too complicated and suggested instead that the server be allowed to treat a CLOSE with a bad stateid as a no-op and return OK in this case.

No consensus was reached on this.
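To make the two positions concrete, here is a rough sketch of both in one toy server. Everything in it is an assumption made for illustration: the class, the dictionaries, the status codes and the `lenient_close` switch are hypothetical, normal seqid checking on the first CLOSE is omitted, and no policy is shown for eventually discarding retained entries.

```python
OK, ERR_BAD_STATEID = 0, 1  # hypothetical status codes


class Server:
    def __init__(self, lenient_close=False):
        self.open_states = {}    # stateid -> lockowner name
        self.closed_states = {}  # stateid -> (lockowner name, seqid of the CLOSE)
        self.owner_seqid = {}    # lockowner name -> last seqid processed
        self.lenient_close = lenient_close

    def close(self, stateid, seqid):
        if stateid in self.open_states:
            owner = self.open_states.pop(stateid)
            self.owner_seqid[owner] = seqid
            # Delayed deallocation: keep a record rather than forgetting the entry.
            self.closed_states[stateid] = (owner, seqid)
            return OK
        if stateid in self.closed_states:
            _owner, closed_seqid = self.closed_states[stateid]
            if seqid == closed_seqid:
                return OK  # retransmission of a CLOSE that already succeeded
            return ERR_BAD_STATEID
        # Alternative suggestion: treat CLOSE of an unknown stateid as a no-op.
        return OK if self.lenient_close else ERR_BAD_STATEID
```

The first approach costs memory for the retained entries; the second costs nothing but can no longer tell a duplicate CLOSE from a genuinely bogus one.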

2) Response count for COMPOUND

There was some confusion about the op count in the response for COMPOUND. If the final op has an error, is that included? While the consensus was that the right answer is "Yes", the fact that this has come up suggests that the spec should be made clearer on this issue.
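A small sketch of the agreed counting rule, assuming a server that stops at the first failing op. The function name, the op tuples and the status values are hypothetical; only the rule that the failing op appears in the results comes from the discussion.

```python
OK = 0
ERR_NOENT = 2  # hypothetical nonzero status


def run_compound(ops):
    """Evaluate ops in order, stopping at the first failure.

    Returns (status, results). The results list includes the entry for
    the op that failed, so its length is the response op count.
    """
    results = []
    status = OK
    for name, handler in ops:
        status = handler()
        results.append((name, status))
        if status != OK:
            break
    return status, results
```

For a three-op request whose second op fails, the response therefore carries two results, the last of which holds the error.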

3) Multi-component pathname issues

A number of proposals were made to more accurately report errors that occur in handling multi-component pathnames for OPEN and LOOKUP. The difficulties in recovering from a situation in which one of the directories in a multi-component path name turned out to be a symlink were discussed, with much gnashing of teeth.

The consensus was that these difficulties make use of multi-component pathnames undesirable, with a better alternative being COMPOUNDed LOOKUP's. This makes it desirable to remove multi-component pathnames from OPEN and LOOKUP.
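The attraction of COMPOUNDed LOOKUP's is that the response itself tells the client which component failed. A minimal sketch, with hypothetical helper names and with the other ops of a real request (setting the starting filehandle and so on) left out:

```python
def lookup_ops(path):
    """Split a slash-separated path into one LOOKUP op per component."""
    return [("LOOKUP", name) for name in path.split("/") if name]


def failed_component(ops, results_returned, final_status_ok):
    """Name the component whose LOOKUP failed, or None if all succeeded.

    Relies on the response holding one result per op attempted,
    including the failing op.
    """
    if final_status_ok:
        return None
    return ops[results_returned - 1][1]
```

If the second LOOKUP runs into a symlink, the client sees two results and knows exactly where to resume after resolving the link itself.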

4) Lease and deadlocks

There were some concerns expressed about leases in a situation in which a given lockowner obtained a lock and then hung. If other processes continued unaffected, then the leases associated with the locks held by the lockowner would be periodically renewed, even when, it could be argued, they shouldn't be.

The general consensus was that this behavior was inherent in the lease design and that it was the responsibility of the client operating system to deal with, but perhaps the spec should discuss this issue.

Carl Beame suggested that the spec encourage servers to provide some sort of facilities for administrators to cause locks to be revoked. Currently, the spec does make passing reference to administrative revocation of locks where leases have not expired. Perhaps the text here could be strengthened a bit.

5) CREATE/OPEN/MKDIR errors

Sergei pointed out that when an OPEN-create encounters a directory with the target name, the spec does not provide for an informative error code. NFS4ERR_EXISTS doesn't really tell the client what happened. A separate NFS4ERR_ISDIR would be better. The same applies when a CREATE of a directory encounters a file whose type is not a directory.

Exactly how to address this was discussed. Someone asked about other file types (device files, etc.). The consensus was that this problem should be dealt with but that we had to make sure that all possible cases were addressed without creating an explosion of new error codes.

Unless someone else volunteers, Sergei owns coming up with a precise proposal to deal with this.

6) Server references delegations by FH, not stateid

The one person who implemented delegations (Kendrick) found it inconvenient that the delegation recall is expressed in terms of a filehandle; the delegation stateid would be a more convenient alternative.

Since this only affects callbacks, making this change would not seriously affect existing implementations. The consensus is that it makes sense to do this.

7) TAG, what's it good for?

Absolutely nothing.

Say it. Say it. Say it again.

It's just a page-alignment-breaker.

And a bandwidth-waster.

There was a consensus that the tag is not very helpful and that we should try to remove it from the spec. Until that time, clients should try to send zero-length tags and servers should reproduce the tag from the request in the response.

8a) Issues with locking length of all ones

Dave brought up issues involving a length of all ones in byte-range locking. The spec, after saying that you may lock bytes that have not been written yet, proceeds to specify that a length of all ones is used to lock to the "end of file". The consensus was that the intention here was to refer to the entire range of possible byte positions for a file.

On that basis, Dave proceeded to discuss other issues with the handling of this in the spec.

There was a consensus that this stuff needs to be cleaned up. Dave will come up with a detailed proposal.
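Under the consensus reading (all ones means "through the last possible byte position", not "to the current end of file"), the range covered by a lock request could be computed as in the sketch below. The function name is hypothetical, and rejecting a zero length and an overflowing range are assumptions of this sketch rather than anything decided at the bake-off.

```python
ALL_ONES = 2**64 - 1    # length value with every bit set
MAX_OFFSET = 2**64 - 1  # largest byte position expressible in 64 bits


def lock_range(offset, length):
    """Return the (first, last) byte positions covered, both inclusive."""
    if length == 0:
        raise ValueError("zero-length lock (treated as invalid in this sketch)")
    if length == ALL_ONES:
        return offset, MAX_OFFSET  # everything from offset onward
    last = offset + length - 1
    if last > MAX_OFFSET:
        raise ValueError("range extends past the 64-bit offset space")
    return offset, last
```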

8b) Byte ranges with 32-bit systems

Andy brought up the issue of byte-range locking on a server with a 32-bit file system. What should be done about locks that extend beyond the range of valid 32-bit offsets?

Carl Beame pointed out that some client applications use byte-range locks beyond the 32-bit range for semaphores that are unconnected with the contents of the file locked.

The consensus was that the server should implement locking for 64-bit byte ranges, regardless of the file system's ability to read or write data at locations beyond the 32-bit range. The time to give the client an error is when he attempts to read or write at an unsupported offset. Perhaps the spec could make this clearer.
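As a rough illustration of that split, the toy server below records locks anywhere in the 64-bit range but refuses I/O past what its file system supports. The class, the limit constant and the status codes are all hypothetical, and lock conflict checking is omitted.

```python
FS_MAX_OFFSET = 2**32 - 1  # assumed limit of the underlying file system
OK, ERR_TOO_BIG = 0, 1     # hypothetical status codes


class Server32:
    def __init__(self):
        self.locks = []  # (offset, length) pairs, kept as full 64-bit values

    def lock(self, offset, length):
        # Locks are bookkeeping only, so the file system limit does not apply.
        self.locks.append((offset, length))
        return OK

    def write(self, offset, data):
        # The error is reported here, when data would land at an unsupported offset.
        if offset + len(data) - 1 > FS_MAX_OFFSET:
            return ERR_TOO_BIG
        return OK
```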

9) Rename delegation recall

Brent brought up the issue that the ability of a client holding a delegation to process OPEN without making server requests depends on the ability of the client to assume that the name of the file has not changed.

What is written in the spec regarding this issue has some holes. The spec does state that a RENAME of the file should cause the delegation to be recalled. However, in discussing renaming of directories that lead to the file in question, the spec only addresses servers whose semantics do not allow such renames while the file is open. For such servers, it indicates that a recall should be done.

The case of other (i.e. UNIX-like) servers is not dealt with, which would imply that no delegation recall is required. In fact, a delegation recall seems required, since, as Brent points out, not doing so could cause a client to erroneously allow an open which should fail.

In the discussion of this issue, some mentioned the difficulty of verifying that a directory rename did not affect a delegation below it. Dave pointed out that there is a parallel issue with CIFS implementations. Netapp's CIFS implementation does not allow renames of open files but stops short of doing the analogous checking for directory renames. This has not been a problem in practice.

The consensus is that it is best to leave it up to the server implementor whether to deal with the directory rename issue, but that the spec should make clear that this issue applies both to servers that prohibit rename of open files (e.g. NT) and those that do not (e.g. UNIX).

10) Initial sequence id

There was an issue about what the initial sequence id should be, in the case in which the server sees a lockowner it has never seen before. In the case in which the server is requiring confirmation, it is pretty clear that any value presented by the client is OK, but what about the case in which no confirmation is required?

Some servers were requiring an initial sequence of zero in this case and some clients were presenting an initial value of one, resulting in a conflict.

It seems that there should not be a problem requiring the server to accept any initial sequence value in this case as well, and this is what we settled on, but the spec should make this clear.
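In code, the rule we settled on amounts to seeding the server's record from whatever the client sends first. A minimal sketch, with a hypothetical `SeqTracker` class; wraparound and replay handling are omitted:

```python
class SeqTracker:
    """Hypothetical server-side record of the last seqid per lockowner."""

    def __init__(self):
        self.last = {}  # lockowner -> last seqid accepted

    def accept(self, owner, seqid):
        if owner not in self.last:
            # Never-seen lockowner: any initial value is acceptable.
            self.last[owner] = seqid
            return True
        if seqid == self.last[owner] + 1:
            self.last[owner] = seqid
            return True
        return False
```

A client starting at one and a client starting at zero are then both accepted, which removes the conflict seen at the bake-off.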

11a) Overlapping and other weird byte ranges

There has been previous mail to the working group about this issue. The issue concerns how to deal with some of the very weird locking operations allowed by the POSIX flock semantics.

Should we treat all such operations just as the spec now handles sub-range unlocks? Or should the client be responsible for simulating the flock behavior if he wants it? The spec needs to resolve this.

The consensus was that more discussion on the working-group alias would be necessary before we decide exactly how the spec should resolve this issue. Dave will try to drive this issue to a resolution.

12) SETATTR response attribute mask

There was a question about whether the mask of attributes set returned should be limited to a subset of the attributes that were requested to be set. The consensus was that it should.

As an example of a case which might lead to a different conclusion, consider a SETATTR setting size. The spec says that this should result in the modified time being set. It could be argued that modified time should in this case be set in the mask returned by SETATTR. If it shouldn't (or even if it should), the spec should make this clear.

A similar issue concerns the time_modify_set attribute. If you specify this in SETATTR, should the response attribute set contain the bit for time_modify_set? Or should it contain the bit for time_modify? Both?

The consensus was that the returned attribute mask should be a (not necessarily proper) subset of the mask of attributes that the client requested be set. The spec needs to be clarified on this point.
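The subset rule is easy to state as a bit test. In the sketch below the attribute bit positions are made up for illustration and are not the real attribute numbers; the function name is likewise hypothetical.

```python
# Hypothetical bit positions, not the protocol's actual attribute numbers.
SIZE = 1 << 0
MODE = 1 << 1
TIME_MODIFY = 1 << 2


def is_subset_mask(requested, returned):
    """True if every bit in `returned` is also set in `requested`."""
    return returned & ~requested == 0
```

Under this rule a SETATTR of size alone may not report the modified time in its returned mask, even though the modified time changes as a side effect: `is_subset_mask(SIZE, SIZE | TIME_MODIFY)` is false.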

13) Attributes needed on create

Sergei has problems with the fact that you cannot specify attributes when doing a CREATE, in the case of creating a directory, for example. The option of doing a SETATTR after the create may not be available in some cases, as when you want to set the owner, since servers will often not let you change the owner unless you are the owner or root.

The consensus was that CREATE needs to be enhanced to allow attributes to be specified for all non-regular file types.

14) OPEN upgrade and share conflicts

The spec isn't clear about what share checks are done in the open upgrade case. The spec has a paragraph that discusses sharing checks and another that discusses the or-ing of access and deny bits, without indicating which has priority.

The issue boils down to this: If an open upgrade would be denied by normal sharing rules, is it also denied, even though this is for the same lockowner? The spec needs to clarify this.

Carl Beame stated that NT semantics require that the normal sharing checks be done first and that the V4 spec should do the same. Nobody offered any other opinions on this.
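One reading of "sharing checks first" is sketched below: the requested bits are checked against every existing open on the file, including the lockowner's own, and only then or-ed into the lockowner's state. This is an interpretation offered for discussion, not something the spec or the group confirmed; the constants and function names are hypothetical.

```python
READ, WRITE = 1, 2  # hypothetical access/deny bit values


def conflicts(access, deny, other_access, other_deny):
    """Normal share check between a request and one existing open."""
    return bool(access & other_deny) or bool(deny & other_access)


def upgrade(current, requested, others):
    """Apply an open upgrade, or return None if the share check fails.

    `current` and `requested` are (access, deny) pairs for the lockowner;
    `others` lists (access, deny) pairs held by other lockowners.
    """
    for held in others + [current]:
        if conflicts(requested[0], requested[1], held[0], held[1]):
            return None
    # Only after the checks pass are the bits or-ed together.
    return (current[0] | requested[0], current[1] | requested[1])
```

So a lockowner holding read access with deny-write would be refused when it asks to add write access, just as a different lockowner would be.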

15) NT server issues with OPEN upgrade

The Hummingbird people have come up with an issue regarding the open upgrade-downgrade model that is currently in the spec. There is no easy method whereby an NT server might implement a downgrade request in general. NT does not provide the requisite semantics.

It might implement this by handling the open upgrade by doing a second open and then closing one of its opens when a downgrade is done. There are some problems with this. As the spec stands, the client is not constrained in precisely what downgrades it would do, making it possible for it to make requests that cannot be handled by the server closing one of its opens. Even if this could be resolved, there remains an issue with byte-range locks. The NT server has no way of knowing which of his opens to assign the locks to. He might assign them to an open which needs to be closed to satisfy a subsequent downgrade.

Carl Beame asks why upgrade-downgrade is done that way, instead of allowing multiple opens each of which results in a separate stateid that might be independently closed. Nobody had an answer. The archives need to be checked to see what the reason was.

There was considerable discussion of this issue. The consensus was that it needs to be resolved somehow, but nobody was ready to decide on anything yet. Andy expressed worry that allowing multiple OPEN's for a given file-lockowner pair would increase server memory requirements. More discussion of this issue on the working group alias is required. Dave will comb the archives and summarize the history of our current solution to upgrade-downgrade and what problems have been cited about other proposed solutions.

16) Stateid returned on CLOSE. Why?

There seems to be no valid use for the stateid returned on CLOSE, since the file is no longer opened by that lockowner. What is it there for?

In the discussion it was noted that the resolution for item 16 might give this value a new lease on life. Nobody could find a use for it in the current protocol, however.

17) Only one outstanding seqid-containing request at a time

One thing that makes the spec harder to understand than it needs to be concerns the handling of seqid's. It is fundamental to the design that the client should have only one outstanding request containing a sequence-id for each lockowner at a given time. This needs to be stated explicitly, which would make the rest of the discussion of this aspect of the protocol *much* easier to understand.
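On the client side this rule is naturally enforced with one mutex per lockowner. The sketch below is an assumption about how a client might do it, not a description of any implementation at the bake-off; `SeqidClient`, `call` and the `send` callback are hypothetical, as is starting the count at one.

```python
import threading
from collections import defaultdict


class SeqidClient:
    """Hypothetical client helper serializing seqid-bearing requests."""

    def __init__(self):
        self._guard = threading.Lock()            # protects the lock table
        self._locks = defaultdict(threading.Lock)  # one lock per lockowner
        self._seqid = defaultdict(int)             # last seqid that succeeded

    def call(self, owner, send):
        """Send one request for `owner`; `send(seqid)` returns True on success."""
        with self._guard:
            lock = self._locks[owner]
        with lock:  # at most one outstanding request per lockowner
            seqid = self._seqid[owner] + 1
            ok = send(seqid)
            if ok:
                self._seqid[owner] = seqid  # not advanced on error
            return ok
```

Requests for different lockowners still proceed in parallel; only requests sharing a lockowner wait for each other.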

18) Unlimited-length opaques. Just say "No"?

Andy points out that there are a number of unlimited-length opaques in the protocol (e.g. client-string, lockowner) and that each such opaque adds to the server's vulnerability to denial-of-service attacks. The possibility of 32-kbyte lockowners appearing makes coding of the server more complicated and will hurt performance in the normal case, unless great efforts are made to optimize server performance.

Scrubbing the .x file for unlimited-length opaques and providing liberal bounds for each seems like a good idea. Although it is not impossible that future developments will require us to breach limits that seem quite liberal right now, we have the minor versioning mechanism to deal with this possibility. Making things unlimited length has a cost in complexity, performance, and stability that we probably don't want to pay.
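With a bound in the .x file, a decoder can reject an oversized opaque as soon as it has read the length word, before allocating or copying anything. A minimal sketch of that check for an XDR variable-length opaque (4-byte big-endian length followed by the data); the function name and the 1024-byte limit are placeholders, not proposed values.

```python
import struct

MAX_LOCKOWNER_LEN = 1024  # placeholder bound, not a proposed value


def decode_opaque(buf, limit):
    """Decode one XDR variable-length opaque from the start of `buf`."""
    if len(buf) < 4:
        raise ValueError("truncated length word")
    (length,) = struct.unpack_from(">I", buf, 0)
    if length > limit:
        raise ValueError("opaque exceeds the declared bound")
    if len(buf) < 4 + length:
        raise ValueError("truncated opaque body")
    return bytes(buf[4:4 + length])
```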

Andy has graciously volunteered to go through the spec and come up with a proposal on the limits that need to be added.

19) OPEN and CREATE returning bitmap of attributes

Sergei brought up the issue of failure to set attributes on an OPEN (or CREATE if attributes are added to that). What is the correct handling if some attribute cannot be set? Sergei suggests that in this case the OPEN/CREATE should succeed but that a bit mask should be returned so that the client can determine what attributes were successfully set.

This issue was discussed without coming to any firm conclusions. Sergei will make a proposal to the working group setting forth the reasons for the change.

20) Get a seqid using OPEN

Bill Ricker remarked that the spec would be clearer if it were prominently pointed out that the only way to get an initial sequence id is through OPEN. Also, the spec refers to specifying a lockowner when doing a lock, which is no longer true since this was changed to a stateid. Generally, it would be helpful if there were some rework of the spec in this area to be consistent with the protocol as it has evolved.

Everybody agreed with this.