Ongoing design work has exposed a number of weaknesses in the discussion of migration within RFC 3530. While there does not appear to be any necessity to change any message formats or add operations, a number of migration-related issues should be addressed when the protocol is updated for v4.1. The purpose of this note is to clearly lay out what needs to be done, so that any possible updates can be discussed as part of the process of formulating a spec for v4.1. Some of the items discussed below might also be appropriate in the context of an update of the v4.0 spec in connection with going to Draft Standard status.
While the spec allows the server to return attributes in addition to fs_locations when GETATTR is used with a current filehandle within an absent filesystem, not much guidance is given about what is appropriate. In particular, there are a number of attributes which most server implementations should find relatively easy to supply and which would be of value to clients, particularly in those cases in which NFS4ERR_MOVED is returned when first crossing into an absent file system that the client has not previously referenced.
The spec should encourage servers to provide the following attributes where possible:
The fsid attribute allows clients to recognize when fs boundaries have been crossed. This applies also when one crosses into an absent filesystem. While returning fsid is not absolutely required, since fs boundaries are also reflected, in this case, by means of the fs_root field of the fs_locations attribute, returning fsid is helpful and servers should have no difficulty in providing it.
To avoid confusion, the spec should note that the fsid provided in this case is solely so that the fs boundaries can be properly noted and that the fsid returned will not necessarily be valid after resolution of the migration event. The logic of fsid handling for v4.0 is that fsid's are only unique within a per-server context. This would seem to be a strong indication that they need not be persistent when file systems are moved from server to server, although RFC 3530 does not specifically address the matter.
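As an illustration of the boundary check described above, the following is a minimal Python sketch of how a client might compare fsid values. The Fsid type and the crosses_fs_boundary helper are hypothetical names introduced here for illustration; only the major/minor structure of fsid is taken from the protocol.

```python
from typing import NamedTuple


class Fsid(NamedTuple):
    """NFSv4 fsid: a (major, minor) pair, unique only within one server."""
    major: int
    minor: int


def crosses_fs_boundary(parent: Fsid, child: Fsid) -> bool:
    """Return True when moving from parent to child leaves the parent's fs."""
    return parent != child
```

Under this reading, a client would use the fsid reported for an absent fs only for this comparison, and would fetch a fresh fsid from the target server once the migration or referral has been resolved.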
The mounted_on_fileid attribute is of particular importance to many clients, in that they need this information to form a proper response to a readdir() call. When a readdir() call is done within UNIX, the d_ino field of each of the entries needs to have a unique value, normally derived from the NFSv4 fileid attribute. It is when a file system boundary is crossed, particularly when crossing into an absent fs, that use of the fileid attribute for this purpose poses problems. Note first that the fileid attribute, since it is within a new fs and thus a new fileid space, is not guaranteed to be unique within the directory. Also, since the fs, at its new location, may arrange things differently, the fileid decided on at the directing server may be overridden at the target server, making it of little value. Neither of these problems arises in the case of mounted_on_fileid, since that fileid is in the context of the mounted-on fs and unique within it.
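The choice of d_ino described above might be sketched as follows. This is an illustrative Python fragment, not client code from any implementation: the EntryAttrs type and the choose_d_ino function are hypothetical, and the sketch assumes the client detects an fs root by comparing the entry's fsid with that of the containing directory.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EntryAttrs:
    """Attributes obtained for one directory entry (hypothetical type)."""
    fsid: Optional[Tuple[int, int]]    # (major, minor), if the server returned it
    fileid: Optional[int]              # may be missing for an absent fs
    mounted_on_fileid: Optional[int]   # fileid of the mounted-on object


def choose_d_ino(dir_fsid: Tuple[int, int], entry: EntryAttrs) -> Optional[int]:
    """Pick the value to report as d_ino for an entry of a directory."""
    is_fs_root = entry.fsid is not None and entry.fsid != dir_fsid
    if is_fs_root:
        # The entry's fileid lives in another fileid space, so fall back to
        # mounted_on_fileid, which is unique within the directory's own fs.
        return entry.mounted_on_fileid
    return entry.fileid
```

For an entry that is the root of an absent fs, the sketch never consults fileid, so it works even when the referring server declines to return that attribute.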
There are a number of attributes which pose difficulties when returned for an absent filesystem. While not prohibiting the server from returning these, the spec should explain the issues which may result in problems, since these are not always obvious.
For reasons explained above under mounted_on_fileid, it would be difficult for the referring server to provide a fileid value that is of any use to the client. Given this, it seems much better for the server never to return fileid values for files on an absent fs.
Returning file handles for files in the absent fs, whether by use of GETFH (discussed below) or by using the filehandle attribute with GETATTR or READDIR, poses problems for the client, as the server to which it is referred is likely not to assign the same filehandle value to the object in question. Even though it is possible that volatile filehandles may allow a change, the referring server should not prejudge the issue of filehandle volatility for the server which actually has the fs. By not providing the file handle, the referring server allows the target server freedom to choose the file handle value without constraint.
There are a number of cases in which the spec is either unclear or simply incorrect about the situations in which NFS4ERR_MOVED is to be returned. Discussion of these issues has exposed the following problems, which should be addressed to provide greater clarity and correctness:
In providing the definition of NFS4ERR_MOVED the spec refers to the "filesystem which contains the current filehandle object" being moved to another server. This has led to some confusion when considering the case of operations which change the current filehandle and potentially the current file system. For example, a LOOKUP which causes a transition to an absent file system might be supposed to result in this error. The spec should be clarified to make it explicit that only the current filehandle at the start of the operation can result in NFS4ERR_MOVED.
While the spec does not make any exception for GETFH when the current filehandle is within an absent filesystem, the fact that GETFH is such a passive, purely interrogative operation may lead readers to wrongly suppose that an NFS4ERR_MOVED error will not arise in this situation. The spec should explicitly state that GETFH will return this error if the current filehandle is within an absent filesystem.
The spec states (in section 6.2), "The NFS4ERR_MOVED error is returned for all operations except PUTFH and GETATTR." Despite this, the spec lists NFS4ERR_MOVED as an error that can be returned by PUTFH. The spec should be updated to delete this as a possible error for PUTFH.
While, as noted above, the spec indicates that NFS4ERR_MOVED is not returned for a GETATTR operation, NFS4ERR_MOVED is listed as an error that can be returned by GETATTR. It seems reasonable to allow NFS4ERR_MOVED to be returned by GETATTR's that do not interrogate the fs_locations attribute, while maintaining the exception which allows GETATTR to be used to get fs_locations information. This can be done by establishing the rule that GETATTR's which interrogate fs_locations (with or without additional attributes) will not return NFS4ERR_MOVED.
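The clarifications proposed above for NFS4ERR_MOVED can be summarized in a small decision function. This is a sketch of the proposed rules only, not of RFC 3530 as written; the function name migration_error and its parameters are hypothetical, and operations are identified by name for readability.

```python
from typing import FrozenSet, Optional

NFS4ERR_MOVED = "NFS4ERR_MOVED"


def migration_error(op: str,
                    cfh_in_absent_fs: bool,
                    requested_attrs: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Return NFS4ERR_MOVED if the proposed rules call for it, else None.

    cfh_in_absent_fs refers to the current filehandle at the START of the
    operation, so a LOOKUP that crosses into an absent fs is not affected.
    """
    if not cfh_in_absent_fs:
        return None
    if op == "PUTFH":
        # PUTFH is exempt, which lets the client establish the filehandle.
        return None
    if op == "GETATTR" and "fs_locations" in requested_attrs:
        # GETATTR that interrogates fs_locations is exempt, with or without
        # additional attributes in the request.
        return None
    return NFS4ERR_MOVED
```

With these rules, GETFH on a filehandle within an absent fs yields NFS4ERR_MOVED, while a COMPOUND of PUTFH followed by a GETATTR that includes fs_locations can still be used to find the new location.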
Migration or referral events naturally create situations in which not all of the attributes normally supported on a server are obtainable. RFC 3530 is in places ambivalent and/or apparently self-contradictory on such issues. The spec should be updated to take a clear position on these issues (and it should not impose undue difficulties on support for migration).
The first problem concerns the statement in the third paragraph of section 6.2: "If the client requests more attributes than just fs_locations, the server may return fs_locations only. This is to be expected since the server has migrated the filesystem and may not have a method of obtaining additional attribute data."
While the above seems quite reasonable, it is seemingly contradicted by the following text from section 14.2.7, in the second paragraph of the DESCRIPTION for GETATTR: "The server must return a value for each attribute that the client requests if the attribute is supported by the server. If the server does not support an attribute or cannot approximate a useful value then it must not return the attribute value and must not set the attribute bit in the result bitmap. The server must return an error if it supports an attribute but cannot obtain its value. In that case no attribute values will be returned."
This text also seems reasonable, in that it allows clients to simplify their attribute interpretation: they can assume that all of the attributes they request are present, which often makes it possible to get successive attributes at fixed offsets within the data stream. However, it seems to contradict what is said in section 6.2, where it is clearly anticipated, at least when fs_locations is requested, that fewer (often many fewer) attributes will be available than are requested. It could be argued that the two can be harmonized by being creative with the interpretation of the phrase "if the attribute is supported by the server", on the grounds that many attributes are not supported by the server for an absent fs. The text, however, by talking about attributes "supported by a server", seems to indicate that support is not allowed to differ between fs's (which is troublesome in itself, as one server might have some filesystems that support acl's and others that do not, for example).
Note, however, that the following paragraph in the description says, "All servers must support the mandatory attributes as specified in the section 'File Attributes'". That's reasonable enough in general, but for an absent fs it is not reasonable and so section 14.2.7 and section 6.2 are contradictory. The spec should be modified to remove the contradiction, while allowing servers to use the approach outlined in section 6.2. It should also make sure that it is clear that the server may choose to return other requested attributes (e.g. fsid and mounted_on_fileid) rather than fs_locations alone.
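The section 6.2 approach, extended as suggested above, amounts to returning the intersection of what was requested and what the referring server can supply. The Python sketch below illustrates this under stated assumptions: attributes are represented by name rather than by bitmap position, the ABSENT_FS_AVAILABLE set and the getattr_absent_fs function are hypothetical, and the request is assumed to include fs_locations (otherwise NFS4ERR_MOVED would apply).

```python
from typing import Dict, Set

# Attributes this hypothetical server is able to supply for an absent fs.
ABSENT_FS_AVAILABLE = {"fs_locations", "fsid", "mounted_on_fileid"}


def getattr_absent_fs(requested: Set[str],
                      values: Dict[str, object]) -> Dict[str, object]:
    """Return only those requested attributes the server can supply.

    The keys of the result play the role of the result bitmap: an attribute
    that cannot be supplied is simply left out rather than causing an error.
    """
    returned = requested & ABSENT_FS_AVAILABLE & set(values)
    return {name: values[name] for name in sorted(returned)}
```

A client written against this behavior has to consult the returned set of attributes before decoding, since it can no longer assume that every requested attribute is present.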
A related issue concerns attributes in a READDIR. The spec already allows partial attribute return when rdattr_error is requested, but indicates that if it is not requested, errors must be returned if not all requested attributes can be obtained. When READDIR is done on a directory which contains mountpoints for absent fs's (either those that were once present and then migrated or simple referrals), this would seem to indicate that NFS4ERR_MOVED must be returned if the directory is in an absent filesystem or any of the directory entries is the root of an absent fs. This seems unduly restrictive, but if that is the correct interpretation, it should be made clear that the exception indicated in section 6.2 does not apply in the READDIR case, to avoid possible confusion.
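The restrictive interpretation discussed above can be made concrete with a short Python sketch. The readdir_status function and its parameters are hypothetical, and reporting NFS4ERR_MOVED as the per-entry rdattr_error value is an assumption made here for illustration, not something the note or the spec states.

```python
from typing import Dict, List, Set, Tuple

NFS4_OK = "NFS4_OK"
NFS4ERR_MOVED = "NFS4ERR_MOVED"


def readdir_status(dir_in_absent_fs: bool,
                   entries: List[Tuple[str, bool]],
                   requested: Set[str]) -> Tuple[str, Dict[str, str]]:
    """Apply the restrictive reading to one READDIR.

    entries holds (name, is_root_of_absent_fs) pairs. The result is the
    overall status plus a per-entry rdattr_error map (empty if not requested).
    """
    if dir_in_absent_fs:
        return NFS4ERR_MOVED, {}
    if "rdattr_error" not in requested:
        if any(absent for _, absent in entries):
            # One entry with unobtainable attributes fails the whole READDIR.
            return NFS4ERR_MOVED, {}
        return NFS4_OK, {}
    # With rdattr_error requested, the failure is reported per entry instead.
    per_entry = {name: (NFS4ERR_MOVED if absent else NFS4_OK)
                 for name, absent in entries}
    return NFS4_OK, per_entry
```

The sketch shows why the restrictive reading is awkward: a single referral in a large directory makes the whole directory unreadable for any client that does not request rdattr_error.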
The most important of these is an explanation of how referrals fit into the v4 migration model. Since the existing discussion does not specifically call out the case in which the absence of a filesystem is noted while attempting to cross into the absent file system, it makes it hard to understand how referrals would work within the existing protocol. This needs to be corrected to allow better understanding of the capabilities of NFSv4.0 which will be retained in NFSv4.1 and future minor versions of NFSv4. See my note "Referrals in NFS-v4.0" for some useful discussion of referrals. This material would probably be best handled as a new subsection, following the "Migration" section, and would be section 6.2.1, given the current numbering scheme of RFC 3530.
There are a number of cases in which the existing wording of RFC 3530 seems to unnecessarily restrict the use of the referral case of the migration feature in order to implement a global namespace. In the following cases, some suggestions are made for edits to tidy this up, with the new material italicized.