From: Noveck, Dave (Dave.Noveck@netapp.com)
Date: 12/19/02-01:09:22 PM Z
Message-ID: <C8CF60CFC4D8A74E9945E32CF096548A070A5F@SILVER.nane.netapp.com>
From: "Noveck, Dave" <Dave.Noveck@netapp.com>
Subject: Comments on draft-ietf-nfsv4-repl-mig-proto-00.txt
Date: Thu, 19 Dec 2002 11:09:22 -0800

RPC/XDR or something else?

First thing is the RPC issue that Mike Eisler brought up. Unless there is some strong reason to do otherwise, let's follow the path of least resistance. We've got enough difficult issues at the application level, given the range of different filesystems we are going to deal with, without dealing with transport-level stuff. If RPC is really broken in this case for some reason, then we can go in a different direction, but until someone makes a strong case for that, I'd stick with RPC.

The design refers to the efficiency issue. It says you don't want to send lots of small packets and I agree. I think the messages as they are now constructed don't lend themselves very well to simply being transposed to RPCs. In fact I would argue that they have some efficiency problems even in the non-RPC model (security token on every piece). So I think you need a send RPC that allows you to send a whole buffer of the tiny data items that you now have as RPCs.

I also have problems with the statement, "The use of XDR is also subject to change". I suppose that at this stage everything is subject to change, but why call this out? I'm not a big XDR fan and find it kind of annoying for a little-endian server to talk to a little-endian client in big-endian, but let's keep things in proportion. This happens in a whole lot of protocols, and it would be hard to argue that this one has a greater requirement to avoid byte-swaps than all those others. If someone really wants to devote the effort to coming up with an alternative to XDR for this, I wouldn't reject it out of hand, but to me this "subject to change" language screams out "We're not serious, yet. Don't try to build a prototype because nothing is decided". OK, I'll stop ranting now.

Server semantics

Intermittently, there has been discussion of the consequences of the fact that the spec leaves the "semantics" of the file system unspecified and up to the server. One implication of this is that a client may find one compliant server meets its needs while another does not. While an exhaustive list of server semantics and other characteristics is not possible, we have discussed publishing a "taxonomy" of server characteristics so that servers would have a standard format in which to publish their choices, and clients could more easily check suitability up front. There has also just recently been discussion of making such information available within the protocol (in a minor version) for the client's reference (e.g. defining a big structure with all relevant characteristics and making that a new read-only per-fs attribute).

It occurs to me that a server-server migration protocol has a much greater need for the definition of such a server-semantics structure. First, if your file system allows renames while the file is open, for example, then migrating fs's to a server which does not is liable to seriously discomfit the client. I'm not saying the administrator shouldn't be able to override that, but the protocol should make the information available for negotiation. Also, the destination server may be capable of adopting for export multiple semantic models (by configuration or other options), so we should allow the source to tell the destination server whether it wants to prohibit renames of open files, for example. There is a ton of stuff like this. If my server supports UTF-8 names but only for characters up to 64K, then I'd like the destination to do the same, especially if I ever want to migrate that fs back.

It is true that there is a skeletal provision in the strawman for "capability negotiation" but I think that is focused on capabilities with regard to the data transfer or at least the data to be transferred.
I'm thinking that we need something broader where the main focus (at least for me) is the functions of the server with respect to clients and the need to preserve the semantics the clients see.

So what I think makes sense here is for someone to gather up all of this server-characteristics stuff in a separate draft that defines a big structure or set of structures. This could then be pulled in by the migration/replication protocol as a basis for negotiation and by a future minor version as a new attribute.

The obvious thing is to make that someone me as a punishment for bringing this up :-), but maybe I can avoid that. It occurs to me that if there's someone out there who has been interested in getting more involved in the work of nfsv4-wg but has found a lot of the existing stuff kind of daunting, this would be an ideal low-barrier-to-entry kind of thing to do first. It doesn't have all the complicated history that the base protocol has and that makes some of the minor version stuff difficult as well. Anybody interested in taking this on? If you might be and want to talk about it, send me some mail and I can answer any questions you have.

Section 4.4: It says that this creates a session to send the full contents or an incremental, but I don't see where the distinction is made. The source and the target had better agree which it is.

Section 5.1: If we are going the RPC route, as Mike and I think is desirable, then this breakup of messages does seem to make the overhead a bit high. Why not have one SEND RPC covering all the current SEND_* messages, except for maybe SEND_CHECKPOINT? Then create a union of all the individual current SEND-type stuff and allow a SEND RPC to send a variable-length array of such elements. That seems a lot better to me. If you did that, sec_token and msg_id could be taken out of all of the individual data items. sec_token would be superseded by the switch to RPC, but msg_id would be in the SEND RPC.
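
To make the Section 5.1 suggestion concrete, here is a rough sketch of a single SEND call carrying a variable-length array of union-typed items, with msg_id appearing once per call rather than once per item. It is written in Python rather than XDR purely for illustration. Every name in it (SendObject, SendData, SendRemove, SendArgs, batch_sends) and every field layout is an assumption made for this sketch, not something taken from the draft.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical stand-ins for three of the current SEND_* item types.
# The field choices are illustrative only.
@dataclass
class SendObject:
    file_id: int
    attrs: bytes


@dataclass
class SendData:
    file_id: int
    offset: int
    data: bytes


@dataclass
class SendRemove:
    dir_file_id: int
    name: str


# The "union of all the individual current SEND-type stuff".
SendItem = Union[SendObject, SendData, SendRemove]


@dataclass
class SendArgs:
    """Arguments of one batched SEND call: msg_id lives here, not in each item."""
    msg_id: int
    items: List[SendItem] = field(default_factory=list)


def batch_sends(items: List[SendItem], max_items: int, first_msg_id: int = 1) -> List[SendArgs]:
    """Group small items into SEND calls of at most max_items each, preserving order."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    calls: List[SendArgs] = []
    for start in range(0, len(items), max_items):
        calls.append(SendArgs(msg_id=first_msg_id + len(calls),
                              items=items[start:start + max_items]))
    return calls
```

With something of this shape, per-call overhead (RPC header, credentials, msg_id) is paid once per batch instead of once per tiny item, which is the point of the suggestion.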

Section 5.2: In the first paragraph, the ordering constraints need to be clarified. It isn't clear whether you are saying that all DATA messages have to happen before all LOCK_STATE messages, and those before all SHARE_STATE messages. Also, since the filesystem may be changing as data is transferred, people may want to go back and send changes that happened while the rest of the transfer was going on. It is not clear how that will be integrated with the fixed (e.g. inode) order.

As far as the directory stuff, it isn't clear what algorithm you are supposing will be used to reconstruct changes. You talk about removes and renames but not creates or links. I'm supposing that somehow, you are gathering the names of objects as the objects themselves are transferred, and that creates are somehow dealt with that way, but I don't see how this would work, given what is defined.

I wonder how your treatment of named attributes will work, particularly on incremental. It doesn't seem to have the rename and remove capability, so I'm not sure how deletion of a named attribute will be handled. If it is handled by simply not sending that named attribute, then you would be forced to send all named attributes for a file any time anything in the file changed (even one byte or the access time if you are sending those changes).

Section 5.3: is_named_attr is determinable from the obj_type, so why is it there? obj_acl should be in the attributes and not a separate thing. The same argument can be made for obj_type. It is an attribute. Note that fs (rather than file) attributes should not be sent for every object, so there should be a provision to send these once. I'm not clear about file_id. Is this the same as the fileid attribute, in which case you already have it once? I don't understand the RM_CIFS_ATTR stuff. The v4 attributes were extended to include a lot of CIFS stuff.
I think that if we want to put in places for fs-specific metadata extensions, we should do that, but that should be in addition to v4 stuff, rather than as an alternative to it. I note that the parent directory is not in this message so I don't know how directory information could be gathered from such messages. Maybe I just don't understand what is intended here. In any case, I don't think you could add the fileid of the parent (or it would have to be optional) because many fs's don't have this information when going through the fs in inode order.

Section 5.4: You have is_hole, length, and data, which includes the length. So you have length twice except in the case of a hole. Why not have SEND_DATA, which doesn't have length (the data is OK), and SEND_HOLE, which has no data but has the length, so that is_hole goes away? I think that's nicer.

Section 5.5: There is no definition of RMowner that I can see. Clearly it contains the opaque owner string, but what else? Clientid won't work, will it? I'd think you'd need the client-string. There is no provision to send that stuff over. RMlock_desc has no provision for the exclusive bit. It is not clear how it is envisioned that this will work for incrementals. Is it the assumption that all lock state is sent over for a file, if it has changed? For example, if I send a set of locks for a file in one transfer, and then in the next I don't send any information for that file, does the destination assume that it still has the same locks? And if they have all gone away, do I send a null set of locks?

Section 5.6: Basically some of the same issues as in 5.5.

Section 5.7: I assume the file_id is the file-id of the directory. It says that this will cause removal of the designated object. In the case of hard links, I assume that actual removal is not meant (i.e. that you only decrement the link count). But there is a problem here. Suppose you made a link to an existing file and deleted the old link.
If the directory in which the old link resided came first in the inode order, then the file would have a zero link-count at one point. Is it assumed that deletions would happen at the end of an incremental? I think that is the only thing that would work. I think the incremental model needs some tightening up at the very least. Of course, it could be that I don't understand it, in which case I'd argue that the description needs to be reworked to make it more understandable.

Section 5.8: This only works for renames within a directory. How are renames between directories handled? Mike mentioned the possibility of SEND_LINK. Does that have a role in resolving the above?

Section 5.9: The description is confusing. It should start out with the basic function, which I assume is to send the initial directories, rather than incrementals. Or am I wrong about that, and you have the names come in with the object itself, with this being just for renames/removes? Are you assuming that somehow directories will be constructed from the files that are in them? But I don't see how that can be done, especially if there are multiple-linked files. The second sentence in this section doesn't seem to make sense as written.
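
The link-count concern raised under Section 5.7 can be shown with a small sketch. It replays a hypothetical incremental in which the directory holding the old link comes first in inode order, so the remove arrives before the new link. The operation names ("link", "remove") and the function replay_incremental are assumptions made for illustration; the draft defines no link message, and this is not a claim about how any implementation behaves.

```python
from typing import Dict, List, Set, Tuple

Op = Tuple[str, int]  # (kind, file_id), where kind is "link" or "remove"


def replay_incremental(nlink: Dict[int, int], ops: List[Op],
                       defer_removes: bool) -> Tuple[Dict[int, int], Set[int]]:
    """Apply link/remove operations to per-file link counts.

    Returns the final link counts and the set of file ids whose count
    reached zero at any point during the replay. With defer_removes set,
    all removes are moved to the end, keeping their relative order.
    """
    counts = dict(nlink)
    if defer_removes:
        ops = ([op for op in ops if op[0] != "remove"] +
               [op for op in ops if op[0] == "remove"])
    hit_zero: Set[int] = set()
    for kind, file_id in ops:
        if kind == "link":
            counts[file_id] = counts.get(file_id, 0) + 1
        elif kind == "remove":
            counts[file_id] -= 1
        else:
            raise ValueError(f"unknown operation: {kind}")
        if counts[file_id] == 0:
            hit_zero.add(file_id)
    return counts, hit_zero


# File 42 starts with one link; the old link's directory sorts first.
ops_in_inode_order = [("remove", 42), ("link", 42)]
eager = replay_incremental({42: 1}, ops_in_inode_order, defer_removes=False)
deferred = replay_incremental({42: 1}, ops_in_inode_order, defer_removes=True)
```

Applied in arrival order, file 42 passes through a zero link count, at which point a destination could free it before the new link shows up. With removes deferred to the end of the incremental, the count never drops below one.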