Comments on draft-ietf-nfsv4-repl-mig-proto-00.txt


From: Noveck, Dave (Dave.Noveck@netapp.com)
Date: 12/19/02-01:09:22 PM Z


Message-ID: <C8CF60CFC4D8A74E9945E32CF096548A070A5F@SILVER.nane.netapp.com>
From: "Noveck, Dave" <Dave.Noveck@netapp.com>
Subject: Comments on draft-ietf-nfsv4-repl-mig-proto-00.txt
Date: Thu, 19 Dec 2002 11:09:22 -0800

RPC/XDR or something else?

First thing is the RPC issue that Mike Eisler brought up.
Unless there is some strong reason to do otherwise, let's follow
the path of least resistance.  We've got enough difficult issues
at the application level, given the range of different filesystems
we are going to deal with, without dealing with transport-level
stuff.  If RPC is really broken in this case for some reason,
then we can go in a different direction, but until someone makes 
a strong case for that, I'd stick with RPC.  

The design refers to the efficiency issue.  It says you don't 
want to send lots of small packets and I agree.  I think the
messages as they are now constructed don't lend themselves
very well to simply being transposed to RPCs.  In fact I would 
argue that they have some efficiency problems even in the 
non-RPC model (a security token on every piece).  So I think you 
need a SEND RPC that allows you to send a whole buffer of the tiny
data items that would otherwise each become a separate RPC.  

I also have problems with the statement, "The use of XDR is also 
subject to change".  I suppose that at this stage everything is 
subject to change, but why call this out?  I'm not a big XDR fan 
and find it kind of annoying for a little-endian server to talk 
to a little-endian client in big-endian, but let's keep things 
in proportion.  This happens in a whole lot of protocols, and 
it would be hard to argue that this one has a greater requirement 
to avoid byte-swaps than all those others.  If someone really 
wants to devote the effort to coming up with an alternative to 
XDR for this, I wouldn't reject it out of hand, but to me this 
"subject to change" language screams out "We're not serious, 
yet.  Don't try to build a prototype because nothing is decided".  
OK, I'll stop ranting now.
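To make the byte-order point concrete: XDR encodes every integer as four bytes, most significant byte first (RFC 1832, since replaced by RFC 4506), so two little-endian hosts talking XDR both swap. A minimal sketch using only Python's standard struct module:

```python
import struct
import sys

def xdr_uint32(value: int) -> bytes:
    # XDR unsigned int: 4 bytes, big-endian, whatever the host byte order is
    return struct.pack(">I", value)

def native_uint32(value: int) -> bytes:
    # Host byte order, standard size: what the sender has in memory
    return struct.pack("=I", value)

print(xdr_uint32(0x01020304).hex())  # 01020304 on every host
if sys.byteorder == "little":
    # On a little-endian host the in-memory form is reversed, so the
    # sender swaps when encoding and the (little-endian) peer swaps back.
    print(native_uint32(0x01020304).hex())  # 04030201
```

The swap is a handful of instructions per field, which is the cost every other XDR-based protocol already accepts.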

Server semantics

Intermittently, there has been discussion of the consequences of the
fact that the spec leaves the "semantics" of the file system unspecified
and up to the server.  One implication of this is that a client may
find one compliant server meets its needs while another does not. 
While an exhaustive list of server semantics and other characteristics
is not possible, we have discussed publishing a "taxonomy" of server
characteristics so that servers would have a standard format in which 
to publish their choices, and clients could more easily check 
suitability up front.  There has also just recently been discussion
of making such information available within the protocol (in a minor
version) for the client's reference (e.g. defining a big structure 
with all relevant characteristics and making that a new read-only 
per-fs attribute).  

It occurs to me that a server-server migration protocol has a 
much greater need for the definition of such a server-semantics 
structure.  First, if your file system allows renames while
the file is open, for example, then migrating fs's to a server
which does not is liable to seriously discomfit the client.  I'm
not saying the administrator shouldn't be able to override that,
but the protocol should make the information available for 
negotiation.  Also, the destination server may be capable of
adopting for export multiple semantic models (by configuration
or other options), so we should allow the source to tell
the destination server whether it wants to prohibit renames of 
open files, for example.  There is a ton of stuff like this.
If my server supports UTF-8 names but only for characters up to
64K, then I'd like the destination to do the same, especially
if I ever want to migrate that fs back.  It is true that there
is a skeletal provision in the strawman for "capability negotiation",
but I think that is focused on capabilities with regard to 
the data transfer, or at least the data to be transferred.
I'm thinking that we need something broader, where the main 
focus (at least for me) is the functions of the server with
respect to clients and the need to preserve the semantics the
clients see.
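A minimal sketch of how such a structure could feed negotiation, assuming a hypothetical FsSemantics record with just the two characteristics mentioned above (a real taxonomy would be far larger). The source compares its own semantics against each model the destination can be configured for, and only an exact match is accepted, since a destination that accepts *more* (e.g. all of Unicode) would still block migrating the fs back:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass(frozen=True)
class FsSemantics:
    # Hypothetical fields, for illustration only.
    rename_open_files: bool   # may a file be renamed while it is open?
    max_name_code_point: int  # highest code point accepted in UTF-8 names

def semantic_mismatches(source: FsSemantics, dest: FsSemantics) -> list[str]:
    """Characteristics the destination would present differently to clients."""
    return [f.name for f in fields(FsSemantics)
            if getattr(source, f.name) != getattr(dest, f.name)]

def choose_model(source: FsSemantics, offered: list[FsSemantics]) -> Optional[FsSemantics]:
    # The destination may support several configurable models; pick one
    # that matches exactly.  None means refuse, or let the administrator override.
    for model in offered:
        if not semantic_mismatches(source, model):
            return model
    return None

src = FsSemantics(rename_open_files=True, max_name_code_point=0xFFFF)
dst = FsSemantics(rename_open_files=False, max_name_code_point=0x10FFFF)
print(semantic_mismatches(src, dst))  # ['rename_open_files', 'max_name_code_point']
```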

So what I think makes sense here is for someone to gather up 
all of this server-characteristics stuff in a separate draft that
defines a big structure or set of structures.  This could then
be pulled in by the migration/replication protocol as a basis 
for negotiation and by a future minor version as a new attribute.
The obvious thing is to make that someone me as a punishment for
bringing this up :-), but maybe I can avoid that.  It occurs to me that
if there's someone out there who has been interested in getting more
involved in the work of nfsv4-wg but has found a lot of the existing
stuff kind of daunting, this would be an ideal low-barrier-to-entry
kind of thing to do first.  It doesn't have all the complicated history
that the base protocol has and that makes some of the minor version
stuff difficult as well.  Anybody interested in taking this on?
If you might be and want to talk about it, send me some mail and
I can answer any questions you have.

Section 4.4:

It says that this creates a session to send the full contents or an
incremental, but I don't see where the distinction is made.  The
source and the target had better agree which it is.

Section 5.1: 

If we are going the RPC route, as Mike and I think is desirable, then
this breakup of messages does seem to make the overhead a bit high.
Why not have one SEND RPC covering all the current SEND_* messages,
except for maybe SEND_CHECKPOINT.  Then create a union of all the
individual current SEND-type stuff and allow a SEND RPC to send 
a variable-length array of such elements.  That seems a lot better
to me.

If you did that, sec_token and msg_id could be taken out of all of the
individual data items.  sec_token would be superseded by the switch to
RPC, while msg_id would move into the SEND RPC itself.
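A minimal sketch of what the arguments of such a single SEND RPC might look like on the wire, written as hand-rolled XDR in Python. The discriminant values, the 32-bit msg_id, and modelling each union arm's body as opaque bytes are all assumptions for illustration; the draft defines none of them. The point is the shape: msg_id once, then a counted array of (discriminant, body) union elements, with no per-item sec_token:

```python
import struct
from typing import List, Tuple

# Hypothetical discriminants for the union arms (not from the draft).
SEND_DATA, SEND_LOCK_STATE, SEND_SHARE_STATE, SEND_REMOVE = 1, 2, 3, 4

def xdr_opaque(data: bytes) -> bytes:
    # Variable-length opaque: 4-byte length, bytes, zero padding to a multiple of 4.
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\0" * pad

def encode_send_args(msg_id: int, items: List[Tuple[int, bytes]]) -> bytes:
    # msg_id appears once per call; RPC authentication covers the whole call.
    out = struct.pack(">II", msg_id, len(items))  # msg_id, then array count
    for arm, body in items:
        out += struct.pack(">I", arm) + xdr_opaque(body)  # discriminant + arm body
    return out

args = encode_send_args(7, [(SEND_DATA, b"abc"), (SEND_REMOVE, b"")])
```

The per-call overhead (RPC header, credentials) is then paid once per buffer rather than once per item.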

Section 5.2:

In the first paragraph, the ordering constraints need to be clarified.
It isn't clear whether you are saying that all DATA messages have to
happen before all LOCK_STATE messages before all SHARE_STATE messages.
Also, since the filesystem may be changing as data is transferred,
people may want to go back and send changes that happened while
the rest of the transfer was going on.  Not clear how that will
be integrated with the fixed (e.g. inode) order. 

As for the directory stuff, it isn't clear what algorithm you are
supposing will be used to reconstruct changes.  You talk about 
removes and renames but not creates or links.  I'm supposing that 
somehow, you are gathering the names of objects as the objects
themselves are transferred, and that creates are somehow dealt with
that way, but I don't see how this would work, given what is defined.

I wonder how your treatment of named attributes will work, 
particularly on incremental.  It doesn't seem to have the rename and 
remove capability, so I'm not sure how deletion of a named attribute
will be handled.  If it is handled by simply not sending that 
named attribute, then you would be forced to send all named attributes
for a file any time anything in the file changed (even one byte
or the access time if you are sending those changes).

Section 5.3:

is_named_attr is determinable from obj_type, so why is it there?
obj_acl should be in the attributes and not a separate thing.  The
same argument can be made for obj_type: it is an attribute.  Note
that fs (rather than file) attributes should not be sent for every
object, so there should be a provision to send them once.  I'm
not clear about file_id.  If it is the same as the fileid attribute,
then you already have it once.  I don't understand the
RM_CIFS_ATTR stuff.  The v4 attributes were extended to include a 
lot of CIFS stuff.  I think that if we want to put in places for
fs-specific metadata extensions, we should do that, but that should 
be in addition to the v4 stuff, rather than an alternative to it.

I note that the parent directory is not in this message so I don't
know how directory information could be gathered from such messages.
Maybe I just don't understand what is intended here.  In any case, I 
don't think you could add the fileid of the parent (or it would have
to be optional) because many fs's don't have this information when
going through the fs in inode order.

Section 5.4:

You have is_hole, length, and data, which includes the length.  So
you have the length twice except in the case of a hole.  Why not have
SEND_DATA, which has no separate length (the opaque data already carries
one), and SEND_HOLE, which has the length but no data?  Then is_hole
goes away.  I think that's nicer.
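A minimal sketch of the proposed split, with hypothetical SendData and SendHole records (the offset field is an assumption). Each arm carries its length exactly once, so no is_hole flag is needed; the destination side is shown rebuilding a file image, where a real server would leave the hole unallocated rather than write zeros:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class SendData:
    offset: int
    data: bytes   # length is implied by the data itself

@dataclass
class SendHole:
    offset: int
    length: int   # no data; the range reads back as zeros

Extent = Union[SendData, SendHole]

def extent_length(e: Extent) -> int:
    return len(e.data) if isinstance(e, SendData) else e.length

def materialize(extents: list[Extent]) -> bytes:
    # Illustration only: build the file contents the destination would expose.
    size = max((e.offset + extent_length(e) for e in extents), default=0)
    buf = bytearray(size)  # zero-filled, so holes need no writes
    for e in extents:
        if isinstance(e, SendData):
            buf[e.offset:e.offset + len(e.data)] = e.data
    return bytes(buf)
```

For example, `materialize([SendData(0, b"ab"), SendHole(2, 3), SendData(5, b"c")])` yields a six-byte file whose middle three bytes are zero.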

Section 5.5:

There is no definition of RMowner that I can see.  Clearly it contains
the opaque owner string, but what else?  Clientid won't work, will it?
I'd think you'd need the client-string.  There is no provision to send
that stuff over.

RMlock_desc has no provision for the exclusive bit.

It is not clear how it is envisioned that this will work for incrementals.
Is the assumption that all lock state is sent over for a file if it
has changed?  For example, if I send a set of locks for a file in one
transfer, and then in the next I don't send any information for that
file, what does the destination assume?  That it still has the same
locks, and that if they have all gone away, I send a null set of locks?
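A minimal sketch of that convention, assuming it is the intended one: a file absent from an incremental keeps its locks, a file present has its complete lock set replaced, and an empty set clears it. LockDesc is a hypothetical stand-in for RMlock_desc with the missing exclusive bit added; its other fields are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LockDesc:
    # Hypothetical stand-in for RMlock_desc.
    owner: bytes     # opaque lock-owner string (the client string is also needed)
    offset: int
    length: int
    exclusive: bool  # write (exclusive) vs. read (shared) lock

def apply_lock_incremental(current: dict, update: dict) -> dict:
    """current/update map fileid -> list of LockDesc.
    Absent from update: unchanged.  Present: replaced.  Empty list: all locks gone."""
    result = dict(current)
    for file_id, locks in update.items():
        if locks:
            result[file_id] = list(locks)
        else:
            result.pop(file_id, None)
    return result
```

Under this rule the destination never has to infer anything from silence except "no change", at the cost of resending a file's whole lock set whenever any of its locks changes.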

Section 5.6:

Basically some of the same issues as in 5.5.

Section 5.7:

I assume the file_id is the file-id of the directory.  It says that this
will cause removal of the designated object.  In the case of hard links,
I assume that actual removal is not meant (i.e. that you only decrement
the link count).  But there is a problem here.  Suppose you made a link
to an existing file and deleted the old link.  If the directory in which 
the old link resided came first in the inode order, then the file would
have a zero link count at one point.  Is it assumed that deletions happen
at the end of an incremental?  I think that is the only thing that would 
work.
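A minimal sketch of the link-count problem, using a deliberately simplified model (hypothetical, not from the draft) where each directory change is just a ("link", fileid) or ("remove", fileid) operation applied in inode order. It reports the lowest link count each file reaches, with and without holding removes until the end of the incremental:

```python
def lowest_link_counts(initial: dict, ops: list, defer_removes: bool) -> dict:
    """Apply (kind, fileid) ops, kind in {"link", "remove"}, and return the
    lowest link count each file reached along the way."""
    counts = dict(initial)
    lowest = dict(initial)
    if defer_removes:
        # Apply every new link before any removal, keeping relative order otherwise.
        ops = [op for op in ops if op[0] == "link"] + [op for op in ops if op[0] == "remove"]
    for kind, fileid in ops:
        counts[fileid] += 1 if kind == "link" else -1
        lowest[fileid] = min(lowest[fileid], counts[fileid])
    return lowest

# File 42 has one link in directory A; a new link was made in directory B and the
# old one removed.  A precedes B in inode order, so the remove is seen first.
ops = [("remove", 42), ("link", 42)]
```

Applied in order, file 42 drops to zero links (and a destination that frees zero-link files would destroy it); with removes deferred it never goes below one.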

I think the incremental model needs some tightening up at the very least.
Of course, it could be that I don't understand it, in which case I'd argue
that the description needs to be reworked to make it more understandable.

Section 5.8:

This only works for renames within a directory.  How are renames between
directories handled?

Mike mentioned the possibility of SEND_LINK.  Does that have a role in 
resolving the above?
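A minimal sketch of a rename record that names both parent directories, so that moves between directories can be expressed. The Rename record and the namespace model (a dict from (directory fileid, name) to object fileid) are hypothetical illustrations, not the draft's SEND_RENAME:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rename:
    # Hypothetical record: both parents are named, not just one directory.
    src_dir: int
    src_name: str
    dst_dir: int
    dst_name: str

def apply_rename(names: dict, r: Rename) -> None:
    """names maps (directory fileid, entry name) -> object fileid."""
    fileid = names.pop((r.src_dir, r.src_name))
    names[(r.dst_dir, r.dst_name)] = fileid  # overwrites any existing target entry
```

The same effect could be expressed as a SEND_LINK-style entry in the new directory plus a remove in the old one, provided the remove is not applied before the link.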

Section 5.9:

The description is confusing.  It should start out with the basic 
function, which I assume is to send the initial directories,
rather than incrementals.  Or am I wrong about that?  You have the
names come in with the object itself and say that this is just
for renames/removes.  Are you assuming that somehow directories
will be constructed from the files that are in them?  But I
don't see how that can be done, especially if there are multiply-linked
files.

The second sentence in this section doesn't seem to make sense as
written.



This archive was generated by hypermail 2.1.2 : 03/04/05-01:50:42 AM Z CST