Re: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]


From: David Robinson (David.Robinson@sun.com)
Date: 02/06/03-11:21:20 AM Z


Message-ID: <3E429990.2050705@Sun.COM>
Date: Thu, 06 Feb 2003 11:21:20 -0600
From: David Robinson <David.Robinson@sun.com>
Subject: Re: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]

Dan Oscarsson wrote:
>>What I was getting at is that if the server does not enforce a
>>normalization form for filenames then the clients had better use one
>>normalization form to avoid interop problems, and if the client does the
>>normalization then that task can be moved to user-level, as in libc.
>>
> 
> 
> Yes, and because of that I think the protocol must mandate
> normalisation form C. Then the server (kernel) code can assume the
> format to be that and does not need to do any normalisation. If a client
> sends text in another form and thereby violates the protocol, strange
> things may happen and bad answers may result. That is ok, that should
> happen if you do not follow the protocol. The whole meaning of defining
> a protocol is to agree on what "language" to speak and the meaning of
> the "words". If the protocol says UCS form C encoded using UTF-8 and
> somebody transmits UCS-2, they violate the protocol.
> If we define the normalisation form to be "undefined", we get a protocol
> where the "words" are loosely defined. I see no reason to allow more than
> one form of a "word". Why complicate the world? By selecting the most
> favoured normalisation form many systems can directly send text over
> the protocol without change. Both the Unix community and the World Wide Web
> have selected form C to be used. I am sure there are many more.

I am far from an expert on normalization, but in following this thread
it seems to boil down to the question of "who makes it right" and what
"right" is. For the latter, I am going to defer to the normalization
experts to determine what is the best "form".

Traditionally NFS has had a philosophy of dumb servers and smart clients
which has allowed development of high performance low impact servers.
The client was the party that was responsible for mapping its semantics
on to the protocol definition.  Based on this and my understanding
of Unicode, the server should at most validate that the UTF-8 string
it receives is actually a valid encoding; it should not perform the
complex process of normalization.
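
As an illustration only, the validate-but-do-not-normalize check could look
roughly like the sketch below. The function name is_valid_utf8_name is
hypothetical and is not taken from any NFS implementation; the sketch simply
relies on Python's strict UTF-8 decoder, which rejects truncated sequences,
overlong forms and encoded surrogates.

```python
def is_valid_utf8_name(name: bytes) -> bool:
    """Return True if `name` is well-formed UTF-8.

    Hypothetical helper: it only checks the encoding and never
    normalizes, so precomposed and decomposed names both pass.
    """
    try:
        name.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


precomposed = "caf\u00e9".encode("utf-8")   # e-acute as one code point
decomposed = "cafe\u0301".encode("utf-8")   # 'e' + combining acute accent
malformed = b"caf\xc3("                     # lead byte without a continuation byte
```

Both precomposed and decomposed pass the check, since each is valid UTF-8;
only malformed would be rejected.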

If we follow this approach, two clients that use different normalization
forms may send different UTF-8 strings to access the same file, and one
or both may fail depending on the form the server stored the name in.
Again, for simplicity the server is just doing a bitwise comparison.
Some files may be inaccessible by certain clients, but as Nico says
above, we already have this today and it doesn't seem to be a problem.
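
A small sketch of how this failure arises, assuming a server that stores
names as opaque bytes and compares them exactly. The dictionary standing in
for a directory and the lookup helper are hypothetical; only
unicodedata.normalize is a real library call.

```python
import unicodedata

# The same visible name, sent by two clients in different forms.
nfc_name = unicodedata.normalize("NFC", "caf\u00e9").encode("utf-8")
nfd_name = unicodedata.normalize("NFD", "caf\u00e9").encode("utf-8")

# Hypothetical server-side directory: the name is stored exactly as the
# creating client sent it (here, the NFC client created the file).
directory = {nfc_name: "filehandle-1"}


def lookup(entries: dict, name: bytes):
    """Exact byte comparison, with no normalization on the server."""
    return entries.get(name)


found_by_nfc_client = lookup(directory, nfc_name)
found_by_nfd_client = lookup(directory, nfd_name)
```

Here the NFC client finds the file while the NFD client gets nothing back,
even though both asked for what a user would consider the same name.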

To mitigate this problem we can either mandate a normalization form on
the wire, "MUST use form XYZ", or we can recommend that clients
use a common normalization form, "SHOULD use form XYZ". I would
tend to favor the latter, as it allows clients in a homogeneous
environment to avoid paying the price of normalizing to a form they
do not otherwise use. It is also useful to note that if a filename is
not in the preferred form, the client can still access it by
performing a READDIR and passing the bytes it returned, unchanged, in
a subsequent LOOKUP. What is presented to the application need not be
what the server returns, as long as the client performs the mapping
function.
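
The READDIR-then-LOOKUP fallback might be sketched as follows. This is an
assumption-laden illustration, not client code from any implementation:
find_wire_name is a hypothetical helper, the list of names stands in for a
READDIR reply, and NFC is chosen only as an example form.

```python
import unicodedata


def find_wire_name(readdir_names, wanted: str, form: str = "NFC"):
    """Return the exact bytes from a READDIR reply that match `wanted`.

    Names are compared after normalizing both sides on the client; the
    untouched server bytes are returned so they can be sent in LOOKUP.
    """
    target = unicodedata.normalize(form, wanted)
    for raw in readdir_names:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # skip names that are not valid UTF-8
        if unicodedata.normalize(form, text) == target:
            return raw
    return None


# Hypothetical READDIR reply: the server holds the decomposed form.
listing = [b"readme", b"cafe\xcc\x81"]
wire_name = find_wire_name(listing, "caf\u00e9")
```

The application asked for the precomposed name, but wire_name holds the
decomposed bytes the server actually stored, which is what the client would
pass in the subsequent LOOKUP.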

So the interesting options are:

	1) Server performs normalization
	2) Protocol specifies wire standard normalization
	3) Client uses recommended common normalization

Given where we are in the IETF process, I suggest #3, with
the WG publishing the recommended normalization form. Over time
(in a minor version?) we could migrate to option #2.

	-David


