RE: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis- 05 draft]


From: Noveck, Dave (Dave.Noveck@netapp.com)
Date: 01/28/03-08:29:24 AM Z


Message-ID: <C8CF60CFC4D8A74E9945E32CF096548A072A54@SILVER.nane.netapp.com>

Mike Eisler wrote: 
> If the server advertised that it is supporting one of the case insensitive
> attributes, it MUST use form KC. If neither of the case
> attributes is supported by the server, then normalization is
> optional by the implementation and
> could be made a SHOULD or MUST in a future
> revision of the specification, such as advancement to DRAFT
> status, or in a minor revision. Given
> that we are riding RFC3454, the only
> normalization form we could use in the future is KC.

Here's what the Unicode technical report says about form KC
(and KD):

> Normalization forms KC and KD must not be blindly applied to arbitrary 
> text. Since they erase many formatting distinctions, they will prevent 
> round-trip conversion to and from many legacy character sets, and unless 
> supplanted by formatting markup, may remove distinctions that are important
> to the semantics of the text. The best way to think of these normalization
> forms is like uppercase or lowercase mappings: useful in certain contexts
> for identifying core meanings, but also performing modifications to the 
> text that may not always be appropriate. They can be applied more freely
> to domains with restricted character sets, such as in Annex 7: Programming
> Language Identifiers.

My impression is that filenames in v4 are much more like "arbitrary text"
and don't fit the restricted model that would be implied by the Programming 
Language Identifier analogy.
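To make the concern concrete, here is a small sketch using Python's
standard unicodedata module. The sample names are my own illustrations,
not anything taken from the draft. It shows form KC folding away
distinctions that form C preserves:

```python
import unicodedata

# Illustrative filenames (assumptions chosen for this example only).
samples = {
    "fi ligature": "\ufb01le",            # U+FB01 LATIN SMALL LIGATURE FI + "le"
    "superscript two": "x\u00b2",         # U+00B2 SUPERSCRIPT TWO
    "decomposed e-acute": "cafe\u0301",   # "e" + U+0301 COMBINING ACUTE ACCENT
}

def normalized_forms(name):
    """Return the name under canonical (NFC) and compatibility (NFKC) composition."""
    return {form: unicodedata.normalize(form, name) for form in ("NFC", "NFKC")}

for label, name in samples.items():
    forms = normalized_forms(name)
    # ascii() escapes non-ASCII code points so the differences are visible.
    print(label, ascii(name), "NFC:", ascii(forms["NFC"]), "NFKC:", ascii(forms["NFKC"]))
```

In the first two cases form C leaves the name alone while form KC
rewrites it to plain ASCII, so the original name cannot be recovered.
Only the third case, a canonical equivalence, is treated the same way
by both forms.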

I understand Mike's point that we don't have free choice in this area.

The basic thrust of the protocol with regard to naming has been that this
is pretty much part of the filesystem semantics (the filesystem may not 
support the full character range, may have other restrictions on what
names are valid, etc.).  With the stringprep stuff we deviated from that
model, and at least in the case of case-insensitive mapping, imposed very
detailed requirements on the filesystem.  It looks like we didn't have a 
choice.

Most of us are going to be doing case-sensitive matching where I think
the basic v4 model remains.  Unless forced to do so, I would hope that
we would not go down the path of imposing detailed requirements on the
naming architecture of the filesystem, without a clear understanding of
the consequences.  I'm pretty sure I don't have such an understanding
now, and it's not clear to me that I will ever have such an understanding.

As the protocol stands now, I believe filesystems (if case-sensitive
matching is in effect) MAY map canonically equivalent filename strings
to the same file.  They might even do so for compatibility equivalent
strings.  If it doesn't impose a normalization form (except internally),
then we don't run afoul of the IESG.  At some point, we could have 
enough implementation experience to make a reasonable decision about
what to do with regard to this area. 
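As a rough illustration of what "except internally" could look like,
here is a hypothetical in-memory directory in Python. The Directory
class and its methods are invented for this sketch and do not
correspond to any real server or to anything in the draft. It uses form
C only as an internal lookup key and hands back names exactly as they
were created:

```python
import unicodedata

class Directory:
    """Hypothetical directory: canonically equivalent names map to one entry."""

    def __init__(self):
        # Internal key (NFC) -> (name exactly as created, file id)
        self._entries = {}

    @staticmethod
    def _key(name):
        # Normalization is used only for matching, never for the stored name.
        return unicodedata.normalize("NFC", name)

    def create(self, name, file_id):
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(name)
        self._entries[key] = (name, file_id)

    def lookup(self, name):
        # Raises KeyError if no canonically equivalent name exists.
        return self._entries[self._key(name)][1]

    def readdir(self):
        # Names come back in the form the creator supplied.
        return [stored for stored, _ in self._entries.values()]

d = Directory()
d.create("cafe\u0301", 1)                # decomposed form
print(d.lookup("caf\u00e9"))             # precomposed form finds the same file
print([ascii(n) for n in d.readdir()])   # stored name is unchanged
```

Swapping "NFC" for "NFKC" in _key would extend the matching to
compatibility equivalent strings, which is the optional further step
mentioned above. Either way, nothing normalized is visible to the client.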

 
 


