From: Nicolas Williams (Nicolas.Williams@sun.com)
Date: 02/04/03-11:30:51 PM Z
Date: Tue, 4 Feb 2003 23:30:51 -0600
From: Nicolas Williams <Nicolas.Williams@sun.com>
Subject: Re: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]
Message-ID: <20030204233051.A18728@binky.central.sun.com>

On Mon, Feb 03, 2003 at 05:29:58PM +0100, Dan Oscarsson wrote:
> >happy (my security instincts say that nonetheless the server should
> >enforce a normalization form, but I can't yet find a convincing security
> >argument).
>
> Well, as many programs will not check for unnormalised or different
> normalisations and instead expect the code to be form C, there can be
> names that the display device happen to show as the same to the user
> but for the program are different, that can result in security
> problems.

I was thinking along those lines, and along the lines of covert channels
(could a filename which cannot be referenced by most clients be a covert
channel?)

But then, this problem exists today - most users may not even notice a
file named " ", say. So I don't think a new problem is introduced by not
requiring that the server enforce a normalization form.

> In all other areas I have seen working with text/UCS, they want one
> normalisation and one form to avoid more than one representation
> for the same thing. This because of security matters and to make things
> much simpler to handle.

Yes, but here we're not talking about naming security entities, as in
Kerberos, or domain labels, in DNS/IDN. So the security implications of
not enforcing a normalization form are different from those in Kerberos,
DNS, PKI, or what have you.

>
> >> Even if I switched to UTF-8 as my local character set it will fail, if
> >> the UTF-8 encoded text is not normalised form C. No other form
> >> is acceptible to use due to things like invalid semantics, to
> >> much data space and complex and CPU consuming handling of that format.
> >
> >User-level code will have to be able to cope with unnormalized Unicode
> >text [by normalizing as necessary], with or w/o NFSv4.
>
> Not at all. Most user and os kernel code need only handle UTF-8 in form C.
> At least in the Unix/Linux world. That is the standard format used there.

What I was getting at is that if the server does not enforce a
normalization form for filenames then the clients had better use one
normalization form to avoid interop problems, and if the client does the
normalization then that task can be moved to user-level, as in libc.

>
> >
> >> You cannot expect systems to switch to unnormalised UTF-8 in their
> >> file system to help NFSv4. It will break most applications.
> >
> >The C runtime environment will have to do as much normalization as is
> >necessary under the hood, yes.
>
> Not at all. The C runtime will read the file names through the kernal
> system calls and expect/deliver UTF-8 form C (or legacy encoding).
> Unnormalised or form D will break the applications.

I'm talking about where codeset conversions and/or normalization of
filename arguments to system calls should take place - in the library
stubs? or in the kernel? This is an implementation detail, of course.

> >Pages 572 and 573 of the same book mentioned above make it very clear to
> >me that composites added after Unicode 3.0 are disalloed in text
> >normalized to form C. The book references Unicode Annex #15.
> >
> >
> Yes, that is to make in unnecessary for form C to have new precomposed
> characters being added later. But as each revision comes along, new
> characters will be included in form C too.

New pre-composed characters can and have been added to the Unicode
repertoire, but not to normalization form C; you could say that over
time normalization form C will asymptotically approach form D :)

Cheers,

Nico
--
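
A minimal Python sketch of the two normalization points discussed above,
assuming Python 3 and its standard unicodedata module. The helper name
nfc and the sample filenames are hypothetical, chosen only for
illustration. It shows (1) two filenames that render identically but
differ byte-for-byte until a client normalizes them to form C, and
(2) a pre-composed character added after Unicode 3.0 that form C leaves
decomposed.

```python
import unicodedata


def nfc(name: str) -> str:
    """Return name in Unicode normalization form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", name)


# Two spellings of a filename that a display would show identically.
composed = "\u00e9.txt"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301.txt"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Without normalization the names differ, both as strings and as UTF-8 bytes.
names_differ = composed.encode("utf-8") != decomposed.encode("utf-8")

# After normalizing both sides to form C they compare equal.
names_match_after_nfc = nfc(composed) == nfc(decomposed)

# U+2ADC FORKING was added after Unicode 3.0 with a canonical decomposition
# (U+2ADD U+0338). It is excluded from composition, so form C keeps the
# decomposed sequence rather than the pre-composed character.
forking_nfc = nfc("\u2adc")

print(names_differ, names_match_after_nfc)
print([f"U+{ord(ch):04X}" for ch in forking_nfc])
```

Whether such a helper would run in the library stubs or in the kernel is
the implementation detail raised above; the sketch only shows the
normalization step itself.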