From: Dan Oscarsson (Dan.Oscarsson@kiconsulting.se)
Date: 02/03/03-10:29:58 AM Z
Message-Id: <200302031627.h13GR7n13942@malmo.trab.se>
Date: Mon, 3 Feb 2003 17:29:58 +0100 (CET)
From: Dan Oscarsson <Dan.Oscarsson@kiconsulting.se>
Subject: Re: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]

>happy (my security instincts say that nonetheless the server should
>enforce a normalization form, but I can't yet find a convincing security
>argument).

Well, many programs do not check for unnormalised text or for other normalisation forms and instead expect names to be in form C. Two names can then look identical on the display device while being different to the program, and that can result in security problems.

In all other areas I have seen working with text/UCS, they want one normalisation and one form to avoid more than one representation for the same thing. This is because of security matters and to make things much simpler to handle.

>> Even if I switched to UTF-8 as my local character set it will fail, if
>> the UTF-8 encoded text is not normalised form C. No other form
>> is acceptable to use due to things like invalid semantics, too
>> much data space and complex and CPU consuming handling of that format.
>
>User-level code will have to be able to cope with unnormalized Unicode
>text [by normalizing as necessary], with or w/o NFSv4.

Not at all. Most user and OS kernel code needs to handle only UTF-8 in form C, at least in the Unix/Linux world. That is the standard format used there.

>
>> You cannot expect systems to switch to unnormalised UTF-8 in their
>> file system to help NFSv4. It will break most applications.
>
>The C runtime environment will have to do as much normalization as is
>necessary under the hood, yes.

Not at all. The C runtime will read the file names through the kernel system calls and expect/deliver UTF-8 form C (or legacy encoding). Unnormalised or form D will break the applications.
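To illustrate the point about names that display the same but differ for the program, here is a minimal Python sketch. It is not part of the original message: the file name is hypothetical, and `same_name` is a helper invented for this example. It uses only the standard `unicodedata` module to show that form C and form D of one name are different code point sequences with different UTF-8 lengths, and that they compare equal only after both are brought to a single form.

```python
import unicodedata

# Hypothetical file name, used only for illustration.
name = "r\u00e9sum\u00e9.txt"  # written with precomposed U+00E9

# Form C keeps the precomposed letter; form D splits it into
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# The UTF-8 encodings are what a file system or NFS server would see.
nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalising both to form C."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain comparison treats the two forms as different names.
print("plain comparison equal:", nfc == nfd)
# Form D needs more bytes for the same visible text.
print("UTF-8 bytes, form C / form D:", len(nfc_bytes), len(nfd_bytes))
# Only a comparison that normalises first sees them as the same name.
print("equal after normalising:", same_name(nfc, nfd))
```

A program that compares the raw bytes, as most do, would accept both as distinct file names in one directory even though a terminal would typically render them identically.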
There is no reason to allow unnormalised or form D text, as that will just require more memory, more complex code and more CPU power. If we get NFSv4 code into the kernel that does not return the file names as UCS form C (or legacy encoding), we will definitely get problems.

>Pages 572 and 573 of the same book mentioned above make it very clear to
>me that composites added after Unicode 3.0 are disallowed in text
>normalized to form C. The book references Unicode Annex #15.
>

Yes, that is to make it unnecessary for form C to have new precomposed characters being added later. But as each revision comes along, new characters will be included in form C too.

Regards,
Dan