From: Nicolas Williams (Nicolas.Williams@sun.com)
Date: 02/06/03-11:37:07 AM Z
Date: Thu, 6 Feb 2003 11:37:07 -0600
From: Nicolas Williams <Nicolas.Williams@sun.com>
Subject: Re: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]
Message-ID: <20030206113707.Z18728@binky.central.sun.com>

On Thu, Feb 06, 2003 at 09:30:08AM -0800, Noveck, Dave wrote:
> I want to separate the two issues.  Checking UTF-8 and
> checking form C.  These two are, in fact, distinct.
> It's my impression that the spec already requires that
> you check stuff that's supposed to be utf-8 and reject it
> if it isn't valid utf-8.  I can't quote a specific statement
> to that effect but when it says that that's what's valid, my
> conclusion is that anything else is invalid.  In fact,
> on my bug list (way down, to be sure) is a problem that
> Peter Astrand's test suite complains that I don't reject
> a *tag* that doesn't consist of valid UTF-8.

The spec speaks of UTF-8 throughout - I don't see how it could be
construed as correct behaviour to allow non-UTF-8 encodings or invalid
UTF-8 encodings (e.g., overlong sequences).

> As to checking form C, it really doesn't make much difference
> to me whether the spec requires the server to check it.  Saying
> that the client has to produce correct form C, but that the
> server doesn't have to check it, with the client getting "weird"
> results if he uses the wrong form, is not something that I
> would be prepared to live with.  I would wind up checking
> the normalization rather than having to wonder, in any
> internationalization situation in which something weird
> happened, whether a client with bad normalization was involved.

It would be useful to have benchmarks of Unicode normalization.

> I'm assuming that checking for valid normalization form C
> is simpler than actually doing the normalization.  I'm hoping
> that for most actual strings processed the execution time
> will be roughly comparable to just checking UTF-8, even though
> there will obviously be examples that are considerably more
> expensive to process.
Your assumption is correct.  I don't know about the perf aspect, but
users of mostly-ASCII filenames would likely not notice any impact.

Cheers,

Nico
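
To illustrate the two separate checks discussed in this message (well-formed UTF-8 versus normalization form C), here is a minimal sketch in Python. It is not taken from the NFSv4 draft or from any server implementation. The function names `is_valid_utf8` and `is_nfc` are hypothetical, and the sketch assumes Python 3.8 or later for `unicodedata.is_normalized`. It says nothing about the performance question raised above.

```python
import unicodedata


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8.

    Python's strict decoder rejects overlong sequences and other
    ill-formed input, which is the kind of rejection discussed above.
    """
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def is_nfc(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8 and already in form C.

    This only checks the form; it does not return a normalized copy.
    """
    # Pure ASCII is always valid UTF-8 and always already in NFC,
    # so mostly-ASCII names can skip the Unicode work entirely.
    if raw.isascii():
        return True
    try:
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return unicodedata.is_normalized("NFC", text)


if __name__ == "__main__":
    overlong_slash = b"\xc0\xaf"                      # overlong encoding of "/"
    composed = "\u00e9".encode("utf-8")               # precomposed e-acute (NFC)
    decomposed = "e\u0301".encode("utf-8")            # "e" + combining acute (not NFC)

    print(is_valid_utf8(overlong_slash))  # False
    print(is_valid_utf8(decomposed))      # True: valid UTF-8, wrong form
    print(is_nfc(composed))               # True
    print(is_nfc(decomposed))             # False
```

The decomposed example shows why the two issues are distinct: the byte string is perfectly valid UTF-8, yet it is not in form C, so a server that checks only the encoding would accept it.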