Re: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]

From: Nicolas Williams (Nicolas.Williams@sun.com)
Date: 02/06/03-11:37:07 AM Z


Date: Thu, 6 Feb 2003 11:37:07 -0600
From: Nicolas Williams <Nicolas.Williams@sun.com>
Subject: Re: [Dan.Oscarsson@kiconsulting.se: Comments on NFSv4 rfc3010bis-05 draft]
Message-ID: <20030206113707.Z18728@binky.central.sun.com>

On Thu, Feb 06, 2003 at 09:30:08AM -0800, Noveck, Dave wrote:
> I want to separate the two issues.  Checking UTF-8 and
> checking form C.

These two are, in fact, distinct.

> It's my impression that the spec already requires that 
> you check stuff that's supposed to be utf-8 and reject it
> if it isn't valid utf-8.  I can't quote a specific statement
> to that effect but when it says that that's what's valid, my 
> conclusion is that anything else is invalid.  In fact,
> on my bug list (way down, to be sure) is a problem that
> Peter Astrand's test suite complains that I don't reject 
> a *tag* that doesn't consist of valid UTF-8.

The spec speaks of UTF-8 throughout - I don't see how it could be
construed as correct behaviour to allow non-UTF-8 encodings or invalid
UTF-8 encodings (e.g., overlong sequences).
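
To make the "invalid UTF-8" case concrete, here is a minimal sketch in
Python of the kind of check being discussed. It is not taken from the spec
or from any NFSv4 implementation; the helper name is_valid_utf8 and the
sample byte strings are made up for illustration. It relies on Python's
strict UTF-8 decoder, which rejects overlong sequences, truncated
sequences, and encoded surrogates.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        # The strict decoder raises on overlong forms, stray continuation
        # bytes, truncated sequences, and encoded surrogates.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Illustrative inputs only; the labels are assumptions, not spec text.
    samples = {
        "plain ASCII": b"file.txt",
        "e-acute as two bytes": b"caf\xc3\xa9",
        "overlong encoding of '/'": b"\xc0\xaf",
        "Latin-1 e-acute, not UTF-8": b"caf\xe9",
        "truncated three-byte sequence": b"\xe2\x82",
    }
    for label, raw in samples.items():
        print(label, "->", "accept" if is_valid_utf8(raw) else "reject")
```

A server-side check along these lines would presumably apply to any field
the protocol declares as UTF-8, including the tag mentioned above.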

> As to checking form C, it really doesn't make much difference
> to me whether the spec requires the server to check it.  Saying 
> that the client has to produce correct form C, but that the 
> server doesn't have to check it, with the client getting "weird" 
> results if he uses the wrong form, is not something that I
> would be prepared to live with.  I would wind up checking
> the normalization rather than having to wonder, in any 
> internationalization situation in which something weird
> happened, whether a client with bad normalization was involved.

It would be useful to have benchmarks of Unicode normalization.
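
As a rough idea of what such a measurement could look like, here is a
small Python sketch. The function name bench and the sample names are
assumptions for illustration, and it measures Python's unicodedata module
(is_normalized needs Python 3.8 or later), not any NFS server code, so the
numbers would only hint at relative cost.

```python
import timeit
import unicodedata


def bench(names, number=1000):
    """Time three per-name operations over a list of str.

    Hypothetical helper: returns total seconds for `number` passes of
    each operation over all names.
    """
    encoded = [n.encode("utf-8") for n in names]
    return {
        # Validate UTF-8 only (strict decode).
        "utf8_check": timeit.timeit(
            lambda: [b.decode("utf-8") for b in encoded], number=number),
        # Check whether each name is already in form C.
        "nfc_check": timeit.timeit(
            lambda: [unicodedata.is_normalized("NFC", n) for n in names],
            number=number),
        # Actually normalize each name to form C.
        "nfc_normalize": timeit.timeit(
            lambda: [unicodedata.normalize("NFC", n) for n in names],
            number=number),
    }


if __name__ == "__main__":
    # Mix of ASCII, precomposed, decomposed, and CJK names (made up).
    sample = ["Makefile", "caf\u00e9.txt", "cafe\u0301.txt",
              "\u65e5\u672c\u8a9e.txt"]
    for label, seconds in bench(sample).items():
        print(f"{label}: {seconds:.4f}s")
```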

> I'm assuming that checking for valid normalization form C
> is simpler than actually doing the normalization.  I'm hoping
> that for most actual strings processed the execution time
> will be roughly comparable to just checking UTF-8, even though
> there will obviously be examples that are considerably more 
> expensive to process.

Your assumption is correct.  I don't know about the perf aspect, but
users of mostly-ASCII filenames would likely not notice any impact.
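
The difference between checking form C and producing it can be sketched
in Python as follows. This is only an illustration under assumptions: the
helper name is_nfc is made up, the ASCII shortcut is my own addition
(ASCII-only strings are always in form C), and unicodedata.is_normalized
requires Python 3.8 or later.

```python
import unicodedata


def is_nfc(name: str) -> bool:
    """Return True if name is already in normalization form C.

    Hypothetical helper. Pure-ASCII names are trivially in form C, so
    they skip the Unicode check entirely.
    """
    if name.isascii():
        return True
    # Checks the form without building a normalized copy of the string.
    return unicodedata.is_normalized("NFC", name)


if __name__ == "__main__":
    precomposed = "caf\u00e9"    # e-acute as one code point
    decomposed = "cafe\u0301"    # 'e' followed by combining acute accent
    print(is_nfc("readme.txt"))  # ASCII fast path
    print(is_nfc(precomposed))
    print(is_nfc(decomposed))
    # Producing form C, as opposed to merely checking for it:
    print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

A server that only wants to reject badly normalized names needs just the
check; the normalize call is shown only for contrast.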

Cheers,

Nico
-- 


This archive was generated by hypermail 2.1.2 : 03/04/05-02:12:05 AM Z CST