[Servercert-wg] Discussion Period Begins on Ballot SC48 - Domain Name and IP Address Encoding

Corey Bonnell Corey.Bonnell at digicert.com
Wed Jul 14 12:40:08 UTC 2021


Hi Yoshiro,
> The term "P-Label" is newly defined and is not familiar with readers.  So
> I think it is not appropriate, then, I propose to use "ACE prefixed label"
> instead, because A-label is too restrictive for the purpose.  ACE (stands
> for ASCII Compatible Encoding) prefix is well defined in RFC 3490 (IDNA2003)
> and effective in RFC 5890 (IDNA2008) as well.

The challenge here is that we need to express a type of domain label
with the following qualities:
1. Is an XN-label
2. Contains valid output from the Punycode algorithm. This Punycode output
may be the encoding of an IDNA 2003, UTS 46, or IDNA 2008 label, among
others. A bug/feature (depending on whether you think CAs should issue for
registered domains that contain invalid IDNs) is that labels holding valid
Punycode output that are nonetheless invalid IDNs may appear.
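
To make the two qualities concrete, here is a minimal Python sketch. It is
an illustration only, not ballot language: the name is_p_label and the LDH
pre-check are assumptions of this example, and it relies solely on Python's
built-in "punycode" codec. It deliberately performs no IDNA 2003, UTS 46,
or IDNA 2008 validation, which is exactly the property described in item 2.

```python
import re

# Hypothetical helper, for illustration only; not text from the ballot.
# Assumption: an XN-label is an LDH label (letters, digits, hyphens, at most
# 63 octets, not ending in a hyphen) that starts with "xn--" in any case.
_XN_LABEL = re.compile(r"xn--[a-z0-9-]{0,58}[a-z0-9]", re.IGNORECASE | re.ASCII)


def is_p_label(label: str) -> bool:
    """Return True if label is an XN-label whose remainder decodes as Punycode."""
    if not _XN_LABEL.fullmatch(label):
        return False
    try:
        # Python's built-in "punycode" codec implements RFC 3492 decoding.
        label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    # No IDNA validity check happens here, so a label holding disallowed
    # code points still passes as long as the Punycode itself decodes.
    return True


print(is_p_label("xn--bcher-kva"))  # "bcher-kva" is the Punycode form of "bücher"
print(is_p_label("aa--foobarbaz"))  # not an XN-label
```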

As far as I know, no standard available today defines a "slot" that can
hold IDNA 2003, UTS 46, and IDNA 2008 labels alike. Since we have no
standard definition to reference, we by necessity need to define our own
type of label. While I realize "P-Label" is a novel term, we have
historically created our own terminology to concisely refer to concepts
that only exist in the BRs. The issue with using "ACE prefixed label" is
that it is too permissive for our purposes; section 5 of RFC 3490 [1]
defines it as any domain label that begins with two alphanumeric ASCII
characters followed by two hyphens. For example, "aa--foobarbaz" is an ACE
prefixed label.
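
The difference in scope can be sketched in a few lines of Python. The
pattern below encodes the broad reading described above (two alphanumeric
ASCII characters followed by two hyphens); the function names are
hypothetical and exist only for this example.

```python
import re

# Assumption: this pattern encodes the broad reading described above
# (any two alphanumeric ASCII characters followed by two hyphens).
ACE_PREFIXED = re.compile(r"[A-Za-z0-9]{2}--")


def is_ace_prefixed(label: str) -> bool:
    """True if the label starts with two ASCII alphanumerics and '--'."""
    return ACE_PREFIXED.match(label) is not None


def is_xn_label(label: str) -> bool:
    """True only for the specific "xn--" prefix, in any capitalization."""
    return label[:4].lower() == "xn--"


# Print each label with the result of the broad and the narrow check.
for label in ("xn--love", "aa--foobarbaz", "example"):
    print(label, is_ace_prefixed(label), is_xn_label(label))
```

Under this reading, "aa--foobarbaz" passes the broad check but not the
"xn--" check, which is why the narrower term is needed.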

>   - [middle term target] CA should validate ACE prefixed label (at least)
> adjacent to PSL (stands for Public Suffix List) according to IDNA2008.
>     ex: for "www.xn--love.com", the PSL is "com" so that "xn--love" should
> be validated according to IDNA2008.

I agree that we should discuss whether restricting the repertoire of allowed
code points on a per-TLD basis is a viable path forward. There are
complications here, since there are legacy registered domains under the
"com" TLD that are valid under IDNA 2003/UTS 46 but invalid under
IDNA 2008 (e.g., xn--19g.com, which is mentioned in section 2 of
UTS 46).
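
As a rough sketch of what a per-TLD repertoire check could look like, the
Python below picks the label adjacent to a caller-supplied public suffix,
decodes it with the built-in "punycode" codec, and tests each code point
against an allowed set. The ALLOWED table, the "example" suffix, and both
function names are invented for this illustration; real repertoires would
come from registry policy, and Public Suffix List lookup is out of scope.

```python
import string

# Hypothetical per-TLD repertoires, invented for this example. Real registry
# IDN tables are published by each registry and are far larger.
ALLOWED = {
    "example": set(string.ascii_lowercase + string.digits + "-" + "äöü"),
}


def label_next_to_suffix(fqdn: str, suffix: str) -> str:
    """Return the label immediately to the left of the given public suffix."""
    name = fqdn.rstrip(".").lower()
    suffix = suffix.strip(".").lower()
    if not name.endswith("." + suffix):
        raise ValueError(f"{fqdn!r} is not under suffix {suffix!r}")
    return name[: -len(suffix) - 1].split(".")[-1]


def within_repertoire(label: str, suffix: str) -> bool:
    """Decode an xn-- label and test each code point against the suffix's set."""
    allowed = ALLOWED.get(suffix, set())
    if label.startswith("xn--"):
        try:
            text = label[4:].encode("ascii").decode("punycode")
        except UnicodeError:
            return False
    else:
        text = label
    return all(ch in allowed for ch in text)


label = label_next_to_suffix("www.xn--bcher-kva.example", "example")
print(label, within_repertoire(label, "example"))
```

A check of this shape would reject legacy names whose code points fall
outside the chosen repertoire, which is the complication noted above.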

> [middle term target] Current BR does not define how to handle EAI (stands
> for Email Address Internationalization) in certificate (which is defined in
> RFC 8399), so it should be addressed.  Especially, how to designate EAI in
> CAA RR should be clarified (which is not defined in RFC 8399).

RFC 8398 has been discussed quite a bit in the S/MIME WG, and CAA processing
for S/MIME certificate issuance has also been briefly discussed.
None of these details have been fully fleshed out or finalized, so your
participation and insights in that working group would be very welcome.

Thanks,
Corey

[1] https://datatracker.ietf.org/doc/html/rfc3490#section-5

-----Original Message-----
From: Yoshiro YONEYA <yoshiro.yoneya at jprs.co.jp> 
Sent: Wednesday, July 14, 2021 4:28 AM
To: Corey Bonnell <Corey.Bonnell at digicert.com>
Cc: Ryan Sleevi <sleevi at google.com>; CA/B Forum Server Certificate WG Public
Discussion List <servercert-wg at cabforum.org>
Subject: Re: [Servercert-wg] Discussion Period Begins on Ballot SC48 -
Domain Name and IP Address Encoding

Hi Corey and Ryan,

Thank you for your responses.

I think I understand Corey's point.  Delaying the BR revision is not my
intention; my comments were meant to clarify terminology and references.
I'd like to continue the discussion about how to validate domain names
written in certificates, and I hope to revise the BRs step by step at
appropriate times (not this time).

My understanding of the purpose of SC48 matches the former part of Ryan's
point (one views SC48 as clarifying existing requirements), and with that
understanding, I rephrase my comments as follows:

  - The term "P-Label" is newly defined and is not familiar with readers.
So I think it is not appropriate, then, I propose to use "ACE prefixed
label" instead, because A-label is too restrictive for the purpose.  ACE
(stands for ASCII Compatible Encoding) prefix is well defined in RFC 3490
(IDNA2003) and effective in RFC 5890 (IDNA2008) as well.

The following are my notes for future/further discussion.

  - [middle term target] CA should validate ACE prefixed label (at least)
adjacent to PSL (stands for Public Suffix List) according to IDNA2008.
    ex: for "www.xn--love.com", the PSL is "com" so that "xn--love" should
be validated according to IDNA2008.

  - [middle term target] Current BR does not define how to handle EAI
(stands for Email Address Internationalization) in certificate (which is
defined in RFC 8399), so it should be addressed.  Especially, how to
designate EAI in CAA RR should be clarified (which is not defined in RFC
8399).

  - [long term target] Once SLD LGRs (stands for Second Level Domain names
Label Generation Rules) are deployed to a moderate extent in the future,
CAs should validate the label (not limited to IDNs) adjacent to the PSL
according to the SLD LGR of that PSL.  (Please see
<https://www.icann.org/public-comments/lgr-second-level-2-2020-03-30-en> for
background reading.)

Best regards,

--
Yoshiro YONEYA <yoshiro.yoneya at jprs.co.jp>

On Tue, 13 Jul 2021 18:33:23 +0000 Corey Bonnell
<Corey.Bonnell at digicert.com> wrote:

> Hi Ryan,
> 
> Comments inline.
> 
>  
> 
> > I believe the point being made here is that they're already prohibited
> > today, by IDNA. SAC095 [1] provides the citations to explain how that flows,
> > and this is true for both IDNA2003 and IDNA2008 (that is, that these
> > characters are DISALLOWED).
> 
>  
> 
> Agreed: emoji are disallowed by all three IDNA standards.
> 
>  
> 
> > As I understand it (and I hope I'm not misrepresenting the concerns),
> > the concern here is that the definition of P-Label then has the same
> > functional effect (on invalid IDNA) that 202 would have had on underscores:
> > It ends up explicitly permitting them, when today, they are implicitly
> > forbidden (by IDNA).
> 
>  
> 
> This raises the question of what role the CA plays in prohibiting or
> allowing domain names which conform to preferred name syntax but whose
> presentation format in a higher-level "protocol" (i.e., IDNA) may be
> incorrect. Despite IDNA 2003 being prescribed by RFC 5280, the consensus a
> few years ago was that CAs are permitted to include labels that violate IDNA
> 2003 in certificates [1] as they are not expected to check for such labels.
> Given this precedent, CAs are currently not required by at least one Root
> Program to check adherence to IDNA, as the role of the CA is to validate
> control of the Applicant's domain.
> 
>  
> 
> This historical view is further bolstered by Root Programs' statements
> that CAs should use ACE encoding exclusively and not encode CN as U-Labels,
> as the presentation of IDNs is a user client concern and should not be
> forced by the CA. In other words, there's a clean "separation of concerns":
> the CA is responsible for the validation of domains, and the browser is
> responsible for the presentation of such domains.
> 
>  
> 
> > I think the core perspective issue is whether one views SC48 as
> > clarifying existing requirements or whether it's introducing requirements
> > that don't exist. If it's the former, the change to P-Label's definition to
> > reiterate the prohibition on DISALLOWED characters is no change (provided
> > they're disallowed by both IDNA2003 and 2008). If it's the latter, however,
> > then the change would be imposing a more stringent requirement than
> > practiced today, and something to be measured just in case. Similarly, if
> > stating existing requirements is the goal, then the current P-Label
> > definition doesn't state existing requirements, but in fact opens them up
> > more widely (similar to underscore), while if the latter, then the current
> > definition is a stepping stone to a more consistent set.
> 
>  
> 
> From the Ballot 202 discussions, it was clear that browser expectations,
> written requirements and interpretations held by CAs were not aligned. This,
> coupled with the failure of Ballot 202 and the MDSP discussion referenced
> above, leaves us in a state where there are no clear requirements regarding
> CA obligations for IDN processing. It is also important to note that the
> definition of P-Label is not less restrictive than the validation
> requirements for XN-labels as proposed in Ballot 202. The intent of SC48 is
> to align those expectations with policy, at least to the extent proposed in
> Ballot 202. And for that, it is a useful improvement. By no means is it
> intended to be an end-state.
> 
>  
> 
> > While I'm firmly of the belief that the goal is to state existing
> > (longstanding) requirements, I understand that as with many other technical
> > requirements, CAs are often unaware of them, and thus their customers have
> > come to depend on the invalid behaviour, and so transition time is needed.
> > So I'm supportive of deferring this work - not because our goal is to allow
> > these invalid characters (which I agree with Yoshiro's seeming conclusion,
> > that this language will be read to be doing exactly that), but to minimize
> > disruption or reasons for CAs to (wrongly) object, as they did with Ballot
> > 202.
> 
>  
> 
> As I explained above, the current written requirements regarding IDN
> validation for CAs are not clear, as they do not state what the role of the
> CA, as an ecosystem participant, is regarding enforcing IDNA adherence.
> Several TLDs continue to permit invalid IDN registered domains despite the
> ICANN-provided guidance that you pointed to in SAC095. And as I mentioned
> previously, clients accept such invalid domain names. If the role of the CA
> is to also encompass the validation of the well-formedness of IDNs in
> addition to Punycode output, then the requirements should state that. Clear
> guidance in this regard is also incredibly important, as the definition of
> "valid IDNA" is dependent upon several inputs to the ToASCII algorithm.
> These are currently unstated for the web PKI and prescribing a particular
> set of inputs may cause breakage in certain user agents.
> 
>  
> 
> > I'm not supportive of using the behaviour of client software as a
> > justification for misissuance (#1 from above), no different than any other
> > issue. As I understand it, these have always been forbidden, so I would
> > hope and expect to see CA incident reports (or an analysis about why SAC
> > 095 is wrong). But I wouldn't want to fully block progress here on that.
> 
>  
> 
> Given the discussion on MDSP as well as the proposed language in Ballot
> 202, I would be interested to see where in the relevant documents (BRs,
> RFCs, etc.) it is stated that CAs must not include domain names in
> certificates if they violate IDNA.
> 
>  
> 
> Thanks,
> 
> Corey
> 
>  
> 
> [1] https://groups.google.com/g/mozilla.dev.security.policy/c/ad6NfLGZ730/m/2XoMxZ4FFQAJ
> 
>  
> 
> From: Ryan Sleevi <sleevi at google.com>
> Sent: Tuesday, July 13, 2021 1:06 PM
> To: Corey Bonnell <Corey.Bonnell at digicert.com>; CA/B Forum Server 
> Certificate WG Public Discussion List <servercert-wg at cabforum.org>
> Cc: Yoshiro YONEYA <yoshiro.yoneya at jprs.co.jp>
> Subject: Re: [Servercert-wg] Discussion Period Begins on Ballot SC48 - 
> Domain Name and IP Address Encoding
> 
> On Tue, Jul 13, 2021 at 6:53 AM Corey Bonnell via Servercert-wg
> <servercert-wg at cabforum.org> wrote:
> 
> Hi Yoshiro,
> Comments inline.
> 
> > I understand what you said, but I'm still feeling that the concept of
> > "P-Label" is too wide and including invalid IDNs (even in IDNA2003,
> > which prohibits registering unassigned code points). Emoji domain names
> > are the case of such invalid IDNs.  I think registering such invalid
> > IDNs is TLD registry operators' fault, and CAs should not provide
> > patchwork to rescue that fault.
> 
> As I understand it, these domains with invalid code points are
> supported in client software today. Prohibiting certificate issuance
> for a whole class of registered domains is a rather large and
> potentially disruptive change. While the topic is certainly worthy of
> discussion and potential future proposals, I don't think such a
> proposal fits with the shorter implementation timelines proposed in this
> ballot.
> 
>  
> 
> Corey,
> 
>  
> 
> I believe the point being made here is that they're already prohibited
> today, by IDNA. SAC095 [1] provides the citations to explain how that flows,
> and this is true for both IDNA2003 and IDNA2008 (that is, that these
> characters are DISALLOWED).
> 
>  
> 
> I believe the point that you're trying to highlight is that despite them
> being disallowed:
> 
> 1. Client software has not consistently enforced this on the client side
> 2. This bears similarity to how underscores ('_') were always prohibited
>    by the "preferred name syntax", but still required a ballot (SC12) to
>    adopt and explicitly sunset that for CAs.
> 3. Separate from any TLD policies, subdomains can, have been, and are
>    registered with these invalid characters (e.g. xn--ls8h.example.com is
>    a domain that can be registered today, despite TLD policies)
> 
> As I understand it (and I hope I'm not misrepresenting the concerns), the
> concern here is that the definition of P-Label then has the same functional
> effect (on invalid IDNA) that 202 would have had on underscores: It ends up
> explicitly permitting them, when today, they are implicitly forbidden (by
> IDNA).
> 
>  
> 
> I think the core perspective issue is whether one views SC48 as clarifying
> existing requirements or whether it's introducing requirements that don't
> exist. If it's the former, the change to P-Label's definition to reiterate
> the prohibition on DISALLOWED characters is no change (provided they're
> disallowed by both IDNA2003 and 2008). If it's the latter, however, then the
> change would be imposing a more stringent requirement than practiced today,
> and something to be measured just in case. Similarly, if stating existing
> requirements is the goal, then the current P-Label definition doesn't state
> existing requirements, but in fact opens them up more widely (similar to
> underscore), while if the latter, then the current definition is a stepping
> stone to a more consistent set.
> 
>  
> 
> While I'm firmly of the belief that the goal is to state existing
> (longstanding) requirements, I understand that as with many other technical
> requirements, CAs are often unaware of them, and thus their customers have
> come to depend on the invalid behaviour, and so transition time is needed.
> So I'm supportive of deferring this work - not because our goal is to allow
> these invalid characters (which I agree with Yoshiro's seeming conclusion,
> that this language will be read to be doing exactly that), but to minimize
> disruption or reasons for CAs to (wrongly) object, as they did with Ballot
> 202.
> 
>  
> 
> I'm not supportive of using the behaviour of client software as a
> justification for misissuance (#1 from above), no different than any other
> issue. As I understand it, these have always been forbidden, so I would hope
> and expect to see CA incident reports (or an analysis about why SAC 095 is
> wrong). But I wouldn't want to fully block progress here on that.
> 
>  
> 
> [1] https://www.icann.org/en/system/files/files/sac-095-en.pdf
> 
