[Servercert-wg] Ballots SC20 and SC21

Mon Jun 3 11:54:40 MST 2019

On Mon, 3 Jun 2019, Ryan Sleevi wrote:

> On Sun, Jun 2, 2019 at 9:04 PM Tobias S. Josefowitz <tobij at opera.com> wrote:
>
>>> However, in this case, I'm considering the 'when things go wrong'
>>> scenario.
>>>
>>> When Hans doesn't do this, and the Web server gets popped and an XSS is
>>> added to slurp the RA's/CA's credentials, and then use that to manually
>>> approve a certificate for 'google.com' (for example), I go ask the CA for
>>> an incident report and ask them to tell me what they were doing and why
>>> their systems should have fixed this. When they reveal Hans wasn't actually
>>> doing this, it's clear that the CA was violating the NetSec requirements,
>>> and calls into question what the auditor was doing to assess that Hans was
>>> and had been, since they clearly weren't. We've got an unambiguous
>>> violation of the requirements.
>>
>> And here I am unsure if you do not give the current 1h of the NSRs more
>> credit than it deserves. First of all, "[Human] Review" certainly includes
>> a bit of "best effort"; if not in the scope of items to be reviewed (even
>> that is unclear to me), then certainly in what will be the result of the
>> review. We do, after all, incorporate the human element, and humans make
>> mistakes (or so they say).
>>
>
> I don't think that in any way excuses things, as the past decade of the CA
> Web PKI has shown us. A human failure is, unfortunately, the most common
> failure it seems for CAs today - again, simply looking at the number of
> "Some-State" / "Some-City" certificates, and the challenges some CAs are
> facing (claiming they need 2+ months to replace), shows we've got many
> systemic issues.
>
> If a CA has not designed a system robust to account for Hans' human error,
> then the fault lies with the CA, full-stop. They're not meeting the level
> of expectation and trust placed in them.
>
>
>> Imagine the scenario of a CA based on Windows-family operating systems. An
>> adversary makes a change to the registry that cripples the security of the
>> CA. Now, is the registry configuration? Are only some parts of it
>> configuration? Can we blame Hans if he misses a change (I am assuming the
>> registry may change wildly, this may not be the case in the real world),
>> and can we blame Hans for deeming a change in it - or any configuration
>> actually - to be benign when it is not?
>>
>
> These are all questions the CA needs to address, and must take the
> maximally conservative definition upon these in order to uphold the trust
> placed in them. If and when such an issue is detected, the factors that

Ok, now what if the CA needed to address this as part of a documented
assessment? Especially if we assume (whether or not the currently proposed
language achieves that) that the assessment is to be documented
specifically so that the assessment can itself be assessed.

Granted, the proposed language indeed is weaker in the sense that it 
captures only security-relevant configuration. I see no reason to assume 
CAs would not have to "take the maximally conservative definition upon 
these in order to uphold the trust placed in them".

> went into that decision, the risk analysis, the number of controls and
> design, all of those factor in to the incident report, so that we can
> formalize best practices, and look at both the specific implementation
> questions as well as the generalized rule.
>
> I don't look at Hans' specifically, but I think any CA that fails to ask
> themselves these questions, fails to take a view that says "For safety, we
> should say yes", and fails to design systems to account for that as being
> problematic.

I perceive a bit of an interesting gap in how you describe "Review 
configurations of Issuing Systems, Certificate Management Systems, 
Security Support Systems, and Front-End / Internal-Support Systems" (CAs 
have a very specific purpose and they therefore must use utmost caution, 
err on the side of safety, and apply the widest possible safety margin, 
and implement this in the most sensible way with layers and layers of 
controls and remediations) compared to what you say about "the CA shall 
identify which configurations of Issuing Systems, Certificate Management 
Systems, Security Support Systems, and Front-End / Internal-Support 
Systems are security relevant. Such configurations shall be systematically 
implemented, policy- and standard violations shall be detected" (CAs will 
game the system for no particular reason or gain and can absolutely not be 
trusted to do anything right). I find I have trouble closing this gap I 
perceive.

There is a way to read SC20 to mean "why yes of course you can use tooling 
to support the detection of violating or otherwise unwanted configuration 
(changes) as opposed to having to have a human look at loads and loads of 
stuff, and in turn please document the judgement calls you will 
undoubtedly have to make anyway and bind yourself to remedy any findings". 
I know this because that is essentially how I read it. You obviously very
strongly do not read it like this, and that may very well be a failure on
the part of those of us who worked on the Ballot Draft, but we tried.

>> And last but not least, what if a CA says "yes sure, Hans complained 
>> about this issue since 2016, but we are not in a habit of listening to 
>> him anyway"; the current language in the NSRs does not require any 
>> resulting action whatsoever. That is, unless you consider at least some 
>> configuration issues found under review to be a "Critical 
>> Vulnerability" (do you?). Or am I missing another mechanism?
>>
>
> I do agree that the proposal benefits from placing remediation in scope.
> However, and again, this is nothing new as the past several years of
> incident reports show, a more practical and common answer is "Yes, Hans
> complained about this for years, so we scheduled this for our 2021
> refresh." And that's part of why I say the proposed improvement is... not
> much.

Indeed, if what you describe is the norm, the gain would be frustratingly 
small. I wonder though if this would best be improved as part of the

>
>> That said, what would we need to do language-wise to put you in a 
>> position to act on a "less-than-stellar" assessment of 
>> security-relevant configurations in the same way that you say you could 
>> act on review failures?
>>
>
> I think the existing language, while not perfect, addresses that better
> than the proposed language, in that it reduces ambiguity and
> interpretation. I'm still uncertain about the need to leave things
> generalized by limiting it to a subset of the affected systems, a subset of
> the configuration, or ambiguous statements such as "where applicable". In

I feel the need to point out that the current 1h also restricts the scope 
of review to "Issuing Systems, Certificate Management Systems, Security 
Support Systems, and Front-End / Internal-Support Systems".

> short, I don't think there is or should be room for interpretation, or when
> it is afforded, it should be accounted for with some degree of
> transparency; whether in the results, the disclosure, or some other form is
> perhaps situation dependent.

Well, what if it is not applicable? By listing some items explicitly
(which the current NSRs just do not) we would force CAs to check them even
if they simply do not exist, for example on an offline system that is so
offline that it has no network hardware, no network drivers, and no
network stack.

> Today, that transparency is partially accounted for through the incident
> reports that Mozilla makes public, which helps the whole community - Forum
> included - better understand where challenges are. If there are specific
> cases you can demonstrate where non-compliance existed, but was deemed to
> be reasonable, that might help find better language - as we saw in the
> revocation discussions. If there are specific implementation examples where
> the existing requirements prevent or create challenges for CAs to implement
> reasonable mitigations, that too is useful to share - as we saw in the
> validation discussions. But I don't think we'll end up with good results
> trying to abstract it away.

The motivation has not been to abstract anything away. To repeat the 
motivations:

* Automated tooling to detect violating or unwanted changes would be
   compliant, and human review would not need to be implemented insofar as
   the tooling covers the issue; we assume this gives better results than
   human review
* The assessment of scope would now be explicit and documented
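
To make the first point a bit more concrete, here is a minimal sketch of
what such tooling might look like. Everything in it is hypothetical: the
setting names, the baseline values, and the function name are made up for
illustration and are not taken from the NSRs or the ballot text.

```python
# Hypothetical sketch: compare a system's observed configuration against a
# documented baseline of security-relevant settings and report deviations.

# Assumed baseline. Which settings count as security relevant is exactly the
# documented judgement call the CA would have to make.
BASELINE = {
    "ssh.permit_root_login": "no",
    "tls.min_version": "1.2",
    "audit.logging": "enabled",
}


def find_violations(observed, baseline=BASELINE):
    """Return findings for baseline settings that are missing or deviate."""
    findings = []
    for key, expected in sorted(baseline.items()):
        actual = observed.get(key)
        if actual is None:
            findings.append(f"{key}: missing (expected {expected!r})")
        elif actual != expected:
            findings.append(f"{key}: {actual!r} (expected {expected!r})")
    return findings


if __name__ == "__main__":
    # Made-up observed state of one system.
    observed = {"ssh.permit_root_login": "yes", "tls.min_version": "1.2"}
    for finding in find_violations(observed):
        print(finding)
```

Settings that are not part of the baseline are ignored here, which mirrors
the restriction to security-relevant configuration; any findings would
still need to feed into whatever remediation the CA has bound itself to.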

My personal view additionally is that, once CAs have adopted this, they
would probably want to use the systems they use for implementation and
detection for other, out-of-scope systems as well. Having moved to
tooling-supported configuration implementation and violation detection,
we may then (more) easily get CA support for extending the scope and
reducing the timeline for detection.