[Servercert-wg] Ballot SC-52 version 2: Specify CRL Validity Intervals in Seconds

Wed Jan 5 18:53:57 UTC 2022

On 5/1/2022 6:43 PM, Ryan Sleevi wrote:
>
>
> On Wed, Jan 5, 2022 at 7:31 AM Dimitris Zacharopoulos (HARICA) 
> <dzacharo at harica.gr> wrote:
>
>     I clearly understand your opinion about the "ceiling"
>     requirements. However, this is not what those documents currently
>     say.
>
>
> Could you be more specific about what you believe the documents say?

They say "at least as" which means a CA would expect to be compliant if 
it fulfilled the absolute minimum. However, with the introduction of 
one-second precision requirements, a CA that was previously compliant with 
just the bare minimum is now out of compliance because of one second. I 
hear you about trying 
to get CAs to go above and beyond but in reality most TLS certificates 
are still issued with 397-day validity (with a few exceptions :-). This 
is a message that you have tried to convey to the CAs and all I'm saying 
is that we could write it in the document so it's abundantly clear. 
After seeing incidents of CAs missing deadlines by seconds, and since 
this ballot introduces language that defines an hour and day in the 
precision of a second, I think it should include such a warning 
statement. I don't see any harm in doing that.
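
To illustrate how a single second can decide compliance, here is a minimal sketch. It assumes a day is counted as exactly 86,400 seconds, that the validity period includes both the notBefore and notAfter instants (as in RFC 5280), and that 397 days is the SHOULD limit and 398 days the SHALL limit mentioned in this thread. The function names are hypothetical and not taken from any CA software.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86_400                       # assumption: no leap seconds counted
RECOMMENDED_SECONDS = 397 * SECONDS_PER_DAY    # soft limit (SHOULD)
MAX_VALIDITY_SECONDS = 398 * SECONDS_PER_DAY   # hard limit (SHALL)


def validity_seconds(not_before: datetime, not_after: datetime) -> int:
    """Length of the validity period in seconds, counting both endpoints."""
    return int((not_after - not_before).total_seconds()) + 1


def classify(not_before: datetime, not_after: datetime) -> str:
    """Classify a validity period against the soft and hard limits."""
    seconds = validity_seconds(not_before, not_after)
    if seconds > MAX_VALIDITY_SECONDS:
        return "non-compliant"
    if seconds > RECOMMENDED_SECONDS:
        return "compliant, above recommended"
    return "compliant"


if __name__ == "__main__":
    start = datetime(2022, 1, 5, tzinfo=timezone.utc)
    # Naively adding 398 days overshoots by one second because both
    # endpoints are part of the validity period.
    print(classify(start, start + timedelta(days=398)))
    print(classify(start, start + timedelta(days=398) - timedelta(seconds=1)))
```

A CA that sets notAfter to notBefore plus exactly 398 days ends up one second over the limit under these assumptions, which is the kind of miss a warning statement would address.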

>
> That is, for any requirement talking about a date range and some 
> upper-bound, it's inherently a ceiling as written, because you cannot 
> let more time elapse than the requirement. We don't have, as far as I 
> know, any requirements that require you do something exactly (no 
> sooner, no later) - they're all of the variation of (no later). But 
> maybe I'm overlooking something, so if you can be precise about the 
> concern, that may help.
>
>     If we are to move in this direction -which I am fully supportive of-
>     it must be done collectively and with clear language. With that
>     said, if we are to move to this approach defining "ceilings", we
>     need to justify the purpose and security implications. For
>     example, I would argue that a CA should be able to perform a
>     pen-test annually without caring about the additional second or
>     even a day. I would argue that security-wise, performing a
>     pen-test every 365 days or every 366 days has negligible impact on
>     the security of the system. Obviously this is different from the
>     certificate, OCSP, CRL validity some of which have clear and
>     precise language coming from normative RFCs.
>
>
> I totally hear you on this, but I think this is overlooking my 
> previous reply. The goal is *not* to have a pen test every 365/366 
> days, it's to have a pen test *at least* every 365/366 days. In 
> practice, a forward thinking CA would do this /more/ frequently: the 
> annual requirement just represents the upper-bound.

I meant it for CAs that want to implement the bare minimum as currently 
allowed by the "at least" language. If we want to have more forward 
thinking CAs, we should drive them there with appropriate language in 
the BRs. Not many CAs read these email threads :-)

>
> This is where the Baseline Requirements are /not/ the definition of 
> what a "great" CA looks like, but rather, the absolute minimum that 
> CAs need to meet.
>
> Just to recap where we're talking about intervals:
>
>   * 4.9.7 CRL issuance ("/at least/ once every twelve months", "/at
>     least/ within 24 hours", "MUST NOT be more than twelve months")
>   * 4.9.10 OCSP ("/at least/ every twelve months", "/within/ 24 hours")
>   * 6.3.2 Operational periods ("MUST NOT have a Validity Period
>     greater than 398 days")
>   * 8.1 Frequency or circumstances of assessment ("no earlier than
>     twelve (12) months prior to", "within ninety (90) days")
>   * 8.6 Communications of results ("no later than three months after")
>   * 5.4.3 Retention period for Audit Logs ("for at least two years")
>   * 5.5.2 Retention period for archive ("at least seven years")
>   * 8.4 Topics covered by assessment ("but in no case may any non-core
>     control be audited less often than once every three years")
>
> There's no reference to weeks in the BRs, AFAICT.

I was thinking of the NSRs.

>
> So all of these represent upper-bounds on security sensitive tasks, 
> and as far as I can tell, are all explicitly worded as upper-bounds. 
> Am I overlooking an interpretation here?
>
>     Understood. Please see my previous comment about making this
>     explicit in the document(s) because until this is clearly written,
>     there will be CAs out there who will read "SHALL execute this at
>     least every 4 days" and will set their systems exactly at that
>     number and risk becoming non-compliant because of a second (like
>     with the 397, 398 certificate validity periods), when the
>     expectation is "SHOULD execute this at least every 3 days and
>     SHALL execute at least every 4 days".
>
>     I'm sure CAs would welcome a clear "catch-all" requirement that
>     says "you must take all those time periods as the absolute ceiling
>     and should implement controls below those documented limits for
>     safety".
>
>
> I mean, isn't that in the very name of the document - the Baseline 
> Requirements? All of these represent the bare minimum controls/safety 
> margins, and just like in any safety/security critical environment, 
> the expectation is that you build tolerances in to your processes. If 
> the requirement is you must support X load, then as engineers, you 
> make sure you support X load plus more. If the requirement is you must 
> do something no later than 24 hours, then you build your process to 
> look at 12 hours, and start escalating at that point, and then hit the 
> safety critical alarms when you get at 8 hours. If you're building 
> something that requires you support 200psi, then your "danger zone" is 
> precisely as you approach 200psi, because your system is not rated for 
> more by spec - even if you would have built in extra tolerances.
>
> That's perhaps the disconnect with understanding your feedback.

The current BRs don't have "soft" or "hard" limits except for some cases 
where we use SHOULD and SHALL for 397 and 398 days validity. If we want 
to have this notion of "hard" deadlines, if the expectation is to run a 
task at least monthly, we already say that it needs to be executed "at 
least every 31 days". The expectation is clear (monthly), the precision 
is aligned (31 days) and the implementation is simple and convenient.

This is not the same for cases where the expectation is to execute 
something quarterly and the requirements say "at least every 90 days" 
because the precision misses the expectation. In this case, the CA is 
forced to drift the quarters to be on the safe side. If we wanted to 
follow the existing patterns, we would need to make this "at least every 
93 days". Even for the status of subCAs in OCSP responses we didn't say 
"at least every 360 or 365 days" but used "at least every 367 days" to 
help CAs implement it annually.
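
The gap between the calendar expectation and a day-count limit can be checked with a short sketch. It measures the longest distance in days between the first day of a month and the first day of the month N months later, over a range of years that includes a leap year. The helper names and the 2020-2027 range are illustrative assumptions only.

```python
from datetime import date


def add_months(first_of_month: date, months: int) -> date:
    """Return the first day of the month `months` after `first_of_month`."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def longest_span_days(months: int, start_year: int = 2020, years: int = 8) -> int:
    """Longest gap in days between month starts that are `months` apart."""
    longest = 0
    for offset in range(years * 12):
        start = add_months(date(start_year, 1, 1), offset)
        end = add_months(start, months)
        longest = max(longest, (end - start).days)
    return longest


if __name__ == "__main__":
    for label, months in (("monthly", 1), ("quarterly", 3), ("annually", 12)):
        print(label, longest_span_days(months))
```

Under these assumptions the longest month is 31 days, the longest quarter is 92 days and the longest year is 366 days. A task pinned to fixed calendar quarters can therefore exceed an "at least every 90 days" limit, while limits of 31, 93 and 367 days all accommodate a plain calendar schedule.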

Our "danger zone" should be practical if possible. Like I said, the 
drifting problem may create security problems if a CA needs to build 
custom code and complicated procedures for otherwise simple scheduling 
tasks.

>     I believe the previous clarification that this accuracy applies
>     for hours and days only, was very helpful. To your question about
>     the frequency of offline ceremonies, there are those who hold
>     that offline ceremonies should be done only when absolutely
>     necessary (because they contain several additional risks) and
>     those who hold that they should be done more frequently than
>     absolutely necessary because repetition also brings improvements
>     and stability in documented procedures.
>
>     For the specific issue regarding CRL/OCSP signing ceremonies, we
>     could start a separate thread if you believe they should be
>     performed more frequently. In order to avoid the risks of
>     inaccessibility to offline facilities (as we've seen some cases
>     due to COVID-19 lockdowns), we could move to a position of
>     proposing and then requiring (SHOULD and then SHALL) the issuance
>     of Root CRLs/OCSP responders/responses twice a year but maintain
>     the validity to 12 months.
>
>
> We certainly could, but I think we both agree that's a separate 
> discussion. I think the disconnect is whether, in the absence of such 
> a mandate (SHOULD/SHALL), is it still a good idea? And I think, as 
> with most of the BRs, the answer is "Yes, it is". This is effectively 
> the past discussions about lifetimes, and how setting shorter 
> lifetimes (voluntarily) is /still/ good, and shouldn't require the 
> Forum to bless it before a CA does so.

Yes, agreed.

>     Raising "slightly" with negligible step back for security in order
>     to ensure CAs have simple/clean implementation paths and specific
>     language that these are indeed "ceilings" and CAs are expected to
>     "do better than", doesn't sound so bad to me. But I understand the
>     concern and it makes sense to try to push this expectation to the
>     document rather than "adding slack" to existing requirements.
>
>
> Hopefully the concrete discussion above helps identify, and clarify, 
> why slack seems to be a step backward for security, and an unnecessary 
> concern given the existing language.

Some justified "slack" may be a step forward for security, when it leads 
to simpler and more easily managed procedures/practices. We have historically 
had cases where we gave more "slack" for convenience and simplicity. The 
30 --> 31 days in 3e of the NSRs, the 12 months --> 398 days in the EV 
Guidelines. There might be more.