[cabfpub] Changes to improve OCSP interoperability, security, and usability

Brian Smith bsmith at mozilla.com
Tue Oct 22 20:45:53 MST 2013


[I originally tried to send this from my brian at briansmith.org email, which is the email address I normally use for my work at Mozilla. However, the message bounced, saying that the list is read-only for non-members. Sorry if you end up getting this twice.]

We added support for OCSP stapling to Firefox 25 (beta) and we just
had to temporarily disable it right before the Firefox 25 release
because of interoperability issues. Our OCSP stapling implementation
is designed to be a precursor to the implementation of Must-Staple
(a.k.a. "OCSP Stapling Required"). Accordingly, we've made our OCSP
stapling implementation somewhat strict--perhaps stricter than other
OCSP stapling implementations--so that it works like Must-Staple would
have to work when the server actually staples an OCSP response. This
experience has shown us that there are some changes that need to be
made from all sides--clients, servers, and CAs--to create a secure and
high-availability Must-Staple mechanism. Here I will detail the
changes that I think are required from the CA side. I suggest that we
convert these ideas into requirements to be added to the baseline
requirements. I also encourage CAs to make these changes ASAP, even
before (if ever) they get incorporated into the baseline requirements.

1. The nextUpdate field must be present in all OCSP responses. Some
CAs are omitting the nextUpdate field and this seems to cause
interoperability issues and perhaps performance issues. Basically, the
server has to guess when clients will consider the OCSP response to be
stale if nextUpdate is omitted, and some servers seem to guess badly.
Note that RFC 5019 Section 4 already requires that nextUpdate be
provided in OCSP responses [1]: "If the nextUpdate field is absent,
the client MUST reject the response." Because some CAs are
non-conformant here, browsers must also be non-conformant. RFC 5019
conformance has been one of the biggest things that people have
requested from Firefox. We should make RFC 5019 conformance a
requirement in the baseline requirements. Currently, the baseline
requirements say "OCSP responses MUST conform to RFC2560 and/or
RFC5019." The "and/or" should be replaced with "and." The baseline
requirements further say that "OCSP responses from this service MUST
have a maximum expiration time of ten days." The only way a CA can
conform to this requirement is by including an explicit nextUpdate.
Thus, nextUpdate is already implicitly required by the baseline
requirements, but this isn't stated clearly. We should make it
explicit.
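
As a rough sketch of the client-side rule quoted from RFC 5019 above:
the OcspResponseInfo class and its field names are hypothetical
stand-ins for whatever a real OCSP parser produces (this is not NSS's
API), and clock-skew tolerance is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class OcspResponseInfo:
    """Hypothetical container for already-parsed OCSP response fields."""
    this_update: datetime
    next_update: Optional[datetime]  # None models an omitted nextUpdate


def is_acceptable(resp: OcspResponseInfo, now: datetime) -> bool:
    """Reject a response with no nextUpdate, or one outside its window."""
    if resp.next_update is None:
        # RFC 5019 Section 4: an absent nextUpdate means "reject".
        return False
    # The current time must fall between thisUpdate and nextUpdate.
    return resp.this_update <= now <= resp.next_update
```

With this check there is nothing for the server to guess: a response
either carries an explicit expiration or it is unusable.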

2. RE: "OCSP responses from this service MUST have a maximum
expiration time of ten days." This requirement is too lax for OCSP
stapling. A successful attacker is guaranteed to control the network
between the client and the web server, and he can choose to staple
whichever valid (non-expired) OCSP response he wants. However, a
successful attacker isn't necessarily guaranteed to control the
network between the client and the OCSP responder. In the case where
the attacker doesn't control the client <-> OCSP responder path, but
where he does control the client <-> web server path, OCSP stapling
makes the attacker more powerful, because he can staple an old "Good"
response instead of letting the client fetch a fresh one. To mitigate
this, we should shorten the maximum validity period of OCSP
responses. I recommend that we
change the above text to "OCSP responses from this service MUST have a
nextUpdate field which is no more than 48 hours from the thisUpdate
field." 48 hours is a very long time to allow any attack to succeed,
but ten days is definitely an eternity, especially considering this
negative security aspect of OCSP stapling.
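
To make the proposed wording concrete, here is a minimal sketch of
the check it implies. The function name and the MAX_VALIDITY constant
are my own illustrative choices; only the 48-hour figure comes from
the proposal above.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_VALIDITY = timedelta(hours=48)  # the limit proposed in item 2


def within_proposed_limit(this_update: datetime,
                          next_update: Optional[datetime]) -> bool:
    """True if nextUpdate is present and at most 48 hours after thisUpdate."""
    if next_update is None:
        return False  # no explicit expiration, so the limit cannot be met
    lifetime = next_update - this_update
    # A negative lifetime means a malformed response; treat it as a failure.
    return timedelta(0) <= lifetime <= MAX_VALIDITY
```

A response valid for the currently permitted ten days fails this
check, while one valid for exactly 48 hours passes.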

3. Recently, the baseline requirements were changed to mandate that an
OCSP responder must not return "Good" for an unknown certificate. Many
CAs changed their OCSP responders to return "Unknown" for unknown
certificates, and that is great. However, there was an unintended
negative consequence. Apparently, some CAs also now return "Unknown"
as soon as a certificate expires. Some clients, e.g. Firefox, allow
users to override "expired certificate" errors but not "OCSP responder
says 'Unknown'" errors. If the OCSP responder returns "Unknown"
automatically as soon as a certificate expires, then users won't be
able to override the "expired" error like they could before. Returning
"Unknown" for an expired certificate is reasonable in itself. However,
I think CAs should be a little more relaxed about it, so that browsers
don't get forced into allowing "Unknown" to be overridden all the
time. I would like "Unknown" to be(come) equivalent to "mis-issued",
so avoiding user overrides of "Unknown" errors seems like a good idea
to me. Note that some of our academic friends have shown us some
evidence that the vast majority of expired certificates get replaced
within 14 days, so having OCSP responders return "Good" for expired
certificates for a day or a few days seems pretty reasonable.
Suggestions?
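
One possible shape for such a grace period, purely as a sketch: the
function, its parameters, and the decision to keep reporting
"Revoked" after expiry are all assumptions of mine, and the length of
the grace period is left to the caller because no specific value has
been agreed on.

```python
from datetime import datetime, timedelta


def responder_status(known: bool, revoked: bool, not_after: datetime,
                     now: datetime, grace: timedelta) -> str:
    """Choose the status a responder would report for one certificate."""
    if not known:
        return "unknown"  # never "good" for a certificate we did not issue
    if revoked:
        return "revoked"  # assumption: revocation outlives expiry
    if now > not_after + grace:
        return "unknown"  # expired and past the grace period
    return "good"         # valid, or expired only recently
```

During the grace period the client still sees the ordinary "expired
certificate" error, which users can override, rather than "Unknown".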

4. Sometimes OCSP responders return "Unknown" for a while right after
a certificate is issued, because the OCSP responder's database hasn't
been updated and/or because "Unknown" responses have been cached. I
know this is a hard problem to solve, but I think there
are some things we can do to reduce the frequency of this. In Firefox
we will likely change our OCSP logic so that, if we have a cached
"Unknown" response, we will always try to fetch a new OCSP response
(via POST, not GET), and only return the cached "Unknown" response if
the fetch failed. Servers' OCSP stapling implementations should be
changed to do this too. However, our efforts will likely be thwarted
by caching HTTP proxies. It seems important to set Cache-Control,
Expires, and/or Pragma: no-cache specially for "Unknown" responses so
that proxy caches do not keep them for as long as they keep "Good" or
"Revoked" responses.
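
Both halves of this could look roughly like the sketch below. The
function names and the exact header values are illustrative
assumptions, not what any CA or Firefox actually ships, and
freshness checks on the cached response are omitted.

```python
from typing import Callable, Dict, Optional


def cache_headers(cert_status: str,
                  seconds_until_next_update: int) -> Dict[str, str]:
    """Responder side: pick HTTP caching headers for an OCSP response."""
    if cert_status == "unknown":
        # Ask proxies not to reuse "Unknown" answers without revalidating.
        return {"Cache-Control": "no-cache, max-age=0", "Pragma": "no-cache"}
    # "good" and "revoked" may be cached until the response goes stale.
    max_age = max(0, seconds_until_next_update)
    return {"Cache-Control": f"public, max-age={max_age}"}


def status_for_client(cached_status: Optional[str],
                      fetch_via_post: Callable[[], Optional[str]]
                      ) -> Optional[str]:
    """Client side: do not trust a cached "Unknown" without refetching."""
    if cached_status is not None and cached_status != "unknown":
        return cached_status
    fresh = fetch_via_post()  # hypothetical fetcher; None means it failed
    # Fall back to the cached "Unknown" only when the fetch failed.
    return fresh if fresh is not None else cached_status
```

The fetcher is passed in as a callable so that the policy can be
exercised without any network access.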

Note that one sysadmin told me that his site was affected by #3 AND #4
at the same time: The expired certificate was "Unknown" because of #3
and the replacement certificate was "Unknown" because of #4.
Consequently, he was completely locked out of his own site for a
while, even though he replaced his certificate the same day it
expired.

5. NSS's implementation of OCSP GET first tries the GET method, and
then falls back to POST if it gets a response other than "Good" or
"Revoked" or if it gets an expired response. I know that
some CAs have implemented special logic in their CDNs for returning
cached responses for POST requests, to work around Firefox's lack of
OCSP GET. OCSP GET support is now in NSS and I expect it to be in Firefox
soon-ish. At least once we make that change, I think it would be a
good idea for CAs to consider undoing their caching logic for the POST
requests so that POST requests always result in a revalidation with
the origin OCSP responder, and/or POST responses are cached by the CDN
for a much shorter period of time than the corresponding GET responses
are. This way, clients' fallback-to-POST logic should allow better
recovery from bad responses that have been cached by proxies.
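
The fallback described in this item can be sketched as follows. This
is not NSS's code: the callables and the (status, expired) tuple are
hypothetical, and treating a GET that fails outright as a reason to
fall back is my assumption.

```python
from typing import Callable, Optional, Tuple

# (cert_status, is_expired); None means the request itself failed.
Result = Optional[Tuple[str, bool]]


def fetch_with_fallback(do_get: Callable[[], Result],
                        do_post: Callable[[], Result]) -> Result:
    """Try GET first; use POST unless GET gave a fresh Good or Revoked."""
    result = do_get()
    if result is not None:
        status, expired = result
        if status in ("good", "revoked") and not expired:
            return result
    # Anything else (Unknown, expired, or no answer) triggers the POST.
    return do_post()
```

If POST responses are cached as aggressively as GET responses, the
second request simply returns the same bad answer, which is why the
shorter POST caching matters.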

6. For must-staple in particular, it may be worthwhile to consider
adding a backup OCSP AIA URI to certificates, to be used in case the
normal OCSP responder is not working. Then we could change servers'
OCSP stapling implementations to try each of the URIs in order. This
would likely improve the availability of sites that use must-staple in
the event that a CA's CDN experiences downtime.
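
A server's stapling code could try the URIs along these lines. This
is only a sketch: the fetch callable is a hypothetical stand-in for
the server's real OCSP client, and it is assumed to raise OSError or
return None when a responder cannot be reached.

```python
from typing import Callable, Iterable, Optional


def fetch_from_first_working(
        uris: Iterable[str],
        fetch: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
    """Return the first OCSP response obtained, trying URIs in order."""
    for uri in uris:
        try:
            response = fetch(uri)
        except OSError:
            continue  # responder unreachable; move on to the backup
        if response is not None:
            return response
    return None  # every responder failed; nothing to staple
```

Because the URIs are tried in certificate order, the backup
responder only sees traffic while the primary one is down.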

[1] http://tools.ietf.org/html/rfc5019#section-4

Cheers,
Brian
--
Mozilla Networking/Crypto/Security (Necko/NSS/PSM)

