[cabf_validation] Using dedicated DNS resolvers for domain validation

Tue Aug 6 22:52:35 UTC 2024

Hi Gurleen,

On Tue, 6 Aug 2024, Gurleen Grewal wrote:

> Any publicly available ACME CA essentially operates as a public resolver -
> by invoking the ACME protocol and requesting domain validation, an attacker
> can cause the CA to perform DNS queries for a domain of their choice. In
> this case, solely running a dedicated resolver is not an effective
> mitigation against the kinds of attacks described.

ACME is fundamentally JSON over HTTPS, and as such it is orders of
magnitude more expensive to cause a large number of lookups over ACME than
over DNS over UDP (or even DNS over TCP). This holds especially true when
you take the receiving end (i.e. the CA's ACME endpoint) into account; the
attacker could admittedly cheat a bit and drive the cost down on their own
end.

As such, an attack mounted via ACME will inherently be more noticeable to
the attacked CA, as it might even cause resource exhaustion and would
presumably be visible in metrics and logs. By comparison, an attack
mounted on a resolver operated by a DTP but used for DV would not be
visible to the CA at all. Even if the DTP noticed the attack traffic, they
would not understand its significance, and mitigating it would not be a
priority for them when it happens: not only because nobody will distrust
them or qualify their audits or the like if they fail to act, but because
it is simply not within the envelope of guarantees and considerations under
which they offer the service. That holds at least as long as the DTP is
not specifically in the business of operating a "DV Resolver".

Furthermore, attacking DNS resolution or the resolver itself through ACME
inherently gives an attacker fewer degrees of freedom in how to mount the
attack; obscure DNS protocol features would not be accessible, for
example.
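To make the "fewer degrees of freedom" point concrete, here is a minimal sketch, assuming a DV component in which the only attacker-controlled input is the identifier from the ACME order. The function names are hypothetical; the `_acme-challenge` TXT lookup follows RFC 8555 (dns-01) and the CAA lookup follows RFC 8659, simplified here because real CAA processing also climbs toward parent names. Wildcard identifiers are deliberately not handled.

```python
import re

# Query types the DV process is allowed to emit (IANA RR type numbers).
QTYPE_A, QTYPE_TXT, QTYPE_AAAA, QTYPE_CAA = 1, 16, 28, 257

# Letters, digits and hyphens; no leading or trailing hyphen; 1-63 octets.
_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")

def normalize_identifier(name: str) -> str:
    """Reject anything that is not a plain hostname before it reaches DNS code."""
    name = name.lower().rstrip(".")
    if not name or len(name) > 253:
        raise ValueError("bad length")
    labels = name.split(".")
    if len(labels) < 2 or not all(_LABEL.fullmatch(label) for label in labels):
        raise ValueError(f"not a hostname: {name!r}")
    return name

def dv_queries_for(identifier: str, challenge: str) -> list[tuple[str, int]]:
    """Map an identifier and challenge type to a fixed set of (name, qtype) pairs.

    The requester chooses only the name; query types, flags and every other
    property of the DNS messages are decided by the DV process itself.
    """
    name = normalize_identifier(identifier)
    queries = [(name, QTYPE_CAA)]  # simplified: no climbing to parent names
    if challenge == "dns-01":
        queries.append((f"_acme-challenge.{name}", QTYPE_TXT))
    elif challenge in ("http-01", "tls-alpn-01"):
        queries += [(name, QTYPE_A), (name, QTYPE_AAAA)]
    else:
        raise ValueError(f"unsupported challenge: {challenge!r}")
    return queries
```

For example, `dv_queries_for("www.example.com", "dns-01")` yields only a CAA query for `www.example.com` and a TXT query for `_acme-challenge.www.example.com`; there is no path by which the requester can ask for, say, an ANY query or a zone transfer.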

These are all significant benefits over simply using a resolver that can
be directly queried by an attacker. In my opinion, these benefits alone
illustrate why using a dedicated resolver operated solely for this purpose
is the only way to implement a DV process worth its salt.

You are, however, entirely correct in noting that more can and should be
done.

> Could you provide more detail on what mitigations you expect a CA to
> implement relative to DNS resolution - e.g. rate limiting DNS queries seems
> to be one of the implied measures a dedicated resolver would be expected to
> implement? Before the discussion gets too far along, it would help to have
> a list of potential threats and tie potential mitigations back to them.

I did not mean merely to imply that I would expect rate limits; I do
expect them. I worded it the way I did for a reason, but that may have
caused some confusion, so let me try to provide some background:

You do not necessarily even need to operate a nameserver for the DV 
process, or use any (non-authoritative) nameserver in the process at all. 
Many programming languages offer DNS protocol support in ways that allow 
you to query authoritative nameservers directly.
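As an illustration of talking to an authoritative nameserver with no recursive resolver in between, here is a minimal standard-library sketch. It builds an RFC 1035 query with the RD (recursion desired) bit cleared and sends it over TCP using the two-byte length prefix from RFC 1035, section 4.2.2. The function names are hypothetical, and walking the delegation chain from the root and parsing the answer are left out; a production DV process would more likely rely on a maintained DNS library.

```python
import secrets
import socket
import struct
from typing import Optional

QCLASS_IN = 1

def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels (RFC 1035, section 3.1)."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # zero-length root label terminates the name
    return bytes(out)

def build_query(name: str, qtype: int, qid: Optional[int] = None) -> bytes:
    """Build a single-question, non-recursive (RD=0) query."""
    if qid is None:
        qid = secrets.randbelow(1 << 16)  # unpredictable 16-bit transaction ID
    flags = 0x0000  # QR=0 (query), OPCODE=0 (QUERY), RD=0 (no recursion)
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)  # QDCOUNT=1
    return header + encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or fail; TCP may deliver a message in pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full message arrived")
        buf += chunk
    return bytes(buf)

def query_authoritative_tcp(server_ip: str, message: bytes,
                            timeout: float = 5.0) -> bytes:
    """Send one message over TCP, framed with a 2-byte length prefix."""
    with socket.create_connection((server_ip, 53), timeout=timeout) as sock:
        sock.sendall(struct.pack("!H", len(message)) + message)
        (length,) = struct.unpack("!H", _recv_exact(sock, 2))
        return _recv_exact(sock, length)
```

Usage would look like `query_authoritative_tcp("192.0.2.53", build_query("_acme-challenge.example.com", 16))`, where `192.0.2.53` is a documentation address standing in for one of the zone's authoritative servers.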

Using an actual recursive nameserver to do this work has its benefits:
the usual suspects among nameserver implementations are, of course,
written, maintained and supported by experts with a commendable commitment
to writing secure code and responding appropriately to security issues.

But fundamentally, only a small portion of the nameserver's features is
relevant to DV, while at the same time the nameserver's API, the DNS
protocol, is very powerful and can in principle reach a much larger
portion of the nameserver's features and code.

Because of that, I would consider it extremely important that such a
nameserver cannot be reached by anything other than the DV process and
related, tightly controlled aspects of the CA's operation (such as name
resolution occurring when making requests to e.g. QGIS, WHOIS or RDAP).
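One way to enforce this on the resolver side, in addition to network-level isolation, is a front-end check that drops any query not shaped like a DV query. A minimal sketch, assuming the DV hosts live in a hypothetical internal network `10.20.0.0/24` and that DV and the related lookups only ever need A, AAAA, CNAME, TXT and CAA; both lists are illustrative assumptions. Note that UDP source addresses can be spoofed, so the address check complements firewalling rather than replacing it.

```python
import ipaddress
import struct

DV_CLIENTS = [ipaddress.ip_network("10.20.0.0/24")]  # hypothetical DV host network
ALLOWED_QTYPES = {1, 5, 16, 28, 257}  # A, CNAME, TXT, AAAA, CAA
QCLASS_IN = 1

def _skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past an uncompressed name; reject anything else."""
    while True:
        if offset >= len(msg):
            raise ValueError("truncated name")
        length = msg[offset]
        if length == 0:
            return offset + 1
        if length > 63:  # compression pointers and reserved label types
            raise ValueError("unexpected label type")
        offset += 1 + length

def accept_query(client_ip: str, msg: bytes) -> bool:
    """Return True only for queries that look like they come from the DV process."""
    addr = ipaddress.ip_address(client_ip)
    if not any(addr in net for net in DV_CLIENTS):
        return False
    if len(msg) < 12:
        return False
    _qid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    opcode = (flags >> 11) & 0xF
    if flags & 0x8000 or opcode != 0:  # must be a query (QR=0), OPCODE=QUERY
        return False
    # Exactly one question and nothing else. EDNS (an OPT record in the
    # additional section) is rejected only because this sketch does not parse it.
    if (qd, an, ns, ar) != (1, 0, 0, 0):
        return False
    try:
        end = _skip_name(msg, 12)
    except ValueError:
        return False
    if end + 4 != len(msg):  # QTYPE and QCLASS, then no trailing bytes
        return False
    qtype, qclass = struct.unpack("!HH", msg[end:end + 4])
    return qclass == QCLASS_IN and qtype in ALLOWED_QTYPES
```

The design choice here is an allowlist rather than a blocklist: anything the DV process does not need (other opcodes, ANY queries, zone transfers, extra sections) never reaches the nameserver's code at all.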

And indeed, I would expect rate limits. These would be enforced not only
at the FQDN level, but also at the public suffix level as determined by
the Public Suffix List. This is where what I wrote may have been
confusing: I expect that the resolver cannot be reached by anything but
the DV process and related, tightly controlled use cases, so appropriate
rate limiting would, in my mind, most likely be enforced in the part of
the DV process that queries the DV Resolver in the first place.
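A minimal sketch of such limits, enforced in the DV process itself before it queries the DV Resolver: one token bucket per FQDN and one per registrable domain (the public suffix plus one label, which is one reading of "public suffix level"). The three-entry `PUBLIC_SUFFIXES` set is a hypothetical stand-in for the real Public Suffix List, all class and function names are hypothetical, and the default rates are placeholders rather than recommendations.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for the Public Suffix List; a real DV process would load
# and regularly refresh the full list, including its wildcard and exception rules.
PUBLIC_SUFFIXES = {"com", "org", "co.uk"}

def registrable_domain(fqdn: str) -> str:
    """Return the longest matching public suffix plus one label (eTLD+1)."""
    labels = fqdn.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # i = 0 tries the whole name, so longer suffixes win
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return ".".join(labels[-2:])  # like the PSL default rule "*"

@dataclass
class TokenBucket:
    rate: float      # tokens added per second
    capacity: float  # maximum burst size
    tokens: float
    updated: float

    def refill(self, now: float) -> None:
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

class DVQueryLimiter:
    """Allow a lookup only if both the FQDN and its registrable domain have budget.

    Buckets are never evicted in this sketch; a real implementation would expire them.
    """

    def __init__(self, fqdn_rate: float = 1.0, fqdn_burst: float = 5,
                 domain_rate: float = 5.0, domain_burst: float = 20,
                 clock: Callable[[], float] = time.monotonic):
        self.fqdn_params = (fqdn_rate, fqdn_burst)
        self.domain_params = (domain_rate, domain_burst)
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def _bucket(self, key: str, params: Tuple[float, float], now: float) -> TokenBucket:
        if key not in self.buckets:
            rate, burst = params
            self.buckets[key] = TokenBucket(rate, burst, burst, now)  # start full
        return self.buckets[key]

    def allow(self, fqdn: str) -> bool:
        now = self.clock()
        name = fqdn.lower().rstrip(".")
        buckets = [
            self._bucket("fqdn:" + name, self.fqdn_params, now),
            self._bucket("domain:" + registrable_domain(name), self.domain_params, now),
        ]
        for bucket in buckets:
            bucket.refill(now)
        # Spend a token only if every applicable limit has one, so a rejected
        # lookup does not drain the budget of the limit that would have allowed it.
        if all(bucket.tokens >= 1 for bucket in buckets):
            for bucket in buckets:
                bucket.tokens -= 1
            return True
        return False
```

The DV process would call `limiter.allow(name)` before each lookup and defer or fail the validation when it returns `False`. Keying on the registrable domain as well as the FQDN means an attacker cannot bypass the per-name limit simply by generating many distinct subdomains of one domain.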

TL;DR: I firmly believe the following measures significantly increase the
security of a DV process:

* Rate limits
* Prevent non-DV-related queries to the DV Resolver
* Query using TCP instead of UDP, or other, similarly secure mechanisms
   where available (e.g. QUIC, and possibly others pointed out by Aaron
   Gable in
   https://lists.cabforum.org/pipermail/validation/2024-July/002000.html)
* When querying over UDP, detect attempts at injecting forged responses
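For the last point, a minimal sketch of what "detect" could mean, assuming the DV process keeps reading every datagram that arrives for an outstanding UDP query until its timeout instead of stopping at the first one. In the spirit of the answer-acceptance rules in RFC 5452, a response is accepted only if it comes from the queried server's address and port, carries the same transaction ID and echoes the exact question; anything else is recorded as a suspected injection attempt rather than silently dropped. The names are hypothetical, and socket handling is omitted so the matching logic stands on its own.

```python
import struct
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

Address = Tuple[str, int]         # (ip, port)
Datagram = Tuple[Address, bytes]  # (source address, payload)

@dataclass
class MatchResult:
    response: Optional[bytes] = None                          # accepted answer
    suspicious: List[Datagram] = field(default_factory=list)  # possible forgeries

def question_bytes(msg: bytes) -> Optional[bytes]:
    """Return the raw single question (QNAME, QTYPE, QCLASS), or None if malformed."""
    if len(msg) < 12 or struct.unpack("!H", msg[4:6])[0] != 1:
        return None
    offset = 12
    while offset < len(msg) and msg[offset] != 0:
        if msg[offset] > 63:  # compression pointers are not expected here
            return None
        offset += 1 + msg[offset]
    end = offset + 1 + 4  # root label, then QTYPE and QCLASS
    return msg[12:end] if end <= len(msg) else None

def match_responses(query: bytes, server: Address,
                    received: Iterable[Datagram]) -> MatchResult:
    """Sort datagrams received for one outstanding UDP query."""
    expected_question = question_bytes(query)
    if expected_question is None:
        raise ValueError("malformed query")
    result = MatchResult()
    for source, payload in received:
        is_match = (
            source == server                     # right server address and port
            and len(payload) >= 12
            and payload[:2] == query[:2]         # same transaction ID
            and bool(payload[2] & 0x80)          # QR bit set: it is a response
            and question_bytes(payload) == expected_question
        )
        if is_match and result.response is None:
            result.response = payload
        elif not is_match or payload != result.response:
            # Mismatched datagrams, and "matching" ones that disagree with the
            # answer already accepted, both hint at an injection attempt.
            result.suspicious.append((source, payload))
    return result
```

One plausible policy on top of this is to discard the UDP answer whenever `suspicious` is non-empty, repeat the lookup over TCP, and log the event so it becomes visible in the CA's monitoring.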

However, I am certain that additional measures could provide great
benefit. Since I am not involved in any DV process at any level, I can
really only contribute the more obvious measures without digging into it
further.

> Generally, GTS is in favor of additional requirements on DNS resolvers for
> Domain Validation and we'd welcome more work in the community on best
> practices for robust DNS resolution.

I am not saying that is what you meant, but I want to make clear that, in
my mind, this is not just about configuring a DNS resolver
"appropriately"; it is about being very conscious and careful about how,
and for what, a resolver is used in the DV process, and about designing
that involvement so that it provides adequate robustness against
DNS-level attacks.

Tobi