[cabf_netsec] Draft Notes of Meeting 24-August-2017

Sun Aug 27 13:17:35 MST 2017

Here are my draft notes of last Thursday's meeting.

In Attendance:  Ben Wilson, Dean Coclin, Travis Graham, Xiu Lei, Jeff
Stapleton, Kirk Hall, Rick Agawalla, Robin Alden, Curt Spann, Dimitris
Zacharopoulos, Tim Hollebeek, Steve Hillier, Tobi Josefowitz, Peter Bowen,
Jeff Ward, Tim Crawford, Mark Richie, Tom Ritter, Alex Craig, Ryan Hurst,
Patrick Milot, Colin McIntyre

The group reviewed the priorities of topics to address based on the
SurveyMonkey poll.  They were in the following order:

1.	Defining "Root CA System", "Offline" and "Air-Gapped" and clarifying
associated requirements 
2.	Defining terms like "workstation", "account", "zone", "CA System,"
and "Issuing System" 
3.	Clarifying log review, "human review" of logs vs. automated reviews 
4.	Defining "Critical Vulnerability" and "Critical Security Event" and
clarifying actions to take and clarifying action to take within 96 hours of
detecting a vulnerability not otherwise addressed by CA's procedures 
5.	Providing guidance on the criteria for acceptable penetration tests
and vulnerability scans 
6.	Including mitigating factors and compensating controls in the NCSSR 
7.	Penetration tests after changes the CA determines are "significant" 
8.	Clarifying audit documentation requirements for network/system
configurations (1.f, g, and h) 
9.	Addressing software development vulnerabilities and processes 
10.	Modifying 2.j. "review all system account configurations every 90
days" 
11.	Addressing wireless security vulnerabilities 
12.	Timeframes in which to disable system access of former employees
(currently within 24 hours) 
13.	Password rules (currently 12 characters, OR 8 characters + changes
every 90 days, OR a documented policy) 

Ben is now tracking a second wave of changes to network security
requirements.  We looked at suggested potential language for offline and
air-gapped Root CAs.

Peter suggested we discuss the difference between offline CA and root CA. We
should be focused on offline CAs. Root CAs are an example of something that
has to be offline; I don't think we should limit our changes to just roots.
Peter said that defining "root" would be hard and it would be easier to say
offline CAs must do "XYZ" and then it's up to the CA operator or trust
service provider to say these are my offline CAs and these are the other
ones.

Ben asked whether we wanted to try and edit the network security
requirements or come up with high level concepts of things we need to do.

Peter said everyone is focused on roots because so many of these
requirements don't make sense in an offline scenario. Do we try to carve a
bunch of stuff out or do we say there is a different set? One of the
challenges is going to be, how do we define offline? Does it include systems
powered on in a rack that are air gapped from everything else?  When we
discussed this there was an assumption that we have these offline systems
that would prevent a network attack. There is also a concept of a powered
off CA. We say we keep it in a tamper proof bag or even a safe deposit box
at a bank. Patching it on a 30-day basis would be insane.

Dimitris: You're referring to 1C, where it says: Maintain a root CA system
in a high security zone and in an offline state or air gapped from all other
networks. So, if it is powered off, it's offline. If online but not
connected to any networks then it is air gapped. If it's in a safe in an
evidence bag then it's both offline because it has no power and air gapped
because it's not connected to anything.
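
As a rough illustration of the distinction Dimitris draws here, the two
labels can be treated as independent properties. This is only a sketch: the
SystemState type and its field names are hypothetical and do not come from
the network security requirements.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class SystemState:
    """Hypothetical description of a CA system for this sketch."""
    powered_on: bool         # False if the system has no power
    connected_networks: int  # number of networks the system is attached to


def classify(state: SystemState) -> Set[str]:
    """Apply the two labels as described in the discussion of 1.c."""
    labels: Set[str] = set()
    if not state.powered_on:
        labels.add("offline")      # powered off means offline
    if state.connected_networks == 0:
        labels.add("air-gapped")   # not connected to anything
    return labels


# A powered-off system in a safe carries both labels.
print(classify(SystemState(powered_on=False, connected_networks=0)))
```

Under this reading, a powered-on system with no network connections is
air-gapped but not offline, which is the case the following comments probe.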

Ryan H.: Somebody has to physically touch it to put it online; I don't think
all air-gapped systems have that property. We should also think about how we
define air gapped. 

Peter: I think what Ryan might be hitting on is whether the definition of
air-gapped is "not connected to the internet." The challenge that we have is
that you can end up with "Dark Fiber" running between two buildings. That
could be an air-gapped network because it is a network but doesn't have any
connectivity other than those two buildings. 

Ryan H.:  Or we could be relying on switching layers to provide a logical
network segmentation. It is effectively air gapped because hackers won't get
routed to it by this logical routing magic. There is still a cable plugged
into it and a vulnerability in that control plane. We should spend some time
thinking about how we should define that.

Peter: I wouldn't say that would win the air gap award because taking that
further it turns out VPNs could be considered air-gapped.

One common thing between air gapped and offline is that both are trying to
reach a common goal at a security level.

Peter: One thing in this discussion is that we have requirements, but we
don't have goals. Given that this is the top item that the working group
wants to work on, rather than sit here, the question should be what is the
goal we're trying to meet.

By goal, you mean, what is the risk we're trying to mitigate?

Peter: Yes, given that this is security.

Ben: I think one of the issues is the fact that Root CAs can't be revoked,
except by adding them to a browser untrusted list or removing from a root
store. Maybe it isn't that hard for the browsers to do or maybe it is. The
impact of taking a root out also affects other communities of interest, not
just who was attacked. The issue is if you're looking at risk it's the
amount of damage that could be done rather than the likelihood that it can
be done, which would be low. The risk is greater because there will be more
fallout. 

Peter: We don't really have a way to revoke it, in a broad sense. What we're
trying to do is setting a high bar. There is a set of things for which we
are willing to accept a high pain threshold to even use in a proper way.

Dimitris: I think what we're trying to mitigate here is the key compromise
for a Root CA. We all agree it is the worst impact you can have for a CA
business. Adding controls to protect usage and the security of this private
key is what we should focus on.

Ben: It's not just the key compromise, it's the entire Root CA system. By
keeping it offline you reduce the possibility that someone can hack through
and get to the Root CA system and misuse it.

Peter:  The core risk we are trying to mitigate with all of this is misuse
of the key. Key compromise is I get a copy of a key, but even if it's locked
in the HSM, and I never get a copy of it, it's the ability to sign arbitrary
stuff that is just as bad.  The system itself doesn't really matter, except
to the extent that it can independently sign stuff in an unexpected way.

Ryan H.: . . . or influence. You can imagine protecting the key but not the
systems that it relies on, which could trigger the things to control the
operation.

Peter: You could imagine that some CA had a design to keep the CA server
online all the time, what we're going to do is the HSM is going to plug into
the CA server and only when that happens can I sign stuff. If you could
sneak in and have code that says as soon as the card comes in, sign
something, it is just as bad.

Dimitris: The network security requirements just mention the systems, but
they do not discuss how you handle the backup of the keys and how you handle
the keys in an encrypted way, this is part of BRs.

Peter: One option would be, we're just going to scope all the systems out,
for offline. It leaves us vulnerable to that scenario of someone hacking the
system in such a way that as soon as it can do something, it does. 

Ben:  That is, we leave the current language, but we add more requirements
to it. But I know that there are some people on the call who just want to
clarify what we have. There is that risk of someone inserting code, that
would then wait for the . . . 

Peter:  Given that threat, what is the reasonable mitigation there.

Ryan H.: Is there any way we could put audit requirements on the HSM? Where
the system can be compromised but there is a way to get to an audit log of
HSM and compare the two where the HSM would be able to report it?
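
A minimal sketch of the comparison Ryan describes, assuming both the HSM and
the CA software can export a list of digests of what was signed. That
assumption is hypothetical, and the function and parameter names are
illustrative only, not any HSM vendor's API.

```python
from typing import Iterable, List


def unexplained_signatures(hsm_digests: Iterable[str],
                           ca_digests: Iterable[str]) -> List[str]:
    """Return digests the HSM reports signing that the CA log never recorded."""
    return sorted(set(hsm_digests) - set(ca_digests))


# Hypothetical exported logs: the HSM signed "bb", but the CA has no record.
print(unexplained_signatures(["aa", "bb", "cc"], ["aa", "cc"]))
```

Any digest returned would indicate a signing operation that happened
outside the CA software's own record keeping.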

Peter: That is a detective control. The challenge is to avoid needing to
revoke the roots. HSM log review would help with the discovery but not
prevention. I'd have to go look at the HSM documents to even see what
options there are. In offline mode, I don't know if they all have the same
features.

Ryan H.:  That is correct.

Peter:  The nice fancy HSMs do, but the Thales model doesn't actually store
the data, I don't know if the logging would even work.

Ryan H.: Again, that is a detective control, and by that time the damage is
already done because the hard part is preventing these.

Peter: Maybe this is one of those places where we said, we're not going to
point at other requirements but it feels like this is something someone has
resolved.

Ben: You do it with integrity checks on files associated with the CA.
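
One way such an integrity check could look, as a sketch only: record SHA-256
digests of the CA's files in a baseline manifest, then compare against that
manifest before any signing operation. The manifest layout and the function
names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path) -> str:
    """Hash one file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(baseline: Dict[str, str], root: Path) -> List[str]:
    """List files that were added, removed, or modified since the baseline."""
    current = build_manifest(root)
    names = set(baseline) | set(current)
    return sorted(n for n in names if baseline.get(n) != current.get(n))
```

The baseline would need to be stored somewhere the CA system itself cannot
modify, otherwise inserted code could simply rewrite the manifest too.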

Peter: I meant, in the high-value space, is there another industry that we
can go and borrow this from?

Ben: There may be something in the 800-53 overlay created in the Federal PKI
group.

Peter: I'm sure they have, I also wonder if PCI has gone over this also. I
wonder what they've come up with.

Ryan H.: I imagine these requirements exist for US Federal High Assurance
Data requirements.

Peter: The reality is, they rely very heavily on air gapping and background
checks of people who touch the system. 

Ben:  With an eye aimed at not adding more things but refining and improving
what we have, I recall an email from Entrust about this. Could someone look
at their email for changes of the revisions of 1.c?

Entrust: I think what we were looking for on the Entrust side, for example,
it is clear the offline air gap requirement applies to the root CA systems
based on the text. What isn't clear is whether the other requirements
apply to root CA systems. Where do root CA systems fit within issuing
systems and other definitions?  If you are managing a CA offline in an air
gap, where it's not reachable outside that physical zone, then how do we
achieve these other requirements, how do you scan it, how do you patch it,
etc.? Is that necessary?  To achieve that, you have to build a network, and
then it is no longer air-gapped. That is what we were trying to clarify,
even for log review. Are we spinning up our CAs once a week to check our
logs when they were powered down? We'd like to clearly define that weekly log
review does not apply if you're managing your CA in this way, for example,
physically air gapped, powered down, offline.

Peter: Because you said powered down, can we fix this simply? By simply
turning it into within x period after powering on?

Entrust: If I recall the suggestions that Kirk forwarded on, I think a lot
of it was on the definitions themselves, which I think was second in the
survey list. We might resolve some of these issues just by clearing up the
definitions.  Certificate systems is a perfect example: is the root system
included, or are root CA systems a unique category of systems?

Ben: I think that if we change certificate systems to Certificate Issuing
Systems then we can say it does not include the root CA system.  Then we
could have a whole other set of requirements for roots.

Entrust: I agree with you, and with the other requirement such as scanning.
If we don't explicitly list Root CA Systems, then it doesn't apply to the
Root CA Systems. Because you're managing it offline, it's air gapped, it's
powered down.

Kirk: The language I got was from Pat, and I forwarded it in an email dated
August 2. The definition of Root CA System is an offline system used to
create . . . offline being the only word added, then a new sentence. Root CA
system is a unique category of system and is not considered to be an Issuing
System. Issuing Systems is changed slightly, it says a system used to sign
end entity certificates.

Peter:  The reason I'm a little hesitant on that is that we have Issuing
Systems that are powered off in a safe.

Kirk: I was trying to read the language that was out there. It sounds like
people agree there needs to be different treatment for powered off and air
gapped.

Entrust: If we change the word from Root CA System to Offline CA System then
it doesn't have to be a Root. It could be an Issuing System it's just how
you're treating that system.

Peter: Can we define an offline system as one that is powered off? Would
that meet the needs?

Ryan H.: In a scenario where it's offline, when you brought it back up to do
activities would you not want to go forward and run the AV scans or apply
patches before you go and execute any signing operation?

Peter: What I was trying to suggest, when we add the rules: do it within X
days. We basically say within X days or on next power on. What I'm hearing
Entrust say, the only reason these systems are getting powered on is to
patch them then turn them off.
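
Peter's "within X days or on next power on" rule can be sketched as follows.
This is an illustration under stated assumptions, not proposed requirement
text: the function name and the interval value are hypothetical.

```python
from datetime import datetime, timedelta


def is_due(last_done: datetime, now: datetime,
           interval_days: int, powered_on: bool) -> bool:
    """A periodic task (patching, scanning, log review) is due only while
    the system is powered on and the interval since it was last done has
    elapsed. While powered off, the obligation is deferred to the next
    power on rather than forcing the system to be started."""
    if not powered_on:
        return False
    return now - last_done >= timedelta(days=interval_days)


last_patch = datetime(2017, 8, 1)
today = datetime(2017, 9, 15)
print(is_due(last_patch, today, 30, powered_on=False))  # deferred
print(is_due(last_patch, today, 30, powered_on=True))   # due at power on
```

This captures only the timing rule; whether the task must complete before
any signing operation is the open question raised in the turns around it.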

Ben: I don't like the idea of adding additional requirements, like scanning
and patching to Root CAs when you power them back on. Something may have
been end-of-lifed, for all we know, and we're still using them.

Entrust: I have a similar concern, imagine an emergency revocation that
needs to be done and before I can do that I have to do all of the work to
qualify for a patch. I like the idea of an offline definition as a whole.
We need to think about how it gets back in compliance. The current state
of affairs, where things stay out of compliance forever or are brought back
online unnecessarily, exposing them to risk, is bad.

Peter: How much infrastructure is actually required? I would hope that we'd
get to a point that says an HSM or a laptop that has no wireless or network
could be a complete Root CA System.

Entrust: How do you get data to and from that system? That needs to be
accounted for.

Peter: One option is it comes in a USB stick or a CD.

Entrust:  That bridges the air gap, so when doing that, what is the
lowest-risk way to do that?  On the Entrust side, we went full air gap, in
the literal sense. If it was a laptop we'd have to put it on a network, scan
it, review
logs and patch it. What we were trying to achieve was that it's only brought
online when needed for specific infrequent tasks limited to issuance of
certificates or revocation data. The only reason we bring it up is we issue
a cross cert or a CRL and it goes back down. Not that we have it up three to
four days. 

Ben: We don't even do that, we use a pristine image of Windows 10 without
doing updates. We don't have it in an online state.

Ryan H.: I always have people do a Linux distro with no drivers, a signing
tool that they own the source to. These things minimize the code footprint
on it so you don't have to update. 

Peter: But auditors are pushing that what you are describing is not meeting
the requirements.

Ryan H.: Let's not design a system that supports doing a dangerous thing
anyway. One CA was using Windows boxes and wifi drivers that were exposed
to a very old driver bug that allowed a beacon to take over the system
anyway. None of those belonged in that system so maybe it is important to run
scans and to do patches in these scenarios.

Peter: I think the concern that I've heard here, is we want to be able to
say you don't have to build this, but if you do build something with these
requirements, they have to be met. There are a lot of bad ways to build it, but
there are certain things that I think we all understand that if you do it
this way this is what we want to allow.

Dimitris: We have some emails sent by Ken Myers; he recommends some
definitions for air gap, offline CA, Online CA. The intent is to capture an
air-gapped CA as either not networked or isolated to a single network. So,
if it's on a single network you can do some updates.

Ryan: We should say "physically single network".

Peter: There is the question that if I'm running fiber, do different waves
in the fiber count? How physically separate? Like the fiber has to be a
separate fiber?

Entrust:  It's not VPN, it's not any of those things, those are purely
logical and you are a VLAN typo away from putting it online.  For the Root
CA System, would anyone complain if it was absolutely physically segregated?

Dimitris: That would mean separate network equipment?

Ben: On its own network, yes.

Entrust: It can be as simple as a laptop to a crossover to a network HSM.
There is no physical connectivity between it and the production network. 

Peter: That has to be in the same physical boundary, right?

Ryan H.: I had a client that was moving key blobs from machine to machine
over networks that were only logically separated for a root key.

Peter: You're saying it should or should not be?

Ryan: Should not be. 

Entrust: In the definition of physically separated that would not be
possible, it would be something that would be for an issuing system but not
for a Root System.

Peter: Is there any need to support the concept of having systems in
location A with access from location B?

Ryan H.: In this case, the physical properties and all intermediate
locations would need to be the same.

Peter: Right, you could have a secure facility and you have two of them and
a piece of fiber running between the two.

Entrust:  As long as they were physically the same and the path between
them. When we start to think about running fiber between the two rooms, they
have to meet the same requirements, the two rooms and connectivity between
them.

Peter: You could use a network encrypter in each room.

Entrust: I think it's possible, but as soon as we go down that path it
complicates things.

Peter: Right, that's why I said does anybody need this? 

Entrust: To rule it out, do we identify this scenario as a risk that can't
be mitigated, and should we rule it off the table?

Peter: The requirements for mitigation are quite high, and I think it can be
mitigated, but I don't know if it's worth it.

Entrust:  The complications increase the risk that you won't need it.

Dimitris: I disagree, we have discussed in the past that we can't describe
all cases. We need to find the vulnerabilities and threats. You can mitigate
by having it offline completely or you can do it through a specialized
encryption device end to end. 

Peter: I agree with compensating controls in general, I'm just making a
proposal that says, if we want to declare this special case calling them
offline, you either have to meet all requirements or do this special thing.
It all has to be in one room and physically segregated; if you do that it's
a compensating control against all the other stuff.

Entrust: Right, if you do that, then you don't have to change passwords
every 90 days.

Peter: Right, but if you do need to have two rooms physically separate, you
are back to changing passwords every 90 days. 

Entrust: That would be fine with us on the Entrust side, that is our big
issue. We are trying to do the ultimate physical separation, but from a
compliance perspective we're trying to meet all the requirements. We are
booting up stuff to change passwords every day.

Peter: Right, we should define a specific design that has a reasonable level
of specificity that is an acceptable compensating control.

Meeting adjourned.

Ben Wilson, JD, CISA, CISSP

VP Compliance

+1 801 701 9678
