[cabf_netsec] Minutes from Today's Call 19-Oct-2017

Ben Wilson ben.wilson at digicert.com
Thu Oct 19 16:39:06 MST 2017


In Attendance:  Ben Wilson, Fotis Loukos, Travis Graham, Xiu Lei, Robin Alden, Wayne Thayer, Dimitris Zacharopoulos, Tim Hollebeek, Neil Dunbar, Tobi Josefowitz, Ryan Hurst, Kefeng Chen, Rick Agarwala, Patrick Milot, Colin McIntyre

Ben noted that the last call was before the F2F.  During the F2F, we recapped our progress and explained why we are working on what we are.  He explained that we would try to move on to the next item for discussion in the Network Security Requirements, but that the choice might be based on which topic was easier to tackle rather than on the prioritized list.

Dimitris gave an update on the work of the subgroup that has been discussing the threat/risk assessment and CA system architecture.  Dimitris explained that it was hard to participate during the F2F due to technical difficulties.  The group has started listing the risks and some of the mitigating factors.  The subgroup would like more participation.  The subgroup will consider simplifying next steps by just documenting basic vulnerabilities and threats and omitting the compensating controls for now.

Neil suggested that the subgroup begin meeting weekly.  Weekly meetings on Wednesdays (Fridays?) were proposed.  Dimitris said that weekly meetings would be fine to gain momentum, and then the group could fall back to bi-weekly meetings.

Tim had proposed that we do threat modeling of the CA system with a high-level view of the architecture, perhaps with two architectures: one for traditional setups and one for cloud and third-party data center arrangements.  Neil noted that if a root CA is supposed to be offline, then having it cloud-based is virtually impossible.  There was general agreement that we were really talking about third-party data center arrangements; however, Tim said that portions of a CA system could be cloud-based.  He said that when doing a threat modeling exercise, you have to have an adequate description of the system, but when we check with CABF members, we'll likely find that some are operating in a corporate data center while others are operating in someone else's data center, and the properties are somewhat different.  There was discussion of the two types of environments, and it was agreed that we need to come up with a model that works for both corporate data centers and colocated facilities.  Dimitris said that the subgroup would proceed with a traditional model first, and that at the end of the exercise we would need to map the current network requirements and see how they fit into this architecture.

Ben wondered whether some other group has already documented what the CA environment is, so that we don't have to reinvent the wheel.  Dimitris said it isn't as important how we sketch it out; at the end of the day we'll be discussing the same problems, such as how you transfer data, update operating systems and software, etc., and physical security is probably the easier part.  Neil said the main issue for him has always been knowing what state the system is in when you are patching it.  How do you know that what you did to the system had the desired effect?  How do you audit that?  Do you have vulnerability scanning for a system that is completely offline?  That is unlikely.

The group then prioritized the list of work items as green (easy to work on), yellow (moderate difficulty), or red (difficult), as follows:

1.      Defining "Root CA System", "Offline" and "Air-Gapped" and clarifying associated requirements
2.      Defining terms like "workstation", "account", "zone", "CA System," and "Issuing System" (easy)
3.      Clarifying log review, "human review" of logs vs. automated reviews (moderate)
4.      Defining "Critical Vulnerability" and "Critical Security Event" and clarifying the actions to take, including action within 96 hours of detecting a vulnerability not otherwise addressed by the CA's procedures (difficult)
5.      Providing guidance on the criteria for acceptable penetration tests and vulnerability scans (difficult)
6.      Including mitigating factors and compensating controls in the NCSSR (difficult)
7.      Penetration tests after changes the CA determines are "significant" (moderate)
8.      Clarifying audit documentation requirements for network/system configurations (1.f, g, and h) (moderate)
9.      Addressing software development vulnerabilities and processes (difficult)
10.     Modifying 2.j. "review all system account configurations every 90 days" (easy)
11.     Addressing wireless security vulnerabilities (moderate)
12.     Timeframes in which to disable system access of former employees (currently within 24 hours) (easy)
13.     Password rules (currently 12 characters, OR 8 characters + changes every 90 days, OR a documented policy) (moderate).

Ben noted that we could talk about item 2 ("workstation", "account", "zone", "CA System," and "Issuing System"), but that it likely falls within the architectural/threat modeling work.  It was agreed that this issue would fit within the work of that subgroup.

Ryan suggested we talk about passwords today (2.g).  There were lots of opinions about the quality of passwords required, 8 characters being insufficient, how the password provision applies or doesn't apply to other authentication mechanisms, etc.  Ryan suggested we look at anti-hammering mechanisms, lockouts, etc.  Dimitris noted that the Network Security Requirements already mention lockouts (2g and 2k).  Ryan noted that Windows login and TPMs have mechanisms to slow down failed login attempts.  He noted that these can cause denial of service, which may be acceptable on a local system but might not be acceptable for a remote system.  Ryan suggested that we provide a lot of examples rather than being overly specific, so that people can choose better solutions.  Ben asked whether this meant that we should provide more guidance and fewer requirements.  Ryan said that there has to be more flexibility in which mechanisms are used.
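As a rough illustration of the anti-hammering and lockout mechanisms discussed, the following Python sketch throttles repeated failed login attempts; the thresholds, names, and data structures are hypothetical and are not drawn from the Network Security Requirements.

    import time
    from collections import defaultdict

    # Hypothetical thresholds -- not taken from any requirement.
    MAX_FAILURES = 5          # consecutive failures before a temporary lockout
    BASE_DELAY_SECONDS = 2    # backoff delay doubles with each failure
    LOCKOUT_SECONDS = 900     # 15-minute lockout once the threshold is hit

    _failures = defaultdict(int)        # username -> consecutive failed attempts
    _locked_until = defaultdict(float)  # username -> unix time the lockout expires

    def check_allowed(username):
        """Return (allowed, wait_seconds) before an authentication attempt."""
        now = time.time()
        if now < _locked_until[username]:
            return False, _locked_until[username] - now
        # Exponential backoff slows online guessing without an immediate hard lockout.
        failures = _failures[username]
        delay = BASE_DELAY_SECONDS * (2 ** (failures - 1)) if failures else 0
        return True, delay

    def record_result(username, success):
        """Update counters after an attempt; lock the account after repeated failures."""
        if success:
            _failures[username] = 0
            _locked_until[username] = 0.0
        else:
            _failures[username] += 1
            if _failures[username] >= MAX_FAILURES:
                _locked_until[username] = time.time() + LOCKOUT_SECONDS

A per-account hard lockout like the one above is exactly the denial-of-service trade-off Ryan mentioned; a remotely reachable system might prefer backoff or per-source throttling over locking the account outright.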

Ben noted that discussions of passwords might take a lot of time, and maybe we should break this discussion off into a subgroup.  Ryan suggested that for topics like passwords, subgroups could come back to the group with recommendations.  Tim said a useful piece of homework for passwords would be for people to read section 5.1.1 of NIST 800-63b (https://pages.nist.gov/800-63-3/sp800-63b.html).  Volunteers for the subgroup included Ryan, Ben, Tim, Robin and Fotis.
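For reference, section 5.1.1 of NIST 800-63B asks verifiers of memorized secrets to require at least 8 characters, permit long passphrases, check candidates against lists of compromised or commonly used values, and avoid composition rules and routine expiration.  A minimal Python sketch of those checks follows; the blocklist contents and the 64-character local maximum are placeholder choices.

    def check_memorized_secret(candidate, blocklist):
        """Simplified sketch of the NIST 800-63B section 5.1.1 verifier checks."""
        if len(candidate) < 8:
            return False, "too short: at least 8 characters are required"
        if len(candidate) > 64:
            # Verifiers should accept at least 64 characters; a higher cap is a local choice.
            return False, "exceeds the locally configured maximum length"
        if candidate.lower() in blocklist:
            return False, "appears on a list of compromised or commonly used passwords"
        # 800-63B advises against composition rules (e.g. "one digit, one symbol")
        # and against forcing routine periodic changes.
        return True, "acceptable"

    # Hypothetical usage with a toy blocklist:
    blocklist = {"password", "123456789012", "qwertyuiop12"}
    print(check_memorized_secret("correct horse battery staple", blocklist))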

The group turned its attention to audit logging and manual log review.  Sections 3.b. to 3.e. were reviewed.  Ryan and Ben noted that these sections could be reworded so that the provisions in 3.c. would encompass parts of 3.e. (the responsibility for making sure things are functioning correctly).  Logging can stop functioning for a variety of reasons, e.g. disk failure, and it's important to have processes in place to catch that.  As for the human review, auditors have interpreted it to mean human review of the actual log with an eye to finding anomalies.  It's not realistic for humans to find patterns in the data, so it's something organizations do without it adding much value.  You need systems in place to identify patterns, as stated in subsection c., but you don't want humans doing that.

Tobi asked whether the design of subsection e. might have been to catch things like classes of messages that are not being forwarded to your logging architecture.  Ryan said you could have periodic reviews of the design, and if that's the intent, then the requirement should be that you have yearly reviews of your logging strategies to ensure they are capturing the right things vs. a monthly review of logs.  What ends up happening is the new guy is assigned the task of doing manual log reviews, which provides no value.  Tobi said that he interprets e. to mean that he could glance over the log to see that it is operating properly and could spot anomalies.  Ryan said that for Let's Encrypt, with millions of log entries daily, you're never going to see it.  Tobi said that he wouldn't expect to see a single one, but if there is a whole class of warnings he would definitely notice them.  That was countered with arguments that anomalies would be buried in systems where hundreds of subsystems are logging.

Ryan said that he thought there ought to be someplace in the requirements where we address these issues, but that he didn't think a manual review of an at-scale system's logs is valuable.  There should be a requirement for automation that captures security-sensitive events.
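A minimal sketch of the kind of automation Ryan described, assuming logs are available as text lines; the patterns below are illustrative stand-ins for whatever events a CA actually treats as security-sensitive.

    import re

    # Placeholder patterns -- a real deployment would derive these from the
    # CA's own threat model, not from this illustrative list.
    SECURITY_PATTERNS = {
        "auth_failure": re.compile(r"authentication failure", re.IGNORECASE),
        "privilege_change": re.compile(r"added to group|role granted", re.IGNORECASE),
        "service_stopped": re.compile(r"service (stopped|killed|crashed)", re.IGNORECASE),
    }

    def scan_log_lines(lines):
        """Count security-sensitive events per category.

        The machine reads every line; humans only review the summarized
        alerts it produces, rather than the raw log.
        """
        counts = {name: 0 for name in SECURITY_PATTERNS}
        for line in lines:
            for name, pattern in SECURITY_PATTERNS.items():
                if pattern.search(line):
                    counts[name] += 1
        return counts

    # Hypothetical usage against a log file:
    # with open("/var/log/ca/issuance.log") as f:
    #     alerts = {k: v for k, v in scan_log_lines(f).items() if v}
    #     if alerts:
    #         notify_security_team(alerts)  # hypothetical alerting hook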

Ben said that e. is really two requirements.  One requirement is that the system has integrity and the other requirement is to have some human review of the log (with the use of an in-house or third-party audit log reduction and analysis tool), and maybe we should split them up into e. and a new subsection f.   One way to edit e. would be to remove language so that it would read, "Ensure that application and system logs have integrity and are operating properly."  Then f. could read, "Conduct a human review of logs once a month using an in-house or third-party audit log reduction and analysis tool."  However, he said he didn't know whether the latter, human review, was necessary.  We want to make this easy, practical and realistic.  If we think that this is covered in the previous subsections a-d, we could make that edit and just move on.

Dimitris said his interpretation of subsection e. is that you just have to make sure your systems are logging. It doesn't say anything more about drilling in and trying to find if there are any security alerts hidden in the logs.  You just need to check once a month that all of your critical systems are actually logging, and you can automate this process.  Tobi said that you can obviously tell if something is not working.  Dimitris said it doesn't say you have to find attacks or any weird activity, which is described under subsection c.
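A short sketch of the automated check Dimitris described, confirming that each critical system has logged something recently; the system inventory and the freshness window are made-up examples.

    import os
    import time

    # Hypothetical inventory of critical systems and where their logs land.
    CRITICAL_SYSTEMS = {
        "ca-signing": "/var/log/ca/signing.log",
        "ocsp-responder": "/var/log/ca/ocsp.log",
        "ra-frontend": "/var/log/ca/ra.log",
    }
    MAX_SILENCE_SECONDS = 24 * 3600  # flag any system silent for more than a day

    def systems_not_logging():
        """Return systems whose log files are missing or have gone stale."""
        now = time.time()
        stale = []
        for name, path in CRITICAL_SYSTEMS.items():
            if not os.path.exists(path) or now - os.path.getmtime(path) > MAX_SILENCE_SECONDS:
                stale.append(name)
        return stale

    if __name__ == "__main__":
        missing = systems_not_logging()
        if missing:
            print("ALERT: no recent log activity from:", ", ".join(missing))

Running a check like this on a schedule satisfies the "are my systems still logging" reading of subsection e. without anyone paging through raw log entries.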

Ben said that Dimitris' reading of e. is the logical conclusion, but e. also has language about human review and log reduction and analysis tools, which confuses the interpretation.  As the drafter of these requirements, he knows that there was an intent to require human review of logs using automated tools, but he isn't arguing that it was the right thing to do, especially when it is causing this confusion.  Making Ben's suggested changes would ensure that Dimitris' interpretation is carried forward: that the system has integrity and is operating properly.  Ben then said that if we wanted to, we could go on and talk about human review of an audit log using analytical tools, which is an extra step that is a lot of work.  Tobi said he didn't think that anyone expects someone to sit down and go through massive amounts of log entries; that makes no sense.

Ben asked whether we should add a monthly review in a new f., or assume that the continuous monitoring required in b. and c. already covers what a monthly review would provide.  Dimitris said he thought those are covered under b. and c., and they are already mandatory.  There was a question as to whether we should define what "integrity" means.  Tobi said that integrity means that you are getting what you think you are getting.  It was then noted that CAs have been asked to hash logs, sign logs, stamp logs, etc.  Ryan said he didn't interpret integrity as Tobi did; to him, integrity means that the entry has not been modified since it was put there, which is different from saying that what you put there is what you expected to put there.  The word "integrity" was highlighted, and it was decided that we would pick up the discussion of the meaning of integrity when we reconvene.
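As an illustration of the modification-detection sense of integrity that Ryan described, here is a minimal hash-chaining sketch; a production system would typically also sign or timestamp the chain, and the entry format here is invented for the example.

    import hashlib

    def append_entry(chain, message):
        """Append a log entry whose hash covers the previous entry's hash."""
        prev_hash = chain[-1]["hash"] if chain else "0" * 64
        entry_hash = hashlib.sha256((prev_hash + message).encode()).hexdigest()
        chain.append({"message": message, "prev_hash": prev_hash, "hash": entry_hash})

    def verify_chain(chain):
        """Recompute every hash; tampering with an earlier entry breaks every later link."""
        prev_hash = "0" * 64
        for entry in chain:
            expected = hashlib.sha256((prev_hash + entry["message"]).encode()).hexdigest()
            if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True

    # Usage:
    log = []
    append_entry(log, "2017-10-19 16:00 issuance request received")
    append_entry(log, "2017-10-19 16:01 certificate signed")
    assert verify_chain(log)
    log[0]["message"] = "tampered"
    assert not verify_chain(log)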

Meeting adjourned.
