Does DARPA's Cyber Grand Challenge Need A Safety Protocol?
By Jeremy Gillula, Nate Cardozo, and Peter Eckersley
Today, DARPA (the Defense Advanced Research Projects Agency, the R&D arm of the US military) is holding the finals for its Cyber Grand Challenge (CGC) competition at DEF CON. We think that this initiative by DARPA is very cool, very innovative, and could have been a little dangerous.
In this post, we’re going to talk about why the CGC is important and interesting (it's about building automated systems that can break into computers!); about some of the dangers posed by this line of automated security research; and the sorts of safety precautions that may become appropriate as endeavors in this space become more advanced. We think there may be some real policy concerns down the road about systems that can automate the process of exploiting vulnerabilities. But rather than calling for external policy interventions, we think the best people to address these issues are the people doing the research themselves—and we encourage them to come together now to address these questions explicitly.
The DARPA Cyber Grand Challenge
In some ways, the Cyber Grand Challenge is a lot like normal capture the flag (CTF) competitions held at hacker and computer security events. Different teams all connect their computers to the same network and place a special file (the “flag”) in a secure location on their machines. The goal is to secure your team's machines to make sure nobody else can hack into them and retrieve the flag, while simultaneously trying to hack the other teams' machines and exfiltrate their flag. (And of course, your computer has to stay connected to the network the whole time, possibly serving a website or providing some other network service.)
The difference with DARPA's Cyber Grand Challenge, though, is that the “hackers” participating in the competition are automated systems. In other words, human teams get to program completely automated offensive and defensive systems which are designed to automatically detect vulnerabilities in software and either patch them or exploit them, using various techniques including fuzzing, static analysis or machine learning. Then, during the competition, these automated systems face off against each other with no human participation or help. Once the competition starts, it's all up to the automated systems.
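To make "fuzzing" a little more concrete, here is a minimal sketch of the idea in Python. It is a toy, not a description of how any CGC entrant actually works: `parse_record`, `mutate`, and `fuzz` are hypothetical names invented for this illustration, and the bug in the target is planted on purpose. The fuzzer repeatedly makes a small random change to a known-good input and records every variant that crashes the target.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy target (hypothetical): the first byte declares the payload length."""
    if not data:
        raise ValueError("empty record")
    declared_len = data[0]
    payload = data[1:]
    # Planted bug: declared_len is trusted without being checked against
    # the real payload size, so the indexing below can run past the end.
    return bytes(payload[i] for i in range(declared_len))


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte of the seed with a random value."""
    buf = bytearray(seed)
    pos = rng.randrange(len(buf))
    buf[pos] = rng.randrange(256)
    return bytes(buf)


def fuzz(target, seed: bytes, iterations: int, rng_seed: int = 0) -> list:
    """Feed mutated inputs to the target and collect the ones that crash it."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except IndexError:
            # An out-of-range read stands in for a memory-safety crash.
            crashes.append(candidate)
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record, b"\x04ABCD", iterations=500)
    print(f"{len(found)} crashing inputs found")
```

Real fuzzers add a great deal on top of this loop, such as coverage feedback and smarter mutation strategies, but the core cycle of mutate, run, and watch for crashes is the same.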
In principle, autonomous vulnerability detection research like this is only an incremental step beyond the excellent fuzzing work being done at Google, Microsoft and elsewhere, and may be good from a cybersecurity policy perspective, particularly if it serves to level the playing field between attackers and defenders when it comes to computer and network security. To date, attackers have tended to have the advantage because they often only need to find one vulnerability in order to compromise a system. No matter how many vulnerabilities a defender patches, if there's even one critical bug they haven't discovered, an attacker could find a way in. Research like the Cyber Grand Challenge could help even the odds by giving defenders tools which will automatically scan all exposed software, and not only discover vulnerabilities, but assist in patching them, too. Theoretically, if automated methods became the best way of finding bugs, it might negate some of the asymmetries that often make defensive computer security work so difficult.
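A rough back-of-the-envelope calculation shows how steep this asymmetry can be. The sketch below rests on a deliberately simplistic assumption that is ours, not something measured: a system has some number of critical bugs, and the defender independently finds and patches each one with the same probability. The function name and the numbers are purely illustrative.

```python
def prob_attacker_has_opening(p_patch: float, n_bugs: int) -> float:
    """Chance that at least one of n_bugs critical bugs is left unpatched.

    Assumes (for illustration only) that each bug is patched independently
    with probability p_patch.
    """
    prob_all_patched = p_patch ** n_bugs
    return 1.0 - prob_all_patched


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, round(prob_attacker_has_opening(0.9, n), 3))
```

Under these assumptions, a defender who catches 90% of bugs still leaves an opening about 65% of the time when there are ten critical bugs. Tools that push the per-bug detection rate closer to 100% shrink that gap quickly, which is the hoped-for payoff of automated defense.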
But this silver lining has a cloud. We are going to start seeing tools that don't just identify vulnerabilities, but automatically write and launch exploits for them. Using these same sorts of autonomous tools, we can imagine an attacker creating (perhaps even accidentally) a 21st century version of the Morris worm that can discover new zero days to help itself propagate. How do you defend the Internet against a virus that continuously finds new vulnerabilities as it attacks new machines? The obvious answer would be to use one of the automated defensive patching systems we just described—but unfortunately, in many cases such a system just won't be effective or deployable.
Why not? Because not all computer systems can be patched easily. A multitude of Internet of Things devices have already been built and sold where a remote upgrade simply isn't possible—particularly on embedded systems where the software is flashed onto a microcontroller and upgrading requires an actual physical connection. Other devices might technically have the capability to be upgraded, but the manufacturer might not have designed or implemented an official remote upgrade channel.1 And even when there is an official upgrade channel, many devices continue to be used long after manufacturers decide it isn't profitable to continue to provide security updates.2
In some cases, it may be possible to do automated defensive patching on the network, before messages get to vulnerable end systems. In fact, some people closely familiar with the DARPA CGC have suggested to us that developing these kinds of defensive proxies may be one of the CGC’s long-term objectives. But such defensive patching at the network layer is only possible for protocols that are not encrypted, or on aggressively managed networks where encryption is subject to man-in-the-middle inspection by firewalls and endpoints are configured to trust man-in-the-middle CAs. Both of these situations have serious security problems of their own.
Right now, attacking the long tail of vulnerable devices, such as IoT gadgets, isn't worthwhile for many sophisticated actors because the benefit for the would-be hacker is far lower than the effort it would take to make the attack successful. Imagine a hacker thinking about attacking a model of Internet-connected thermostat that's not very popular. It would probably take days or weeks of work, and the number of compromised systems would be very low (compared to compromising a more popular model)—not to mention the systems themselves wouldn't be very useful in and of themselves. For the hacker, focusing on this particular target just isn't worth it.
But now imagine an attacker armed with a tool which discovers and exploits new vulnerabilities in any software it encounters. Such an attacker could attack an entire class of systems (all Internet of Things devices using a certain microprocessor architecture, say) much more easily. And unlike when the Morris worm went viral in 1988, today everything from Barbie dolls to tea kettles is connected to the Internet, as are parts of our transportation infrastructure like gas pumps and traffic lights. If a 21st century Morris worm could learn to attack these systems before we replaced them with patchable, upgradable versions, the results would be highly unpredictable and potentially very serious.
Precautions, Not Prohibitions
Does this mean we should cease performing this sort of research and stop investigating automated cybersecurity systems? Absolutely not. EFF is a pro-innovation organization, and we certainly wouldn’t ask DARPA or any other research group to stop innovating. Nor is it even really clear how you could stop such research if you wanted to; plenty of actors could do it if they wanted.
Instead, we think the right thing, at least for now, is for researchers to proceed cautiously and be conscious of the risks. When thematically similar concerns have been raised in other fields, researchers spent some time reviewing their safety precautions and risk assessments, then resumed their work. That's the right approach for automated vulnerability detection, too. At the moment, autonomous computer security research is still the purview of a small community of extremely experienced and intelligent researchers. Until our civilization's cybersecurity systems are less fragile, we believe it is the moral and ethical responsibility of that community to think through the risks that come with the technology it develops, as well as how to mitigate those risks, before it falls into the wrong hands.
For example, researchers should probably ask questions like:
- If this tool is designed to find and patch vulnerabilities, how hard would it be for someone who got its source code to turn it into a tool for finding and exploiting vulnerabilities? The differences may be small but still important. For instance, does the tool need a copy of the source code or binary it's analyzing? Does it just identify problematic inputs that may crash programs, or places in their code that may require protections, or does it go further and automate exploitation of the bugs it has found?
- What architectures or types of systems does this tool target? Are they widespread? Can these systems be easily patched and protected?
- What is the worst-case scenario if this tool's source code were leaked to, say, an enemy nation-state or authors of commercial cryptoviruses? What would happen if the tool escaped onto the public Internet?
To be clear, we're not saying that researchers should stop innovating in cases where the answers to those questions are more pessimistic. Rather, we're saying that they may want to take precautions proportional to the risk. In the same way biologists take different precautions ranging from just wearing a mask and gloves to isolating samples in a sealed negative-pressure environment, security researchers may need to vary their precautions from using full-disk encryption, all the way to only doing the research on air-gapped machines, depending on the risk involved.
For now, though, the field is still quite young and such extreme precautions probably aren't necessary. DARPA's Cyber Grand Challenge illustrates some of the reasons for this: the tools in the CGC aren't designed to target the same sort of software that runs on everyday laptops or smartphones. Instead, DARPA developed a simplified open source operating system extension expressly for the CGC. In part, this was intended to make the work of CGC contestants easier. But it was also done so that any tools designed for use in the CGC would need to be significantly modified for use in the real world. As is, they don't really pose much of a danger, and no additional safety precautions are likely necessary.
But what if, a few years from now, the subsequent rounds of the contest target commonplace software? As they move in that direction, the designers of systems capable of automatically finding and exploiting vulnerabilities should take the time to think through the possible risks, and strategies for how to minimize them in advance. That's why we think the people who are experts in this field should come together, discuss the issues we're flagging here (and perhaps raise new ones), and come up with a strategy for handling the safety considerations for any risks they identify. In other words, we’d like to encourage the field to fully think through the ramifications of new research as it’s conducted. Much like the genetics community did in 1975, we think researchers working in the intersection of AI, automation, and computer security should come together to hold a virtual “Autonomous Cybersecurity Asilomar Conference.” Such a conference would serve two purposes. It would allow the community to develop internal guidelines or suggestions for performing autonomous cybersecurity research safely, and it would reassure the public that the field isn't proceeding blindly forward, but instead proceeding in a thoughtful way with an eye toward bettering computer security for all of us.
1. Of course, manufacturers could turn loose autonomous patching viruses which patch users' devices as they propagate through the Internet, but this could open up a huge can of worms if users aren't expecting their devices to undergo these sorts of aggressive pseudo-attacks (not to mention the possible legal ramifications under the CFAA).
2. We're looking at you, Android device manufacturers, mobile carriers, and Google.