What Makes a Good Security Audit?

By Peter Eckersley
Published on 2014-11-08, on the EFF blog.

EFF recently began a new Campaign for Secure & Usable Crypto, with the aim of encouraging the creation and use of tools and protocols that not only offer genuinely secure messaging, but are also usable in practice by the humans who are most vulnerable to dangerous surveillance, including those who are not necessarily sophisticated computer users. The first phase of this campaign is the Secure Messaging Scorecard, which aims to identify messaging systems that are on the right track from a security perspective. In subsequent phases of the campaign, we plan to delve deeper into the usability and security properties of the tools that are doing best in the Scorecard. One crucial aspect of the Scorecard and the campaign is and will be code auditing. We've gotten a lot of questions about the auditing column in the Scorecard, so we thought it would be good to expand on it here.

In order to have confidence in any software that has security implications, we need to know that it has been reviewed for structural design problems and is being continuously audited for bugs and vulnerabilities in the code. All well-run projects should perform such reviews and audits, as they decrease—but do not eliminate—the risk of problems like Heartbleed, Shellshock, and thousands of other severe vulnerabilities that have received less dramatic press.

Unfortunately, there is a huge variation in the quality and effectiveness of audits. When we use software, our security depends in part on the nature and quality of these auditing processes, but they are difficult to measure. Audits can be partial or thorough, and the people conducting them can vary enormously in their levels of skill and experience. An audit can look mostly for common kinds of security errors, or also search for bugs and design issues that are more subtle and particular to the codebase. It can rely primarily on generic software for static analysis and vulnerability scanning, use somewhat customized software for "fuzzing" an application, incorporate a great deal of manual analysis by experienced humans, or combine these approaches. The vulnerabilities found by audits may or may not be fixed, and especially in the case of design and structure flaws that are partially mitigated, it may or may not be clear whether they have been fixed.

In the course of constructing our Secure Messaging Scorecard, we encountered a significant challenge around these variations. We know it is essential that users pick software that is well-audited, but it isn't obvious how to define an objective and practical-to-evaluate metric for the quality of audits. We considered a few options on this front:

Transparency. If an audit is published, then the security community can look at its methods and findings, and form opinions about how thorough it was. But some high-quality auditors who work on very widely used software told us that they were nervous about bidding for commercial auditing projects where the audit would be published. Their message was essentially, "if the audit will be published, we will inevitably have commercial incentives to only find bugs that are quick and easy to fix, and not design flaws that are hard or impossible to resolve."

Vouching. Will the auditor vouch for the product after the audit is complete and bugs have been addressed (completely, partially, or not at all)? It's a good sign if the auditor is enthusiastic about the code, but it's also risky to try to measure and act on that ("if you won't sign off on our product, we'll find an auditor that will!").

Audit metrics. How many bugs of different types were found and fixed, and by what means and how quickly were they found? The problem with such metrics is that they commingle the ease of finding bugs in a codebase (which is strongly connected to its security) with the skill levels of the auditors. Unless one can control for one of these variables, the audit metrics may not be especially informative.

Facilitating audit quality. Evaluate tools not just on whether they are audited, but on whether they do other things to make auditing (including independent auditing) easier.

Given these considerations, for the Scorecard we included not one but three columns that we believe are indicative of good code review practice, though they cannot categorically guarantee it:

We included a check mark for a recent audit. Audits have to be regular (at least yearly); conducted by individuals or teams other than the developers of the software; and they must examine the design and structure of the project as well as the code itself. For the reasons discussed above, we don't require companies to publish their audits, and we don't ask the auditors to vouch for the tools they audited, though we require that the audits be conducted by an identifiable party.
We included a check mark for projects that publish a clear and technically detailed design document, which is essential for both external and internal review of the design.
We included a check mark for projects which publish independently reviewable and buildable copies of their source code, which ensures that the maker of the software isn't also a gatekeeper for all white-box audits.
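
As a rough illustration only, the three columns above can be read as simple yes/no checks. The sketch below is not EFF's actual evaluation process: the `Project` fields, the function name, and the reading of "at least yearly" as "within the last 365 days" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional


@dataclass
class Project:
    # All field names are hypothetical, chosen to mirror the criteria in the text.
    name: str
    last_audit: Optional[date]      # date of the most recent audit, if any
    audit_independent: bool         # conducted by someone other than the developers
    audit_covers_design: bool       # examined design and structure, not only code
    auditor_identifiable: bool      # the auditing party can be identified
    design_doc_published: bool      # clear, technically detailed design document
    source_reviewable: bool         # independently reviewable and buildable source


def audit_columns(project: Project, today: date) -> Dict[str, bool]:
    """Return the three audit-related check marks for one project."""
    # Assumption: "at least yearly" is approximated as "within 365 days".
    recent_audit = (
        project.last_audit is not None
        and today - project.last_audit <= timedelta(days=365)
        and project.audit_independent
        and project.audit_covers_design
        and project.auditor_identifiable
    )
    return {
        "recent_audit": recent_audit,
        "design_documented": project.design_doc_published,
        "code_open_to_review": project.source_reviewable,
    }
```

Note that nothing in this sketch asks whether the audit was published or whether the auditor vouched for the tool, which matches the choices described above. It also says nothing about how good the audit was.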

We did not review, judge, or vouch for the audits of each technology. However, we wanted to both encourage communication software developers to regularly audit their code and give an indication to everyday users about which tools are at least making a systematic effort to review their codebases. In the near future we plan to publish a document to provide more detail regarding what the developers of each tool said about their audits.

The Secure Messaging Scorecard is only the first phase of a longer Campaign for Secure & Usable Crypto. During subsequent phases of this campaign, we intend to delve deeper on the auditing front and to offer closer examinations of the usability and security of the tools that score the highest here. EFF does not endorse particular communication tools, and we recognize that different users may have different security concerns and considerations. These scores, and particularly the auditing column, are merely indications that the projects are on the right track.

As always, we value feedback from the security community as well as the larger technical community, and we hope to continue to refine our Secure Messaging Scorecard to make it as useful and accurate as possible. To learn more about protecting your communications from surveillance, visit EFF's Surveillance Self Defense.
