Many Internet users are aware that they ought to look for a padlock icon in their web browsers when doing something important online. Some users understand that the padlock is supposed to indicate that their communications are “secure”—confidential and in some sense authentic. Privacy advocates have welcomed the increased use of encryption on the web; in one tremendously positive development for users’ privacy, Google has introduced HTTPS encryption by default for Gmail and Google Docs, and as an option for Google search, so that users can access these services privately.1 Users who type “https://” before a site name (or are sent automatically to a secure site) may see their browsers display a reassuring padlock, indicating that encryption is in use. Where does this padlock come from, and how meaningful are its guarantees?
A cornerstone of communications security is Whitfield Diffie and Martin Hellman’s 1976 invention of public-key cryptography, which removed previous codes’ crucial but burdensome requirement of a pre-shared secret key. Before public-key cryptography, people who wanted to communicate privately would first have to pre-arrange a secret code—and a separate secret code for each pair of potential communicators. So if 1000 people wanted the potential to communicate privately, there would need to be about half a million explicit prior arrangements. Virtually nobody but military organizations would go to that level of trouble just for the possibility of private conversations.
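(The half-million figure is simply the number of distinct pairs: n parties need n(n − 1)/2 separate codes, since each code is shared between two of them, and (1000 × 999)/2 = 499,500 prior arrangements.)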
Public-key cryptography offered a surprising breakthrough: each user could publish a single key2 that other people could use to send private messages to the key’s owner without other prior arrangement. This works because of clever mathematical insights about one-way or “trap-door” functions: transformations that are straightforward to perform but extraordinarily difficult to reverse.3 It is sometimes compared, albeit imprecisely, to a drop-box that anyone can put messages into but only the owner can unlock.
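To make the idea concrete, here is a minimal sketch in Python using the widely available third-party cryptography library; the message, key size, and padding parameters are merely reasonable illustrative choices, not recommendations:

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, rsa

    # The recipient generates a key pair and publishes only the public half.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    # Anyone holding the public key can encrypt a message to the recipient...
    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)
    ciphertext = public_key.encrypt(b"meet me at noon", oaep)

    # ...but only the holder of the private key can recover it.
    assert private_key.decrypt(ciphertext, oaep) == b"meet me at noon"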
Phil Zimmermann, author of the PGP e-mail encryption software, once described the result as an ability to “communicate securely with people you’ve never met”; applied to web browsing, it means the ability to browse and interact securely with websites with which you have no prior relationship. This is an astonishing technical breakthrough. Like other encryption methods, it could prevent a computer network operator from eavesdropping on or tampering with the contents of messages sent over the network (something that would otherwise be trivial), although the network operator can still block messages from being sent at all—whether with firewall rules or simply by unplugging a cable.
Unfortunately, there’s a major subtlety often overlooked even by computer experts. Although the keys belonging to e-mail users or websites can be public, others who want to communicate with them still need a reliable way to look those keys up. That is, secure communication requires that computer users—or their computers—have a dependable way to learn the right cryptographic key for each party or site they want to communicate with. Without this information, users are inherently vulnerable to a man-in-the-middle attack, where an attacker impersonates the party at the other end.4
The diagram on page 134 shows how a man-in-the-middle attack works. (It simplifies the process by presenting the computers’ data exchanges in English and omitting several steps that would be part of a real web browsing connection. Real keys, or key fingerprints, also need to be much longer than the examples, which is part of what makes them unmemorable to human beings. Google’s real public key fingerprint as of this writing is c4:70:74:fb:69:f9:e3:94:7e:8b:28:a4:00:73:de:01, abbreviated in the example to just c4:70.)
For an effective defense, users cannot rely on intuitions derived from a site’s appearance, content, or behavior; during a successful attack, a user is communicating indirectly with the genuine site, so its contents and behavior typically appear completely genuine. Since the man-in-the-middle can forward all communications back and forth, the web site appears authentic to the Internet user, and vice versa. For example, in this case, the Internet user would be reading her actual e-mail in her Gmail account (assuming the man-in-the-middle chose not to reveal himself by modifying anything), so nothing would appear out of the ordinary. The only difference between the first and second scenarios, from the user’s point of view, is the numeric cryptographic key apparently presented by the server.5 If the user is to be protected against this attack, then it’s crucial to ensure that there is a meaningful way to verify whether the key is correct or not (regardless of whether the key is displayed on-screen).
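To give a concrete sense of what such verification involves, here is a minimal Python sketch using only the standard library: it retrieves the certificate a server actually presents and computes a fingerprint that could be compared against a known-good value obtained through some independent channel (the host name is just an example):

    import hashlib
    import socket
    import ssl

    def server_cert_fingerprint(host, port=443):
        # Perform a TLS handshake and retrieve the certificate the server
        # presents, in raw DER-encoded form.
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port)) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                der = tls.getpeercert(binary_form=True)
        # A hash of the certificate gives a short, comparable fingerprint.
        return hashlib.sha256(der).hexdigest()

    # A man-in-the-middle substituting his own key and certificate would
    # produce a fingerprint different from the known-good value.
    print(server_cert_fingerprint("mail.google.com"))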
Since this information can be made public with no harm to the secrecy of the ensuing communications, the task of distributing the correct keys is often thought of as a minor logistical detail. Computer security scholar Simson Garfinkel recounts that Diffie and Hellman originally imagined these keys published in a sort of telephone directory. After all, telephone directories seem straightforward to compile. Wouldn’t this be a simple clerical task?
In fact, control over this “phone book” conveys a surprising sort of power. If we suppose there is to be only a single authoritative phone book, preventing its publication would deny interested users and web sites the ability to confidently communicate privately. Omitting a particular site from the directory, or intentionally introducing a misprint, would selectively deny that site’s users the chance to communicate securely with it. The phone book’s publisher would also have to deal with a steady stream of users misplacing or changing their keys, and with the thorny question of how it should authenticate parties that want to be listed in the book—given that many people might want to use bribery or trickery to obtain false listings in the hope of later impersonating interesting entities. This “clerical” task expands to incorporate tremendous power and responsibility: basically, the power to disrupt or corrupt everyone’s private communications.
There is no actual “phone book” of this sort. Its role, known as a public key infrastructure or PKI (because it is the infrastructure through which users, or software programs like web browsers, come to learn which public keys to use), has usually been served by one of two models: a decentralized web of trust or a centralized set of certificate authorities. The web of trust model involves announcing keys essentially by word of mouth, through existing social networks and relationships. It is most associated with the PGP e-mail encryption software, although it can in principle be used anywhere public-key encryption is used. It is also best known for encouraging users to make individual, explicit decisions about whether keys are authentic—making users (perhaps painfully) aware of the complexities involved in deciding whether a key is genuine.
The certificate authority model involves delegating these decisions to a few entities called certificate authorities (CAs), which are supposed to be responsible for verifying identities and issuing digital certificates stating that they are satisfied that a particular public key really does belong to a particular entity. In the man-in-the-middle scenario above, we might hope that Google could show a certificate attesting to its ownership of c4:70, while the attacker would not be able to produce any credible certificate indicating that e4:d5 belongs to Google. Ideally, the browser would then warn the user about the problem; hopefully, the user would heed the warning instead of clicking through it!
Usually certification authorities are paid for this service, typically by collecting a fee for each year of validity of an issued certificate. Certification can be an extremely lucrative business, partly because a certificate is simply numeric data and quite inexpensive to issue; the most successful certification firms have revenues in the hundreds of millions of dollars. Mark Shuttleworth, the South African who became the first African in space, made his fortune from his certification firm Thawte. (He had enough money left over to single-handedly fund the development of Ubuntu, the most popular desktop Linux operating system.)
A CA could be any kind of organization that other people are willing to recognize as an authority for this purpose and entrust with the power to make these important decisions on their behalf. CAs that participate in the so-called “global public-key infrastructure” are surprisingly numerous; there are dozens of them around the world, including private corporations, national and regional governments, and a few other kinds of organizations. The most-used authorities are generally private companies in Western countries because they entered the field early and achieved broad name recognition for their services. But that does not mean that these authorities’ certifications are, or are treated as, more trustworthy than those of other authorities.
One might imagine that a certificate from such an authority is merely one piece of information among many that might help a user decide whether a public key, such as that presented by a web site, is correct and genuine. But a major priority for web browser developers since this infrastructure was introduced by the Netscape Corporation in 1994 has been to reassure users that it is safe to use their credit cards to shop online—and, correspondingly, to hide the complexity of the cryptography from the user. Thus, browsers chose to accept digital certificates absolutely and unquestioningly, as long as they came from an authority approved by the browser developer. This approval process was originally quite informal and based largely on custom and accident. Today, browser developers have adopted more formal processes for deciding who makes the cut, but Firefox already trusts about 40 different certificate authorities and Internet Explorer about 100 (though unfortunately Microsoft now makes it difficult for users to see the list of trusted authorities). Modern browsers accept a digital certificate from any one of these authorities as complete, prima facie evidence that the public key described in that certificate actually belongs to the website it mentions.
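Ordinary TLS client libraries encode the same policy. In this Python standard-library sketch (host name purely an example), the connection succeeds so long as the server’s certificate chains to any authority on the platform’s trusted list, no matter which one:

    import socket
    import ssl

    # create_default_context() loads the platform's entire list of trusted
    # root authorities; a certificate from any one of them is accepted.
    ctx = ssl.create_default_context()
    with socket.create_connection(("www.google.com", 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname="www.google.com") as tls:
            cert = tls.getpeercert()   # already validated against that list
            print(cert["issuer"])      # shows which authority vouched for it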
This model has faced considerable skepticism since it was created, particularly from experts who worried that certificate authorities’ incentives don’t align well with end users’ (since authorities generally get paid for issuing a certificate, but will pay no direct penalty if the information in a certificate turns out to be wrong). Matt Blaze remarked cynically that CAs “protect you from anyone from whom they are unwilling to take money.” There has also been considerable concern that certificate authorities could be tricked into issuing false certificates or persuaded by national governments to do so; Google has warned that “a highly capable source may be in a position to sign certificates with a standard, pre-installed [CA] which [. . .] would allow [interception of encrypted communications] without any apparent warnings to the user.”6 Worse, since browsers trust all authorities equally and completely, adding a new authority to the list can only increase the risk to users: the chain is only as strong as its weakest link.
The main argument in favor of the certificate authority approach is that it provides convenience and apparent clarity to users: if the authorities do their jobs correctly, verifying information carefully before issuing a certificate and never issuing inaccurate certificates, users can visit new sites securely and seamlessly, and have confidence that their communications with those sites are cryptographically protected. The padlock icon conveys the web browser’s assurance that the site being visited uses HTTPS encryption and that it presented a valid certificate from a trusted authority.
A recent series of scandals involving the global public key infrastructure has re-invigorated skepticism of this model and renewed doubts about how trustworthy CAs are, how browser developers choose which CAs to trust, and whether CAs simply have too much unaccountable power. Here is a representative but not exhaustive list of recent events that have spurred scrutiny of the CAs’ role:
Continued use of vulnerable, obsolete algorithm. A team of researchers showed that a major CA continued to use an obsolete algorithm called MD5 for signing its certificates; exploiting this algorithm’s mathematical flaws, the researchers demonstrated that they could effectively trick the CA into signing a false certificate which would have delegated all the CA’s powers to the researchers themselves.
Verification step omitted entirely. A major CA had a business partner that apparently did not understand it was supposed to verify the contents of certificates at all; as a result, the partner willingly issued certificates, for a fee, in the major CA’s name without checking whether the recipients really owned the sites mentioned. This made it trivial to obtain false certificates that browsers would accept as genuine.
People’s Republic of China (PRC) entity as certificate authority. A PRC quasi-governmental organization called the China Internet Network Information Center (CNNIC), which has a variety of technical responsibilities related to administering the Internet in China, followed the procedures of the major browsers and requested to be added as a trusted certificate authority. Microsoft granted this request around 2008 with little public attention; Mozilla likewise granted CNNIC’s request in 2010 amidst considerable concern from users inside and outside China that CNNIC could be induced to sign false certificates at the request of government entities. (Notably, all major web browsers already trusted many certificate authorities operated by other governments and government-affiliated entities—in Internet Explorer’s case, those of over two dozen governments.) Browser developers argued that CNNIC should be included as a trusted CA because it had followed all the required procedures.
Surveillance industry suggests certificate authorities will comply with government requests. Two American researchers published a draft paper describing marketing presentations by Packet Forensics, an American manufacturer of network surveillance equipment sold exclusively to governments. Packet Forensics appeared to claim that its devices could be used to perform undetectable surveillance of encrypted communications using cryptographic keys or certificates that investigators obtained “potentially by court order.” This suggestion created heightened concern that certificate authorities around the world may be issuing deliberately false certificates in response to government requests.
In our view, these scandals all show that helping users communicate securely is ultimately not as simple as compiling a phone book; the stakes are too high. They also show that it’s time to rethink the certificate authority approach. There are too many certificate authorities, and they have too much power and too little accountability. Each time a new authority is added, all users who trust that authority by default become more vulnerable; every user’s connection to every web site is only as secure as the “weakest link” CA that the user’s browser trusts. Even assuming the existing design were otherwise reasonable, there is no reason to assume that all Internet users would be willing to trust precisely the same authorities. The designers of the current certification regime were inspired by a military model where there is a single, central authority and a clear chain of command and responsibility acknowledged by everyone in the organization. This is clearly not the case on the Internet today.
We think the computer security community is coming to recognize that concerns about the CA model have substance, and that it is already past time for research on ways to supplement this model so users can detect attacks even when CAs make mistakes or are induced to lie. To help researchers and site owners understand how certificates are used and to detect some attacks after the fact, we are creating an SSL Observatory service. The Observatory is probing all HTTPS websites and creating a database of the certificates they present. In the future, the Observatory will also be able to accept reports of the certificates that participating end-users and ISPs encounter on the web. Site operators, for example, can consult the Observatory to learn whether participating users have encountered a spurious certificate for a particular site (and, if so, which CA issued the certificate).
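The probing side of such a service can be sketched briefly. The following Python fragment, reusing the standard-library approach shown earlier (hosts chosen purely as examples), records the fingerprint and issuing CA that each site presents to one vantage point; a collection of such observations is the raw material for a certificate database:

    import hashlib
    import socket
    import ssl

    def observe(host):
        # Record the fingerprint and the issuing CA of the certificate
        # that a given HTTPS site presents to this vantage point.
        ctx = ssl.create_default_context()
        with socket.create_connection((host, 443), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                der = tls.getpeercert(binary_form=True)
                issuer = tls.getpeercert()["issuer"]
        return hashlib.sha256(der).hexdigest(), issuer

    # A real survey would cover far more sites than these examples.
    database = {host: observe(host)
                for host in ["mail.google.com", "www.eff.org"]}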
This could detect some attacks and problematic CA behavior after the fact, but it doesn’t directly help the users who were subject to those attacks. A much thornier problem is how best to help users make real-time decisions about the authenticity of the keys websites are presenting. For instance, users can consult historical information from their own browsing or (using a scheme from Carnegie Mellon University) from widely distributed “notary servers” to see whether keys are consistent across time and space; they could check whether the CA that authenticated a particular site suddenly appears to have been replaced by a different CA; or websites could try to publish the correct keys through more channels, like the new secure DNSSEC directory service. Unfortunately, all of these information sources have limitations; for instance, a key might legitimately change suddenly, or a business with data centers in different countries might intentionally use different keys when serving users in different countries. If these users compared their experiences, they would be unsure whether the discrepancy was intentional on the service provider’s part (and hence not a problem) or the result of a man-in-the-middle attack (and hence a barrier to secure communications).
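As a toy illustration of the first idea, the following sketch assumes a hypothetical record of previously seen fingerprints (abbreviated as in the diagram); real notary designs such as Carnegie Mellon’s involve many subtleties this ignores, including exactly the ambiguity just described:

    # Hypothetical record of fingerprints previously observed for each
    # site, drawn from the user's own history or from notary servers.
    seen = {"mail.google.com": {"c4:70"}}  # abbreviated, as in the diagram

    def check_key(host, observed):
        history = seen.get(host, set())
        if not history:
            return "no history: nothing to compare against"
        if observed in history:
            return "consistent with past observations"
        # A mismatch is ambiguous: a legitimate key change or multi-key
        # deployment looks the same as a man-in-the-middle attack.
        return "WARNING: key differs from every past observation"

    print(check_key("mail.google.com", "e4:d5"))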
Another idea is that each legitimate server should somehow tell the user’s browser “what to expect” from other legitimate servers: for instance, a website’s legitimate servers in Los Angeles, Paris, and Beijing might each deliberately use different keys, but they all could be made aware of one another’s existence and advise browsers to anticipate variation depending on where the site is accessed from. This mechanism does not yet exist; putting it in place could mean considerable demands on website operators, but might in principle provide a way for browsers to differentiate anticipated changes from unexpected ones.
We plan to continue researching how the various sources of information can most usefully be combined to help Internet users. Ideally, they will lead to an alternative that is simpler and more accurate than today’s global PKI. Unfortunately, that outcome is not guaranteed. The possibility of false CA-issued certificates, for a wide range of reasons, means that there is already greater uncertainty about keys’ validity than is widely realized. Users who are concerned about man-in-the-middle attacks may ultimately have to confront this complexity and uncertainty somehow; Microsoft has acknowledged that “[w]hether a user [. . .] should trust a root certificate for any particular purpose can be a difficult question. [. . . U]sers are expected to [. . .] ensure that acceptance would not cause undue risk to a user’s security [but such] user trust decisions [are] complex.”7 Essentially, what browser developers offer today is a rough attempt at making these decisions on behalf of all their users, with a high priority on convenience, even though different users could face quite different kinds of attacks and risks.
Finally, misbehavior by certificate authorities and the issuance of false certificates are not the greatest threats to privacy and security online today. Far greater threats come from malicious software on end-users’ computers, which is alarmingly widespread. Eric Rescorla notes that the computer security community tends to prefer working on the problems it knows how to solve—for example, cryptographic protocol design, which gives us a kind of handhold for abstract, formal reasoning about security guarantees. But the usefulness of cryptographic protocols depends on having secure endpoints: ideally, computers owned by their users and free of security vulnerabilities, viruses, spyware, and keyloggers. SSL is meant to protect communications while they travel over an untrusted and potentially hostile network between computers; it can be bypassed entirely, with no cryptographic subterfuge, if a user’s computer is infected or the user is in an Internet café whose proprietor has installed a keylogger at every workstation. These risks are so pronounced that Gene Spafford has called the use of encryption “the equivalent of arranging an armored car to deliver credit card information from someone living in a cardboard box to someone living on a park bench.”8 For instance, it was recently discovered that malicious software was bundled with VPSKeys, a popular Vietnamese text input program; this software is thought to be able to record everything typed on a computer where it was installed, so even Vietnamese Internet users who carefully used sophisticated encryption software were ultimately unprotected by it. In this sense, the highest priority for users who want privacy and security for their online communications is the integrity and trustworthiness of their computers. Our concerns about public key infrastructure are only meaningful for Internet users who are using computers they trust.
Notes
1. Despite our concerns about these mechanisms’ limitations, we urge end-users to continue to take advantage of them whenever possible. In the absence of specific, sophisticated attacks, the use of encryption provides users with heightened privacy protection. We commend web sites that support and encourage the use of encryption. Indeed, we have recently developed a Firefox extension called HTTPS Everywhere that makes a user’s web browser use HTTPS automatically on sites known to support it. This article aims to highlight a particular threat to web encryption security that may be unfamiliar to many users, not to suggest that using encryption online is irrelevant or useless.
2. In modern cryptography, a key is a numerical value that instructs a computer how to transform or verify data: how to scramble it, unscramble it, or check its integrity. There are many kinds of keys and many ways of representing them, but all of them are essentially large numbers that were generated in a partly random way. Hence they are not very meaningful or memorable to human beings, which can be unfortunate in cases where we want to talk about a particular key.
3. Multiplication and factoring are the most familiar examples: it is easy to calculate that 66491 × 36384377 is 2419233611107, but challenging to determine which two numbers should be multiplied together to yield 15940563869. Diffie, Hellman, and other mathematicians found ways to get practical security benefits from this and other asymmetries.
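A few lines of Python make the asymmetry tangible even at these toy sizes:

    p, q = 66491, 36384377
    print(p * q)  # 2419233611107, computed in a single multiplication

    def factor(n):
        # Naive trial division: workable for toy numbers like these,
        # utterly infeasible at the sizes real cryptographic keys use.
        if n % 2 == 0:
            return 2, n // 2
        d = 3
        while d * d <= n:
            if n % d == 0:
                return d, n // d
            d += 2
        return None

    print(factor(15940563869))  # tens of thousands of divisions, at worst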
4. There is considerable controversy about how important it is to defend against man-in-the-middle attacks. Encryption systems that don’t even try are simpler to design, build, and use. That means they might be more widely deployed and adopted, which might help communications security overall, since users would still be protected against passive eavesdropping. The instant-messaging (IM) encryption package Off-The-Record Messaging tries to achieve a happy medium: it lets individual users choose whether to verify keys or not. If they never perform the verification, they are automatically protected against ordinary eavesdropping; if they do, they are also protected against man-in-the-middle attacks. But users should re-verify whenever either party starts using a new computer.
5. Some readers may wonder why the man-in-the-middle could not simply assert that its cryptographic key was c4:70. The answer is that the encrypted session negotiation depends on the actual numerical value of the key. As a result, the mathematics of the process prevent attackers from convincingly claiming to be using specific cryptographic keys that they do not actually possess.
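For readers who want the flavor of it, here is a simplified challenge-response sketch in Python, again using the third-party cryptography library; the real TLS handshake is more involved, but the principle is the same: only the holder of the private key can produce values that verify under the claimed public key.

    import os

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, rsa

    server_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    claimed_public_key = server_key.public_key()  # what the server asserts

    pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                      salt_length=padding.PSS.MAX_LENGTH)

    nonce = os.urandom(32)  # an unpredictable challenge from the client
    signature = server_key.sign(nonce, pss, hashes.SHA256())

    # Verification succeeds only if the signer really holds the private key
    # matching the claimed public key; a forgery raises InvalidSignature.
    claimed_public_key.verify(signature, nonce, pss, hashes.SHA256())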
6. Google Web Search Features, “SSL Search,” Google, http://www.google.com/support/websearch/bin/answer.py?answer=173733.
7. Microsoft TechNet, “Microsoft Root Certificate Program,” Microsoft, http://technet.microsoft.com/en-us/library/cc751157%28printer%29.aspx.
8. Eugene Spafford is a professor of computer science at Purdue University and a leading expert in the field of computer security.