Humble Bundle: What counts as a breach?
A recent breach at Humble Bundle, marketer of discount computer games, exposes some interesting subtleties about what “data breach” should mean. You can get full details at the link below.
https://gamerant.com/humble-bundle-data-security/
In this case, an attacker entered Humble Bundle’s system but was not able to carry off information wholesale. They did, however, exploit a flaw in Humble Bundle’s code that allowed them to answer a number of yes-or-no questions about Humble Bundle’s customers. The attacker essentially worked down a list of emails, learning for each one whether it was attached to an active subscription.
This provides a good illustration of a general principle: the more information the bad guys have, the more they can get. Absent other information, the hacker’s exploit of Humble Bundle is pretty useless. Given a good guess at all the emails that might have a Humble Bundle subscription, however, the exploit is as good as making off with the full list.
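To make the point concrete, here is a minimal sketch of how yes-or-no answers add up to a full list. Everything in it is hypothetical: the function has_active_subscription stands in for the flawed code path, and the email addresses are made up. It is not a reconstruction of the actual exploit.

```python
# Hypothetical stand-in for the flawed endpoint: it answers only yes or no.
_SUBSCRIBERS = {"alice@example.com", "carol@example.com"}  # secret, server side


def has_active_subscription(email: str) -> bool:
    """The only question the attacker can ask: is this email subscribed?"""
    return email.lower() in _SUBSCRIBERS


def enumerate_subscribers(candidate_emails):
    """Rebuild the subscriber list from yes-or-no answers alone."""
    return [e for e in candidate_emails if has_active_subscription(e)]


# A guessed list, for example emails leaked in some unrelated breach.
guesses = ["alice@example.com", "bob@example.com", "carol@example.com"]
recovered = enumerate_subscribers(guesses)
print(recovered)
```

With no candidate list the attacker learns nothing, and with a good candidate list the attacker recovers every subscriber on it.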
Every little bit of information that can be kept out of the hands of criminals is meaningful.
Basics of Synthetic Identity Theft
Identity Theft Tips
What is(n't) blockchain to infosec? Part Four: A Data Breach Liability
Blockchain is hyped as a security panacea, but blockchains really create as many problems as they solve. Blockchain does nothing in particular to keep criminals from stealing sensitive data, and to the extent that a blockchain duplicates data across multiple servers (which any blockchain by its nature must to some degree) it is really just creating more surface area and more breach risk. None of these problems are insoluble, but none of the solutions are uniquely blockchain solutions. They are solutions for the servers that make up the blockchain, and they work just as they would on a conventional centralized server architecture.
Bitcoin, as the original blockchain, is a natural case study. Anyone can download the software to set up a mining node on whatever computer they like - that computer is now both part of the blockchain and the same old-school centralized computer it was before. All of the data of the Bitcoin blockchain will be available on the file system of that machine as other files are, and if this were sensitive information it could be stolen by hackers just like any other information. The computer on which the node was set up is not instantly more secure for having this data on it and could be riddled with malware, worms, and you-name-it other security liabilities. It is actually pretty easy to see the consequences of this situation, as blockchain transactions are really quite public, Bitcoin is stolen all the time via other software lurking and interfering with users’ wallets, and so on.
Because a blockchain inevitably duplicates data across nodes, a blockchain is only as safe from breach as its least secure node - a theft from one node is as good as a theft from any other. The nodes are just centralized servers themselves, so there is nothing qualitatively different about blockchain breach security either. To make a blockchain secure from data breach, one needs to make its nodes secure from data breach.
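A rough back-of-the-envelope model shows how quickly duplication adds risk. The breach probabilities below are invented for illustration, and the model assumes nodes are breached independently of one another, which real networks may not satisfy.

```python
from math import prod


def chance_of_breach(node_breach_probs):
    """Probability that at least one node is breached, assuming independence."""
    return 1 - prod(1 - p for p in node_breach_probs)


# Assumed, purely illustrative yearly breach probabilities per machine.
single_server = chance_of_breach([0.02])
ten_nodes = chance_of_breach([0.02] * 10)
one_weak_node = chance_of_breach([0.02] * 9 + [0.30])

print(f"one server:          {single_server:.1%}")
print(f"ten equal nodes:     {ten_nodes:.1%}")
print(f"nine good, one weak: {one_weak_node:.1%}")
```

Since every node holds the same data, the overall figure can never be lower than that of the weakest node, and each added copy only pushes it higher.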
It is here that subtle security liabilities become apparent. If you are considering moving data from a purely centralized system to a blockchain, you can only expect to maintain whatever level of breach security you had before, and you can only achieve this level of security by implementing your old security uniformly on each node. This sounds easy enough, but it will take you away from decentralization - a centrally mandated list of security practices for each node, if successfully implemented, is well down the road to putting the blockchain under centralized control, and we saw earlier in this series that some of blockchain’s desirable properties actually depend on some level of distrust or at least non-collusion between nodes.
Blockchain does have some interesting security properties, notably that it provides strong tamper-resistance via immutability as discussed in Part One. However, these emerge from distrust between nodes and are weakened if all the nodes are within one organization and subject to centralized governance.
Coming up in Part Five, we’ll continue to explore this same theme, that a blockchain is a group of centralized servers and thus retains limitations of centralized servers, as we examine various (bogus) computing performance claims around blockchain. Also, be on the lookout for an upcoming post about Capnion’s approach to these breach liability problems and how they can be solved in blockchain and non-blockchain contexts.
What is(n't) blockchain to infosec? Part Three: Not quantum at all
Blockchain is often presented as a solution to the problems that quantum computing will pose for cryptography and these claims are false. Blockchain has no particular relationship to quantum computing whatsoever. Some very brief backstory: encryption methods are invariably built on a math problem that is believed to be difficult and quantum computers are a sophisticated new technology that promises to make some of these math problems less difficult. Although there is considerable propaganda out there to the contrary, blockchain is built on top of the same cryptographic techniques as everything else, it will be compromised by quantum computing like everything else, and if it gets patched up it will likely be in the same way as everything else.
As we discussed briefly in Part One, blockchain is not an innovation in cryptography per se but a distributed program built with heavy use of existing cryptography. Notably, blockchain makes heavy use of cryptographic hash functions and asymmetric (public and private) key encryption. Any frequent user of a cryptocurrency is implicitly familiar with the latter, as it is important to keep track of one's private keys, which essentially give ownership of accounts with currency, while the corresponding public keys represent one's identity on the ledger. The public vs. private key algorithms used in most blockchains are not unique at all but are well-tested algorithms, even down to the level of particular implementations, that are applied in many other places.
It is the public vs. private key algorithms that are threatened by quantum computing, and as blockchain uses the same algorithms it is threatened too. It is well beyond the scope of this post to give the details of how quantum computing works, but for those readers who want to do their own Googling we will hit some of the high points. Asymmetric key cryptography is typically built around a hard math problem, such as integer factorization or the discrete logarithm problem, which involves certain computations that are efficient in one direction but very difficult to invert. The security of the cipher is dependent on this difficulty, and the danger of quantum computing is that it allows new algorithms that make this inversion much easier.
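The one-way character of the discrete logarithm problem can be seen in a toy example. The numbers below are assumptions chosen so the demo finishes quickly; real systems use vastly larger parameters, and the brute-force search shown is the classical attack, not a quantum algorithm.

```python
def forward(g: int, x: int, p: int) -> int:
    """Easy direction: modular exponentiation, fast even for huge numbers."""
    return pow(g, x, p)


def brute_force_log(g: int, y: int, p: int):
    """Hard direction: try exponents one by one until g**x % p equals y."""
    value = 1
    for x in range(p - 1):
        if value == y:
            return x
        value = value * g % p
    return None


p = 1_000_003      # a small prime, tiny by cryptographic standards
g = 2              # public base
secret = 123_456   # plays the role of a private key
y = forward(g, secret, p)   # plays the role of the public key

found = brute_force_log(g, y, p)
print(found, forward(g, found, p) == y)
```

The forward step takes a handful of multiplications, while the search grows in proportion to the size of the prime. At real key sizes the search is hopeless on conventional hardware, and that gap is what a sufficiently large quantum computer running Shor's algorithm would close.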
For the moment, though, there is no danger. The engineering difficulty around building a practical quantum computer is vast and existing (extraordinarily expensive) prototypes contain just a few quantum-analog logic gates. Even as they improve, there will be a long period where very few actors have real access to them - they will be a sort of cryptographic nuclear weapon. There are also new cryptographic techniques, notably lattice-based cryptography, that may prove more resistant to quantum attacks. Blockchains could easily be fixed by swapping in these new parts, but it would be the new components resisting quantum attacks and not any aspect of the blockchain algorithm itself.
To recap, blockchain has nothing to do with quantum computing and won't do anything on its own to protect you from quantum attacks. Next in Part Four, I will talk about how blockchain doesn't do anything itself to protect your data from theft and rather presents significant new liabilities.
What is(n't) blockchain to infosec? Part Two: Consensus, but maybe not what you wanted.
An important part of a blockchain is the process by which the nodes come to an agreement about what to add to the ledger - you might call this a consensus process. This consensus process has implicitly gotten a lot of attention, both in the abstract context of the "Byzantine generals" problem and for its application in business settings. Unfortunately, much of this attention is not justified by the reality of the algorithm.
The core of the consensus algorithm, the part that actually makes a decision per se, is a simple, familiar majority vote. If 51% of the nodes agree that the next block in the chain should look a particular way, that is the consensus verdict. The 51% number gives its name to the "51% attack," where a sufficiently large (51% or more) group of nodes ("miners" in the cryptocurrency context) can make the blockchain ledger anything they want, and this attack is thus often discussed in the context of centralization in cryptocurrency mining. That there is danger in too much friendliness between nodes is a point we will visit again.
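Stripped of everything else, the decision step described above is small enough to write out. This is a simplified sketch of a plain majority vote over hypothetical node votes, not the code of any actual blockchain client.

```python
from collections import Counter


def consensus(votes):
    """Return the block backed by a strict majority of nodes, else None."""
    if not votes:
        return None
    block, count = Counter(votes).most_common(1)[0]
    return block if count * 2 > len(votes) else None


# Ten nodes, each voting for the block it believes should come next.
honest_network = ["block_A"] * 6 + ["block_B"] * 4
colluding_network = ["block_A"] * 4 + ["forged_block"] * 6

print(consensus(honest_network))
print(consensus(colluding_network))
```

The vote has no notion of which block is truthful. Whatever a majority agrees on wins, which is why a colluding majority can write anything it likes.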
The original blockchain architecture hardens its consensus process by making it expensive to propose a new block, originally via the "proof-of-work" concept. To submit a new block, a node must also solve a computationally expensive (and notably, totally useless otherwise) cryptographic problem. This deters bad guys who might set up malicious nodes, as running nodes is expensive and running enough nodes to approach the 51% number is extremely expensive.
An important subtext in these last two paragraphs is that we hope our nodes distrust each other. We hope they can't collude and that they are checking each other's "proof-of-work" homework. Blockchain is not quite the "trustless" innovation advertised, but something powered by distrust.
Elsewhere, you can find a lot of skeptical commentary on "private" blockchains and it all relates to the distrust issue discussed above. If you are trying to run a blockchain inside a single business, it is a liability that your nodes might be run by people who all work for the same people and drink together after work. You might be running an architecture that is computationally expensive but really is just a simple majority vote, without the distrust among nodes that makes it work. Much of the interest in blockchain from large businesses revolves around its functionality as a consensus process, but it is not a very good consensus process if implemented inside a single business.
Blockchain is also not the comprehensive solution to the "Byzantine generals problem" (a landmark type of problem in network communication) that it is sometimes proclaimed to be. We have seen above that it requires a specific sort of human context to work appropriately. It also has more subtle problems, too subtle to discuss here but embodied in debacles like the $70 million DAO hack. The short story is that this hack exploited confusion about just which nodes had what information and when, and this is the essence of the problem facing the Byzantine generals.
Thus, blockchain is in part an interesting and novel consensus process, but this consensus process depends on human context and has technical limitations. Commentary on blockchain is often hagiographic and careless with both of these.
In Part One, we talked about what blockchain definitely offers: immutability. Here in Part Two, we talked about what blockchain is and isn't on its own: a consensus protocol. In the following installments, we'll start to examine the things blockchain definitely is not, beginning with a discussion of how blockchain definitely does not offer any unique safe harbor from the security problems posed by quantum computing.
What is(n't) blockchain to infosec? Part One: Blockchain is Immutability
The central, novel property of blockchain is immutability. This means that records cannot be changed once they are accepted, and implicit here also is that records receive a timestamp that cannot be changed once it is agreed. Immutability was key to the success of the Bitcoin network as it guaranteed there would be no tampering with the older sections of the ledger. Immutability is also the property by which blockchain can offer something genuinely new to information security, but to see clearly how this works we should first examine some basic concepts in cryptography and blockchain architecture.
The mysticism around blockchain imagines it as being everywhere and nowhere, but a blockchain is tangibly a group of databases on a number of different computers, commonly called nodes, communicating constantly via a cryptographic protocol in order to make sure they are all keeping records in the same way. This protocol isn't really an innovation in cryptography in the pure sense, but really a bunch of old ingredients linked together, including a very common ingredient called a cryptographic hash function. Informally, the important properties of these hash functions are that 1) they are easy to compute, but very difficult to invert, 2) their output depends chaotically on every little bit of input, and 3) they (hopefully almost) never produce the same output given two different inputs. If I give you an output from a hash function, you are going to have a hell of a time finding an input that produces this output unless I give you mine.
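These properties are easy to observe with SHA-256 from Python's standard library. The messages below are made up, and SHA-256 is used here only as a familiar example of a cryptographic hash function.

```python
import hashlib


def h(text: str) -> str:
    """SHA-256 of a string, written as 64 hexadecimal digits."""
    return hashlib.sha256(text.encode()).hexdigest()


a = h("pay Alice 10")
b = h("pay Alice 11")   # the input differs by a single character

# Property 2: count how many of the 64 hex digits changed.
differing = sum(x != y for x, y in zip(a, b))

print(a)
print(b)
print(f"{differing} of {len(a)} hex digits differ")
```

Computing either hash is instant (property 1), a one-character change scrambles the output (property 2), and the two outputs do not match (property 3). Going backwards from an output to an input is the part nobody knows how to do short of guessing.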
So thus far we have databases sharing cryptographic information to try and stay on the same page. This could be a big headache if we are working with a lot of data, and this is where the blocks, their arrangement in a chain, and our hash function get put to work. All the data goes into blocks as we get it, and we put the data of the (n-1)-th block into a hash function and include this output in the n-th block. Because of property 2) listed above, this means that any change in a block will radically change all the hashes appearing in all later blocks, while properties 1) and 3) make it extremely difficult to cheat and come up with new, fake blocks that prevent these radical changes. Thus, the efforts of our many databases to stay on the same page need focus only on the most recent block, as any violation of prior consensus will upset the hash appearing in this most recent block. This resistance to changes in prior consensus, which we previously called immutability, is what makes the whole enterprise workable, both computationally and on the level of human trust.
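The chaining described above can be sketched in a few lines. This is a bare-bones illustration with invented records and a made-up block layout, not the block format of Bitcoin or any other real chain.

```python
import hashlib


def block_hash(data: str, prev_hash: str) -> str:
    """Hash a block's contents, including the hash it inherited."""
    return hashlib.sha256(f"{prev_hash}|{data}".encode()).hexdigest()


def build_chain(records):
    """Each block stores its data plus the hash of the block before it."""
    chain, prev = [], "0" * 64
    for data in records:
        chain.append({"data": data, "prev_hash": prev})
        prev = block_hash(data, prev)
    return chain


def first_broken_link(chain):
    """Index of the first block whose stored hash no longer matches, else None."""
    for n in range(1, len(chain)):
        expected = block_hash(chain[n - 1]["data"], chain[n - 1]["prev_hash"])
        if chain[n]["prev_hash"] != expected:
            return n
    return None


chain = build_chain(["A pays B 5", "B pays C 2", "C pays A 1", "A pays D 3"])
untouched = first_broken_link(chain)

chain[1]["data"] = "B pays C 200"   # tamper with an old record
broken_at = first_broken_link(chain)
print(untouched, broken_at)
```

To hide the change, the tamperer would have to recompute the stored hash in every later block, which changes the most recent block that the other nodes are already comparing.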
Immutability is useful in information security because it guarantees tamper resistance. We might want to ensure that malicious actors are not doctoring our records towards their own ends, and the transmission of hashes from one block to the next ensures that this is very difficult or impossible. But...in the blockchain context immutability depends on the collaboration of the nodes of our network and their consensus process. The role played by this consensus process, and how the consensus process determines what blockchain can and can't do for information security, will be the topic of the next post in this series.
Singapore, Healthcare Consolidation, and Data Security
Singapore Health Services was recently hit by a massive breach with 1.5 million records lost. Although 1.5 million would still be an eye-popping number in the United States, in Singapore this breach affects one in four citizens - comparable to a breach affecting 80+ million Americans. This 80 million number seems hard to imagine, but it is becoming more and more plausible as ongoing consolidation in healthcare drives ever greater centralization in data storage.
Even in the past few months, Cigna has purchased Express Scripts for $67 bn and CVS has bought Aetna for $69 bn while the widespread expectation is that the approval of the AT&T & Time Warner merger can only prompt more consolidation. The time is coming when breaches at single firms, including healthcare firms holding medical information, will compromise big percentages or even majorities of American consumers.
Lessons from Timehop
The Timehop breach, and this TechCrunch article about it in particular, have a lot to teach about security as well as the media optics surrounding security. Timehop is overhauling their security in response to the breach and this has inevitably exposed them to public questions about what they were doing with their security before - it is more than awkward to have TechCrunch stating "questions should be asked why it took an incident response to trigger a “more pervasive” security overhaul." Pervasive encryption is really an urgent "must" at this point and this is exactly why Capnion is working to minimize its burden on the rest of your business.
The breach itself was another example of lax security, as the attacker got in through a cloud computing account without two-factor authentication. Most cloud infrastructure services, including AWS and DigitalOcean for example, offer this service free and encourage you to use it aggressively. You need only download an (also free) app like Duo Mobile to get started. Not only is this kind of security a must, but the spectacular revelation that you don't have it is an invitation for consumers to question your competence.
The ALERRT breach and the many public risks that breaches pose
Conversation about data breaches often focuses on consumer data held by businesses, but there are all sorts of databases out there that might be dangerous in the wrong hands. The recently announced breach at ALERRT (Advanced Law Enforcement Rapid Response Training) is a great example. More detail can be found here.
ALERRT is an organization that provides active shooter response training to law enforcement officers. The compromised database unfortunately presents significant risks, not only to the 100,000+ law enforcement officers whose personal information was directly compromised but also to the public in general via the information on likely targets and response readiness in municipalities across the country.
Exactis & Breaches at Aggregators
The data breach at Exactis is notable for its gravity, how it has probably been under-reported, and for how it provides a window into the buying and selling of consumer data behind the scenes.
You can find more detailed reporting here.
It is common, much more than the general public is aware of, for companies to buy and sell databases of information on consumers for sales purposes. Credit rating agencies are certainly not the only ones who strive to keep a record on the habits of every American. Unfortunately, the companies that do this sort of aggregation are positioned to do special damage if they are compromised.
What is P.I.I.? P.I.I. as Regulatory Category
The informal definition of P.I.I. (personally identifiable information) is plain - it is information that can be used to identify an individual person. It is important, though, that this informal definition is a bit fuzzy around the boundaries and P.I.I. is also a formal regulatory category defined differently in different jurisdictions.
For example, in the United States the NIST defines P.I.I. as "any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information."
On the other hand, the European Union's General Data Protection Regulation does not utilize the concept of P.I.I. but rather the more inclusive concept of "personal data" which includes all information about an individual whether it can be used to identify them or not.
No Data, No Breach
The most sensitive data often has the least real content. Your social security number, for example, doesn't say much of anything about you - you might say it was a bit of math that refers to you. Why should businesses carry it around like it really means something and thus put you at risk? They could replace it with something else that remembers enough to do the work of your SSN but not enough to be useful to a hacker. This is the Capnion philosophy: no data, no breach.
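One common way to build such a stand-in is a keyed hash, sketched below. This is offered only as an illustration of the general idea and is an assumption on our part, not a description of Capnion's own technology; the function name make_token and the sample number are made up.

```python
import hashlib
import hmac
import secrets


def make_token(ssn: str, key: bytes) -> str:
    """Replace an SSN with a keyed hash that can still be matched but not read."""
    normalized = ssn.replace("-", "").strip()
    return hmac.new(key, normalized.encode(), hashlib.sha256).hexdigest()


# The key lives somewhere separate from the database that stores the tokens.
key = secrets.token_bytes(32)

stored = make_token("000-12-3456", key)      # what the database keeps
lookup = make_token("000123456", key)        # the same person, seen again later
stranger = make_token("000-65-4321", key)    # somebody else

print(hmac.compare_digest(stored, lookup))
print(hmac.compare_digest(stored, stranger))
```

The token still does the matching work of the number, but a thief who takes only the database gets strings that say nothing without the key. Because there are only about a billion possible SSNs, the scheme is only as good as the protection of that key.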
Zero-knowledge proofs and the total-knowledge status quo
There are all sorts of processes out there that are really about proof but are rarely stated this way. When you call the bank and verify your identity with your mother's maiden name, they are not interested in the name per se but the proof that you know it. Record linkage processes behind the scenes essentially operate on proof, done in the CPU of a computer, that a collection of records all refers to the same person in real life - just what a person's name is doesn't matter, only how it corresponds to other names in other records. It's not a term in wide circulation, but you might call these total-knowledge proofs in that the information about the names is exposed.
There is a cryptographic technique called a zero-knowledge proof that allows these linkages and verifications to be performed without giving away anything about the data in question. Such proofs are a natural fit for the P.I.I. (personally identifiable information) held by businesses about consumers, as this information is rarely of interest in its own right but is instead used for the sort of matching and identification mentioned. Capnion's position is that these zero-knowledge methods should replace their total-knowledge counterparts throughout the economy, eliminating the need for many businesses to ever hold unencrypted data on consumers.
What to watch in the Dixons Carphone breach...
There are a number of interesting takeaways from the data breach announced by European electronics firm Dixons Carphone earlier this week.
First, the breach provides partial validation of the new chip-and-pin technology. Many compromised cards had this new technology and, as secondary authenticators like CVV values and PINs were NOT compromised, these consumers may be relatively safe. This is also a validation of the general principle that it is good design to set up multiple necessary points of failure that attackers must compromise before real damage is done.
Second, the Dixons Carphone breach will be worth following going forward as it may involve violations of the new GDPR data privacy regulations. If Dixons is punished under the new law they may be among the first and their case will set a tone for how the law is applied going forward.
Finally, an interesting side note is the prior 2015 breach at Dixons, which proceeded by a rather innocuous attack vector: an out-of-date WordPress site...
Data breach and identity theft
Identity theft is a growing problem and probably one of the creepier threats to your finances at large today. There are many things we can personally do to protect ourselves, such as shredding documents when they are no longer needed, regularly checking credit reports, etc. However, data on consumers lost by businesses in data breaches is a major source of the sensitive data (names, credit card numbers, and so on) that fuels criminals. In some cases, the most famous being the loss of information on nearly 150 million consumers at Equifax, there isn't a whole lot for private citizens to do. This is why it's so important that the public become involved in driving the next wave of data privacy technology - the consumer is the principal beneficiary and can take a role in making sure that enterprise is doing what needs to be done to protect them.
Blockchain and Data Privacy
There are many businesses considering putting data (supply chain data, for example) on a blockchain but not much conversation about the liabilities this presents. In most blockchain architectures, all the data on the blockchain is actually held by all the nodes on the network, and each of these nodes then has the power to lose this data in a breach. Given how much trouble the world is having protecting data stored on centralized servers, reproducing the data many times is likely to produce an unprecedented data privacy crisis as multiplication of opportunity produces a multitude of new breaches.
What is P.I.I. (a.k.a. PII or personally identifiable information)?
An important category of data is personally identifiable information, often referred to as P.I.I. or PII, and its name suggests accurately what it is: information that can be used to identify an individual person. There are many very familiar examples like name, social security number, address, etc. Some more arcane examples are the sorts of things one needs to supply as a secondary verification of identity at the bank, such as mother's maiden name. P.I.I. is often an explicitly spelled-out regulatory category, but there are a number of pieces of information that are considered P.I.I. across jurisdictions, and the philosophy defining P.I.I. is consistent even when the level of inclusiveness is not. (Is the name of your first pet personally identifiable information?)
P.I.I. is important because privacy is not just about what information is available, but what information can be tied back to an individual. Medical records provide a great example. If someone steals your medical records, this is perhaps not so bad if they lack the information to tie these records back to you - from their perspective, they don't have your medical records but only some unknown person's medical records.
How can Ghost PII improve the security of what I am building?
Capnion's API is intended to permit developers to enhance the security of their applications using the Ghost PII protocol. Below I will walk through one, hopefully familiar, example of a transaction involving personal information and explain how its security can be improved with Ghost PII.
We've all had to tell a bank website our mother's maiden name, or pass on some similar private information, to prove our identity at some point. This is an unfortunate situation, as protecting privacy requires moving around more of the very information that jeopardizes privacy. Capnion's technology has the power to fix this situation - in particular, it can ensure that no computer ever needs to hold your unencrypted response in memory.
Here's how it works: software integrated into your browser encrypts your response (your mother's maiden name) when you type it in, and this encrypted response is all that is ever sent to the bank. It is all that is sent when you open your account and establish your security questions, just as it is all that is sent when you prove your identity later. The bank only holds encrypted data and never has the ability to decrypt it. When the bank needs to check your answer, you can grant them permission to request a special key from Capnion's API that they can use to compare the two ciphertexts you gave them. This special key lets the bank know whether you gave the same answer both times and nothing else.
This transaction is an example of what is called a zero-knowledge proof. You have proven to the bank that you are who you say, and 'zero-knowledge' refers to the fact that the bank has learned nothing about your mother's maiden name.