How do I use Ghost PII?

The details of how Ghost PII works are a bit technical, but you don't need to be a rocket scientist to use it.  At its core, Ghost PII is an API, maintained by Capnion, that provides specialized encryption keys for doing homomorphic computations (computations on encrypted data) on personally identifiable information such as names and social security numbers.

In a prototypical use case, the first thing you should do when you obtain personally identifiable information is to call Capnion's API for an encryption key.  Once the data is correctly encrypted, an attacker cannot obtain it unless your system and Capnion's both suffer total breaches at exactly the same time, and Capnion's system is designed with security as its top priority and with conservative data governance.

Imagine that down the road you need to compare encrypted addresses, perhaps because you suspect two addresses differ only superficially, such as "Road" replaced by "Rd." or a similar abbreviation.  You can then request a specialized key that lets you compute the number of characters two encrypted addresses have in common without decrypting them.
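To make the comparison concrete, the sketch below computes the same kind of score on plaintext. It is only a reference for what an encrypted comparison would be asked to return, not Ghost PII's encrypted computation. Reading "characters in common" as a multiset intersection (each shared character counted as many times as it appears in both strings) is an assumption; the function name `chars_in_common` is hypothetical.

```python
from collections import Counter


def chars_in_common(a: str, b: str) -> int:
    """Count characters shared by a and b, respecting multiplicity.

    Assumed metric: multiset intersection of characters (case-sensitive).
    Runs on plaintext purely to show what the score means.
    """
    # Counter & Counter keeps the minimum count of each character.
    return sum((Counter(a) & Counter(b)).values())


print(chars_in_common("12 Oak Road", "12 Oak Rd."))  # 9
```

A score close to the length of the shorter address hints that the two records may refer to the same place; with Ghost PII the analyst would get this number without ever seeing either address.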

This has many benefits.  You never need to decrypt, which improves security.  You skip the time and computing resources you would otherwise spend decrypting and re-encrypting, which saves money and time (which is also money).  You do not need to know in advance, when you encrypt, what kind of entity resolution you will want to do, and you do not need to grant the analyst examining duplicate addresses permission to see the underlying personal information; together, these remove the usual tension between security and convenience.  Ghost PII is a pure win for your business because it essentially eliminates a hard tradeoff, one that has forced many businesses to work with plaintext in the past.

Ghost PII

Personally identifiable information, commonly abbreviated PII, refers to information like a name, social security number, etc.  It is sometimes a formal regulatory category, and it is among the more sensitive information commonly lost in data breaches: losing a person's medical records, for example, is more serious if the records contain information that ties them to a particular person.  Much PII is notable for carrying little content: your social security number says little about you on its own; it is an arbitrary number (originally) used to help the government organize records about you.

Capnion has developed a specialized cryptographic protocol called Ghost PII that lets businesses work with your personally identifiable information while it is still encrypted, so they can keep it encrypted at all times.  Here is some detail on how it works.  Any really secure encryption method should produce two different ciphertexts when applied to the same social security number twice; otherwise anyone holding the ciphertexts could tell which records share a number just by comparing them.  Without homomorphic encryption, however, there would be no way to determine whether two such ciphertexts came from the same social security number without decrypting them.  This constant need to decrypt is part of what drives the breach crisis.  Capnion's Ghost PII is a technique and set of software tools for encrypting data that allows linking records on encrypted identifying numbers, determining which ciphertexts came from the same social security number without decrypting.
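The property that the same social security number encrypts to different ciphertexts can be seen with a toy randomized cipher. The sketch below is an illustration only, assuming a simple construction (a fresh random nonce per message, an HMAC-SHA256 keystream XORed with the plaintext); it has no integrity protection, is not production cryptography, and is not Capnion's scheme. The names `encrypt`, `decrypt`, and `_keystream` are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudorandom bytes from HMAC-SHA256(key, nonce || counter)."""
    out = b""
    counter = 0
    while len(out) < length:
        msg = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: str) -> bytes:
    """Toy randomized encryption: a fresh nonce on every call."""
    data = plaintext.encode("utf-8")
    nonce = secrets.token_bytes(NONCE_LEN)
    ks = _keystream(key, nonce, len(data))
    return nonce + bytes(x ^ y for x, y in zip(data, ks))


def decrypt(key: bytes, ciphertext: bytes) -> str:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    ks = _keystream(key, nonce, len(body))
    return bytes(x ^ y for x, y in zip(body, ks)).decode("utf-8")


key = secrets.token_bytes(32)
ssn = "123-45-6789"  # made-up example value
c1, c2 = encrypt(key, ssn), encrypt(key, ssn)

print(c1 == c2)                              # False (barring a 2**-128 nonce collision)
print(decrypt(key, c1) == decrypt(key, c2))  # True
```

Comparing `c1` and `c2` byte for byte reveals nothing about whether they hide the same number; with a conventional scheme like this one, the only way to find out is to decrypt, which is exactly the step Ghost PII is designed to avoid.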

The Cost of a Breach

The costs of a data breach, to the company breached and to the public, are considerable.  Direct costs include public relations work and legal fees from ensuing lawsuits.  Indirect costs, perhaps more formidable, come from damage to reputation and loss of trade secrets.  Each breach is different, but a common estimate puts the cost at around $140 per lost record; some discussion of these estimates is given at this link.
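As a rough worked example, applying that per-record figure to a hypothetical breach of 100,000 records gives the following; treat it as an order-of-magnitude estimate, since real breaches vary widely.

```latex
\text{estimated cost} \approx N_{\text{records}} \times \$140
  = 100{,}000 \times \$140 = \$14{,}000{,}000
```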

Once again, each breach is different and institutions may have unique liabilities.  One interesting example is Independent News and Media, an Irish media company that suffered a data breach, discussed at this link.  The case was notable because their data contained secrets about confidential sources, so the breach threatened journalistic freedom, a matter of general public interest.

If none of these things move you, it is still never fun to lose your job.

Breaches and Human Frailty

Computer security would probably be easier if there were no humans involved.  Almost anything you do to protect your system can be nullified by sufficiently negligent or malicious actions by your employees.  After-the-fact analyses of data breaches bear this intuition out, such as the one described at the link below, which found that 1 in 4 data breaches was the work of insiders.

https://www.theregister.co.uk/2018/04/10/verizon_dbir/

One of the costs of a data breach is embarrassment, and this risk is heightened when the breach offers a juicy, shareable headline with phrases like "employee worked with outside criminal".

https://www.usatoday.com/story/tech/2018/04/20/many-1-5-million-accounts-may-have-been-compromised-suntrust-banks/535687002/

This suggests the power of Ghost PII: why keep holding data that presents this sort of danger if you can get things done without it?  Do you need to know a customer's SSN if you only use it to link records?  The answer to that question is "No!", and Capnion is working to build a world where no one has the power to cause a breach, because no one needs access to the plaintext in the first place.

What is homomorphic encryption? Who cares?

It seems reasonable to presume that to do any sort of work on encrypted data, you need to decrypt it first, but this is not the case.  Suppose you have two numbers a and b as well as encryption and decryption algorithms Enc and Dec.  These algorithms are said to be homomorphic in addition (substitute multiplication for addition throughout if you like) if there is a third algorithm Add(_,_) such that Dec(Add(Enc(a),Enc(b))) = a + b.
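One well-known scheme with exactly this property is the Paillier cryptosystem, in which Add is simply multiplication of ciphertexts modulo n². The sketch below is a textbook toy, not Ghost PII's protocol: the tiny hard-coded primes are an assumption made only to keep the numbers readable (real keys use primes of 1024 bits or more), and the function names are hypothetical. Because every encryption draws fresh randomness r, it also shows that a secure scheme can be randomized and homomorphic at the same time. It needs Python 3.9+ for `math.lcm`.

```python
import math
import secrets

# Toy key material: tiny primes for readability only.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1                     # common simplified generator choice
LAM = math.lcm(P - 1, Q - 1)  # Carmichael function lambda(N)
MU = pow(LAM, -1, N)          # modular inverse; valid because G = N + 1


def enc(m: int) -> int:
    """Enc: encrypt 0 <= m < N with fresh randomness r coprime to N."""
    if not 0 <= m < N:
        raise ValueError("plaintext out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def dec(c: int) -> int:
    """Dec: recover m using the private values LAM and MU."""
    u = pow(c, LAM, N_SQ)
    return ((u - 1) // N) * MU % N


def add(c1: int, c2: int) -> int:
    """Add: multiplying ciphertexts adds the plaintexts (mod N)."""
    return (c1 * c2) % N_SQ


a, b = 17, 25
print(dec(add(enc(a), enc(b))))  # 42
```

Whoever runs `add` never sees 17, 25, or 42, only large ciphertext integers; only the holder of `LAM` and `MU` can read the result.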

The very short story is that homomorphic encryption is about doing work on encrypted data without decrypting it, or otherwise learning anything about it, and still getting the right answer.

The amount of plaintext lying around today is a big problem.  A data breach is announced almost every day, and the lost data was rarely encrypted, because someone needed to do work on it.  Working on encrypted data directly allows full-time encryption, and full-time encryption will allow a standard of security that ends data breaches for good.

The problem we want to solve...

People too rarely ask why the data lost in a breach wasn't encrypted.  The answer, in the past at least, has been that someone needed to do work on that data (analytics, ETL, etc.).  However, it has become possible to do this sort of work directly on ciphertext, answering questions about encrypted data without decrypting it, and this permits keeping sensitive data encrypted at all times.  This is what we are working on at Capnion!