One of the hottest topics for customers on their “journey” to the cloud is encryption. Data that goes up into the cloud needs to be controlled to avoid leakage to hackers, script kiddies and whatnot. In the next few posts I will be looking at encryption in Azure, but let’s start with the basics that apply to (almost) all cloud providers.
Encryption perspective
I will take a look at it from the perspective of IaaS workloads, but the layered approach also applies to Web Apps, EC2, DocumentDB etc.
There are 3 layers of encryption possible within IaaS as displayed above.
1. Storage encryption (at rest)
All data written to the disk is encrypted so that if the disk itself is compromised (stolen), a hacker cannot simply connect the disk to another server, boot it up and read the data. Without encryption, that attack sounds easy in theory, and it sort of used to be this easy: someone with access to your datacenter could indeed rip the drives out of a server (or take the whole server) and extract the data. So for your local servers, workstations or portable drives, this technology makes perfect sense.
But cloud providers work a bit differently. Given the scale they operate at, it would be impossible to keep that same architecture. First, your data is not stored on a single drive. It’s not even stored on 3 drives for redundancy. It is stored on a very large number of drives in a very, very large storage array spread out over multiple racks. And the spread is not per file, it is per block of bits. For comparison, say your file was spread out over individual stars in the Milky Way: try to piece some information back together from the picture below.
While there aren’t as many drives in a datacenter as stars in the Milky Way, the comparison still stands.
As you can see, it would be very hard to take the entire array of disks spread out across a datacenter and then re-join all of them into the same array, just to get to a storage account and access the data. Stealing a single drive gives you only 1/x of the actual data set (x being the number of drives the data is spread over), leaving you with a few blocks of individual files.
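To make the "1/x of the data" point concrete, here is a minimal sketch. It assumes simple round-robin placement of 4-byte blocks over 8 drives; the function name `stripe`, the block size and the sample record are all made up for illustration. Real storage arrays use far more elaborate placement and replication schemes, so treat this as a toy model only.

```python
def stripe(data: bytes, drives: int, block_size: int = 4) -> list:
    """Distribute fixed-size blocks round-robin over `drives` drives (toy model)."""
    layout = [[] for _ in range(drives)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        layout[block_index % drives].append(data[offset:offset + block_size])
    return layout


# Hypothetical record, invented for this example.
document = b"Customer 4711 owes 1,250.00 EUR, due on 2018-03-01."
layout = stripe(document, drives=8)

# An attacker who walks off with a single drive (here: drive 3) only holds
# the blocks that happened to land on it.
stolen = b"".join(layout[3])
fraction = len(stolen) / len(document)

print(stolen)
print(f"share of the document on the stolen drive: {fraction:.0%}")
```

With more drives and larger files, the share held by any single drive shrinks further, and the blocks it does hold are not contiguous.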
However unlikely that scenario is, it could happen that a government entity rushes into a datacenter in a time of war and takes control of the entire DC. Which brings me to the next point.
Encryption at Rest protects against offline exposure of data. The keys are accessed when the storage is mounted or when someone accesses your storage account. Most cloud providers never take their storage offline. It needs to be online, as customers expect it to be online. Everything you write to the encrypted storage account is encrypted on write, and it is readable as long as the backend (not you individually) has access to the data encryption key.
In order to ensure speed and reduce latency, the data on the disks is not encrypted directly with the encryption key you keep in a vault. Not even if you bring your own key. All data is protected by a data-encryption-key, which in turn is protected by your keys in the vault.
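This two-level scheme is commonly called envelope encryption. A minimal sketch of the idea follows, using only the Python standard library. Note the assumptions: `toy_cipher` is an XOR keystream built from HMAC-SHA256 that I use purely as a stand-in so the example runs without extra packages; it is not a vetted cipher and not what any cloud provider uses. The variable names `kek` and `dek` are illustrative as well.

```python
import hashlib
import hmac
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR `data` with an HMAC-SHA256 keystream.

    Stand-in for a real cipher, for illustration only. Because XOR is
    symmetric, the same call both encrypts and decrypts.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Key-encryption-key: stays in the vault (hypothetical stand-in).
kek = secrets.token_bytes(32)

# Data-encryption-key: used by the storage backend for the bulk data.
dek = secrets.token_bytes(32)

# 1. The bulk data is encrypted with the DEK, never with the vault key.
data_nonce = secrets.token_bytes(16)
ciphertext = toy_cipher(dek, data_nonce, b"contents of a disk block")

# 2. The DEK itself is stored only in wrapped (encrypted) form, under the KEK.
wrap_nonce = secrets.token_bytes(16)
wrapped_dek = toy_cipher(kek, wrap_nonce, dek)

# 3. Reading: unwrap the DEK with the KEK, then decrypt the data with the DEK.
unwrapped_dek = toy_cipher(kek, wrap_nonce, wrapped_dek)
plaintext = toy_cipher(unwrapped_dek, data_nonce, ciphertext)
```

The design benefit is that the vault only ever handles a 32-byte key, not the bulk data, and that replacing the key in the vault means re-wrapping the data-encryption-key rather than re-encrypting every disk block.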
When you configure HSM-Backed or Managed keys (Key-Encryption-Keys), you do get additional control over the key that unlocks the data-encryption-key for your configured storage account. To avoid every request for data resulting in a query against the key vault itself, the data-encryption-key is cached on the storage arrays, and the actual customer keys are queried as soon as someone tries to access the storage account or at specific intervals. Deleting the Key-Encryption-Key from your vault therefore renders the data useless (usually after a few minutes). But if someone was after your data, they could just copy it straight from your storage account while it is mounted. They would not need a key to get to the data itself (assuming they get past the other security features that protect your storage account from any stranger doing that).
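The caching behaviour described above can be sketched roughly as follows. Everything here is hypothetical: the `Vault` and `StorageBackend` classes, the key name `customer-kek`, the byte-reversal "unwrap" placeholder and the 300-second re-check interval are assumptions for illustration, not any provider's actual implementation. A fake clock is used so the example runs instantly.

```python
class Vault:
    """Hypothetical stand-in for a key vault holding the Key-Encryption-Key."""

    def __init__(self):
        self.keys = {"customer-kek"}  # hypothetical key name

    def unwrap(self, key_name, wrapped_dek):
        if key_name not in self.keys:
            raise KeyError(f"{key_name} is no longer in the vault")
        # Placeholder for a real unwrap operation: here it just reverses the bytes.
        return wrapped_dek[::-1]


class StorageBackend:
    """Caches the unwrapped data-encryption-key and re-checks the vault at intervals."""

    def __init__(self, vault, wrapped_dek, recheck_seconds, clock):
        self.vault = vault
        self.wrapped_dek = wrapped_dek
        self.recheck_seconds = recheck_seconds
        self.clock = clock
        self._dek = None
        self._fetched_at = None

    def get_dek(self):
        now = self.clock()
        if self._dek is None or now - self._fetched_at >= self.recheck_seconds:
            self._dek = None  # drop the cached copy before asking the vault again
            self._dek = self.vault.unwrap("customer-kek", self.wrapped_dek)
            self._fetched_at = now
        return self._dek


now = [0.0]  # fake clock, in seconds
vault = Vault()
backend = StorageBackend(vault, wrapped_dek=b"yek-atad", recheck_seconds=300,
                         clock=lambda: now[0])

first_read = backend.get_dek()       # vault is queried, key is cached
vault.keys.discard("customer-kek")   # the customer deletes the KEK

now[0] = 60.0
cached_read = backend.get_dek()      # still served from the cache

now[0] = 400.0
try:
    backend.get_dek()                # cache expired, vault says no
    usable_after_recheck = True
except KeyError:
    usable_after_recheck = False
```

In this model the data stays readable for up to one re-check interval after the key is deleted, which matches the "usually after a few minutes" delay mentioned above.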
2. VHD/service encryption
The second layer of encryption is the virtual hard drive. The technologies used for this are usually BitLocker and DM-Crypt or, in the case of databases, SQL Transparent Data Encryption.
For your VMs, this is equal to the encryption used on-premises or on your laptop. When your device gets stolen and the attacker has access to the physical drive, all data on that drive will be encrypted and unreadable unless the encryption key is also available. In order to boot the VM, the hypervisor of the host managing the VM will retrieve the keys to the drive to be able to read the data. Note that the drive itself is not decrypted upon boot, but with the keys, the hypervisor is able to read the data from the drive. While the hypervisor still needs access to the keys to unlock the data on the virtual hard drive, this method actually protects against someone copying your virtual hard drive files and trying to access the data in an uncontrolled environment. (For example, a subscription contributor who has access to the disks of your domain controller but is not a domain admin would otherwise be able to copy the NTDS disk and run a DC at home to try to get to the password hashes.) Given that standard actions (like copying data) are usually not monitored, disk encryption is a valuable method to avoid data leakage and should be used if sensitive data is stored on a VM.
Now even if the drives are encrypted, you have “unencrypted” access to the data (as the process is transparent) as long as the file system is mounted and access is through the OS itself. For example, you could go to a file share on the VM and copy data from the VM over the network. The data is encrypted “at rest” within the VM, but as the VM is online, you have full access to the data in an unencrypted format. Again, if I was after your data, the VM agent on the VM would allow the cloud service provider (CSP) to interact with the VM disk contents directly. Or I could attack the VM from a network perspective and extract the data that way.
3. File System Encryption
The 3rd level is customer-controlled encryption through 3rd party software. Once the VM is booted and the customer has access to the operating system, it is possible to install 3rd party or OS-native encryption for the data drives (it is not possible to use 3rd party encryption on the OS drive, as the hypervisor would not be able to boot the VM). In this case, the data on the data drives is encrypted with customer-managed software and keys. For example, it is still possible to enable BitLocker on data drives, with the keys stored in Active Directory, or to apply transparent HDFS encryption using an external key store. Again, while you are not relying on the CSP’s encryption methods in this case, as soon as the data is mounted it is readable through the VM agents, network access etc.
Another way is for your data itself to be encrypted, for example inside a database. Reading the data then returns an encrypted version that your application needs to unlock, by going to an HSM, a Rights Management System or similar. This means the data is not human-readable unless you (the application) actually go and fetch a key from the encryption system. This is sometimes used, but only in certain applications and certainly not in normal services like SMB/NFS etc.
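A rough sketch of this application-level approach, using an in-memory SQLite database from the Python standard library. As assumptions: the table layout, the function names and the sample value are invented, the key is generated locally where a real application would fetch it from an HSM or similar system, and `toy_cipher` is an illustrative XOR keystream rather than a vetted cipher.

```python
import hashlib
import hmac
import secrets
import sqlite3


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR keystream from HMAC-SHA256; illustration only, not a real cipher."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# In a real application this key would be fetched from an HSM or similar system.
app_key = secrets.token_bytes(32)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, nonce BLOB, national_id BLOB)")


def store_national_id(customer_id: int, value: str) -> None:
    """Encrypt the value in the application before it ever reaches the database."""
    nonce = secrets.token_bytes(16)
    db.execute("INSERT INTO customers VALUES (?, ?, ?)",
               (customer_id, nonce, toy_cipher(app_key, nonce, value.encode())))


def load_national_id(customer_id: int) -> str:
    """Fetch the encrypted value and unlock it with the application's key."""
    nonce, blob = db.execute(
        "SELECT nonce, national_id FROM customers WHERE id = ?", (customer_id,)).fetchone()
    return toy_cipher(app_key, nonce, blob).decode()


store_national_id(1, "123-45-6789")

# Anyone reading the table directly (DBA, VM agent, stolen backup) sees only ciphertext.
raw = db.execute("SELECT national_id FROM customers WHERE id = 1").fetchone()[0]

# Only the application, holding the key, gets the readable value back.
readable = load_national_id(1)
```

The trade-off is that the database can no longer search or index on the encrypted column, which is one reason this is only used in certain applications.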
HSM Backed versus Customer Managed versus Service Managed Keys
Most cloud providers allow you to use three types of keys: HSM Backed, Customer Managed and Service Managed. By default, most services from any cloud vendor use Service Managed keys. This is the standard encryption used for services and storage. It is virtually invisible to the end user and is active as soon as you deploy something. In Azure for example, all managed disks created after June 10th, 2017 are encrypted by default using this Service Key, as Storage Service Encryption was introduced at that time.
The next level (Customer Managed keys) is for businesses or administrators who want a bit more control over the keys used. This is where Key Vault is required to manage the keys, and per service you then specify where these keys are used. The keys themselves are derived from Key Vault and are software-backed keys. That means the keys are still controlled by the cloud vendor, but you can create different keys (based off the CSP root keys) to be used by different services or in different subscriptions. The key policies are also more controllable, as the administrator can set the permitted operations on each individual key.
While using your own key gives you more control over its usage, it is also an added burden on administrators to keep the keys updated and manage their permissions. The question is whether you absolutely need full control over the keys or whether service keys are good enough.
The third level is HSM Backed keys, using Bring-Your-Own-Key (BYOK). Keys are generated in your HSM and uploaded directly from your HSM to the key vault, and the cloud provider never sees an unencrypted version of them. That means the provider can’t give your keys to a government, and it also protects against an attacker trying to get to your keys directly. This seems to be more secure, but note that while the keys are stored in your on-premises HSM, the cloud provider doesn’t have the means to give you your keys back in case something happens to your HSM. You are fully responsible for all your keys, and these keys provide access to all your encrypted data. As the saying goes, with great power comes great responsibility. The organization and its administrators need to be fully prepared if you plan to use BYOK.
So is it worth it to start with encryption on your workloads? For some workloads, many (including me) think it is essential to use encryption beyond the standard encryption available. For example, Domain Controllers and PII in databases should always use disk/data encryption with strong keys. Whether that key needs to be BYOK or Customer Managed is another question.
Do you want to use BYOK? Just because it’s cool doesn’t mean you should use it. Mary Branscombe at CIO.com formulates the question as follows: “Are you ready to become a bank, because you’ll have to run your key infrastructure with the same rigor, down to considering the travel plans of officers of the company. If you have three people authorized to use the smart card that gives access to your key, you don’t ever want to let all three of them on the same plane.”