A comparison of two options for secure key storage and cryptographic operations
Introduction
Cryptographic keys are essential for protecting sensitive data and ensuring the integrity and authenticity of digital transactions. However, storing and managing these keys securely can be challenging, especially in cloud environments where parts of the encryption process are managed by the cloud vendor. In practice, there are two options for customers who need high levels of security and compliance for their cryptographic key storage: Host Your Own Key (External Key Store) and (Azure) Managed HSM.
This article compares the features, benefits, and drawbacks of using an External Key Store (used by other cloud providers) and Azure Managed HSM for cryptographic operations and key management. The aim is to let you, as a reader, draw your own conclusion as to why Azure does not support Host Your Own Key (External Key Stores), apart from the SLA impact.
Note: This document focuses on data encryption and uses storage accounts as an example. For BitLocker, LMCrypt, SQL and other services there are slight variations in the decryption / unlocking process, but the general architecture is the same.
Server Side Encryption (SSE)
Before we dive into the world of key storage and key managers, we first need to understand how encryption in the cloud (and in most systems) actually works.
Most cloud providers rely on server-side encryption to secure customer data. While most providers by default use “fabric” generated keys, many also allow customers to provide their own Customer Managed Keys (CMK). It is important to understand the interaction of the Key Management solutions in a server-side-encryption architecture.
Most providers use what’s called Envelope Encryption, where the actual data encryption key is protected by one or more key encryption keys. Rotating a Key Encryption Key then only requires re-wrapping the data encryption key, so the protection of the encrypted data can be changed quickly without re-encrypting all the data.
Note: It is important to understand that the communication between the components is protected by TLS encryption. At no point is the DEK available in an unencrypted form on the network. However, given that the certificates used for TLS are usually issued by the cloud provider’s PKI infrastructure, in the scenarios discussed here we will (unless otherwise specified) assume clear text.
The architecture used for server-side encryption is similar for many services, but the above image explains it from a storage perspective. The architecture is generic and applies to most public cloud providers.
- Upon creation of the storage account or container inside the storage account, the storage controller generates a random symmetric AES256 encryption key. This is called the Data Encryption Key (DEK). The data stored in the storage account will be encrypted with the DEK.
- The DEK is sent to the Key Management API for a “wrap” operation. The Key Management service uses a Key Encryption Key (KEK) to wrap the DEK. The KEK is an asymmetric key (public/private key pair).
- The DEK is wrapped with the public key of the KEK and sent back to the storage controller. The storage controller then saves the encrypted (wrapped) DEK in the storage account.
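The write path described in the steps above can be sketched in Python. This is a minimal illustration and not any provider's implementation: `ToyKeyService`, `store` and `rewrap` are hypothetical names, and `toy_cipher` is a deliberately simplistic stand-in (a SHA-256 keystream XOR) for both AES-256 and the asymmetric KEK wrap, used only so that the sketch runs with the standard library. It also illustrates why rotating the KEK only re-wraps the small DEK and leaves the data ciphertext untouched.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream (toy stand-in, NOT real AES/RSA)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        stream.extend(block)
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class ToyKeyService:
    """Hypothetical Key Management API; the KEKs never leave this object."""

    def __init__(self) -> None:
        self._keks = {}      # KEK version -> key bytes
        self.current = 0
        self.rotate()        # creates the first KEK (version 1)

    def rotate(self) -> int:
        self.current += 1
        self._keks[self.current] = secrets.token_bytes(32)
        return self.current

    def wrap(self, dek: bytes) -> dict:
        nonce = secrets.token_bytes(16)
        wrapped = toy_cipher(self._keks[self.current], nonce, dek)
        return {"kek_version": self.current, "nonce": nonce, "wrapped": wrapped}

    def unwrap(self, blob: dict) -> bytes:
        kek = self._keks[blob["kek_version"]]
        return toy_cipher(kek, blob["nonce"], blob["wrapped"])


def store(kms: ToyKeyService, plaintext: bytes) -> dict:
    """Storage controller: generate a DEK, encrypt the data, keep only the wrapped DEK."""
    dek = secrets.token_bytes(32)          # random 256-bit DEK
    nonce = secrets.token_bytes(16)
    return {
        "nonce": nonce,
        "ciphertext": toy_cipher(dek, nonce, plaintext),
        "wrapped_dek": kms.wrap(dek),      # stored next to the data
    }


def rewrap(kms: ToyKeyService, record: dict) -> None:
    """After a KEK rotation only the DEK is re-wrapped; the data is not re-encrypted."""
    dek = kms.unwrap(record["wrapped_dek"])
    record["wrapped_dek"] = kms.wrap(dek)
```

In this sketch `rewrap` briefly handles the plaintext DEK outside the key service; a real service could equally perform the re-wrap entirely inside the HSM boundary.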
So, in server-side-encryption the data on the storage account is encrypted with the DEK. The DEK is protected by the KEK and the encrypted DEK is stored with the data on the storage account. When accessing the data, the following happens:
- The data is accessed and the corresponding (encrypted) DEK is loaded from storage into the storage controller.
- The controller sends the (encrypted) DEK to the Key Management API URL (which is referenced in the encrypted DEK).
- The Key Management API sends the encrypted DEK to the backend HSM and calls for an “unwrap” procedure.
- The HSM decrypts the DEK using the KEK private key stored in the HSM and returns the unencrypted DEK.
- The storage controller caches the unencrypted DEK for a limited time.
- The storage controller transparently decrypts the data before serving it to the client.
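The read path above, and in particular the time-limited DEK cache, can be sketched as follows. `DekCache`, the `unwrap` callback and the TTL value are assumptions made for illustration; actual cache lifetimes are provider specific and are not stated in this article.

```python
import time
from typing import Callable, Dict, Tuple


class DekCache:
    """Hypothetical storage-controller cache for unwrapped DEKs."""

    def __init__(
        self,
        unwrap: Callable[[bytes], bytes],           # call into the Key Management API
        ttl_seconds: float,                         # assumed cache lifetime
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._unwrap = unwrap
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[bytes, Tuple[bytes, float]] = {}

    def get(self, wrapped_dek: bytes) -> bytes:
        """Return the plaintext DEK, calling the key service only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(wrapped_dek)
        if entry is not None and entry[1] > now:
            return entry[0]                         # still valid: no call to the HSM
        dek = self._unwrap(wrapped_dek)             # fails here if access was revoked
        self._entries[wrapped_dek] = (dek, now + self._ttl)
        return dek
```

Because a cached DEK stays usable until its entry expires, revoking access at the key service takes effect for this controller only after at most `ttl_seconds`.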
In the procedures above, we left out the actual location of the HSM and Key Management API. The above architecture only shows that:
- The data encryption is handled by a generated (AES256) symmetric key (DEK)
- The DEK is protected by an asymmetric key called the Key Encryption Key (KEK)
- The private key of the KEK does not have to leave the physical HSM
Initial trust in SSE
The architecture of server-side encryption shows that the security of the encryption relies on two factors that need to be investigated further to assess possible attacks and insider threats: the creation and safeguarding of the DEK in unencrypted form, and the safeguarding of the private key of the KEK that protects the DEK in encrypted form.
- The first is the initial trust in the cloud provider: their ability to develop code and to enforce proper code development procedures. As cloud providers write their own firmware and software, the initial trust that needs to be put in place for server-side encryption is in the firmware of the storage controller (or other service using SSE). Rogue firmware could leak the actual DEK before it is encrypted by the KEK, or release it while it is cached in the storage controller when data is being accessed.
A similar trust needs to be put in the procedure that the storage controller follows for actually encrypting and decrypting the DEK, and in the key management API to execute its tasks properly and securely. That is why most cloud providers have root-of-trust and secure-boot validation systems in place, either in hardware chips or, again, firmware based, so that unauthorized changes to the actual code and rootkits are prevented.
- The second is the safeguarding of the KEK. The KEK is (usually) stored in Hardware Security Modules (HSM). These devices are designed, built and managed in such a way that private key material is safeguarded under almost any circumstance. In many cases, the trust in the backend HSM is inherited by using trusted third-party vendors such as Thales, nCipher and Entrust. The default procedure for decrypting the DEK is to send the encrypted DEK to the HSM for an unwrap operation that returns the DEK in unencrypted form. In many cases the HSM is programmed not to release the private KEK key material in an unencrypted form.
Key Management/Storage
As indicated above, the private keys of the Key Encryption Keys (KEK) are always stored in (and protected by) the physical HSM. This is also called the key storage component in many architectures. There are two views on key storage: External (on-premises) and Managed HSM (cloud).
- The philosophy of an External Key Store is that the keys and data are physically separated and that the customer is in full control of the physical HSM module and therefore keys.
- The philosophy of a managed HSM is that an equal or better security boundary can be created in the cloud without compromising the SLAs of dependent services, while retaining full customer control of the keys without HSM management.
Note: While the managed HSM/external HSM discussion is mostly about the impact on cloud service SLAs (due to the additional dependencies of latency, components and more), these items can usually be (partially) mitigated by adding more HSM modules, redundant networks and other components, or the risks can simply be accepted by the customer. In this document we will not cover those, as we focus on the other factors that influence the security of Hosting Your Own Keys and on why just relying on a remote HSM will not increase the security of the provided key management solutions.
When looking at key management and storage, two items are important in terms of attack vectors.
Private Key Material extraction – The private keys of the KEKs are protected by many mechanisms provided by the HSM. Extraction of the keys could lead to offline attacks, for which an attacker needs both a copy of the encrypted data and the private keys from the HSM. By default it is the task of the HSM to protect the private key material inside and outside of the HSM. For this, HSMs rely on Security Worlds or partitions. The wrap/unwrap functions of a key can/should only be executed by the HSM itself inside the trusted environment, but (protected) backups and (protected) external key storage are usually available as well. The HSM protects these by wrapping all material that leaves the HSM with a “masking key” which is known only to the partition the key was released from. Only the combination of the masking key and the key (HSM) backup could expose the private key.
Unauthorized usage of the service – While the keys are protected by the HSM, keys do not have to leak for the service to be misused. Unauthorized users capable of using the wrap/unwrap function of the key management API (fronting the HSM) could expose the DEK (by sending the encrypted DEK to the service) and therefore compromise the data protected by the DEK. While the HSM protects the private key material, the API in front of the HSM, which mediates between the calling service and the HSM itself, needs to be secured as well.
It is therefore vital to not just look at the security of the keys inside the HSM, but also the surrounding services and architectures that call upon the HSM to perform its functions.
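To illustrate why the API in front of the HSM matters, the following hypothetical sketch places an access check and an audit record in front of an unwrap call. `KeyApi`, `Unauthorized` and the string identities are assumptions for illustration; in a real service the caller identity would come from a verified authentication token, and the HSM call would go to an actual device rather than a callback.

```python
from typing import Callable, List, Set, Tuple


class Unauthorized(Exception):
    """Raised when a caller is not allowed to use a key for an operation."""


class KeyApi:
    """Hypothetical key management API that fronts an HSM."""

    def __init__(self, hsm_unwrap: Callable[[str, bytes], bytes]) -> None:
        self._hsm_unwrap = hsm_unwrap                  # only this API talks to the HSM
        self._acl: Set[Tuple[str, str, str]] = set()   # (caller, key name, operation)
        self.audit: List[Tuple[str, str, str, bool]] = []

    def grant(self, caller: str, key_name: str, operation: str) -> None:
        self._acl.add((caller, key_name, operation))

    def unwrap(self, caller: str, key_name: str, wrapped_dek: bytes) -> bytes:
        allowed = (caller, key_name, "unwrap") in self._acl
        # Every attempt is recorded, including the refused ones.
        self.audit.append((caller, key_name, "unwrap", allowed))
        if not allowed:
            raise Unauthorized(f"{caller} may not unwrap with {key_name}")
        return self._hsm_unwrap(key_name, wrapped_dek)
```

The private key never leaves the HSM in this model, yet any caller that passes the check receives a plaintext DEK, which is why the access list and the identity behind it are part of the security boundary.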
External Key Store
External Key Store is a feature that allows customers to use their own hardware security modules (HSMs) to support cryptographic operations for cloud services. With External Key Store, customers can store their keys in their own HSMs, which are physically separated from the cloud services and under their full control. This way, customers can meet their own security and compliance standards, such as FIPS 140-2 Level 3 or higher, and prevent unauthorized access or tampering with their keys.
A common architecture for an external key store is displayed above and consists of four components.
- The cloud service – The cloud service calls the Key Management Service to provide, on its behalf, the unencrypted DEK required to process/unlock the data. The cloud service usually uses a (managed) identity to authorize itself to the targeted Key Encryption Key.
- The cloud managed Key Management Service – this component is still required as most of the cloud services are tightly integrated with this service. The cloud key management service can host cloud hosted keys, or hold reference/referral keys that point to the actual on-premises key.
- The Key Management Proxy – as customers can choose their own HSM (vendor/type) a translation between the Key Management Service and the actual HSM call needs to be made. The proxy has an (external) URL to which the Key Management Service forwards the wrap/unwrap calls and the proxy has to convert those to the HSM API calls. In many cases, the proxy also has to perform identity conversion and validation as the HSM identity provider might not be the same as the cloud service managed identity.
- The HSM – various HSMs can be used in the backend, but as continuous access will be required, the chosen HSM needs to be always online, and access must be granted to service principals or user accounts (managed by the proxy). Regular usage of the keys cannot use quorum-based MFA controls (such as smart cards).
In the External Key Store architecture, the customer has full control over the physical HSM and over the operational excellence of the HSM, such as applying MFA and multi-eyed principles to administrative operations. The advantage here is that for specific operations (such as key creation and backup / restore) most HSMs require multiple administrators to approve the requested operation.
Hosting your own HSM also provides more flexibility in policies and key generation, as encryption types, curves and policies are bound only by the device’s capabilities.
Apart from hosting the HSM itself (in a secure manner), the customer is also fully responsible for hosting the proxy components as they play a vital part in the service communications. Additional trust has to be put into the proxy software that is ultimately responsible for correctly translating the API calls and in many cases the correct identity translations between incoming call and HSM key usage.
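The translation role of the proxy can be sketched as below. Everything here is hypothetical: `KeyProxy`, `sign`, the key reference to HSM handle mapping, and especially the shared-secret HMAC, which merely stands in for whatever request authentication (for example mutual TLS or signed requests) a real proxy would use. The sketch shows that the proxy validates the Key Management Service's request and then calls the HSM with its own access, without seeing which cloud service originally asked.

```python
import hashlib
import hmac
from typing import Callable, Dict


def sign(secret: bytes, operation: str, key_ref: str, payload: bytes) -> str:
    """Stand-in for request authentication between key service and proxy."""
    message = b"|".join([operation.encode(), key_ref.encode(), payload])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class KeyProxy:
    """Hypothetical on-premises proxy between the cloud key service and the HSM."""

    def __init__(
        self,
        kms_secret: bytes,                              # trust anchor for the key service
        key_map: Dict[str, str],                        # cloud key reference -> HSM handle
        hsm_call: Callable[[str, str, bytes], bytes],   # vendor-specific HSM client
    ) -> None:
        self._kms_secret = kms_secret
        self._key_map = key_map
        self._hsm_call = hsm_call

    def handle(self, operation: str, key_ref: str, payload: bytes, signature: str) -> bytes:
        expected = sign(self._kms_secret, operation, key_ref, payload)
        if not hmac.compare_digest(expected, signature):
            raise PermissionError("request was not issued by the key management service")
        if operation not in ("wrap", "unwrap"):
            raise ValueError(f"unsupported operation: {operation}")
        hsm_handle = self._key_map[key_ref]             # translate the key reference
        # The proxy uses its own HSM access; the original caller is not visible here.
        return self._hsm_call(operation, hsm_handle, payload)
```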
While the general idea might be that “owning” your own HSM might be more secure, there are some factors that have to be taken into account based on the 2 attack vectors:
- Private Key Material Extraction –
- The HSM backup (containing all keys in encrypted state) as well as the masking key (to unencrypt the backup) are in the same location, managed by the same entity and usually protected by the same identity provider (with or without MFA)
- Human readable credentials are required to take ownership of the HSM, to perform daily tasks such as backups, key rotation, monitoring and HSM root partition actions.
- The proxy component is another 3rd party (or open-source/self-managed) component that has full interactions with the keys and encryptions. A compromise of the proxy service can lead to a compromise of DEKs. Depending on how complex the proxy has been designed/written, the proxy could have “all keys usage” privileges – making its compromise scope much larger.
- The Key Management Proxy hosts an externally accessible URL that is protected with a TLS certificate of the customer. It is vital that the communication between the cloud Key Management Service and the Key Management Proxy is not interrupted or compromised while the (unencrypted) DEK is returned.
- Unauthorized usage of the service
- Storing or handling the credentials for actual HSM usage (wrap/unwrap operations) in the proxy poses a risk of credential theft. With those credentials an unauthorized system can make direct HSM calls for wrap/unwrap operations without the Key Management Service or Key Management Proxy validating those requests.
- The Key Management Service provided by the cloud provider is still required and requires the same level of trust as all components in the architecture. This service hosts a referral key that is called upon by the cloud service, which proxies the request to the key management proxy on the remote site. A compromise of that service could allow unauthorized key replacements or calling illegal wrap/unwrap instructions.
- The Key Management Service is responsible for authorizing the cloud service to use the reference key. Once the reference key is called upon, the key management proxy that receives the forwarded request does not (necessarily) know the calling service, nor can it validate whether the request has been authenticated, other than by validating the Key Management Service’s request.
- Other items:
- The customer is fully responsible for procuring, hosting and maintaining a very complex setup of (redundant) HSMs and proxy services. This usually results in additional training, possible human/code errors and (un)intentional risks to the keys and the service, including dependent services.
- A customer is fully responsible for the policies that apply to the HSM. It is possible to unintentionally set a key to be “exportable”, thereby lowering the security of that key as the HSM will allow that key to leave the HSM in an unmasked format.
Note: while “external” is indicated for access to the Key Management Proxy, this external traffic can run over customer or cloud provider controlled networks, such as direct connections and WAN connections, and therefore does not imply exposure to the public Internet.
Azure Managed HSM
Azure Managed HSM is a fully managed, cloud-based service that provides a dedicated and isolated HSM partition for each instance. Azure Managed HSM is based on FIPS 140-2 Level 3 validated hardware and supports a subset of the Azure Key Vault APIs and features. With Azure Managed HSM, customers can store and manage their keys in a highly secure and scalable cloud service, without having to worry about the maintenance and management of the underlying hardware. The architecture of Managed HSM depends heavily on a relatively new technology called confidential computing.
In this architecture all the components required are in the cloud environment. The philosophy is to create such a secure environment for the Managed HSM (Key Management Service) that it matches / exceeds the external HSM architecture. This is handled in a few ways:
- A Trusted Execution Environment (TEE) is created for each instance used by a customer
- The TEE is based on external trust/keys outside of Microsoft control (Intel SGX)
- All secrets used by the service are generated inside the TEE secured instance
- There is no external access to the application execution environments
- No clear-text secrets are to be in active memory on the physical hosts
- No human or system outside of the trusted environment has the HSM credentials
- Access to the service is programmatically limited to the customer’s Entra ID (Azure Active Directory) object ID
- The security domain (including masking key) can only be requested / downloaded and decrypted by the customer and is not stored in the cloud.
- Private key material in the HSM is set to non-exportable, unless specifically requested for Secure Key Release. Regular keys are non-exportable and the HSM will therefore not release the private key material in an unmasked state.
- The HSM has its own audit log that can be pulled by a customer to view any interaction on the HSM partition.
When looking at the key storage component, this is within the datacenter boundary. As indicated before, we need to evaluate the two possible ways of gaining access to key material.
- Private Key Material Extraction –
- A 3rd party attested HSM running regular firmware is used to provide the private key material protection.
- The credentials to the HSM itself are not human readable and are stored solely inside the trusted execution environments of the system.
- The masking key protecting the private key material is held solely by the customer, protected by customer generated and managed keys. It is therefore still the customer’s responsibility to safeguard the masking key, which effectively separates the keys (backups) and the masking key into two different places.
- While Microsoft Azure can indeed take a full backup of the physical HSM (including all partitions), this backup is protected by each individual partition’s masking key (to which only the customer has access).
- (Un)intended exposure of the security domain does not provide access to the keys. An attacker would need to gain access to both a (key) backup and the security domain to be able to gain access to private key material.
- (Un)intended exposure of the backup would not allow an attacker to gain access to private key material, as they would also need the security domain, which is not stored in the cloud.
- Unauthorized access / usage
- Access to the service is limited to authentication tokens signed by the customer’s Entra ID. The service uses two access control lists: Azure RBAC for creating / deleting instances and the HSM RBAC for HSM instructions. While a single Entra ID is used, an Azure subscription owner cannot take forced control of the Managed HSM.
- An emergency HSM Administrator role exists for the “Entra ID Global Admins” group. Regardless of permissions on Azure subscriptions, access to the Managed HSM instance allows an Entra ID Global Admin to take control of the HSM. It will not allow them access to private key material, but does allow them to change the HSM ACL. Caution with Global Admins membership is therefore required.
- The credentials of (each individual) HSM partition are never exposed and can therefore not be misused by unauthorized systems or persons.
- The TLS (https) certificates are generated by and inside the Trusted Execution Environment, making the private key of the service TLS connections solely available to the instance.
- The front-end service runs fully in confidential computing, and external access by other systems or humans is impossible. The instance cannot be “moved” to a compromised host, as the TEE parameters would change and access to the “secrets store” that holds the credentials and service private keys is only possible on the original physical CPU.
- Other items:
- The service has built-in redundancy. Each Managed HSM service that is created consists of 3 (individual) backend instances. The key exchange between the instances is secured by a database encryption key (shared between all instances) and the HSM partition’s masking key. This ensures that private key material can only be exchanged between instances in the same service instance and that private key material is only accessible when imported into the HSM partitions belonging to the same service instance.
- Management and operational procedures on the HSMs are limited to Azure Fabric controllers. Therefore out-of-band operations that are not built into the service cannot be called upon. Direct access to the partitions (and therefore key operations or partition-wide operations) is limited to each of the instances inside their respective TEEs. No human or out-of-band access to the partition is possible.
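The separation between an HSM backup and the masking key (security domain) discussed in the list above can be illustrated with a toy model. This is a sketch under stated assumptions, not the format or algorithm any HSM uses: `make_backup` and `restore_backup` are hypothetical, and the SHA-256 keystream plus HMAC tag merely stand in for the HSM's real key wrapping. The point is only that the backup blob is useless without the masking key, and the masking key reveals nothing without the backup.

```python
import hashlib
import hmac
import secrets


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy keystream cipher (NOT a real HSM wrapping algorithm)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def _derive(masking_key: bytes, label: bytes) -> bytes:
    """Derive separate encryption and integrity keys from the masking key."""
    return hashlib.sha256(label + masking_key).digest()


def make_backup(masking_key: bytes, key_material: bytes) -> bytes:
    """What leaves the HSM: key material wrapped under the partition's masking key."""
    nonce = secrets.token_bytes(16)
    body = _xor_stream(_derive(masking_key, b"enc"), nonce, key_material)
    tag = hmac.new(_derive(masking_key, b"mac"), nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def restore_backup(masking_key: bytes, backup: bytes) -> bytes:
    """Recover key material; fails unless the matching masking key is supplied."""
    if len(backup) < 48:
        raise ValueError("backup too short")
    nonce, body, tag = backup[:16], backup[16:-32], backup[-32:]
    expected = hmac.new(_derive(masking_key, b"mac"), nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong masking key or corrupted backup")
    return _xor_stream(_derive(masking_key, b"enc"), nonce, body)
```

In the Managed HSM model described here the two halves are held by different parties (the backup by the provider, the security domain by the customer), whereas in a self-hosted setup both usually sit with the same administrators.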
Conclusion
While two different views exist on the actual hosting of the keys, the trust in the cloud provider has to extend beyond the key storage itself. Wherever the Key Encryption Keys are stored, the data itself is not encrypted with the keys stored in the HSM. Trust in the encryption methods and strengths, in the firmware of the (storage) service controllers and in the code base of the Key Management Service/Managed HSM service is still required.
With an external key store there can indeed be more trust in the HSM itself. There are more controls available for who can access the device and under which terms/conditions. Multi-Factor authentication, multi-eyed principles and even quorum for access can be configured. Managing your own HSM also provides more flexibility in the HSM and key policies, ciphers, curves and therefore ultimately key strengths. But while the trust in the on-premises HSM could be higher, the additional components should not be ignored. The on-premises proxy that translates the cloud initiated calls into HSM specific language requires the same level of trust and there is still the dependency on the Key Management Service in the cloud.
An external key store gives customers the ability to pull the plug by revoking access to specific keys or, in the worst case, by literally pulling the power cable from the HSM or proxy service. This allows customers to revoke any access to any data almost immediately; “almost”, because the DEK (the actual data encryption key) is cached for a limited time on the storage controllers. However, for a customer to decide to pull the plug, they would have to know that a direct attack is imminent and that their data is at risk. The ability to pull the plug and render the data inaccessible does not help when “backdoors” or unauthorized access are already in play. Trust has to be placed in the cloud vendor’s security and procedures for managing the services and the Key Management Service, regardless of where the actual HSM is placed.
The Managed HSM solution draws its security from confidential computing. Indeed, the HSM (partition) is hosted outside of customer reach, where it receives its own security world to ensure a unique masking key. However, the key protection mechanisms provided by the HSM are equal to those of an on-premises HSM. The additional benefit of the service (in terms of security) has to come from the architecture and its use of confidential computing. As the credentials for the HSM partition are only accessible to each individual instance of the service and are protected by Intel SGX, there is no system or human that could gain access to the device itself. In the architecture of Managed HSM, the root of trust is external to the cloud provider: access to the credentials (and other secrets of the service) is bound to the physical silicon of the CPU that was manufactured by Intel. Deleting or deactivating keys, up to deleting the entire service, is the equivalent of a pull-the-plug scenario and, as with the Key Management Service, equal trust has to be placed in the cloud provider’s code running in the TEE to actually execute these commands.