Parquet modular encryption can work with arbitrary Key Management Service (KMS) servers. To use a given KMS server, you must provide the Analytics Engine powered by Apache Spark instance with a custom KMS client class that can communicate with that server. This class must implement the KmsClient interface, which is part of the Parquet modular encryption API. Analytics Engine powered by Apache Spark includes the VaultClient implementation of KmsClient, which can be used out of the box if you use HashiCorp Vault as the KMS server for the master keys. If you use or plan to use a different KMS system, you can develop a custom KmsClient class, taking the VaultClient code as an example.
Custom KmsClient class
Parquet modular encryption provides a simple interface called org.apache.parquet.crypto.keytools.KmsClient with the following two main functions that you must implement:
// Wraps a key - encrypts it with the master key, encodes the result and
// potentially adds KMS-specific metadata.
public String wrapKey(byte[] keyBytes, String masterKeyIdentifier)
// Decrypts (unwraps) a key with the master key.
public byte[] unwrapKey(String wrappedKey, String masterKeyIdentifier)
In addition, the interface provides the following initialization function that passes KMS parameters and other configuration:
public void initialize(Configuration configuration, String kmsInstanceID, String kmsInstanceURL, String accessToken)
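The wrap and unwrap contract described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name InMemoryKmsSketch, the addMasterKey helper, the in-memory key map, and the choice of AES-GCM with a Base64-encoded result. To stay compilable with only the JDK, the class does not declare "implements KmsClient" and omits initialize; a real client would do both and would send the wrap and unwrap requests to the KMS server instead of holding master keys locally.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical stand-in for a KMS client; mirrors the wrapKey/unwrapKey
// signatures but keeps master keys in memory (never do this in production).
class InMemoryKmsSketch {
    private static final int NONCE_LENGTH = 12;      // bytes, common AES-GCM nonce size
    private static final int TAG_LENGTH_BITS = 128;  // AES-GCM authentication tag size

    // Master keys by identifier; a real KMS keeps these on the server side.
    private final Map<String, byte[]> masterKeys = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Illustration-only helper: registers a 16-, 24- or 32-byte AES master key.
    void addMasterKey(String masterKeyIdentifier, byte[] masterKey) {
        masterKeys.put(masterKeyIdentifier, masterKey.clone());
    }

    // Encrypts the key with the master key and Base64-encodes nonce + ciphertext.
    public String wrapKey(byte[] keyBytes, String masterKeyIdentifier) {
        byte[] nonce = new byte[NONCE_LENGTH];
        random.nextBytes(nonce);
        try {
            Cipher cipher = newCipher(Cipher.ENCRYPT_MODE, masterKeyIdentifier, nonce);
            byte[] cipherText = cipher.doFinal(keyBytes);
            byte[] out = ByteBuffer.allocate(nonce.length + cipherText.length)
                    .put(nonce)
                    .put(cipherText)
                    .array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Failed to wrap key", e);
        }
    }

    // Decodes the wrapped key, splits off the nonce, and decrypts with the master key.
    public byte[] unwrapKey(String wrappedKey, String masterKeyIdentifier) {
        byte[] in = Base64.getDecoder().decode(wrappedKey);
        byte[] nonce = Arrays.copyOfRange(in, 0, NONCE_LENGTH);
        try {
            Cipher cipher = newCipher(Cipher.DECRYPT_MODE, masterKeyIdentifier, nonce);
            return cipher.doFinal(in, NONCE_LENGTH, in.length - NONCE_LENGTH);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Failed to unwrap key", e);
        }
    }

    private Cipher newCipher(int mode, String masterKeyIdentifier, byte[] nonce)
            throws GeneralSecurityException {
        byte[] masterKey = masterKeys.get(masterKeyIdentifier);
        if (masterKey == null) {
            throw new IllegalArgumentException("Unknown master key: " + masterKeyIdentifier);
        }
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, new SecretKeySpec(masterKey, "AES"),
                new GCMParameterSpec(TAG_LENGTH_BITS, nonce));
        return cipher;
    }
}
```

The property that matters is that unwrapKey, called with the same master key identifier, returns exactly the bytes that were passed to wrapKey. How the wrapped string is encoded is up to the client, as long as both functions agree on it.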
After you have developed the custom KmsClient class, add it to a jar supplied to Analytics Engine powered by Apache Spark, and pass its fully qualified class name in the Spark Hadoop configuration through the "parquet.encryption.kms.client.class" property.
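A minimal sketch of registering a custom client class from Java might look as follows. It assumes an existing SparkSession and the Spark and Hadoop libraries on the classpath. The class name com.example.kms.MyKmsClient and the helper names are hypothetical placeholders; the property name "parquet.encryption.kms.client.class" is the one used for the VaultClient in the key rotation settings.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.sql.SparkSession;

class KmsClientConfigSketch {
    // Points Parquet modular encryption at a custom KmsClient implementation.
    static void registerKmsClient(SparkSession spark) {
        Configuration hadoopConf = spark.sparkContext().hadoopConfiguration();
        // Hypothetical class name; replace it with the full name of your own class.
        hadoopConf.set("parquet.encryption.kms.client.class", "com.example.kms.MyKmsClient");
    }
}
```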
Set "parquet.encryption.key.access.token" to a valid access token whose attached access policy grants access rights to the required keys in your Vault instance. You can then read the encrypted files:
val dataFrame = spark.read.parquet("<path to encrypted files>")
Key rotation
If key rotation is required, an administrator with access rights to the KMS key rotation actions must first rotate the master keys in HashiCorp Vault, following the procedure described in the HashiCorp Vault documentation. The administrator can then trigger Parquet key rotation by calling:
public static void KeyToolkit.rotateMasterKeys(String folderPath, Configuration hadoopConfig)
To enable Parquet key rotation, the following Hadoop configuration properties must be set:
The parameters "parquet.encryption.key.access.token" and "parquet.encryption.kms.instance.url" must be set, and optionally "parquet.encryption.kms.instance.id".
The parameter "parquet.encryption.key.material.store.internally" must be set to "false".
The parameter "parquet.encryption.kms.client.class" must be set to "com.ibm.parquet.key.management.VaultClient"
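Put together, the settings above and the rotation call could be sketched as follows. This assumes the Parquet and Hadoop libraries are on the classpath and that the master keys have already been rotated in Vault. The class name RotateKeysSketch, the method name, and its parameters are hypothetical; the property names, the VaultClient class name, and KeyToolkit.rotateMasterKeys are the ones named in this section.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.crypto.keytools.KeyToolkit;

class RotateKeysSketch {
    // Sets the required properties, then re-wraps the keys of the files under folderPath.
    static void rotate(Configuration hadoopConfig, String folderPath,
                       String accessToken, String kmsInstanceUrl) throws IOException {
        hadoopConfig.set("parquet.encryption.kms.client.class",
                "com.ibm.parquet.key.management.VaultClient");
        hadoopConfig.set("parquet.encryption.key.access.token", accessToken);
        hadoopConfig.set("parquet.encryption.kms.instance.url", kmsInstanceUrl);
        // Key material must be stored outside the Parquet files for rotation to work.
        hadoopConfig.set("parquet.encryption.key.material.store.internally", "false");

        KeyToolkit.rotateMasterKeys(folderPath, hadoopConfig);
    }
}
```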