Guide for Iceberg Integration (Snowflake on AWS)
This guide outlines the steps required to configure your Snowflake environment on Amazon Web Services (AWS) to share data with Optimove using Iceberg tables. This modern data sharing architecture allows Optimove to securely and efficiently access required data directly from your cloud storage, giving you full control and visibility.
Prerequisites
| Prerequisite | Description |
|---|---|
| S3 Bucket | An Amazon S3 bucket to store Iceberg metadata and data files. |
| Snowflake External Volume | A Snowflake object that holds the AWS IAM role ARN used to access the S3 bucket. |
| Optional: Open Catalog (Polaris) | A remote REST catalog to register and discover Iceberg tables. |
| Snowflake Catalog Integration | An object in your Snowflake account that instructs Snowflake on how to communicate with the remote catalog. |
Client-Side Setup Steps
1.1 Create or Choose an S3 Bucket
First, please provision an Amazon S3 bucket (e.g., s3://client‑lake‑prod/data/). For data recovery and security best practices, we recommend enabling bucket versioning and default encryption.
1.2 Create a Writable External Volume
In Snowflake, create an external volume that uses an IAM role to access your S3 bucket.
CREATE EXTERNAL VOLUME client_ext_vol
STORAGE_LOCATIONS = ((
NAME = 'primary'
STORAGE_PROVIDER = 'S3'
STORAGE_BASE_URL = 's3://<bucket>/data/'
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<acct_id>:role/IcebergVolumeWriter'
))
ALLOW_WRITES = TRUE;
GRANT USAGE ON EXTERNAL VOLUME client_ext_vol TO ROLE <writer_role>;Immediately after creation, run DESC EXTERNAL VOLUME client_ext_vol;. Copy the STORAGE_AWS_IAM_USER_ARN and the STORAGE_AWS_EXTERNAL_ID values, as you will need them for the IAM role's trust policy in the next step.
1.3 Configure the IAM Role for Bucket Access
In the AWS Console, create or update the IAM role you specified in the STORAGE_AWS_ROLE_ARN. This role must have:
- A permissions policy that allows writing to the S3 bucket.
- A trust policy that allows the Snowflake user (
STORAGE_AWS_IAM_USER_ARN) to assume the role, using theSTORAGE_AWS_EXTERNAL_IDto ensure secure access.
Example Permissions Policy:
{
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::<bucket>",
"arn:aws:s3:::<bucket>/*"
]
}
]
}Example Trust Policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "AWS": "<YOUR_STORAGE_AWS_IAM_USER_ARN>" },
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": { "sts:ExternalId": "<YOUR_STORAGE_AWS_EXTERNAL_ID>" }
}
}
]
}1.4 (Optional) Create an Open Catalog
If you use a REST-based open catalog like Polaris, you can integrate it with Snowflake for easier discovery and management of your Iceberg tables.
1.5 Create a Catalog Integration
Create a catalog integration to connect Snowflake to your Iceberg catalog.
CREATE OR REPLACE CATALOG INTEGRATION client_catalog_int
CATALOG_SOURCE = POLARIS
TABLE_FORMAT = ICEBERG
REST_CONFIG = (
CATALOG_URI = 'https://<polaris-endpoint>'
CATALOG_NAME = '<catalog-name>'
)
REST_AUTHENTICATION = (
TYPE = OAUTH
OAUTH_CLIENT_ID = '<client-id>'
OAUTH_CLIENT_SECRET = '<client-secret>'
OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
)
ENABLED = TRUE;
GRANT USAGE ON INTEGRATION client_catalog_int TO ROLE <writer_role>;1.6 (Optional) Create a Catalog-Linked Database
You can link the catalog integration to a database to enable automatic discovery of new tables.
CREATE DATABASE client_lake
CATALOG = 'client_catalog_int'
EXTERNAL_VOLUME = 'client_ext_vol';1.7 Create Iceberg Tables
You can now create your Iceberg tables. If you did not link a database in the previous step, you must specify the EXTERNAL_VOLUME and CATALOG parameters in your CREATE TABLE statement.
Partitioning Recommendation: It is strongly recommended to define a PARTITION BY clause when creating Iceberg tables for efficient data processing. Please coordinate with the Optimove onboarding team to determine the best partitioning strategy for your data.
CREATE ICEBERG TABLE sample_table (
id INT,
name STRING,
created_at DATE
)
PARTITION BY (id, created_at)
EXTERNAL_VOLUME = 'client_ext_vol'
CATALOG = 'client_catalog_int';1.8 Share the Following with Optimove
To finalize the integration, please provide the Optimove team with the following details:
| Item | Example Value |
|---|---|
| S3 path | s3://client‑lake‑prod/data/ |
| External volume name | client_ext_vol |
| Catalog integration name | client_catalog_int |
| Table list | client_db.sample_schema.sample_table |
| Polaris catalog namespace | client_catalog |
| IAM role ARN | arn:aws:iam::<acct_id>:role/IcebergVolumeWriter |
| External ID | The value from your DESC EXTERNAL VOLUME command |
Reference Documentation
For more detailed technical information, you can refer to the official Snowflake documentation:
Updated about 8 hours ago
