Guide for Iceberg Integration (Snowflake on AWS)

This guide outlines the steps required to configure your Snowflake environment on Amazon Web Services (AWS) to share data with Optimove using Iceberg tables. This modern data sharing architecture allows Optimove to securely and efficiently access required data directly from your cloud storage, giving you full control and visibility.

Prerequisites

PrerequisiteDescription
S3 BucketAn Amazon S3 bucket to store Iceberg metadata and data files.
Snowflake External VolumeA Snowflake object that holds the AWS IAM role ARN used to access the S3 bucket.
Optional: Open Catalog (Polaris)A remote REST catalog to register and discover Iceberg tables.
Snowflake Catalog IntegrationAn object in your Snowflake account that instructs Snowflake on how to communicate with the remote catalog.

Client-Side Setup Steps

1.1 Create or Choose an S3 Bucket

First, please provision an Amazon S3 bucket (e.g., s3://client‑lake‑prod/data/). For data recovery and security best practices, we recommend enabling bucket versioning and default encryption.

1.2 Create a Writable External Volume

In Snowflake, create an external volume that uses an IAM role to access your S3 bucket.

CREATE EXTERNAL VOLUME client_ext_vol
  STORAGE_LOCATIONS = ((
    NAME = 'primary'
    STORAGE_PROVIDER      = 'S3'
    STORAGE_BASE_URL      = 's3://<bucket>/data/'
    STORAGE_AWS_ROLE_ARN  = 'arn:aws:iam::<acct_id>:role/IcebergVolumeWriter'
  ))
  ALLOW_WRITES = TRUE;

GRANT USAGE ON EXTERNAL VOLUME client_ext_vol TO ROLE <writer_role>;

Immediately after creation, run DESC EXTERNAL VOLUME client_ext_vol;. Copy the STORAGE_AWS_IAM_USER_ARN and the STORAGE_AWS_EXTERNAL_ID values, as you will need them for the IAM role's trust policy in the next step.

1.3 Configure the IAM Role for Bucket Access

In the AWS Console, create or update the IAM role you specified in the STORAGE_AWS_ROLE_ARN. This role must have:

  1. A permissions policy that allows writing to the S3 bucket.
  2. A trust policy that allows the Snowflake user (STORAGE_AWS_IAM_USER_ARN) to assume the role, using the STORAGE_AWS_EXTERNAL_ID to ensure secure access.

Example Permissions Policy:

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket>",
        "arn:aws:s3:::<bucket>/*"
      ]
    }
  ]
}

Example Trust Policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<YOUR_STORAGE_AWS_IAM_USER_ARN>" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<YOUR_STORAGE_AWS_EXTERNAL_ID>" }
      }
    }
  ]
}

1.4 (Optional) Create an Open Catalog

If you use a REST-based open catalog like Polaris, you can integrate it with Snowflake for easier discovery and management of your Iceberg tables.

1.5 Create a Catalog Integration

Create a catalog integration to connect Snowflake to your Iceberg catalog.

CREATE OR REPLACE CATALOG INTEGRATION client_catalog_int
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT   = ICEBERG
  REST_CONFIG = (
    CATALOG_URI  = 'https://<polaris-endpoint>'
    CATALOG_NAME = '<catalog-name>'
  )
  REST_AUTHENTICATION = (
    TYPE                   = OAUTH
    OAUTH_CLIENT_ID        = '<client-id>'
    OAUTH_CLIENT_SECRET    = '<client-secret>'
    OAUTH_ALLOWED_SCOPES   = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;

GRANT USAGE ON INTEGRATION client_catalog_int TO ROLE <writer_role>;

1.6 (Optional) Create a Catalog-Linked Database

You can link the catalog integration to a database to enable automatic discovery of new tables.

CREATE DATABASE client_lake
  CATALOG         = 'client_catalog_int'
  EXTERNAL_VOLUME = 'client_ext_vol';

1.7 Create Iceberg Tables

You can now create your Iceberg tables. If you did not link a database in the previous step, you must specify the EXTERNAL_VOLUME and CATALOG parameters in your CREATE TABLE statement.

Partitioning Recommendation: It is strongly recommended to define a PARTITION BY clause when creating Iceberg tables for efficient data processing. Please coordinate with the Optimove onboarding team to determine the best partitioning strategy for your data.

CREATE ICEBERG TABLE sample_table (
  id          INT,
  name        STRING,
  created_at  DATE
)
PARTITION BY (id, created_at)
EXTERNAL_VOLUME = 'client_ext_vol'
CATALOG         = 'client_catalog_int';

1.8 Share the Following with Optimove

To finalize the integration, please provide the Optimove team with the following details:

ItemExample Value
S3 paths3://client‑lake‑prod/data/
External volume nameclient_ext_vol
Catalog integration nameclient_catalog_int
Table listclient_db.sample_schema.sample_table
Polaris catalog namespaceclient_catalog
IAM role ARNarn:aws:iam::<acct_id>:role/IcebergVolumeWriter
External IDThe value from your DESC EXTERNAL VOLUME command

Reference Documentation

For more detailed technical information, you can refer to the official Snowflake documentation: