Data Unloading to GCS

This guide explains how to export data, ad hoc or on a recurring schedule, from a Snowflake Reader Account to Google Cloud Storage (GCS).

1. Overview

This guide outlines the supported methods for exporting (unloading) data from your Reader Account to Google Cloud Storage (GCS).

Due to Snowflake’s security model for Reader Accounts, standard direct-to-cloud unloading (COPY INTO 'gcs://...', which requires a STORAGE INTEGRATION) is restricted. This document provides two alternative workflows for exporting data to GCS.


2. Critical Limitations

Before attempting an export, please note the following restrictions specific to your Reader Account environment:

  • No Storage Integrations: Reader Accounts cannot create STORAGE INTEGRATION objects.
  • No External Stages: You cannot create new Stage objects (CREATE STAGE).
  • No Direct GCS Unload: The command COPY INTO 'gcs://bucket/...' is not supported because it requires a Storage Integration.

3. Method A: The Manual Export (Recommended for Ad-hoc Files)

Use this method if you need to manually extract a dataset to your local machine and then upload it to GCS. This approach uses your personal User Stage (@~), which is built-in and available immediately.

Step 1: Unload Data to Your Internal User Stage

Run the following SQL in your Snowflake Worksheet. This saves the query results to your internal storage area.

COPY INTO @~/exports/my_data_
FROM (SELECT * FROM target_table)
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP FIELD_OPTIONALLY_ENCLOSED_BY = '"')
HEADER = TRUE;

Step 2: Download Files to Local Machine

Use the SnowSQL command-line tool (CLI) to download the files from your user stage.

  • Note: Files on an internal stage cannot be downloaded from the Web UI.

Run the following command in your terminal/command prompt:

# Syntax: snowsql -a <account_identifier> -u <username> -q "GET @~/source_path file://destination_path"

snowsql -a <your_account_locator> -u <your_username> -q "GET @~/exports/ file:///tmp/my_local_data/"
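
If SnowSQL is not available on the machine, the same GET command can also be executed through the Snowflake Python connector. This is only a minimal sketch, reusing the placeholder credentials and paths from the command above:

import os
import snowflake.connector

# Make sure the local target directory exists before downloading.
os.makedirs('/tmp/my_local_data', exist_ok=True)

# Same placeholder credentials as the SnowSQL example above.
ctx = snowflake.connector.connect(
    user='<your_username>',
    password='<your_password>',
    account='<your_account_locator>'
)
try:
    # GET copies the staged files to the local directory.
    ctx.cursor().execute("GET @~/exports/ file:///tmp/my_local_data/")
finally:
    ctx.close()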

Step 3: Upload to GCS

Once the files are on your local machine, use the standard Google Cloud tools (gsutil, the newer gcloud storage commands, or the Google Cloud Console) to upload them to your target bucket.

gsutil cp /tmp/my_local_data/* gs://your-target-bucket/data/
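
If you would rather script this upload than run gsutil by hand, here is a minimal Python sketch using the google-cloud-storage library. It assumes your Google credentials are already configured via Application Default Credentials and reuses the example paths above:

import glob
import os
from google.cloud import storage

client = storage.Client()  # assumes Application Default Credentials are configured
bucket = client.bucket('your-target-bucket')

for path in glob.glob('/tmp/my_local_data/*'):
    # Mirror each local file name under the data/ prefix in the bucket.
    bucket.blob(f"data/{os.path.basename(path)}").upload_from_filename(path)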

4. Method B: The Automated Export (Recommended for Pipelines)

If you require a recurring, automated feed to GCS, you must use a "client-side" approach. Instead of asking Snowflake to push the data, your application will pull the data and write it to GCS directly.

Workflow Logic

  1. Connect: Use a Snowflake Connector (e.g., Python, JDBC, ODBC, Node.js) to connect to the Reader Account.
  2. Query: Execute your SELECT statement.
  3. Stream: Fetch the results in your application code.
  4. Write: Use the Google Cloud Storage SDK to write the results directly to your bucket.

Example: Python Script Logic

This method bypasses the COPY INTO command entirely, so the Reader Account unloading restrictions do not apply.

import snowflake.connector
from google.cloud import storage

# 1. Connect to the Snowflake Reader Account
ctx = snowflake.connector.connect(
    user='<user>',
    password='<password>',
    account='<reader_account_locator>',
    warehouse='<warehouse>'
)

# 2. Fetch the results into a pandas DataFrame
#    (Arrow-based; efficient for result sets that fit in memory)
cs = ctx.cursor()
cs.execute("SELECT * FROM my_table")
df = cs.fetch_pandas_all()

# 3. Write to GCS using Google's client library
client = storage.Client()
bucket = client.get_bucket('my-gcs-bucket')
bucket.blob('export.csv').upload_from_string(df.to_csv(index=False), content_type='text/csv')

# 4. Clean up
cs.close()
ctx.close()
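
For result sets too large to hold in memory at once, the connector can return the results in chunks. The sketch below shows one way to stream them into a single GCS object using fetch_pandas_batches() and the blob's file-like writer; the credentials, table, and bucket names are placeholders:

import snowflake.connector
from google.cloud import storage

ctx = snowflake.connector.connect(
    user='<user>',
    password='<password>',
    account='<reader_account_locator>',
    warehouse='<warehouse>'
)
cs = ctx.cursor()
cs.execute("SELECT * FROM my_table")

client = storage.Client()
blob = client.bucket('my-gcs-bucket').blob('export.csv')

# Stream each batch into the object; write the CSV header only once.
with blob.open('w') as f:
    first = True
    for batch_df in cs.fetch_pandas_batches():
        batch_df.to_csv(f, index=False, header=first)
        first = False

cs.close()
ctx.close()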

5. Summary of Capabilities

Task                                         Supported in Reader Account?
Create Storage Integration                   No (Restricted by Snowflake)
Direct Unload to GCS (COPY INTO 'gcs://')    No (Requires Integration)
Unload to Internal Stage (COPY INTO @~)      Yes (Recommended)
Client-Side Extraction (Python/ODBC)         Yes (Recommended for Automation)

6. Python Template

Here is a complete, "plug-and-play" Python script template.

It handles connecting to the Reader Account, running the query, and streaming the results to your Google Cloud Storage bucket.

Prerequisites

  1. Python Libraries: Install the required Python packages:
pip install snowflake-connector-python[pandas] google-cloud-storage pandas

  2. Google Cloud Platform (GCP) Setup: This is the most critical part. Because the script uses Service Account authentication (best practice for automated scripts), you need the following (an alternative using Application Default Credentials is sketched after this list):
    1. A Service Account: Create a Service Account in your Google Cloud Console (IAM & Admin > Service Accounts).
    2. A JSON Key File: Generate and download a JSON key for that Service Account. Save this file on the machine where you will run the script (e.g., /path/to/keyfile.json).
    3. Bucket Permissions: Grant that Service Account the "Storage Object Admin" (or "Storage Object Creator") role on your target GCS bucket so it has permission to write files.
  3. Snowflake Connection Details: You need the specific login details for the Reader Account:
    1. Account URL/Locator: e.g., xy12345.us-east-1.
    2. User & Password: A user in the Reader Account that has USAGE rights on the warehouse and SELECT rights on the table.
    3. Warehouse: A running Virtual Warehouse in the Reader Account to execute the query.
  4. File Paths: Ensure the GCS_CREDENTIALS_JSON variable in the script points to the exact local path where you saved your Google JSON key file.
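
As an alternative to hard-coding the key file path and calling storage.Client.from_service_account_json() (as the script below does), Google's client library can also discover the key through Application Default Credentials. A minimal sketch, assuming the same key file location:

import os
from google.cloud import storage

# Point Application Default Credentials at the service account key
# (same placeholder path used in the script's configuration block).
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/my-gcp-service-account.json'

client = storage.Client()  # no explicit key path needed now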


Python Export Script

import snowflake.connector
from google.cloud import storage
import pandas as pd
import io
import os

# ==========================================
# 1. CONFIGURATION
# ==========================================

# Snowflake Reader Account Details
SNOW_ACCOUNT   = 'xy12345.region'  # e.g., 'xy12345.us-east-1'
SNOW_USER      = 'reader_user'
SNOW_PASSWORD  = 'reader_password'
SNOW_WAREHOUSE = 'COMPUTE_WH'
SNOW_DATABASE  = 'SHARED_DB'
SNOW_SCHEMA    = 'PUBLIC'

# Query to Run
QUERY = "SELECT * FROM MY_TABLE LIMIT 1000"

# Google Cloud Storage Details
GCS_BUCKET_NAME = 'my-target-bucket'
GCS_FILE_NAME   = 'exports/my_data_export.csv'
# Path to your Google Service Account JSON key file
GCS_CREDENTIALS_JSON = '/path/to/my-gcp-service-account.json'

# ==========================================
# 2. EXECUTION
# ==========================================

def run_export():
    print("Connecting to Snowflake...")
    try:
        # Connect to Snowflake
        ctx = snowflake.connector.connect(
            user=SNOW_USER,
            password=SNOW_PASSWORD,
            account=SNOW_ACCOUNT,
            warehouse=SNOW_WAREHOUSE,
            database=SNOW_DATABASE,
            schema=SNOW_SCHEMA
        )
        cs = ctx.cursor()

        print("Executing Query...")
        cs.execute(QUERY)

        # Fetch data into a Pandas DataFrame
        # (This is efficient for datasets that fit in memory)
        print("Fetching results...")
        df = cs.fetch_pandas_all()
        
        row_count = len(df)
        print(f"Retrieved {row_count} rows.")

        if row_count > 0:
            print("Converting to CSV...")
            # Convert DataFrame to CSV string in memory
            csv_buffer = io.StringIO()
            df.to_csv(csv_buffer, index=False)
            csv_content = csv_buffer.getvalue()

            print("Uploading to Google Cloud Storage...")
            # Authenticate with GCS
            # Note: If running in an environment where GCS auth is pre-configured 
            # (like Cloud Run or a VM), you can remove the ".from_service_account_json" part.
            storage_client = storage.Client.from_service_account_json(GCS_CREDENTIALS_JSON)
            
            bucket = storage_client.bucket(GCS_BUCKET_NAME)
            blob = bucket.blob(GCS_FILE_NAME)
            
            # Upload
            blob.upload_from_string(csv_content, content_type='text/csv')
            print(f"Success! Data uploaded to: gs://{GCS_BUCKET_NAME}/{GCS_FILE_NAME}")
        else:
            print("Query returned no data. Nothing to upload.")

    except Exception as e:
        print(f"Error: {e}")
    finally:
        # Ensure connection is closed
        if 'ctx' in locals():
            ctx.close()

if __name__ == "__main__":
    run_export()

How it works

  1. Snowflake Connection: It logs into the Reader Account using the provided credentials.
  2. Data Fetch: It uses fetch_pandas_all(), which is optimized for speed using the Arrow format, to pull the query results into memory.
  3. GCS Upload: It uses the official Google Cloud library to authenticate (using a Service Account key file) and writes the CSV data directly to the bucket.
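
For a recurring pipeline you will probably not want the password and key path hard-coded in the script. One common adjustment, sketched below, is to read the configuration block from environment variables; the variable names here are illustrative, not required by any library:

import os

# Illustrative environment-variable names; set these in your scheduler or shell.
SNOW_ACCOUNT         = os.environ['SNOWFLAKE_ACCOUNT']
SNOW_USER            = os.environ['SNOWFLAKE_USER']
SNOW_PASSWORD        = os.environ['SNOWFLAKE_PASSWORD']
SNOW_WAREHOUSE       = os.environ.get('SNOWFLAKE_WAREHOUSE', 'COMPUTE_WH')
GCS_BUCKET_NAME      = os.environ['GCS_BUCKET']
GCS_CREDENTIALS_JSON = os.environ['GCS_CREDENTIALS_JSON']

The script can then be scheduled with cron, Cloud Scheduler, Airflow, or any other job runner without editing the code itself.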