Batch Data QA Processes

This article provides a comprehensive data QA checklist to validate the initial data load, ensuring that all critical data is accurate, consistent, and fully prepared for upload to Optimove.
By following these guidelines for batch data, you can ensure high data integrity and minimize unnecessary back-and-forth communication.

Data Types

LanguageRecommended language for data schema is English (for file/table/column names).
DateThe format must be: yyyy-MM-dd*Make sure not to use any default dates such as ‘1900-01-01’,instead please use Null (blank in files) as default. |
DatetimeThe format must be: yyyy-MM-dd hh:mm:ss(No need for UTC string at the end and yyyy-MM-ddThh:mm:ssZ this is not a valid format either).
Decimals| Please use a dot (.) as the decimal symbol. We display up to 4 decimals in the Optimove UI but can support more decimals on the back end (if needed for accuracy of calculations).
CurrencyAdd currency as a new column/field, with a data type string (e.g. GBP). Please don't add the currency symbol inside the monetary field. |
Currency conversionYour Optimove instance will show monetary values in a single currency. In the case of multiple currencies, please clarify what is the main one and how you wish to have values converted.
StatusPlease provide definition for any status column that uses a numeric ID instead of a string.
Missing valuesPlease keep null/missing values as blank (don't replace them with a string or any default values, keep them blank/NULL)

Alignment between tables

Please provide supporting documentation / ERD explaining your data schema.

Customer IDWhen providing data from multiple sources, make sure they all use the same Unique Customer Identifier (we will need to be able to link between them).
Primary KeyPlease specify the primary key for each table. The primary key consists of one or more columns whose data contained within are used to uniquely identify each row in the table. It must contain a unique value for each row of data and it cannot contain null values (every row must have a primary key value).
Foreign KeysPlease specify the foreign keys for each table. A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables. It acts as a cross-reference between tables because it references the primary key of another table, thereby establishing a link between them.
CustomersAll customers that exist in all transactional tables must exist in the Customers table. Only customer IDs appearing in the Customers table will be included in Optimove.
Registration DateAny records in transactional tables must not pre-date the customer registration date in the Customers table. Registration date must not be NULL/blank. Only customer IDs with a valid Registration date will be included in Optimove.

Data Stability

Date rangeVerify stability in daily trends and spot if there are any "holes" in the data provided (if this is expected, please share with us in advance).

Files specific

FormatThe preferred formats are CSV, JSON and PARQUET. If you are unable to supply in those formats, you can find all the supported file formats here.
File sizesIf possible, large files should be separated into batches of 10-100 MB, for performance purposes.
CompressionFiles may be compressed as GZ (ZIP is not supported).
HeadersFor structured files, field header names must be present in the first row. This applies even when the file contains no data.
DelimitersThe preferred delimiters are Comma (,), Semi-colon (;) and Pipe (`
Double quotesFields that contain double quotes must be surrounded by double quotes, and the embedded double quotes must each be represented by two consecutive double quotes. That means that double quotes must also be used as an escape character. For example: John "The Man" would need to appear in the file as "John ""The Man""".
Naming conventionFiles containing historical data should be named using the same table name that appears in Database Schema. Daily data files must be named in the format: TableName_yyyyMMdd.csv (e.g.: Customers_20160120.csv).
When using cloud storage (AWS S3, Google Cloud Storage or Azure blob) Files should be dropped off in different folders with the file name – in this format: [BucketName]/[TableName]/[Date]/[File].[ext]
For example:
MongoDb/Customers/20201202/Customers_20201202.json
OR
MongoDb/Customers/20201202/Customers.json
JSON specificEach JSON object should be in a separate row (with no additional brackets at the beginning or end of the file), using {} as identifiers. For example:{"Id":"30515240","SubscriptionId":null} {"Id":"30515241","SubscriptionId":null,"Amount":"99.00"}
When a field value is NULL, leave the field empty – do not write “NULL”.

Database and API specific

Last Updated DateOur daily process is incremental, therefore, all tables must have (and maintain) a last updated date column.

Other

Marketing Opt-inPlease remember to include the marketing opt-in field. Determines whether it is acceptable to send promotional messages to a particular customer.