Batch Data QA Processes

This article provides a comprehensive data QA checklist to validate the initial data load, ensuring that all critical data is accurate, consistent, and fully prepared for upload to Optimove.
By following these guidelines for batch data, you can ensure high data integrity and minimize unnecessary back-and-forth communication.

Data Types

Language	Recommended language for data schema is English (for file/table/column names).
Date	The format must be: `yyyy-MM-dd`*Make sure not to use any default dates such as `‘1900-01-01’`,instead please use Null (blank in files) as default. \|
Datetime	The format must be: `yyyy-MM-dd hh:mm:ss`(No need for UTC string at the end and `yyyy-MM-ddThh:mm:ssZ` this is not a valid format either).
Decimals	\| Please use a dot (`.`) as the decimal symbol. We display up to 4 decimals in the Optimove UI but can support more decimals on the back end (if needed for accuracy of calculations).
Currency	Add currency as a new column/field, with a data type string (e.g. GBP). Please don't add the currency symbol inside the monetary field. \|
Currency conversion	Your Optimove instance will show monetary values in a single currency. In the case of multiple currencies, please clarify what is the main one and how you wish to have values converted.
Status	Please provide definition for any status column that uses a numeric ID instead of a string.
Missing values	Please keep null/missing values as blank (don't replace them with a string or any default values, keep them blank/NULL)

Alignment between tables

Please provide supporting documentation / ERD explaining your data schema.

Customer ID	When providing data from multiple sources, make sure they all use the same Unique Customer Identifier (we will need to be able to link between them).
Primary Key	Please specify the primary key for each table. The primary key consists of one or more columns whose data contained within are used to uniquely identify each row in the table. It must contain a unique value for each row of data and it cannot contain null values (every row must have a primary key value).
Foreign Keys	Please specify the foreign keys for each table. A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables. It acts as a cross-reference between tables because it references the primary key of another table, thereby establishing a link between them.
Customers	All customers that exist in all transactional tables must exist in the Customers table. Only customer IDs appearing in the Customers table will be included in Optimove.
Registration Date	Any records in transactional tables must not pre-date the customer registration date in the Customers table. Registration date must not be NULL/blank. Only customer IDs with a valid Registration date will be included in Optimove.

Data Stability

Date range	Verify stability in daily trends and spot if there are any "holes" in the data provided (if this is expected, please share with us in advance).

Files specific

Format	The preferred formats are `CSV`, `JSON` and `PARQUET`. If you are unable to supply in those formats, you can find all the supported file formats here.
File sizes	If possible, large files should be separated into batches of 10-100 MB, for performance purposes.
Compression	Files may be compressed as GZ (ZIP is not supported).
Headers	For structured files, field header names must be present in the first row. This applies even when the file contains no data.
Delimiters	The preferred delimiters are Comma (`,`), Semi-colon (`;`) and Pipe (`
Double quotes	Fields that contain double quotes must be surrounded by double quotes, and the embedded double quotes must each be represented by two consecutive double quotes. That means that double quotes must also be used as an escape character. For example: `John "The Man"` would need to appear in the file as `"John ""The Man"""`.
Naming convention	Files containing historical data should be named using the same table name that appears in Database Schema. Daily data files must be named in the format: `TableName_yyyyMMdd.csv` (e.g.: `Customers_20160120.csv`). When using cloud storage (`AWS S3`, `Google Cloud Storage` or `Azure blob`) Files should be dropped off in different folders with the file name – in this format: `[BucketName]/[TableName]/[Date]/[File].[ext]` For example: `MongoDb/Customers/20201202/Customers_20201202.json` OR `MongoDb/Customers/20201202/Customers.json`
JSON specific	Each JSON object should be in a separate row (with no additional brackets at the beginning or end of the file), using {} as identifiers. For example:`{"Id":"30515240","SubscriptionId":null} {"Id":"30515241","SubscriptionId":null,"Amount":"99.00"}` When a field value is NULL, leave the field empty – do not write “NULL”.

Database and API specific

Last Updated Date	Our daily process is incremental, therefore, all tables must have (and maintain) a last updated date column.

Other

Marketing Opt-in	Please remember to include the marketing opt-in field. Determines whether it is acceptable to send promotional messages to a particular customer.