Batch Data QA Processes
This article provides a comprehensive data QA checklist to validate the initial data load, ensuring that all critical data is accurate, consistent, and fully prepared for upload to Optimove.
By following these guidelines for batch data, you can ensure high data integrity and minimize unnecessary back-and-forth communication.
Data Types
Language | Recommended language for data schema is English (for file/table/column names). |
---|---|
Date | The format must be: yyyy-MM-dd *Make sure not to use any default dates such as ‘1900-01-01’ ,instead please use Null (blank in files) as default. | |
Datetime | The format must be: yyyy-MM-dd hh:mm:ss (No need for UTC string at the end and yyyy-MM-ddThh:mm:ssZ this is not a valid format either). |
Decimals | | Please use a dot (. ) as the decimal symbol. We display up to 4 decimals in the Optimove UI but can support more decimals on the back end (if needed for accuracy of calculations). |
Currency | Add currency as a new column/field, with a data type string (e.g. GBP). Please don't add the currency symbol inside the monetary field. | |
Currency conversion | Your Optimove instance will show monetary values in a single currency. In the case of multiple currencies, please clarify what is the main one and how you wish to have values converted. |
Status | Please provide definition for any status column that uses a numeric ID instead of a string. |
Missing values | Please keep null/missing values as blank (don't replace them with a string or any default values, keep them blank/NULL) |
Alignment between tables
Please provide supporting documentation / ERD explaining your data schema.
Customer ID | When providing data from multiple sources, make sure they all use the same Unique Customer Identifier (we will need to be able to link between them). |
---|---|
Primary Key | Please specify the primary key for each table. The primary key consists of one or more columns whose data contained within are used to uniquely identify each row in the table. It must contain a unique value for each row of data and it cannot contain null values (every row must have a primary key value). |
Foreign Keys | Please specify the foreign keys for each table. A foreign key is a column or group of columns in a relational database table that provides a link between data in two tables. It acts as a cross-reference between tables because it references the primary key of another table, thereby establishing a link between them. |
Customers | All customers that exist in all transactional tables must exist in the Customers table. Only customer IDs appearing in the Customers table will be included in Optimove. |
Registration Date | Any records in transactional tables must not pre-date the customer registration date in the Customers table. Registration date must not be NULL/blank. Only customer IDs with a valid Registration date will be included in Optimove. |
Data Stability
Date range | Verify stability in daily trends and spot if there are any "holes" in the data provided (if this is expected, please share with us in advance). |
---|
Files specific
Format | The preferred formats are CSV , JSON and PARQUET . If you are unable to supply in those formats, you can find all the supported file formats here. |
---|---|
File sizes | If possible, large files should be separated into batches of 10-100 MB, for performance purposes. |
Compression | Files may be compressed as GZ (ZIP is not supported). |
Headers | For structured files, field header names must be present in the first row. This applies even when the file contains no data. |
Delimiters | The preferred delimiters are Comma (, ), Semi-colon (; ) and Pipe (` |
Double quotes | Fields that contain double quotes must be surrounded by double quotes, and the embedded double quotes must each be represented by two consecutive double quotes. That means that double quotes must also be used as an escape character. For example: John "The Man" would need to appear in the file as "John ""The Man""" . |
Naming convention | Files containing historical data should be named using the same table name that appears in Database Schema. Daily data files must be named in the format: TableName_yyyyMMdd.csv (e.g.: Customers_20160120.csv ).When using cloud storage ( AWS S3 , Google Cloud Storage or Azure blob ) Files should be dropped off in different folders with the file name – in this format: [BucketName]/[TableName]/[Date]/[File].[ext] For example: MongoDb/Customers/20201202/Customers_20201202.json OR MongoDb/Customers/20201202/Customers.json |
JSON specific | Each JSON object should be in a separate row (with no additional brackets at the beginning or end of the file), using {} as identifiers. For example:{"Id":"30515240","SubscriptionId":null} {"Id":"30515241","SubscriptionId":null,"Amount":"99.00"} When a field value is NULL, leave the field empty – do not write “NULL”. |
Database and API specific
Last Updated Date | Our daily process is incremental, therefore, all tables must have (and maintain) a last updated date column. |
---|
Other
Marketing Opt-in | Please remember to include the marketing opt-in field. Determines whether it is acceptable to send promotional messages to a particular customer. |
---|
Updated 3 days ago