Data Volume Validation: Ensuring Data Completeness in Optimove

💡Basic Definition
Data Volume Validation is a checkpoint in the ETL process that verifies whether the files sent by the client contain the expected amount of data (in terms of row count). Clients define the criteria for this check during onboarding.


What is Data Volume Validation?

Data Volume Validation ensures that the data Optimove receives is reliable and meets the client’s expectations. The check runs at the end of each ETL process and evaluates whether each configured table contains an appropriate number of rows for the given day. If it detects an inconsistency, the system either notifies stakeholders or halts the process, depending on the client’s predefined configuration.
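
The end-of-ETL decision can be sketched in Python as follows. This is a minimal illustration, not Optimove's implementation. It assumes each table's check has already produced one of three hypothetical statuses ("pass", "notify", or "fail"), and the names `TableOutcome` and `decide_etl_action` are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TableOutcome:
    table: str
    status: str  # hypothetical statuses: "pass", "notify", or "fail"


def decide_etl_action(outcomes: List[TableOutcome]) -> Tuple[bool, List[str]]:
    """Return (halt_etl, tables_to_report) for the per-table results."""
    # A single table that failed its check (IsFail = True) halts the run.
    halt = any(o.status == "fail" for o in outcomes)
    # Failed and notify-only tables are both reported to stakeholders.
    to_report = [o.table for o in outcomes if o.status in ("fail", "notify")]
    return halt, to_report


halt, report = decide_etl_action([
    TableOutcome("Games", "fail"),
    TableOutcome("Bonus", "notify"),
    TableOutcome("Game_Types", "pass"),
])
print(halt, report)  # True ['Games', 'Bonus']
```

In this sketch, one failing table is enough to stop the whole daily process, while notify-only tables are reported without blocking it.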


How Does Data Volume Validation Work?

The procedure reviews data in client-specified tables using configurable parameters. These parameters allow Optimove to adapt the validation process to the client’s unique data patterns and operational requirements. Clients can adjust these settings to account for factors like seasonal fluctuations, daily variability, or specific operational preferences.


Customizable ETL Settings

Here are the configurable parameters for Data Volume Validation:

  1. CompareAvg

    • Determines whether to compare the daily row count to the average.
    • Default: True. If False, the system checks only for data presence (at least 1 row with yesterday’s date).
  2. DaysBackwards

    • Defines the number of days to use for calculating the average row count.
    • Default: 180.
  3. IsFail

    • Specifies whether unreliable row counts should fail the ETL process.
    • Default: True. If False, the process continues, and stakeholders are notified.
  4. MultFactor

    • The fraction (0–1) of the average row count that the daily row count must reach to pass the check.
    • Default: 0.2 (20%). For instance, if the daily row count is X and the average is Y, the check fails if X < 0.2 * Y.
  5. DaysToIgnore

    • Specifies weekdays to exclude from the check (e.g., no data expected on weekends).
    • Default: NULL (no exclusions). Options range from 1 (Sunday) to 7 (Saturday).
  6. IsMedian

    • Allows use of the median instead of the average for validation.
    • Recommended for clients with outliers that skew averages.
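
The following Python sketch shows how CompareAvg, DaysBackwards, MultFactor, and IsMedian could combine into a single pass/fail decision for one table. The class and function names are hypothetical. Two behaviors are assumptions: the fallback to a presence check when there is no history, and the IsMedian default of False, which the parameter list above does not state.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VolumeCheckConfig:
    compare_avg: bool = True   # CompareAvg
    days_backwards: int = 180  # DaysBackwards (assumed > 0)
    mult_factor: float = 0.2   # MultFactor
    is_median: bool = False    # IsMedian (default assumed, not documented)


def is_volume_reliable(daily_count: int, history: Sequence[int],
                       cfg: VolumeCheckConfig) -> bool:
    """Return True if daily_count passes the volume check.

    history holds past daily row counts, oldest first; only the last
    cfg.days_backwards entries are used for the baseline.
    """
    if not cfg.compare_avg:
        # Presence-only check: at least one row for the day.
        return daily_count >= 1
    window = list(history)[-cfg.days_backwards:]
    if not window:
        # Assumption: with no history there is no baseline, so check presence.
        return daily_count >= 1
    baseline = statistics.median(window) if cfg.is_median else statistics.mean(window)
    # The check fails when the day's count drops below MultFactor * baseline.
    return daily_count >= cfg.mult_factor * baseline


history = [300] * 9 + [30000]  # nine normal days and one outlier
print(is_volume_reliable(100, history, VolumeCheckConfig()))                # False
print(is_volume_reliable(100, history, VolumeCheckConfig(is_median=True)))  # True
```

With one outlier day in the history, the average-based threshold rises to 654 (0.2 × 3270) and rejects a count of 100. The median-based threshold stays at 60 (0.2 × 300) and accepts it. This is the situation IsMedian is recommended for.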

Example Use Cases

Client Configuration

  1. Games Table

    • CompareAvg = True and MultFactor = 0.2.
  2. Bonus Table

    • CompareAvg = False and IsFail = False: stakeholders are notified, but the process won’t fail if no data is received.
  3. Game_Types Table

    • When a file is received, checks for at least 1 row, but skips the check on Sundays and Saturdays (DaysToIgnore = {1, 7}).
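
When automating DaysToIgnore, check the weekday numbering. The convention above is 1 = Sunday through 7 = Saturday, whereas Python's `date.isoweekday()` returns 1 = Monday through 7 = Sunday. The sketch below records the three example configurations as a plain dictionary and converts dates to the 1–7 convention. The dictionary is a hypothetical format, not Optimove's configuration syntax. Parameters that the examples don't mention are left out. Reading "won’t fail" as IsFail = False for Bonus and "at least 1 row" as CompareAvg = False for Game_Types is an interpretation.

```python
from datetime import date

# Hypothetical encoding of the example configurations; omitted parameters
# are assumed to keep their defaults.
CLIENT_CONFIG = {
    "Games": {"CompareAvg": True, "MultFactor": 0.2},
    "Bonus": {"CompareAvg": False, "IsFail": False},  # interpreted from "won't fail"
    "Game_Types": {"CompareAvg": False, "DaysToIgnore": {1, 7}},  # Sunday, Saturday
}


def weekday_number(d: date) -> int:
    """Map a date to the DaysToIgnore numbering: 1 = Sunday ... 7 = Saturday."""
    # isoweekday() gives Monday = 1 ... Sunday = 7, so Sunday (7) wraps to 1.
    return d.isoweekday() % 7 + 1


def is_ignored(table: str, d: date) -> bool:
    """True if the table's check is skipped on date d."""
    days = CLIENT_CONFIG[table].get("DaysToIgnore") or set()
    return weekday_number(d) in days


for name in CLIENT_CONFIG:
    print(name, is_ignored(name, date(2024, 3, 3)))
# Games False
# Bonus False
# Game_Types True
```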

Scenario on 2024-03-03 (Sunday)

The client sends data as follows:

  • Games: 50 rows
  • Bonus: 0 rows
  • Game_Types: 0 rows (on Sunday)

Outcome for Each Table:

  1. Games:
    Average row count: 300.
    Threshold: 300 * 0.2 = 60.
    Since 50 < 60, the daily process fails, and the client receives a notification.

  2. Bonus:
    Row count: 0.
    The process proceeds and stakeholders are notified, because the client configured this table to notify only.

  3. Game_Types:
    Row count: 0.
    Since it’s Sunday (excluded via DaysToIgnore), the process proceeds and the client receives no notification.
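
The three outcomes can be reproduced with a compact, self-contained Python sketch. `table_outcome` is a hypothetical helper, not an Optimove function. It takes the precomputed average (300 for Games) as `baseline`, uses IsFail = True as its default in line with the parameter list, and returns "skipped" for ignored days.

```python
from datetime import date


def table_outcome(daily_count, baseline, run_date, *, compare_avg=True,
                  mult_factor=0.2, is_fail=True, days_to_ignore=None):
    """Return "skipped", "pass", "notify", or "fail" for one table."""
    # DaysToIgnore numbering: 1 = Sunday ... 7 = Saturday.
    weekday = run_date.isoweekday() % 7 + 1
    if days_to_ignore and weekday in days_to_ignore:
        return "skipped"
    if compare_avg:
        reliable = daily_count >= mult_factor * baseline
    else:
        # Presence-only check when CompareAvg is False.
        reliable = daily_count >= 1
    if reliable:
        return "pass"
    return "fail" if is_fail else "notify"


run = date(2024, 3, 3)  # a Sunday
print(table_outcome(50, 300, run))                                            # fail (50 < 60)
print(table_outcome(0, None, run, compare_avg=False, is_fail=False))          # notify
print(table_outcome(0, None, run, compare_avg=False, days_to_ignore={1, 7}))  # skipped
```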


Key Benefits

  • Data Volume Validation ensures reliable data is available for downstream processes.
  • Clients have full control to customize checks to fit their operational patterns.
  • Notifications and fail-safes provide flexibility without compromising data quality.

For assistance in configuring Data Volume Validation, contact your Optimove Customer Success Manager or support team.