ETL Architecture Overview

Typically an ETL process contains 10 steps. Flow Engine calls Template Engine to generate SQLs based on the ETL templates for each entity during run time and Flow Engine executes those 10 steps one by one.

ETL templates contains most of the common ETL patterns, so you only need to focus on the business logic when developing new content.

In general, ETL templates support the following ETL patterns:

  • Change Data Capture

    ETL Engine captures only change data based on the is_cdc attributes defined in the source metadata. Deleted data can be captured based on the is_bk attributes defined in the source metadata.

  • Slow Changing Dimension

    ETL Engine supports slow changing dimension. It automatically keeps the dimension history if you mark a field as scd2.

  • Late Arriving Dimension

    ETL Engine automatically generates dummy record for late arriving dimension data, and updates the dummy record when data arrives.

  • Surrogate Key Generation

    ETL Engine generates surrogate key automatically. It generates the enterprise key to identify records and the primary key to identify the history of records.

  • Hierarchy Flatten

    Based on the target metadata definition, ETL Engine stores hierarchy relation information in the BRIDGE_HIEARCH tables. Unlimited hierarchy levels are supported.

  • Date Timezone Conversion

    ETL Engine converts date to the data warehouse timezone automatically. You need to specify the PPM database timezone and PPM data warehouse timezone during installation.