ETL Step 2: SSI
SSI is the only step that does not have any ETL templates. Flow Engine executes SQLs defined in the stream JSON file under the transforms attribute.
*_EXT tables are input in this step, while *_SSI tables are output.
You can customize Vertica for PPM reports totally in your way in the SSI step. However, you need to understand the major functions of this step:
-
Handling business logic
This is the only step where the entity-specific logic can be handled. You can input SQLs for handling the business logic, such as calculation and aggregation.
-
Transforming data models from the source model (defined in source entities) to the target model (defined in target models)
The output table of the SSI step (naming convention: *_SSI) should align with the target model in both the field name and data granularity.
-
Cleaning unnecessary prefilled data
To improve the ETL performance, you need to remove any unnecessary prefilled data. You can refer to the SQLs that contain
where tab1.md_pf_flag=’D
ortab2.md_flag=’D’
in the sample ETL entities.For more information about obtaining the sample content packs, see Obtaining Sample Content Packs for Customization.
-
Combining multiple source entities into one target entity
If your target entity needs data from multiple source entities, combine these source entities properly to populate data to the *_SSI tables.
You also need to pay attention to the following during the SSI development:
-
MD_FLAG field
MD_FLAG is a very important field that controls the data loading behaviors in the following ETL steps. You need to mark the record as
NEW
if the record is a newly created andUPD
if the record already exists in the target tables. If you mark the record asDEL
, it means that record needs to be deleted later.You need to consider it thoroughly when you combine multiple source entities. Data from different entities carries different MD_FLAG from the EXT step.
-
Fields need to be populated in the *_SSI tables
- MD_BUSINESS_KEY: This record is used to identify the data granularity. The system generates the enterprise key later based on the value of MD_BUSINESS_KEY. Make sure this value is unique and is not changed during the test load. Records with different MD_BUSINESS_KEY fields are considered as different records. It is suggested that prefix <MD_SOURCE_INSTANCE_ID> is included as part of MD_BUSINESS_KEY.
- MD_BATCH_ID: You can populate this field with the value from the *_EXT table or the value from the <MD_BATCH_ID> runtime variable.
- MD_SOURCE_INSTANCE_ID: You can populate this field with the value from the *_EXT table or the value from the <MD_SOURCE_INSTANCE_ID> runtime variable.
- MD_PROCESS_ID: You can populate this field with the value from the *_EXT table or the value from the <MD_PROCESS_ID> runtime variable.
-
Temporary tables that you create in the SSI step
Consider the Vertica storage strategy to speed up ETL when you run ETL on top of the Vertica database cluster. You can use
UNSEGMENTED ALL NODES
for small tables andSEGMENTED by hash (md_business_key) all nodes
for large tables in most cases. For details about Vertica storage impact on performance, refer to HP Vertica Analytics Platform Version 7.0.x Documentation.
The following table lists the key staging tables.
Table Name |
Table Description |
---|---|
<SOURCE_PRODUCT_NAME>_<SOURCE_ENTITY_NAME>_<MD_SOURCE_INSTANCE_ID>__SSI |
SSI table is an output table of the SSI step. It shall align with the target model on both schema and data granularity |
The SSI table contains the following system reserved fields.
Field Name |
Field Description |
---|---|
<ROLE_ENTITY_NAME>_BUSINESS_KEY |
If you define any entity lookup in the target entity in dimension_associated_dimension, fact_associated_dimension, or fact_associated_fact attibutes, you need to create a field in the SSI table, and put relevant business key values into this field. The name of this field should contain The value stored in this field should be the
same as in MD_BUSINESS_KEY of the
|