ETL Step 2: SSI

SSI is the only step that does not have any ETL templates. Flow Engine executes SQLs defined in the stream JSON file under the transforms attribute.

*_EXT tables are input in this step, while *_SSI tables are output.

You can customize Vertica for PPM reports totally in your way in the SSI step. However, you need to understand the major functions of this step:

  • Handling business logic

    This is the only step where the entity-specific logic can be handled. You can input SQLs for handling the business logic, such as calculation and aggregation.

  • Transforming data models from the source model (defined in source entities) to the target model (defined in target models)

    The output table of the SSI step (naming convention: *_SSI) should align with the target model in both the field name and data granularity.

  • Cleaning unnecessary prefilled data

    To improve the ETL performance, you need to remove any unnecessary prefilled data. You can refer to the SQLs that contain where tab1.md_pf_flag=’D or tab2.md_flag=’D’ in the sample ETL entities.

    For more information about obtaining the sample content packs, see Obtaining Sample Content Packs for Customization.

  • Combining multiple source entities into one target entity

    If your target entity needs data from multiple source entities, combine these source entities properly to populate data to the *_SSI tables.

You also need to pay attention to the following during the SSI development:

  • MD_FLAG field

    MD_FLAG is a very important field that controls the data loading behaviors in the following ETL steps. You need to mark the record as NEW if the record is a newly created and UPD if the record already exists in the target tables. If you mark the record as DEL, it means that record needs to be deleted later.

    You need to consider it thoroughly when you combine multiple source entities. Data from different entities carries different MD_FLAG from the EXT step.

  • Fields need to be populated in the *_SSI tables

    • MD_BUSINESS_KEY: This record is used to identify the data granularity. The system generates the enterprise key later based on the value of MD_BUSINESS_KEY. Make sure this value is unique and is not changed during the test load. Records with different MD_BUSINESS_KEY fields are considered as different records. It is suggested that prefix <MD_SOURCE_INSTANCE_ID> is included as part of MD_BUSINESS_KEY.
    • MD_BATCH_ID: You can populate this field with the value from the *_EXT table or the value from the <MD_BATCH_ID> runtime variable.
    • MD_SOURCE_INSTANCE_ID: You can populate this field with the value from the *_EXT table or the value from the <MD_SOURCE_INSTANCE_ID> runtime variable.
    • MD_PROCESS_ID: You can populate this field with the value from the *_EXT table or the value from the <MD_PROCESS_ID> runtime variable.
  • Temporary tables that you create in the SSI step

    Consider the Vertica storage strategy to speed up ETL when you run ETL on top of the Vertica database cluster. You can use UNSEGMENTED ALL NODES for small tables and SEGMENTED by hash (md_business_key) all nodes for large tables in most cases. For details about Vertica storage impact on performance, refer to HP Vertica Analytics Platform Version 7.0.x Documentation.

The following table lists the key staging tables.

Key Staging Tables

Table Name

Table Description

<SOURCE_PRODUCT_NAME>_<SOURCE_ENTITY_NAME>_<MD_SOURCE_INSTANCE_ID>__SSI

SSI table is an output table of the SSI step. It shall align with the target model on both schema and data granularity

The SSI table contains the following system reserved fields.

System Reserved Fields in Staging Tables

Field Name

Field Description

<ROLE_ENTITY_NAME>_BUSINESS_KEY

If you define any entity lookup in the target entity in dimension_associated_dimension, fact_associated_dimension, or fact_associated_fact attibutes, you need to create a field in the SSI table, and put relevant business key values into this field.

The name of this field should contain <ROLE_ENTITY_NAME> defined in the target entity as the prefix .

The value stored in this field should be the same as in MD_BUSINESS_KEY of the lookup_entity_name entity. Otherwise ETL Engine is not able to build reference between entities in the following steps.