Customizing ETL Rules

What is ETL Rules

Extract-Transform-Load (ETL) is the process of extracting data from data source, transforming data, and loading data to the target data warehouse. ETL, as the core of Business Intelligence, is a critical step for deploying data warehouse.

ETL contains the following steps:

  1. EXT
  2. SSI
  3. XREF
  4. MSI
  5. XFR
  6. KEY LOOKUP
  7. TARGET
  8. TSNP
  9. HIERARCHY
  10. POST TARGET

For details of these steps, see ETL Steps Introduction.

Why Customize ETL Rules

The PPM database includes SQLs that can process the ETL steps. At the same time, PPM also supports ETL process customization to meet various business needs. Because the business environments vary, you can optimize the ETL performance by customizing ETL rules.

How to Customize ETL Rules

You can customize ETL rules by running the following command:

sh ArtifactRegister.sh --artifactfile <DATA FILE PATH> --streamname <STREAM NAME> --etlstep <ETL STEP> --register CUSTOMIZATION
Supported Parameters

Parameter

Mandatory?

Sample Value

Description

streamname

Yes

PPM_PERSON_STREAM

The name of the stream in which the ETL rules need to be customized; See Entity Attribute Descriptions for details.

etlstep

Yes

EXT|SSI|TSNP|XREF|MSI|

XFR|KEYLOOKUP|TARGET|

HIERARCHY|POSTTARGET

ETL step that needs to be customized

artifactfile

Yes

/temp/XREF.SQL

SQL file that contains customized ETL logic

register

No

CUSTOMIZATION

All non-HP provided SQL commands should be registered as CUSTOMIZATION

help

No

 

Prints put the help message

For detailed explanation, see "Administration Tasks" in the Vertica for PPM Administrator Guide for Content Pack 1.0.

Cusomization Validation

All the ETL rules are stored in the ETL_GENERATED_ARTIFACT table of the management schema. You can query customized ETL rules by running the following SQLs:

SELECT * FROM <MANAGER_SCHEMA>.ETL_GENERATED_ARTIFACT,<METADATA_SCHEMA>.ETL_METADATA where ETL_GENERATED_ARTIFACT.STREAM_ID = ETL_METADATA.OBJECT_ID AND ETL_GENERATED_ARTIFACT.REGISTERED_BY='CUSTOMIZATION' AND ETL_METADATA.OBJECT_NAME='<STREAM_NAME>'
  • <MANAGER_SCHEMA> is the name that you specified for the schema that contains management tables when installing the Vertica for PPM content pack.
  • <METADATA_SCHEMA> is the name that you specified for the schema that contains metadata when installing the Vertica for PPM content pack.
  • <STREAM_NAME> is the stream name that is specified when running ArtifactRegister.sh.

Best Practices

Customizing ETL rules is a very complex process. Follow these steps for customization:

  1. Query the system with the following command to find out the default rule for a specified ETL step under a certain stream:

    SELECT ARTIFACT_CONTENT FROM <MANAGER_SCHEMA>.ETL_GENERATED_ARTIFACT,<METADATA_SCHEMA>.ETL_METADATA where ETL_GENERATED_ARTIFACT.STREAM_ID = ETL_METADATA.OBJECT_ID AND ETL_GENERATED_ARTIFACT.REGISTERED_BY='SYSTEM' AND ETL_METADATA.OBJECT_NAME='<STREAM_NAME>' AND ETL_GENERATED_ARTIFACT.ETL_STEP='<ETL_RULE_STEP_NAME>'
    • <MANAGER_SCHEMA> is the name that you specified for the schema that contains management tables when installing the Vertica for PPM content pack.
    • <METADATA_SCHEMA> is the name that you specified for the schema that contains metadata when installing the Vertica for PPM content pack.
    • <STREAM_NAME> is the name of the stream that needs to be customized.
    • <ETL_RULE_STEP_NAME> is the ETL step that needs to be customized. Possible values are SSI, XREF, MSI, XFR, KEYLOOKUP, TARGET, HIERARCHY, TNSP, and POSTTARGET.
  2. Copy the SQLs from the ARTIFACT_CONTENT field to a TXT file, such as data.txt. Adjust the SQLs and keep variables such as MD_BATCH_ID, MD_PROCESS_ID, and MD_SOURCE_INSTANCE_ID as is.
  3. Run the following command to customize the ETL rule:

    sh ArtifactRegister.sh --artifactfile data.txt --streamname <STREAM_NAME> --etlstep <ETEL_RULE_STEP_NAME> --register CUSTOMIZATION

Note:  

  • Do not change the metadata of OOTB entities.
  • Do not add new fields to the existing staging or target tables.
  • During the system upgrade, the system bypasses all customized ETL artifacts. Sometimes it may cause ETL to break down. If that happens, correct your ETL SQLs and register again.