Extensible Data Loading Process

Related Tags: data loading, etl, best practices

Creating an extensible data loading process can be a little tricky. Imagine the following use case: you are extracting data from Salesforce and loading it into the GoodData Platform. After some time, you decide to add one more field from a Salesforce entity to your GoodData project. Following this article and learning this process will make your life easier.

See the example data loading process:

Example Process

First of all, we have a single metadata definition that describes all the fields we are extracting from Salesforce:

Remember that the metadata must be identical at every step of the process; even a difference in decimal number formatting will break it.

Metadata
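The metadata screenshot is not reproduced here; in CloverETL, such a definition typically lives in an external metadata file. A minimal sketch of what it might look like (the field names and delimiters are assumptions based on the transformation below, not the actual example project):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Record name="opportunity" type="delimited" fieldDelimiter="," recordDelimiter="\n">
  <Field name="id" type="string"/>
  <Field name="name" type="string"/>
  <Field name="is_closed" type="boolean"/>
  <Field name="is_won" type="boolean"/>
  <Field name="status" type="string"/>
  <Field name="salesrep" type="string"/>
</Record>
```

Because every component in the graph points at this one definition, adding a field here is the single source-of-truth change.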

We are going to add the new field to the Opportunity dataset. Normally, you would have to open each transformation component (in our example it is ExtHashJoin; see the transformation function below) and map each field manually to carry the new field through the process. But this can be much easier.

function integer transform() {
    // Copy every input field to the output by wildcard, so fields added
    // to the metadata are carried through without any extra mapping.
    $out.0.* = $in.0.*;

    // Enrich the record with the sales rep's name from the joined input.
    $out.0.salesrep = $in.1.name;

    // Derive a human-readable status from the Salesforce flags.
    if ($in.0.is_closed) {
        if ($in.0.is_won) {
            $out.0.status = "Won";
        }
        else {
            $out.0.status = "Lost";
        }
    }
    else {
        $out.0.status = "Open";
    }

    return ALL;
}

Do you want to play with this with your data? Download the full example!

The trick here is the following line from the transformation:

$out.0.* = $in.0.*;

Using this approach, each field from the metadata is propagated to the next step automatically.
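The same wildcard-then-override pattern can be sketched outside CloverETL. Here is a minimal Python analogy (the record and field names are illustrative, not taken from the example project): each record is a dict, every field is copied by default, and only the derived fields are set explicitly, so a newly added metadata field flows through untouched.

```python
def transform(opportunity, salesrep):
    """Mimic CTL's `$out.0.* = $in.0.*`: start from a full copy of the
    input record, then override only the derived fields."""
    out = dict(opportunity)  # propagate every input field automatically

    out["salesrep"] = salesrep["name"]

    # Derive a human-readable status from the Salesforce flags.
    if opportunity["is_closed"]:
        out["status"] = "Won" if opportunity["is_won"] else "Lost"
    else:
        out["status"] = "Open"
    return out

# A field like "amount" passes through even though transform() never names it.
record = {"id": "006A0", "amount": 1200.0, "is_closed": True, "is_won": True}
rep = {"name": "Jane Doe"}
result = transform(record, rep)
```

The key design choice is the same as in the CTL line: copy first, override second, so new fields never need explicit mapping.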

Adding another custom field then means:

1) Add the field to the metadata (only once!)
2) Add the field to the SFDC Opportunity reader's SOQL query and map it to the metadata
3) Add and map the field in the GoodData Writer
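For step 2, the reader's SOQL query grows by exactly one field. A hedged sketch (the custom field name NewField__c is a placeholder, not from the example project):

```sql
SELECT Id, Name, IsClosed, IsWon, Amount, NewField__c
FROM Opportunity
```

Nothing in between the reader and the writer needs to change, because the wildcard mapping carries the new field through.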

Remember that you have to set up your Salesforce connection, create a new GoodData project, and check the GDC_PROJECT_ID parameter in your workspace.prm file.
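The workspace.prm file is a plain key=value parameter file; a minimal sketch (the value shown is a placeholder for your own project ID):

```
GDC_PROJECT_ID=<your GoodData project id>
```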