Powered By GoodData - Execute ETL Provisioning

NOTE: This article is part of the Powered By GoodData Tutorial. Some aspects may not apply to other implementation scenarios.

Execute Provisioning Scripts

After you have created the projects in your Powered By GoodData solution through project provisioning, you may execute the scripts that provision your ETL graphs to each project.

You should retrieve and record the process-id value of each provisioned ETL graph, along with the project-id of each project to which it was deployed (one way to collect these values via the API is sketched below).

NOTE: For customer-specific ETL processes, please verify that the project has been properly parameterized, so that the customer’s data can be properly loaded.
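
If you deployed the ETL graphs through the APIs, one way to collect these values is to list the dataload processes in each project. The following Python sketch assumes an already authenticated requests.Session (for example, one holding valid GoodData SST/TT cookies), a GDC_HOST variable pointing at your platform domain, and the /gdc/projects/{project-id}/dataload/processes listing resource; the response structure shown (processes → items → process) is an assumption to confirm against the API documentation.

import requests

GDC_HOST = "https://secure.gooddata.com"  # replace with your white-labeled domain

def list_process_ids(session: requests.Session, project_id: str) -> dict:
    """Return a {process name: process-id} map of the processes deployed to a project."""
    url = f"{GDC_HOST}/gdc/projects/{project_id}/dataload/processes"
    resp = session.get(url, headers={"Accept": "application/json"})
    resp.raise_for_status()
    process_ids = {}
    for item in resp.json()["processes"]["items"]:
        process = item["process"]
        # The process-id is the last segment of the process's self link.
        self_link = process["links"]["self"]
        process_ids[process["name"]] = self_link.rstrip("/").rsplit("/", 1)[-1]
    return process_ids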

Set Up Process Notifications

Before you begin executing processes on your customer projects, you may wish to use the GoodData APIs to set up notifications of process activities. For example, over the first few iterations of a process, it is a good idea to track at least the following events related to the process:

  • Start of process
  • Success of process
  • Failure of process

These email updates will inform you of the progress of your ETL processes.

NOTE: Over time, you may choose to delete some of these notifications as the processes stabilize.

For more information on setting up notifications via the API, see the notifications API documentation.

NOTE: As an alternative to using the APIs, you may use the Data Integration Console to set up notifications. For more information, see Introduction to Data Integration Console.
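
For orientation only, the sketch below shows the general shape of creating one notification rule per event from a script. The resource path, payload keys, and event names used here are placeholders, not documented values; replace them with the exact endpoint, payload, and event identifiers from the notifications API documentation. `session` and GDC_HOST are as in the earlier sketch.

# Placeholder event names -- substitute the identifiers documented in the
# notifications API (start, success, and failure of a process execution).
EVENTS = ["process.started", "process.succeeded", "process.failed"]

def create_notifications(session, project_id, process_id, email):
    """Create one email notification rule per tracked event (sketch only)."""
    # Placeholder resource path -- confirm against the notifications API doc.
    url = f"{GDC_HOST}/gdc/projects/{project_id}/dataload/processes/{process_id}/notifications"
    for event in EVENTS:
        payload = {
            "notification": {               # placeholder payload structure
                "email": email,
                "subject": f"ETL event {event} in project {project_id}",
                "events": [event],
            }
        }
        session.post(url, json=payload).raise_for_status()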

Execute First-Time Data Loads for Production Projects

When the ETL processes have been deployed, you may begin executing the first-time data load for each project.

NOTE: Due to per-customer threading, you should have already verified your schedule for executing these first-time loads with GoodData.

NOTE: As an alternative to using the APIs to drive one-time data loads, you may use the Data Integration Console to execute these processes on an on-demand or scheduled basis. However, the console does not support batch management of processes. For more information, see Introduction to Data Integration Console.

Use the following API call to execute a specific ETL process ({process-id}) in a specific project ({project-id}):

Type POST
URI /gdc/projects/{project-id}/dataload/processes/{process-id}/executions

The returned JSON response may look like the following:

201 (Created)
    Content-Type: application/json
    
{
    "executionTask":{
        "link":{
            "poll":"/gdc/projects/{project-id}/dataload/processes/{process-id}/executions/{execution-id}"
        }
    }
}
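
A minimal way to drive this call from a script is sketched below, again assuming an authenticated requests.Session and the GDC_HOST variable from the earlier sketches. The {"execution": ...} request body and the optional "graph" key (used to select the executable inside a graph-based process) are assumptions to confirm against the API documentation; the poll link is read from the executionTask response shown above.

def start_execution(session, project_id, process_id, graph=None, params=None):
    """Trigger one execution of a deployed process and return its poll link."""
    url = f"{GDC_HOST}/gdc/projects/{project_id}/dataload/processes/{process_id}/executions"
    execution = {"params": params or {}}
    if graph is not None:
        # Assumed key for selecting the executable of a graph-based process,
        # e.g. "main/main.grf" -- confirm the exact name in the API docs.
        execution["graph"] = graph
    resp = session.post(url, json={"execution": execution})
    resp.raise_for_status()                     # expects 201 (Created)
    return resp.json()["executionTask"]["link"]["poll"]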

Review Process Status

To review the status of the ETL process, use the following API endpoint:

Type GET
URI /gdc/projects/{project-id}/dataload/processes/{process-id}/executions/{execution-id}/detail

The returned JSON contains the status of the execution ({execution-id}) of the process ({process-id}).
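
In a script, you can poll this detail resource until the execution reaches a terminal state, and reuse the same loop to drive the first-time load across all of your provisioned (project-id, process-id) pairs. The executionDetail key and the status values checked below are assumptions to verify against the execution detail documentation; `session`, GDC_HOST, and start_execution come from the earlier sketches.

import time

def wait_for_execution(session, project_id, process_id, execution_id, interval=30):
    """Poll the execution detail until the process reaches a terminal status."""
    url = (f"{GDC_HOST}/gdc/projects/{project_id}/dataload/processes/"
           f"{process_id}/executions/{execution_id}/detail")
    while True:
        resp = session.get(url, headers={"Accept": "application/json"})
        resp.raise_for_status()
        # Status field and values are assumptions -- verify in the API docs.
        status = resp.json()["executionDetail"]["status"]
        if status not in ("SCHEDULED", "RUNNING"):
            return status
        time.sleep(interval)

# Batch driver for first-time loads: `provisioned` maps project-id -> process-id.
# for project_id, process_id in provisioned.items():
#     poll_link = start_execution(session, project_id, process_id)
#     execution_id = poll_link.rstrip("/").rsplit("/", 1)[-1]
#     print(project_id, wait_for_execution(session, project_id, process_id, execution_id))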

Verify Initial Data Loads

If possible, you should verify that the initial data loads have completed in each project. Depending on the structure of your projects, the required verification tests may vary:

  • If there is consistent data across all projects, you can verify a single report or data point in each of the projects (one way to automate this is sketched below).
  • If there is customer-specific data in each project, you should verify that the customer’s data is present and that no other customer’s data is present.
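
For the first case, one way to automate the single-report check is the raw report export resource, assumed here to be /gdc/app/projects/{project-id}/execute/raw with a report_req payload; confirm the endpoint and payload against the API documentation, and substitute your own validation rules for the simple row-count check. `session` and GDC_HOST are as in the earlier sketches.

import time

def report_has_rows(session, project_id, report_uri):
    """Export a report as raw CSV and check that it returns at least one data row."""
    url = f"{GDC_HOST}/gdc/app/projects/{project_id}/execute/raw"
    resp = session.post(url, json={"report_req": {"report": report_uri}})
    resp.raise_for_status()
    export_uri = resp.json()["uri"]
    # Poll the export URI; 202 means the result is still being prepared.
    while True:
        data = session.get(f"{GDC_HOST}{export_uri}")
        if data.status_code != 202:
            break
        time.sleep(5)
    data.raise_for_status()
    rows = [line for line in data.text.splitlines() if line.strip()]
    return len(rows) > 1   # header row plus at least one data row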