Combining Google Analytics and Twitter Search Projects

Related Tags: cloudconnect google twitter

In this tutorial, you learn how to combine two CloudConnect projects into a single project, a technique that applies to several use cases.

For example, during implementation of a project, you may decide that it is easier to break the project into separate pieces, each of which is developed by a separate individual. Or, as in the case of this tutorial, you may wish to bring together separate projects so that you can build metrics and reports that realize the potential of business intelligence solutions.

For this tutorial, you will combine two of the CloudConnect example projects: the Google Analytics demo project and the Twitter Search project. When these two projects are integrated into the same CloudConnect project, their data models are joined together using a single Date dimension, which enables reporting across the datasets.

Pre-Requisites

Before you begin, please verify that you have access to or have completed the following:

  • CloudConnect Designer. For more information, see Download CloudConnect Designer.
  • GoodData account. For more information, please contact GoodData Customer Support.
  • Google Analytics account & website with some data.
  • Completed tutorial to create and publish the Google Analytics demo project. For more information, see Analyzing Website Traffic Using Google Analytics.
  • Twitter Developer App: The provided CloudConnect project is preconfigured to use a GoodData developer application. This tutorial is designed to enable you to gather your own Twitter data, for which you should use your own developer application for authentication. Further instructions are listed in the following section.
  • Completed tutorial to create and publish the Twitter Search demo project. For more information, see Analyzing Twitter.

Project Overview

The basic approach is to integrate the content from one project into the other and then to connect the pieces together.

To connect the logical data models together, you must bring the two LDMs into the same file and then physically connect them together through a shared attribute or dimension.

For transactional data such as the contents of these two projects, the easiest point of connection is to share a Date dimension. For other systems, you may be able to share some form of unique transactional identifier, such as an Order Number. In some cases, you may be required to create a new identifier and to populate it with data. Hopefully, that data can be managed through the host systems.

After the logical data models have been connected, you must refine the ETL graphs to include the connecting object. Generally, the ETL graphs can remain in separate files and can continue to exist independently. Depending on the integration, you may choose to combine them together.

In this tutorial, the Date dimension from the Twitter Search project is replaced by the corresponding Date dimension in the Google Analytics project. Then, the ETL for the Twitter Search project must be modified to reference the new Date dimension.

Set up the project

In CloudConnect, you can now set up the project into which you will be integrating these two projects. Since it is the larger project, you should use the Google Analytics project as the base project, into which you import the elements of the Twitter Search project.

The first step is to create a copy of the Google Analytics project.

Steps:

  1. In CloudConnect Designer, close any tabs that are opened to the Google Analytics project’s ETL graphs or logical data model.
  2. In the Project Explorer panel, secondary-click the Google Analytics project and select Copy.
  3. Secondary-click again and select Paste.
  4. In the dialog, enter the following for the name: Google Analytics and Twitter Demo.
  5. Click OK.

A copy of the Google Analytics project has been created and is available in the Project Explorer.

Acquire assets from Twitter project

Now, you need to integrate the components of the Twitter project. In this case, you want to export and import the following things:

  • ETL graph (contained in the twitter.grf file)
  • logical data model (contained in the twitter.ldm file)

Complete the steps below to integrate these two elements into your project.

Export ETL graph file

Steps:

  1. In the Project Explorer, select the Twitter Demo project.
  2. Open the graph folder.
  3. Secondary-click the twitter.grf file and select Export….
  4. The Export Wizard is displayed. Under the General folder, select File System:
    Figure: Export Wizard
  5. Click Next.
  6. Verify that the twitter.grf file is selected. You may choose a different export location if desired. This location is just used for the export/import process.
    Figure: Export Wizard
  7. Click Finish.

The graph file is exported to the specified location.

Export logical data model file

While it is possible to use simple copy-and-paste to bring the Twitter Search logical data model into your project, you are better served by using the following steps.

NOTE: You can import the logical data model from a separate project directly. Then, you can mash the two models together through a common connection and publish the resulting data model to a completely new project.

Steps:

  1. Open the model folder.
  2. Secondary-click the twitter.ldm file and select Export….
  3. Repeat the steps in the Export Wizard to export the logical data model to the same location where the ETL graph file is stored.

The logical data model file is exported to the specified location.

Import assets into Google Analytics project

Import ETL graph file

Use the steps below to import the ETL graph file into the Google Analytics and Twitter Demo project.

Steps:

  1. In the Project Explorer, select the Google Analytics and Twitter Demo project.
  2. Secondary-click the Google Analytics and Twitter Demo folder in the Project Explorer and select Import….
  3. The Import Wizard is displayed. From the list of sources, select File System:
    Figure: Import Wizard
  4. Click Next.
  5. In the next step, click the Browse button next to the From directory textbox. Navigate your local environment to select the twitter.grf file. Click Open.
  6. In the Wizard, the list of files in the selected directory is displayed. Click the checkboxes next to twitter.grf and twitter.ldm.
  7. For the Into folder setting, you can leave the current value.
  8. The Wizard should look like the following:
    Figure: Import Wizard
  9. Click Finish.
  10. The two files are imported into the top-level folder of the project.
  11. Drag and drop the twitter.grf file into the graph folder.
  12. Drag and drop the twitter.ldm file into the model folder.
  13. Save the project.
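
The end result of the import steps above is that twitter.grf sits in the graph folder and twitter.ldm sits in the model folder. As a rough illustration only, the same file placement can be sketched in Python. The function name and the directory paths are hypothetical, and this is not a replacement for the Import Wizard, since it is an assumption that the Designer picks up files copied outside of it.

```python
import shutil
import tempfile
from pathlib import Path


def import_twitter_assets(export_dir, project_dir):
    """Copy the exported Twitter files into the project's graph and model folders."""
    # Destination subfolder for each file, mirroring the drag-and-drop steps.
    destinations = {"twitter.grf": "graph", "twitter.ldm": "model"}
    copied = []
    for file_name, folder in destinations.items():
        source = Path(export_dir) / file_name
        if not source.is_file():
            raise FileNotFoundError(f"Expected exported file is missing: {source}")
        target_dir = Path(project_dir) / folder
        target_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(source, target_dir / file_name)))
    return copied


def demo():
    """Run the copy against throwaway directories so that no real project is touched."""
    with tempfile.TemporaryDirectory() as tmp:
        export_dir = Path(tmp) / "export"  # hypothetical export location
        project_dir = Path(tmp) / "Google Analytics and Twitter Demo"
        export_dir.mkdir()
        project_dir.mkdir()
        for name in ("twitter.grf", "twitter.ldm"):
            (export_dir / name).write_text("placeholder")
        copied = import_twitter_assets(export_dir, project_dir)
        # Report the paths relative to the project, using forward slashes.
        return [p.relative_to(project_dir).as_posix() for p in copied]


if __name__ == "__main__":
    print(demo())
```

The demo works only on temporary directories and placeholder files; to try it on real files, pass your own export location and workspace project path to import_twitter_assets.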

Integrating the Imported Assets

The base assets have been imported into the combined project. Now, you must connect the Twitter Search assets into the existing assets for the Google Analytics project.

Integrate Twitter data model

The logical data model for the project is now contained in two separate files. Through CloudConnect Designer, you can publish the LDM from only one file at a time to a GoodData project. So, you must bring the contents of the twitter.ldm file into the googleanalytics.ldm file. To do so, you may use copy and paste.

Steps:

  1. Open the twitter.ldm file and the googleanalytics.ldm file in the combined project.
  2. In the twitter.ldm tab, click and drag a selection rectangle around all items on the canvas. Copy the contents.
  3. Select the googleanalytics.ldm tab. Paste the contents.
  4. You should drag the pasted contents so that the two date dimensions are next to each other. Your screen should look like the following:
    Figure: Integrated LDM
  5. Save the file.

Connect the data models

To connect the integrated data models, you must delete the Date dimension from the Twitter Search project (called tweet) and connect the remaining Twitter Search LDM objects to the Date dimension from the Google Analytics project.

Steps:

  1. Secondary-click the tweet Date dimension and select Delete. The Date dimension and its connection to the Twitter dataset are removed.
  2. Hover the mouse over the Date object so that an arrow appears at the side of it. Click and drag the line to the left side of the Twitter dataset. When you release the mouse button, the connection is made.
    Figure: Connected LDM
  3. Save the file.

The two LDMs are now connected.
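
Conceptually, the change amounts to removing one date dimension and pointing the Twitter dataset at the other. The sketch below models that with plain Python dictionaries. It does not reflect the actual contents of an .ldm file, and the dataset names ga_dataset and twitter_dataset are hypothetical; only the dimension names Date and tweet come from this tutorial.

```python
def connect_to_shared_date(model, dataset, old_dimension, shared_dimension):
    """Return a copy of the model in which `dataset` references `shared_dimension`
    instead of `old_dimension`, with `old_dimension` removed from the model."""
    if shared_dimension not in model["date_dimensions"]:
        raise ValueError(f"Unknown date dimension: {shared_dimension}")
    # Delete the old date dimension (step 1 in the list above).
    date_dimensions = [d for d in model["date_dimensions"] if d != old_dimension]
    # Copy the reference lists so that the input model is left untouched.
    references = {name: list(refs) for name, refs in model["references"].items()}
    # Re-point the dataset at the shared dimension (step 2 in the list above).
    references[dataset] = [
        shared_dimension if ref == old_dimension else ref
        for ref in references[dataset]
    ]
    return {"date_dimensions": date_dimensions, "references": references}


# Simplified stand-in for the combined model before the connection is made.
combined = {
    "date_dimensions": ["Date", "tweet"],
    "references": {
        "ga_dataset": ["Date"],        # hypothetical dataset name
        "twitter_dataset": ["tweet"],  # hypothetical dataset name
    },
}

connected = connect_to_shared_date(combined, "twitter_dataset", "tweet", "Date")
print(connected)
```

After the call, both datasets reference the same Date dimension, which is what allows a single report to slice their data by date.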

Publish the LDM

In order to be able to configure the ETL to write to the appropriate fields in the GoodData project, you must publish the combined logical data model to the GoodData project at this time.

To publish, verify that you are looking at the combined LDM. Then, select any whitespace area in the logical data model and click Publish model to server. Select or create the Google Analytics and Twitter project.

After the project has been updated, you should verify that the logical data model has been updated through the GoodData Portal.

Update the Twitter ETL to use the new date dimension

You must now update the ETL for the Twitter data feed to use the Date dimension from the Google Analytics project. In this case, you must map the previous field to the new field in the logical data model.

Steps:

  1. Click or open the twitter.grf tab.
  2. In the graph, double-click the GD Dataset Writer component.
  3. The component configuration is displayed.
  4. Click the Field mapping entry. Then, click Browse…. The Dataset field mapping window is displayed:
    Figure: Dataset field mapping window
  5. Under Dates, click the Date fact drop-down. Select Date (tweet). You have mapped the field in the logical data model.
  6. Under the Input Fields column, click the corresponding drop-down for the field you just set. Select tweet.
  7. Click Finish. Click OK.
  8. Save the file.

The new field in the ETL graph is connected to the LDM.

NOTE: In the configurations for the other components of the ETL graph, the labels for the field mappings do not correspond to the new name. However, since the data is internal to the graph, they are consistent and still work. Feel free to update the labels for clarity.

  • To test the graph, secondary-click in the white space in the graph. From the drop-down, select Run As > 1 Graph (Locally). Verify that a success message is displayed in the Console tab.

NOTE: Since the full ETL for the project is contained in two separate *.grf files, you must run the graph in each file. You may find it easier to copy and paste the contents of the second graph file into the first and remove the second *.grf file from your project. Then, you can run the ETL for the entire project from a single command. For more complex projects, however, it may be easier to manage graphs through multiple tabs/files.

Run Google Analytics graph

Before you publish your project to the GoodData platform, you should verify that the Google Analytics graph is working properly.

Steps:

  1. Open the ga.grf file.
  2. Enable debugging in all edges in the graph.
  3. Run the graph locally.
  4. Verify that there is an adequate set of records being passed into the GD Dataset Writer.
  5. If you made changes, save the file.

Create a report

After you have run the graphs locally, you should also review the data inside the project in the GoodData platform. In the steps below, you create a simple data validation report to verify that the data has been properly loaded and can be sliced by Date.

Steps:

  1. Log in to the GoodData Portal.
  2. From the Projects drop-down, select the GoodData project to which you published the combined logical data model.
  3. Click the Reports menu.
  4. Click Create Report.
  5. Click the What tab.
  6. Click the Add New Metric link.
  7. For the # of Tweets metric, select COUNT for the Operation. For the Perform Operation On value, select Text. For the name, enter # of Tweets. Do not add to global metrics. Click Done.
  8. The report is generated. If you have been able to extract Twitter data via the Twitter graph, the report should display a positive integer value for the # of Tweets metric.
  9. Click the What tab again.
  10. Click the Add New Metric link.
  11. For the # of Visits metric, select SUM for the Operation. For the Perform Operation On value, select New Visits (Visitor). For the name, enter # of Visits. Do not add to global metrics. Click Done.
  12. The report is updated to contain two metrics side-by-side.
  13. Now, you can slice the data by the Date dimension. Click the How tab.
  14. Click All Attributes. Then, select the Date (Date) checkbox. Then, click Done.
  15. The report now contains the number of tweets and number of visits tracked per date.
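
To make the expected shape of this report concrete, the following sketch computes the same two metrics per date from in-memory records. The sample rows, the field names (date, text, new_visits), and the function name are hypothetical. COUNT is approximated here by counting rows, which may differ from how the platform evaluates the metric.

```python
from collections import defaultdict


def report_by_date(tweets, visits):
    """Build one row per date with the tweet count and the summed new visits."""
    rows = defaultdict(lambda: {"# of Tweets": 0, "# of Visits": 0})
    for tweet in tweets:
        # Approximates COUNT on Text by counting tweet rows for the date.
        rows[tweet["date"]]["# of Tweets"] += 1
    for visit in visits:
        # SUM on New Visits (Visitor) for the date.
        rows[visit["date"]]["# of Visits"] += visit["new_visits"]
    # Sort by date so that the output reads like the sliced report.
    return {date: rows[date] for date in sorted(rows)}


# Hypothetical sample data standing in for the two loaded datasets.
tweets = [
    {"date": "2013-05-01", "text": "first tweet"},
    {"date": "2013-05-01", "text": "second tweet"},
    {"date": "2013-05-02", "text": "third tweet"},
]
visits = [
    {"date": "2013-05-01", "new_visits": 40},
    {"date": "2013-05-02", "new_visits": 25},
    {"date": "2013-05-02", "new_visits": 5},
]

report = report_by_date(tweets, visits)
for date, metrics in report.items():
    print(date, metrics)
```

Both metrics line up in the same row only because the two datasets share a date key, which mirrors the role of the shared Date dimension in the combined model.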

Delete twitter.ldm file

After you have validated that the data has been uploaded, you should remove the twitter.ldm file from the combined project.

Steps:

  1. In the Project Explorer tab, open the model folder of the combined project.
  2. Secondary-click the twitter.ldm file and select Delete.
  3. The Twitter LDM is removed from the project.

Publish to a new project

When the validation report is complete, you can publish the CloudConnect Designer project to the platform.

NOTE: By default, the Twitter graph is designed to do a full replacement of the data each time it executes. Since the Twitter Search API limits the number of tweets retrieved per query to 100, you can retrieve at most the 100 most recent tweets at a time. Making the modifications to gather historical data exceeds the scope of this document.

Steps:

  1. When you are ready to publish the project, click the Project Explorer tab.
  2. Secondary-click the project containing the Twitter and Google Analytics items. Then, select Deploy As > Deploy CloudConnect project to GoodData Server.
  3. The Deployment configurations window is displayed.
    Figure: Deployment configurations window
  4. Note that you are creating a new Process, which contains the runtime components of the ETL graph that you authored in CloudConnect Designer. A process includes the graph and any associated schedules that you create for it (see below).
  5. Verify that the value for the GoodData project corresponds to the project to which you wish to publish your CloudConnect project. To choose a different project, click Select and make your selection.
  6. To deploy the CloudConnect project to the GoodData project, click Deploy.
  7. In the Console tab, a success message indicates that the project has been successfully deployed.

After the project has been deployed, you can schedule periodic execution of the ETL. For more information, see Automate Data Loading Process.