|Related Tags: data modeling cloudconnect ldm|
Fact table grain improves the performance of incremental uploads and simplifies data management. Normally, when deduplicating records you must create a special column or ‘connection point’ in your project. This approach leads to complexity in the ETL process as all attributes that form a unique identifier together must be concatenated, passed to a hash function and then added to the dataset.
This article describes where it is helpful to use the fact table grain feature, and how you can set the grain of a fact table. We also describe some useful examples from customers who already use this improvement.
Basic knowledge of MAQL DDL is helpful.
What is Fact Table Grain?
The fact table grain functionality sets a new compound primary key for a table. This means you no longer need to use connection points for incremental uploads to fact tables.
The grain of a fact table defines the lowest level of detail that the fact table is divided into. For more information about grain in fact tables, see Fact Table. As there may be billions of records in the table that need to be inserted or updated on a daily basis, full data upload requires potentially slow transfer of an extremely large amount of data with possibility of errors or missing values. Hence it is more advantageous to update tables of records incrementally. To do this each record must have a unique identifier so that only those records that need to be overwritten are overwritten.
For example, in a billing system that has invoices with multiple invoice lines, each line represents a product bought by a customer and does not have a natural primary key. Each Invoice has a unique Invoice ID and you can identify any entity by specifying the Invoice ID and Invoice Line. The Invoice ID and line number together form a compound key that uniquely identifies any record within the table invoice line. This can be set as the GRAIN of the table.
NOTE: Previously, you would have to create a ‘surrogate’ connection point in the Invoice line table to be able to append data incrementally and update it, if needed (for example, in case of errors in previous uploads).
Where Fact Grain Helps
Fact Grain helps where there are extremely large volumes of data that need to be updated regularly. However, owing to the effort needed to implement we do not recommend using Fact Grain when projects:
- are small, i.e. no more than 1 GB
- do not use incremental upload mode, but only full
- do not contain fact tables with connection points
Key Fact Table Grain Benefits:
- Simplifies incremental loading of data. This is especially useful when working with large fact tables leading to much quicker ETL times.
- You no longer have to worry about surrogate keys, because the system checks for existing data rows.
Using Fact Table Grain
GRAIN can be implemented through API, MAQL DDL scripting, or CloudConnect LDM Modeler:
- For more information about setting the grain via API, see API reference.
- Fact tables that have a grain do not have to contain facts. In this case, the common loading method of all attributes is applied to this table as well.
- Attributes that are part of a grain must not contain any labels.
- If you remove grain from an attribute with loaded data and then add a connection point to that attribute, you must refresh data when you upload new data (a full upload; or a hard synchronize, see Managing LDM Project Changes).
Limits: 2-32 attributes or references only (that is, a compound key can be up to 32 columns max)