Skip to main content

 

ThoughtSpot

Data Connect best practices

Overview

Explains what the feature is or what its benefits are to the user or customer.

Feature

This section includes some tips and best practices for using ThoughtSpot Data Connect.

Status reporting

You can see the status of a data load job by looking at the list of data sources. The data load status gets updated every 30 seconds.

Figure: Data load status

File:Integration_Guides/data_connect_guide_bookmap/030/020/data_connect_load_status.png

To see loading details for each table that makes up a data load, click on the name of the data load.

Figure: Data load status details

File:Integration_Guides/data_connect_guide_bookmap/030/020/data_connect_load_table_status.png

View log files

You can download the session logs for a data load run. Access the link to download the log on the data load status details page for the session. Check the log files for a successful or failed run, and fix the cause of the failure before resubmitting the job.

Figure: Download session log

File:Integration_Guides/data_connect_guide_bookmap/030/020/data_connect_session_log.png

Import relationships

Relationships (primary key/foreign key) can be imported from the source database. For this to work, the primary and foreign key(s) need to be made up of only one column each. That is, they cannot be composite keys. You will also need to include all the tables whose relationships you want to bring over into ThoughtSpot within a single data source.

Edit the DDL script to choose a different database, schema, or table

You can edit the SQL DDL (data definition language) script generated by ThoughtSpot Data Connect to point to an existing table in ThoughtSpot. You can then load to that table and/or create the table in a different database/schema than what the default DDL suggests. To do this, the column names and the order of the column names must match exactly with the table you are importing into in ThoughtSpot.

No need to import a column mentioned in an expression

It is not necessary to import a column that is mentioned in an expression. For example, let's say you have a source table "sales_fact" with a column "revenue". You may want to use that column as part of an expression that calculates gross margin. But maybe you don't want to include the column "revenue" in the target table. You can unselect the column "revenue" in the table selection page, and then add your gross margin expression that references the column "revenue", even though it isn't part of the data to be loaded. Your expression will still work.

Internet connectivity

The only internet access the needed for ThoughtSpot Data Connect to operate is outbound access to https://app.informaticaondemand.com and https://app2.informaticacloud.com/ (port 443). Of course, it also needs any other internet connectivity required to access any cloud data sources, like Salesforce. No inbound access is needed.

Renaming columns and table

You can rename columns and tables after a data load, using the ThoughtSpot web application. Select Data and click on the name of the table. Then click on a column to rename it.

In older versions of ThoughtSpot Data Connect, you could not use characters other than alphanumeric and underscores. Now, you can change a column name to use those, for example you could change "percent_gross_margin" to "% gross margin".

Troubleshooting

To view details about error conditions, you can look at the trace file that is generated when there is a red bar error.

Possible issues that can occur are:
  • Cannot connect to data source.
  • Authentication was rejected.
  • Schema being loaded does not match schema receiving the load.

Best practices

Unless you're trying to bring in relationships between tables with a data source, the recommendation is to create one data source load job per table. Doing this makes your data sources more flexible for if/when you need to change something (add a column, etc.)

If you want to add a foreign key, do it using the TQL command ALTER TABLE, after the load has completed. This is often easier than trying to get relationships to come in with a data load, and it works even if the relationship you want to create does not exist in the source system. If use this method, you'll need to manually create the relationships using a TQL script, which you can run afterward the first data load.

Derived column names

When you create a derived column, further activities in ThoughtSpot Data Connect will show you the target column name, since there is no analogous column in the source.

Compact tables to save memory

If you're doing incremental loads into existing tables, you may want to automate a script that runs the TQL command COMPACT TABLE either after each incremental load, or periodically (like weekly, for a nightly load). This is because when upserts are performed, space from the old rows is not released. The delta between each row that gets upserted is retained and takes up memory.

Expressions

You can validate your expressions when using ThoughtSpot Data Connect. Make sure to click Validate after entering your expression. This prevents you from entering an invalid expression that will fail at runtime.

When using the expression builder concat, you need to use single quotes instead of double quotes around each of the strings.

Connection parameters

Informatica provides documentation on each of its connectors. This contains information on the connection properties and what they mean for each connector. You can see the connection properties in ThoughtSpot Data Connect by mousing over the key value pairs, however, you won't be able to see their meanings. For that, you should refer to the Informatica documentation.

  • Was this article helpful?