Querying AWS DynamoDB with User Defined Java Class

A quick example of how to use a User Defined Java Class (UDJC) to query Amazon DynamoDB with Pentaho 8.2.x (it probably works in other versions as well).

The UDJC uses the local machine's DefaultAWSCredentialsProviderChain to find authentication credentials (environment variables, Java system properties, profile, etc.).

The example could be extended to build a single request containing multiple items and send them as one batch.
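As a rough illustration of the batching idea, the sketch below groups keys into BatchGetItem-style request payloads. It is written in Python rather than in the UDJC's Java and only builds plain dictionaries; it does not call AWS. The table name `etl_config` and the key attribute `config_key` are hypothetical, and the 100-key ceiling reflects DynamoDB's documented BatchGetItem limit at the time of writing.

```python
# Assumption: BatchGetItem accepts at most 100 keys per request.
MAX_KEYS_PER_BATCH = 100


def build_batch_get_requests(table_name, keys, batch_size=MAX_KEYS_PER_BATCH):
    """Split a list of DynamoDB keys into BatchGetItem-style payloads."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    requests = []
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        # Each payload mirrors the RequestItems shape: table -> list of keys.
        requests.append({"RequestItems": {table_name: {"Keys": chunk}}})
    return requests


# Hypothetical table and key attribute names, for illustration only.
keys = [{"config_key": {"S": "server-%d" % i}} for i in range(250)]
requests = build_batch_get_requests("etl_config", keys)
print([len(r["RequestItems"]["etl_config"]["Keys"]) for r in requests])
# prints [100, 100, 50]
```

A real client would also need to resend any keys that DynamoDB reports back as unprocessed.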

The transformation can be used to fetch configuration information for an execution, e.g. to read server properties for a project/environment/tenant and set them at the top of your ETL cycle.


Click here for sample put/get to DynamoDB from AWS CLI
Click here for transformation
Click here for UDJC

How to launch and watch a job on Pentaho/Carte Server 7.x+

Since version 7.x, jobs (or transformations) can be kicked off remotely on a Pentaho Server (or a Carte Server).

To see the status of the ETL that is being executed, browse to http://SERVER_IP:SERVER_PORT/kettle/status. This status page, which lists currently and recently executed ETL, has been available since at least version 4.x.

The shell script provided starts execution of a job stored in the Pentaho Repository (NOTE: the syntax may change slightly between versions, and between Enterprise and File System repositories). It then monitors the job by calling the kettle/status endpoint with the id generated at submission.

The status of the job is then evaluated to determine the overall SUCCESS or FAILURE state of the requested job. This makes the script usable from scheduling tools for regular execution, or from CI/CD tools, perhaps as part of a testing framework.
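The poll-and-evaluate loop described above might look roughly like the Python sketch below. It is not the provided shell script. The element names `status_desc` and `result/nr_errors` and the status strings "Running", "Waiting" and "Finished" are assumptions modeled on Carte's XML status output and may differ between versions. `fetch_status_xml` is a hypothetical callable that returns the status XML for the submitted job id, for example via an HTTP GET against the server.

```python
import time
import xml.etree.ElementTree as ET

# Assumed status strings for a job that has not finished yet.
RUNNING_STATES = ("Running", "Waiting")


def classify_job_status(status_xml):
    """Map a status XML document to RUNNING, SUCCESS or FAILURE."""
    root = ET.fromstring(status_xml)
    status = (root.findtext("status_desc") or "").strip()
    errors = (root.findtext("result/nr_errors") or "0").strip()
    if status in RUNNING_STATES:
        return "RUNNING"
    if status == "Finished" and errors == "0":
        return "SUCCESS"
    # Anything else (stopped, finished with errors, unknown) counts as failure.
    return "FAILURE"


def wait_for_job(fetch_status_xml, poll_seconds=5.0, max_polls=120,
                 sleep=time.sleep):
    """Poll until the job leaves the running state or the poll budget ends."""
    for _ in range(max_polls):
        state = classify_job_status(fetch_status_xml())
        if state != "RUNNING":
            return state
        sleep(poll_seconds)
    return "FAILURE"  # treated as a timeout


# Canned responses stand in for real HTTP calls in this illustration.
responses = iter([
    "<jobstatus><status_desc>Running</status_desc></jobstatus>",
    "<jobstatus><status_desc>Finished</status_desc>"
    "<result><nr_errors>0</nr_errors></result></jobstatus>",
])
print(wait_for_job(lambda: next(responses), sleep=lambda seconds: None))
# prints SUCCESS
```

Returning a single SUCCESS or FAILURE value keeps the caller simple: a scheduler or CI/CD step only has to translate it into an exit code.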

Click here for script
Click here for Pentaho Documentation

A simple date dim generator

A small Python script to generate a date dimension for your data warehouse. It uses YYYYMMDD as an intelligent surrogate key.

Use with Python 2.7.x:

kiran@zeus:~$ python makedate.py
makedate.py -s<startyear> -n<numberofyears> -o<outputfilename>
 NOTE: output includes 19000101 record
kiran@zeus:~$ python makedate.py -s1900 -n5 -odate.csv
COMPLETED writing file date.csv
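For readers who want the idea without opening the script, here is a minimal sketch of a YYYYMMDD-keyed date dimension. It is not the code of makedate.py: the column names are hypothetical, it returns rows instead of writing a CSV file, and it does not add the extra 19000101 record mentioned in the NOTE above. It is written for Python 3, whereas the original targets Python 2.7.x.

```python
import datetime


def generate_date_dim(start_year, num_years):
    """Return one row per calendar day, keyed by an integer YYYYMMDD."""
    rows = []
    day = datetime.date(start_year, 1, 1)
    end = datetime.date(start_year + num_years, 1, 1)  # exclusive upper bound
    one_day = datetime.timedelta(days=1)
    while day < end:
        rows.append({
            # Intelligent surrogate key: the date itself encoded as YYYYMMDD.
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            "full_date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "day": day.day,
            "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        day += one_day
    return rows


rows = generate_date_dim(1900, 5)
print(len(rows), rows[0]["date_key"], rows[-1]["date_key"])
# prints 1826 19000101 19041231
```

Because the key is derived from the date, fact rows can be joined or filtered by date range without a lookup against the dimension.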

Click here for code