Now we will put together a simple ETL orchestration with Matillion. Essentially we are creating a workflow of jobs. When doing this, be sure to take into account job run times, load on the system, and dependencies (i.e., which tables need to be loaded before a job can run).
Here we will just create one master sequence to run all the jobs. Normally I would have more levels, grouping similar jobs into separate batches, with an overall master batch to run everything. You also want to think about how you would rerun these jobs if they were to fail. Since we have a simple destructive load here that runs in a short period of time, if something goes wrong we can just rerun the master. In real life it may not be that simple.
Let’s first create a new “Batch” folder.
Add a new Orchestration Job named ‘Master Batch’.
Click ‘Yes’ when prompted to switch to the new job.
Grab the staging jobs and drag them onto the canvas. Connect the Start component to each job as depicted below.
Under Components, drag the ‘And’ component onto the canvas. Connect each staging job to it, being sure to grab the GREEN connectors. These are success flows, followed only when a job completes successfully. The ‘And’ component waits until all incoming connections have completed before executing the next step.
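Outside of Matillion, this fan-out/fan-in pattern can be sketched in plain Python. This is only an illustration of the ‘And’ behavior, not how Matillion runs jobs internally, and the table names are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

def stage(table):
    # Placeholder for the extract/load work one staging job performs.
    return f"{table}: ok"

# Hypothetical staging jobs, standing in for the jobs on the canvas.
staging_tables = ["stg_customer", "stg_product", "stg_sales"]

# Fan out: run all staging jobs in parallel.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(stage, staging_tables))

# Fan in: the 'And' behavior -- proceed only once every job has succeeded.
if all(r.endswith("ok") for r in results):
    print("All staging jobs complete; continuing to the next step")
```

The key point is the barrier: nothing downstream starts until every parallel branch reports success.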
Now drag the dimension and fact jobs onto the canvas. Connect them from the ‘And’ component.
Note: Due to the way our jobs are designed, the fact table and dimension tables can be processed in parallel. Whether that is safe depends on what data the ETL job accesses. In some systems the fact job joins to the dimension tables to look up surrogate keys, which would mean the dimension tables must be fully loaded before the fact table could run.
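If the fact load did depend on the dimensions, the required run order could be expressed as a small dependency graph. A sketch using Python's standard-library graphlib, with hypothetical job names:

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on (hypothetical names).
# Here fact_sales joins to the dimensions for surrogate keys, so it
# must wait for them; the dimensions in turn wait on their staging jobs.
deps = {
    "dim_customer": {"stg_customer"},
    "dim_product":  {"stg_product"},
    "fact_sales":   {"stg_sales", "dim_customer", "dim_product"},
}

# static_order() yields a valid run order: staging first, then
# dimensions, with the fact job last.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

In Matillion you would express the same thing by connecting the dimension jobs into a second ‘And’ component ahead of the fact job, rather than running them in parallel.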
Right-click the Start component and select ‘Run from Component’.
Check the log for a complete run. You may also want to run some of the validation SQL from earlier sections.
Check the Tasks tab. Click the arrow on the far right to see a more detailed view.
Check the results of the query from the fact build section.