Become a Data Guru

With the proliferation of cloud and big data technologies, new tools keep appearing that let us do things we never could before. Because the landscape is evolving so rapidly, I wanted to do something to help organizations and professionals keep up.

Who this site is for:

  1. Organizations that want to run a proof of concept for certain technologies or BI/DW architectures.
  2. Anyone looking to learn BI/DW architecture concepts.
  3. Individuals who would like to create their own sandbox environment to explore or learn new tools.

How to use this site: Click the ‘Setup’ menu and follow the instructions to configure the source and target servers/data stores. Next, select the ETL tool from the Setup menu and follow the install instructions. Then follow the steps in the corresponding ETL drop-down menu to build a small data mart. I recommend working through the ETL sections in the order they are shown.


Currently, this site walks you through the following build-out:

  1. General Setup – Launch an EC2 instance (a server in the AWS cloud) to act as the data source. Download the IMDB data files.
  2. Target Setup – Configure the target datastore.
  3. ETL Setup – Install and configure the ETL software.
  4. Load Data to S3 – Transfer the data to S3 (object storage on AWS).
  5. S3 to Target – Load this data into staging tables in the target datastore.
  6. Create Star Schema – Transform this data into facts and dimensions.
  7. Orchestrations – Build out an orchestration to run this daily (coming soon).
  8. Business Intelligence – Develop a simple dashboard to display the results (possibly coming soon).
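
To make steps 5 and 6 a little more concrete, here is a minimal sketch of the staging-to-star-schema idea. It is not the Matillion/Snowflake build this site walks through: it uses Python's built-in sqlite3 module as a stand-in datastore, and the table names (stg_titles, dim_genre, fact_rating), the column names, and the sample rows are all made up for illustration. They are not the real IMDB file layout.

```python
import sqlite3


def load_sample_staging(conn):
    """Create a flat staging table and fill it with made-up rows.

    In the real build-out this data would come from files landed in S3;
    the rows below are hypothetical placeholders.
    """
    conn.execute(
        "CREATE TABLE stg_titles ("
        "title TEXT, release_year INTEGER, genre TEXT, rating REAL)"
    )
    conn.executemany(
        "INSERT INTO stg_titles VALUES (?, ?, ?, ?)",
        [
            ("Title A", 2001, "Drama", 8.0),
            ("Title B", 2003, "Drama", 7.0),
            ("Title C", 2005, "Comedy", 6.5),
        ],
    )


def build_star_schema(conn):
    """Split the staging table into one dimension and one fact table."""
    # Dimension: one row per distinct genre, each with a surrogate key.
    conn.execute(
        "CREATE TABLE dim_genre ("
        "genre_key INTEGER PRIMARY KEY, genre_name TEXT UNIQUE)"
    )
    conn.execute(
        "INSERT INTO dim_genre (genre_name) "
        "SELECT DISTINCT genre FROM stg_titles ORDER BY genre"
    )
    # Fact: one row per title, pointing at the dimension by surrogate key.
    conn.execute(
        "CREATE TABLE fact_rating ("
        "title TEXT, release_year INTEGER, "
        "genre_key INTEGER REFERENCES dim_genre(genre_key), rating REAL)"
    )
    conn.execute(
        "INSERT INTO fact_rating "
        "SELECT s.title, s.release_year, d.genre_key, s.rating "
        "FROM stg_titles s JOIN dim_genre d ON d.genre_name = s.genre"
    )
    conn.commit()


def average_rating_by_genre(conn):
    """A typical dashboard-style query against the star schema."""
    return conn.execute(
        "SELECT d.genre_name, ROUND(AVG(f.rating), 2) "
        "FROM fact_rating f JOIN dim_genre d ON d.genre_key = f.genre_key "
        "GROUP BY d.genre_name ORDER BY d.genre_name"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample_staging(conn)
    build_star_schema(conn)
    for genre, avg_rating in average_rating_by_genre(conn):
        print(genre, avg_rating)
```

The same pattern (load raw rows into staging, derive dimensions with surrogate keys, then join back to build the fact table) should carry over to whichever target datastore and ETL tool you pick, though the exact SQL dialect and load commands will differ.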

[Figure: General Architecture]

The idea is to do the above tasks in a variety of ways, so the audience can decide what is best for them and what setup they would like to try.

Manual steps will be added soon. If all you care about is having a data model for performance testing or for building dashboards, the manual steps should give you a quick way to get there.

Currently, I have a basic setup complete using Matillion and Snowflake. More tools and methods will follow. I will get into more advanced ETL techniques using an open source tool to keep my personal development costs down. If you are brand new to this type of work, Matillion is a great starting point because it is a visual tool.

This site is all about feedback!

  • This is hot off the press. I am releasing it before final formatting, editing, and revisions because I want to start getting feedback on the content.
  • The shape and direction of this site will depend on user feedback.  Please leave a comment if there is something you would like to see in the future.
  • If you see a better way of doing something, are confused, have a question, or want to see more of something, leave a comment.

Up next:

Either Serverless ETL with AWS Lambda or ETL Orchestration with Apache Airflow.

