In this section we will move data from our source system to AWS S3.
Let’s first create an S3 bucket to stage our movie data files. Navigate to Services->S3.
Select Create bucket
Give the bucket a name. Note: Bucket names must be unique across all of AWS.
Check to make sure your bucket exists.
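Because bucket names must be globally unique, it can save a failed attempt to sanity-check the name first. A minimal sketch (this helper is hypothetical, not part of any AWS SDK) of the basic S3 naming rules: 3-63 characters of lowercase letters, digits, hyphens, and dots, beginning and ending with a letter or digit. Uniqueness itself can only be confirmed by actually creating the bucket.

```python
import re

# Hypothetical helper: checks the basic S3 bucket-naming rules
# (3-63 chars; lowercase letters, digits, hyphens, dots; must start
# and end with a letter or digit). It cannot check global uniqueness.
def looks_like_valid_bucket_name(name: str) -> bool:
    return bool(re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name))

print(looks_like_valid_bucket_name("my-movie-data-poc"))  # True
print(looks_like_valid_bucket_name("My_Bucket"))          # False
```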
Let’s create an Orchestration job in Matillion to load the S3 bucket. Right click on default and select ‘Add Orchestration job’.
Name the job and click ‘OK’.
Under Components select Load/Unload->S3 Put Object and drag it onto the canvas.
Attach ‘Start 0’ to ‘S3 Put Object 0’ by clicking once on the circle to the right of ‘Start 0’, moving your cursor onto ‘S3 Put Object 0’, and clicking again.
Select ‘S3 Put Object 0’ stage by clicking once on it, and fill out the Properties in the panel below.
Input Data Type: SFTP
Input Data URL: the URL of the EC2 instance holding your source data files, formatted as sftp://[Private IP]:22[Directory][Filename]. Example: sftp://172.31.239.117:22/home/ec2-user/title.basics.tsv.gz
Output Object Name: the name the file will have in S3
Username: ec2-user
SFTP Key: copy and paste the full contents of MyPOCKeyPair.pem.txt
S3 Path: s3://[bucket name]
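The Input Data URL is just the private IP, port, directory, and filename glued together. A small sketch (outside Matillion, with the example values from above) of how that URL is assembled:

```python
# Sketch: assembling the SFTP Input Data URL from its parts.
# The IP, directory, and filename below are the example values
# used in this walkthrough, not required settings.
def sftp_url(private_ip: str, directory: str, filename: str, port: int = 22) -> str:
    return f"sftp://{private_ip}:{port}{directory}/{filename}"

url = sftp_url("172.31.239.117", "/home/ec2-user", "title.basics.tsv.gz")
print(url)  # sftp://172.31.239.117:22/home/ec2-user/title.basics.tsv.gz
```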
Since this transfer uses SFTP, we need to open port 22 on the EC2 server holding our source data.
Navigate to the EC2 console. Select our t2.medium server that is running Matillion. Note its private IP.
Now select the t2.micro that is our source system, and click the security group link.
Click the Edit Button on the Inbound tab.
Click ‘Add Rule’. Then type in port range ’22’. Add the private IP from the Matillion server. Be sure to add the CIDR block ‘/32’ at the end of the IP. Click Save.
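The security-group rule wants the source in CIDR notation, and /32 simply means "exactly this one address". A quick sketch using Python's standard `ipaddress` module (the IP is the example value from earlier):

```python
import ipaddress

# Sketch: turning the Matillion server's private IP into the /32 CIDR
# string expected by the inbound security-group rule.
def single_host_cidr(ip: str) -> str:
    # A /32 network contains exactly one address
    return str(ipaddress.ip_network(f"{ip}/32"))

print(single_host_cidr("172.31.239.117"))  # 172.31.239.117/32
```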
Run the Matillion job: right-click on the ‘Start 0’ stage and select ‘Run From Component (Dev)’.
Check the Tasks tab for a ‘Completed’ status.
If the job failed, click the arrow-in-a-box icon on the far right of the run status to expand that window (the arrow is found under the Tasks tab, just to the right of the ‘Completed’ time). The most likely issues are an incorrect SFTP address, an incompletely copied RSA key, or, as in the error below, a port that is not open.
After successful run, check S3 to verify the file made it.
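Instead of eyeballing the console, you can check for the object programmatically. A sketch assuming a boto3-style S3 client (its `head_object` call raises an error when the key is missing); the bucket and key names in the usage comment are examples:

```python
# Sketch: return True if the object exists in the bucket, using any
# boto3-style client that exposes head_object(Bucket=..., Key=...).
def object_exists(s3_client, bucket: str, key: str) -> bool:
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except Exception:
        return False

# With a real client this would look like:
#   import boto3
#   s3 = boto3.client("s3")
#   object_exists(s3, "my-movie-data-poc", "title.basics.tsv.gz")
```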
Matillion does not currently support prompting the user for variable values. That would have been nice, because then we could have built one job and prompted for the file name. Let’s clean this up and add a few variables anyway.
Right click on ‘default’ folder in explorer, and select ‘Create Folder’.
Add a name.
Right click on the job and select ‘Manage Job’.
Enter new name.
Drag the newly named job ‘Src_S3_Title_Basics’ to the folder ‘Src_to_S3’.
Now let’s add a project level variable for the server ip address. Click project drop down. Select ‘Manage Environment Variables’.
Click the plus sign at the bottom and add the variable name, data type, and default value.
Single-click ‘S3 Put Object 0’. Select the Export tab, then Manage Variables -> Manage Variables.
Click the plus sign at the bottom of the Manage Job Variables window. Add the variable name ‘filename’, and ‘title.basics.tsv.gz’ as the value. Select ‘Copied’ as the behavior, which means the variable’s value is passed down each branch independently. The other option, ‘Shared’, shares one value across all nodes.
Now add the variables to the properties. Be sure to use the ${variable} format. In this example the variable names would be ${src_server_ip} and ${filename}.
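The ${variable} substitution Matillion applies to component properties works like ordinary template expansion. A sketch using Python's standard `string.Template` (the variable names and values are the examples from the steps above):

```python
from string import Template

# Sketch of ${variable}-style substitution into a property value.
# Variable names and values come from the examples in this walkthrough.
values = {"src_server_ip": "172.31.239.117", "filename": "title.basics.tsv.gz"}
url_template = Template("sftp://${src_server_ip}:22/home/ec2-user/${filename}")
print(url_template.substitute(values))
# sftp://172.31.239.117:22/home/ec2-user/title.basics.tsv.gz
```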
Run the job again to make sure it still works. Then copy, rename, and create a job for each file.