Source Data Server

In this section we will create an AWS EC2 instance to act as our source data server.  We will load data here from imdb.com.

Now let’s launch an EC2 instance to hold our data.  This will act as our source system. In the AWS console, go to Services -> EC2.

Setup-EC2 Click

Click Launch Instance.

Setup-launch instance

Select Amazon Linux.

Setup-Quickstart select

Choose t2.micro.  This will be free tier eligible.

setup - select instance type

Keep the default network and subnet.  Set Auto-assign Public IP to ‘Enable’. This gives the server a public (external) IP address, which makes it easy to connect to the server from outside the VPC (Virtual Private Cloud).

setup config details

Bump up the storage a little here, because we are going to load some data on this server.

setup storage

Keep ‘Assign a security group’ set to its default, ‘Create a new security group’.  Select ‘My IP’ as the source in the rules table below. This allows SSH traffic on port 22 into this server from our IP address only, so we can open an SSH terminal session to the server.

setup security

Create a new key pair.  Give the new key pair a name.  Download the key pair. DO NOT lose this file; AWS does not let you download it again.  We will use this key pair to SSH into this machine. Click Launch.

setup key pair

Click view running instances at the bottom of the page.  Alternatively, go to Services -> EC2, then click ‘Running Instances’ on the right, or ‘Instances’ under EC2 Dashboard.

Copy your Public DNS.

setup - public dns
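If you happen to have the AWS CLI installed and configured (an assumption; this walkthrough itself only uses the console), the Public DNS can also be looked up from a terminal. A minimal sketch, where `public_dns` is a hypothetical helper name and the instance ID is a placeholder you would replace with your own:

```shell
# Hypothetical helper: print the public DNS name of one EC2 instance.
# Assumes the AWS CLI is installed and configured with credentials and a region.
public_dns() {
  local instance_id="$1"
  aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicDnsName' \
    --output text
}
```

Usage would look like `public_dns i-0123456789abcdef0`, with the instance ID copied from the EC2 console.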

Open a terminal window on Mac and chmod the pem file (the key pair downloaded earlier) to limit access to it.  SSH refuses to use a private key file that other users can read.

>chmod 400 MyPOCKeyPair.pem.txt
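To confirm that the permissions took effect, the step above can be sketched as a small function. This is only an illustration: `lock_key` is a hypothetical name, and it uses mode 400 (read-only for the owner), which is enough for SSH to accept the key.

```shell
# Hypothetical helper: make a key file owner read-only and print its mode.
lock_key() {
  local key="$1"
  chmod 400 "$key" || return 1
  # GNU stat (Linux) uses -c, BSD stat (macOS) uses -f; try one, then the other.
  stat -c '%a' "$key" 2>/dev/null || stat -f '%Lp' "$key"
}
```

Running `lock_key MyPOCKeyPair.pem.txt` should print the file's octal mode, so you can see at a glance that group and other users have no access.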

Now use a terminal window on Mac, or PuTTY on Windows, to SSH into our EC2 instance.

Type the command -> ssh -i [/Path/KeyPair] ec2-user@[Public DNS]

Example (my key pair file is in my ‘~/Downloads’ directory).

>cd ~/Downloads

>ssh -i MyPOCKeyPair.pem.txt ec2-user@ec2-18-207-211-107.compute-1.amazonaws.com
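If you connect often, the command can be wrapped in a small function so the argument order is harder to get wrong. A minimal sketch, assuming the default `ec2-user` login for Amazon Linux; `connect_ec2` is a hypothetical name, not something AWS provides:

```shell
# Hypothetical wrapper: SSH into an EC2 instance with a given key file.
# Usage: connect_ec2 <path-to-key-file> <public-dns>
connect_ec2() {
  local key="$1" host="$2"
  # Fail early with a clear message if the key path is wrong.
  if [ ! -f "$key" ]; then
    echo "key file not found: $key" >&2
    return 1
  fi
  ssh -i "$key" "ec2-user@$host"
}
```

For example: `connect_ec2 MyPOCKeyPair.pem.txt ec2-18-207-211-107.compute-1.amazonaws.com`.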

Once connected, download the IMDb datasets onto the server using the following commands:

> wget https://datasets.imdbws.com/name.basics.tsv.gz

> wget https://datasets.imdbws.com/title.akas.tsv.gz

> wget https://datasets.imdbws.com/title.basics.tsv.gz

> wget https://datasets.imdbws.com/title.crew.tsv.gz

> wget https://datasets.imdbws.com/title.principals.tsv.gz

> wget https://datasets.imdbws.com/title.ratings.tsv.gz

setup wget
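The six downloads above can also be written as a loop. The sketch below is one possible way to do it, not the only one: `dataset_url` and `download_all` are hypothetical helper names, and the integrity check with `gzip -t` is an extra step that the commands above do not perform.

```shell
BASE_URL="https://datasets.imdbws.com"

# Build the download URL for one dataset name, e.g. "title.ratings".
dataset_url() {
  printf '%s/%s.tsv.gz\n' "$BASE_URL" "$1"
}

# Download all six files, checking each archive after it arrives.
download_all() {
  for name in name.basics title.akas title.basics \
              title.crew title.principals title.ratings; do
    wget -c "$(dataset_url "$name")" || return 1   # -c resumes a partial download
    gzip -t "$name.tsv.gz" || return 1             # -t tests archive integrity
  done
}
```

Nothing is downloaded until you call `download_all` on the EC2 instance. The function stops at the first failed download or corrupt archive.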
