APT Documentation: AWS GPU Back End

The information below will guide you through understanding, setting up, and configuring Amazon Web Services Elastic Compute Cloud (AWS EC2) for use with APT. Once setup is complete, APT will be able to perform deep learning training and inference in the cloud.

Requirements
About EC2 instances
EC2 Setup
Managing EC2

Requirements

The steps below will walk you through obtaining the following:

Windows-only: Git. Follow the installation instructions and install to the default location C:\Program Files\Git. To communicate with EC2, we use the ssh and scp packaged with Git. You can also use Git to download the latest version of APT.
An AWS EC2 account with access to p2.xlarge instances. Instructions to set this up are below.
AWS Command Line Interface (CLI): install instructions here. We used the following:
- Windows: MSI installer.
- Linux: awscli-bundle.zip

About EC2 instances

You can think of an EC2 instance (as configured below) as a computer with a GPU in the cloud that is under your control: your "GPU in the cloud". An instance has:

An instance ID, which uniquely identifies it.
A (public) IP address, so you can connect to it via ssh and examine its processes and filesystem contents.
A state that is either RUNNING, STOPPED, or TERMINATED:
- If the state is RUNNING, your instance is either actively computing or ready to do so. You are paying about $1/hour.
- A state of STOPPED is analogous to having your desktop workstation in "hibernation" or "sleep" mode. No computations are being carried out, but the results of previous computations (trained models, etc.) are saved in the remote filesystem. You are paying a very low price based on the amount of disk storage ("EBS" storage in AWS-speak) that has been allocated for your instance. For 50GB of storage (the current default), you are paying $5/month.
- A state of TERMINATED means that your instance has ceased to exist and returned to the ephemeral protoplasm of the cloud! Any and all state is destroyed at termination, so before you terminate an instance, make sure you tell APT to download all your trained models. As you might guess, terminated instances do not cost anything.
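
To make the cost trade-off concrete, here is a minimal sketch that combines the two prices quoted in this document ($0.90 per RUNNING hour for p2.xlarge and $5/month for the default 50GB of storage). The function name `estimate_monthly_cost` is made up for illustration, and actual AWS prices may differ from these figures.

```shell
# Rough monthly cost estimate from the prices quoted in this document.
# Assumptions: $0.90 per RUNNING hour, $5.00/month for 50GB of EBS storage.
estimate_monthly_cost() {
    # $1: number of hours the instance spends RUNNING during the month
    awk -v hours="$1" 'BEGIN { printf "%.2f\n", hours * 0.90 + 5.00 }'
}

# Example: 40 hours of training and tracking in one month
estimate_monthly_cost 40
```

Under these assumptions, 40 hours of use comes to 41.00 dollars for the month, whereas leaving the instance RUNNING for all 720 hours of a 30-day month would come to 653.00 dollars.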

Since STOPPED instances are inexpensive, APT is currently designed around the idea that you will create an EC2 instance and "leave it up" for a stretch of time (eg weeks, maybe even a month or two) while you do a bunch of work for a project. During this time, you can iteratively label, train, and track within APT over multiple sessions. Between active working sessions, your instance is STOPPED, and all APT state including trained models and movies/trxfiles to be tracked is preserved. After a time, you will reach a stopping point for the project, and you can instruct APT to download your trained models to your local workstation. (Tracking results are currently downloaded immediately after each tracking session.) When the download is complete, you can terminate the EC2 instance.

APT automates starting/stopping your instance (TODO: check), starting/stopping training and tracking processes, and so on. However, it is inevitable that APT will at times become disconnected from what is happening in the cloud. At these times, you will need to manage the EC2 instance manually, e.g. by ssh-ing into the instance and killing processes, or by stopping the instance via the AWS dashboard. More information on how to do this is in the Managing EC2 section.

Be proactive and check on your instance!

EC2 Setup

AWS account setup

Create an AWS account and set up the payment system. This is the root account for AWS. You can create an account here.
Create an IAM user (or users) from within the root account:
- Choose a user name and set the Access type to Programmatic Access.
- To set permissions, select Attach existing policies directly, and add the AdministratorAccess and IAMUserChangePassword policies.
Log in to the AWS user account using the console.

Connect your computer to your account

Create Access Keys. An Access Key pair works like a login and password for the AWS command-line tools and is required to start an instance.
- To create a new Access Key pair for an IAM user, open the IAM console or look for IAM under Services then Security, Identity & Compliance on your console home page.
- Click Users in the Details pane, click the appropriate IAM user, and then click Create Access Key on the Security Credentials tab.
- Save the access keys anywhere (e.g. in your $HOME/.ssh folder).
- On Linux, change the permission to be read-only using
```
chmod 400 path/to/access_key.csv
```
If you do not already have one, create an ssh key pair. Note that this ssh key pair is different than the above Access Keys. The ssh key pair is used to ssh into a running instance while Access Keys are required to create an instance.
- Go to the EC2 console or look for EC2 under Services - Compute on your console home page.
- On the left, click on Key Pairs under Network & Security and then use the Create Key Pair button. Make note of the name and stored location of your key pair; you will need to refer to them when using APT.
- Save the key pair in your $HOME/.ssh folder.
- On Linux, change the permissions to read-only using chmod 400 ~/.ssh/key_pair.pem.
Alternatively, if you already have an ssh key pair, you can import your public key by clicking Import Key Pair. Make note of the name and stored location of your key pair; you will need to refer to them when using APT.
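
Since ssh generally refuses private keys that other users can read, it can be worth confirming the permissions before pointing APT at the key. The following is a small sketch; the function name `key_is_read_only` and the path in the usage comment are placeholders, not part of APT.

```shell
# Succeeds only if the given key file has mode 400 (read-only, owner only).
key_is_read_only() {
    # $1: path to the private key file
    local mode
    # GNU stat (Linux) takes -c; BSD stat (macOS) takes -f.
    mode=$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null) || return 2
    [ "$mode" = "400" ]
}

# Usage (placeholder path):
# key_is_read_only ~/.ssh/key_pair.pem || chmod 400 ~/.ssh/key_pair.pem
```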
Increase the number of p2.xlarge instances you are allowed to create. p2.xlarge is the basic instance type with a GPU suitable for training APT trackers, and it costs $0.90/hr. To request the increase, use the Limits option on the left of the EC2 console and request a limit increase for p2.xlarge. We suggest increasing the limit to at least 2 instances so that you can train on one instance and track on the other. Note that this step takes a day or so because Amazon verifies the request. You can, however, continue with the rest of the setup while you wait for the increase to be approved.
Install the AWS Command Line Interface (CLI) following the instructions here.
- Windows: We installed using the MSI installer.
- Linux: We installed using the bundled installer.
Configure the AWS CLI following these instructions in a terminal/command prompt. As part of this, you'll need to enter your AWS Access Key. For Janelians, the default region is us-east-1, Northern Virginia. Choose the default output format as json.
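
If you prefer not to step through the interactive prompts for the region and output format, those two defaults can also be set with `aws configure set`. This is a sketch only: the function name `set_apt_cli_defaults` is made up for illustration, and it assumes your Access Key has already been entered (for example via `aws configure`).

```shell
# Store the default region and output format for the AWS CLI.
set_apt_cli_defaults() {
    # $1: region name (falls back to us-east-1, the default mentioned above)
    aws configure set region "${1:-us-east-1}" &&
    aws configure set output json
}

# Usage:
# set_apt_cli_defaults          # stores us-east-1 and json
# aws configure get region      # prints the stored region
```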

Check your CLI setup. To do this, in your terminal, type the following command:

aws ec2 describe-regions --output table

The output should be something like:

----------------------------------------------------------
|                     DescribeRegions                    |
+--------------------------------------------------------+
||                        Regions                       ||
|+-----------------------------------+------------------+|
||             Endpoint              |   RegionName     ||
|+-----------------------------------+------------------+|
||  ec2.ap-south-1.amazonaws.com     |  ap-south-1      ||
||  ec2.eu-west-3.amazonaws.com      |  eu-west-3       ||
||  ec2.eu-west-2.amazonaws.com      |  eu-west-2       ||
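
Rather than scanning the table by eye, you can ask the same command for the region names only and search them. The sketch below assumes the CLI is configured as described above; `region_is_available` is a hypothetical helper name.

```shell
# Succeeds if the named region appears in the describe-regions output.
region_is_available() {
    # $1: region name to look for, e.g. us-east-1
    # With --output text the region names arrive tab-separated on one line.
    aws ec2 describe-regions --query 'Regions[].RegionName' --output text |
        tr '\t' '\n' | grep -qxF "$1"
}

# Usage:
# region_is_available us-east-1 && echo "CLI works and us-east-1 is visible"
```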

If it has not been done by someone in your group, create a security group named apt_dl. The security group defines the basic firewall for the instance that will be launched using the CLI. Our security group, which must be named apt_dl, will allow instances to accept ssh connections from any IP address. Enter the following in your terminal/command prompt.
```
$ aws ec2 create-security-group --group-name apt_dl --description "Basic security group for APT deep learning"
{
    "GroupId": "sg-b018ced5"
}
$ aws ec2 authorize-security-group-ingress --group-name apt_dl --protocol tcp --port 22 --cidr 0.0.0.0/0
```
You do not need to do this if another user on your account has already created this security group. The following command will tell you whether the apt_dl security group has already been created:
```
  aws ec2 describe-security-groups --group-names apt_dl
```
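
The check and the two creation commands above can be combined so that the group is created only when it is missing. This is a sketch under the assumption that a failing `describe-security-groups` call means the group does not exist; it can also fail for other reasons (for example bad credentials), in which case the creation step will fail as well. The function name is illustrative.

```shell
# Create the apt_dl security group and open port 22 only if it is missing.
ensure_apt_security_group() {
    # describe-security-groups exits non-zero when the group is not found.
    if aws ec2 describe-security-groups --group-names apt_dl >/dev/null 2>&1; then
        echo "apt_dl already exists"
        return 0
    fi
    aws ec2 create-security-group --group-name apt_dl \
        --description "Basic security group for APT deep learning" &&
    aws ec2 authorize-security-group-ingress --group-name apt_dl \
        --protocol tcp --port 22 --cidr 0.0.0.0/0
}

# Usage:
# ensure_apt_security_group
```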
Once the increase to the number of instances is approved, check that you can launch and ssh into an instance using the EC2 console. Follow the instructions below under Managing EC2.

Connecting APT to AWS

To set APT to use the AWS backend for training and tracking:

From APT's Track menu, select Track->GPU/Backend Configuration->AWS Cloud.
Select Track->GPU/Backend Configuration->(AWS) Configure to configure. Here, you must enter information about how to log into AWS. Enter the name and location of your AWS key.
- Name: You can look up the name of your key on the EC2 console under Network & Security and Key Pairs.
- Location: Where you saved your private key, usually $HOME/.ssh.
Set which EC2 instance APT will use. You have the option of launching a new instance (Launch New) or attaching to an existing instance (Attach to Existing).
Optionally, you can test that everything worked correctly by selecting Track->GPU/Backend Configuration->Test backend configuration.

Managing EC2

From the EC2 console, you can Start, Monitor, and Manipulate instances.

Launching an APT-compatible instance

To launch an instance:

Use the Launch Instance button on your EC2 console page.
Select the APT Amazon Machine Image (AMI) bransonlab_apt_ami_[latest version]. Currently, the latest AMI is bransonlab_apt_ami_tf115_20200601 (ami-061ef1fe3348194d4), which is available only in the N. Virginia region (us-east-1). It contains everything needed to run APT. You can find it by searching for bransonlab_apt_ami within Community AMIs.
Select p2.xlarge as the instance type.
Click Review and launch. It will then prompt you for a key pair. Select the ssh key pair you created/uploaded previously.
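
For reference, roughly the same launch can be expressed with the AWS CLI. This is a sketch, not how APT itself launches instances: it assumes the AMI ID quoted above is still current, that your default region is us-east-1, and that the apt_dl security group exists. The function name and the key pair name in the usage line are placeholders.

```shell
# Launch one p2.xlarge instance from the APT AMI and print its instance ID.
launch_apt_instance() {
    # $1: name of your ssh key pair as registered in EC2 (not the .pem path)
    aws ec2 run-instances \
        --image-id ami-061ef1fe3348194d4 \
        --instance-type p2.xlarge \
        --key-name "$1" \
        --security-groups apt_dl \
        --count 1 \
        --query 'Instances[0].InstanceId' --output text
}

# Usage (placeholder key pair name):
# instance_id=$(launch_apt_instance my_key_pair)
```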

Monitoring EC2 instances

To see information about all your EC2 instances, go to the EC2 console and use the Instances option, and select any launched instance from the table.

You can connect to a RUNNING instance using ssh. Find the instance's IP address (IPv4 Public IP). Use the IP address to ssh into the machine using

    ssh -i ~/.ssh/[key_pair.pem] ubuntu@[IPaddress]

Replace [key_pair.pem] with the ssh key pair you created and downloaded previously. If instead you imported an existing key into Amazon, replace this with your private key file.

[Screenshot: sshing to an EC2 instance on Windows]
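
The state and IP address lookups can also be done from the command line. The sketch below assumes the AWS CLI is configured as in EC2 Setup; the helper names `instance_field` and `ssh_to_instance` are made up for illustration. Note that the CLI reports states in lower case (for example running, stopped, terminated).

```shell
# Print one field of an instance, e.g. State.Name or PublicIpAddress.
instance_field() {
    # $1: instance ID, $2: field path under Instances[0]
    aws ec2 describe-instances --instance-ids "$1" \
        --query "Reservations[0].Instances[0].$2" --output text
}

# Look up the public IP of an instance and ssh into it as user ubuntu.
ssh_to_instance() {
    # $1: instance ID, $2: path to the private key file
    local ip
    ip=$(instance_field "$1" PublicIpAddress) || return 1
    # The CLI prints "None" when there is no public IP (e.g. instance stopped).
    if [ -z "$ip" ] || [ "$ip" = "None" ]; then
        echo "no public IP for $1; is it running?" >&2
        return 1
    fi
    ssh -i "$2" "ubuntu@$ip"
}

# Usage (placeholder instance ID and key path):
# instance_field i-0123456789abcdef0 State.Name
# ssh_to_instance i-0123456789abcdef0 ~/.ssh/key_pair.pem
```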

Manipulating EC2 instances

To terminate an instance, select it on the Instances page, click the Actions button at the top of the page, and choose Terminate from the Instance State menu.
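
The same state changes are available from the AWS CLI, which can be handy when APT has lost track of an instance. This is a sketch with illustrative function names; it assumes the CLI is configured as in EC2 Setup. Remember that termination destroys all state on the instance, so download your trained models first.

```shell
# Stop a RUNNING instance and wait until it is STOPPED (disk is preserved).
stop_instance() {
    # $1: instance ID
    aws ec2 stop-instances --instance-ids "$1" &&
    aws ec2 wait instance-stopped --instance-ids "$1"
}

# Start a STOPPED instance and wait until it is RUNNING.
start_instance() {
    # $1: instance ID
    aws ec2 start-instances --instance-ids "$1" &&
    aws ec2 wait instance-running --instance-ids "$1"
}

# Terminate an instance and wait until it is gone. This is irreversible.
terminate_instance() {
    # $1: instance ID
    aws ec2 terminate-instances --instance-ids "$1" &&
    aws ec2 wait instance-terminated --instance-ids "$1"
}

# Usage (placeholder instance ID):
# stop_instance i-0123456789abcdef0
```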
