The information below will guide you through understanding, setting up, and configuring Amazon Web Services Elastic Compute Cloud (AWS EC2) for use with APT. After you are done with the setup, APT will be able to perform deep learning training and inference in the cloud.
C:\Program Files\Git. To communicate with EC2, we use the ssh and scp packaged with Git. You can also use Git to download the latest version of APT.
p2.xlarge instances. Instructions to set this up are below.
RUNNING, STOPPED, or TERMINATED:
When RUNNING, your instance is either actively computing or ready to do so. You are paying about $1/hour.
STOPPED is analogous to having your desktop workstation in "hibernation" or "sleep" mode. No computations are being carried out, but the results of previous computations (trained models, etc.) are saved in the remote filesystem. You are paying a very low price based on the amount of disk storage ("EBS" storage in AWS-speak) that has been allocated for your instance. For 50GB of storage (the current default), you are paying $5/month.
TERMINATED means that your instance has ceased to exist and returned to the ephemeral protoplasm of the cloud! Any and all state is destroyed at termination, so before you terminate an instance, make sure you tell APT to download all your trained models. As you might guess, terminated instances do not cost anything.
Since STOPPED instances are inexpensive, APT is currently designed around the idea that you will create an EC2 instance and "leave it up" for a stretch of time (e.g. weeks, maybe even a month or two) while you do a bunch of work for a project. During this time, you can iteratively label, train, and track within APT over multiple sessions. Between active working sessions, your instance is STOPPED, and all APT state, including trained models and movies/trxfiles to be tracked, is preserved. After a time, you will reach a stopping point for the project, and you can instruct APT to download your trained models to your local workstation. (Tracking results are currently downloaded immediately after each tracking session.) When the download is complete, you can terminate the EC2 instance.
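As a rough illustration of the pricing described above, the following sketch estimates a monthly bill from the hours an instance spends RUNNING and the size of its EBS volume. The rates are assumptions taken from the figures quoted in this document ($0.90/hr for p2.xlarge, and $5/month for 50GB, i.e. $0.10 per GB-month); actual AWS prices vary by region and over time, and the function name is hypothetical.

```shell
#!/bin/sh
# Rough monthly cost estimate for one EC2 instance.
# Rates are assumptions taken from the figures quoted in this document;
# check current AWS pricing before relying on them.
HOURLY_RATE=0.90     # USD per RUNNING hour (p2.xlarge)
STORAGE_RATE=0.10    # USD per GB-month of EBS storage ($5 / 50GB)

# estimate_monthly_cost RUNNING_HOURS STORAGE_GB
# Storage is charged whether the instance is RUNNING or STOPPED.
estimate_monthly_cost() {
    awk -v h="$1" -v gb="$2" -v hr="$HOURLY_RATE" -v sr="$STORAGE_RATE" \
        'BEGIN { printf "%.2f\n", h * hr + gb * sr }'
}

# Example: 40 hours of training/tracking in a month with the default 50GB volume.
estimate_monthly_cost 40 50
```

With these assumed rates, an instance that is left STOPPED for the whole month costs only the storage charge, which is why the workflow above keeps instances STOPPED between sessions.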
APT automates starting/stopping your instance (TODO: check), starting/stopping training and tracking processes, and so on. However, it is inevitable that APT will at times become disconnected from what is happening in the cloud. At these times, you will need to manage the EC2 instance manually, e.g. by ssh-ing into the instance and killing processes, or by stopping the instance via the AWS dashboard. More information on how to do this is here.
Be proactive and check on your instance!
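An instance's state can also be checked and changed from the AWS command line instead of the dashboard. The following is a minimal sketch, assuming the AWS CLI is installed and configured; the function names are hypothetical, and the instance ID in the usage note is a placeholder.

```shell
#!/bin/sh
# Print the current state of an instance
# (pending, running, stopping, stopped, shutting-down, or terminated).
instance_state() {
    aws ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].State.Name' --output text
}

# Stop the instance only if it is currently running, so that it no longer
# accrues the hourly charge. Stopping preserves the EBS volume.
stop_if_running() {
    state=$(instance_state "$1") || return 1
    if [ "$state" = "running" ]; then
        aws ec2 stop-instances --instance-ids "$1" > /dev/null || return 1
        echo "stop requested for $1"
    else
        echo "$1 is $state; nothing to do"
    fi
}
```

For example, `stop_if_running i-0123456789abcdef0` would request a stop for that (placeholder) instance if it is still running.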
$HOME/.ssh folder).
chmod 400 path/to/access_key.csv
$HOME/.ssh folder.
chmod 400 ~/.ssh/key_pair.pem
p2.xlarge. p2.xlarge is the basic instance type that has a GPU useful for training APT trackers. Using p2.xlarge costs $0.90/hr. To do this, in the EC2 console, use the Limits option on the left and request a limit increase for p2.xlarge. We suggest increasing the limit to at least 2 instances so that you can train on one instance and track on the other. Note that this step seems to take a day or so because Amazon does verifications. You can, however, continue with the rest of the setup while you wait for this increase to be verified.
us-east-1, Northern Virginia. Choose the default output format as json.
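The default region and output format can also be set without interactive prompts. The sketch below uses `aws configure set` and `aws configure get`, which are part of the AWS CLI. The wrapper function name is hypothetical, and it assumes that your access keys have already been entered and that us-east-1 is the region you intend to use.

```shell
#!/bin/sh
# Store the default region and output format named in this guide,
# then print the stored values so they can be checked by eye.
configure_apt_defaults() {
    aws configure set region us-east-1 || return 1
    aws configure set output json || return 1
    echo "region=$(aws configure get region)"
    echo "output=$(aws configure get output)"
}
```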
aws ec2 describe-regions --output table
The output should be something like:
----------------------------------------------------------
|                    DescribeRegions                     |
+--------------------------------------------------------+
||                       Regions                        ||
|+-----------------------------------+------------------+|
||             Endpoint              |    RegionName    ||
|+-----------------------------------+------------------+|
||  ec2.ap-south-1.amazonaws.com     |  ap-south-1      ||
||  ec2.eu-west-3.amazonaws.com      |  eu-west-3       ||
||  ec2.eu-west-2.amazonaws.com      |  eu-west-2       ||
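Rather than reading the table by eye, the same command can be used to check for one particular region. This is a small sketch, assuming a configured AWS CLI; the function name is hypothetical.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the named region appears in the region list
# returned for this account. With --output text the names are tab-separated.
region_available() {
    aws ec2 describe-regions --query 'Regions[].RegionName' --output text \
        | tr '\t' '\n' | grep -qxF "$1"
}
```

For example, `region_available us-east-1 && echo ok` prints `ok` only when the CLI can reach AWS and the region is listed, which also serves as a quick test that your credentials work.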
apt_dl. The security group defines the basic firewall for the instance that will be launched using the CLI. Our security group, which must be named apt_dl, will allow instances to accept ssh connections from any IP address. Enter the following in your terminal/command prompt.
$ aws ec2 create-security-group --group-name apt_dl --description "Basic security group for APT deep learning"
{
    "GroupId": "sg-b018ced5"
}
$ aws ec2 authorize-security-group-ingress --group-name apt_dl --protocol tcp --port 22 --cidr 0.0.0.0/0
You do not need to do this if another user on your account has already created this security group. The following command will tell you whether the apt_dl security group has already been created:
aws ec2 describe-security-groups --group-names apt_dl
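The check and the creation can be combined so that the group is created only when it is missing. This is a sketch under the assumption that a failing `describe-security-groups` call means the group does not exist; the call can also fail for other reasons (for example missing credentials), in which case the creation step will fail too. The function name is hypothetical.

```shell
#!/bin/sh
# Create the apt_dl security group only if it does not already exist.
ensure_apt_security_group() {
    if aws ec2 describe-security-groups --group-names apt_dl > /dev/null 2>&1; then
        echo "apt_dl already exists"
        return 0
    fi
    aws ec2 create-security-group --group-name apt_dl \
        --description "Basic security group for APT deep learning" > /dev/null || return 1
    # Allow inbound ssh (TCP port 22) from any IP address.
    aws ec2 authorize-security-group-ingress --group-name apt_dl \
        --protocol tcp --port 22 --cidr 0.0.0.0/0 > /dev/null || return 1
    echo "apt_dl created"
}
```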
To set APT to use the AWS backend for training and tracking:
$HOME/.ssh.
From the EC2 console, you can Start, Monitor, and Manipulate instances.
bransonlab_apt_ami within Community AMIs.
To see information about all your EC2 instances, go to the EC2 console and use the Instances option, and select any launched instance from the table.
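The same overview is available from the command line. The sketch below assumes a configured AWS CLI and lists the instances in the default region, optionally restricted to one state. The function name is hypothetical; `instance-state-name` is a standard filter of `describe-instances`.

```shell
#!/bin/sh
# list_instances [STATE]
# Show ID, type, state and public IP of each instance in the default region.
# If STATE (e.g. running or stopped) is given, only matching instances are shown.
list_instances() {
    if [ -n "${1:-}" ]; then
        set -- --filters "Name=instance-state-name,Values=$1"
    else
        set --
    fi
    aws ec2 describe-instances "$@" \
        --query 'Reservations[].Instances[].[InstanceId,InstanceType,State.Name,PublicIpAddress]' \
        --output table
}
```

Running `list_instances running` is a quick way to confirm that nothing has been left RUNNING by accident.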
You can connect to a RUNNING instance using ssh. Find the instance's IP address (IPv4 Public IP). Use the IP address to ssh into the machine using
ssh -i ~/.ssh/[key_pair.pem] ubuntu@[IPaddress]
Replace [key_pair.pem] with the ssh key pair you created and downloaded previously. If instead you imported an existing key into Amazon, you can replace this with your private key file.
[Screenshot: sshing to an EC2 instance on Windows]
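The IP address lookup and the ssh command can be combined into one step. The following is a sketch, assuming a configured AWS CLI and the `ubuntu` login used above; the function names are hypothetical, and the instance ID and key path are supplied by you.

```shell
#!/bin/sh
# Look up the public IPv4 address of an instance.
instance_public_ip() {
    aws ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].PublicIpAddress' --output text
}

# ssh_to_instance INSTANCE_ID KEY_FILE
ssh_to_instance() {
    ip=$(instance_public_ip "$1") || return 1
    # With --output text the CLI prints "None" for a missing value, which is
    # what a STOPPED instance without a fixed address typically returns.
    if [ -z "$ip" ] || [ "$ip" = "None" ]; then
        echo "instance $1 has no public IP; is it RUNNING?" >&2
        return 1
    fi
    ssh -i "$2" "ubuntu@$ip"
}
```

Because the public IP address usually changes each time an instance is stopped and started, looking it up at connection time avoids using a stale address.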
To terminate the machine, use the Actions button at the top of the Instances page and choose Terminate from the Instance State menu.
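Termination can also be requested from the command line. Since termination destroys all state on the instance, the sketch below asks for confirmation first. It assumes a configured AWS CLI; the function name and the wording of the prompt are hypothetical.

```shell
#!/bin/sh
# terminate_instance INSTANCE_ID
# Asks for confirmation, because termination destroys all state on the instance.
terminate_instance() {
    printf 'Have all trained models been downloaded from %s? Type "yes" to terminate: ' "$1"
    read -r answer
    if [ "$answer" != "yes" ]; then
        echo "aborted"
        return 1
    fi
    aws ec2 terminate-instances --instance-ids "$1" > /dev/null || return 1
    echo "termination requested for $1"
}
```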