What
Build an Amazon Machine Image (AMI) with your preferred data science tools plus Jupyter, then use it from anywhere: all you need is a browser and a keyboard.
Why
- Use any EC2 instance type, including Free Tier eligible ones
- Use EC2 Spot Instances to lower costs when you need more compute
- EC2 feels like a local desktop installation. Packages can be added interactively, as needed
- Useful for early-stage data exploration, where flexibility with many tools matters more than versioning. In later stages, such as model development, versioning of both models and code becomes more important. At that point, Docker images are a better fit because they provide reproducibility.
- AMIs are easy to update: save a new image after installing packages, and they will be present the next time you launch.
- Mix and match multiple languages (R, Python, Julia) in a single AMI.
How
1. Launch the EC2 Instance
Step 1: Choose a base Amazon Machine Image (AMI)
- Go to Services (top) > EC2 > AMIs (left) > Launch Instance (blue button)
- Select the check-box (left) for “Free tier only”
- In the search box, enter “Ubuntu 18.04”, without quotes
- Click “AWS Marketplace” (left)
- Click “Select” (blue button) for Ubuntu 18.04 LTS - Bionic from Canonical Group
- Click “Continue” (blue button) in the pop-up
Step 2: Choose an Instance Type
- Check “t2.micro” with the “Free tier eligible” tag
- Click “Next: Configure Instance Details” (grey button at bottom-right)
Step 3: Configure Instance Details
- Click “Next: Add Storage” (grey button at bottom-right)
Step 4: Add Storage
- VolumeType=Root, Device=/dev/sda1, Size=20 GiB. This storage is ephemeral: it is deleted when the EC2 instance is terminated.
- Click “Add New Volume” (grey button below “Root”)
- VolumeType=EBS, Device=/dev/sdb, Size=10 GiB. This is persistent storage for Jupyter notebooks and other files you want to keep between sessions.
- Click “Next: Add Tags” (grey button at bottom)
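For readers who prefer to script the launch, the two volumes above can be written as a block device mapping, the structure that boto3's `run_instances` call accepts in its `BlockDeviceMappings` parameter. This is only a sketch: the function name is hypothetical, and the `DeleteOnTermination` flags are an assumption chosen to match the behaviour described above (the root volume is deleted with the instance, the second volume is kept).

```python
def block_device_mappings(root_gib: int = 20, data_gib: int = 10) -> list:
    """Describe the two volumes from Step 4 in the shape run_instances expects.

    Hypothetical helper: the sizes default to the values used in this guide.
    """
    return [
        {
            # Root volume: assumed to be deleted when the instance is terminated.
            "DeviceName": "/dev/sda1",
            "Ebs": {"VolumeSize": root_gib, "DeleteOnTermination": True},
        },
        {
            # Second volume: assumed to be kept, for notebooks and saved files.
            "DeviceName": "/dev/sdb",
            "Ebs": {"VolumeSize": data_gib, "DeleteOnTermination": False},
        },
    ]


if __name__ == "__main__":
    for mapping in block_device_mappings():
        print(mapping["DeviceName"], mapping["Ebs"])
```

The returned list could be passed as `BlockDeviceMappings=block_device_mappings()` when launching the instance from code instead of the console.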
Step 5: Add Tags
- Click “Next: Configure Security Group”
Step 6: Configure Security Group
- Click “Add Rule” 3 times
- In the “Type” column drop-down (2nd row, just below “SSH”), select “HTTP”. Leave the other columns as default values
- In the “Type” column drop-down (3rd row), select “HTTPS”
- In the “Type” column drop-down (4th row), select “Custom TCP”. Set “Port Range”=4444 (any unused port number works; this is the port used to connect to Jupyter). Set “Source”=0.0.0.0/0, ::/0
- Clear the “Security group name” box, and enter a name such as “Ubuntu 18.04 with Python, R, Jupyter”
- Clear the “Description” box. Enter “Data Science with Jupyter AMI” or anything else
- Click “Review and Launch” (blue button at bottom)
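The same four inbound rules can be sketched in Python with boto3, which may help if you want to recreate the security group later without clicking through the console. The helper names are hypothetical. As a simplification, the sketch opens all four ports to every IPv4 and IPv6 address; in the console only the Jupyter port is set that way explicitly, and the other rows keep whatever defaults the wizard shows. `create_security_group` assumes boto3 is installed, AWS credentials are configured, and the account has a default VPC.

```python
SG_NAME = "Ubuntu 18.04 with Python, R, Jupyter"
SG_DESCRIPTION = "Data Science with Jupyter AMI"
JUPYTER_PORT = 4444  # the connect port chosen in Step 6


def tcp_rule(port: int) -> dict:
    """One inbound TCP rule for a single port, open to all IPv4 and IPv6 sources."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        "Ipv6Ranges": [{"CidrIpv6": "::/0"}],
    }


def ingress_rules(jupyter_port: int = JUPYTER_PORT) -> list:
    """Rules for SSH (22), HTTP (80), HTTPS (443) and the Jupyter port."""
    return [tcp_rule(port) for port in (22, 80, 443, jupyter_port)]


def create_security_group() -> str:
    """Create the group and attach the rules; returns the new group ID."""
    import boto3  # imported here so the helpers above work without boto3

    ec2 = boto3.client("ec2")
    group = ec2.create_security_group(GroupName=SG_NAME, Description=SG_DESCRIPTION)
    ec2.authorize_security_group_ingress(
        GroupId=group["GroupId"], IpPermissions=ingress_rules()
    )
    return group["GroupId"]


if __name__ == "__main__":
    for rule in ingress_rules():
        print(rule["FromPort"], rule["IpRanges"], rule["Ipv6Ranges"])
```

Running the file only prints the rules; nothing is created in AWS unless `create_security_group()` is called.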
Step 7: Review Instance Launch
- Create a new key pair. Set “Key pair name”=“jupyter_ec2”, then download the private key file and keep it somewhere safe
- Click “Launch” (blue button)
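As a rough scripted counterpart to this step, the sketch below creates the key pair, saves the private key so that only its owner can read it, and launches a t2.micro instance. The function names are hypothetical, and the AMI ID and security group ID are not given in this guide, so they are left as arguments you would supply yourself. Storage and other launch options are omitted for brevity. `create_key_and_launch` assumes boto3 is installed and AWS credentials are configured.

```python
import os
from pathlib import Path

KEY_NAME = "jupyter_ec2"  # key pair name from Step 7


def save_private_key(key_material: str, directory: str = ".") -> Path:
    """Write the private key to <directory>/jupyter_ec2.pem with mode 0400."""
    path = Path(directory) / f"{KEY_NAME}.pem"
    # O_EXCL makes this fail rather than overwrite an existing key file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(key_material)
    os.chmod(path, 0o400)  # read-only for the owner
    return path


def launch_request(image_id: str, security_group_id: str) -> dict:
    """Keyword arguments for the EC2 client's run_instances call."""
    return {
        "ImageId": image_id,  # supplied by the caller; not specified in this guide
        "InstanceType": "t2.micro",
        "KeyName": KEY_NAME,
        "SecurityGroupIds": [security_group_id],
        "MinCount": 1,
        "MaxCount": 1,
    }


def create_key_and_launch(image_id: str, security_group_id: str) -> str:
    """Create the key pair, save it locally, launch one instance, return its ID."""
    import boto3  # imported here so the helpers above work without boto3

    ec2 = boto3.client("ec2")
    key = ec2.create_key_pair(KeyName=KEY_NAME)
    save_private_key(key["KeyMaterial"])
    response = ec2.run_instances(**launch_request(image_id, security_group_id))
    return response["Instances"][0]["InstanceId"]
```

Restricting the key file to its owner matters later: SSH clients typically refuse to use a private key that other users can read.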