Here is my working, production design for an AWS architecture capable of receiving 360 degree panoramic 8k video via an RTMP endpoint and live-streaming at varying bitrates and resolutions to various client devices. The architecture also includes an archiving step for users that may find it necessary.
The client records stunning 360 video (preferably using a Bonsai Excalibur) and uses OpenBroadcaster Software (OBS) to stream the feed to an AWS MediaLive RTMP endpoint. MediaLive offers dual ingest - two endpoints capable of receiving the stream.
MediaLive saves the incoming live stream to .ts files in S3. This allows us to support highly durable storage of important video files. These files will later be moved to Glacier to save on storage costs.
MediaLive compresses and transcodes the incoming streaming into various output formats that will support low and high bandwidth users. Everything from 320p to 8k video is supported!
MediaPackage uses the MediaLive channels to prepare the videos for distribution to various devices.
The live HLS streams are available via CloudFront endpoints - using a CDN cuts down on cost, and allows us to distribute high resolution video from low latency endpoints around the world.
Note: I am working on a CloudFormation script to automatically deploy this. More to come.
Note: This exam is heavily oriented towards long, detailed, and exhaustive networking questions. You cannot brute-force study your way through this exam. You need a thorough working understanding of Direct Connect and hybrid architecture scenarios and how to solve them.
To give you an idea of how hard this exam really is, it’s worth noting that I currently hold all 9 AWS certifications. I had finished 8 certifications before focusing on Advanced Networking, and comfortably passed all 8 attempts. It usually takes me 30-45 minutes to sit for an AWS exam, and I generally feel extremely well-prepared when I sit down for an exam.
The first time I took Advanced Networking - Specialty, I failed with a 60% score. Unlike all of my other certification exams, I knew within the first 10 questions that my chances of passing were slim. I chewed my fingernails to bits as I tried to troubleshoot scenarios that I didn’t even know existed. I walked out of the testing center miserable - I haven’t failed a certification exam since high school (MCSE). I had wanted to get all of the AWS certifications by the end of 2018 - but AWS mandates a two-week pause before you can re-take the exam, so I would be forced to finish my goal in 2019. This stung.
But, so it goes, right? I already had eight AWS exams, and I just needed this last one. I went home and immediately scheduled the exam for two weeks later.
I wrote down all of the things that I didn’t know, and I focused on where I felt the gaps in my knowledge were. I mostly realized that my Direct Connect troubleshooting skills were terrible, and I was not very good at VPC endpoint/interface stuff.
I crammed as hard as I could. Harder than any test I have ever taken. I studied more for this test than I had ever studied in my life. I drew diagrams, flashcards, and mental maps. I had my girlfriend quiz me over and over again on CIDR block properties and subnetting math. It was not a very fun two weeks. I gained weight, avoided social contact, and thought of nothing but passing. I am not used to failing exams, and I was frustrated with myself.
I took the exam for the second time on January 5th, 2019. I passed with a 66% score.
Two weeks of effort and studying had netted me… a 6% boost in score. Luckily, AWS scores on an unpublished curve, so this was sufficient to pass.
I think the exam is so difficult for a few reasons:
It is rare to have practical experience with Direct Connect and all of the networking concerns that accompany it.
The core material is simply very, very dense, and requires great knowledge of a great deal of AWS services and how they interact with each other.
You need strong networking skills in order to troubleshoot tough scenarios.
The breadth of the exam is extreme. If you haven’t taken the Big Data certification, I doubt you will have any knowledge of some of the products that are used in the exam’s scenario questions. You need very, very broad knowledge of AWS services - on top of that, you need to be familiar with technical implementation details with them.
I found it very useful to use pen and paper to re-draw all of the various routing scenarios that are presented throughout AWS Direct Connect/VPN/VIF documentation. You need to become intimately familiar with the flow of DNS requests for different scenarios.
All that being said, I’ve laid out some of my study notes below. Please enjoy!
If you need to set Active/Passive mode, use AS_PATH prepending to make certain routes less attractive than others.
For example, if you had two DX connections and wanted to configure in an Active/Passive configuration, you should prepend the AS_PATH of whichever DX connection you want to be the Passive.
The only way AWS will prefer a VPN over Direct Connect is if the VPN has a more specific route table entry.
Configure your custom DNS servers on EC2 instances in your VPC to route your private DNS queries to the IP address of the Amazon-provided DNS servers for your VPC. This IP address is the IP address at the base of the VPC network range “plus two.” If your VPC CIDR block is 10.0.0.0/16, the IP address of the DNS server is 10.0.0.2. (Source)
If you’re using custom DNS servers that are outside of your VPC and you want to use private DNS, you must use custom DNS servers on EC2 instances within your VPC.
If you have integrated your on-premises network with one or more VPC virtual networks and you want your on-premises network to resolve domain names in private hosted zones, you can create a Simple AD directory. Simple AD provides IP addresses that you can use to submit DNS queries from your on-premises network to your private hosted zone.
You cannot further subnet a /28 CIDR block.
DNS servers require TCP/UDP port 53 to be open.
For each public hosted zone that you create, Amazon Route 53 automatically creates a name server (NS) record and a start of authority (SOA) record. Don’t change these records.
BGP is typically classified as a path vector protocol.
A path vector protocol is a network routing protocol which maintains the path information that gets updated dynamically. (Source)
AS_PATH prepending assists in setting Active/Passive mode.
Keep these route preference rules in mind. They are evaluated from the top down.
Bidirectional Forwarding Detection (BFD) is a network fault detection protocol that provides fast failure detection times, which facilitates faster re-convergence time for dynamic routing protocols. Enable BFD when configuring multiple DX connections or a single DX + VPN backup. You can configure BFD to detect failures and update dynamic routing as Direct Connect quickly terminates BGP peering so that backup routes can kick in. (Source)
A forwarding equivalence class(FEC) is a term used in Multi-protocol Label Switching(MPLS) to describe a set of packets with similar or identical characteristics which may be forwarded the same way. An FEC label is used by the router to know where to forward packets in an MPLS network. (Source)
DSCP can be used as a means of classifying and managing network traffic and of providing QoS in modern Layer 3 IP networks. (Source)
Make sure to allow ephemeral ports in your outbound NACL rules.
The default VPC can be deleted and recreated.
Broadcasting is not allowed by AWS.
AWS will always prefer a DX connection over VPN, unless the VPN has a more specific route.
A sub-1Gbps connection is indicative of a hosted connection.
A VIF is required for each VPC connected to the DX connection.
You cannot change the port speed of a connection after creation.
There is a limit of 100 advertised BGP routes. Summarize routes if you have more than this.
Remember: The VPCs to which you connect through a Direct Connect gateway cannot have overlapping CIDR blocks.
DX Gateway is a global service.
Cannot be used cross account. Cannot be used with public VIFs.
Cannot be used to send traffic to other VPCs that are connected to the same Gateway.
You may be asked how many LOA’s will be generated for a customer request. Remember the formula: 1 LOA per Connection per Datacenter.
A Link Aggregation Group (LAG) counts as oneLOA.
Direct Connect Requirements
I noticed that the exam had an unusual emphasis on the Direct Connect requirements. Make sure you know these requirements by heart. It’ll be worth a few points on the exam.
BGP MD5 authentication
802.1Q VLAN encapsulation
Single-mode fiber with a 1000BASE-LX (1310nm) transceiver for 1gb Ethernet or a 10GBASE-LR (1310nm) transceiver for 10gb Ethernet.
Auto-negotiation for the port must be disabled.
Port speed and full-duplex mode must be configured manually.
AWS will always prefer a DX connection over VPN, unless the VPN has a more specific route.
Use an IPSec VPN over a public VIF to ensure protection of your data en-route from your on-premise datacenter.
AWS VPN does not support 128 bit AES encryption.
You need to use a public VIF over a DX connection in order to use an IPSec VPN.
You can monitor your VPN tunnels using Cloudwatch. (Source)
There are three VPN metrics: TunnelState, TunnelDataOut, and TunnelDataIn.
Use aws cloudwatch list-metrics --namespace "AWS/DX" to view metrics using the CLI.
If the VPN connection is idle for 10 seconds, the connection may close. Use a keep-alive tool to send continuous traffic.
Static routes take priority over propagated routes - it’s assumed that you’re using static routes for a good reason.
The VPN tunnel comes up when traffic is generated from your side of the Site-to-Site VPN connection. The virtual private gateway is not the initiator; your customer gateway must initiate the tunnels. (Source)
Allow UDP 500 (and UDP 4500 if using NAT Traversal).
OSI Layers Cheatsheet
Application - DNS, DHCP, BGP, FTP, SMTP
Network/Internet - IPSec
Transport - TCP/UDP
IPSec: IP 50, UDP 500
VPN: UDP 500, UDP 4500 for NAT-Traversal
DNS: TCP/UDP 53
Ephemeral Port Range: 1024-65535
Elastic Load Balancers
ELB needs a minimum subnet size of /27 (not/28).
ELB SSL Negotiation: A security policy is a combination of:
Server Order Preference option.
ELB SSL Negotiation: SSL 2.0 has been deprecated. Use SSL 3.0, TLS 1.0, TLS 1.1, or TLS 1.2.
CIDR Blocks and Subnetting
You need to understand CIDR blocks, addresses ranges, and be able to do subnet math in your head on the fly. Use this calculator to get better at subnetting.
The minimum size of a VPC subnet is a /28 (16 addresses) for IPv4.
The maximum size is a /16 (65,535 addresses).
For IPv6, the VPC is a fixed size of /56.
Use the 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 ranges. (Source)
For IPv6, the subnet size is fixed to be a /64. Only one IPv6 CIDR block can be allocated to a subnet.
The default VPC range is 172.31.0.0/16
Remember that 5 addresses are reserved per subnet:
0 - Network address
1 - VPC router
2 - DNS
3 - Reserved by AWS for future use
255 - Broadcast
Subnets cannot be larger than the VPC in which they are created.
Enhanced Networking and Performance
You can only use Jumbo frames/high MTU values within a VPC - once the traffic leaves the VPC, use 1500 MTU. The only exception to this is VPC peering traffic - it can take advantage of 9001 MTU. (Source)
To maximize instance speed for an external/internal facing instance, attach two ENIs:
An externally-facing ENI with an MTU of 1,500 bytes
An internally-facing ENI with an MTU of 9,001 bytes
DPDK is the Data Plane Development Kit. It consists of libraries to accelerate packet processing workloads running on a wide variety of CPU architectures.
To connect to AWS public endpoints, such as DynamoDB or S3, with dedicated network performance, use a public VIF.
Use an IPSec VPN over a public VIF to ensure protection of your data en-route from your on-premise datacenter.
Private Virtual Interface
To connect to private services, such as a VPC, with dedicated network performance, use a private VIF.
A private VIF allows you to connect to your VPC resources on your private IP address or endpoint. A private VIF can connect to a DX gateway, which can be associated with one or more VGWs.
A virtual private gateway is associated with a single VPC, so you can connect to multiple VPCs using a private VIF. For a private VIF, AWS only advertises the entire VPC CIDR over the BGP neighbor.
Do not advertise more than 100 prefixes for private virtual interfaces or 1,000 prefixes for public virtual interfaces. These are hard limits and cannot be exceeded. (Source)
You cannot extend endpoint functionality outside of the VPC - this means you cannot access a VPC endpoint from another location (like a VPC peer, or over a VPN).
DNS resolution is required within the VPC.
Default VPCE policy is unrestricted. You can lock it down after creation.
You can’t reach VPC endpoints over a private VIF.
Link Aggregation Groups
All connections in a LAG run in an Active/Active configuration. LAG is a Layer 2 connection. (Source)
Remember the three rules of creating Link Aggregation Groups:
All connections in the LAG must use the same bandwidth.
You can have a maximum of four connections in a LAG. Each connection in the LAG counts towards your overall connection limit for the Region.
All connections in the LAG must terminate at the same AWS Direct Connect endpoint.
Remember that AWS Workspaces requires a minimum of 1200 MTU. (Source)
Use Route53 to configure split-view DNS, also known as “split-horizon DNS”. This feature can be used when you want to maintain internal and external versions of the same website or application. (Source)
Create a Reusable Delegation Set using the Route 53 API command CreateReusableDelegationSet (you can’t do this through the AWS console) if you want to re-use nameservers across different hosted zones.
The hub and spoke model involves creating multiple Customer Gateways, each with a public IP address.
The remote network prefixes for each spoke must have unique ASNs, and the sites must not have overlapping IP ranges.
Use Lambda@Edge in conjunction with Cloudwatch to perform actions based on incoming requests. For example:
A Lambda function can inspect cookies and rewrite URLs so that users see different versions of a site for A/B testing.
CloudFront can return different objects to viewers based on the device they’re using by checking the User-Agent header, which includes information about the devices.
Generate HTTP responses when CloudFront viewer request or origin request events occur.
inspect headers or authorization tokens, and insert a header to control access to your content before CloudFront forwards the request to your origin.
Make network calls to external resources to confirm user credentials, or fetch additional content to customize a response.
Security Groups and Network Access Control Lists
In case of a DDOS, the fastest way to stop incoming requests is to delete the default NACL in a VPC.
You cannot apply Security Groups to NAT Gateways or Egress-Only Gateways (which makes sense, because you can’t set outgoing rules with Security Groups anyways!)
Know your basic service limits:
Max. 5 Internet Gateways per region
Direct Connect: 100 BGP advertised routes per route table maximum
You need to assign an Elastic IP to NAT Gateways.
Use tracepath to check MTU between two hosts. tracepath needs UDP traffic allowed in order to function.
A hosted VIF is a VIF created on your Direct Connect connection, and shared with another AWS account.
By contrast, a hosted connection is a sub-1gbps connection allocated by an AWS partner for your use.
Automatic Distribution: The default option, Redshift automatically manages your distribution strategy for you, shifting from an initial ALL strategy (for smaller tables) to EVEN distribution (for larger tables). Note: Redshift will not automatically switch back from EVEN to ALL.
Even Distribution: With the EVEN distribution, the leader node distributes rows equally across all slices. This is appropriate for tables that do not participate in joining.
Key Distribution: With the KEY distribution, rows are distributed according to a selected column. Tables that share common join keys are physically co-located for performance.
All Distribution: A copy of the entire data set is stored on each node. This slows down inserting, updating, and querying. This distribution method is only appropriate for small or rarely-updated data sets.
You need to know the DynamoDB partition sizing formula by heart: (Desired RCU/3000 RCU) + (Desired WCU/1000 RCU) = # of partitions needed
AWS Machine Learning does not support unsupervised learning - you will need Apache Spark or Spark MLLib for real-time anomaly detection.
In the context of evaluating a Redshift query plan, DS_DIST_NONE and DS_DIST_ALL_NONE are good. They indicate that no distribution was required for that step because all of the joins are co-located.
DS_DIST_INNER means that the step will probably have a relatively high cost because the inner table is being redistributed to the nodes. DS_DIST_ALL_INNER, DS_BCAST_INNER and DS_DIST_BOTH are not good. (Source)
You must disable cross-region snapshots for Redshift before changing the encryption type of the cluster.
Amazon recommends allocating three dedicated master nodes for each production ElasticSearch domain.
To make your data searchable in CloudSearch, you need to format it in JSON or XML.
You can use popular BI tools like Excel, MicroStrategy, QlikView, and Tableau with EMR to explore and visualize your data. Many of these tools require an ODBC (Open Database Connectivity) or JDBC (Java Database Connectivity) driver. (Source)
Mahout is a machine learning library with tools for clustering, classification, and several types of recommenders, including tools to calculate most-similar items or build item recommendations for users. Use it to carry out Machine Learning work on top of Hadoop.
When preparing to use a Lambda/Kinesis combination, make sure to optimize your Lambda memory and batch size, and adjust the number of shards used by the Kinesis streams.
Triggers do not exist in Redshift.
Amazon Kinesis Aggregators is a Java framework that enables the automatic creation of real-time aggregated time series data from Kinesis streams. (Source)
You can’t encrypt an existing DynamoDB table. You need to create a new, encrypted table and transfer your data over. (Source)
Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources. (Source)
I passed the exam on November 30th, 2018. Before taking this exam, I had already achieved my SA Pro. There is a fair amount of overlap between the two tests - I would highly recommend taking the SA Pro before this exam. I scored an 85% on the exam.
This exam is fairly difficult, although I would argue it’s slightly easier than the Solutions Architect Professional exam. There is a greater focus on implementation and debugging in the DevOps exam - the SA Pro exam is more about breadth of knowledge across AWS services.
Remember to solve exactly what the question is asking. If they want performance, give them read replicas, sharding, PIOPS, etc. If they want elasticity and high availability, start using Multi-AZ, cross-region replication, Auto Scaling Groups, ELB’s, etc. If they are worried about cost, remember the S3 storage classes, and use AutoScaling groups to scale down unnecessary instances.
The basic stages of a Continuous Integration pipeline are Source Control -> Build -> Staging -> Production.
ELB can be configured to publish logs at a 5 or 60 minute interval. Access logs are disabled by default.
When an instance is launched by Autoscaling, the instance can be in the pending state for up to 60 minutes by default.
Cloudwatch logs can be streamed to Kinesis, Lambda, or ElasticSearch for further processing.
The fastest deployment times in Elastic Beanstalk are achieved using the All At Once deployment method.
Generally, if you see Docker, think Elastic Beanstalk!
AWS OpsWorks Stacks agents communicate regularly. If an agent does not communicate with the service for more than approximately five minutes, AWS OpsWorks Stacks considers the instance to have failed.
You can configure an Opswork stack to accept custom cookbooks after creation. Just enable the option.
Speaking of custom cookbooks, remember that the custom cookbook option is enabled at the stack level, not the layer level.
The CodeDeploy service relies on a appspec.yml file included with your source code binaries on an EC2/On-Premises Compute Platform.
A nested stack is a stack that you create within another stack by using the AWS::CloudFormation::Stack resource.
When attempting to secure data in transit through an ELB, remember you can use either an HTTPS or SSL listener.
AWS CodeDeploy can deploy application content stored in Amazon S3 buckets, GitHub repositories, or Bitbucket repositories. Note: Subversion is not supported.
CodeDeploy offers two deployment types: In-place deployment and Blue-Green deployment.
A Dockerrun.aws.json file is an Elastic Beanstalk–specific JSON file that describes how to deploy a set of Docker containers as an Elastic Beanstalk application.
This exam is extremely tough to parse at first glance. There are lots of grammatical twists and turns that are not present in typical exams. You may need to check your proposed architecture against the question 4-5 times to ensure that you are solving the right problem.
Remember to use the NoEcho property in CloudFormation templates to prevent a describe command from listing sensitive values, such as passwords.
You really need to understand the lifecycle of an EC2 instance (pictured below) being brought into service.