Teri Radichel

Simplifying An AWS Network Design

ACM.352 Can we also reduce costs?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

⚙️ Check out my series on Automating Cybersecurity Metrics | Code.

🔒 Related Stories: AWS Security | Network Security | Cybersecurity

💻 Free Content on Jobs in Cybersecurity | ✉️ Sign up for the Email List

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In my last post I wrote about troubleshooting installing software on private networks because it was slowing me down.

Today I was working on my container for my Lambda function to use as a trigger for AWS CodeCommit to push to an S3 bucket. I started by revamping my base container based on findings from prior posts so I can use a single container with all the base files required for my bash custom runtime. More on that in the next post.

But first, networking has really been a drag on my productivity.

The problems with my current approach and lack of automation

As I develop and test my Lambda function, I realize that my networking can be simplified. The changes may add some complexity in the short term, but they should reduce cost and complexity in the long run. At the same time, I want to reduce the number of endpoints I need to deploy because apparently they are going to break the bank more than I expected.

Along the way, I keep realizing I was missing some VPC Endpoints. Or one service doesn’t work so I have to tear down that endpoint and deploy a different one. Every new service I use requires another VPC endpoint. Sometimes I’m not sure if I’m supposed to be using one endpoint or another — like the git VPC Endpoint for AWS CodeCommit or the AWS CodeCommit endpoint. I have to go back and re-read the info on that.

I’m also developing in two separate networks — one with a Gateway for remote access and one without. So I test something in one network and it works. Then I have to go back and replicate what I’ve set up in the other network.

Because I keep thinking “Oh, I just need one more endpoint. I’ll just manually deploy it for this test, quickly finish, and come back around and automate it later”, I keep making mistakes — or I find that I need yet another endpoint. Or yet another IP address in my NACLs. Or I need another security group or I misapplied one somewhere.

Automating all of that will help, but I have also been thinking about ways I can simplify my approach. I have some things that are redundant and can be reduced (the principle of abstraction again, to make life easier).

At the same time, there are reasons why certain things can’t be simpler — routing, costs, and trust boundaries.

I have some changes in mind and I am going to write about them, since that’s what I’ve been dealing with for much of the past few days.

Here are a few things I’ve been thinking about.

Development and Deployment Environments

I’ve been thinking about standard deployment and development environments for different types of resources in different stages of development. Every environment needs the tools to automate and deploy things and separate accounts where things get deployed, but different environments will deploy and have access to different resources. I’ve been noodling over this for a while. More to come.

One Security Group to Access All VPC Endpoints per Environment

In each environment, the principals in that environment may have access to different services. But I’m thinking that for a particular type of principal there only needs to be one security group to access all the endpoints that principal needs, instead of a separate security group for each endpoint. Besides simplifying things, this avoids the security group quota we’d end up hitting the other way around.

One Security Group to Access Any AWS VPC Endpoint

All the VPC endpoints essentially access 443 outbound. We can limit that outbound access to Amazon IPs but there’s not an easy way to track all the IPs associated with each service.

AWS has a JSON IP range list you can download, but it’s not complete enough to use for firewall rules. In many cases the IPs for “EC2” are used by many other services.
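To make the overlap problem concrete, here is a rough sketch that filters data in the shape of the published ip-ranges.json file. The `prefixes`, `ip_prefix`, `region`, and `service` keys match that file, but the sample entries are made up (they use documentation-only CIDR ranges), and the function names are hypothetical.

```python
import ipaddress

# Tiny made-up sample in the shape of AWS's published ip-ranges.json.
# These CIDRs are reserved documentation ranges, NOT real AWS prefixes.
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "EC2"},
        {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "AMAZON"},
        {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "S3"},
        {"ip_prefix": "203.0.113.0/24", "region": "us-west-2", "service": "S3"},
    ]
}


def prefixes_for(ranges, service, region):
    """Return the sorted, de-duplicated CIDRs tagged with a service and region."""
    found = {
        entry["ip_prefix"]
        for entry in ranges["prefixes"]
        if entry["service"] == service and entry["region"] == region
    }
    return sorted(found, key=ipaddress.ip_network)


def services_sharing(ranges, cidr):
    """Return every service tag attached to one CIDR, to expose the overlap."""
    return sorted(e["service"] for e in ranges["prefixes"] if e["ip_prefix"] == cidr)
```

Running `services_sharing` against the real file is a quick way to see how many ranges carry more than one service tag, which is why a rule built from one tag may allow more than intended.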

Although AWS has some prefix lists for service IP addresses, the list of supported services is so small right now that it’s not worth bothering with.

I wish they would create prefix lists for every AWS Service. #awswishlist

Our best bet is to create a single security group that allows traffic in from the security group we created to access all endpoints. Then we allow traffic out to port 443.
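A minimal sketch of those two rules, written as plain dictionaries in the shape the EC2 API uses for `IpPermissions`. The function names and the example group ID are hypothetical, and I'm assuming the outbound 443 rule sits on the client group and is open to any destination since we can't reliably pin the IPs.

```python
def endpoint_sg_ingress(client_sg_id, port=443):
    """Ingress for the endpoint security group: HTTPS from the client group only."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Reference the client security group instead of listing IP addresses.
        "UserIdGroupPairs": [{"GroupId": client_sg_id}],
    }]


def client_sg_egress(port=443, cidr="0.0.0.0/0"):
    """Egress for the client security group: HTTPS out (assumed open to any IP)."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr}],
    }]


# Hypothetical group ID, for illustration only.
ingress_rules = endpoint_sg_ingress("sg-0123456789abcdef0")
egress_rules = client_sg_egress()
```

Because the ingress rule references a group rather than addresses, any principal that gets the client group can reach every endpoint that uses the endpoint group, with no per-endpoint rules to maintain.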

VPC Endpoint Policies

Although I prefer networking for this purpose due to potential DNS poisoning, we can be explicit about which domain names can be accessed via that endpoint using the concepts in this blog post I found.

“Bob” provides the domain names to use for resources for Amazon Linux 1:

"arn:aws:s3:::packages.region.amazonaws.com/*",
"arn:aws:s3:::repo.region.amazonaws.com/*"

And Amazon Linux 2:

"arn:aws:s3:::amazonlinux.region.amazonaws.com/*",
"arn:aws:s3:::amazonlinux-2-repos-region/*"

We’ll need to figure out what to use for Amazon Linux 2023 and anything else we need to access on S3.
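Here is a sketch of how the ARNs quoted above might be assembled into an S3 endpoint policy document. The function and constant names are hypothetical, and limiting the statement to `s3:GetObject` with a wildcard principal is my assumption about what package downloads need, not something I've verified for every repository.

```python
import json

# Bucket name templates taken from the ARNs quoted above; "region" is filled in.
AL1_BUCKETS = ["packages.{region}.amazonaws.com", "repo.{region}.amazonaws.com"]
AMAZONLINUX_BUCKETS = ["amazonlinux.{region}.amazonaws.com", "amazonlinux-2-repos-{region}"]


def repo_endpoint_policy(region, bucket_templates):
    """Build an endpoint policy that only allows reads from the listed buckets."""
    resources = [
        f"arn:aws:s3:::{template.format(region=region)}/*"
        for template in bucket_templates
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowAmazonLinuxRepos",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",  # assumption: read-only is enough for yum
            "Resource": resources,
        }],
    }


print(json.dumps(repo_endpoint_policy("us-east-1", AMAZONLINUX_BUCKETS), indent=2))
```

Anything else that needs to come through the same S3 endpoint would need its own bucket ARN added to the resource list.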

Unfortunately this traffic may or may not be private and we may have issues with NACLs as I’ve demonstrated in prior posts. I need to explore that a bit more. When it’s not private, we end up with NACL limitations and blocked traffic as a result so I really wish and hope AWS would make these all resolve to private IPs when configured to do so.

Shared VPC

I mulled over the option of creating a shared VPC. Right now I am deploying my Lambda function in one VPC and I have my workstation in another. What are the pros and cons of a Shared VPC?

Well I thought the shared VPC could share the VPC Endpoints but the endpoints are in the subnets, not the VPC itself. So “sharing” a VPC doesn’t really help us. Also, a shared VPC is really for resources that are deployed across accounts.

One reason for a shared VPC might be that you want to manage your networking and networking costs in one account and then share that networking to another account. I might explore that later. But a shared VPC doesn’t really solve the problems I am trying to solve right now.

A single VPC, but separate subnets and route tables

If I needed a single VPC in my account for the resources that are all using VPC endpoints I could just put them all in one VPC. But the reason I don’t put everything in the same subnet is that I have different route tables for my different resources. The only reason I have two VPCs currently is because I set up a VPC manually for testing and then I deployed a VPC with my automation code. In the long run it will likely be one VPC.

However, I will have a separate subnet for developer workstations which need either public IP access or access over a VPN. So I need either an internet gateway or a VPN gateway and a route for that in the route table of the subnet the developers use to access their virtual machines (workstations in the cloud).

On the other hand, I’m running Lambda functions that only get completely private access with outbound access via a NAT. That route table has no gateway and no public access directly whatsoever. The traffic gets directed to the NAT subnet for outbound access. I did not find the so-called “no NAT” replacements to have the same level of network security. I believe I covered that in another post but I forget which one.
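The difference between the two route tables can be checked mechanically. This is a small sketch using route entries in the shape the EC2 API returns (`DestinationCidrBlock`, `GatewayId`, `NatGatewayId`). The CIDRs and resource IDs are made up, and the function name is hypothetical.

```python
def has_internet_gateway_route(routes):
    """True if any route in the table points directly at an internet gateway."""
    return any(route.get("GatewayId", "").startswith("igw-") for route in routes)


# Made-up route tables for illustration only.
WORKSTATION_ROUTES = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0aaaaaaaaaaaaaaaa"},
]
LAMBDA_ROUTES = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    # Outbound traffic is handed to the NAT, never straight to a gateway.
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0bbbbbbbbbbbbbbbb"},
]

print(has_internet_gateway_route(WORKSTATION_ROUTES))
print(has_internet_gateway_route(LAMBDA_ROUTES))
```

A check like this could run as part of a deployment test so the private route table never picks up a direct gateway route by accident.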

A bastion host in the developer network and workstations in the Lambda network

OK one way to solve this would be to make developer machines a true bastion host. The developers log into those machines but they can’t access any AWS services on that machine. There are no VPC endpoints in that network. The VPC endpoints are all in the Lambda private network.

So the developers have to first log into their machines in the subnet with the Internet Gateway or VPN gateway. From there they log into the truly private network that can access all the AWS services.

Problem solved. One set of VPC Endpoints for all AWS Services in that environment.

Good luck with that.

Developers cry (or worse) when they have to use bastion hosts. I know because I was responsible for setting them up at Capital One, and although I think they are a good thing, developers get really annoyed with them — and even more annoyed when they don’t work.

Now I am not one to say we should just do whatever developers want to keep them happy. If you are in production, perhaps that is how you access your production resources. But I also understand the developer point of view and there are other options for large companies who can afford them.

What we cannot do — transitive routing

One thing you can’t do on AWS is called transitive routing. You can connect two VPCs via peering to allow private IP addresses from one VPC to access another. Let’s say you have an EC2 instance in VPC A and an EC2 instance in VPC B. You peer the VPCs and then the two EC2 instances can communicate with each other on a private IP address.

Great!

But then you try to send traffic from VPC A to VPC B to another VPC C that is peered to VPC B. No can do. That’s called transitive routing. You can only directly connect endpoints from VPC A to VPC B.

Let’s say VPC B has an Internet Gateway but VPC A does not. You also cannot attempt to reach the Internet over the peering connection. That would also be transitive routing.

You can essentially think of transitive routing as any time you would need to traverse multiple route tables to connect two endpoints or resources. That is not allowed. One route table per connection on AWS. This helps prevent inadvertently opening up an unintended route for traffic.
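As a toy model of that rule (not how AWS implements it), peering can be treated as a set of direct pairs, where traffic is only allowed when the source and destination share a pair. The names here are hypothetical.

```python
# Each peering connection joins exactly two VPCs.
PEERINGS = {frozenset({"A", "B"}), frozenset({"B", "C"})}


def can_reach(src, dst, peerings):
    """True only when src and dst are directly peered; no hops through a third VPC."""
    return frozenset({src, dst}) in peerings


print(can_reach("A", "B", PEERINGS))  # directly peered
print(can_reach("B", "C", PEERINGS))  # directly peered
print(can_reach("A", "C", PEERINGS))  # would need to transit B, so not allowed
```

The lookup deliberately never walks the graph. If A needs to talk to C, the fix is another direct peering between A and C, not a path through B.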

A proxy solution

Speaking of unintended routes for traffic, to get around this routing problem you might think about setting up a proxy. That was another one of my assignments at Capital One. I had to deploy a proxy to bypass this transitive routing restriction (not my decision!). I was just following orders at the time.

I was handed an AWS Blog post with a squid implementation and deployed it according to the example as I was given no further instruction. I was not familiar with Squid proxy or as well versed in networking (or penetration testing) at the time and was not told to configure it any differently than what was handed to me. I did as I was told.

The proxy solution allowed traffic to both reach the Internet from a secondary VPC and to reach one VPC from another. Here are a few of the problems that resulted from that design.

We had one QA person for the entire cloud engineering team. (You need more than that! Poor guy.) So one day I see this traffic coming from his machine and hitting something it shouldn’t in a particular VPC. It was probably something I was working on like helping set up a Safenet HSM, Active Directory servers, or the bastion hosts. Whatever it was, it wasn’t supposed to be happening.

Let’s say the guy’s name is Joe. I’m like, “Hey Joe, why are you sending traffic to XYZ thing you’re not supposed to be accessing?”

And Joe is like, “I’m not! What?”

So I show Joe the traffic and we figure out that he’s trying to hit some resource in the account where he’s testing, but due to the routing and the proxy that allows paths that wouldn’t normally be there, the traffic is flowing over the proxy to some host in another account that Joe had no intention of sending traffic to at all.

That’s the problem with setting up a proxy. You might have unintended byproducts depending on your firewall rules and your routing if you don’t know exactly what you are doing. Even if you do — one mistake is all it takes to have a bad day.

The other problem was that the proxy was set up with Internet access. One day the person responsible for security engineering of the cloud environment who told me to install the proxy and provided the link contacts me. He asks, “What’s all this traffic from China traversing the proxy?” I’m like, I have no idea…I just followed the instructions.

Yeah. An open source proxy with no firewall for a bank might not be the best idea. I begged him to just take it down and someone else he was working with agreed and sent a picture of a dead squid. Phew.

You can make a proxy work and it definitely has a purpose. Especially when penetration testing …mwa ha ha…. but be careful. When you overlay the AWS networking with your own devices you may lose some of the protections and functionality the cloud platform has to offer. There are reasons why some companies use third-party tools. Just really understand how they work and the pros and cons of each solution.

A Transit VPC

Years ago I proposed a solution involving a single VPC for capturing network traffic as it crosses trust boundaries. I talked to various people at AWS and one of their key networking people said he wished a vendor would implement such a solution. I probably made some mistakes back then and a lot has changed, but what I was proposing is essentially an AWS Transit VPC. I gave this talk at my meetup and then at the AWS Community Day in San Francisco.

This was after proving to people that you could, in fact, capture packets on AWS when most security professionals thought it was impossible.

There’s a whole white paper on the topic and a presentation, but some of this is a moot point now, because AWS finally released a solution for packet capture called Traffic Mirroring. I wrote about the pros and cons of this solution.

At any rate, this is all related to passing data that needs to traverse certain network boundaries through a single VPC. Doing so might help you reduce the number of network appliances you need to make that happen and the number of resources you need to deploy to monitor traffic like Zeek or Snort or expensive firewall solutions.

The same concept applies to VPC Endpoints. There’s a blog post on AWS explaining how to centralize your VPC endpoints with a transit gateway.

I love the idea in concept. I started to work on it but I’m always doing too many things at once so I didn’t finish it immediately. I deployed a transit gateway and started setting up the routes and so on. But then I dropped it and about half a month later I got a budget alert. My costs had risen +2600%. What?

I didn’t even think I was sending traffic over the transit gateway, so that was apparently all monthly fees. But I didn’t have time to review the traffic in detail. I immediately tore it all down until I have time to focus on it, but it seems like a very expensive solution that costs more than the duplicated endpoints, in the end.
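To reason about that trade-off before deploying anything, a back-of-the-envelope comparison of the fixed hourly charges might look like the sketch below. Everything here is an assumption: the rates are placeholder relative units and not AWS prices, the model ignores per-GB data processing charges and per-AZ multipliers, and it assumes the centralized design needs one transit gateway attachment per workload VPC plus one for the hub VPC that holds the endpoints.

```python
HOURS_PER_MONTH = 730  # common approximation for a month of hourly billing


def monthly_fixed_cost(hourly_rate, count, hours=HOURS_PER_MONTH):
    """Fixed monthly charge for `count` resources billed at `hourly_rate`."""
    return hourly_rate * count * hours


def compare(endpoint_rate, endpoints_per_vpc, vpcs, attachment_rate):
    """Return (duplicated, centralized) fixed monthly costs.

    duplicated:  every VPC gets its own full set of endpoints.
    centralized: one set of endpoints in a hub VPC, plus one transit gateway
                 attachment per workload VPC and one for the hub.
    """
    duplicated = monthly_fixed_cost(endpoint_rate, endpoints_per_vpc * vpcs)
    centralized = (
        monthly_fixed_cost(endpoint_rate, endpoints_per_vpc)
        + monthly_fixed_cost(attachment_rate, vpcs + 1)
    )
    return duplicated, centralized


# Placeholder rates in relative units. Substitute current prices for your region.
print(compare(endpoint_rate=1, endpoints_per_vpc=4, vpcs=2, attachment_rate=5))
```

With only a couple of VPCs the attachment fees can outweigh the savings from sharing endpoints, and the balance only tips the other way as the number of VPCs or endpoints grows. Plugging in real prices and real traffic volumes is the only way to know for a given account.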

Feature Requests

At this point, I have a few feature requests which I’ve already mentioned in other posts:

  • Create the ability to deploy one VPC Endpoint for all services and restrict the services with a policy as an option to individual endpoints. Because that seems to be essentially what I am doing.
  • Create prefix lists for every AWS service so it’s easy to allow and disallow services in a security group.
  • Make sure all traffic to VPC endpoints is in the private IP ranges of the subnet. Otherwise there are too many IPs to create appropriate NACL rules.
  • Make the costs more scalable, so these services can be used by very small as well as very large businesses.

What to do?

For now, here are some options I’m considering:

  • I can review the transit gateway again to see if I did anything wrong or there’s some way to minimize costs.
  • I can stick with multiple endpoints.
  • Potentially I can remove and redeploy endpoints where I deploy things infrequently.
  • I could use a shared environment for deployments to different accounts but then I risk one environment becoming compromised and infecting resources in all my accounts.
  • I can use the approach of a bastion host to log into a host in the private network.
  • For my use case I could have one prod and one dev environment. A smaller company might have a shared VPC for QA and segregate via security groups in separate accounts.
  • I could share a VPC with two accounts. Each account would have a private VPC with private endpoints but they would send traffic to a shared VPC with a NAT in the subnet.

If I had a large company with more stringent security and reliability requirements I would definitely deploy resources across two regions with two separate NATs to withstand outages. I would create separate VPCs and subnets for each different environment and use a transit gateway with a transit VPC to centralize the boundary where traffic crosses from private networks to other networks and monitor traffic closely in that centralized VPC.

But for my personal use case, I’m looking at alternatives to keep costs down and scale up when required for larger projects (like when I have multiple people working on penetration tests and I spin up a new penetration test for each account).

I like the idea of segregating resources to a central account like IAM, KMS, and Networking as I wrote about in my organizational architecture, but the problem with that is currently the cost of the private network with all the peering, transit gateway, and VPC Endpoints. So I’m thinking on that a bit now. I’ll be revisiting and revamping things in the future accordingly.

OK back to troubleshooting whatever is blocking me on the network at the moment and hopefully I can provide the information about the base container tomorrow for all my Lambda containers going forward (with future modifications, I’m sure.)

Follow for updates.

Teri Radichel | © 2nd Sight Lab 2023

About Teri Radichel:
~~~~~~~~~~~~~~~~~~~~
⭐️ Author: Cybersecurity Books
⭐️ Presentations: Presentations by Teri Radichel
⭐️ Recognition: SANS Award, AWS Security Hero, IANS Faculty
⭐️ Certifications: SANS ~ GSE 240
⭐️ Education: BA Business, Master of Software Engineering, Master of Infosec
⭐️ Company: Penetration Tests, Assessments, Phone Consulting ~ 2nd Sight Lab
Need Help With Cybersecurity, Cloud, or Application Security?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
🔒 Request a penetration test or security assessment
🔒 Schedule a consulting call
🔒 Cybersecurity Speaker for Presentation
Follow for more stories like this:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
❤️ Sign Up for my Medium Email List
❤️ Twitter: @teriradichel
❤️ LinkedIn: https://www.linkedin.com/in/teriradichel
❤️ Mastodon: @teriradichel@infosec.exchange
❤️ Facebook: 2nd Sight Lab
❤️ YouTube: @2ndsightlab
AWS
Network
Security
Private
Cost