Teri Radichel

Troubleshooting Lambda Networking

ACM.308 Validating private network access and troubleshooting access issues from a private VPC through a NAT

Part of my series on Automating Cybersecurity Metrics. Container Security. Lambda. Network Security. Deploying a Static Website. The Code.

In the last post I got the local Lambda Runtime Interface Emulator working with a container that can run Bash Lambda functions.

In this post I want to test the network access I set up in an earlier post but have not yet tested.

Will it work? I don’t know. That’s why we’re going to test it. 😊 Having deployed cloud networking for the original Capital One cloud team, I can tell you that networking doesn’t always work on the first try. Either the people who provided the specifications didn’t understand the ports and protocols their application required or I made a mistake deploying it.

That’s why when I deployed the networking to production, which could only happen during certain windows, I would have the development team come to help me test it so we knew it would work when they needed it! If they made a mistake on the specifications it would go back through the security review. If I made a mistake I could fix it on the spot.

Also I created a tool to help teams figure out what networking their application required. Maybe I’ll post something like that here in a future post. For now I’m specifically going to test cloning a GitHub repo from the container I’ve been building and deploying to Lambda, using the networking I already deployed.

Test git access from EC2 instance

I’m running the local test on an EC2 instance that I use for development purposes. I’ll run a git clone command so you can see that the network I have set up for my EC2 instance has access to GitHub.

I just cloned that in the /tmp directory and then deleted it to show that it works.

Install git in the Docker image

Next I need to install git in the image as I did in this post, except that we are using Amazon Linux so we will use yum instead of apt. We should also update the container as I did in this post.

Add the git clone command to the handler function

Next I added the git command to the handler function. Be aware that if you do not set the EVENT_DATA variable you will get all kinds of strange errors. I am cloning to the /tmp directory because that’s a writable directory in a Lambda function. I’m also listing the contents of the directory after I run the command so I can see if my repository is in the folder. The function responds that the repo is cloned at the end.

This is hard coded for now but you can easily imagine passing in a parameter with the repository name we want to clone, right? 😉

Rebuild the image

I build the image using the build script created in a prior post.

./build.sh

Run a local test

Next I run the localtest.sh file as explained in the last post to start the API server. That script executes the function and mimics the AWS Lambda service on your local machine via the Lambda Runtime Interface Emulator.

./localtest.sh

Then I put my curl command from the last post to simulate calling the Lambda function into test-function.sh.

./test-function.sh

When I execute the curl command to call the simulated Lambda function it works.

I’m curious what all that other stuff in the tmp directory is but I’ll just leave that alone for now.

What is interesting is that if I run the function again I get an error.

Why do you suppose that is? Perhaps because I already cloned the repository and the directory already exists? Lambda can reuse an execution environment between invocations, so files written to /tmp by one invocation may still be there on the next.

Do we want that repository cached? No. We want to make sure we get the most up to date version of the files each time we execute the Lambda function.

How can we resolve it? Delete the directory if it already exists.
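Here is a minimal sketch of that fix, assuming a Bash handler that clones into /tmp. The function names remove_stale_clone and clone_fresh are hypothetical, and the shallow clone with --depth 1 is my assumption, not necessarily what the handler in this series does.

```shell
# Hypothetical helper: remove a clone left behind by an earlier invocation.
remove_stale_clone() {
  local dest="$1"
  # Refuse an empty path so rm -rf is never run without a real target.
  if [ -z "$dest" ]; then
    echo "remove_stale_clone: destination directory required" >&2
    return 1
  fi
  # Only delete when the directory is actually there.
  if [ -d "$dest" ]; then
    rm -rf -- "$dest"
  fi
}

# Hypothetical wrapper: always clone into a clean directory.
clone_fresh() {
  local repo_url="$1" dest="$2"
  remove_stale_clone "$dest" || return 1
  # --depth 1 fetches only the latest commit (an assumption for this sketch).
  git clone --depth 1 "$repo_url" "$dest"
}

# Example call with placeholder values:
# clone_fresh "https://github.com/<owner>/<repo>.git" "/tmp/<repo>"
```

Deleting first rather than running git pull keeps the function simple and guarantees the files match the current state of the repository on every invocation.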

And now I can call the function multiple times.

Testing locally is pretty handy.

Test the Lambda function when deployed as a Lambda function

Next I’m going to deploy the container to Lambda and see if it works there. Once we push the image to Lambda, the container will be running inside the networking we configured it with when we deployed it.

Recall that the networking looks like this:

  • The function is associated with a security group (a group of networking rules, not a group of Lambda functions) that has access to GitHub via a customer-managed prefix list.
  • A VPC Endpoint provides access to AWS CodeCommit (which keeps traffic on AWS rather than passing over the Internet and uses private IP addresses when configured correctly).
  • The function is associated with a security group that has access to AWS CodeCommit.
  • The function is in a subnet with a private route table that should have a route to an AWS NAT Gateway (a Private Subnet)
  • The NAT Gateway is in a subnet that has a route to the Internet (a Public Subnet).
  • You can’t associate a security group with a NAT Gateway.
  • The VPC has FlowLogs enabled so we can inspect the network traffic if something is blocked.

Let’s test and troubleshoot networking for this function.

Push, deploy, and test the function on Lambda

Push the container to Lambda with our push script.

./push.sh

Deploy the image.

Test the Lambda in the console as I’ve shown previously.

Our Lambda function timed out. Boo.

Did it time out because it exceeded the time threshold for Lambda or because it could not reach GitHub due to a misconfigured network?

Taking a look at VPC Flow Logs

Let’s start with the VPC Flow Logs for the VPC. Head over to your VPCs. Click on the flow logs link for the VPC deployed in the prior post for Lambda.

The first ENI in my case had a whole bunch of traffic to and from the Internet, most of which has a status of ACCEPT, meaning the traffic is allowed. What is all that?

Look at the ports.

Right off the top I can see that’s all noisy junk traffic. I’ve written about this before. In the majority of cases, traffic between two high ports can be dropped immediately and weeded out of your logs IF IT IS REJECTED.

I also wrote about how to write one rule to weed out most of this noise on pfSense, reject, and ignore it on a home network.

Unfortunately, it’s not so simple on AWS to do that. There’s no way to specify in a Network Access Control List or a Security Group rule that you want to block any traffic to this particular Subnet that is coming from two high ports.

But I digress. Based on the fact that this traffic is coming from the Internet and accepted, I know it’s the NAT. I have nothing else in this VPC and the Lambda functions don’t have access to the Internet. I also immediately recognize things from years of experience like any IP that starts with 77 is most frequently traffic from Russia. I wrote about how to tell where the traffic for an IP address is coming from in this post:

The other thing you will notice is that one of the IPs in each case is a public IP and one is a private IP address. That’s because AWS shows you the private IP address for AWS resources. The public IP address is something connecting from the Internet.

Why is all this traffic allowed and what is it doing? Well we can’t see what it is doing without getting packets as I wrote about in this post:

But whatever it is, it is unwanted and we would rather just drop it. More on that later.

Let’s look at the NACL for the NAT.

Hmm. Could add some blocking here for the most egregious ranges bombarding our systems with noise. Hint: It’s going to be a lot of traffic from Digital Ocean, Russia, and China, though oddly I was getting a lot of traffic from Germany on my home network for a while. Monitor your logs to become familiar with who is targeting and frequently hitting your systems.

Well, that’s not our current issue. I’ll save more on that for another post.

Find the Function Elastic Network Interface (ENI)

How do we know which resource in our VPC is getting those connections in that particular log?

VPC Flow Logs are grouped by ENI.

I clicked on the first ENI in the list. You can figure out which resource in your account has that ENI. It would be possible to query this on the command line but I pretty much know this is the NAT Gateway since that was the only thing running.

Head over to the VPC dashboard.

Click NAT Gateways in the left menu.

Click on the NAT gateway ID link.

Scroll down and find the ENI. That ENI matches the ENI logs I was just reviewing so I know those logs are related to the NAT Gateway.
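The same lookup can be done on the command line. This is a rough sketch, assuming the AWS CLI is configured for the account and region that hold the NAT Gateway. The function names are hypothetical.

```shell
# List each NAT Gateway with its subnet and ENI, one gateway per line.
# Needs the AWS CLI and permission to call ec2:DescribeNatGateways.
list_nat_gateway_enis() {
  aws ec2 describe-nat-gateways \
    --query 'NatGateways[].[NatGatewayId,SubnetId,NatGatewayAddresses[0].NetworkInterfaceId]' \
    --output text
}

# Print the NAT Gateway ID that owns the given ENI, or return 1 if none does.
nat_gateway_for_eni() {
  local eni_id="$1"
  list_nat_gateway_enis |
    awk -v eni="$eni_id" '$3 == eni { print $1; found = 1 } END { exit !found }'
}

# Example call with a placeholder ENI ID copied from the flow logs:
# nat_gateway_for_eni "eni-<id>"
```

If the function prints a NAT Gateway ID, the flow log stream for that ENI belongs to the NAT Gateway.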

I want to see the logs associated with my Lambda function ENI.

Head to the EC2 dashboard. Click Network Interfaces on the left. Search for the Network Interface that has your Lambda function name in it.

Click on the link for that ENI ID.
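If you prefer the command line, a sketch like the following should find the same interface. It assumes the ENI description contains the Lambda function name, which is what the console search above relies on. The function name find_lambda_enis is hypothetical.

```shell
# Hypothetical helper: list ENIs whose description mentions the function name.
# Prints the ENI ID, private IP, subnet, and first security group per line.
find_lambda_enis() {
  local function_name="$1"
  aws ec2 describe-network-interfaces \
    --filters "Name=description,Values=*${function_name}*" \
    --query 'NetworkInterfaces[].[NetworkInterfaceId,PrivateIpAddress,SubnetId,Groups[0].GroupId]' \
    --output text
}

# Example call with a placeholder function name:
# find_lambda_enis "<function-name>"
```

The ENI ID identifies the flow log stream to read, and the private IP is the address to search for in the NAT Gateway logs.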

Here you can see any security groups, subnets, and the VPC associated with your Lambda function.

You can click on any of the above to inspect the configuration of each as we’ll do below.

You can also see the private IP address for the Lambda function. Note that address so you can find it in the logs. Internal or private IP addresses fall in these ranges specified in RFC 1918: 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16.

We defined the private IP range for our subnets when we created them so the IP address for our Lambda will be assigned from that range.
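To tell quickly which side of a flow log record is the AWS resource, you can check an address against the RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). A minimal sketch in plain shell follows. The function name is_rfc1918 is hypothetical.

```shell
# Return 0 if the IPv4 address is in an RFC 1918 private range,
# 1 if it is a valid address outside those ranges, 2 if it is malformed.
is_rfc1918() {
  local ip="$1" octet
  local IFS=.
  # Reject empty input, non-digit characters, and empty octets.
  case "$ip" in
    ''|*[!0-9.]*|.*|*.|*..*) return 2 ;;
  esac
  # Split the address on dots into $1..$4.
  set -- $ip
  [ "$#" -eq 4 ] || return 2
  for octet in "$@"; do
    [ "$octet" -le 255 ] 2>/dev/null || return 2
  done
  # 10.0.0.0/8
  [ "$1" -eq 10 ] && return 0
  # 172.16.0.0/12 covers 172.16.x.x through 172.31.x.x
  [ "$1" -eq 172 ] && [ "$2" -ge 16 ] && [ "$2" -le 31 ] && return 0
  # 192.168.0.0/16
  [ "$1" -eq 192 ] && [ "$2" -eq 168 ] && return 0
  return 1
}

# Example:
# is_rfc1918 10.20.0.38 && echo "private"
```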

Get the Network Interface ID for that interface shown above.

Head back over to CloudWatch logs for our VPC as described above.

Click on the latest link for that ENI ID.

That’s a bit easier to look at. Three entries and they are all OK.

There are a lot of NODATA entries because the Lambda function was not called for quite some time.

This is the private IP for the Lambda function in the above logs: 10.20.0.38

This is a GitHub IP address: 140.82.112.3
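If you export or copy these records, a small filter makes them easier to read. This sketch assumes the default (version 2) flow log format, whose fields are: version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status. The function names are hypothetical and the sample record in the comment is made up for illustration.

```shell
# Print "src:port -> dst:port action" for each default-format record on stdin.
# Records with a log-status other than OK (NODATA, SKIPDATA) are skipped.
summarize_flow_records() {
  awk 'NF >= 14 && $14 == "OK" {
    printf "%s:%s -> %s:%s %s\n", $4, $6, $5, $7, $13
  }'
}

# Keep only records where the given IP is the source or the destination.
records_for_ip() {
  awk -v ip="$1" '$4 == ip || $5 == ip'
}

# Example with an illustrative record (not taken from the real logs):
# echo "2 123456789012 eni-0123456789abcdef0 10.20.0.38 140.82.112.3 49152 443 6 10 840 1700000000 1700000060 ACCEPT OK" \
#   | records_for_ip 10.20.0.38 | summarize_flow_records
# prints: 10.20.0.38:49152 -> 140.82.112.3:443 ACCEPT
```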

Well, we don’t have any traffic being blocked here. Think about why for a minute. The Lambda function tried to make network requests to GitHub and the GitHub security group allowed that. Right?

But somewhere along the path through the NAT Gateway to the Internet the traffic is failing to pass. We need access through any security groups, route tables, and NACLs along the way. Think about how the traffic gets to GitHub in this case.

  • It leaves the Lambda destined for the GitHub IP address.
  • It has to pass the Security Group rules assigned to the Lambda function.
  • It has to pass the NACL rules on the subnet associated with the Lambda function.
  • A route needs to exist in the Lambda subnet to allow the traffic to reach the NAT.
  • It has to pass the NAT subnet NACL rules.
  • The NAT subnet has to have a public route to the Internet.
  • The response from GitHub needs to be allowed through the NAT Subnet NACL.
  • The GitHub response needs to pass back through the route to the private subnet. If it got out on a route, it can get back in.
  • The traffic needs to be allowed through the NACL rules associated with the Lambda Subnet.
  • The Security Group rules automatically allow any responses to valid requests because security groups are stateful (which I explained in a prior post).

The VPC Resource Map

For route tables, AWS is now providing a diagram via their new Resource map feature. Let’s try it out.

I’ve sketched a little upside-down Y from where the Lambda is situated below and followed the path the traffic can take. As you can see, the route tables allow the traffic to pass from the VPC the Lambda is in to the NAT, but it isn’t really clear from this diagram that from the NAT the traffic then flows out the public Internet Gateway route.

I just realized you can hover over the subnet where the resource exists and follow the flow that way. Cool. The only thing is it doesn’t show the traffic leaving the public route. That may be a clue.

Head back over to the NAT Gateway logs. We can look to see if any traffic from our Lambda function is blocked at that point.

Click on the link for the NAT ENI in the list of ENIs for which flow logs exist.

Search for the Lambda IP address.

Here we can see that the rules or routes associated with the NAT are rejecting the traffic from the Lambda function.
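The same search can be run from the command line. This is a sketch under a few assumptions: the flow logs are published to a CloudWatch Logs log group, the log stream names start with the ENI ID, and the log group name, ENI ID, and IP address are values you supply. The function name is hypothetical.

```shell
# Print flow log records in one ENI's log streams that mention the given
# IP address and the word REJECT, one record per line.
find_rejects_for_ip() {
  local log_group="$1" eni_id="$2" ip="$3"
  # The IP is quoted inside the pattern because it contains dots.
  aws logs filter-log-events \
    --log-group-name "$log_group" \
    --log-stream-name-prefix "$eni_id" \
    --filter-pattern "\"$ip\" REJECT" \
    --query 'events[].message' \
    --output text | tr '\t' '\n'
}

# Example call with placeholder values:
# find_rejects_for_ip "<flow-log-group>" "eni-<nat-eni-id>" "10.20.0.38"
```

Any output means something associated with that ENI is rejecting traffic to or from the Lambda function.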

We already know that we can’t add a security group to the NAT.

We know the subnet rules are wide open.

You have to associate an Elastic IP address with a NAT, which we did.

We added an Internet gateway and there’s a route to the Internet.

But where is that route to the Internet exactly? Which route table is it in, and what is that route table associated with?

Let’s look.

In the details for the NAT gateway, click on the subnet.

Then click on Route Tables.

There’s no public route in the route table for our subnet.
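A check like this can be scripted so it is not forgotten next time. The sketch below assumes the subnet has an explicit route table association. A subnet that falls back to the VPC main route table returns nothing with this filter. The function names are hypothetical.

```shell
# Print the GatewayId and NatGatewayId of each default route (0.0.0.0/0)
# in the route table explicitly associated with the subnet.
default_route_targets() {
  local subnet_id="$1"
  aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=${subnet_id}" \
    --query "RouteTables[].Routes[] | [?DestinationCidrBlock=='0.0.0.0/0'].[GatewayId,NatGatewayId]" \
    --output text
}

# Return 0 if the subnet has a default route to an Internet Gateway.
has_igw_default_route() {
  default_route_targets "$1" |
    awk '$1 ~ /^igw-/ { found = 1 } END { exit !found }'
}

# Example call with a placeholder subnet ID for the NAT subnet:
# has_igw_default_route "subnet-<id>" || echo "No public route in this subnet"
```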

That is a very common mistake when it comes to networking. At least it is one that I made early on and would take me time to resolve because I would be looking at the NACLs and security groups and wondering what is wrong — completely forgetting about the route table.

The other common thing people sometimes forget is adding the public IP address for Internet access. That threw me for a loop when I very first started using AWS. Why can’t I SSH to my VM??? Learn all the variables to configure networking properly and methodically troubleshoot errors to reduce headaches and pain. 😊

The good thing about that missing route is all that junk hitting our NAT from the outside didn’t have anywhere to go either.

Let’s deploy a route to the Internet in our NAT subnet route table.

Now I’ll tell you right about now that I’ve been rethinking the organization of all my scripts, but for the moment the network script to deploy to sandbox networking is under my network folder in the GitHub repo.

What I notice is odd in my script is that I specified that my VPC should have a public route but it doesn’t. Somewhere I have a mistake.

So here’s what I’m staring at. I know this worked before. I have a template to create a route table. It has a condition on the route table type passed in as a parameter, which in this case is “Public”:

That translates to a condition of IsPublic = true.

If that condition is true it deploys the following resources:

When I check the resources the template deployed, the ROUTE for the route table is missing:

I have a very similar block of code for a route for the NAT gateway route which seems to have worked fine for NAT route deployment.

The only difference is the dependency on the VPC Attachment, which I added due to a CloudFormation bug.

I ran Drift Detection against this stack and it says everything is in sync. I wrote about drift detection here:

Hmm. This is odd. I added the NAT route and that shouldn’t be deployed unless the condition is set, but here’s what is happening. Because I gave both routes the same logical name, the second route in the template is overriding the first route even though it’s not being deployed. Bug? Shouldn’t the second resource be ignored in this case?
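A likely explanation is that a YAML mapping cannot hold two entries with the same key, so only one of the two resource definitions survives parsing, and here it was the one whose condition was false. A quick scan before deploying can catch this. The sketch below is a rough text check, not a YAML parser. It assumes a YAML template in which logical IDs under Resources are indented by exactly two spaces, and the function name is hypothetical.

```shell
# Print each logical ID that appears more than once under Resources.
# Assumes two-space indentation for logical IDs in a YAML template.
find_duplicate_logical_ids() {
  local template="$1"
  awk '
    # Any non-indented, non-comment line starts a new top-level section.
    /^[^[:space:]#]/ { in_resources = ($0 ~ /^Resources:/); next }
    # Inside Resources, a key indented by exactly two spaces is a logical ID.
    in_resources && /^  [A-Za-z0-9]+:/ {
      id = $1
      sub(/:.*$/, "", id)
      if (++seen[id] == 2) print id
    }
  ' "$template"
}

# Example call with a placeholder template path:
# find_duplicate_logical_ids "<path-to-template>.yaml"
```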

Anyway, we can easily fix it. I rename the two route resources:

Now I have a route resource:

And it’s associated with the route table used by the public subnet:

Let’s take a look at the Resource Map again.

Well, you still have to make a leap here. You have to understand that the Lambda network traffic is going to reach the SandboxNAT which is in the public subnet. Then from there it can reach the IGW route:

It also looks like the colors are changing before my very eyes. I like the orange better. 😊 Does the orange somehow indicate the linkage above? Maybe. This is not exactly intuitive.

Anyway, let’s test our Lambda function again.

Yay!

It works. We can successfully traverse the NAT to get to the Internet.

In the next post, I’ll show you a trick to weed out some of that noise reflecting off the NAT. Hopefully that is all it is doing…

Also, you may be interested in how inbound Lambda networking works.

Teri Radichel | © 2nd Sight Lab 2023
