Wishlist for Cloud Governance
Make it easier to secure multiple accounts
One of my posts on Cloud Governance.

I’ve been working on some new automation using different cloud governance models and I’m not done, but so far the implementation has been painful and time-consuming. For example, 2nd Sight Lab implemented the ability to create secure penetration test environments for each pentest or security assessment automatically. We created scripts to provide to customers for automated deployment of the roles and permissions we need for testing purposes. However, we hit a lot of speed bumps along the way.
Warning: After numerous delays this week due to these issues I’m behind on projects so this blog post is not spell-checked! I need to catch up now and still don’t have some of these things fully working below. I’ll have to get back to it later. But I hope this helps those that are creating these systems improve them.
Yesterday, I logged in to simply create an account to get an account number to send to a customer and I spent all day searching for a resolution to errors with no results. Now I’m so behind on other projects I’ll be working all weekend instead of going to a football game with my significant other. Boo.
My frustration may have been apparent as I went through the console trying to figure out and fix problems with error messages that were inaccurate or just said something like “internal error” which is not at all helpful. I have customers waiting on me and don’t like that feeling of making them wait due to things outside of my control. I just wanted to create a new account and that is what the service was supposed to do. And whatever was causing it to fail was unclear.
The problems I’ve faced so far with multi-account security controls and access in multiple clouds boil down to a few core issues:
- The documentation is unorganized and lacking details.
- The error messages are unhelpful.
- The UI is not organized in a way that aligns with the workflow.
If those things got fixed, any customer should be able to read an error message and get a link to the steps to fix the problem. They should be able to look at a screen, see what they need at a glance, and drill down. If a process needs to be completed, the UI should walk through the steps and automate away the caveats and gotchas as much as possible.
When I went through the documentation for the cloud providers I just tested, it was very disjointed, with numerous caveats, scenarios, and if-this-do-that’s. Additionally, error messages, solutions, and best practices are spread out over random pages throughout the documentation. I think some of these services are new so hopefully they will improve a lot in the near future. Here are some suggestions for anyone who wants to understand some of the issues I faced.
About Support
Cloud providers have been reaching out for feedback since I’m submitting errors in the console as I hit them and I appreciate that. I did this brain dump as I don’t have time for phone calls, support, and troubleshooting. Support often takes me longer than figuring things out myself and sometimes I feel like I’m just paying because the error messages and documentation are not clear. If they were, and the services just worked, I wouldn’t need to pay for support.
Also on AWS, you have to pay for support in every single account. The whole point of this new initiative is to create new accounts and I don’t want to pay a separate support fee for each one of them. Even if I don’t have to pay for support I’m so overloaded I don’t have time to assist in troubleshooting at the moment. I’m trying to hire an intern from a local college to help me get projects done and simply have no time.
Update: I fixed the problem creating accounts and getting them enrolled in Control Tower manually for now. Hopefully whatever is causing them not to enroll automatically through Account Factory will be resolved soon. I mention a billing issue I addressed below. I have no idea if that is related, but it seems to have been blocking manual access to the account. Perhaps it was also blocking programmatic access by Control Tower, Account Factory, and Service Catalog. I created an account manually in AWS Organizations, added the role, moved the account to the desired OU, and re-registered the OU (with a few other fits and starts). I’m watching the progress bar, which is currently at 70%. I already got an email inviting me to SSO for this account so I think it will work.
As per usual, I fixed the problem before I got a response from support about the quota, but I am going to leave that request open just to make sure the limit is appropriate. I usually spend a lot of time explaining things to support, and I tend to send them very tricky problems (sorry) that take a long time to resolve and that I end up solving before they do.
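For anyone who wants to script the manual workaround in the update above, here is a rough sketch of the create-and-move part. It assumes `org` is a boto3 Organizations client created in the management account; the function name, the polling values, and the default role name are my own illustrative choices, not something the update specifies. Re-registering the OU in Control Tower is not covered here and still has to be done separately.

```python
import time


def create_account_in_ou(org, email, name, dest_ou_id,
                         role_name="OrganizationAccountAccessRole",
                         poll_seconds=15, max_polls=40):
    """Create a member account, wait for it, then move it into an OU.

    `org` is expected to behave like boto3.client("organizations").
    `role_name` is an assumption: use whichever cross-account role
    your setup requires.
    """
    status = org.create_account(
        Email=email, AccountName=name, RoleName=role_name
    )["CreateAccountStatus"]

    # Account creation is asynchronous, so poll until it leaves IN_PROGRESS.
    polls = 0
    while status["State"] == "IN_PROGRESS":
        if polls >= max_polls:
            raise TimeoutError("account creation is still in progress")
        time.sleep(poll_seconds)
        polls += 1
        status = org.describe_create_account_status(
            CreateAccountRequestId=status["Id"]
        )["CreateAccountStatus"]

    if status["State"] != "SUCCEEDED":
        # Surface the reason reported by the service instead of hiding it.
        reason = status.get("FailureReason", "unknown reason")
        raise RuntimeError(f"Account creation failed: {reason}")

    account_id = status["AccountId"]

    # New accounts start under a parent (typically the root); look it up
    # because move_account needs both the source and the destination.
    source_parent_id = org.list_parents(ChildId=account_id)["Parents"][0]["Id"]
    org.move_account(
        AccountId=account_id,
        SourceParentId=source_parent_id,
        DestinationParentId=dest_ou_id,
    )
    return account_id
```

In real use you would pass `boto3.client("organizations")` as `org`. Taking the client as a parameter also makes it possible to dry-run the flow against a stub before touching a real organization.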
Most of the time I can reverse engineer problems and solve them myself but there are a few exceptions. One time I was getting random errors on S3. I contacted support repeatedly with error messages when it happened but it was not getting resolved for weeks. Finally I started posting the error on Twitter each time it happened.
Mitch Garnaat, the original author of boto (AWS Python SDK) was kind enough to point me to a detailed debugging option. That provided some insight. Random DNS failures were causing the error. I started posting those errors on Twitter every time they happened.
Finally, Colm MacCárthaigh over at AWS took notice and asked me if I wanted to come in and speak to the AWS team and explain what I was trying to do. That was back when I was in Seattle. I said I would be happy to, but I was too busy at that moment. (Sound familiar?) Besides all I was doing was uploading a file to S3. But it was critical that it worked because it was for a white paper for a grad school project on Packet Capture on AWS. I was behind schedule due to this problem and issues with a firewall product from a vendor I worked for at the time.
By the time I would have been able to come in they resolved the problem. It’s always DNS… All the time spent to relay the problem to support, explain it, and try to get a resolution that way did not prove to be beneficial.
I do find AWS to have the best cloud support team of those I have worked with. Sorry Azure, but it’s just too painful and in the past I don’t think GCP even had a support team. I see the option now but I haven’t tried it. I once joked with a coworker about calling ISPs. You should be able to take a test and if you pass you get sent to tier three support immediately. Different customers will need different levels of detail when it comes to support. Not all small businesses have simple needs or need basic answers.
AWS needs a new support option for small businesses that covers all accounts. Perhaps it should be based on the number of support requests (pay per use instead of a $15,000 minimum) and whether or not the request got resolved before the customer figured it out on their own. Also, make the request free, or perhaps even pay the customer, if it turns out to be a platform bug and they were the first to report it and help resolve it, like a bug bounty! How about that?
Update: And…my account has now been successfully enrolled in AWS Control Tower. Only took two days.
Error messages
I have worked on numerous e-commerce, financial, retail, investment, and banking systems over the course of my career. These systems have a lot of integration points. Mistakes, errors, and lost funds are simply not acceptable.
Error messages get handled in a very organized way (most of the time.) Each error has a number and that number corresponds to a specific message and problem. That specific error has a resolution. It is up to the developer to properly code the error to the correct problem.
For example:
1000: Account limit reached in AWS Organizations. Request a limit increase. [Link to instructions]
1001: Role AccountFactoryAdmin does not exist in account. Create the AccountFactoryAdmin IAM role with these permissions: [x, y, z]. [Link to instructions]
1002: Role AccountFactoryAdmin does not have permission to create an IAM role. Add the iam:CreateRole permission to the AccountFactoryAdmin role. [Link to instructions]
1003: Role AccountFactoryAdmin does not have the correct permissions in Service Catalog to create an account. [Link to instructions]
Each error message should relate to ONE fix, not “it could be this, or that, or one other thing, or maybe this…”, so customers don’t have to try random things to fix the problem.
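To make the numbered-catalog idea concrete, here is a minimal sketch of how a service might store and render such errors. The codes and messages come from the example above; the `CatalogEntry` class, the `render_error` function, and the `docs.example.com` URLs are hypothetical placeholders of mine, not anything a cloud provider actually ships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    code: int
    message: str      # what went wrong
    resolution: str   # the ONE fix for this specific problem
    docs_url: str     # hypothetical link to step-by-step instructions


def _entry(code, message, resolution):
    # The URL pattern is a placeholder assumption.
    return CatalogEntry(code, message, resolution,
                        f"https://docs.example.com/errors/{code}")


# One code, one problem, one resolution.
ERROR_CATALOG = {
    e.code: e
    for e in (
        _entry(1000, "Account limit reached in AWS Organizations.",
               "Request a limit increase."),
        _entry(1001, "Role AccountFactoryAdmin does not exist in account.",
               "Create the AccountFactoryAdmin IAM role."),
        _entry(1002, "Role AccountFactoryAdmin does not have permission "
                     "to create an IAM role.",
               "Add the iam:CreateRole permission to the role."),
        _entry(1003, "Role AccountFactoryAdmin does not have the correct "
                     "permissions in Service Catalog to create an account.",
               "Grant the role access in Service Catalog."),
    )
}


def render_error(code: int) -> str:
    """Turn an error code into the message a customer would see."""
    entry = ERROR_CATALOG.get(code)
    if entry is None:
        # An unmapped code is a bug for the developer to fix, not
        # something to pass along to the customer as "internal error".
        raise KeyError(f"error code {code} is not in the catalog")
    return (f"{entry.code}: {entry.message} {entry.resolution} "
            f"[{entry.docs_url}]")
```

Because the catalog is a single table, the same data can drive both the message on screen and the one-page error list in the documentation.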
Error messages should be accurate. If it says STS is turned off in a region, that should actually be the case.
Show the root cause. Just like in Java when you get a stack trace showing the root problem, integrated systems should bubble up the root problem to the end user and not report something useless like “internal error.”
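As a rough illustration of bubbling up the root problem, here is a small Python sketch using exception chaining. The two exception classes and the function names are invented for the example and do not correspond to real AWS error types; the point is only that the outer layer keeps a reference to the inner failure and shows it to the user.

```python
class ServiceCatalogError(Exception):
    """Hypothetical error raised by a downstream service."""


class AccountFactoryError(Exception):
    """Hypothetical error raised by the service the user is working in."""


def provision_product():
    # The real problem happens in the downstream service.
    raise ServiceCatalogError(
        "Role AccountFactoryAdmin does not have the correct permissions "
        "in Service Catalog to create an account."
    )


def create_account():
    try:
        provision_product()
    except ServiceCatalogError as exc:
        # "raise ... from" keeps the original failure attached as __cause__
        # instead of replacing it with a generic message.
        raise AccountFactoryError("Account creation failed.") from exc


def root_cause(exc: BaseException) -> BaseException:
    """Follow the __cause__ chain down to the first failure."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


def user_message(exc: BaseException) -> str:
    """Build the message shown on the screen the user is working in."""
    cause = root_cause(exc)
    return f"{exc} Root cause: {type(cause).__name__}: {cause}"
```

Calling `create_account()` and passing the caught exception to `user_message` yields a message that names the Service Catalog permission problem directly, rather than leaving the user to hunt for it in another service's event log.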
The error should bubble up to the screen the user is working in. The documentation should not say, “If you’re on a screen in this service and get an error go over to that service and look at this other error event log to solve the problem.” Show the error message to the user where they are. Or at least, provide a link to the errors directly instead of requiring them to know from the documentation the error is logged somewhere else.
Error messages should be useful. At some point I got the error message in this StackExchange question. That error message is not useful. It is completely unclear what the problem is. I later found this error in the documentation. Why is this not translated to something useful? Additionally, I never changed any of these permissions in Service Catalog so I don’t know how this problem got into my account exactly. The solution in the documentation could be a bit more descriptive than, “Make sure you have permissions in Service Catalog” or something to that effect. Check out the steps in the answer from Navneet in this post which I thought were pretty good:
Telling a customer to contact support indicates an underlying problem. If your documentation says, “If that doesn’t work, contact support” every other paragraph — your system likely needs some work. Invest the time and resources to resolve the errors for the customer so they don’t have to contact support and remove that from your documentation.
Documentation
Organize the documentation in a step by step walk through to set things up. Remove the wordiness.
Put best practices in one list. Don’t scatter them through pages and pages of documentation or have a bunch of different best practice lists for one service.
Test your documentation. Have various new customers walk through it and figure out where they get stuck or confused and fix it.
Write a summary, then drill into the details. If there are two different workflows for existing and non-existing accounts, separate those into two sections with step by step instructions for each workflow.
Error messages, as mentioned, should be in one list with links to resolution for each error code.
In general, think checklists and steps.
Pre-requisites and Caveats
If you have 100,000 caveats and pre-requisites to using a service, then perhaps the service implementation is not complete. Consider taking the time to go through and address those prerequisites and caveats and fix them for customers automatically. Add a list of problems in the console and a button to fix the problem.
Don’t create a rambling list of caveats that spans multiple pages. Provide a step by step walk through that tells customers how to check each caveat in one coherent list and step by step instructions to fix each issue. Test it with new customers and iterate to improve the documentation and the service.
Separate cost management and security governance
Everyone seems to be trying to combine cost and security governance. These are two separate things and the categorization will not be a one to one mapping.
Companies use something called cost centers or accounts (as in chart of accounts, not cloud accounts) to categorize spending and budgeting. These assignments of cost centers to resources do not always map to security governance boundaries.
I think that may be why AWS Organizations and AWS Control Tower are separate services. One is trying to address costs and the other is trying to address security. But the problem is you’re trying to use OUs for both purposes and it’s creating a lot of problems. People can’t create OUs in Organizations without messing up Control Tower. Creation of an account might fail to work properly at some point in AWS Organizations and then security controls fail to apply correctly.
Separate these two things completely. You create OUs for a hierarchy of access controls, not accounting, though a cost center may apply to OUs. Allow companies to assign cost centers or accounts to OUs. Anything created in that account or OU shows up on the bill with that cost center applied. However, a company may want to override that and apply a cost center to a person and anything that person creates gets the proper cost center applied.
Perhaps a company wants to override that and apply a different cost center to a particular set of resources for an application. IT and security services may exist in every account that get billed to the IT and security departments rather than the department using the account. Allow application of a cost center to those resources.
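Here is a small sketch of how the override behavior described above could work. The precedence order (resource, then person, then account, then OU) is my reading of the preceding paragraphs, and every class, field, and cost center name is made up for illustration; no cloud provider exposes this exact model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Resource:
    resource_id: str
    ou: str        # organizational unit the account lives in
    account: str   # cloud account that holds the resource
    creator: str   # person who created the resource


@dataclass
class CostCenterRules:
    # Each table maps an identifier to a cost center (all hypothetical).
    by_ou: Dict[str, str] = field(default_factory=dict)
    by_account: Dict[str, str] = field(default_factory=dict)
    by_person: Dict[str, str] = field(default_factory=dict)
    by_resource: Dict[str, str] = field(default_factory=dict)


def resolve_cost_center(resource: Resource,
                        rules: CostCenterRules) -> Optional[str]:
    """Return the most specific cost center that applies to a resource."""
    # Check from most specific to least specific; the first match wins.
    lookups = (
        (rules.by_resource, resource.resource_id),
        (rules.by_person, resource.creator),
        (rules.by_account, resource.account),
        (rules.by_ou, resource.ou),
    )
    for table, key in lookups:
        if key in table:
            return table[key]
    return None  # nothing assigned; flag it rather than guess


rules = CostCenterRules(
    by_ou={"ou-engineering": "CC-ENG"},
    by_person={"alice": "CC-RESEARCH"},
    by_resource={"security-logging-bucket": "CC-SECURITY"},
)
```

With these rules, a resource created by anyone else in the engineering OU bills to `CC-ENG`, anything Alice creates bills to `CC-RESEARCH`, and the security logging bucket bills to `CC-SECURITY` no matter which account it sits in. None of this touches the OU hierarchy used for access controls.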
Don’t use generic tags and make companies try to figure out how to tie that to their billing. Your customers should be able to apply a cost center to anything in your cloud platform and any costs related to that item will show up on the bill with the proper cost center applied. As I already explained, this is separate from applying security controls which may fall along different boundaries.
Then, allow the customers to automatically import those costs and cost centers into their accounting systems. Perfect! You have just saved them a ton of time. Allow them to drill down by cost center to high cost items in the console and show them who created the resource so they know who to contact if the spending is over budget. You will have fans.
I think AWS may be trying to do something like this with cost categories but I haven’t looked into that extensively. The description could be clearer to align with the above, if that is the purpose.
In GCP, I did not find clear guidance on consolidating billing accounts spread out across my organization for different Google services like Workspace, Voice, and different GCP accounts. I’m glad they turned off GCP by default because I think that is how I ended up with some random billing accounts charging me a couple of dollars a month each. In some cases they are hard to find and shut down. I don’t understand why, if I have an organization, it was offering a user I granted access within the organization the option to add their credit card for a $300 free trial. Aren’t they automatically billed to my organization? I need to revisit that.
Creating and removing accounts and services
I’ve had numerous issues creating and deleting services on all three major cloud platforms. Some are annoyances and some seriously delay progress and cause problems.
On AWS, one of the accounts that did not get properly enrolled in Control Tower shows up in AWS Organizations. When I logged into it to see whether the Organizations role was missing, since I couldn’t access it, I was asked for a credit card. When I try to invite and add the account to my organization, it says it is already in my organization. If that is the case, why am I being asked for a credit card when it should be billed to my organization? If it is not being billed to my organization it should not have been created.
If the above is related to a limit on accounts in organizations, none of the error messages anywhere in my account say that. Why is the account created at all, and in my organization, if it’s restricted by a limit? I presume that is not the case, and though I already requested a change to the default maximum the last time this problem occurred, I requested a limit increase just in case.
I also noticed a message in the console saying my email account needed to be verified. However, I don’t recall seeing that message before so I didn’t know I needed to do that. I could not find a way to re-request that email so I clicked the link in the old email which had expired. I have no idea why I got that email in the first place. Of course it failed and then it gave me the option to re-request the verification email in the console.
I wonder if my previous request for a limit increase failed due to that but I have no idea because none of the messaging tells me what is going on. When creating accounts, give a user a count of how many more accounts they can create before they hit the limit, if that was the issue. (I’m still waiting for someone to respond to that request.)
I already went into a diatribe on Twitter previously explaining how Azure is like Hotel California. You can enter but never leave. I couldn’t separate and cancel certain services independently from a particular billing account. I also don’t like that Azure requires an enterprise agreement for things like custom policies and automated creation of user accounts. They also charge for MFA. I find billing accounts to be confusing on both GCP and Azure.
One other hassle with Azure is that once you create an account with a particular credit card and phone number you can’t create another one, I think. I teach cloud security and periodically need to test new account setup. It’s very difficult to do for that reason, but I hire people to help with testing and they use their own credit cards to get around that. I don’t understand why that restriction exists. I don’t need a free trial again; I’ll pay for the services. But let me in!
Don’t charge small amounts to my credit card that result in locking out my card. This happened on AWS and it took me days to try to resolve this and get into my account. I figured out that clicking on the welcome email triggered the charge. The way in which the charges work (no CVV code) caused the bank to have to severely lower fraud controls to let the charges go through. There must be a better way. If an account is part of an organization or the card was used before, there is no reason to double check it with another $1 hit. I see similar charges from other cloud providers used to verify accounts.
In fact, as I’m writing this I logged into the account that didn’t get properly enrolled, added the credit card, got hit with a charge, and got an alert for $1 from my bank. I just got off the phone with the bank. At some point today I was charged a $1 amount from Amazon which went through successfully and my card is not locked. However, I still cannot access the account I am trying to get into that was created from AWS Control Tower via AWS Account Factory. After the $1 charge the bank says they are seeing $0 charge attempts that are neither approved nor denied from an unknown source. Odd. Since my card is not currently locked I don’t think that is the source of the account creation problems this time around.
Update: I created a new account without AWS Account Factory or Control Tower in AWS Organizations. It seemed to work fine. When I used the switch role functionality to get into the account it asked me for a credit card. What? I went back to my management account. I went into payment methods. I clicked edit for my payment method to edit the address. I clicked save without changing anything. That triggered a $1 charge to my credit card (again). I returned to the account I created manually a bit later and now I can access the services in that account. I think there may be an issue in there somewhere but not sure what it is exactly. But I seem to have a working account to get to my customer now.
Make it easier to cancel or remove an account from an organization. If I created an account through an organization I should be able to shut it down through my organization and any remaining balance should be billed to my organization. Don’t make me log into each account separately and add a credit card.
If I accidentally had a typo in an email I should be able to cancel that account without access to that email! I still have an account with this issue I need to cancel. I went in circles with AWS Support. See this post and the comments, some of which are pretty funny, in regards to this issue:

