How secure are your practices? Really though.
I talked to one of my friends today about security practices. We got onto the subject of trust and access, namely the inherent trust that some companies place in employees. There's a fine line to walk between trusting someone's intentions and keeping things secure. The conversation boiled down to: "How much security DO we give someone?"
That's a fantastic question. There are multiple layers to dive into here, from bad actors to breaches and vulnerabilities. There's no simple "this much" answer to the question; it varies. More importantly, we should consider the following: "How much do they NEED?"
AWS generally goes with "as little as possible," because if someone gets hold of something like an access key, they can do nearly whatever they want with it. I'll give a scenario. Let's call our example "Tim". Tim is a tenured employee, he's been with the company 10 years. Tim is trusted, Tim has demonstrated he's a valuable asset, and Tim is very careful with how he handles his access keys. One day, Tim goes to download a very old Terraform plan from the legacy days of the company, back when AWS keys were saved right in the raw Terraform code. Tim forgot about this, downloads the entire repo to sort through it, realizes he downloaded a Terraform module along with it, and saves it into one of his projects for later.
Then, in a huge hurry because the dev manager asks to see a demo of what he's doing, Tim promptly does a git add . in his core directory and adds his changes to the new repo the company is working on. This is a repo everyone in the company has access to, and Tim has accidentally just made the AWS keys of that old project, which were used in production, accessible to everyone. It takes a week or two, but someone finally notices and says something. Uh oh!
A few issues arise from this.
#1. This has now created a security incident. These keys need to be rotated.
#2. The keys were used in production environments, so the incident just got bumped up in priority.
#3. The AWS and access logs now need to be gone through thoroughly to ensure no one used the keys after seeing them; if someone did, this could become an even bigger breach.
If we take it one step further, what if this key did end up being used? What if someone did decide to test it out? What then?
Now, obviously Tim did not have any bad intentions. He's built his trust and demonstrated his value to the company; this is an unfortunate mistake by someone who was in a hurry, missed something in some old legacy code, and got bitten by it. This isn't even uncommon. In fact, I can recall multiple employers asking me during interviews: "How do you handle the backend file of Terraform when using AWS?" My answer is generally "Remotely, not locally, and never with the keys being stored in the plan." To my astonishment, many of them reply, "you'd be surprised how many people don't do it that way." Storing keys in code comes up far more often than you'd think.
So let's look back on this and just assume the incident has been marked off, it's "resolved". Now what? Security practices should always include, "If we didn't catch that, how do we catch it next time? How do we prevent this? How do we harden this?"
Here's a list of things that could help this:
- Current pattern analysis. Afraid of an AWS key being stored somewhere? Have a script crawl your repositories looking for that very thing, and if it finds one, your security team should be notified ASAP, even if it's in /old/ code and not new code. Moving away from legacy code? Figure out the pattern those keys follow, then search every existing repo and branch on your git server. Pull it down, look through it, leave no stone unturned. Tim here simply downloaded an ancient branch and didn't even think about the old way those keys were stored.
- Proactive pattern analysis. There's a lot of really cool stuff you can do with tools like GitLab. Move your pattern analysis to something active, so that if someone attempts to commit something matching a pattern you're watching for (SSH keys, AWS keys, etc.), it's not even allowed to land. Also make sure something notifies you when that happens.
- Use .gitignore. It's there for a reason. You shouldn't ever be committing Terraform state to git, and storing it remotely prevents a whole class of problems. Terraform natively supports remote state, and it's well worth looking into if you aren't already using it. https://www.terraform.io/docs/backends/types/remote.html
- Limit what access can do. In this case, the AWS key in that Terraform code was fairly old, still hanging out in legacy code, and had crazy amounts of access: essentially admin capabilities to deploy to prod. If someone decided to become a bad actor with it, there's almost no limit to the harm they could do to a production environment in AWS, short of deleting the account itself (it's not the root account, thankfully). Instead, consider having build servers handle your deployments for specific tags, with those tags mapped to specific users/keys that can only deploy to specific resources. Perhaps this means that if you're building out EC2 instances, that user has access to VPC, CloudWatch, and EC2: just the bare minimum to touch load balancers, allocate IPs and networking, and log. If that key were compromised, the user behind it still wouldn't have access to your data in S3, your databases, your container configurations, etc. Is this inconvenient? You betcha. Security is often not about convenience; it's about limiting, restricting, and denying access so that the damage is minimized if something does happen. We always want to assume our fortress is impenetrable, but if someone did break down that wall, what could they take?
- Limit what your build servers can do. Unit tests are generally run by your build servers when you check your code in, and part of those tests should be something patternized and agreed upon across your company: looking for things that shouldn't be there, looking for dangerous vulnerabilities, something easily repeatable. Beyond helping with that pattern analysis, you should also make sure the build server itself is limited in what it can do. What happens if someone gets access to an engineer's laptop? From there, a few simple commands to crawl some logs or configs on a build server and now they have your keys. Locking those servers down means the only way to inspect their configuration is to log in with admin access, and really, you shouldn't need to do so except for those reasons. If a server is there to build out Terraform, make it do just that. Is it just for compiling some C code? Just that. Is it just there to run unit tests? Limit the commands to that directory.
- Limit your git repos. Here's the other big stinger: in this particular case, anyone with access to git had access to ALL repos, because that's how git was set up for the company. Should all engineers be able to see all repos? Probably not. Should trusted engineers have access to all repos? We trust them, but probably not. Tying git access to something like Active Directory lets you add people to teams, so projects are only visible to the people who need them. Should someone get compromised, you've now limited the danger significantly. This is a very common practice at companies using Active Directory to get people access quickly during onboarding, and it's a very good way of limiting access. The biggest takeaway here boils down to "Just how many groups do they need?" because you shouldn't be handing the keys to the kingdom to everyone.
- Actively search for weird activity. This might require a tool you don't have yet, or some home-grown tooling. My example here would be looking for anomalies in AWS access. Did something from an unknown source use a key that normally wouldn't be used that way? Did only one server ever use this key, and now we're seeing access from a separate source? Did it touch a resource it would normally never touch? For example, that key that's been used by the build server to make EC2 instances for the last year and a half: did it suddenly try to access something weird like Fargate? Was the access denied? How will you know? How do you get notified? What do you do next? Having something that can analyze these things and alert you is another layer.
- Never store keys in code. This seems like common sense, but it can be very tempting to store keys in places like git when you assume no one else will have access. Perhaps that repo is locked away from the rest of the company, but all it takes is one accidental pull of a branch into another project and it isn't anymore. On top of this, there are much better ways to handle keys. Instead of your backend declaring which key to use, you can point it at a named profile defined on the build server (which you'd want to restrict too, of course). This is a built-in feature of Terraform, yet many choose not to use it. Maybe they don't know about it? Those onion layers can get deep. The point is, it's a lazy practice and shouldn't be used. The same goes for Chef; virtually every automation tool has a way to pull these values from a vault or other secure location rather than storing them in code.
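For the remote-state point, here's a sketch using the S3 backend as one common choice (the hosted "remote" backend linked above is configured similarly). Every name below is a placeholder for illustration.

```hcl
# Sketch: keep Terraform state out of git entirely by storing it remotely.
# Bucket, key, region, and table names are all placeholders.
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"       # pre-created, private bucket
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "example-terraform-locks"       # optional state locking
  }
}
```

Pair this with a .gitignore that covers `*.tfstate`, `*.tfstate.backup`, and `.terraform/` so local copies never sneak into a commit.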
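As a sketch of what the scoped-down deploy access could look like in Terraform: the user name here is a placeholder, and even these service-wide actions would ideally be narrowed further to specific resources and conditions.

```hcl
# Sketch: a build-server user that can only touch what an EC2 deployment
# needs (EC2, load balancing, CloudWatch, logs). No S3, no databases,
# no container configuration access.
resource "aws_iam_user_policy" "ec2_deploy_only" {
  name = "ec2-deploy-only"
  user = "build-ec2-deployer"   # placeholder build-server user

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["ec2:*", "elasticloadbalancing:*", "cloudwatch:*", "logs:*"]
      Resource = "*"
    }]
  })
}
```

If this user's key leaks, the blast radius is one slice of your infrastructure rather than the whole account.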
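And for keeping keys out of code entirely, a sketch of the profile approach: "build-deploy" is a placeholder name that would live in ~/.aws/credentials on the build server, never in the repo.

```hcl
# Sketch: no keys in code; the provider resolves credentials from a
# named profile configured on the build server itself.
provider "aws" {
  region  = "us-east-1"
  profile = "build-deploy"   # placeholder profile on the build server
}
```

The S3 backend accepts a `profile` argument as well, so state access can stay out of code the same way.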
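To make the pattern-analysis idea above concrete, here's a rough sketch of a crawler. It assumes your long-term AWS keys follow the standard AKIA access-key-ID prefix, and it leaves out the part that wires hits into your security team's alerting; the repo path is whatever clone you point it at.

```shell
#!/bin/sh
# Rough sketch: scan every branch of a local clone for AWS access key IDs.
# AKIA[0-9A-Z]{16} matches the standard long-term access key ID format.
# Notifying the security team on a hit is left as a placeholder step.
scan_repo() {
  repo="$1"
  # Walk every local and remote-tracking branch, grepping each full tree.
  for ref in $(git -C "$repo" for-each-ref --format='%(refname)' refs/heads refs/remotes); do
    git -C "$repo" grep -InE 'AKIA[0-9A-Z]{16}' "$ref" || true
  done
}
```

Run it over every repo and branch your git server knows about, not just the active ones; Tim's key was sitting in an ancient branch nobody thought about.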
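On the proactive side, a minimal client-side sketch is a pre-commit hook that refuses to let a staged AWS key through; server-side, GitLab's push rules can enforce the same pattern centrally so it doesn't rely on each laptop being set up correctly.

```shell
#!/bin/sh
# Minimal sketch of a .git/hooks/pre-commit hook: block any commit whose
# staged changes contain what looks like an AWS access key ID.
check_staged() {
  if git diff --cached | grep -qE 'AKIA[0-9A-Z]{16}'; then
    echo "Refusing commit: possible AWS access key in staged changes." >&2
    return 1
  fi
  return 0
}
# In the real hook file, the script would simply end by calling: check_staged
```

Pair the rejection with a notification so the security team hears about near-misses too, not just the ones that land.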
Security is like layers of an onion. It's always inconvenient to have this many layers, but the whole idea is to keep the attacker trying to get through layer after layer after layer. If all you have is a hard shell and they get through, well, now it's down to the yolk and you've got egg on your face.
So this comes back to "How secure ARE your practices?" and I think it's something to be constantly considering. There are countless things you can do, and of course, nobody will think of everything; we're human. We should always take a step back, though, and consider how much trust we've lent out on access, on servers, on keys, passwords, and admin accounts, and ask: "How much do we NEED, and how much can we reduce?"
It's not a slight to your engineers, IT people, or anyone else you've entrusted. After all, we all want to trust those we work with. But when shit hits the fan, do you want to be in the predicament of worrying whether you'll end up on the news? Or would you rather have the minor inconvenience of double checking, knowing the access is secure, maybe rotating something simple, and going back to your day? The most important thing to remember is that you are a custodian of your customers' data, and you have the responsibility of keeping it safe. Put yourself in their shoes: would they say "I feel this is safe," or would they be upset? Humans make human mistakes, even at the top level.
Consider those practices, consider looking at how things are handled, and consider more limitations if your data doesn't feel secure. There is always room for improvement.