The Best Way to Find Secrets in Source Code


A staggering number of companies have no clear picture of the size of their digital real estate: how much of it they actually have, and what kind of data is simply floating around in it, untethered, unprotected, and unstructured. All of that real estate is prime attack surface, every bit of which can be breached and compromised by attackers. Companies house all kinds of sensitive data in it: not only their codebases and intellectual property (IP), but also personally identifiable information (PII), all of it valuable intel that attackers are hungry for. In this article, we’re going to go deep into just one corner of that vast landscape: the secrets your source code has hidden in its DNA, and how to protect them.

What are hidden secrets in the source code?

Your code holds a vast amount of secret data that is hard to spot and is sometimes extremely sensitive or dangerous. In most cases it is data that should never be publicly exposed and has to be protected at all costs: access keys, API tokens, credit card numbers, Social Security numbers (SSNs), passwords, and IP-protected code.

There are 3 main types of secrets you need to find in your source code:

Malicious

Today, your codebase is an amalgamation of different parts and projects. It’s not just one core body of code, created in-house, inspected by your teams, and perfected under the scrupulous glare of your (hopefully) high quality standards. Your source code is a Frankenstein-like creature assembled from all types of outside components, off-the-shelf commercial code as well as open source code. You have to be aware that sometimes those components, critical to the way your software is built, are brimming with errors and hidden extras. Some are malicious, some are accidental, and some were put there deliberately for your benefit. It’s important to understand what secrets others have put into your code and how they can affect you.

Easter Egg

When we talk about secrets, most people think of sensitive data, or of something evil like a data-hoovering cookie or a malicious trojan embedded in a program. However, secrets can sometimes be fun, personalized signatures that developers sneak into their products. For example, companies like Google, eBay, and Yahoo have spent years sneaking job ads into their HTML in hopes of finding new coders. Some developers enjoy hiding obscure messages or images in their games or apps. From confused dinosaurs appearing in Adobe suites to Amazon’s mystery duck that meows, some programs are brimming with Easter eggs. It’s important, if only for the fun of it, to detect those secrets in your source code.

Sensitive data

Finally, there’s data. Whether through human error, laziness, or because it was simply the only way a developer could get the job done, sensitive data will end up in your repository. Developers know, after years of training, that pushing company secrets such as API tokens, certificates, credit card info, and authorization keys into a codebase is bad practice. Still, they do it. Sensitive data will end up in your repository for multiple reasons; it’s a fact of life.

Secrets can become exposed for a million reasons. Most of the time the cause is an innocuous error, one that is already in the pipeline for a patch, and because the secret in the source code went undetected, your security team gave that error a low threat assessment. For example, suppose a small glitch in your software allows a hacker to breach your perimeter. Your security team evaluated the flaw and determined that an attacker who used it wouldn’t be able to gain any valuable data or leverage, so they gave the threat a low risk score. A patch would be included in a future update, but there were far more important risks to take care of first. What your team didn’t know was that a developer had hidden an SSH key in that area of the source code. Suddenly an attacker who is simply browsing your code has a tool that lets them connect to your server and have a field day: mine bitcoin, steal data, encrypt your hard drive, and launch a ransomware attack.
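
A key like the one in this scenario is one of the easier secrets to spot, because private keys stored in PEM format begin with a recognizable header line. The following is a minimal Python sketch, not a complete detector: the function name find_private_keys is made up for illustration, and it only catches keys pasted into a file as text.

```python
import re

# PEM-encoded private keys open with a header such as
# "-----BEGIN OPENSSH PRIVATE KEY-----" or "-----BEGIN RSA PRIVATE KEY-----".
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


def find_private_keys(text):
    """Return the 1-based line numbers that contain a private-key header."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if PRIVATE_KEY_HEADER.search(line)
    ]
```

Running it over every file in a repository gives a quick first pass, though a key that has been encoded or split across strings would slip through.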

How to find secrets in your code?

There are multiple ways of detecting secrets in source code. All of them fall into two main categories: manual and automatic.

Manual detection

Developers constantly commit secrets to public repositories. One survey found that over 92% of them did so regularly, in most cases because it made their job easier. Usually, once they finished their task, they would go back and scrub the sensitive data. Still, they sometimes made mistakes: they shared the code upstream while it still contained the secret, or simply forgot to erase it.
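
One way to catch a secret before it travels upstream is to look only at the lines a change adds. The sketch below assumes you already have a unified diff as text (for example, the output of git diff or git log -p). The word list and the function names are illustrative assumptions, not an established tool.

```python
# Illustrative word list; extend it to match your own naming conventions.
SUSPICIOUS_WORDS = ("password", "secret", "token", "api_key")


def added_lines(diff_text):
    """Yield the lines a unified diff adds, without the leading '+'."""
    for line in diff_text.splitlines():
        # '+' marks an added line; '+++' introduces the new file's header.
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def suspicious_additions(diff_text):
    """Return added lines that mention one of the suspicious words."""
    return [
        line.strip()
        for line in added_lines(diff_text)
        if any(word in line.lower() for word in SUSPICIOUS_WORDS)
    ]
```

Scanning history this way also matters after a cleanup: deleting a secret in a later commit does not remove it from the earlier commits that introduced it.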

This type of vulnerability is not unusual, and in many cases manual source code secret detection can be employed:

Use text search and regex: search for target words such as “key”, “password”, “API”, “AWS”, etc.
Search your text for random-looking strings, particularly long strings drawn from a large set of characters.
Diligently review the string literals in your code.
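
The first two manual techniques can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the keyword list, the 20-character minimum, and the entropy threshold of 4.0 bits per character are arbitrary starting points chosen for the example, and a naive keyword search like this one will produce false positives (for instance, “key” also matches “monkey”).

```python
import math
import re
from collections import Counter

# Illustrative keyword list; case-insensitive substring match.
KEYWORDS = re.compile(r"key|password|api|token", re.IGNORECASE)

# Quoted strings of 20 or more characters with no whitespace or quotes inside.
STRING_LITERAL = re.compile(r"""["']([^"'\s]{20,})["']""")


def shannon_entropy(text):
    """Bits of entropy per character, based on character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_line(line, entropy_threshold=4.0):
    """Return the reasons, if any, why a line deserves a closer look."""
    findings = []
    if KEYWORDS.search(line):
        findings.append("keyword")
    for literal in STRING_LITERAL.findall(line):
        # Random-looking strings use many different characters evenly,
        # which pushes their per-character entropy up.
        if shannon_entropy(literal) >= entropy_threshold:
            findings.append("high-entropy string")
    return findings
```

A line that returns an empty list is not proven safe; the scan only narrows down where a human should look first.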

Automatic detection

The other way to find secrets in your code is to scan it with automated tools and apps.

These types of tools do the following:

Find bugs and programming errors. Some might not detect secrets, but they will tell you if there are any syntax errors in the code.
Parse JSON and XML data with a simple query language.
Act as code reviewers: they can be used as an additional layer of security when you want to make sure that no one has tampered with your code.
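
To give a feel for what the JSON step involves, here is a small Python sketch that walks a parsed JSON document and reports where string values sit under sensitive-sounding key names. It is a stand-in for a real query language, not an implementation of one; the name list and the path notation are assumptions made for this example.

```python
import json

# Illustrative list of key-name fragments that suggest a secret.
SENSITIVE_NAMES = ("password", "secret", "token", "key")


def sensitive_paths(node, path="$"):
    """Yield paths to string values whose key name looks sensitive."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}.{name}"
            looks_sensitive = any(s in name.lower() for s in SENSITIVE_NAMES)
            if isinstance(value, str) and looks_sensitive:
                yield child
            else:
                # Keep descending into nested objects and arrays.
                yield from sensitive_paths(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from sensitive_paths(value, f"{path}[{index}]")


config = json.loads(
    '{"db": {"user": "app", "password": "hunter2"},'
    ' "services": [{"api_token": "abc"}], "debug": true}'
)
print(list(sensitive_paths(config)))
# prints ['$.db.password', '$.services[0].api_token']
```

The same walk could be adapted to XML by iterating over elements and attributes instead of dictionary keys.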

The importance of finding secrets in your source code

Detecting secrets in source code is essential to achieving your security goals. The secrets to look for include passwords, API keys, tokens, and encryption keys used for authentication. They need to be detected so that they can be monitored for changes and removed if necessary.