Over 12 million auth secrets and keys leaked on GitHub in 2023

  • March 12, 2024
  • 11:23 AM

GitHub users accidentally exposed 12.8 million authentication and sensitive secrets in over 3 million public repositories during 2023, with the vast majority remaining valid after five days.

This is according to cybersecurity experts at GitGuardian, who sent 1.8 million complimentary email alerts to those who exposed secrets. Only 1.8% of those contacted took quick action to correct the error.

The exposed secrets include account passwords, API keys, TLS/SSL certificates, encryption keys, cloud service credentials, OAuth tokens, and other sensitive data that could give external actors unlimited access to various private resources and services, leading to data breaches and financial damage.

A 2023 Sophos report highlighted that compromised credentials were the root cause of 50% of all attacks recorded in the first half of the year, followed by vulnerability exploitation, which was the attack method in 23% of cases.

GitGuardian says secret exposure on GitHub, the world's most popular code hosting and collaboration platform, has been on a worsening trend since 2020.

Millions of secrets exposed on GitHub each year (GitGuardian)

The "leakiest" countries for 2023 were India, the United States, Brazil, China, France, Canada, Vietnam, Indonesia, South Korea, and Germany.

By sector, IT leaked the most secrets with 65.9% of the total, followed by education with 20.1%. All other sectors combined (science, retail, manufacturing, finance, public administration, healthcare, entertainment, transportation) accounted for the remaining 14%.

GitGuardian's generic detectors, which caught about 45% of all secrets the firm detected in 2023, produced the following breakdown.

Top 10 generic secrets (GitGuardian)
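
To illustrate what a "generic" detector might look like, here is a minimal Python sketch. It is not GitGuardian's method, which the report summary above does not describe. It assumes a common heuristic: flag quoted values assigned to secret-sounding names when the value is long and has high Shannon entropy. The pattern, the function names, and the 3.5-bit threshold are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical pattern: a name that hints at a secret, an assignment,
# and a quoted value of at least 16 non-space characters.
ASSIGNMENT = re.compile(
    r"\b(\w*(?:secret|token|password|api_?key)\w*)"
    r"\s*[:=]\s*"
    r"[\"']([^\"'\s]{16,})[\"']",
    re.IGNORECASE,
)


def shannon_entropy(value):
    """Return the Shannon entropy of a string, in bits per character."""
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_generic_secrets(text, min_entropy=3.5):
    """Return (line_number, variable_name) pairs for suspicious assignments.

    min_entropy is an assumed threshold, not a value from the report.
    """
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in ASSIGNMENT.finditer(line):
            name, value = match.groups()
            # Repetitive placeholders have low entropy and are skipped.
            if shannon_entropy(value) >= min_entropy:
                findings.append((line_number, name))
    return findings
```

A heuristic like this trades precision for coverage: it needs no knowledge of any particular provider's key format, but it will miss short secrets and can flag random-looking strings that are not secrets at all.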

The specific detectors, which can identify and sort leaked secrets into more tangible categories, indicate a massive exposure of Google API and Google Cloud keys, MongoDB credentials, OpenWeatherMap and Telegram bot tokens, MySQL and PostgreSQL credentials, and GitHub OAuth keys.

Top 10 valid specific secrets (GitGuardian)

Only 2.6% of the exposed secrets are revoked within the first hour, while a whopping 91.6% remain valid even after five days, which is when GitGuardian stops monitoring their status.

Riot Games, GitHub, OpenAI, and AWS seem to have the best response mechanisms to help detect bad commits and remediate the situation.

AI trend

Generative AI tools continued their explosive growth in 2023, also reflected in the number of relevant secrets exposed on GitHub last year.

GitGuardian saw a massive 1,212x increase in the number of OpenAI API keys leaked on GitHub compared to 2022, with an average of 46,441 API keys leaked per month, making it the fastest-growing data point in the report.

OpenAI is known for products like ChatGPT and DALL-E, which have widespread use beyond the tech community. Many businesses and employees enter sensitive information into ChatGPT prompts, so exposure of these keys is extremely risky.

Open-source AI model repository HuggingFace also had a steep increase in leaked secrets, which is directly associated with its rising popularity among AI researchers and developers.

Monthly key leaks (GitGuardian)

Other AI services, such as Cohere, Claude, Clarifai, Google Bard, Pinecone, and Replicate, also had secret leaks, although at a much lower level.

While those using AI services need to better secure their secrets, GitGuardian says that the technologies can also be used to detect and secure secrets.

GitGuardian says that large language models (LLMs) can detect leaked secrets quickly and with fewer false positives.

However, the massive operational scale, cost and time considerations, and identification efficiency are all limiting factors that keep such endeavors challenging, at least for now.

Last month, GitHub enabled push protection by default to prevent accidental exposure of secrets when pushing new code to the platform.
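
Push protection catches secrets at push time, but developers can also keep keys out of source files entirely. One widely used approach, sketched below in Python, reads the key from an environment variable at runtime. The variable name EXAMPLE_API_KEY and the function load_api_key are hypothetical names chosen for this illustration.

```python
import os


def load_api_key(variable_name="EXAMPLE_API_KEY"):
    """Read an API key from the environment instead of hard-coding it.

    EXAMPLE_API_KEY is a placeholder name, not a real service's variable.
    """
    key = os.environ.get(variable_name)
    if not key:
        # Fail loudly rather than falling back to a key stored in code.
        raise RuntimeError(
            f"Set the {variable_name} environment variable before running."
        )
    return key
```

Because the key never appears in the repository, there is nothing to leak when the code is pushed. The key still has to be stored and shared securely outside the codebase.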