The Data Behind Data Breaches: What 7 Charts Tell Us

In light of last month’s Facebook data breach, which affected over 50M users, and recent reports on the Google+ security breach, which resulted in the shutdown of the service, the critical issue of security facing tech companies and the digital citizenry today has been front and center.

Security hacks are nothing new. Many of the world’s largest companies have seen them before: eBay had 145M records hacked in 2014, Target 70M in 2013, and Sony PlayStation 77M in 2011. The list goes on and on.

The nature of data breaches has expanded in the digital era. A serious data breach occurs when confidential or sensitive information is exposed in an unauthorized manner. That information can be anything from personally identifiable information (PII: credit card numbers, names, addresses) to protected health information (PHI: patient names, dates of birth, SSNs, healthcare treatments) to IP and trade secrets.

Given how much of our world has been digitized, it’s no surprise the impact of cybersecurity threats seems to be growing. In 2015, President Obama said that cybersecurity is one of the biggest challenges to the future growth of the US economy.

Which makes sense. There’s a lot at stake when it comes to a data breach, both for businesses and for individuals. Besides reputational damage and bad press, companies that are the subject of such attacks risk devaluation, with millions of dollars in stock value wiped off overnight, while facing steep correction costs. Individuals face the threat of financial and personal damage when their personal information is disclosed to untrusted third parties, as in the Anthem breach in 2015, which resulted in about 80M patient records containing SSNs, medical records and street addresses being sold on the black market by hackers for a profit.

With October being cybersecurity month, we wanted to dig into the data to see what was really going on when it comes to data breaches. Here’s what we found.

The volume of global data breaches is increasing over time

Total number of records stolen by year.
Source: informationisbeautiful.net: https://docs.google.com/spreadsheets/d/1Je-YUdnhjQJO_13r8iTeRxpU2pBKuV6RVRHoYCgiMfg/edit#gid=322165570 (A public dataset on the world’s largest data breaches)

Frequency of big breaches and overall intensity is on the rise

Total data sensitivity, total count of entities by year.
Source: informationisbeautiful.net: https://docs.google.com/spreadsheets/d/1Je-YUdnhjQJO_13r8iTeRxpU2pBKuV6RVRHoYCgiMfg/edit#gid=322165570 (A public dataset on the world’s largest data breaches)

Active hackers are the leading cause of stolen information

Total number of records stolen by method of leak.
Source: informationisbeautiful.net: https://docs.google.com/spreadsheets/d/1Je-YUdnhjQJO_13r8iTeRxpU2pBKuV6RVRHoYCgiMfg/edit#gid=322165570 (A public dataset on the world’s largest data breaches)

After loading a public dataset into ThoughtSpot, I simply searched to see if the number of breaches was growing, and whether the intensity of the breaches was changing. My hunch was correct - both the number and intensity of breaches are growing.

That led me to my next question - what was causing this increase? One quick search later, and I had my answer. Active hackers account for more than 51% of all stolen records, yet breaches also occur due to a variety of less sophisticated causes, such as weak passwords, lost or stolen laptops, and poorly designed systems. For example, companies that don’t follow PCI compliance may be exposing their users’ payment information by keeping card numbers, PINs or authentication information in-house when they don’t need to do so. Additionally, enterprises may accidentally expose sensitive information on the internet through misconfigured access controls or flawed password management mechanisms.

More data breaches happen on the internet, but government breaches are the most sensitive

Total data sensitivity, total records lost by organization.
Source: informationisbeautiful.net: https://docs.google.com/spreadsheets/d/1Je-YUdnhjQJO_13r8iTeRxpU2pBKuV6RVRHoYCgiMfg/edit#gid=322165570 (A public dataset on the world’s largest data breaches)

After determining how the records were being stolen, I wanted to know where these records were originating. I quickly searched for the answer, and found that the largest number of data breaches have happened on the web, followed by government and financial services. That said, government data breaches have seen the highest sensitivity, followed by financial and healthcare.

Active hackers attacked mostly the web & financial services

Total data sensitivity by total number of records stolen and method of leak.
Source: informationisbeautiful.net: https://docs.google.com/spreadsheets/d/1Je-YUdnhjQJO_13r8iTeRxpU2pBKuV6RVRHoYCgiMfg/edit#gid=322165570 (A public dataset on the world’s largest data breaches)

Looking at my treemap for intensity and total number of records stolen, I wanted to know whether or not the source of hacks was consistent in every sector. One more question in ThoughtSpot, and I instantly had my answer. The majority of the breaches on the web and in financial services were due to active hackers, whereas global government breaches can be attributed mainly to poor security or accidental publishing.

The cost of defending against cybersecurity attacks is rising

Total cost by year.
Source: Statista (https://www.statista.com/statistics/615450/cybersecurity-spending-in-the-us/)

IoT presents a whole new set of challenges

Even as breaches become more common and more intense, our world is becoming more connected.

Total IoT devices, total spending by year.
Source: Gartner (https://www.gartner.com/en/newsroom/press-releases/2017-02-07-gartner-says-8-billion-connected-things-will-be-in-use-in-2017-up-31-percent-from-2016)

I quickly analyzed some data from Gartner, and found IoT devices are expected to grow globally from 6B in 2016 to around 20B in 2020. Likewise, IoT spending is expected to climb up to 3 trillion USD by 2020.
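To put that device growth in perspective, the rounded figures above (6B in 2016, 20B in 2020) imply a compound annual growth rate of roughly 35%. The short sketch below shows the arithmetic. The function name cagr is ours, and the inputs are the rounded numbers quoted in this post rather than Gartner’s exact figures.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Rounded device counts quoted above: 6B in 2016, 20B in 2020.
growth = cagr(6e9, 20e9, 2020 - 2016)
print(f"Implied annual growth in IoT devices: {growth:.1%}")  # 35.1%
```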

As IoT devices proliferate, many with little to no security, default passwords or unauthenticated communication, personal data could be more vulnerable to attack than ever before.

The Assumption of Breach

It’s clear we’re moving closer to a world where the “assumption of breach”, a term first coined a decade ago, is the new normal. Organizations need to take the necessary measures to prioritize security policy and build it systematically into their everyday workflow and fabric, while adapting to a new and increasingly digitized world.

It’s clear from our research that enterprises must consider and guard against multiple sources of breaches, and not just active hackers. This means enterprises need to train employees in best practices around strong passwords and anti-phishing, in addition to their traditional cybersecurity efforts.

It’s also clear we live in an increasingly connected world, underpinned by cloud technologies. One reason companies have traditionally been shy of the cloud is concern about the security and sensitivity of their data. The assumption was that in SaaS models, enterprise security is shared and can be compromised in more ways, and that such services are not designed for actual use cases in multi-tenant architectures but are instead built as one-mould-fits-all solutions. Serverless apps expose new surfaces and avenues for attack and do not absolve the original developer of responsibility for security. In reality, data residing on premises may not be all that secure either, and cloud adoption, fuelled by resource optimization and lower total cost of ownership, is only slated to grow, which will require a new breed of cloud security architectures.

At the system design level, it is imperative to identify and fix API vulnerabilities that may expose open, unprotected interfaces to injection or transaction replay attacks. Steps such as multi-factor authentication, short-lived access tokens and rate-limiting policies for public APIs are becoming more commonplace. From a DevOps and release management perspective, security is already becoming native and central to the CI/CD model of rapid and continuous deployment through tools like Jenkins, which make it part of regular quality assurance testing.
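As one illustration of what a rate-limiting policy for a public API can look like, here is a minimal token-bucket sketch in Python. It is not taken from any particular product. The class name TokenBucket, its parameters and the injectable clock are our own assumptions for the example, and a production service would typically keep one bucket per API key or client IP in shared storage.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True and consume a token if the request may proceed, else False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical policy: bursts of 5 requests, refilled at 1 request per second.
bucket = TokenBucket(capacity=5, rate=1.0)
results = [bucket.allow() for _ in range(6)]
print(results)
```

The first five calls in the usage lines succeed and the sixth is rejected until enough time has passed for a token to be refilled, which is the property that blunts brute-force and replay-style abuse of an endpoint.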

Going with the ‘assumption of breach’ mindset, companies will need to design policy so that when an attack does occur, it can be contained and does not spread to other parts of the system. To do so, they will need to define security in the context of application workloads and not just the underlying network infrastructure, enforce isolation between workloads, and give admins granular control at the level of individual workloads.

Big Data Can Help

Advanced analytics can continuously monitor systems, deriving patterns and trends from large swaths of data to detect irregularities in a network. Experts can use automated tools that harness AI and ML to surface this data, then analyze, categorize, and handle the risks appropriately.
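To make the idea of detecting irregularities concrete, here is a deliberately simple sketch that flags a new measurement when it sits far outside a recent baseline, using a z-score. It is a toy stand-in for the ML-driven tools described above, not how any specific product works. The function name, the threshold of 3 standard deviations and the failed-login counts are all made up for illustration.

```python
from statistics import mean, stdev


def is_anomalous(baseline, value, threshold=3.0):
    """Flag `value` if it lies more than `threshold` sample standard
    deviations away from the mean of `baseline`."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical hourly counts of failed logins on one system.
failed_logins = [12, 15, 11, 14, 13, 12, 16, 14, 13]
print(is_anomalous(failed_logins, 15))   # within the normal range
print(is_anomalous(failed_logins, 250))  # far outside the baseline
```

Real monitoring pipelines account for seasonality, multiple signals and changing baselines, but the core step is the same: learn what normal looks like, then surface what deviates from it for an expert to review.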

There’s no denying data breaches and cybersecurity incidents will be part of our future. While there’s no perfect solution to this problem today, companies and individuals can take a variety of steps to protect themselves against this growing threat and turn the tide in the war against cybercrime.