South Africans exposed; 30 million unique records leaked

A data leak exposed more than 30 million unique personal records of South Africans following an alleged breach that took place around March 2017, according to an investigation by security researcher Troy Hunt.

Hunt received a 27GB file that, he believes, “is definitely floating around between traders.” Based on the headers published on Pastebin, the leaked data includes unique ID numbers, marital status, employment history, property ownership, income, company directorships and other information dating from as early as the 90s. Some of the data belongs to people who are now deceased.

A further investigation was carried out by iAfrikan, who found that the database was publicly available on the internet, even after the leak was detected. iAfrikan looked into the possible companies that may have been breached.

An investigation of South Africa’s largest credit bureau, TransUnion, led to data aggregator Dracore Data Sciences, a client of TransUnion. The search went deeper into their GoVault platform, whose domain was registered to Jigsaw Holdings (Pty) Ltd, affiliated to the Dracore business. The company may have been breached or leaked the data without knowing, the article suggests.

In an interview with Troy Hunt, the researcher says Dracore Data Sciences may have collected a lot of data without user consent, which they published to an unsecured web server.

“This, however, does not necessarily mean they were responsible for the site where the leaked records were found,” concludes iAfrikan.

AWS Cloud: Proactive Security & Forensic Readiness five-part best practice

At a time when cyber-attacks are rising in magnitude and frequency, being prepared for a security incident is paramount. This is especially crucial for organisations adopting the cloud for storing confidential or sensitive information.

This blog is an introduction to a five-part blog series that provides a checklist for proactive security and forensic readiness in the AWS cloud environment.

Cyber-attack via third-party services

A number of noteworthy information security incidents and data breaches have come to light recently that involve major organisations being targeted via third-party services or vendors. Such incidents are facilitated in many ways, such as a weakness or misconfiguration in the third-party service, or more commonly, a failure to implement or enable existing security features.

For example, it has been reported that several data breach incidents in 2017 occurred as a result of an Amazon S3 misconfiguration. Additionally, the recent data breach incident at Deloitte appears to have been caused by the company’s failure to enable two-factor authentication to protect a critical administrator account in its Azure-hosted email system.
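The reports referenced above do not say exactly what the S3 misconfigurations were. One commonly cited pattern is a bucket policy that grants access to everyone. The following is a minimal sketch, assuming the standard JSON bucket policy layout (`Statement`, `Effect`, `Principal`, `Condition`), that flags such statements in a policy document. The function name is hypothetical, and the check ignores ACLs and account-level settings, so a clean result does not prove a bucket is private.

```python
import json


def public_allow_statements(policy_json):
    """Return the Allow statements in a bucket policy that apply to everyone.

    Illustrative only: looks at the policy document alone and treats any
    unconditioned Allow with a wildcard principal as public.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]

    flagged = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        # Principal is either "*" or a mapping such as {"AWS": "*"}.
        aws = principal.get("AWS") if isinstance(principal, dict) else principal
        values = aws if isinstance(aws, list) else [aws]
        if "*" in values and "Condition" not in statement:
            flagged.append(statement)
    return flagged
```

Running a check like this over exported policies is one way to spot a security feature that exists but was never configured as intended.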

Security responsibility

Many of our own customers at BH Consulting have embraced the cloud, particularly Amazon Web Services (AWS). It is estimated that worldwide cloud IT infrastructure revenue has almost tripled in the last four years. AWS remains the dominant market leader, with an end-of-2016 revenue run rate of more than $14 billion. It owes its popularity to its customer focus, rich set of functionalities, pace of innovation, partner and customer ecosystem, and implementation of secure and compliant solutions.

AWS provides a wealth of material and various specialist partners to help customers enhance security in their AWS environment. A significant part of these resources is the shared responsibility model, which helps customers understand their security responsibilities based on the service model being used (infrastructure-as-a-service, platform-as-a-service or software-as-a-service).

Figure 1: AWS Shared Responsibility Model

When adopting third-party services, such as AWS, it is important that customers understand their responsibility for protecting data and resources that they are entrusting to these third parties.

Security features

AWS provides numerous security measures; however, awareness of the relevant security features and their appropriate configuration is key to taking full advantage of them. There may be useful and powerful features that a customer is unaware of. It is the responsibility of the customer to identify all the potential features so as to determine how best to leverage each one, if at all.

Five-part best practice checklist

The blog series will offer the following five-part best practice checklist for proactive security and forensic readiness in the AWS Cloud.

  1. Identity and Access Management in AWS
  2. Infrastructure Level Protection in AWS
  3. Data Protection in AWS
  4. Detective Controls in AWS
  5. Incident Response in AWS

Part 1 – Identity and Access Management in AWS: best-practice checklist, will be available soon!

Gordon Smith and Valerie Lyons contributed to research and editing for this post.

419 Scammers Offer $60M in Exchange for Adopting Their Teenage Son

419 scammers are tempting unsuspecting users with a fake offer of $60 million in exchange for adopting their teenage son.

The scam begins when a user receives a Twitter DM from the account of someone who appears to serve in the armed forces. Such unexpected correspondence could (and should) strike the recipient as odd. But the United States, the Caribbean Islands, and several other regions are still recovering from historic natural disasters. Given this tumult, the user could make an exception and decide to contact the fake account’s email address provided in their DM.

A screenshot of an attack DM sent by the 419 scammers. (Source: Malwarebytes)

A few days later, the user can expect to receive an email message from the scammers with a most unusual request. As quoted by Malwarebytes lead malware intelligence analyst Christopher Boyd:

“Welcome my dear, I received your letter and well understood by me, Due to my present condition i am not available to care for my Son, and i don’t want him to grow up in my family home, Now am facing medical treatments which i never know if i will get feet from it, I want you to take good care of my Son , in this case i directed you to receive the sum of $60 Million usd from Africa development bank of Togo, so that as soon as the funds entered into your account my Son will join you. 13 years old boy. dearest I want you to keep this within you to protect the project.

“I will give you full contact information of the bank where the funds deposited so that you will contact them and have to transfer the funds to your account.

“Provide me your personal details address and i code of your id card, as i received it i will forward it to the bank and instruct to conduct the funds to your account.”

We know from past experience that scammers oftentimes stoop low in an effort to trick unsuspecting users. But using the adoption of a young teenager as bait? That’s a whole new level of reprehensibility.

In this particular case, the scammers want to steal a user’s personal information, address, and copy of their ID card so that they can try to steal access to the victim’s bank accounts.

Users can prevent these instances of unauthorized access by familiarizing themselves with the most common types of Twitter scams. With that new awareness, they will know to exercise caution around correspondence sent from unfamiliar accounts and research such users before they decide to reach out to them outside of the social media platform. They will also know to never provide personal information to anyone whom they don’t know.

Questions about the Massive South African “Master Deeds” Data Breach Answered

This week, I started looking into a large database backup file which turned out to contain the personal data of a significant portion of the South African population. It’s an explosive situation with potentially severe ramifications and I’ve been bombarded by questions about it over the last 48 hours. This post explains everything I know.

Who Am I and Why Do I Have This Data?

Some background context is important as I appreciate there’s a lot of folks out there who haven’t heard of me or what I do before. I’m an independent Australian (I have a Microsoft Regional Director title but RDs don’t actually work for Microsoft) and I specialise in security training for folks who build online systems. For the last 4 years, I’ve also run a free service called Have I Been Pwned (HIBP) which aggregates data breaches and presently contains about 4.8 billion records from these incidents. In simple terms, this means that when there’s a hack of a service like Dropbox, LinkedIn or MySpace and the data is published online (as each of those was last year), supporters of HIBP frequently send that data to me so that I can help people impacted by the incident learn of their exposure. People either search by email address on the website or I automatically notify subscribers. About 1.7 million people presently subscribe to those notifications and I’ve had up to 3 million people visit the site in a single day after a major data breach.

On March 14 this year, someone sent me a 27GB file called “masterdeeds.sql” which was a MySQL database backup file. There was nothing immediately remarkable about it; there was no clear indication of a source (many similar examples include the source website in the file name) and there were “only” 2.2 million email addresses in the file (I was dealing with breaches containing tens or even hundreds of millions of records at the time). It went into an archive folder with literally hundreds of other similar files which, time permitting, I’d come back to and review later.

Fast forward to this month and I’m running out of space on the disk holding the breaches I’m yet to process. I start working through the largest incidents first; one of those is Victory Phones which has since made headlines due to it containing Republican donor records. Another is the masterdeeds.sql file which I begin loading into a local database on my laptop for further analysis. The import runs for several days until eventually last Sunday, I had to get on a plane to head interstate and run some training which meant turning off the machine and ceasing the process. It stopped after importing 31,631,992 records. (You’ll read later how the complete size is significantly larger than this.)

Tuesday my time, I had the afternoon free so I sat in my hotel room and started looking closer at the data. It was clear there were a lot of South Africa references in there but just by looking at the data, I still couldn’t work out the origin so I tweeted out for some help:

I followed up by sharing the script that creates the database table in the hope that someone would recognise the field names:

Multiple South African Twitter followers then chimed in with thoughts on the origin. Several of them also got in touch with me privately and shared personal information about themselves so that I could verify the accuracy of the data. Searching through the incompletely imported database, I didn’t find everyone who contacted me, but for those I did find, the data was always accurate. Realising that the government-issued IDs were also present, I began searching the 27GB file directly for the ID rather than the partially imported database. Every search for every person that sent me their number returned a hit.
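For the technically curious, searching a file of that size directly does not require loading it into memory or into a database. The post does not say which tool was used for the search; the following is only a sketch of one way to do it, reading the file in chunks and keeping a small overlap so that a match split across two chunks is still found. The function name and chunk size are illustrative.

```python
def file_contains(path, needle, chunk_size=1024 * 1024):
    """Return True if the byte string `needle` occurs anywhere in the file."""
    if not needle:
        raise ValueError("needle must be non-empty")
    overlap = len(needle) - 1
    tail = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if needle in window:
                return True
            # Carry over the last len(needle) - 1 bytes so a match that
            # straddles the chunk boundary shows up in the next window.
            tail = window[-overlap:] if overlap else b""
```

A call such as `file_contains("masterdeeds.sql", b"8001015009087")` would then answer the yes/no question for a single (here made-up) ID number in one pass over the file.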

During this process, I learned that these government issued IDs contain both the owner’s date of birth and gender which is usually considered very personal data. This resource on decoding your South African ID number explains it quite clearly:

I also learned that, like social security numbers in the US, the IDs are frequently used for identity verification and should be considered secret. Disclosure en masse like this could have serious ramifications for all sorts of situations where folks in South Africa are required to prove their identity, primarily because it’s enormously useful information for people wishing to impersonate others.
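To illustrate how much the number alone gives away, here is a minimal decoding sketch. It assumes the commonly documented 13-digit layout, which is not spelled out in this post: six digits for the date of birth (YYMMDD), a four-digit sequence where 5000 and above indicates male, a citizenship digit (0 for citizens), one further digit, and a final Luhn check digit. The function names are hypothetical and the ID in the usage note is made up.

```python
def luhn_valid(digits):
    """Standard Luhn check: double every second digit counting from the right."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def decode_sa_id(id_number):
    """Decode the fields embedded in a 13-digit ID (assumed layout, see above)."""
    if len(id_number) != 13 or not (id_number.isascii() and id_number.isdigit()):
        raise ValueError("expected exactly 13 digits")
    return {
        # The year has only two digits, so the century is ambiguous.
        "birth_yy_mm_dd": (int(id_number[0:2]), int(id_number[2:4]), int(id_number[4:6])),
        "gender": "male" if int(id_number[6:10]) >= 5000 else "female",
        "citizen": id_number[10] == "0",
        "checksum_ok": luhn_valid(id_number),
    }
```

Under those assumptions, `decode_sa_id("8001015009087")` yields a birth date of 1 January '80, gender male, citizen, with a valid checksum.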

Attributing the Source

The morning after my original tweets seeking support, I had a number of emails from Tefo Mohapi of iAfrikan. Tefo had done some great investigative work in an attempt to track down the source of the data which he later covered in two stories. The first was South Africa’s Largest Ever Data Breach in which he identified a company named Dracore as a possible source. The Dracore website explains how they offer “data enrichment” services which includes the following:

Our data services are designed to help you access top quality, reliable tracing data – fast. Our database is continually updated 24 hours a day, 7 days a week, 365 days a year.

Dracore themselves then refer to this data as “a goldmine of information”:

Which is all beginning to sound analogous to the Master Deeds data we were dealing with. Tefo made multiple attempts to reach out to them which resulted in the following response:

Escalating This Matter To Our Legal Counsel

Now I want to make something clear here: the resulting investigation indicated that whilst the data may have been originally “enriched” by Dracore, another party was subsequently responsible for the leak. However, there is only one acceptable response Dracore could have given at this point and it’s “let us do everything we can to get to the bottom of this as a matter of priority”. I’m enormously disappointed to see a response like this which puts self-interest in front of the privacy of tens of millions of South Africans.

Shortly after the original piece, Tefo followed up with a story titled Is Dracore Data Sciences Responsible For South Africa’s Largest Ever Data Leak? In that piece, he said the following:

Dracore is also known for having a number of clients in the real estate business. This, however, does not necessarily mean they were responsible for the site where the leaked records were found.

Again, I want to be clear about this: whilst it appears the original source of the data was Dracore, it’s always been entirely possible that a customer of theirs was responsible for disclosing it. In that post, Tefo identified that customer as Jigsaw Holdings. It’s best you read his original article to understand how he joined those dots; I’d prefer to focus purely on the data exposure here.

In fairness to Dracore, I’d also like to share a link to their response.

Where Was the Data Located and When Was It Removed?

During his investigation, Tefo was contacted by an individual going by the name of Flash Gordon on Twitter. It turns out it was this person who originally located the data and I was able to date when I received it by looking back at my DMs with him or her. “Flash” was also able to advise that alarmingly, the data was still publicly exposed 7 months on from when they’d originally located it. Let me talk about that in more detail.

Flash had found the entire 27GB file sitting on a publicly facing web server. It had literally been published there and then the server configured to allow directory browsing. What this meant is that anyone with a web browser could go to that address and see all the files hosted on the site. The Master Deeds file had a “Last modified” date of 8 April 2015; it could have been exposed since that date.
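For anyone running a web server, directory browsing is easy to test for on your own hosts. The sketch below is a heuristic only: it assumes the familiar auto-generated listing pages whose title starts with "Index of /", as produced by default by common servers such as Apache and nginx, and will miss custom listing templates. The function name is hypothetical and the HTML is expected to have been fetched separately.

```python
import re

# Auto-generated listing pages typically carry a title like "Index of /backups".
_LISTING_TITLE = re.compile(r"<title>\s*Index of\s+/", re.IGNORECASE)


def looks_like_directory_listing(html):
    """Heuristically decide whether a page is an auto-generated file listing."""
    return bool(_LISTING_TITLE.search(html))
```

If a page on a public site trips this check, every file in that folder should be assumed readable by anyone who finds the address.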

This is really alarming because it means at the absolute least, the data was left open to the public for 7 months. At worst, it was 2.5 years if we go all the way back to the “Last modified” date in early 2015. In fact, it could have been exposed for even longer because that’s just the date it was last changed, not when it was created and not necessarily when it was placed on that server.

Tefo did his utmost yesterday to get the data taken offline and eventually, I got confirmation at 10:30 Wednesday morning South Africa time that it was down.

Who Else Has the Data?

I have absolutely no idea how far this has spread. What I can say with confidence though is that people are constantly scanning the web looking for precisely this sort of data. I’ve been involved with a bunch of similar cases in the past including the Red Cross Blood Service. In fact, I presented at the AusCERT conference earlier this year and shared part of the conversation I had with the individual who found that data (not Flash):

“Just scanning IPs” – it’s frequently highly-automated and indiscriminate. It was the same story with Michael Page and the Indian pathology lab to name just a couple of others. These were discovered by individuals simply browsing the web via automated tools.

The logs of the server involved may reveal how many times the data has been requested. That is if they exist and if they go back far enough and even then, at the very least they’ll show that unauthorised parties accessed the data. They’ll give no indication how much further the data was spread after that.

At this time, the only safe assumption is that the owner of the data has lost control of it.

How Do I Know if My Data Was Exposed?

I’ll start with the easy bit: I’ve loaded the 2.2 million unique email addresses in the data set into HIBP. You can search for your email there now and it will give you a yes or no answer as to whether it exists, but obviously the addresses only represent a small portion of the overall data set.

I do not have any plans to make the personal identification numbers searchable. Given the sensitivity of that data, it’s not information I want to be responsible for managing on a service like this. However, given the size of the data as compared to the population of South Africa, there’s an extremely high likelihood that anyone with an ID is in the data set.

What’s the Total Size of The Data?

As I mentioned earlier, I had to stop the original data import at about 31 million rows. For the more technically inclined, the data was being restored to a MySQL database and there were multiple indexes defined in the script which always slows down insert statements. Yesterday, I dropped those indexes and ran the import again. This time it completed in the space of a few hours. This was the result:

The fact I originally had only just over half the data loaded helps explain why some records weren’t found when I first queried the restored data but were subsequently found when I searched through the source file. As for that 60 million number, why is it so high? I mean South Africa only had a population of 55 million in 2015, how is the number larger than that? It turns out that the data also contains records where the individual is flagged as “deceased”. South Africans living abroad may also account for the high number. The only thing we can confidently conclude is that the data represents a significant portion of the country.
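For the more technically inclined, the "load first, index afterwards" approach described earlier can be sketched as follows. This is not the actual import: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table, column and index names are hypothetical. The point is only the ordering: with `index_first=False`, the index is built once after all rows are in, rather than being updated on every insert.

```python
import sqlite3


def bulk_load(rows, index_first):
    """Load rows into an in-memory table, indexing before or after the inserts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id_number TEXT, surname TEXT)")
    if index_first:
        # Slow path: every insert also has to update the index.
        conn.execute("CREATE INDEX idx_id_number ON people (id_number)")
    with conn:  # commit all inserts as a single transaction
        conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
    if not index_first:
        # Fast path: build the index once, after the data is in place.
        conn.execute("CREATE INDEX idx_id_number ON people (id_number)")
    return conn
```

Both orderings end with the same rows and the same index; only the amount of work done during the inserts differs, which tends to matter more as the row count grows.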

What Now?

There are no easy or happy answers to this. People often ask if it’s possible to “cleanse” data like this from the internet, to which I usually reply that “trying to do that is like trying to remove piss from a pool”.

A question that must be asked is whether South Africa wants private organisations like Dracore (allegedly) collating this much information about its citizens. To the best of my understanding, this wasn’t done with consent; people didn’t willingly provide their data for “enrichment” purposes. Now maybe that’s still a totally legal activity on their behalf, but is it really in the country’s best interests for an organisation to collate and then sell data to other parties in this fashion? The potential ramifications are now becoming clear.

Obviously, attribution is going to need to be confirmed at some point too. It’s looking likely that Jigsaw was responsible for losing the data but to the best of my knowledge, they’re yet to accept responsibility. Mind you, there’s not a lot they can do about it at this time other than to help authorities understand the extent to which they may have leaked the data.

In terms of authorities, this raises a difficult question for the government and organisations alike; with this much data about this many people having been exposed for this long, what’s the impact on identity verification processes? I mean if people need to provide data such as name, address and government issued ID in order to prove who they are, how does that change when an untold number of people have this information for the entire country? That’s what worries me more than anything because for that, there are no easy answers.