I’ll start this post where I start many of my talks – what does a hacker look like? Or perhaps more specifically, what do people think a hacker looks like? It’s probably a scary image, one that’s a bit mysterious, a shady character lurking in the hidden depths of the internet. People have this image in their mind because that’s what they’ve been conditioned to believe:
These are the images that adorn the news pieces we read and we’ve all seen them before. Hell, we’ve seen literally the same guy over and over again. See that bloke in the bottom right? He’s the guy! No really, I wrote about him last year and exposed his involvement in everything from state-sponsored Iranian hacking to typosquatting to him potentially being Ed Snowden. These images are used because they’re scary and people are drawn to scary headlines.
It’s not just the media using scary imagery either, check out the first 20 seconds of this video promoting a home security product:
Holy shit! That’s one bad dude! Except… what’s he actually doing on that machine? I mean we know he’s a hacker because he has a hoodie and we know he’s hacking because the text on the screen is green, but it doesn’t totally add up. As it turns out, he’s using hackertyper.net and if you head over there now and allow your cat to walk over the keyboard, you can achieve exactly the same effect. They’re trying to sell you a security thing based on something my 5-year-old can do!
Now, you might say “ah, that’s just marketing”, but let’s go back to the hooded bandits in the original image. When TalkTalk was hacked in 2015, the perpetrator (or a representation thereof) was this bloke:
Hoodie – check! (Also note the balaclava for extra security.) As the news broke, a “former cyber crime cop” was quoted as saying:
They are claiming to be from Russia and be an Islamic cyber jihadi group
Russian Islamic cyber jihadis – holy shit! How many scary things can you roll into one headline?! It’s hard to imagine just how scary these characters are… except that now we know precisely what they look like:
Well, we kinda know what he looks like, his face is obfuscated because he’s a child! He’s 17 here but was only 16 when he caused TalkTalk £42M worth of damage. (Incidentally, his punishment was that he received a “12-month youth rehabilitation order and had his iPhone and computer hard drive confiscated”. That’ll teach him.)
Not so scary, right? Unfortunately though, “not so scary” doesn’t sell newspapers. But, of course, we’ve seen this all before. Remember LulzSec? They were particularly effective at wreaking havoc on the web around 2011 and back then, they too were represented as being another bunch of scary dudes. Well, at least until a teenager named Ryan Cleary turned up in court with his mum:
Check out his mum’s face – he is so grounded! And like the TalkTalk kid, actually, not all that scary after all.
Getting to the point of all this, the other day I shared a couple of tweets:
Tempted to write a smack down on the use of this term versus the reality of where these breaches are sourced from, what do you think?
— Troy Hunt (@troyhunt) February 1, 2018
This seemed to resonate with a lot of people who, like me, have their bullshit-o-meter go off every time they hear the term “dark web” used in this way. The particular article I was responding to talked about a significant whack of breached credentials from big companies being found on the aforementioned “dark web” and per the earlier tweet, that struck me as odd; here I have billions of records in Have I Been Pwned (HIBP) and only a very small portion of them came from the “dark web”. So what’s that about? (Incidentally, the media piece led to a company’s website which led to a request for your personal information – no free email accounts allowed – before you could read the content.)
So let’s start with the facts – what is the “dark web”? Here’s a neat pic from thedarkwebsites.com which puts it all into context:
For the sake of simplicity, that top 94% is what we all use day in, day out. We shop there, we bank there, we socialise there. As I’ll show shortly, we also find huge troves of breached data there.
That remaining 6% of content in the “dark web” consists of resources accessible by “hidden” services, namely Tor. And that’s a good place to start breaking down the “dark web” FUD because counter to what the headlines suggest, Tor hidden services aren’t nearly as scary as they sound. When I hear most folks talk about the “dark web”, I get the distinct impression that they’re thinking about an IRL equivalent; it’s like going down to the docks late at night where you come face to face with shady characters who, on a whim, may cave your head in with a baseball bat. Instead, Tor hidden services can be very familiar environments:
This is merely Facebook accessed via their Tor service and they’ve had that up for years now. (My daughter in those shots is the one who’s adept at hackertyper.net for which you allegedly need to buy a CUJO to keep her out of your network…) I’m using the Tor browser and in case you’re thinking “wow, that looks just like a normal browser”, that’s because it’s based on Firefox ESR with a few extra bits thrown in to help with anonymity. For example:
It’s also configured to route requests out over Tor and… that’s pretty much it. Now I’m obviously not exactly seeking anonymity by signing into my own Facebook account over Tor, but you can appreciate how the right to privacy is enormously valuable to all sorts of people. Folks in the countries that predominantly read my blog are usually less concerned than those in other parts of the world, but in places where anything from political views to sexuality can have life-changing consequences, anonymity can be enormously important. The point here is simply that the “dark web” is very easily accessible and can have very mainstream uses. It’s not necessarily this scary place full of shady characters doing dodgy things.
But, of course, there’s also that element of it. You’re probably familiar with stories of dark web marketplaces (no air quotes this time as I’m not using the term hyperbolically), perhaps most notoriously Silk Road. Since then many others have come and also gone; Hansa, AlphaBay, TheRealDeal – all gone, many with their operators in jail or dead (it didn’t work out so well for the operator of AlphaBay). But, of course, others still spring up in their place and even today, finding drugs on a marketplace behind Tor is trivial:
Yes, they’re ecstasy tablets in the shape of Trump’s head. Yes, they’re orange. No, I don’t know if it’s merely coincidental that both Donald Trump and the psychoactive drug shaped in his likeness may cause paranoia and lead to depression.
There are many less humorous products for sale on these same marketplaces. Some of them have led directly to the deaths of those who’ve used them and the legal consequences for buyers, sellers and marketplace operators alike can be dire.
Let’s turn our attention back to our personal data being sold on the “dark web” though (back to air quotes) because that’s what we’re really here for. On that same marketplace selling Trump ecstasy, you can buy the Ashley Madison data dump:
In this case, “DrunkNinja” (who’s a stand-up bloke based on his rating) is offering it for about $10 worth of BTC (he’s also using my description of the data classes from HIBP). So, does this mean they constitute a portion of the stash reportedly found on the “dark web”? Keep in mind that the Ashley Madison data was torrented extensively by the people that stole it in the first place! In fact, that was their entire MO – spread the data as far as possible. Anyone who’s ever downloaded a torrent before could have easily grabbed it in minutes. No “dark web”. No special browsers. Just. Plain. Torrents.
Oh – and just in case downloading the Tor browser is too much like hard work, Tor hidden services are accessible in any browser via Tor2Web anyway:
It’s literally just a matter of adding .to after the onion address. Yes, that does put anonymity at risk (which somewhat defies the point of an anonymity service), but it illustrates just how readily accessible the “dark web” really is.
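To show just how little is involved, here’s a minimal Python sketch of that rewrite. The `.to` suffix comes straight from the description above; the function name and the `example.onion` address are made up for illustration, and the sketch only builds the URL. It doesn’t check whether a gateway is actually running behind it.

```python
from urllib.parse import urlsplit, urlunsplit

# Suffix as described in the post; an assumption that it maps to a live gateway.
GATEWAY_SUFFIX = ".to"


def to_gateway_url(onion_url: str, suffix: str = GATEWAY_SUFFIX) -> str:
    """Append the gateway suffix to the host of a .onion URL."""
    parts = urlsplit(onion_url)
    host = parts.hostname or ""
    if not host.endswith(".onion"):
        raise ValueError(f"not an onion address: {onion_url!r}")
    netloc = host + suffix
    if parts.port is not None:
        # Keep an explicit port if the original URL had one.
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# "example.onion" is a placeholder, not a real hidden service.
print(to_gateway_url("http://example.onion/some/page?x=1"))
```

Everything after the host (path, query string, fragment) is passed through untouched, which is the whole point: the only thing that changes is where the request gets sent.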
Many times, exposed data is literally just lying around on publicly facing services. For example, here’s an extract from my AusCERT talk last year that shows the discussion I had with the person who identified the Red Cross Blood Service data down here in Australia a couple of years ago:
That URL in his last comment was just an IP address on the clear web. That’s identical to how the massive stash of South African data was exposed last year. Or take the CloudPets situation – exposed MongoDB with no credentials on it. Clear web again. When I look at the largest data breaches in HIBP, it’s clear web for a long way down; the 711 million email addresses in the Onliner Spambot were sitting in another publicly facing folder:
The billion plus records from the Exploit.In and Anti Public combo lists can be found floating around the clear web quite easily. In fact, very frequently there are entire personal stashes of data breaches just sitting there in public folders. I’m not going to screen cap them here because they’re often easily discoverable via Google once you know the file names; you remember Google, it’s that service that sits right up the top of that “surface web” image from earlier on. This stuff is very easily discoverable on the web we all use day in and day out.
Here’s another example that perfectly illustrates the hyperbole surrounding the “dark web”: back in December, we saw a heap of these headlines:
These rather sensational stories were in response to a company called 4IQ writing about the find a few days earlier. On the 9th of December, they explained that “while scanning the deep and dark web for stolen, leaked or lost data, 4iQ discovered a single file with a database of 1.4 billion clear text credentials — the largest aggregate database found in the dark web to date”. Now, in fairness to them, that may be precisely how they’d found it – by crawling around Tor hidden services. However, they could have saved themselves a bunch of work and just downloaded it directly from the torrent posted to one of the world’s largest websites by the very person who prepared it:
That was 4 days before the data was “found on the dark web”. Yes, I’m aware that people may now locate that post from the screen cap above, but the thing is sitting there on Reddit FFS! This is presently the 7th largest website in the world. Not the “dark web”. Not even the “deep web”. Reddit – “the front page of the internet”. And in case you’re wondering why you haven’t seen this loaded into HIBP, it’s because it’s already there:
A random sample of 1k addresses from the 1.4B list shows that 99.6% of them are already in @haveibeenpwned. It’s pointless loading this, I’ll keep working through the source incidents so that people know where their data actually came from. 6/7 pic.twitter.com/0b5DLGf8fY
— Troy Hunt (@troyhunt) December 10, 2017
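For anyone curious what that sort of spot check looks like, here’s a rough Python sketch. It isn’t the tooling behind the tweet: the function name is made up, an in-memory set stands in for whatever lookup is really used, and the addresses are synthetic.

```python
import random


def sample_overlap(candidates, known, sample_size=1000, seed=None):
    """Estimate the share of `candidates` that already appear in `known`."""
    if not candidates:
        raise ValueError("no candidate addresses to sample")
    rng = random.Random(seed)
    # Draw without replacement so no address is counted twice.
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    hits = sum(1 for address in sample if address.lower() in known)
    return hits / len(sample)


# Synthetic stand-ins: a "new" list and a set of addresses seen before.
new_list = [f"user{i}@example.com" for i in range(10_000)]
already_seen = {a for i, a in enumerate(new_list) if i % 250 != 0}

print(f"{sample_overlap(new_list, already_seen, seed=1):.1%} already seen")
```

A sample of 1k is only an estimate, but when 996 out of 1,000 are already known, the conclusion doesn’t hinge on the decimal places.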
You can probably sense the frustration in my writing when the headlines are screaming out about this massive new dump found in a secretive location and I’m looking at it going “this is all stuff we’ve seen already – and it’s on Reddit”. But that headline doesn’t have quite the same ring to it now, does it?
Moving on, how about Experian’s “Dark Web” search:
4 seconds in and I’m petrified! I don’t know precisely what the guy at the start of that video is doing, but there’s a lot of green screens and we all know that means there’s some serious hacking going on. Curious, I gave it a go and, well, where do we even start? Perhaps at the beginning before you do the search:
— Troy Hunt (@troyhunt) September 9, 2017
Geez the Experian “dark web search” is terrible: several days to get a result, useless info in the report and 2 subsequent spam mails since pic.twitter.com/ccFEbZStxq
— Troy Hunt (@troyhunt) September 13, 2017
I can’t really complain about the spam mails because I’m sure I just agreed to them in the terms and conditions I didn’t read. And the other email I got about the “dark web” search didn’t tell me where they found my data. Except that based on the date, I know precisely what that breach in the second image is: it’s LinkedIn. I also know the asterisks under “Password” mean absolutely nothing because LinkedIn SHA-1 hashed their passwords and whilst yes, that was a woefully inadequate approach, nobody is cracking that 40-character random password generated out of my password manager. And as for the whole “dark web” thing, save yourself opening up the Tor browser and do a bit of Googling if you’re looking for the LinkedIn data breach because just like the other incidents they reported my address as being found in, there’s nothing “dark” about where you’ll find them.
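To put some rough numbers on the password claim, here’s a small Python sketch. It assumes the hashes were unsalted (I haven’t spelled that out above) and that the password manager draws from letters and digits only; the function names are just for illustration.

```python
import hashlib
import math
import secrets
import string

# Assumption: a 62-symbol alphabet (a-z, A-Z, 0-9); real generators may use more.
ALPHABET = string.ascii_letters + string.digits


def unsalted_sha1(password: str) -> str:
    """Hex SHA-1 digest with no salt: the same password always gives the same hash."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Bits of entropy in a uniformly random password of the given length."""
    return length * math.log2(alphabet_size)


password = "".join(secrets.choice(ALPHABET) for _ in range(40))
print(unsalted_sha1(password))                    # a 40-character hex digest
print(round(entropy_bits(40, len(ALPHABET))))     # bits for 40 random characters
print(round(entropy_bits(8, 26)))                 # bits for 8 random lowercase letters
```

The weakness of plain SHA-1 is that it’s fast and, unsalted, identical for everyone who picked the same password, which is great for cracking common passwords in bulk. None of that helps against a guess space the size of a 40-character random string.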
Now in fairness, there’s a lot of data that’s not easily discoverable publicly. For example, I’m yet to see some of the data breaches I was sent last year appear in many of the usual places; Kickstarter, Bitly, Disqus and imgur just to name a few. But this doesn’t make them “dark web”, it merely makes them “whoever has them is sitting on a private stash and not shooting their mouth off about it”. I know, it doesn’t have the same ring to it as “leaked on the dark web”, but that’s the reality.
Every time you see the words “dark web” used, ask yourself this question: what is the emotion the publication wants you to feel? Do they want you to feel scared? Will they sell more security things if you do? Will you be more likely to click through, read the story and become part of the ad monetisation campaign? Yes? Then it’s probably FUD.
You’ll see the term “dark web” accompanying all manner of security-related services for all the reasons mentioned above. “Dark Web Threat Alerts”. “Dark Web Intelligence”. “Dark Web Monitoring”. Seriously – Google it – prepare popcorn first. One of them even recently described their “dark web [thing]” as being like “haveibeenpwned.com on steroids” and after checking on Wikipedia, I realised this was probably referring to the side effects of delusions, psychosis or possibly even cognitive impairment (it’s also possible that they’re on Trump ecstasy). On that front, a bunch of services similar to HIBP have popped up in recent times and they frequently lean on the “dark web” term in reference to where they’ll be searching. This is what happens when the marketing team makes up terms they think will sell a product. It’s in the same realm as “delivering proactive metrics”, “streamlining world-class schemas” and “leveraging scalable applications”. And in case you’re thinking those sound ridiculous, it’s because they all came from bullshitgenerator.com and if you keep generating bullshit on that site, you’ll eventually get some “dark web” in there. The service just wouldn’t be complete without it.