Analysis

The dark side of the web

Posted on 9 Mar 2010 at 15:47

Google sees only a fraction of the content that appears on the internet. Stuart Andrews finds out what's lurking in the deep web

When Google indexes so many billions of web pages that it doesn’t even bother listing the number any more, it’s hard to imagine that much lies beyond its far-reaching tentacles.

Beneath, however, lies an online world that few know exists. It’s a realm of huge, untapped reserves of valuable information containing sprawling databases, hidden websites and murky forums. It’s a world where academics and researchers might find the data required to solve some of mankind’s biggest problems, but also where criminal syndicates operate, and terrorist handbooks and child pornography are freely distributed.

At the same time, the underground web is the best hope for those who want to escape the bonds of totalitarian state censorship, and share their ideas or experiences with the outside world.

Interested? You’re not alone. The deep web and its “darknets” are a new battleground for those who want to uphold the right to privacy online, and those who feel that rights need to be sacrificed for the safety of society. The deep web is also the new frontier for those who want to rival Google in the field of search. Take a journey with us to the other side of the internet.

Deep webs, the dark web and darknets

The first thing to grasp is that, while the elements that make up this other web have aspects in common, we’re not talking about a single, unified entity. Those in the know will often talk in terms of the deep or invisible web, darknets and the dark web, and you might think these are all the same thing. In fact, they’re separate phenomena, albeit linked by common themes, properties or interests.

The deep web isn’t half as strange or sinister as it sounds. In computer-science speak, it refers to those portions of the web that, for whatever reason, have been invisible to conventional search engines such as Google.

The majority of this deep web is made up of dynamically created pages and database entries that are accessible only through manual completion of an HTML form. A smaller proportion has been accidentally or purposefully made inaccessible to Google’s crawlers, while other areas sit behind password-protected or subscription-only sites.
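To see why form-backed pages stay out of reach, here is a minimal sketch in Python of a crawler that only follows hyperlinks. The tiny in-memory site, its URLs and the names `SITE`, `LinkAndFormParser` and `crawl` are all invented for illustration; this is not how Google's crawler works, only a toy model of the link-following idea.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "site": URL -> HTML. Nothing here is a real website.
SITE = {
    "/": '<a href="/about">About</a>'
         '<form action="/search" method="get"><input name="q"></form>',
    "/about": "<p>About us</p>",
    # This page only exists once someone submits the form with q=stars.
    "/search?q=stars": "<p>Catalogue entry: stars</p>",
}


class LinkAndFormParser(HTMLParser):
    """Collects hyperlink targets and form actions from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form" and "action" in attrs:
            self.forms.append(attrs["action"])


def crawl(start):
    """Follow <a href> links only; note any forms seen but never fill them in."""
    seen, queue, forms = set(), [start], set()
    while queue:
        url = queue.pop()
        if url in seen or url not in SITE:
            continue
        seen.add(url)
        parser = LinkAndFormParser()
        parser.feed(SITE[url])
        queue.extend(parser.links)
        forms.update(parser.forms)
    return seen, forms


indexed, forms_seen = crawl("/")
hidden = set(SITE) - indexed  # pages the link-only crawler never reached

print("Indexed:", sorted(indexed))
print("Forms noticed but not submitted:", sorted(forms_seen))
print("Never reached:", sorted(hidden))
```

The crawler notices the search form but has no way of knowing what to type into it, so the results page behind it never makes it into the index.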

Make no mistake, the deep web is huge. Michael Bergman’s pioneering 2001 study, The Deep Web: Surfacing Hidden Value, estimated that it accounted for 7,500TB of data at a time when search engines could index only 19TB.

Even the more conservative estimates in a 2007 paper written by Google’s Jayant Madhavan, Alon Halevy and colleagues suggest that there are more than 25 million different sources of deep web content, many of which are huge repositories.

“There is a prevailing sense in the database community that we missed the boat with the WWW,” the Google paper concluded. “The over-arching message of this paper is that a second boat is here, with staggering volumes of structured data, and that boat should be ours.”

Treasures of the deep

“There’s a lot of legitimate and valuable content in the deep web,” said Dr Juliana Freire, the leader of a University of Utah project, DeepPeep, which aims to make deep web content more accessible.

“For example, there are several scientific data sets (such as the Sloan Digital Sky Survey and the Center for Coastal Margin Observation & Prediction), documents and databases, and these are useful to society and have many important applications.”

User comments

and those who feel that rights need to be sacrificed for the safety of society

"They who can give up essential liberty to obtain a little temporary safety, deserve neither liberty nor safety."

Benjamin Franklin, 1775.

By Lacrobat on 12 Mar 2010

estimated what?

accounted for 7,500TB of data at a time when search engines could index only 19.

What, 19 pages, sites, TBs?

By greemble on 12 Mar 2010

TB. It's self-explanatory really

By TimoGunt on 18 Mar 2010

Great Post

Great post.
Here is a good article that adds some additional detail to the topic and a good set of links to the deep web search engines and other helpful sites.

By theTribster on 19 Mar 2010

Another Article

Attempt 2. See link below.
http://tastethecloud.com/content/deep-dark-invisible-web is a good article that adds some additional detail to the topic and a good set of links to the deep web search engines and other helpful sites.

By theTribster on 19 Mar 2010

The dark side of the web

Fascinating article. I had no idea that there was an "underworld" web.

I agree with the article author about content and use. When you put together any number of people to do something, there will always be those whose purposes are less than honorable. But that does not change the fact that the good of its use can outweigh the bad.
Thank you for this post. I learned a lot from it.

By moomoosweetbaby on 25 Mar 2010

PCPro-Computing in the Real World Printed from www.pcpro.co.uk
