Discovering Ideas

English Composition Spring 2000 Palomar College

Searching the Internet

Any time people talk about information, another word is likely to come up: 'search'. As it becomes possible to store and communicate ever-increasing amounts of information, it becomes more and more important to have efficient means of searching it. This is true in computer science, where a tremendous amount of effort goes into developing algorithms for searching databases and other forms of information. But it's also true in daily life. Even if you don't realize it, you probably perform several searches every day, and you become frustrated when your search tools aren't up to the job.

What sorts of searches do you perform? Well, any time you look up a phone number in the phone book, or a word in the dictionary, or a movie in the newspaper listings, that's a search. In each of these cases, you're trying to extract a particular item from a large collection of information. How long it takes you depends, in part, on how large the collection is. In fact, this is one of the most basic laws of information: The average search time increases with the size of the database. How fast the search time increases depends on a lot of things, such as the efficiency of your search method and the comprehensiveness of the indexes (if any) for that database.

It's easy to see why this is true: If you keep a list of 20 or so important phone numbers in your wallet, it doesn't really matter what order they're in, or whether you search through the list in any systematic way. But now imagine trying to find a particular number in a non-alphabetized Los Angeles phone book. Even if you spent every waking minute checking entries at about one per second, you would still have only about a 25% chance of finding the number you're looking for after a month, and you wouldn't be certain of finding it until you'd read the whole book. What's more, since the book isn't alphabetized, it doesn't matter what order you search in; on average, it would take the same amount of time whether you started at the beginning, or worked out from the middle, or just chose entries at random (as long as you marked them to make sure you didn't check the same entry twice).
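To make the phone book example concrete, here is a minimal Python sketch of the two situations: scanning an unsorted list entry by entry, and looking a name up in an alphabetized list. The function names are invented for illustration, and the binary search used for the alphabetized case is just one way a sorted list can speed things up; it is not something the text above prescribes. The waking hours, the length of the month, and the size of the book are assumptions chosen only to show how a figure of roughly 25% could arise.

```python
import bisect

def linear_search(entries, target_name):
    """Scan (name, number) pairs in order; return (number, checks made)."""
    for checks, (name, number) in enumerate(entries, start=1):
        if name == target_name:
            return number, checks
    return None, len(entries)

def alphabetized_search(sorted_names, target_name):
    """Binary search over an alphabetized list; return the index or -1."""
    i = bisect.bisect_left(sorted_names, target_name)
    if i < len(sorted_names) and sorted_names[i] == target_name:
        return i
    return -1

def chance_found(entries_checked, total_entries):
    """Chance the target is among the entries checked so far (no repeats)."""
    return min(entries_checked / total_entries, 1.0)

# Assumed figures: 16 waking hours a day, 30 days, one entry per second.
checked_in_a_month = 16 * 3600 * 30   # 1,728,000 entries
assumed_listings = 7_000_000          # assumed book size, not given in the text
print(round(chance_found(checked_in_a_month, assumed_listings), 2))  # prints 0.25
```

With the unsorted list, the number of checks grows in step with the size of the list. With the alphabetized list, each comparison rules out about half of the remaining names, so even a book with millions of entries needs only a couple of dozen comparisons.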

This is a fantasy example (or perhaps nightmare would be the better term), but there are plenty of real ones. Have you ever written down a phone number and forgotten whose number it was? If so, you probably just called and hoped you could recognize the voice, or perhaps you gave up and threw the number away. The funny thing is, you do have a database -- the phone book -- that lists phone numbers with their matching names. Since it's not indexed or sorted by phone number, though, it's just as hard to use for that purpose as the nightmare book. So, despite the fact that it contains the right information, it's of no use to you. Even though you may not think about it this way, this sort of problem occurs all the time; for example, it's why people invented Scrabble™ dictionaries, which let you look up a word by letters other than the first one, and rhyming dictionaries for poets and songwriters.
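The mystery phone number problem is easy to sketch in Python. The listings and the function name below are made up for illustration; the point is only that a second index, keyed by number instead of by name, turns a hopeless scan into a single lookup.

```python
def build_reverse_index(phone_book):
    """Map each phone number to the list of names listed under it."""
    index = {}
    for name, number in phone_book:
        index.setdefault(number, []).append(name)
    return index

# Hypothetical listings, in the usual name-first order.
phone_book = [
    ("Alvarez, Maria", "555-0142"),
    ("Chen, David", "555-0187"),
    ("Okafor, Ngozi", "555-0113"),
]

by_number = build_reverse_index(phone_book)
print(by_number.get("555-0187", []))  # prints ['Chen, David']
```

Building the index still means reading every entry once, but after that each lookup by number is nearly immediate. A paper phone book would need a second printed volume to offer the same thing.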

This is where online databases have a big advantage over paper ones. With an online database, it's fairly easy to generate new indexes, and to search using complex criteria like "all single adults with incomes over $50,000 and addresses on Rodeo Drive." Directory CD-ROMs that provide this sort of search capability have become very popular with direct-mail advertisers who want to target their mailings to the types of customers who are most likely to be interested in their products. Over the last two decades, the vast majority of library indexes have been converted to CD-ROM and/or online formats. And of course, databases of Net resources have been online as long as they've existed.
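A search with several criteria at once might look something like the following Python sketch. The record fields, the sample people, and the function name are all hypothetical; a real directory product would have its own format and query tools.

```python
from dataclasses import dataclass

@dataclass
class Resident:
    name: str
    single: bool
    age: int
    income: int
    street: str

def find_prospects(residents, min_income=50_000, street="Rodeo Drive"):
    """Return single adults on the given street with income above min_income."""
    return [
        r for r in residents
        if r.single and r.age >= 18 and r.income > min_income and r.street == street
    ]

# Made-up records for illustration only.
residents = [
    Resident("Pat Example", True, 34, 82_000, "Rodeo Drive"),
    Resident("Lee Sample", False, 41, 95_000, "Rodeo Drive"),
    Resident("Sam Placeholder", True, 29, 61_000, "Main Street"),
]

print([r.name for r in find_prospects(residents)])  # prints ['Pat Example']
```

Changing the criteria means changing one line of the filter, which is exactly what a printed directory can't offer.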


Written by Ilya Farber. Copyright © 1998 Encyclopaedia Britannica, Inc.

jtagg@palomar.edu

This page was last edited: 08/17/04