The Internet Searcher’s Handbook

I’ve decided to wander memory lane by posting and commenting on book excerpts.

Let’s begin with my first, The Internet Searcher’s Handbook, which I co-authored with Louis Rosenfeld and Joseph Janes in 1996. Back then, before Google, people knew they needed help with search. I’ve selected the foreword, written by our visionary friend Richard Wiggins, who sadly passed away in 2014, along with the beginning of a chapter I wrote about using the Internet for research. Together, they afford a glimpse into how much has (and has not) changed in twenty years.

Foreword by Richard Wiggins

Brewster Kahle, the inventor of an important Internet indexing and search mechanism called WAIS, tells a story about an Internet demonstration he gave at San Francisco’s science museum, the Exploratorium. A variety of young people tried their hands at surfing the Internet and at finding the specific things they might want to learn more about. Then, one boy said “I’d like to ask the Internet a question.”

Kahle explained to the young man that sadly, the Internet isn’t that sort of critter. First off, there isn’t any one place on the Internet to which you should present your question; you have to pick a particular information resource, then submit your query to that service, using the search language that facility expects. But wouldn’t it be nice, Kahle wonders, if indeed users could simply ask the Internet a question, and expect a reasonable answer?

That sort of simple model has been the dream of computer users as long as we’ve had online systems. Shouldn’t we be able to ask the Internet all sorts of questions?

  • How many acres of corn were planted in Iowa last year?
  • What is the population of Senegal?
  • What is the e-mail address for the admissions office at Auburn University?
  • How did Congressman Ehlers vote on the telecommunications bill last month?
  • What books did C.S. Lewis write?
  • Show me a picture of Comet Shoemaker-Levy hitting Jupiter.
  • What television programs are dedicated to Internet topics?

These are all questions for which one can find answers on the Internet. Unfortunately, though, we can’t just present these questions to a single, all-encompassing Internet interface and get the answers we’re looking for. As a global network of networks, the Internet has far too many information resources, and too diverse a set of information publishing tools, for things to be that simple.

Fictional computer systems would have no problem handling our example questions. A “Star Trek” character would simply say “computer” and ask any of these questions, and the machine would reply with the answer desired, delivered promptly and with smooth, modulated inflection. No computer today is that friendly and helpful, but a number of visionaries believe the day will soon arrive when online systems will achieve that level of responsiveness to our needs.

In fact, vendors of information technology software have tried for years to implement such systems. Ten years ago the goal was to devise “natural language” query mechanisms that sat atop existing databases. Today, the mantra is the “information agent,” a tool that roams the Net, monitoring, gathering, filtering, and presenting information on our behalf. Tom Selleck’s voice-over on those “You Will” commercials tells us that AT&T will provide the right tools. Numerous startup companies say they’ll win the race. Some of us believe that information agents have been oversold, and that we’re a long way off from carrying out productive voice conversations with our online systems.

In the meantime, we have to live in today’s world. There are some analogies between how you find what you want on the Internet and how you find things in a traditional library. You wouldn’t walk into a bricks-and-mortar library and, standing in the vestibule, begin shouting your question, expecting that answers would flow forth from the building itself. From our early schooldays onward, we’re taught how to exploit the standard reference resources of the library: when to go to the Readers’ Guide to Periodical Literature, when to explore the Encyclopaedia Britannica, when to consult an unabridged dictionary, when to search the catalog (these days probably an online catalog, not a card catalog), and when to ask a reference librarian.

Similarly, when you want to find something on the Internet, you have to have some understanding of the standard reference materials and the catalogs that are at your disposal. You also have to phrase the question appropriately for the catalog you’re trying to consult. If the Internet is a virtual library, this book is your guided tour of the reference department and the catalog.

In 1898, Halsey William Wilson found that his job as a bookseller was too complicated: keeping up with all the titles he might want for his inventory meant reading through numerous publishers’ catalogs. He decided to create a single catalog, which he christened the Cumulative Book Index. Today H.W. Wilson and Company remains an important vendor of library indexes.

In 1993, one of the authors of this book, Lou Rosenfeld, saw a need for a single place where people could go to find resources on the Internet. He formed the Clearinghouse for Subject-Oriented Internet Resource Guides at the University of Michigan. His goal was to have librarians and other Internet scouts collaborate to build a high-quality compendium of bibliographies. Today, the Clearinghouse is co-sponsored by Argus Associates and the University of Michigan. The managing editor of the Clearinghouse is the primary author of this text, Peter Morville. Rosenfeld and Morville are now the principals of Argus Associates, which seeks to do for the Internet community the sorts of things H.W. Wilson sought to do for the print publishing world at the turn of the last century.

The people at Argus Associates are not alone in this fin de siècle quest, of course. The names of Internet index tools are becoming part of the popular lexicon: first Archie, then Veronica, then Yahoo, WebCrawler, and Lycos. Many of these names belong to highly successful commercial enterprises. The early successes of these ventures testify to the importance of tools that help Internet users find what they’re looking for.

Of course, not all of your Internet expeditions will involve “serious” searches for answers. Sometimes people like to browse casually through information sources, which is why even major research libraries usually offer a browsing collection. Some people enjoy spending time skimming through dictionaries or encyclopedias, but those reading sessions are enjoyable only if you understand the landscape of the document you’re reading. This book will help you understand the landscape of the Internet, so that you will find peripatetic browsing of the Net enjoyable and even serendipitous.

Peter Deutsch, the co-inventor of Archie, says that he and his colleagues, collaborators and competitors are trying to reinvent 100 years of library science, this time in an Internet context. The Internet Searcher’s Handbook will be uniquely useful because its authors are not just toolsmiths, but also scholars in the field of library science. Their discussion brings the singular insights of people who have helped advance the state of Internet catalogs, from the perspective of library and information sciences.

We eagerly await a new millennium and a new era in online information retrieval: the day when that boy can “ask the Internet a question” and get the answers he needs. In the meantime, let The Internet Searcher’s Handbook be your guide to the reality of resource discovery on today’s Internet. Whether your goal is casual browsing or purposeful searching, your voyages will yield more fruit as a result of this book.

Richard Wiggins, 1995
Author, The Internet for Everyone


Chapter 3, Using the Internet for Research

To conduct research is to search or investigate carefully and exhaustively. Variations on the definition range from comprehensive academic research within a particular discipline to less structured research on a personal topic of interest. A university professor searching through piles of bibliographies for academic articles about molecular engineering is conducting research. So is the hobbyist trying to compile a list of model railroad clubs, conferences, and events around the country.

Although many of the same search tools are useful in conducting ad hoc or reference queries, the goals and processes of research are very different. The goal of an ad hoc query is to find the answer to a specific question. The goal of a research investigation is to find all or most of the information on a particular topic. Reference queries are usually short and simple. Research queries tend to extend over days, weeks, or months, be highly iterative and interactive, and involve a wide range of tools and resources. Traditional research tools include library catalogs, reference books, microfilms, CD-ROMs, commercial online databases, and the telephone. Some tools are relatively new while others have been around for hundreds of years.

Some would have us believe that the global Internet is the ultimate research tool. Digital libraries, electronic journals, image databases, and hypermedia encyclopedias put information from around the world at our fingertips. Intelligent agents scour the networks searching for new information to index. Powerful search engines with well-designed query interfaces provide intellectual access to this vast ocean of knowledge.

This dream of an Internet information utopia that provides one-stop shopping for professional researchers and amateur hobbyists alike is a long way from being realized. The contents of most books, journals, magazines, technical reports, and databases are not available via the Internet. In fact, when compared with the volume of information available in print, the Internet’s vast oceans seem more like lakes or puddles. Today’s Internet is a distributed, chaotic environment that changes every day. The most useful information resources of today may be gone tomorrow. Servers crash and phone lines go down. Resources vary tremendously with respect to quality, currency, and level of organization. There is no editorial board and no enforceable standard for content. Information on the Internet may be out of date, misleading, or just plain wrong. To make things worse, there’s no top-down organizational hierarchy and no card catalog to cyberspace. Locating useful information can be as difficult as finding a needle in a haystack.

Despite these problems, the Internet does provide access to a growing body of information that is far less accessible via the traditional research tools. Government publications, product and service information, technical data, software programs, and weather statistics are just some of the information resources that are most easily accessible via the Internet. The distributed and digital nature of the Internet lends itself well to information that changes constantly and must be gathered from multiple locations. Since any individual organization can make information available, we tend to see great volumes of sales and marketing literature, political commentary, travel advertisements, and so on. The Internet is an information space to which anyone can contribute, and they do. Much of the information is useless, but some can be very useful. The skilled researcher learns to use the various tools and resources for sifting through this ocean of data to find the information they need.

The collection of tools and resources for conducting Internet research is rich and varied. Virtual libraries, Internet directories, search tools, and communities of people are all available to help in the search. Some of the search tools, such as Lycos and Open Text, are highly automated, employing intelligent software agents and powerful search engines. Others, such as the Clearinghouse and the World Wide Web Virtual Library, integrate human effort and software tools to provide topical access to information resources. None of these tools provides a complete solution. In order to search or investigate carefully, the researcher must integrate a number of complementary tools. Internet resource discovery is an iterative and interactive process in which a searcher makes use of virtual libraries, directories, search tools, and communities of people to find Internet information resources. An Internet directory might lead to an online community where someone mentions an electronic journal, which points to a virtual library, and so on. It’s important to keep in mind that Internet resource discovery is more an art than a science. The Internet’s chaotic and ever-changing nature will ensure that some of the tools and resources described in this book will be replaced over the coming months and years. However, the basic principles and heuristics of conducting Internet research described here should endure as the environment evolves.