Subject: Information Research FAQ v.4.6 (Part 1/6)
This article was archived around: 06 Apr 2002 06:54:07 GMT
Last-modified: Feb 2001
Copyright: (c) 2001 David Novak
Maintainer: David Novak <firstname.lastname@example.org>
The Information Research FAQ
100 pages of search techniques, tactics and theory
by David Novak of the Spire Project (SpireProject.com)
Welcome. This FAQ addresses information literacy: the skills, tools and
theory of information research. Particular attention is paid to the
internet as both a reservoir of information and a gateway to information
resources.
This FAQ is an element of the Spire Project, the primary free reference
for information research and an important source for search assistance.
Do visit http://spireproject.com . It is free and complements this FAQ
with links, forms and tools. We also publish a fine all-in-one search
page, a collection of our best search tools, called Spire Project Light.
This FAQ resides as text at http://spireproject.com/faq.txt and
http://spireproject.co.uk/faq.txt and with pictures at
*** The Spire Project also delivers a 3 hour public seminar called
*** Beyond Boolean: exceptional internet research. This is
*** a fast paced demonstration supported with webbing, reaching
*** beyond the ground covered on our website and FAQs. Please visit
*** http://SpireProject.com/seminar.htm for synopsis and venues.
*** Register your interest and we will try to come to your city.
David Novak - email@example.com
The Spire Project : SpireProject.com and SpireProject.co.uk
Prelude.
- Everyday searching has a simple approach.
- Searching for specific, quality information demands a more
- Let's understand how information is arranged on the
- Each format (book, article, web, etc.) has unique search tools and
  resources.
- Specific guidance on libraries, discussion groups and other
- Review and discuss types of information in specific fields.
- Boolean, proximity, field searching, Dewey and patent
- Quality depends on source, currency, search process,
- Commercial information industry, libraries and the
- Information moves and evolves in fascinating ways.
- Steps to improve an online search.
Many of us unwittingly digest great amounts of information in the course
of a day. Our information needs are more modest and usually repetitive.
When we have questions, we reach for a small collection of preferred
information sources close at hand, along with our own judgments about
what is credible and trustworthy.
For a child, these sources include the school library, an encyclopedia
and parents. All of them are trusted.
For an adult, these sources include the state library, the newspaper,
bookstores and current magazines. Adults understand that truth has become
a little more relative, but when the evening news declares presidential
hopeful George W Bush is ahead by 3% (on a sample of 707) we slip into
thinking he is leading.
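Why is a 3% lead on 707 people easy to over-read? A minimal sketch, using the standard normal-approximation margin of error at 95% confidence (z = 1.96). The candidate shares of 48.5% and 45.5% are hypothetical; the report quoted only the lead and the sample size.

```python
import math

def share_margin(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for one candidate's share p among n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

def lead_margin(p1: float, p2: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for the lead p1 - p2 measured in the same poll."""
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)

n = 707                        # sample size quoted in the news report
leader, rival = 0.485, 0.455   # hypothetical shares giving a 3-point lead

lead = leader - rival
print(f"lead: {lead:.1%}")
print(f"margin on one share: +/-{share_margin(0.5, n):.1%}")
print(f"margin on the lead:  +/-{lead_margin(leader, rival, n):.1%}")
if lead < lead_margin(leader, rival, n):
    print("lead is within sampling error")
else:
    print("lead exceeds sampling error")
```

With these assumptions the margin on a single share is roughly 3.7 points and the margin on the lead about 7 points, so the reported 3-point lead does not by itself show who is ahead.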
There is more to information literacy. It is, after all, a profession.
There are tools you know nothing about and techniques you have never
heard of. There is a specialized vocabulary just made to confuse you.
Research, or rather information research (to distinguish it from
lab-coat style research) is so very much more involved.
Yet there is great simplicity to research too. Just under the murky mist
of confusing resources rests a solid platform to stand on. In any one
field there are just a handful of databases, directories and periodicals
to consider. After decades of library and information industry
evolution, clearly valuable sources have already floated to the top,
monopolizing their respective fields. Most cities have just one or two
primary newspapers. Large industries like book publishing have few book
databases and a handful of primary book distributors.
Enter the internet: not so much a change in information as a revolution
in access to information. Previously you could justify having just a
handful of preferred information sources because these were the sources
easily available. Today, and in the future, information is close at
hand everywhere. We are dropped into a morass of competing information
just waiting to capture our attention, straining both our capacity to
absorb information and our capacity to understand the differences.
A great segment of our community will fall back to tried and true
information sources they grew up with: state library, bookstore, local
newspaper. The better alternative sources will be ignored for no
particular reason. The rush of the information revolution will push past
them. They will only hear of changes when their information needs
suddenly change - and they are confronted with a vast collection of
unfamiliar options, and struggle with understanding what sources they
A smaller segment of our community, by virtue of frequently tackling
questions best answered with unfamiliar sources, will be driven to
understand the information world: to become truly information literate.
There is another story here too. The way our society handles information
is undergoing some fascinating changes. Any predictions for the
future should acknowledge the tension and flow of information in our
society. Take, for example, the vast surplus of information emerging on
the internet, and the convulsions of the commercial information industry
in response. Rather than focusing on how information is organized, we
can also focus on how information becomes organized. The who, where and
why of information, the sociological perspective, adds meaning to the
phrase "information revolution".
- - - - - - - - - - - - - -
It was another warm day. The young Egyptian boy strode purposefully out
the gate towards the river. The Nile was low at this time of year and
abundant with fish and bird life. With luck, Shakh would return at
sunset with food for the pantry. Mother would be pleased with that.
Shakh knew fishing had changed little over the last hundred years. The
walls of his family's ancestral home had just such a scene of his
grandfather fishing on the Nile from a small reed boat. The thinly
carved relief was complete with spear, fish, ducks and Shakh's
grandmother nearby holding lotus flowers.
Shakh stopped by old-man Jacob on his short walk to the bank of the
Nile. He liked the old trader. Years ago Jacob had traveled to the
Levant and brought back many strange artifacts. Some even came from as
far afield as the Harrapan people, who were said to live beyond Sheba,
across the waves, some three years' journey away. He especially liked the small
black head carved in a style so unlike anything else Shakh had seen.
- - - - - - - - - - - - - -
The Harrapan people lived on the banks of the great Indus river in
modern-day Pakistan. Although theirs was a great civilization, almost on
par with the Sumerians and the more distant Egyptians, very little of it
remains today.
They built vast cities of clay brick with rectangular city blocks. They
built drains, public toilets and state granaries. They were the first to
populate the Indus river valley. (see
Little remains. The Harrapan civilization fell with the arrival of the
Aryan race and the intervening millennia treated their past poorly. The
arrival of Islam erased much of their history as did the shifting Indus
river itself. The British used the bricks from one ancient city in the
construction of a great railway. Only today are the archaeological digs
once again unearthing the past.
I search for Harrapa on the internet. Nothing special, just type
'Harrapa' into any of the popular search engines and I uncover
harrapa.com, a website devoted to some recent information from these
digs. Looks good. Pictures of ancient pots. Children's toys. A map to an
Of course, Shakh would have known of the Harrapan civilization. While it
is uncertain whether any ancient Egyptian ever visited in person, goods and rumors
traveled far from trader to trader. Ancient Egyptians, while not
accomplished conquerors abroad, did travel and mix with distant peoples.
Shakh lived in a civilization centuries distant from us, yet both you
and Shakh know a similar amount about the Harrapan civilization. The
intervening years have not made everything clear. Even the information
revolution has not changed the facts. Both you and Shakh have just a
single source of information about the Harrapan civilization. You have
the pictures on harrapa.com and our short excerpt here. Shakh has the
old man's carved head to look at, and the old man's tale of a
civilization beyond the waves.
This story carves the act of searching in deep relief. Searching is a
skill, a trade and to some a profession. It is also just a simple task
of finding information - something we do every day, in so many ways,
without any of the difficulties we will get into later in this FAQ.
The difficulties only emerge when you want to do something spectacular.
Should you wish to know something specific about the Harrapan
civilization, or understand something contentious - then we require a
greater degree of expertise and experience. The search becomes a
challenging adventure in its own right.
- - - - - - - - - - - - - -
The Nile was always a slow river but three months out of the year it
burst its banks and flooded the fields, bringing life on the banks of
the Nile to a complete halt. For these three months Shakh's family would
move into the ancestral home in the streets surrounding the great
pyramids. It was an old home, centuries old. Well suited to their needs
with a storeroom for food, separate rooms for the parents, and an active
social life in close proximity to others. In many ways, this was the
most exciting time for young Shakh. For the rest of the year he lived in
relative isolation in the village by the Nile. For these three months,
he lived in a city, bustling with activity, construction and recreation.
Shakh had expected this year to be like the last, but his father had
secured him an important position: he would train to become a
scribe. Father had grand plans for young Shakh, plans that extended far
beyond life as a scribe. What's more, with luck and further prosperity,
Shakh's father would have the means to secure his further advancement.
- - - - - - - - - - - - - -
Much of ancient Egypt is available for us to read off the walls of the
many remaining buildings. They were not a literate nation, yet were able
to adorn almost everything with writing and pictures. They lived in the
most enlightened society of the day. Years later, Egypt would gift the
fledgling Hellenic state a full third of their Greek vocabulary.
This is part of the reason for such an interest in travelling to Egypt.
It is the visual symbols that inform us and draw us in so deeply.
Standing before the great religious statues, we begin to feel how it was
to live and work in that day. To run amok as a young student, waiting
for the Nile to subside once again.
Yet there is much more to knowing ancient Egypt than just the monuments
and wall reliefs. Years of study have recovered their lost language of
hieroglyphs. Years of archaeology have unearthed their daily lives.
History and archaeology are fine examples of searching in practice. Both
fields struggle openly with the bias and uncertainty each new fact
brings forth. Malta is a small island off the coast of Sicily, close to
Tunisia. Should evidence emerge of ancient Egyptians living on Malta,
what does it mean? Was Malta an Egyptian conquest or an occasional
station for their fishing fleet?
This uncertainty applies to all information, in all situations. One of
the first acts of the new regime in Pakistan was to acknowledge that
important national statistics, like the national GDP figures, had been
seriously fudged. Important national
statistics are not intrinsically true because of their source. This is
not a problem solely of underdeveloped nations. Rumor suggests that
during the height of Singapore's land value bubble their national
figures were unreliable too.
Searching is a skill and an attitude. In this FAQ we progressively
unfold the way information is found. Initially, let's cover a simple way
to find information; a structured approach to an everyday problem.
Afterwards, we shall look more closely, and with more complexity, at the
world of information.
Searching is Simple.
Searching is simple. It starts with a question. It ends with an answer.
Everything between is searching. Much of it has to do with the tools you
use. Select the right tool and you can get to the answer almost by
default. Luckily, for any given topic there tends to be just a handful
of must-use tools. For more complicated questions, there are usually
plenty of people to ask for assistance.
The answers you are seeking will be found in a selection of different
formats. By this I mean books, articles, interviews, and more. This is a
very convenient concept and forms the foundation of all our work both
here and in the Spire Project. Few research tools cover more than a
single format; those that do tend to cover each format poorly. Start a
search by selecting the specific format you are seeking. Then, select
your preferred search tool from a small collection specific to that
format. To get the information, simply follow through and read, search
or interview. Everything follows naturally.
Have a Question.
Select a Format.
Select a Search Tool.
There are just a few formats to consider.
- Dense, factual, comprehensive and a minimum of 6 months to a year old.
- Shorter than books but focused on one topic.
- Short and shallow. Immediate.
- Factual. More reliable.
- Very thick. Deeply researched. Esoteric.
- Immediate, mixed quality, with limited factual support.
- Immediate, varied quality, partly digested.
Each format has a selection of simple tools to find information. Many of
these tools will be on the internet, which often means they are easily
accessible.
A word of caution: try not to confuse search tools that happen to be on
the internet with searching internet information. The Amazon.com book
catalogue is a search tool useful in locating books. Though on the web,
searching Amazon is part of a book search, not a web search. A search of
the Reuters newswire is a news search, not a web search, even though
Reuters releases current news on the web. Each format should remain
distinct in your mind.
Tools to Find Books
1) Some books, particularly classics, are free on the internet through
efforts like Project Gutenberg.
2) Libraries allow you to read books. Library catalogues are frequently
3) The largest libraries, like the Library of Congress and the British
Library, list millions of books in their online catalogues.
4) Most currently available 'in print' books are listed in national
5) Each country maintains a special government publication database.
6) Lastly, online bookstore catalogues like that of Barnes & Noble, list
a sizeable portion of current in-print books.
Tools to Find Webpages
1) Global search engines index hundreds of millions of webpages for
free-text searching. Consider Altavista and All-the-Web.
2) Global directories list resources by category. Consider Yahoo or the
Open Directory Project.
3) Regional search engines and directories focus more tightly on
regionally important topics.
4) Lastly, more specialized search tools, from search engines which
focus on specific topics (like maths or government webpages), services
which link you to important topic-specific websites, and services which
manually review websites, all can take you further.
Tools to Find News
1) Current news is found in newspapers and the evening news. News clips
can be delivered electronically, or purchased through specialist news
2) Newswires redistribute regional news to a larger audience. Many
newswires release their text news free online.
3) Specialized search engines like NewsBlip and TotalNews aggregate
current online news.
4) State libraries archive past copies of regional papers.
5) Individual newspapers maintain libraries of previous articles. Many
are available as commercial databases.
6) Larger commercial databases unite the news from many prominent
newspapers. These databases of news articles stretch back many years.
This story is repeated with all the formats information comes in.
To drum this in with repetition, searching starts with a question.
Select the format (book, news or webpage). Next, select one or more
tools from our short list of search tools for that format. Want to
understand the lifecycle of the spider? A book should prove useful.
Let's look at either our local library book catalogue or a big
commercial bookstore catalogue like Barnes & Noble (http://bn.com).
Search. Read. Voila, the lifecycle of the spider.
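The question-format-tool sequence can be pictured as a simple lookup. This is a toy sketch, not a Spire Project tool: the searcher still chooses the format by hand, and the tool names are abbreviated from the lists above. The `plan_search` helper is hypothetical.

```python
# Toy model of "search by format": the searcher picks the format, and the
# format (not the delivery medium) decides the short list of tools.
TOOLS_BY_FORMAT: dict[str, list[str]] = {
    "book": [
        "Project Gutenberg (free classics)",
        "local library catalogue",
        "Library of Congress / British Library catalogues",
        "online bookstore catalogue (e.g. Barnes & Noble)",
    ],
    "webpage": [
        "global search engine (e.g. Altavista, All-the-Web)",
        "global directory (e.g. Yahoo, Open Directory Project)",
        "regional search engine or directory",
        "specialized topic search tool",
    ],
    "news": [
        "newswire websites",
        "news search engine (e.g. NewsBlip, TotalNews)",
        "state library newspaper archive",
        "commercial news database",
    ],
}

def plan_search(question: str, fmt: str) -> list[str]:
    """Hypothetical helper: list candidate tools once the format is chosen."""
    if fmt not in TOOLS_BY_FORMAT:
        raise ValueError(f"unknown format {fmt!r}; choose from {sorted(TOOLS_BY_FORMAT)}")
    return [f"{question} -> {tool}" for tool in TOOLS_BY_FORMAT[fmt]]

# A bookstore catalogue on the web is still a *book* tool.
for step in plan_search("lifecycle of the spider", "book"):
    print(step)
```

The design point is that the key is the format of the answer, so a web-based bookstore catalogue sits under "book", not "webpage".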
If searching appears a little boring at this point, you have not visited
a library recently. The excitement comes in finding the information. The
rest is dull indeed.
- - - - - - - - - - - - - -
The information revolution washes over us, picks us up and pushes us
forward like so much driftwood. From now on our lives will forever be
awash with information. We will eat it. Breathe it. Live in it. Drown in
it. Some of us will even learn to live for it. Those most capable will
have the skills to search, sift and sort information.
The information revolution is not about primary research, lab coats and
discovery. It is about a surplus of information. The searching we have
just discussed is not a particularly creative process. Simple searching
is not sufficient to deal with the great tide of information moving
against us. But then, simple searching lacks finesse. Simple searching
is, well, simple.
Searching is one of those delightful tasks where skill is
everything. A search without talent will give you just a taste. Like
pottery perhaps. Anyone can get something but only an expert can
accomplish wonders. Quality information, reliable answers, effective
coverage of resources; it takes skill to get to this level.
Advances in technology and the delivery of search assistance have made
searching easier than ever before. Many search tasks can be accomplished
without any experience. With more challenging questions a novice will
get results - results they will be proud of. But not results they should
be proud of. With experience, you will recognize how much more is
Let's proceed by adding a little more complexity.
Searching is Complex
Your value as a searcher is directly related to the number of resources
you can reach for quickly, and your skill at phrasing a research
question. Consequently, as a searcher, you will work hard at building
ready access to a range of resources. You will also work hard at
understanding the special characteristics of collections of information.
The technical name for complex searching is 'Information Research'. I
prefer to think of information research as an effort to locate answers,
efficiently. Information Research is not vague browsing of available
information for something that interests you. It is not browsing the
library bookshelf or reading the newspaper, nor is it internet surfing.
Information research is searching with a purpose ... and it is hard
Research is also an art form. The skills, tools, and resources we work
with are only the canvas and paints of an artist. Research spans
commercial, legal and news reporting work, and draws on the skills of
interviewing, database searching, and research analysis using books, articles, experts
and patents. Research is so large a field, involving so many skills,
tools and resources, you will quickly find you do not wish to learn it
At the heart of information research lies a simple motto: "Someone,
somewhere, probably knows the answer."
To quote The Information Broker's Handbook (Sue Rugge and Alfred
Glossbrenner): "As information brokers, we shouldn't consider ourselves
capable of providing solutions... What we 'can' provide, and what sets a
really good information broker apart from the rest, are resources. We
can provide the client with the kinds of information he or she needs ...
that make it possible for individuals to solve their problems."
Let this sink in. We are not experts in the field we are researching.
Collecting information on the moons of Jupiter? Do not pretend to be an
astronomer. We are only experts at the tools for gathering information.
A Quick Introduction to Effective Searching.
1) Searchers work hard to properly frame the question.
2) Searchers know the technology, know where to look.
3) Searchers know you can ask.
Step One: Properly Frame the Question
The preparation of your question is critical. There is a galaxy of
difference between a young student asking, "I am interested in trees",
and a specific, attainable question like "Where would I find a tree
surgeon I can talk to?"
The information sphere is very large and rather confusing. Each item of
information has aspects of authenticity, accuracy, reliability, and
bias. Information comes in many formats: interviews, books, articles,
statistics. We learn about information from many sources: literature,
discussion, resource lists, experience. There are also personal issues:
budget, time, depth and purpose.
With all this to think about, we must be very careful about each
question we ask. This issue is vital once we start an article search,
and can easily mean the difference between 5 concise articles, and
hundreds of general articles. The essence of our question determines how
with which we approach the information sphere. The question directs our
One key is to treat searching as an art, much like painting or
photography. The true mark of an artist, and the primary step wanna-be
artists miss, is visualizing what you want before you begin.
When searching, sit down and visualize what a successful search would
look like in this situation. How many pages? How many documents? What
kind of authors and what quality of document? Go through the whole
range of research tools and describe what success looks like in each. Would
a simple three-line newspaper article be a success? Would a 20-year-old
dissertation be acceptable? Would a short conversation with an expert
suffice? Would all three together suffice? (This approach works
exceptionally well with internet research too.)
If you can phrase a question in a way that lends itself to your
resources, you are far more likely to get the answers desired. Oddly,
this often means you are asking for places where the information resides
rather than asking directly for the information.
A novice starts with a question like, "What can I do for my exceptional
child?" You should rephrase this question immediately: "What resources
will help me help my exceptional child?" These are both valid questions
but the second question has a distinct answer - the first is far too
vague. Other questions could be "What are other parents doing for their
exceptional child?" or "Who can help advise me on how to teach my
Now we shape the question to get precise answers. "Where do I find a
definitive list of associations?" (or a search for "+association
+directory") works much better than, "What association works with
exceptional children?" What about, "Who would know of associations for
exceptional children?" and, "Are there pamphlets of advice for parents of
exceptional children?" and, "What umbrella organizations/specialist
libraries exist for exceptional children?"
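The "+association +directory" query relies on the "+" convention of search engines of the day, where "+" marks a term that must appear. A minimal sketch of that required-term rule, assuming simple word matching with no stemming; the sample documents are invented and real engines differed in detail.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lower-case words with punctuation stripped (no stemming)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def required_terms(query: str) -> set[str]:
    """Terms written with a leading '+' must appear in every match."""
    return {tok[1:].lower() for tok in query.split() if tok.startswith("+") and len(tok) > 1}

def search(query: str, documents: list[str]) -> list[str]:
    """Return the documents containing every required term."""
    must = required_terms(query)
    return [doc for doc in documents if must <= tokenize(doc)]

documents = [  # invented sample titles
    "Directory of association contacts for parents and teachers",
    "Which association works with exceptional children?",
    "A directory of tree surgeons",
]
print(search("+association +directory", documents))
```

Only the first title contains both words. Because this sketch does no stemming, a title reading "associations" would not match "+association", a reminder to try variant keywords.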
Questions are not right or wrong, just better or worse at illuminating
certain aspects of the answer. Make sure your questions illuminate
There are ways to frame questions for commercial databases, for research
assistance, for interviews, for getting the truth from your children.
Your skill in phrasing the question has a lot to do with the results.
Poor questions tend to come back and haunt us later when we miss
relevant information. Set aside ample time to refresh and reframe your
Step Two: Know the Technology, Know Where to Look.
Research rests on understanding the technology and an awareness of the
resources. In the example above, a directory of associations does exist.
Here in Australia it is the "Directory of Australian Associations",
found in most important Australian libraries. The Australian "Department
of Education" has a major interest in promoting exceptional children. In
Western Australia, Infolink, a community information service, should
have a record of major community groups for exceptional students. I have
no direct knowledge of umbrella organizations or specialist libraries,
though I expect both the education department and Infolink would. A
quick search of some large libraries may help us find some of the
Knowing of specific resources is helpful. It is great if you live next
door to the president of Mensa. You have easy access to someone
knowledgeable, able to give his or her take on the situation.
Knowing the tools to help you find resources, the meta-resources, is
vital. So what if we do not know exceptional students come under the
Department of Education. Do we know who to ask to find the government
department involved? If you do not know of the directory of
associations, whom would you ask or where would you look? Being unfamiliar with
meta-resources is a serious handicap - you will find yourself searching
hours for something a professional would do on the phone while drinking
Keep in mind the Spire Project is dedicated to providing you some of
this experience. Our web articles should suggest directions to look. But
there are limits to how we can help. At some point you simply must sit
down with the Kompass Directory, or the Gale Directory of Databases, or
the Australian Bureau of Statistics library, and become familiar with
getting to all the relevant information.
Another must, for all searching, is experience searching electronic
databases with complex research queries - a difficult task only made
better with practice. As a general rule, if you don't use Fields,
Proximity and Boolean search terms, you are doing it wrong. Most people
do it wrong.
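To make the three techniques concrete, here is a minimal sketch over an invented two-record database. The `Record` type and helper names are hypothetical, and commercial systems each have their own operator syntax; only the logic of Boolean AND, proximity and field restriction is shown.

```python
import re
from dataclasses import dataclass

@dataclass
class Record:
    """A toy database record with two searchable fields (hypothetical)."""
    title: str
    body: str

def words(text: str) -> list[str]:
    """Lower-case words with punctuation stripped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def boolean_and(text: str, *terms: str) -> bool:
    """Boolean AND: every term must occur somewhere in the text."""
    present = set(words(text))
    return all(term.lower() in present for term in terms)

def near(text: str, a: str, b: str, max_gap: int) -> bool:
    """Proximity: a and b occur within max_gap word positions, in either order."""
    tokens = words(text)
    pos_a = [i for i, w in enumerate(tokens) if w == a.lower()]
    pos_b = [i for i, w in enumerate(tokens) if w == b.lower()]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)

def in_field(record: Record, field: str, term: str) -> bool:
    """Field search: look for the term only in the named field."""
    return term.lower() in words(getattr(record, field))

records = [
    Record("Gifted children in the classroom",
           "Teachers describe programs for exceptional children and their parents."),
    Record("Annual report",
           "Children attended the program. Parents noted the exceptional weather."),
]

# A plain Boolean AND matches both bodies, including the irrelevant report.
print([r.title for r in records if boolean_and(r.body, "exceptional", "children")])

# Field + proximity + Boolean: title contains "children" AND
# "exceptional" appears within 2 words of "children" in the body.
hits = [r.title for r in records
        if in_field(r, "title", "children") and near(r.body, "exceptional", "children", 2)]
print(hits)
```

The AND alone returns both records; adding the field and proximity limits leaves only the relevant one, which is the "5 concise articles versus hundreds" effect in miniature.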
Step Three: Know You Can Ask.
There is very little mystery about professional research. Lots of people
are experienced in different aspects of this field. My personal weak
point is in direct interviewing, whereas I am a pioneer in secondary
resource research. This is OK. In fact I use this liberally to determine
the skill of professional researchers - do they know their own limits?
The field is much too large to be an expert in all its aspects.
The positive side to this is that many people welcome requests for help. I
enjoy asking librarians questions. I also ask my customers, my suppliers
and other professional researchers. Never get caught in the trap of
feeling you know what to do. The joy in this profession is that most
people do not expect you to be an expert in their field, just an expert
in your field: particularly the meta-resources. Even if it requires a
polite reminder, customers will appreciate you asking them for likely
keywords in difficult searches. I always make a habit of asking
librarians if I am missing something. A librarian is always fluent in
their collections and I frequently locate real gems this way. (As an
example, my state library arranges computer books in two sets, one Dewey
and another in an alternative structure. Who would have guessed?)
Especially if you are just a student, always keep your ears open. You
will frequently find yourself in the presence of some expert in some
facet of research telling you something you already know. Consider
carefully before you interject... Your expert may be about to explain
something new to you.
Information research is a dedication to learning. At its heart is a
collection of specific research skills, an awareness of research tools,
and a gifted mind. - Oh, and a large amount of coffee. Without knowledge
of and access to relevant research-worthy resources, your research will
be severely limited and doubtful. This is why much of the work of becoming
an effective researcher involves learning about the resources and
meta-resources for your field. Much of our work in the Spire Project is
drawing your attention to relevant resources.
Before we progress to specific resources for specific formats (books,
webpages, news), let us attack head on the role of the internet in
information research. This should surprise you.
The Internet Format.
As Shakh became more proficient with writing, father wrote more
frequently of the family deity. Horus, the falcon god, had long watched
over his family. Horus sees all, his father would write, and even across
the many miles separating you from us, Horus will watch over you and
keep you close. It was a great comfort to Shakh to have the family deity
looking after him.
Shakh too devoted himself to a life of watching and knowing.
- - - - - - - - - - - - - -
We have discussed how information comes packaged in certain standardized
formats like books, articles or news clips. Each format has particular
qualities and standards that reflect the way the information is
prepared. For example books are dense, factual, comprehensive and a
minimum of 6 months to a year old.
So how can we apply this newfound wisdom to the internet?
Let's start at the beginning. The internet is an inexpensive and
pervasive system for the delivery of data. It is also the medium of a
dramatic shift in the way we access information.
A (1) dramatic drop in the cost of publishing is fuelling (2) the
liberation of information from previously closed systems, leading to (3)
an emergence of alternative funding for certain public resources and (4)
an eagerly awaited 'direct to consumer' commercial information industry.
The first mental knot to untie is the separation of internet resources
into distinct formats. Electronic books share most of the qualities of
books published on paper. News stories found on the web share all of the
qualities of news in your local newspaper. The fact they are electronic
or appear as webpages has nothing to do with it. News is news.
Electronic books are almost books.
But if online news is news, and online books are almost books, and both
are not internet formats, what is an internet format?
The search-by-format method is a concept to simplify and understand the
many information resources which exist in the world. The concept is only
as valuable as it is successful at enlightening us. As to the internet,
we have more to learn, but could safely divide the internet into several
formats at this time, perhaps webpages, online discussion and ftp
resources. Yet this is largely superficial. The real value comes from
understanding the qualities of different types of webpages. We shall
divide the webpage format further.
Must we really learn this?
You would be pardoned for equating searching and the internet. Much of
the hype surrounding internet search tools builds the illusion that the
skill of searching can somehow be distilled computationally then
delivered to you electronically. Through the wonders of modern science,
you can have the best information at your fingertips without having to
learn anything of search technology.
This is a pervasive lie (or marketing fiction). The electronic research
industry has been around for decades and has worked on this problem for
some time. No upstart internet guru has invented a technique to suddenly
transform the search process. Such thinking would work in section two
(Searching is Simple) but is the first illusion we must shatter for you to
Case in point: the Lycos and All-the-Web search engines use the same
database of webpages. This database is growing rapidly: it stood at
350,000,000 webpages in June 2000 and hopes to reach one billion
webpages by the end of 2001. It stands as a grand achievement in
Wrong. Years ago I was using a unified database of news called Global
Textline (no longer available but replaced by others). It had an
astounding four billion news articles available for advanced text
searching! Four billion news items, representing many years of news from
all over the world. This was superficially 10 times the size of the
current All-the-Web search engine.
No, the internet does not even hold the record for being the largest
information field. Oh, it will surely surpass the quantity of commercial
information, and superficially we could say it may already have achieved
this. But the internet is not a new medium for information research. It
is emerging as a new resource, not a new phenomenon.
The internet is a new medium for business - most businesses have never
incorporated the immediacy or global nature of internet involvement, so
considerable rethinking is required. The internet is a new medium for
publishing for almost all of us; very few of us published electronically
before the internet emerged. The internet is NOT a new medium for
research. Information researchers have been working electronically for
years. The internet is just a new resource we can reach for with
strengths, weaknesses and peculiar traits we must appreciate.
By way of an example, let us compare Link Analysis as used in Google and
Raging (of Altavista) with the process of editorial vetting as used in
Through the magic of link analysis, we can make certain assumptions
about the value of a webpage by adding up the number of other pages
linking to that page. In its simplest form, webpages with at least 100
inbound links from other websites are judged to be quality, valuable
resources. A webpage without any inbound links has the suspicion of
being of poorer quality. After all, no one has thought it valuable
enough to add a link to their further resources page.
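The simplest form of link analysis can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical list of (linking page, linked page) URL pairs and the 100-link rule of thumb above; it is far simpler than anything Google or Raging actually uses, and the function names and example hosts are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse

# The "at least 100 inbound links" rule of thumb from the text.
QUALITY_THRESHOLD = 100


def inbound_link_counts(links):
    """Count, for each linked page, the distinct pages on *other* sites linking to it.

    `links` is an iterable of (source_url, target_url) pairs. Links between
    pages on the same host are ignored, since only links "from other
    websites" count. Pages with no external inbound links are absent from
    the result (treat them as 0).
    """
    linkers = defaultdict(set)
    for source, target in links:
        if urlparse(source).hostname != urlparse(target).hostname:
            linkers[target].add(source)
    return {page: len(sources) for page, sources in linkers.items()}


def judged_quality(links, threshold=QUALITY_THRESHOLD):
    """Return, sorted, the pages the naive rule would judge to be 'quality'."""
    counts = inbound_link_counts(links)
    return sorted(page for page, n in counts.items() if n >= threshold)


# Hypothetical link data: 150 sites link to a homepage, only 3 link to the
# specific second-level page that actually matches our keywords.
links = [(f"http://site{i}.example/links.htm", "http://spire.example/") for i in range(150)]
links += [(f"http://site{i}.example/links.htm", "http://spire.example/faq/boolean.htm") for i in range(3)]
links.append(("http://spire.example/", "http://spire.example/faq/boolean.htm"))  # internal, ignored

counts = inbound_link_counts(links)
print(counts["http://spire.example/faq/boolean.htm"])
print(judged_quality(links))
```

In this toy data the homepage clears the threshold while the more specific page does not, which is exactly the homepage bias discussed in the next paragraph.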
This logic has some serious shortcomings. Firstly, the process rewards
long-term projects that have been online long enough to earn links. A
brilliant new webpage would have few links - yet. It would be ranked
poorly, undeservedly. Secondly, link analysis rewards websites over
webpages. The pages with the most links are often homepages. Rating
homepages over second-level webpages works at odds with keyword searching.
Our keywords will be found in specific, perhaps second-tier webpages.
Links go to the top level. Thirdly, link analysis is a mass market,
popular technique. You are banking on the intellectual finesse of a mass
of mindless computer users much like yourself. It is the same kind of
popular democratic selection that votes B-grade actors into the
Let's contrast this with the process of editorial vetting used in
scientific journals. Each article is reviewed by a selection of
knowledgeable peers who understand the topic in great depth. Each
article is further improved by the editing of the journal editors, and
by self-editing, for there is great competition and prestige at stake.
Only a handful of the many submissions are judged worthy and appear in
the printed journal. Success places the article in the standard of
record, stamped with an external statement of truth and importance.
Of course, the logic of editorial vetting also has shortcomings.
Firstly, the process is time and effort intensive. Many of the most
important journals will delay six months or more between submission and
publication. In our digital era this is increasingly unacceptable.
Secondly, the number of submissions accepted is at odds with the pace
of development. So much more happens in the world than can be digested
in this manner. Thirdly, editorial vetting supports the clannish
behavior of which the upper echelons of science are often accused. New
and novel developments have difficulty floating to the top when the
peer reviewers are not open to new ideas.
If link analysis is popular and democratic, editorial vetting is elitist
and autocratic. Both approaches have pros and cons.
Once you have absorbed the drama between link analysis and editorial
vetting, please do not retain the belief that your search needs will be
completely solved for you. Searching is a complex, overgrown garden and
it's time to get your hands dirty.
So what does the internet have to do with searching?
The internet changes searching in two ways. Firstly, the webpage is a
new format to contend with.
"Webpages are often of unknown age, of only guessed at quality and
potentially the easiest information to retrieve. There are many points
of entry to web resources but search tools differ. Try to match your
search tool to your question."
The internet is also a conduit to many of the pre-existing tools for
searching other formats (books, news, interviews).
With an internet connection, we can reach database retailers and many
commercial quality databases like LOCOC, ERIC, MOCAT and AGIP directly
from the source. We can also remotely search the catalogue of most
libraries in the world. These are not new resources, just new ways to
In this day of interconnectivity and change, it is too tempting to
declare the information industry is in rapid flux. Everything I have
learned suggests this is not so. There are some changes associated with
new channels but by and large the process of searching for information
remains the same.
Let's look briefly at news as an example. News articles are written by
the reporter, sold to international newswires which then distribute
these stories to interested newspapers and news channels, which
incorporate the news into your newspaper or evening TV news.
Journalist - Newswire - Newspaper/News show - You.
News would also be added to commercial databases of past news. These
databases are then provided to database retailers like Dialog or
Lexis-Nexis who sell occasional access to you.
Journalist - Newswire - Commercial Database - Database Retailer - You.
With the internet, newswires have also provided their text news to
online sites. Text news is thus available for you to browse or search.
Journalist - Newswire - Internet News Sites - You.
I draw your attention to several facts. The fundamental nature of the
industry has not changed. Journalists and newswires still impart upon
the news the same nature as before. It is short, shallow, immediate. It
is created to journalistic standards.
If you wish to search past news, you must still reach for the commercial
database, most likely through a database retailer. Searching for news
online only goes back two weeks at most.
Lastly, to date only the text format for news is widely disseminated.
Sometimes a couple of pictures are included but the visual news, as used
in the evening news on TV, is sure to remain priced beyond public
So what has changed? There is another venue for you to pick up the news.
There are opportunities for new databases to be created, some of limited
time (like totalnews.com - a database of current news on other
websites). Little else has changed. The creation and dissemination of
news remains pretty much as before the internet arrived.
Let us look even more briefly at book publishing. Books are produced by
authors, improved by editors, published by publishers, marketed by
bookstores, then purchased by you.
Author - Editors - Publishers - Bookstores - You.
Today we have a couple of new online bookstores - and a large number of
new old online bookstores (existing bookstores now selling online). We
have a collection of free books online (largely classics like
Shakespeare, which, strangely, were already published as really
inexpensive paperback classics available in airports everywhere).
There is also a range of very useful commercial quality book databases
which have become free to search online. I am thinking of the government
publication catalogues (MOCAT [US], AGIP [Australia] and Stationery
Office Online Catalogue [UK]) and the online catalogues for the Library
of Congress (LOCOC) and the British Library.
Lastly, the online catalogue to the large bookstores like Barnes and
Noble, Amazon and The Internet Bookshop (UK's WHSmith) can provide a
free and fast database of books in print, though not as good as the
commercial Books-in-Print databases. Of course, any local bookstore will
offer to search books-in-print for you, so this is not as revolutionary
as it might at first appear.
In summary, we have a collection of recently discounted book databases
we can more easily search, we have additional sites to buy books, and
little else. The creation and dissemination of books remains pretty much
as before the internet arrived. Has the book industry changed? Not
The most remarkable change has been the emergence of group discussion
online, the emergence of a new format for information (like the webpage)
and the opportunities to connect faster to a whole range of pre-existing
This is the reason why we discuss searching-by-format. Later, at the end
of this FAQ, we return to this topic and show that the real revolution
is not in resources or industry or search tools but a revolution in
immediate access. Access, it turns out, enriches the art of searching.
By contrast, as an information resource, the internet can still be
much too limited for many situations. If we are not careful, searching
the internet becomes no better than browsing the shelf of your state
What most impresses me about the internet is the promise of changes in
the future. The internet as a system suggests radical improvements to
the decade-old systems that have attained search-worthy status. Most of
these improvements are still in the future: not yet proven, and set to
remain promising ventures for a time.
This is not to say internet research cannot be rewarding. In some
fields like computer studies, the internet has already surpassed parity
with books, articles and associations. When to consult the
internet as a research-worthy resource depends on cost, effort, and the
quality of the information returned. This judgement call requires more
than a little experience.
Value is important. I sincerely hope we can suppress our enthusiasm for
free information in favour of a truer appraisal of the value of
information. Make no mistake, commercial information is brilliant. It is
almost heresy to even compare commercial information with the results of
a few hours on the internet.
Internet Information Theory
Let us agree the internet is great fun to surf but more challenging when
you have a specific question in mind.
To improve our search skills, we begin by understanding how information
is arranged on the internet. Contrary to myth, information is not
disorganized but rather organized very carefully along clear patterns.
Many patterns are specific to the information format (text document,
webpage, email message, printed article). Further patterns match the way
we become aware of information, or are specific to the information
systems (mailing list, FAQ, peer-reviewed journal). Your understanding
of the strengths and weaknesses of each pattern, each format, each
system, guides your search for information. We shall start by shattering
the internet, and commenting on the many pieces.
Three Definitions of the Internet
Do be careful when using the word 'internet'.
1_ The internet is a physical network; more than a million computers
continuously exchanging information. The internet allows us to transfer
information around the world.
2_ The internet is a landscape of information available on almost every
topic imaginable. This information appears almost chaotically
distributed to the world but holds clear patterns. For instance, linking
information together are various structures like government web links,
search engines and FAQ documents.
3_ The internet is a community of 500+ million individuals. These are
real people who choose to interact, discuss and share information
In this example, let me just draw your attention to the way most of our
research effort focuses on the second definition: a landscape of
information. Much of the best information originates in the third
definition: the internet is a community. Sometimes it is far more
effective to ask real people than search the information cyberspace.
What I just mentioned is not so important as the technique I just used.
I broke the large seemingly chaotic system into smaller pieces: pieces
that hopefully make more sense. Eventually, when we've made sense of the
little bits, perhaps we can comment astutely on the big-picture.
Information, transaction, entertainment
There is a triad of functions to all online activity:
Function - Activity - Unit
Information - Research - The Fact or Conclusion
Exchange - Business - The Transaction
Entertainment - Play - The Experience
Each internet function grows at a different rate and moves in a
different direction. The development of forums is firmly in the smallest
segment dealing with information. This segment is quite poorly organized
and confusing. The entertainment function in contrast is well financed
and graphically innovative with clear, profitable opportunities.
Much of the web is prepared with Exchange or Entertainment in mind.
"Brochureware" (purely promotional webpages) is rarely required for
research but is critical to securing a transaction. Entertainment
related or just entertaining websites abound. Let us recognize just how
few webpages are information & research related.
My own experience suggests we are just beginning to see the movements
towards profiting from providing information. Direct selling of
information is still chaotic and unrewarding.
The way information is packaged has a great bearing on the content,
quality and use of the information. This theme is evident throughout the
work of the Spire Project, and is particularly applicable to internet
information. Webpages, text files, software, email and database entries
each have particular qualities. Each shapes, constrains and restricts
the informative content. These particular qualities apply irrespective
of the information involved.
Books are dense, factual, a little old. Articles are short, sharp, more
recent. News is puff, introductory, immediate. Each way the information
is packaged, each format, presents the information to set standards.
Information formats on the internet are the same. Webpages are
graphical, technical to produce, and not easily updated. FAQs are easier
to maintain, text only, and attract more peer review. Mailing lists are
simpler still, text, short, immediate, very peer-reviewed, characterized
by discussion and resource discovery. Newsgroups are characterized by
extremely low costs, vulnerability to trashing and poor management.
Email is simple to use and suited to one-to-one discussion.
Let's look at books more closely. Books are created by authors who have
something to write. Books are printed and marketed by publishers to
bookstores, which then provide them to readers. Each facet of this
process defines the resource. Books have quality, editorial vetting but
minimal peer-review, marketable value and a potentially lengthy
When it comes to research, why look for a book when investigating
digital money? Books would just have the wrong qualities - would present
the information poorly. We need a more current format (digital money is
a fast moving topic), and a more peer-reviewed format (books have
editorial vetting but not intrinsic peer-review). Why not search for a
mailing list, an FAQ, or an association website? These formats have
qualities more appropriate to our question.
Information flows also impress patterns on internet information. Most
information is transplanted to the web - first created elsewhere. The
source of information imparts as much pattern as the eventual format the
Information may appear as a webpage, and conform to our expectations for
all webpages but the information may have been prepared from the
discussion on a mailing list - and thus enjoy a more topical, specific,
timely and peer-reviewed quality.
Let's look at FAQs. The best resource in the world on copyright law is
the musings of a group of copyright lawyers who form the copyright
mailing list. The copyright FAQ supported by this group is a logical
document summarizing much of the discussion of this mailing list. FAQs
are vetted by the news.answers team, then automatically mirrored around
the world. From its origins in the mailing list, the FAQ is a
peer-reviewed document, often full of links to further resources,
topical, knowledgeable and factual. As an FAQ, the document is not
immediate, graphical or financially rewarding (some FAQs stagnate).
Only some internet information is created within the internet
environment. The concept of 'brochureware' describes the common traits
to promotional webpages directly prepared from paper promotional
One of the more exciting trends is the movement of information from the
dusty shelves of government offices and association libraries to their
more accessible websites. The quality of information retained in your
average government agency, from quality research reports, to detailed
studies, to current industry monitoring is very high. These qualities
are then brought over to the web format. Such web-documents tend to be
isolated (not linked to other related resources) and perhaps a little
behind the times but of a generally high quality.
An exciting holistic view of the internet information landscape is based
on these descriptions. Imagine, for a moment, information flowing
through a collection of systems. At certain points, information groups
together, and generates new, perhaps higher quality information, which
then flows in a different system, a different direction, to different
The flow of information from one person to another, from one format to
another, imprints qualities on the information along the way. Each
organization, or subsequent re-organization, imparts specific styles and
conventions and quality to the result.
Let us proceed to a third set of patterns. Information appears on the
internet for one very specific reason: someone publishes it (DUH). The
motivation behind publishing colours the information. This is a pattern
we can use to quickly judge the contents of a webpage.
Ask yourself who is publishing, and why.
One of the biggest publishing segments a year ago was individuals
publishing documents derived from their personal expertise. A typical
document would be one with minimal peer review, a list of aging links to
further resources, simple graphics, variable to short length, prone to
bias but moderately reliable because the publisher knows their topic
well. These pages are often located on web servers in private
sub-directories (usually starting /~name/).
Commercial sites publish mainly for the promotional value. Their
secondary purpose is to provide sales information to prospective
clients. Rarely do commercial sites go beyond this. Commercial webpages
often reside on their own domain name, as a .com, or in sub-directories
- without the tilde symbol. Commercial sites also tend to age badly.
They are very noticeable from their front page.
Government agencies are emerging as valued publishers. Slowly their
dormant information becomes available through this new medium. Currently
almost all government documents on the internet also appear in print,
meaning they are factual, exhaustively reviewed, tend to be a little old
(but age well), and come from highly paid knowledgeable people who
believe it is their duty to inform others. Such documents are lengthy
and appear on .gov domains.
These patterns are simple to see.
Grant-funded projects create brilliant research resources and hold much
promise in pushing the limits of this technology. I am eager to see the
results of the US Patents project, and appreciate the value of having
Supreme Court rulings on the internet. Often such projects focus deeply
on content. Most projects reside on educational servers and are widely
discussed within knowledgeable groups.
Associations publish association-kind-of-things. Most are initially just
like the commercial webpages. With time such sites become much more
factual and research-worthy. Most associations are dedicated to
developing awareness of their chosen topic, albeit coloured by their
chosen bias. Few associations are significant publishers but in time,
this segment will begin to liberate dormant information within
Let's summarize. The key is to always watch who the publisher is. We can
assume a great deal, quickly. We are unlikely to find the latest changes
to patent law from government or commercial publishers. Such
organizations are simply not motivated to present such information.
Publishing is one achievement but you and I will never read any
information until we learn it exists. This simple fact creates even more
patterns to internet information. Knowledge of information moves through
set routes on its way from writer to reader.
Promotion is not simple. It is a process that takes time, effort and
perhaps money. Information without serious promotion tends not to
travel far from its source. Another way to phrase this: you must
search close to the source to find poorly promoted information.
A search engine indexes pages relatively indiscriminately. This also
means a site of quality is not likely to reach your attention. The odds
are not good, and from a promotion point of view, search engines
generate minimal traffic to your webpage. Search engines also drop you
rather randomly into a website. It is often necessary to move up a
directory to understand the purpose and motivation of a site you find
Information published through advertising tends to have a financial
payoff for the promoter. This kind of information tends to be
promotional information. Brochureware.
The alternatives are to promote a webpage or website through one of the
referral tools. Each such tool accepts links on some criterion. Each
tool you use to locate information also selects particular types of
information for your attention.
If you arrive at a document by recommendation through a mailing list,
the document is likely to be recent, on-topic and specific to the
purpose of the mailing list. Alternatively, (for poor mailing lists) it
will be wildly off topic and trash. You are unlikely to see referrals to
old documents or documents of historical importance. These are the
qualities most acceptable to the mailing list environment.
Directory trees, FAQs, guidebooks and related promotion tools all work
as historically important documents. Over the years, such resources
have listed, described and alerted people to relevant information for
the field. Slowly,
over time, this function becomes acknowledged, reinforced and promoted.
Time is the essence of this fame.
Webpages or websites found through historically important documents, by
their nature, tend to be long lasting websites with lasting importance
in the field. Such documents point to other similar documents or
websites that have achieved a long-lasting importance. You are unlikely
to find specific documents but rather sites that focus or bring together
information. In short, there is little motivation to link to specific
webpages, when a link to an important website is just as good.
Similar generalizations can be made of each type of promotional tool,
and become important in rapidly finding information which matches
our intention, as well as summarizing the likely motivation, and bias,
of webpages we are interested in.
Information Clumps. Information is created, nurtured, develops, gets
transplanted, gets arranged and then becomes visible through a process
which brings similar information together.
As we have discussed, there are factors deeply affecting all information
on the internet. Motivation, Preparation, Format and Promotion all
define the quality and content of any given item of information. With so
many influences, we should not be surprised to learn information
naturally groups together. In reality, there is nothing natural involved
- it is a social phenomenon reinforced each time you and I visit or read
one resource but not another.
History can explain some aspects of internet development. As a small
collection of sites becomes dominant in particular fields, by collecting
and delivering better content to more people, new sites find it
progressively more difficult to capture attention. This dynamic works
for websites reaching out for visitors, and discussion groups reaching
out for subscribers. In each case, seniority counts.
Seniority counts in several ways too. Promotion is directly related to
quality, interest, traffic and time. The longer a site is active, the
better the footpath develops, the more people visit. Secondly, quality
content is directly related to access to quality content, peer review,
and time/money. Important existing sites gain in every way.
This results in a grand system where the first-in, best-dressed, can
capture the high ground and secure a grand lead in awareness and
footpath over competitors who follow. Yahoo is a prime example of a
directory tree, not even the best in most areas, which has achieved
unparalleled traffic & awareness.
This competition is equally evident where no money is involved. Perhaps
your association wishes to create a new referral website, or an open
mailing list, or an informative guide. All sound concepts, effective
projects. However, if older, established resources exist, the work will
be long and arduous.
Despite the marketing message, the internet is not a world where the
best information floats to the top. The internet will not let you
reach millions. You must compete for the attention, participation,
devotion and assistance in a manner very similar to building a business.
In concrete terms, information clumps on the internet. The best resource
could appear on any internet system (webpages, email mailing lists,
ftp-archives, FAQs, online databases, newsgroups...) but we can be
fairly certain the best information will congregate in just one or two.
Consider this as an application of the 80:20 rule. 80% of the good
information will be found on 20% of the formats, arranged concisely by
20% of the search tools.
Consider our article "Searching the Web"
(http://spireproject.com/webpage.htm). We progressively search different
web tools, looking for the most worthy. Searching the internet is the
same. You must touch each system to see which system is dominant, where
the information is congregating for your topic.
Bringing this together
In summary, we have broken down and discussed various qualities of
published information and promoted information. We have made sweeping
generalizations and educated guesses about information on the internet.
When a painter begins to paint, they have already visualized some of the
image. They already have a concept of the finished result. Internet
research is no different. We start by building a vision of the
information we seek. Who would publish it? Where would I find it? What
is its motivation? How would we find it? We now have a practical vision.
The address is one of the keys. The web address (or URL - Uniform
Resource Locator) for any item of information gives us a surprising
amount of information - particularly as we are making generalizations
about information patterns. We can guess if information resides on a
personal webpage, a funded university project, or a commercial project.
The information resides on a .gov website? - the quality is likely to be
higher and conform to our expectations of government resources.
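The snap-judgement from an address can be written down as a rough heuristic. This is a minimal sketch, assuming only the conventions described in this FAQ (/~name/ for personal pages, .gov for government, .com for commercial); the function name and the educational-domain rule (.edu, .ac) are assumptions added for illustration, not a standard classification.

```python
from urllib.parse import urlparse


def guess_publisher(url):
    """Guess the likely kind of publisher behind a URL.

    Encodes this FAQ's rules of thumb: a /~name/ path suggests a personal
    page, .gov a government agency, educational domains a university or
    grant-funded project, .com a commercial site. These are
    generalizations, not guarantees.
    """
    parts = urlparse(url)
    host = parts.hostname or ""  # urlparse already lower-cases hostname
    labels = host.split(".")

    if parts.path.startswith("/~"):
        return "personal"  # private sub-directory such as /~name/
    if "gov" in labels:
        return "government"  # .gov, or country forms like .gov.au
    if "edu" in labels or "ac" in labels:
        return "educational"  # .edu, .edu.au, .ac.uk (assumed rule)
    if "com" in labels or "co" in labels:
        return "commercial"  # .com, .com.au, .co.uk
    return "unknown"


for url in ["http://www.uni.edu/~jsmith/links.htm",
            "http://www.census.gov/",
            "http://spireproject.co.uk/faq.txt"]:
    print(url, "->", guess_publisher(url))
```

The personal-page check runs first because a /~name/ page on a university server says more about its author than the domain does.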
We use this new-found experience in three ways. Firstly, we restrict our
searches to the most likely sources. Secondly, we quickly jump through
lists of resources (such as those generated by search engines) to the
sources that match our expectations. Thirdly, our assessment of
information quality can be guided by our snap-judgements of its origin
Internet newcomers often expect to have instant access to the latest
information at the touch of a button in beautiful colour and peer
reviewed quality prose. Who is publishing this? Where is this
information coming from? Who would help us find this? Such a vision is
fantasy. If we were instead to look for an association website,
dedicated to a certain type of research, or an informed newsgroup,
maintained by people passionate about sharing this technology, then we
have made four steps forward. We are clear about where to look for the
answers we seek, and we will know quickly if the answers are online.
Let us now leave this discussion on internet organization and internet
theory. This is tough newly discovered territory, more than a little
rough. I fear it will make most sense to people with considerable
experience with the internet. Let us now explore the fertile grounds of
understanding more familiar formats like books and news.
This document continues as Part 2/6
Copyright (c) 1998-2001 by David Novak, all rights reserved. This FAQ
may be posted to any USENET newsgroup, on-line service, website, or BBS
as long as it is posted unaltered in its entirety including this
copyright statement. This FAQ may not be included in commercial
collections or compilations without express permission from the author.
Please send permission requests to firstname.lastname@example.org