
Subject: Web and Internet Search Engine FAQ (WISE FAQ) Dec'98

This article was archived around: Mon, 21 Dec 1998 07:30:22 GMT

All FAQs posted in: alt.journalism.newspapers, alt.journalism.freelance, comp.infosystems.search, soc.libraries.talk

Archive-name: web/wisefaq
Posting-Frequency: monthly
Web and Internet Search Engine FAQ (WISE FAQ (copyright) 1997-1998)
Copyright 1997-1998 Ken Bogucki krb@infobasic.com kenbog@netcom.com
WISE FAQ (c) Ver. 3.6 DEC, 1998

An HTML version of this FAQ can be obtained at: http://www.infobasic.com/pageone.htm
The current ASCII version of this FAQ can be found at: ftp://rtfm.mit.edu

This web site undergoes considerable change and new information is added weekly. This web site also contains a collection of various search sites & internet databases with links to some of the best core sites on the net.

CHANGES IN THIS FAQ...
1. Some minor sections have been deleted.
2. A listing of select search engines (general, Meta and Geo specific search engines) has been added: Section 9.0.
3. Minor revisions to Excite and Lycos.

Beginning immediately, the WISE FAQ and its associated documents can be found at our new web site http://www.infobasic.com/pageone.htm

All email queries, complaints or corrections should be, when possible, addressed to wisefaq@infobasic.com

COPYRIGHT
This FAQ is copyrighted material. The copyright is owned by the author of this FAQ, Ken Bogucki kenbog@netcom.com. This FAQ may not be reproduced or distributed, in whole or in part, for commercial purposes without the express written permission of the author. This FAQ may be used for non-commercial purposes as long as the author is notified in advance, the entire FAQ is used without alterations (except for formatting purposes), and the copyright notice & warranty notice remain intact and a part of the FAQ.

WARRANTY
This FAQ is an AS-IS document.

!! WEB SITE ADDRESSES ARE CASE-SENSITIVE !!

When necessary, square brackets [ ] are used in this FAQ for clarity. These brackets are not part of any search expression. Their only purpose is to separate the search words, expressions and results from the surrounding text.
CONTENTS ***
1.0 Introduction
1.1 (Reserved)
1.2 Definitions
2.0 Search Engine Queries, A Quick Tutorial
  2.1 All Search Engines Are Not Created Equal
  2.2 Understanding Search Syntax & Odds and Ends
3.0 General Search Engines
  3A Alta Vista http://www.altavista.digital.com
    3A.1 Alta Vista Simple Searches
    3A.2 Alta Vista Complex Searches
    3A.3 Restricting A Simple and Complex Search
    3A.4 Sorting Results by Ranking
      3A.4.1 Simple Search Ranking
      3A.4.2 Complex Search Ranking
    3A.5 Misc. Information about Alta Vista
  3B Excite http://www.excite.com
    3B.1 Excite Concept Based Queries
    3B.2 Excite Advanced Queries
    3B.3 Excite Exact Match Queries
  3C Lycos http://www.lycos.com
    3C.1 Lycos Simple Searches
    3C.2 Lycos Complex Searches
  3D Infoseek http://www.infoseek.com
    3D.1 Infoseek Simple Searches
    3D.2 Infoseek Complex Searches
  3E Web Crawler http://www.webcrawler.com
    3E.1 Basic Searches
    3E.2 Using Logical Word Operators
  3F Yahoo http://www.yahoo.com
    3F.1 Yahoo Menu/Simple Searches
    3F.2 Yahoo Complex Searches
  3G Euroferret http://www.euroferret.com
  3H (Reserved)
  3I Hot Bot http://www.hotbot.com
    3I.1 Hot Bot Simple Searches
    3I.2 Hot Bot Complex Searches
4.0 Meta Search Engines
  4A Internet Sleuth http://www.isleuth.com
    4A.1 Accessible Search Engines
  4B Meta Crawler http://www.metacrawler.com
  4C ProFusion http://profusion.ittc.ukans.edu
5.0 Specialized Search Engines
6.0 (Reserved)
7.0 Subject Trees
8.0 Quick Reference Card
  8.1 Alta Vista
  8.2 Excite
  8.3 Lycos
  8.4 Web Crawler
  8.5 Yahoo
  8.6 Infoseek
9.0 A Partial List of Select Search Engines
  9.1 General Search Engines
  9.2 Meta Search Engines
  9.3 Geo Specific Search Engines
10.0 Contacting the Author

**** 1.2 DEFINITIONS ***
These definitions are applicable only to this FAQ.
-APPLET    A Java program found on some Web pages.
-DOMAIN    Last portion of an internet address: .com, .mil, .net, .uk, .it
-HOST      The computer where the Web page is located.
-META      A program used to manipulate other programs.
-URL       Full internet address: http://www.xyz.com, ftp://abc.xyz.com, etc.
-WILDCARD  Symbol used to stand for a number of missing letters; usually this symbol is a "*".
-POINTER   A search result that "points" to other sources of information.

**** 2.0 SEARCH ENGINE QUERIES, A QUICK TUTORIAL

2.1 ALL SEARCH ENGINES ARE NOT CREATED EQUAL
Different search engines accomplish their job by taking different approaches to indexing the web. Some engines index every word of every page, some index only the first hundred words, and some index every word but filter out noise words. Noise words are words like: but, the, are, is, at; words that have no particular meaning when used alone. In the phrase [ "the quick brown fox jumped over the lazy dog" ], the noise words might be: the, over, the. The definition of "noise words" will vary from search engine to search engine.

To get a better understanding of this concept and how it's applied, go to the Excite search engine, http://www.excite.com, and run the following phrase search (a phrase search is any group of words enclosed in quotation marks): [ "to be or not to be" ]. Excite will not find matches for this phrase search, because Excite considers all the words in this phrase to be "noise words" and Excite does not index noise words. One of the most famous phrases in the English language cannot be found at Excite by using that phrase as the search criteria.

Now go to Alta Vista, http://www.altavista.digital.com, and run the same phrase search. Alta Vista will display more than 500 hits for this query. This does not make Alta Vista the best search engine for all your needs. It is, however, the most inclusive search engine.

2.2 UNDERSTANDING SEARCH SYNTAX, ODDS AND ENDS
These are general suggestions; however, they do apply to most search engines. Some expressions used in this tutorial:
[ " " ]  this is used to denote a phrase expression or search. All the words in the [ " " ] must be found next to each other, in the same order, at a web site to produce a hit.
[ OR ]   a OR b will find either the "a" keyword or the "b" keyword.
[ AND ]  a AND b must find both "a" and "b" at a web site to produce a hit.
[ ( ) ]  these are used to organize a complex search expression.

A.) On the surface, two different queries may appear the same. However, search engines will interpret the queries differently; consequently the results will be dissimilar. For example: [ (labor OR labour) AND union ] is not the same as [ "labor union" OR "labour union" ]. The queries appear to ask the same question; however, search engines will see differences in the structure of the two queries, and these differences will affect the result.

The first expression will find those web pages that contain "labor" or "labour" together with "union", anywhere in the document, regardless of the number of words separating the two sides of the AND expression. The first expression will find: [ labor should organize into a union ], [ labor and management should realize that success depends on the union of their interests and aims ], etc.

The second expression will only find those web pages where the words "labor" and "union" or "labour" and "union" appear next to each other. This is because the [ " " ] in the second expression makes that query a phrase search. Phrase searches require the words in the phrase to be next to each other in the web document. The second expression will find: [ a labor union is in the interest of workers ], [ a labour union is the best way to counter management ]. The second expression will not find [ labor should organize into a union ].

Note, however, that the first expression will also find every page found by the second expression. The reverse is not true.

B.) The web is referred to as the world wide web. It is important to realize that words and phrases that are common in North America, for example, are not necessarily common anywhere else in the world. Searching for corrugated steel in the UK is probably useless.
In the UK corrugated steel is usually called corrugated iron. Likewise, there are regional differences in terms and concepts. The words soda and pop can each refer to a soft drink, yet in some parts of the USA soda is what you mix with Scotch and pop is the soft drink. Also keep in mind differences in spelling: labor/labour, color/colour, organise/organize. A world wide search for [ "labor organizations" ] would work better as [ "labor organization" OR "labour organisation" ], and better still as [ "labor organization" OR "labour organisation" OR "trade unions" ].

Allow for the possibility of misspelled words. One search for politics also found hits when the search word was misspelled "polotics". Remember, English is not always the first language of the people publishing web pages.

C.) Probably one of the more flexible search options available at most search engines is the "*" operator, or wildcard operator. Wildcard searches allow queries to contain incomplete words; however, this kind of query will probably yield a considerable number of unnecessary hits. For example: [ orang* ] will produce hits for [ orange ], [ oranges ] and [ orangutan ]. If you're searching for something to eat instead of something that co-starred with Clint Eastwood, consider restricting wildcard searches with additional search words. For example: [ orang* AND fruit ] should filter out most pages about Clint Eastwood's co-star. The search has been limited by the inclusion of the word [ fruit ].

D.) The position and organization of the keywords in the search query is also important. For example, if you're looking for timely information on earthquakes your keywords might be: "earthquake", "information", and "important". If you run a complex search at Alta Vista using the query [ "earthquake information" AND important ], Alta Vista will display about 960 hits.
The query [ important AND "earthquake information" ] will generate about 850 hits, and the query [ "important earthquake information" ] will generate only 1 hit. The last expression may seem the most logical expression to use; however, things are not always that simple.

SUMMARY OF VARIOUS SEARCHES
....................................................
earthquake AND important AND information  22846 hits
earthquake AND "important information"     1012 hits
"earthquake information" AND important      960 hits
"important information" AND earthquake      957 hits
important AND "earthquake information"      850 hits
"important earthquake information"            1 hit
....................................................

If you're looking for office furniture on the net there are a number of possible search expressions, and each expression will provide varying degrees of success. For example, these search expressions were run at Alta Vista; the results of each search are listed.
...................................................
"office furniture for sale"   40 hits
"for sale office furniture"    8 hits
...................................................

E.) Also important at most search engines is the case of the query. In most instances search engines assume a lower case query is a case insensitive query. This means that the search engine will find both upper and lower case occurrences of the search expression. However, if the search expression contains upper case letters the search engine will treat the query as a case sensitive query and will only find exact matches for the query. Obviously this will affect the results of any query. For example:
................................................
"Apples Peaches Pumpkin Pie"    65 hits
"apples peaches pumpkin pie"   116 hits
................................................
Both of these queries were run at Alta Vista. The first expression is a case sensitive search. The second expression is a case insensitive search.
This second expression produced results that included web sites where the case sensitive expression, "Apples Peaches Pumpkin Pie", could also be found. The first expression, the case sensitive expression, only found exact matches to that search query. In most cases, the value of an upper case query rests in its use as a tool to restrict a search.

F.) If searching for information in a specific country, consider using a search engine that will allow you to restrict the search to a specific country domain. For a list of country domain names go to http://www.infobasic.com/100codes.htm. The following sections in this FAQ, about specific search engines, explain how to restrict searches based on domain names.

G.) Lastly, some keywords used in a search expression are useless. This is not because the keywords are not specific enough; it is because the keywords are too common on the web. For example, if you're looking for a piece of shareware and you run the query [ shareware AND download ], Alta Vista will report 280,000 hits. However, a search for a specific piece of shareware by name, [ "xyz.zip" ], will produce fewer and more precise hits. Even a partial file name, [ xy*.* ], is more effective than the first example. The keywords "shareware" and "download" are too common on the web to produce any kind of meaningful result.

One last word: some search engines go to some lengths to advertise the fact that their site will generate twice as many hits as "xyz" or that they index twice as many pages as so and so. The issue of quantity is secondary. The real question relates to the quality of the first 10, 20 or 30 hits. If your query is properly structured, the information you're looking for should show up in the first several dozen hits. If you haven't found the information you need in the first two or three pages of results, or if the ranking falls below 75%, consider restructuring your query and try the search again.
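The matching rules in 2.2 (Boolean AND/OR versus phrase searches, the "*" wildcard, and the upper/lower case rule) can be illustrated with a toy matcher. This is a minimal sketch under stated assumptions, not how any engine in this FAQ actually works: the tokenizer, the sample pages and the function names (tokenize, word_matches, has_word, has_phrase) are hypothetical, and real engines also drop noise words and rank their results.

```python
import re

def tokenize(text):
    """Split a page into words, keeping their original case."""
    return re.findall(r"[A-Za-z0-9']+", text)

def word_matches(term, token):
    """Compare one query term with one word from a page.

    Assumption (after 2.2 E): an all-lower-case term matches regardless of case,
    while a term containing a capital letter must match exactly.
    A trailing "*" is treated as a wildcard for the rest of the word (after 2.2 C).
    """
    if term == term.lower():          # lower-case query: case-insensitive
        token = token.lower()
    if term.endswith("*"):
        return token.startswith(term[:-1])
    return token == term

def has_word(page, term):
    """True if the term matches any word, anywhere on the page."""
    return any(word_matches(term, tok) for tok in tokenize(page))

def has_phrase(page, phrase):
    """True if the words of the phrase appear next to each other, in order."""
    words = phrase.split()
    tokens = tokenize(page)
    for i in range(len(tokens) - len(words) + 1):
        window = tokens[i:i + len(words)]
        if all(word_matches(w, t) for w, t in zip(words, window)):
            return True
    return False

pages = [
    "labor should organize into a union",
    "a labour union is the best way to counter management",
    "Apples Peaches Pumpkin Pie",
    "apples peaches pumpkin pie recipes",
]

# [ (labor OR labour) AND union ]: the words may be anywhere on the page.
boolean_hits = [p for p in pages
                if (has_word(p, "labor") or has_word(p, "labour")) and has_word(p, "union")]

# [ "labor union" OR "labour union" ]: the words must be adjacent.
phrase_hits = [p for p in pages
               if has_phrase(p, "labor union") or has_phrase(p, "labour union")]

# Capitals make the phrase search exact; lower case matches any case.
exact_hits = [p for p in pages if has_phrase(p, "Apples Peaches Pumpkin Pie")]
any_case_hits = [p for p in pages if has_phrase(p, "apples peaches pumpkin pie")]

print(len(boolean_hits), len(phrase_hits))    # 2 1
print(len(exact_hits), len(any_case_hits))    # 1 2
```

In this sketch the Boolean query returns both labor pages while the phrase query returns only the page where the words are adjacent, which mirrors the point in 2.2 A: the first expression finds everything the second does, but not the reverse.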
**** 3A.0 ALTA VISTA SEARCH ENGINE http://www.altavista.digital.com

Alta Vista is one of the more complex search engines. It may seem intimidating; however, for those with a serious interest or pressing need to find information, Alta Vista may be the place to go. Like other search engines, Alta Vista has simple and complex searches. It also offers several other options that allow users to make the best use of their time and effort. One is the ability to order search results by ranking (not necessarily confined to the original search criteria); another is the ability to restrict the search to certain types and locations of Web pages.

3A.1 ALTA VISTA SIMPLE SEARCHES
apples peaches "orange juice" : documents where "apples" or "peaches" or the phrase "orange juice" appears.
+apples +pears -"orange juice" : documents where both "apples" and "pears" appear, but not the phrase "orange juice".

Wildcard Operator "*"
app* : all documents that contain words such as "apples", "applets", "appraise", etc. It will not find "applications" or "applicable". The "*" notation can only be used to represent a max. of 5 characters.

The above operators can be used in any combination. For example:
+oranges -app* : documents that contain the word "oranges" but not the words "apples", "apply", "applets", etc.

3A.2 ALTA VISTA COMPLEX SEARCHES
There are two ways to construct an Alta Vista complex search. You can use either Logical Word Expressions or Logical Symbol Expressions in the search request. Alta Vista will interpret both types of logical expressions the same way.

WORD EXPRESSION     is the same as     SYMBOL EXPRESSION
----------------------------------------------------
a AND b             is the same as     a & b
a OR b              is the same as     a | b
a NOT b             is the same as     a ! b
a NEAR b            is the same as     a ~ b

SPECIAL NOTE: Logical word and symbol expressions are precise search tools. The search expression [ apple AND peach ] will find "apple" and "peach" but not "apples" and "peaches".
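The simple-search rules in 3A.1 (required "+" terms, excluded "-" terms, quoted phrases, and the wildcard limits repeated in 3A.5) can be modelled in a few lines. This is a hypothetical sketch for illustration only, not Alta Vista's own implementation: the function names are invented here, matching is simplified to be case-insensitive (so the capital-letter rule in 3A.5 is ignored), and ranking is left out entirely.

```python
import re

# A term is an optional +/- sign followed by a quoted phrase or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')

def parse_simple_query(query):
    """Split a simple query into (required, excluded, optional) term lists."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = phrase or word
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

def wildcard_to_regex(term):
    """Turn 'app*' into a pattern: at least three letters before the '*',
    which then stands for 0 to 5 further letters (limits from 3A.1 and 3A.5)."""
    stem, star, rest = term.partition("*")
    if not star or rest:
        raise ValueError("expected a single trailing '*'")
    if len(stem) < 3:
        raise ValueError("at least three letters must precede '*'")
    return re.compile(re.escape(stem) + r"[a-z]{0,5}", re.IGNORECASE)

def contains(text, term):
    """Case-insensitive test for a word, a phrase, or one wildcard word."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    parts = term.lower().split()
    if len(parts) == 1 and parts[0].endswith("*"):
        pattern = wildcard_to_regex(parts[0])
        return any(pattern.fullmatch(w) for w in words)
    n = len(parts)
    return any(words[i:i + n] == parts for i in range(len(words) - n + 1))

def matches(text, query):
    """Excluded terms veto a page; required terms must all be present;
    with no required terms, any one optional term is enough."""
    required, excluded, optional = parse_simple_query(query)
    if any(contains(text, t) for t in excluded):
        return False
    if required:
        return all(contains(text, t) for t in required)
    return any(contains(text, t) for t in optional)

print(parse_simple_query('+apples +pears -"orange juice"'))
# (['apples', 'pears'], ['orange juice'], [])
print(matches("oranges and applets", "+oranges -app*"))   # False
```

With these limits, "app*" matches "appraise" (five extra letters) but not "applications" (nine), and "or*" is rejected because fewer than three letters precede the "*".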
In Alta Vista, the complex search page contains an editing window 3 lines by 70 characters. This window allows you to view and edit the entire complex search expression at a glance.

AND   apple AND orange : sites that contain the word "apple" as well as the word "orange". However, this expression will not display those sites that have only "apples" and "oranges" in the same document. (See Special Note above.)
OR    apple OR orange : sites that contain either the word "apple" or the word "orange".
NOT   apples NOT oranges : sites that contain the word "apples" but not the word "oranges".
NEAR  apple NEAR juice : pages where the word "juice" is within ten words of the word "apple". Note, the Alta Vista NEAR operator uses a default 10 word range.

3A.3 RESTRICTING A SIMPLE AND COMPLEX SEARCH
This is a method of confining the Web search to certain pages or sites that meet specific criteria. [partial list]
anchor:click-here : only search pages that contain the phrase "click-here" in the text of a hyperlink.
applet:<java class> : only search pages that have the specified Java class in the applet tag of the Web page.
domain:ie : only search pages that originate in the domain .ie (Ireland); any of the other country codes, or the standard codes .com, .org, .mil, etc., can be used the same way.
host:xyz.com : only search those pages that reside at the host name xyz.com.
image:apples.jpg : search those sites that contain the image tag "apples.jpg".
link:xyz.com : search those sites with a link to xyz.com. If you have a Web page and are curious about how many other pages carry a link to your page, run this search: link:www.yourhomepage.com.
title:"Apples and Oranges" : search those pages that have "Apples and Oranges" in the title of the Web page.

3A.4 SORTING RESULTS BY RANKING
Ranking results, simply, is a way to sort the results of your search.
For example, if you use a complex search for "apples" and "oranges", you can instruct Alta Vista to sort the results so that those sites with the most references to "apples" appear first in the result list. Simple searches are sorted automatically by Alta Vista.

3A.4.1 Simple Search Ranking
Alta Vista automatically uses a formula to sort the results of a simple query. Results are ranked according to the following criteria:
1. the query words or phrases are found in the first few words of a document;
2. the query words and phrases are found close to each other in a document;
3. the query words or phrases appear more than once in a Web document.

3A.4.2 Complex Search Ranking
On the complex search page, there is a separate window for ranking. After establishing the search expression, go to the ranking window and insert the words that will be used to sort the result list (these words need not be the same words you used in the search expression). For example, if your search expression is [ apples & oranges ], you may then use the ranking window and include the word "California". The search will produce all those documents that contain the word "apples" and the word "oranges" in the same document. With the ranking example above, Alta Vista will then sort the result list so that all documents that have a reference to "California" appear first in the list. More than one word or phrase may be used in the ranking window.

3A.5 MISCELLANEOUS INFORMATION ABOUT ALTA VISTA
1. Alta Vista can handle phrases in a search expression in a number of ways; however, the best way to search for a phrase is with the use of double quotes. For example: "United States Army" or "orange juice", etc.
2. In Alta Vista a lower case search expression is a case-insensitive search. Using capital letters in the search expression restricts the results to exact matches. For example, if you search for "oranges" you will get "oranges", "Oranges", "oRanges", etc.
(case-insensitive search). If you search for "Oranges" you will only get "Oranges" in your search results; the results will not show instances of "oranges", "oRanges", "oRanGes", etc. (case-sensitive search).
3. The wildcard marker [ * ] has certain restrictions. The "*" marker requires that at least three letters precede it; for example, "*go" & "or*" will not work, however, "ora*" and "appl*" will work. Also the "*" marker will only match from 0 to 5 letters: "appl*" will find "apples" & "applets" but not "applications".

**** 3B EXCITE SEARCH ENGINE http://www.excite.com
Excite uses several methods for finding the requested information: a concept based query, an advanced query and an exact match query.

NOTE: Excite provides its own relevancy rating. The user cannot directly change or alter this rating.

Excite uses " " marks to indicate a phrase search. For example, "apple butter" will find those sites where the phrase "apple butter" can be found, but not those sites that list only the word apple.

3B.1 CONCEPT BASED QUERIES
A concept based query utilizes the relationship between words and ideas to find matches. For example, in a concept based search the keyword "fruit" will yield "fruit", but also "apples", "oranges", etc. Concept based queries rely on the user requesting information in the form of one or more keywords.

3B.2 ADVANCED QUERIES
In an advanced query the operators "+" and "-" are used.
+apples +oranges : documents that have the word "apples" and the word "oranges" on the same page.
-apples +oranges : documents that have the word "oranges" but not the word "apples".
+apple -pears -tarts : documents that have the word "apple" but not the words "pears" or "tarts". This query will not return "apple tarts" but will return "apple turnovers".

3B.3 EXACT MATCH QUERIES
Exact match queries use Logical Word Expressions to find documents. The logical word operators are: AND, OR, AND NOT, plus ( ).
Using the logical word operators will turn off Excite's concept based search. A keyword search for "fruit" will instruct Excite to search only for those sites that contain the word "fruit"; Excite will not display sites that contain only related words like "apples", "oranges", etc.
apples AND oranges : sites that contain both the words "apples" & "oranges" in the same document.
apples OR oranges : sites that contain either the word "apples" or the word "oranges".
apples AND NOT oranges : sites that contain the word "apples", excluding those sites that also contain the word "oranges".
( ) is an organizational operator. For example, [ apples AND NOT (oranges OR peaches) ] will produce sites that contain the word "apples" but not the words "oranges" or "peaches".

**** 3C LYCOS SEARCH ENGINE http://www.lycos.com
Lycos has two search levels, simple and complex. In the case of Lycos, the complex search function is menu driven and not difficult to use; however, because of its menu interface this Lycos search is somewhat more restrictive than other search engines.

3C.1 STANDARD SEARCH (Simple)
Standard searches do not use Logical Word Operators.
apples oranges peaches : will yield sites in which all three words appear.
[ - ] This is a restrictive operator.
apples oranges -berries : all documents in which "apples" and "oranges" appear but not those pages where "berries" appears. If "apples", "oranges" and "berries" appear in the same document, this document will not appear in the search results.
[ $ ] This is a wildcard operator.
app$ : will yield all pages in which words such as "apples", "applications", "applets" appear.
[ . ] This is a delimiting tag. Searching for "apple" will yield "apples" and "apple"; however, if the search were "apple." then only those documents with the word "apple" will be returned and not those pages with the word "apples".

3C.2 CUSTOM SEARCHES (Pro Search)
Complex searches are done through a menu interface. All of this is fairly intuitive.
Just a very brief explanation is required here. Everything that appears on the complex search page has a corresponding on screen example and explanation.

**** 3D INFOSEEK http://www.infoseek.com
Infoseek has two search options, simple and complex. Both search options provide only limited query syntax. Infoseek offers no way for the user to rank search results. However, Infoseek is fast and is more than suitable for quick searches. The site is low on graphics and works well with text browsers.

3D.1 INFOSEEK SIMPLE SEARCHES
Infoseek's simple searches use a combination of commas, plus and minus signs, quotes (to make phrase searches) and capital letters.
apples oranges : will find pages with either "apples" or "oranges".
+apples oranges : will return pages with "apples"; pages that also contain "oranges" are acceptable. Those pages, however, will receive a lower ranking.
"apple juice" : will display those pages where the words "apple" and "juice" appear next to each other.
Capital letters are used to indicate proper names and a case sensitive search:
Johnny Appleseed : will find only pages with the name "Johnny Appleseed".
Johnny,Appleseed : will find pages with either name. Note: commas are only used to separate names.
apples -grapes : will find pages with "apples" but not with the word "grapes".

3D.2 INFOSEEK COMPLEX SEARCHES
There are only a few additional symbols that distinguish a complex query from a simple query. The pipe symbol [ | ] is used to construct a search within a set of search results.
fruit | apple | juice : will find pages that refer to "fruit", then search out those pages within that result that contain the word "apple". Finally, the last group of results will be searched for any pages that contain the word "juice".
title:fruit : will find any pages where the word "fruit" appears in the title of the web page.
url:www.orange.com : will find those sites that contain the address "www.orange.com".
The search expression [ url:fruit ] will find those sites that have the word "fruit" in the URL, for example, "www.fruit.com".
link:www.juice.com : will find those sites that are linked to the specified URL.
site:xyz.com : will bring up all the pages located at the specified address.

**** 3E WEBCRAWLER http://www.webcrawler.com
One of the better Web search engines is WebCrawler, simply because of its flexibility.

3E.1 BASIC SEARCHES
apples oranges pineapples : will provide information on those documents that contain any of the words "apples", "oranges", "pineapples". A simple search expression.

3E.2 USING LOGICAL WORD OPERATORS
AND   apples AND oranges : will provide information on documents where both the words "apples" and "oranges" appear.
OR    apples OR oranges : will display information on pages that contain either of the two search words. This is similar to the Basic Search example except that this search employs specific logical word operators. The first search could also be run as: apples OR oranges OR pineapples.
NOT   fruit NOT apples : displays information about "fruit" but not those pages that reference "apples".
NEAR  cheese NEAR/15 wine : will display those pages that contain the word "cheese" within 15 words of the word "wine". Note, you can specify any number of words in the NEAR operator: NEAR/20, NEAR/5, etc.
ADJ   world ADJ war : will display Web pages that contain the word "world" immediately followed by the word "war".
" "   Quotes have the same effect as the ADJ operator above: "world war" will provide the same results as world ADJ war.
( )   Parentheses are used to organize complex search expressions. For example: [ (wine NEAR/10 cheese) AND apples ] or [ "California wine" AND prices NOT (white OR rose) ]

**** 3F YAHOO http://www.yahoo.com
Yahoo is one of the most intuitive search engines to use. There are two ways to search Yahoo: one is a very simple, menu driven search and the second is by use of logical word operators.
However, this second search option is also a menu driven search.

3F.1 MENU/BASIC SEARCHES
The menu interface is easy to use and understand. Simply select the type of material you want to search (WEB, Usenet, etc.) and how the search should be conducted. Select how many results should be displayed per page (20, 30 or 40) and click the search button.

3F.2 MENU/ADVANCED SEARCHES
[ + ] apples +oranges : those sites that have "apples" as well as "oranges" in the same document.
[ - ] apples -oranges : those sites that have "apples" but not those sites that have "oranges".
[ t: ] A restriction operator that will confine the search to Web page titles. For example, t:apples will restrict the search to pages with the word "apples" in the title of the page. It will not match a page whose title is "Oranges". The correct usage of the "t:" operator in a search expression is [ +t:oranges +apples ]; this expression will yield documents that have the word "apples" in the Web page and the word "oranges" in the Web page title. The expression "+apples t:oranges" is incorrect. The "t:" operator must immediately precede the search word.
[ u: ] A restrictive operator. Confines the search for the keywords to certain URLs. For example, [ u:xyz ] will restrict the search to URLs that have "xyz" in the URL address. The "u:" operator follows the same rules listed for the "t:" operator.
[ " " ] Phrase combining operator: "orange juice", "apple juice", etc.
[ * ] Wildcard search. For example, "pea*" will return "pears", "peas", etc.

**** 3G EUROFERRET http://www.euroferret.com
Euroferret is a small search engine run off several Sun computers. This search engine specializes in web pages located in the European community. The search syntax is extremely simple. Euroferret accomplishes its magic by examining web pages and deciding on the 60 most important words and 12 key phrases in each document.
Euroferret works on the assumption, for example, that page titles are more important than disclaimers. The search at Euroferret is very intuitive. Once a search is run, Euroferret will list the best possible matches to the query and will suggest terms, through check boxes, that might be used to further refine the search. However, because of the way that Euroferret indexes web pages, do not expect miraculous results. Though it purports to index more pages than Alta Vista, Euroferret's indexing is less precise and less comprehensive than that of other search engines. That said, Euroferret is still a good place to go if the information you're looking for is located in the European community and you have a reasonable handle on the subject matter. The engine is fast and the results reliable. Euroferret also offers a text only version for people who may not want the graphics or who use a text browser like Lynx.

**** 3I HOT BOT http://www.hotbot.com
This is a service of WIRED MAGAZINE. Hot Bot uses a graphic interface with pull down menus and check boxes to make searching easier. However, Hot Bot lacks some of the sophisticated query options available at other sites. Even some of the more elemental query options are missing from Hot Bot. For example, Hot Bot does not allow proximity searches ("apple" within 10 words of "juice") and Hot Bot does not support wildcard searches. At most search engines, a search for "appl*" will yield results that contain "apple", "apples", "applejack", and "applesauce". This wildcard search is not possible at Hot Bot.

3I.1 HOT BOT SIMPLE SEARCHES
A phrase or group of words is entered, and a pull down menu specifies whether the result should include all the words, some of the words, a Boolean search, or whether the words entered should appear in the title of a web page. Hot Bot allows simple searches to be enhanced by permitting the user to select which countries the web search should concentrate on.
In addition, the user can refine the search by specifying whether the web page should be several weeks old, several months old or several years old (a number of time parameters can be selected). The user can further restrict the search by specifying that the web pages must have audio, video, images or Shockwave material. This pull down menu interface is very intuitive and needs little explanation.

3I.2 HOT BOT COMPLEX SEARCHES
The complex search, or Super Search, is an expanded version of the simple search option. The date restriction can be exact: for example, only pages dated before (or after) a specific date. The word or phrase search itself can be further broken down, and the kinds of media included in a search are expanded to include Java, JavaScript, Acrobat, ActiveX, and file extensions such as .gif, .txt, .dbf, etc. All of this is accomplished either through pull down menus or check boxes.

**** 4.0 META SEARCH ENGINES
A Meta Search engine will search a number of general search engines at the same time from a single query.

4A INTERNET SLEUTH http://www.isleuth.com
Internet Sleuth is a unique search engine. It will allow you to search several Web search engines simultaneously, up to six different search engines. However, because the same search expression is sent to every selected engine, the expression must use syntax that all of those engines understand. Simple searches and phrase searches are the best. For example, "a basket of apples and oranges" : this phrase search is understood by most search engines. Internet Sleuth also allows you to use multiple search engines for Usenet, Web Reviews, News and Headlines, Business and Finance, and software searches. Below is a list of some of the search engines available for various topics. Internet Sleuth uses a graphic interface. The interface is self explanatory.
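Section 4.0 describes a meta search engine as one query sent to several engines at once, and 4C notes that ProFusion removes duplicates among the engines it queries. The merging step might be sketched as below. This is only an illustration under assumptions: each engine is represented by a plain list of URLs, the round-robin interleaving and the duplicate test in normalize() are choices made for this example, and none of it describes how Internet Sleuth, MetaCrawler or ProFusion actually combine results. The per_source parameter loosely mirrors MetaCrawler's "results per source" option.

```python
from itertools import zip_longest
from urllib.parse import urlsplit

def normalize(url):
    """Key used to spot the same page reported by two engines.

    Assumption: host names compare case-insensitively and a trailing '/' is
    ignored; the path keeps its case, since paths may be case-sensitive.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return (parts.scheme.lower(), parts.netloc.lower(), path, parts.query)

def merge_results(result_lists, per_source=10):
    """Interleave ranked result lists (best first) from several engines,
    keeping at most per_source results from each and dropping duplicates."""
    trimmed = [results[:per_source] for results in result_lists]
    seen, merged = set(), []
    for row in zip_longest(*trimmed):       # 1st hit of each engine, then 2nd, ...
        for url in row:
            if url is None:                 # this engine has run out of results
                continue
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged

# Hypothetical result lists, as if returned by two different engines.
engine_a = ["http://www.xyz.com/", "http://www.abc.com/apples"]
engine_b = ["http://WWW.XYZ.COM", "http://www.fruit.com/"]
print(merge_results([engine_a, engine_b]))
# ['http://www.xyz.com/', 'http://www.abc.com/apples', 'http://www.fruit.com/']
```

Interleaving keeps each engine's top hits near the top of the combined list; a real meta search engine might instead re-score results, which this sketch does not attempt.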
4A.1 ACCESSIBLE SEARCH ENGINES

WEB SEARCH ENGINES AVAILABLE
Lycos
Excite
Alta Vista
Magellan
Web Crawler
Yahoo

REVIEWED WEB SITES
Excite Previews
Lycos Top 5%
Yahoo New Listings
Magellan Reviewed Sites

NEWS AND HEADLINES
AP Headlines
News Tracker
Washington Post Headlines
Electronic Newsstand

BUSINESS AND FINANCE
CNN Financial News
Business Wire
Hoover's Company Capsules
PR Newswire
APL Quote Service

SOFTWARE
Info-Mac
shareware.com
Winsite Windows Software

USENET NEWS
Alta Vista Usenet News
Deja News
Hotbot
Reference.com

**** 4B METACRAWLER http://www.metacrawler.com

MetaCrawler is a multiple search engine site: it simultaneously runs a query on several search engines and displays the results. Currently, MetaCrawler uses Infoseek, Excite, Lycos, Yahoo and Alta Vista to run its simultaneous searches. There are two search options in MetaCrawler, a standard search page and a power search page.

STANDARD SEARCH
In the standard search you simply type in your keywords (no special syntax and no Logical Word or Logical Symbol Operators), then click the appropriate button to indicate whether you want all of the words found, any of the words found, or the keywords treated as a phrase. Click the "GO" button and MetaCrawler will pass your request to the various search engines. The results are displayed in the usual format.

POWER SEARCH
The power search is basically the same as the standard search, but the power search page includes several additional options. These let you decide how many results appear per page, how many results are taken from each source, and where, geographically, the results should come from (for example: everywhere, North America, Europe, South America, etc.).

**** 4C PROFUSION http://profusion.ittc.ukans.edu

There are two pages for the ProFusion search engine. The first page requires a browser that supports tables and JavaScript. The second page does not have these requirements.
The search syntax for both pages is the same.
(Java enabled and table capable browsers) http://profusion.ittc.ukans.edu
(other browsers) http://profusion.ittc.ukans.edu/ProFusion1.html

ProFusion allows the user to search either the Web or Usenet. There are three types of searches available at this site: default, Boolean and phrase searches. For the sake of clarity and uniformity, Boolean searches will be referred to as either Logical Word or Logical Symbol searches. A default search is simply a single keyword or a list of multiple keywords. A phrase search is any search enclosed in [ " " ]; all the words must appear together exactly as they appear within the " " marks. Logical Word or Logical Symbol expressions allow for greater versatility in the query. The Logical Word and Logical Symbol expressions used at ProFusion are the same as those used at Alta Vista, with the same meaning and use. ProFusion uses the following search engines; the user chooses how many engines to use and which ones. When ProFusion returns the results, it removes any duplicates among the results from the selected search engines.
Alta Vista
Excite
Lycos
Open Text
Yahoo
Infoseek
Hot Bot
Magellan

***** 5.0 Specialized Search Engines

This section covers search engines that seek out specific types of information from the web and Internet, such as medical or legal information. As the web becomes more complex, this type of search engine will become more commonplace. Most of these engines work in a similar fashion to the general search engines, and their search syntax is usually less complicated. This FAQ provides general information on these sites and an explanation of the search syntax when necessary. In most cases the search syntax uses simple logical word or logical symbol expressions.

5.1

5.2 Internet Legal http://www.ilrg.com/
"...Internet Legal Resource Guide.
A categorized index of 3100 select web sites in 238 nations, islands, and territories, as well as more than 850 locally stored web pages and other files, this site was established to serve as a comprehensive resource of the information available on the Internet concerning law and the legal profession ... Designed for everyone...it is quality controlled to include only the most substantive legal resources online."

5.3 Newswise (Medical Research) http://www.newswise.com/search-1.htm

5.4 Satellite Ency. http://www.tele-satellit.com/cgi-bin/local_search

5.5 Sydney Math Search http://www.maths.usyd.edu.au:8000/MathSearch.html
Provides searching of over 90,000 documents on mathematics and statistics around the web. Most of the documents relate to research or university level mathematics. Search instructions are very easy, and the site is usable with both a text-based browser and the usual graphical browser.

5.6 U.S. Business Advisor http://bacchus.fedworld.gov/Search_Online.html
A web site that provides searching of business information from U.S. Federal sites. It indexes the contents of over half a million government sites and notes those that contain information of value to business. Queries are entered in plain (natural) language. The site provides access to a text-only version.

**** 7.0 SUBJECT TREES

Subject trees are not search engines. Subject trees are pages where web sites and sources of information are arranged according to subject. For example, the subject heading "History" might lead to subsections: "American", "European", "African", "Asian", each subsection with a list of relevant general web sites. Following the "American" link might lead to even more web sites, also sorted by specific headings: "American Revolution", "American Civil War", "Mexican-American War", etc. Each section leads to additional web sites, and each section is again broken down into more specific headings.
For example, "American Civil War" might lead to subheadings: "Union Forces", "Naval Battles", etc., each subsection with appropriate web site listings. Two of the best general subject trees around are:

BUBL at http://bubl.ac.uk
Berkeley Subject Tree at http://sunsite.berkeley.edu/InternetIndex.html

These sites are worth a first visit when beginning any net research project.

**** 8.0 REFERENCE CARD

NOTE: This reference card is designed on the assumption that you have a basic understanding of the search expressions and criteria covered in prior sections of this FAQ. The double brackets [] in the reference card are not part of the query syntax.

**** 8.1 ALTA VISTA http://www.altavista.digital.com

[apples "orange juice"]     "apples" or the phrase "orange juice"
[+apples -"orange juice"]   "apples" & not the phrase "orange juice"
[app* (wildcard)]           "apples", "applets", "appraise"
   (the Alta Vista wildcard requires a minimum of three letters before it and matches from 0 to 5 characters)

Complex Searches (can use either logical word or symbol expressions: AND or &, OR or |, NOT or !, NEAR or ~)
[apple AND orange]     "apple" & the word "orange"
[apple OR orange]      "apple" or the word "orange"
[apples NOT oranges]   "apples" but not the word "oranges"
[apple NEAR juice]     "juice" within ten words of "apple"

RESTRICTING SIMPLE AND COMPLEX SEARCHES
[anchor:click-here]     pages with "click-here" in the hyperlink
[applet:<java class>]   pages with the Java class in the applet tag
[domain:xyz]            pages in the domain "xyz"
[host:xyz.com]          sites at the host name xyz.com
[image:a.jpg]           sites with an image tag "a.jpg"
[link:xyz.com]          sites with a link to xyz.com
[text:orange]           sites with "orange" in the visible text
[title:"A, B and C"]    sites with "A, B and C" in the title

RANKING
Simple searches: the ranking is automatic.
Complex searches: enter any word or group of words in the ranking window; Alta Vista will sort the results based on these words.
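Two entries in the Alta Vista card, the wildcard and NEAR, can be modeled in a few lines of Python to make their rules concrete. This is a sketch under stated assumptions, not how Alta Vista actually matched pages: it treats the wildcard as standing for 0 to 5 lowercase letters after a stem of at least three letters, and it reads "within ten words" as word positions differing by at most 10. The function names wildcard_to_regex and near are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Compile a trailing-'*' pattern such as 'app*' into a regex.

    Assumed rule (from the reference card): at least three letters before
    the '*', and the '*' stands for 0 to 5 further lowercase letters.
    """
    stem, star, rest = pattern.partition("*")
    if not star or rest or len(stem) < 3:
        raise ValueError("expected at least three letters followed by one trailing '*'")
    return re.compile(re.escape(stem) + r"[a-z]{0,5}$")


def near(text, word1, word2, distance=10):
    """Return True if word1 and word2 occur within `distance` words of each other.

    Assumption: 'within ten words' means word positions differ by at most 10.
    """
    words = re.findall(r"[a-z]+", text.lower())
    word1, word2 = word1.lower(), word2.lower()
    pos1 = [i for i, w in enumerate(words) if w == word1]
    pos2 = [i for i, w in enumerate(words) if w == word2]
    return any(abs(i - j) <= distance for i in pos1 for j in pos2)


if __name__ == "__main__":
    rx = wildcard_to_regex("app*")
    candidates = ["apples", "applets", "appraise", "applesauce", "apricot"]
    print([w for w in candidates if rx.match(w)])  # ['apples', 'applets', 'appraise']
    print(near("Fresh apple juice sold here", "apple", "juice"))  # True
    print(near("apple " + "word " * 12 + "juice", "apple", "juice"))  # False
```

Under these assumptions "applesauce" is not matched by [app*], because it adds seven letters after the stem, one way to see why a short wildcard stem can miss longer words.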
**** 8.2 EXCITE http://www.excite.com

Concept Based Search
[+apples +pears]            "apples" and "pears"
[-apples +peach]            "peach" but not "apples"
[+apples -pears -berries]   "apples" but not "pears" or "berries"

Exact match queries use Logical Word Expressions to find Web documents. The Logical Word Operators are AND, OR and AND NOT. Using logical word expressions turns off Excite's concept based option. Precise searches require the use of Logical Word Operators.
[apples AND peaches]       pages with "apples" and "peaches"
[apples OR peaches]        pages with either "apples" or "peaches"
[apples AND NOT peaches]   pages with "apples" but not with "peaches"

**** 8.3 LYCOS http://www.lycos.com

STANDARD SEARCH
Standard searches do not use logical word operators.
[apples oranges peaches]   pages where any of the words appear
[apples +berries]          "apples" and "berries"
[apples -berries]          "apples" but not "berries"
[app$ (wildcard)]          "apples", "applets", etc.
[apple.]                   "apple" but not the word "apples"

CUSTOM SEARCHES
Complex searches are done through an intuitive menu interface.

**** 8.4 WEBCRAWLER http://www.webcrawler.com

[apples oranges] or [apples OR oranges]   pages that contain any of the words
[apples AND oranges]      "apples" and "oranges"
[fruit NOT apples]        "fruit" but not "apples"
[cheese NEAR/(x) wine]    "wine" is within "x" words of "cheese"
[world ADJ war]           "world" & "war" are next to each other
[".." phrase searches]    "us army", "jack and jill went up the hill"
[(..)]                    used to group search expressions

**** 8.5 Yahoo http://www.yahoo.com

Advanced Options:
[apples +oranges]    "apples" as well as "oranges"
[apples -oranges]    "apples" but not with "oranges"
[t:]                 confines the search to Web page titles
[u:]                 confines the search to URLs
[" "]                phrase operator: "orange juice", "apple juice", etc.
[pea* (wildcard)]    "pears", "peas", "peaches", etc.

**** 8.6 Infoseek http://www.infoseek.com

Simple Searches
[apples oranges]     either "apples" or "oranges"
[+apples oranges]    "apples" is required; "oranges" is optional and affects only the ranking
["apple juice"]      "apple" and "juice" appear next to each other
Caps are used to indicate proper names and a case-sensitive search:
[Johnny Appleseed]   finds the name "Johnny Appleseed"
[Johnny,Appleseed]   finds either name (commas are used only to separate names)
[apples -grapes]     "apples" but not "grapes"

Complex Searches
[fruit | apple | juice]   finds pages with "fruit", then searches those results for "apple", then searches those results for "juice"
[title:fruit]             "fruit" in the title of the page
[url:www.orange.com]      sites with the address "www.orange.com"
[url:fruit]               sites with "fruit" in the URL, e.g. "www.fruit.com" or "www.fruitandnuts.com"
[link:www.juice.com]      pages that link to the specified URL
[site:xyz.com]            all pages at the specified site

**** 9.0 Partial List of Select Search Engines

9.1 General Search Engines
Alta Vista at http://www.altavista.digital.com/
AT1 at http://www.at1.com/
Excite at http://www.excite.com/
Galaxy at http://www.einet.net/search.html
Go2.com at http://www.goto.com/
HotBot at http://www.hotbot.com/
i-Explorer at http://www.i-explorer.com/home.dll??
Identify at http://www.identify.com/
Infohiway at http://www.infohiway.com/
Infoseek at http://guide.infoseek.com/
Internet Explorer at http://www.iexplorer.com/
Internic Directory at http://www.internic.net/dod/
Intuitive Web Index at http://intuitive.iexp.com/
Jayde at http://www.jayde.com/
Aliweb at http://www.nexor.com/public/aliweb/search/doc/form.html
LEO at http://www.leo.org/cgi-bin/leo-search
Linkcentre at http://linkcentre.com/
LinkMaster at http://linkmaster.com/
LinkMonster at http://www.linkmonster.com/
LinkStar at http://www.linkstar.com/home/partners/search-engines
Lycos at http://www.lycos.com/
Magellan at http://www.mckinley.com/
Matilda at http://www.aaa.com.au/
Nerd World at http://www.nerdworld.com/
NetFind at http://www.aol.com/netfind/
Northern Light at http://www.northernlight.com/
Open Text at http://index.opentext.net/
REX at http://www.skyline.net/REX/
Tradewave Galaxy at http://galaxy.tradewave.com/
web://411 at http://www.sserv.com/web411/
WebCrawler at http://www.webcrawler.com/
Websitez at http://www.websitez.com/
What-U-Seek at http://www.whatuseek.com/
WWWWorm at http://wwwmcb.cs.colorado.edu/wwww.html
Yahoo at http://www.yahoo.com/

9.2 Meta Search Engines
http://www.all4one.com/ All4One
http://www.cyber411.com/ Cyber411
http://www.dogpile.com/ Dogpile
http://www.w3com.com/fsearch/ FrameSearch
http://www.highway61.com/ Highway 61
http://m5.inference.com/ifind/ i-Find
http://www.isleuth.com/ Internet Sleuth
http://www.mamma.com/ Mamma
http://www.metacrawler.com/ MetaCrawler
http://metasearch.com/ MetaSearch
http://www.cosmix.com/motherload/insane/ Mother Load Insane Search
http://www.primecomputing.com/pssearch.htm Prime Search
http://www.designlab.ukans.edu/profusion/ Pro-Fusion
http://guaraldi.cs.colostate.edu:2000/form Savvy Search
http://search.onramp.net/ Search.onramp.net
http://www.he.net/~kamus/use2en.htm Use It!
9.3 Geo Specific Search Engines
http://www.countries.com/index.shtml Countries.com
http://www.arab.net/search/welcome.html Arab.net
http://www.samilan.com/ South Asian Internet Resources
http://www.intercom.com.au/wombat/ Web Wombat --Australian
http://www.argos.com.br/ Argos --Brazil
http://www.cade.com.br/ Cade --Brazil
http://www.radaruol.com.br/index.html Radar --Brazil
http://canada411.sympatico.ca/ Canada 411
http://www.canlinks.net/ CANLinks --Can.
http://maplesquare.com/ Maple Square --Can.
http://www.chilnet.cl/buscai.htm? ChilNet --Chile
http://www.euroferret.com/ Euroferret
http://www.god.co.uk/ G.O.D. --Europe
http://lokace.iplus.fr/ Lokace --Fr.
http://vroom.web.de/ web.de --Ger.
http://www.genius.net/indolink/ INDOLink --India
http://www.arianna.it/ Arianna --It.
http://www.keycomm.it/ricerche.htm Ricerche --It.
http://www.ipoline.com/~man/jpsearch.htm Japan Super Search
http://senrigan.ascii.co.jp/index-e.html Senrigan --Japan
http://simmany.hnc.net/ Simmany --Korea
http://www.nois.nl/nlurl2/ NL-URL --Dutch
http://www.zoek.nl/ Zoek --Dutch
http://accessnz.co.nz/ Access New Zealand
http://nzexplorer.co.nz/ NZExplorer --New Zealand
http://www.aeiou.pt/ aeiou --Port.
http://www.cusco.viatecla.pt/ Cusco --Port.
http://www.sapo.pt/ Sapo --Port.
http://scotland.org/ Scotland.org
http://www.ananzi.co.za/ Ananzi --So. Africa
http://charybdis.marques.co.za/zaworm.htm ZA Worm --So. Africa
http://www.elcano.com/ Elcano --Sp.
http://www.search.ch/ Swiss Search
http://www.ipoline.com/~man/twsearch.htm Taiwan Super Search

**** 10.0 Contact Information

Corrections, additions or comments can be sent to:
Ken Bogucki krb@infobasic.com
http://www.infobasic.com/pageone.htm

END WISE FAQ (c)
=========================