Search Engine Terms
as suggested by members of the I-Search Digest
This glossary or list of search engine terms is designed to complement the discussions taking place on 
the I-Search Digest discussion list moderated by Marshall D. Simmonds and published by 
AudetteMedia. To subscribe to the I-Search Digest, send mail to join-i-search@list.mmgco.com. A 
collection of I-Search archives is also maintained.
To browse the glossary, click on one of the links below:

 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 

The I-Search glossary is also available in the following translations:
En Francais - French version by Chris Hede
In Italiano - Italian version by Andrea Misso
Auf Deutsch - German version by Karl Heinz Resch
Version en Espanol - Spanish version by Oscar Gonzalez Alba
NL-versie - Dutch version by Gert Gremmen
Verzija na srpskom - Serbian version by Tijana Miletic
English mirror - Mirror of this page held at MMGCO.

New members of I-Search can use this glossary to help them understand the terms and concepts being 
discussed without having to read all the back-issues. We hope the definitions will also be useful to help 
clarify the meanings of search engine terminology in general.
The definitions here are not fixed or authoritative - please feel free to use this site as a kind of 
whiteboard. If you disagree with (or can improve) a definition, post a message on the I-Search list and 
suggest your change! The list of definitions will never be complete - if something is missing, please ask 
on I-Search for someone to provide a definition. New definitions provided via I-Search will appear here 
too.

Please mail changes/additions to Keith Bramich at Orion Web Design.
Last updated 17th June 1999 (to I-Search digest 134).

Search Engine Terms A-D


Adjacency 
A property of the relationship between words in a search engine (or directory) query. Search 
engines often allow users to specify that words should be next to one another or somewhere near 
one another in the web pages searched. 
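
As a rough illustration of an adjacency test, the Python sketch below checks whether two words occur within a given number of words of each other in a piece of text. The function name `words_near` and the `max_distance` parameter are made up for this example, and real search engines will split text into words and define "near" in their own ways.

```python
import re

def words_near(text, first, second, max_distance=1):
    """Return True if first and second occur within max_distance words of each other."""
    # Split the text into lower-case words (a simplifying assumption).
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_positions = [i for i, word in enumerate(words) if word == first.lower()]
    second_positions = [i for i, word in enumerate(words) if word == second.lower()]
    return any(
        i != j and abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )
```

With `max_distance=1` the two words must be next to one another; larger values give a looser "near" match.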
  
Agent Name Delivery 
The process of sending search engine spiders to a tailored page, yet directing your visitors to what 
you want them to see. This is done using server side includes (or other dynamic content techniques). 
SSI, for example, can be used to deliver different content to the client depending on the value of 
HTTP_USER_AGENT. Most normal browser software packages have a user agent string which 
starts with "Mozilla" (coined from Mosaic and Godzilla). Most search engine spiders have specific 
agent names, such as "Gulliver", "Infoseek sidewinder", "Lycos spider" and "Scooter". 
  
By switching on the value of HTTP_USER_AGENT (a process known as agent detection), 
different pages can be presented at the same URL, so that normal visitors will never see the page 
submitted to search engines (and vice versa). 
  
In practice this is somewhat simplistic. Some search engines pretend to be "plain mozilla" browsers 
to prevent use of agent name delivery. Effective use of agent name delivery can be very difficult, and 
may not even work. 
  
How do you spot agent name delivery at work? This is quite difficult, as the owners of web pages 
using agent name delivery can control what you see! You may be able to guess that a page is using 
this technique if it appears to be indexed incorrectly or the title or description don't match the page 
you see, but this could also have been achieved by switching pages after the relevant search engine 
has indexed it. If you really want to see the search engines' tailored version of a page, write a 
program (e.g. a Perl script) to retrieve the URL with HTTP_USER_AGENT set to each of the 
strings used by the search engine spiders. If agent name delivery is in use, one or more of the 
retrieved pages will be different to the others! 
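
The check described above could be sketched in Python rather than Perl. A client sets the User-Agent request header, which the server side sees as HTTP_USER_AGENT. This is only a sketch under stated assumptions: the function names are made up, the agent strings are the short examples given earlier in this entry (real spiders send longer strings), and pages with dynamic content may differ between requests for unrelated reasons.

```python
import hashlib
import urllib.request

# Agent names taken from the examples in this entry; real spiders may send longer strings.
AGENTS = ["Mozilla/4.0", "Gulliver", "Infoseek sidewinder", "Lycos spider", "Scooter"]

def fetch_as(url, agent, timeout=10):
    """Retrieve url while sending the given User-Agent request header."""
    request = urllib.request.Request(url, headers={"User-Agent": agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()

def group_by_content(pages):
    """Group agent names by a digest of the page body each one received.

    pages maps agent name -> page body (bytes). More than one group in the
    result suggests that different agents were served different content.
    """
    groups = {}
    for agent, body in pages.items():
        digest = hashlib.sha256(body).hexdigest()
        groups.setdefault(digest, []).append(agent)
    return sorted(groups.values())

def check(url):
    """Fetch url once per agent and group the agents that saw identical pages."""
    return group_by_content({agent: fetch_as(url, agent) for agent in AGENTS})
```

If `check` returns a single group, every agent received the same page. Several groups hint at agent name delivery, although they do not prove it.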
  
See also hidden text and IP delivery. 
  
Altavista 
A popular search engine with the largest database on the web, indexing more than 140 million 
pages. Its main URL is http://www.altavista.com. Until 1998, this search engine provided the search 
facility for Yahoo. Altavista indexes all the words in a web page, and new pages are normally 
added to the database fairly quickly, within a couple of working days. You are asked to submit just 
the main page of your site. The Altavista spider will then explore your site and index a representative 
sample of the pages. Some problems with spamming have been noticed. The use of keyword meta 
tags is penalised. Altavista places various alternative options before its search results, including 
suggested questions (using the Ask Jeeves service) and RealNames. Paid entries are beginning to 
appear at the start of the search results. 
  
AOL Netfind 
The default search engine for users of the AOL internet service provider, and hence a busy site. Its 
URL is http://www.netfind.com. It is essentially the same engine as Excite. 
  
Applet 
A small program, often written in Java, which usually runs in a web browser, as part of a web 
page. It is possible that the use of such a program may cause spiders and robots to stop indexing a 
page. 
  
ArchitextSpider 
The name of the Excite search engine's spider. 
  
Ask Jeeves 
A meta search engine which can be asked questions in English. This service is also in use at 
Altavista. http://www.askjeeves.com. 
  


Bait-and-Switch 
The provision of one page for a search engine or directory and a different page for other user agents 
at the same URL. Various methods can be used, e.g. Agent Name Delivery or IP Delivery. 
  
Bridge Page 
See Gateway Page. 
  


CGI 
Common Gateway Interface - a standard interface between web server software and other 
programs running on the same machine. 
  
CGI Program 
Strictly, any program which handles its input and output data according to the CGI standard. In 
practice, CGI programs are used to handle forms and database queries on web pages, and to 
produce non-static web page content. 
  
Channels, Channel listings 
Lists of links to selected (and usually popular) web sites. The links are maintained by search engines 
and directories and are sorted into categories or channels. Sites are picked by a channel editor, 
often because of a site's already high ranking with the search engines. Some search engines and 
directories allow visitors to nominate sites for inclusion in their channels. 
  
Client 
A computer, program or process which makes requests for information from another computer, 
program or process. Web browsers are client programs. Search engine spiders are (or can be said 
to behave as) clients. 
  
Click through 
The process of clicking on a link in a search engine output page to visit an indexed site. 
  
This is an important link in the process of receiving visitors to a site via search engines. Good 
ranking may be useless if visitors do not click on the link which leads to the indexed site. The secret 
here is to provide a good descriptive title and an accurate and interesting description. 
  
Cloaking 
The hiding of page content. Normally carried out to stop page thieves stealing optimized pages. See 
also Bait-and-Switch. 

Clustering 
The listing of only one page from each web site in a search engine or directory's list of search 
results. This avoids occupation of all the top results by a small number of web sites and makes the 
list of results clearer and more useful to the user. 
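
A minimal sketch of clustering in Python follows. It assumes that a "web site" can be identified by the host name in the URL, which is a simplification, and the function name `cluster_results` is made up for illustration.

```python
from urllib.parse import urlsplit

def cluster_results(urls):
    """Keep only the first (highest ranked) URL from each host, preserving order."""
    seen_hosts = set()
    clustered = []
    for url in urls:
        host = urlsplit(url).hostname  # lower-cased host name, or None
        if host not in seen_hosts:
            seen_hosts.add(host)
            clustered.append(url)
    return clustered
```

The input list is assumed to be already sorted by relevance, so the page kept for each site is the one the search engine ranked highest.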

Comment 
The HTML <!-- and --> tags are used to hide text from browsers. Some search engines ignore text 
between these symbols but others index such text as if the comment tags were not there. Comments 
are often used to hide javascript code from non-compliant browsers, and sometimes (notably on 
Excite) to provide invisible keywords to some search engines. 
  
Crawler 
See Spider. 
  


Dead Link 
An internet link which doesn't lead to a page or site, probably because the server is down or the 
page has moved or no longer exists. Most search engines have techniques for removing such pages 
from their listings automatically, but as the internet continues to increase in size, it becomes more and 
more difficult for a search engine to check all the pages in the index regularly. Reporting of dead 
links helps to keep the indexes clean and accurate, and this can usually be done by submitting the 
dead link to the search engine. 

De-listing 
The removal of pages from a search engine's index.

Removal can occur for various reasons, including unreliability of the machine that hosts a site or 
because of perceived attempts at spamdexing. 

Description 
Descriptive text associated with a web page and displayed, usually with the page title and URL, 
when the page appears in a list of pages generated by a search engine or directory as a result of a 
query. Some search engines take this description from the DESCRIPTION Meta tag - others 
generate their own from the text in the page. Directories often use text provided at registration. 

Direct Hit 
A system which monitors the search engine users' selections from search engine results, counting 
which results are clicked on most, and how long visitors spend at that site, so as to improve 
relevancy. Used by HotBot and as a plug-in to Apple's new innovative Sherlock search system. 
See www.directhit.com. 

Directory 
A server or a collection of servers dedicated to indexing internet web pages and returning lists of 
pages which match particular queries. Directories (also known as Indexes) are normally compiled 
manually, by user submission (such as at whatsnew.com), and often involve an editorial selection 
and/or categorization process (such as at LookSmart and Yahoo). 
  
Dogpile 
A meta search engine. Found at http://www.dogpile.com. 
  
Domain 
A sub-set of internet addresses. Domains are hierarchical, and lower-level domains often refer to 
particular web sites within a top-level domain. The most significant part of the address comes at the 
end - typical top-level domains are .com, .edu, .gov, .org (which sub-divide addresses into areas of 
use). There are also various geographic top-level domains (e.g. .ar, .ca, .fr, .ro etc.) referring to 
particular countries. 
  
The relevance to search engine terminology is that web sites which have their own domain name 
(e.g. http://www.nativetongues.com) will often achieve better positioning than web sites which exist 
as a sub-directory of another organisation's domain (e.g. 
http://ourworld.compuserve.com/homepages/tijana/). 
  
Doorway Page 
See Gateway Page. 
  
Dynamic content 
Information on web pages which changes or is changed automatically, e.g. based on database 
content or user information. Sometimes it's possible to spot that this technique is being used, e.g. if 
the URL ends with .asp, .cfm, .cgi or .shtml. It is possible to serve dynamic content using standard 
(normally static) .htm or .html type pages, though. Search engines will currently index dynamic 
content in a similar fashion to static content, although they will not usually index URLs which contain 
the ? character. 
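
The two URL clues mentioned above can be sketched as simple Python checks. The function names are hypothetical, and as the entry notes, a negative result does not prove that a page is static.

```python
from urllib.parse import urlsplit

# File extensions listed in this entry as hints of dynamic content.
DYNAMIC_EXTENSIONS = (".asp", ".cfm", ".cgi", ".shtml")

def has_dynamic_extension(url):
    """Return True if the path part of the URL ends with a listed extension."""
    return urlsplit(url).path.lower().endswith(DYNAMIC_EXTENSIONS)

def has_query_string(url):
    """Return True if the URL contains the ? character."""
    return "?" in url
```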
  



Search Engine Terms E-I


Entry Page 
See Gateway Page. 
  
Euroseek 
A search engine which concentrates on information relating to Europe. The URL is 
http://www.euroseek.com. 
  
Excite 
Regarded as one of the best search engines, with an index of 55 million pages. It can be slow to 
index new sites. The URL is http://www.excite.com. Sites using frames must have a NOFRAMES 
section in order to be listed. Some spamming has been noticed. Excite previously ignored the 
DESCRIPTION meta tag, but is now using this in its listings (although the contents do not affect 
relevancy, which is based mainly on the title and body text). The use of gateway pages and hidden 
text is allowed. Excite has an audio/video search facility which is a branded component of 
RealNetworks' RealPlayer G2. 


Fake Copy Listings 
Sometimes a malicious company will steal a web page or the entire contents of a web site, re-
publish at a different URL and register with one or more search engines. This can cause a loss of 
traffic from the original site if the search engines position the copy higher in the listings. If you find 
that someone has stolen your site in this way, write to the company concerned and ask them to 
remove the stolen content. Also contact the hosting service used by the company, any company that 
benefits from the theft and any search engine(s) concerned. If the thieves refuse to remove the 
material or ignore you, obtain legal advice. It is also well worth having printed evidence to support 
your claim that your copy of the material was there first, and that you have the copyright! See also 
Mirror Sites. 

False Drop 
A web page retrieved from a search engine or directory which is not relevant to the query used. 
This could be for one of the following reasons: 
- The web page contained the keywords entered, but used in the wrong context, with a different 
  meaning or with a different inter-relationship to that expected. 
- The web page is an attempt at spamdexing. 
- The search engine has a fault in its database or a bug in its query program. 

Flash Page 
See Splash Page. 
  
Font and Background Spoofs 
Various techniques used to place invisible text in a web page, to improve positioning without 
affecting the appearance of the page. These are mostly based on setting the font and background 
colours to the same value (e.g. white). Most search engines now detect these tricks. 
  
Frames 
An HTML technique for combining two or more separate HTML documents within a single web 
browser screen. Compound interacting documents can be created to make a more effective web 
page presented in multiple windows or sub-windows.

A framed web site often causes great problems for search engines, and may not be indexed 
correctly. Search engines will often index only the part of a framed site within the <NOFRAMES> 
section, so make sure that the <NOFRAMES> section includes relevant text which can be indexed 
by the spiders. If your site uses frames, consider providing a gateway page or adding navigational 
links within the framed pages. Submit the main page - the one containing the <FRAMESET> tag to 
the search engines. If you use a gateway page, submit this separately. 
  


Gateway Page 
A web page submitted to a search engine (spider) to give the relevance algorithm of that particular 
spider the data it needs, in the format that it needs it, in order to place a site at the proper level of 
relevance for the topic(s) in question. (This determination of topical relevance is called "placement".) 
  
A gateway page may present information to the spider, but obscure it from a casual human viewer. 
The gateway page exists so as to allow a web site to present one face to the spider, and another to 
human viewers. There are several reasons why one might want to do this. One is that the author 
may not want to publicly disclose placement tactics. Another is that the format that is easiest 
for a given spider to understand may not be the format that the author wishes to present to human 
viewers for aesthetic reasons. Still another is that the format that is best for one spider may differ 
from that which is best for another. By using gateway pages, you can present your site to each 
spider in the way which is known or thought to be best for that particular spider.

Also known as bridge pages, doorway pages, entry pages, portals or portal pages. 
  
An example gateway page: 
http://www.isquare.com/gateway.htm 
  
Go.com 
A portal partnership between Infoseek and Disney, with search capabilities based on the Infoseek 
index, at http://go.com/. 
  
GoTo 
A search engine, powered by Inktomi, which only returns one URL per domain in its search 
results. It operates a "pay per click" scheme where websites can pay to increase their relevancy. The 
URL is http://www.goto.com. 
  
Gulliver 
The name of the Northern Light Search Engine's spider. 
  


Heading 
Many search engines give extra weight and importance to the text found inside HTML heading 
sections. It is generally considered good advice to use headings when designing web pages and to 
place keywords inside headings. 
  
Hidden Text 
Text on a web page which is visible to search engine spiders but not visible to human visitors. This is 
sometimes because the text has been set the same colour as the background, because multiple 
TITLE tags have been used or because the text is an HTML comment. Hidden text is often used for 
spamdexing. Many search engines can now detect the use of hidden text, and often remove 
offending pages from their database or lower such pages' positioning. 
  
Text can also be hidden using agent name delivery or IP delivery either to present different text 
to different search engine spiders or to hide the real HTML source from competitors. The Stealth 
META Tag CGI Script probably uses this technique and is available at 
http://www.OutRank.com/stealth.shtml. Another software product which hides HTML source is 
called Psyral Phobia and is available at http://www.merlesworld.com/software.htm. 
  
Hit 
In the context of visitors to web pages, a hit (or site hit) is a single access request made to the server 
for either a text file or a graphic. If, for example, a web page contains ten buttons constructed from 
separate images, a single visit from someone using a web browser with graphics switched on (a 
"page view") will involve eleven hits on the server. (Often the accesses will not get as far as your 
server because the page will have been cached by a local internet service provider). 
  
In the context of a search engine query, a hit is a measure of the number of web pages matching a 
query returned by a search engine or directory. 
  
Hotbot 
One of the largest search engines, indexing 110 million pages. It is powered by Inktomi. New 
submissions seem to take two weeks or longer to appear. The URL is 
http://www.hotbot.com. 
  
HTML 
HyperText Markup Language - the (main) language used to write web pages. 
  
HTTP 
HyperText Transfer Protocol - the (main) protocol used to communicate between web servers and 
web browsers (clients). 
  


Image Map 
A set of hyperlinks attached to areas of an image. This may be defined within a web page, or as an 
external file. 
  
If the image map is defined as an external file, search engines may have problems indexing your 
other pages, unless you duplicate the links as conventional text hyperlinks. 
  
If the image map is included within the web page, the search engines should have no problem 
following the links, although it's good practice to provide text links too, to aid the visually impaired 
and those accessing the web with graphics switched off or using text only browsers. 
  
Inbound Link 
A hypertext link to a particular page from elsewhere, bringing traffic to that page. Inbound links are 
counted to produce a measure of the page popularity. Searches for the inbound links to a page 
can be made on Altavista, Infoseek and Hotbot. 
  
Index 
See Directory. Also refers to the database of web pages maintained by a search engine or 
directory. 
  
Infind 
A meta search engine. Found at http://www.infind.com. 
  
Infoseek 
One of the largest search engines. New sites are normally added very quickly, within one or two 
business days. The URL is http://www.infoseek.com. Infoseek is one of the few search engines to 
treat singular and plural forms as the same word. It is very sensitive to page popularity in its 
positioning algorithm. 
  
Inktomi 
The database used by some of the largest search engines, including Hotbot. Inktomi is also used by 
Yahoo when no matches are found in Yahoo's own database. 
  
IP Delivery 
Similar to agent name delivery, this technique presents different content depending on the IP 
address of the client. It is very difficult to view pages hidden using this technique, because the real 
page is only visible if your IP address is the same as (for example) a search engine's spider. 
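
The decision made on the server can be sketched as follows. This is an illustration only: the function name, the page names and the address range (a block reserved for documentation) are all hypothetical, and no real spider addresses are given in this glossary.

```python
import ipaddress

def choose_page(client_ip, spider_networks, spider_page, visitor_page):
    """Return spider_page if client_ip falls inside any listed network, else visitor_page."""
    address = ipaddress.ip_address(client_ip)
    for network in spider_networks:
        if address in ipaddress.ip_network(network):
            return spider_page
    return visitor_page
```

Because the choice depends only on the address of the client, a visitor cannot see the alternative page simply by changing the agent name sent by the browser.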
  



Search Engine Terms J-N


Java 
A computer programming language whose programs can run on a number of different types of 
computer and/or operating system. Used extensively to produce applets for web pages. 
  
Javascript 
A simple interpreted computer language used for small programming tasks within HTML web 
pages. The scripts are normally interpreted (or run) on the client computer by the web browser. 
Some search engines have been known to index these scripts, presumably erroneously. 
  


Keyword 
A word which forms (part of) a search engine query. 
  
Keyword Density 
A property of the text in a web page which indicates how close together the keywords appear. 
Some search engines use this property for Positioning. Analysers are available which allow 
comparisons between pages. Pages can then be produced with similar keyword densities to 
those found in high ranking pages. 
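
Keyword density is also commonly measured as the proportion of the words in a page which are the keyword. The sketch below assumes that simpler measure, which is not necessarily what any particular search engine or analyser uses, and the function name `keyword_density` is made up for illustration.

```python
import re

def keyword_density(text, keyword):
    """Return the fraction of words in text that are equal to keyword (0.0 for empty text)."""
    # Split the text into lower-case words (a simplifying assumption).
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)
```

Comparing the value for your own page with the values for high ranking pages gives a rough idea of whether a keyword is under-used or over-used.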
  
Keyword Domain Name 
The use of keywords as part of the URL to a website. Positioning is improved on some search 
engines when keywords are reinforced in the URL. 
  
Keyword Phrase 
A phrase which forms (part of) a search engine query. 
  
Keyword Purchasing 
The buying of search keywords from search engines, usually to control banner ad placement. All 
the major search engines (except EuroSeek and GoTo) insist that keyword purchasing is only used 
for banner ad placement, and doesn't influence search results. The display of banner ads for 
bought keywords can be studied using a service called Bannerstake from Thomson and Thomson 
at http://www.namestake.com, which returns the banner ads displayed when particular queries are 
used. 
  
Keyword Stuffing 
The repeating of keywords and keyword phrases in META tags or elsewhere. 
  


Link Popularity 
See page popularity. 

Log File 
A file maintained on a server in which details of all file accesses are stored. Analysing log files can 
be a powerful way to find out about a web site's visitors, where they come from and which queries 
are used to access a site. Various software packages are available to analyse log files, and some are 
listed below. 
  
Sane Solutions provide NetTracker, which is good at analysing queries from log files. A free 
program called WebLog is available at http://www.awsd.com. See also the reviews at 
http://www.bellacoola.com/html/sample_reports.htm. 
  
LookSmart 
A medium-sized directory. The URL is http://www.looksmart.com. 
  
Lycos 
One of the largest search engines, Lycos appears to be moving towards becoming a directory and 
is using the Open Directory for some search results. It can be slow to index new sites. The Lycos 
spider ignores meta tags in pages. Lycos can be found at http://www.lycos.com. 
  


Metacrawler 
A meta search engine found at http://www.metacrawler.com. Results from various search engines 
are summarised in an easy to read form. 
  
Metafind 
A meta search engine found at http://www.metafind.com. 
  
Meta Search 
A search of searches. A query is submitted to more than one search engine or directory, and results 
are reported from all the engines, possibly after removal of duplicates and sorting. Also the meta 
search engine of the same name, found at http://www.metasearch.com. 
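
The merging step can be sketched in Python as below. How real meta search engines sort their combined results is not described here, so the sketch simply assumes that results are interleaved by rank and that a duplicate URL keeps its first position. The function name `merge_results` is hypothetical.

```python
from itertools import zip_longest

def merge_results(*result_lists):
    """Interleave ranked result lists and drop duplicate URLs, keeping the first occurrence."""
    merged = []
    seen = set()
    # Take the first result from every engine, then the second from every engine, and so on.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```

Each argument is the ordered list of URLs returned by one search engine or directory for the same query.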
  
Meta Search Engine 
A server which passes queries on to many search engines and/or directories and then summarises all 
the results. Ask Jeeves, Dogpile, Infind, Metacrawler, Metafind and Metasearch are 
examples of meta search engines. 
  
Meta tag 
A construct placed in the HTML header of a web page, providing information which is not visible to 
browsers. The most common meta tags (and those most relevant to search engines) are 
KEYWORDS and DESCRIPTION. 
  
The KEYWORDS tag allows the author to emphasise the importance of certain words and phrases 
used within the page. Some search engines will respond to this information - others will ignore it. 
Don't use quotes around the keywords or keyphrases. 
  
The DESCRIPTION tag allows the author to control the text of the summary displayed when the 
page appears in the results of a search. Again, some search engines will ignore this information. 
  
The HTTP-EQUIV meta tag supplies the equivalent of an HTTP response header, and is frequently used 
with the value REFRESH to reload or replace the page after a given number of seconds. Gateway pages 
sometimes use this technique to force browsers to a different page or site. Most search engines are 
wise to this, and will index the final page and/or reduce the ranking. Infoseek has a strong policy 
against this technique, and they might penalize your site, or even ban it.

Other common meta tags are GENERATOR (usually advertising the software used to generate the 
page) and AUTHOR (used to credit the author of the page, and often containing e-mail address, 
homepage URL and other information). 
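
To see which meta tags a page carries, the header can be read with a short program. The sketch below uses Python's standard html.parser module. The sample page, the class name and the function name are invented for illustration, and the sketch says nothing about how any search engine actually treats the tags.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect the content of each meta tag, keyed by its NAME or HTTP-EQUIV value."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or attributes.get("http-equiv")
        if name:
            self.meta[name.lower()] = attributes.get("content") or ""

def read_meta_tags(html):
    """Return a dictionary of the meta tags found in an HTML document."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta

SAMPLE = """<HTML><HEAD>
<TITLE>Example page</TITLE>
<META NAME="KEYWORDS" CONTENT="search engines, glossary">
<META NAME="DESCRIPTION" CONTENT="A glossary of search engine terms.">
<META HTTP-EQUIV="REFRESH" CONTENT="5; URL=http://www.example.com/">
</HEAD><BODY></BODY></HTML>"""
```

Calling `read_meta_tags(SAMPLE)` returns the three tags keyed as `keywords`, `description` and `refresh`.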
  
Mining Company 
A large directory spread over many different URLs. The main URL is http://www.miningco.com. 
  
Mirror Sites 
Multiple copies of web sites or web pages, often on different servers. The process of registering 
these multiple copies with search engines is often treated as spamdexing, because it artificially 
increases the relevancy of the pages. Filters such as the Infoseek Sniffer now remove multiple 
mirrors from the indexes. 
  
Misspellings 
People quite often spell words incorrectly when using search engines. Pages which use common 
misspellings will quite often receive extra hits, so it is a useful technique to include common 
misspellings of words in alt attributes, keywords, page names and titles. A similar effect occurs when 
spaces are missed out and words are accidentally joined together. 
  
MultiCrawl 
A parallel search engine which offers users their own branded versions. http://www.multicrawl.com. 
  
Multiple Domain Names 
The use of several extra domains to provide gateway pages or gateway sites to the main site. 
  
Multiple Keyword Tags 
The use of more than one Keywords META tag in order to try to increase the relevancy of the best 
keywords on a page. This is not recommended. It may be detected as a spamming technique, or all 
but one of the tags may simply be ignored. 
  
Multiple Titles 
It used to be possible to repeat the HTML title tag in the header section of a page several times to 
improve search engine positioning. Most search engines now detect this trick. 
  


Netfind 
See AOL Netfind. 
  
NewHoo 
See the Open Directory Project. 
  
Northern Light 
A search engine with an additional "pay to access" special collection of business, health and 
consumer publication articles. The first search engine to ban meta search engines from its 
database. The URL is http://www.northernlight.com. 



Search Engine Terms O-R


Open Directory Project 
A directory project run by thousands of volunteer editors. In principle, this is a very exciting and 
powerful way to organise the web. In practice, there have been some problems with the behaviour 
of some of the editors, which has caused some initial difficulty for the organisers. Initially known as 
NewHoo, the project is now part of Netscape (and therefore of AOL). See 
http://directory.mozilla.org. 
  
Open Text 
A large business-only directory. The URL is http://www.opentext.com. 
  
Optimization 
Changes made to a web page to improve the positioning of that page with one or more search 
engines. A means of helping potential customers or visitors to find a web site. Optimization may 
involve design/layout changes, new text for the title tags, meta tags, alt attributes, headings, and 
changes to the first 200-250 words of the main text. A large image map at the top of a page should 
be moved further down the page. Frames should be avoided (unless navigational links are also 
provided within the frames). 
  


Page Popularity 
A measure of the number and quality of links to a particular page (inbound links). Many search 
engines (and most noticeably Infoseek) are increasingly using this number as part of the 
positioning process. The number and quality of inbound links is becoming as important as the 
optimisation of page content. A free service to measure page popularity can be found at 
http://www.linkpopularity.com. 

Page View 
Used in site statistics as a measure of pages viewed rather than server hits. Many server hits may be 
made to access a single page, causing many separate log file entries. Analysis software can 
determine that these server hits were generated when a visitor viewed a single page, and group them 
together to provide this more useful method of counting visitors. See also Hit and Unique Visitor. 
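
The difference between hits and page views can be shown with the example from the Hit entry, a page containing ten button images. The sketch assumes that any request whose path ends in .htm, .html or / is a page and that everything else is a supporting file. Real log analysers use more elaborate rules, and the function name is hypothetical.

```python
# Path endings treated as pages in this sketch (an assumption, not a standard).
PAGE_SUFFIXES = (".htm", ".html", "/")

def count_hits_and_page_views(requested_paths):
    """Return (hits, page_views) for a list of requested paths taken from a log file."""
    hits = len(requested_paths)
    page_views = sum(1 for path in requested_paths if path.lower().endswith(PAGE_SUFFIXES))
    return hits, page_views
```

For one visit to a page with ten images, the list of requested paths has eleven entries, of which only one counts as a page view.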

Placement 
See Positioning. 

Politeness Window 
In order not to overburden any particular server, most search engine spiders limit their access to 
each server. If your page is hosted on the same server as thousands of other pages, the spider may 
never get the time to reach (and index) your page. This can be a powerful argument for having your 
own server. 

Portal 
See Gateway page. Can also mean Portal Site. 

Portal Page 
See Gateway page. 

Portal Site 
A generic term for any site which provides an entry point to the internet for a significant number of 
users.

Examples are search engines, directories, built-in default browser or service provider homepages, 
sites hardwired to browser buttons, sites offering free homepages, e-mail or personalised news and 
any popular (or heavily advertised) sites that significant numbers of people may bookmark or set as 
default pages. 

Positioning 
The process of ordering web sites or web pages by a search engine or a directory so that the most 
relevant sites appear first in the search results for a particular query. Software such as 
PositionAgent, Rank This and Webposition can be used to determine how a URL is positioned for 
a particular search engine when using a particular search phrase. The GoHip Search site allows you 
to see positioning information from many of the big search engines, displayed all on one page. 
  
Positioning Technique 
A method of modifying a web page so that search engines (or a particular search engine) treat the 
page as more relevant to a particular query (or a set of queries). 
  


Query 
A word, a phrase or a group of words, possibly combined with other syntax used to pass 
instructions to a search engine or a directory in order to locate web pages. 
  
For details of which queries are being used, visit the GoTo.com Search Inventory page. To "spy" on 
queries as they're entered, look at the Metaspy page. A summary of what people actually search for 
can be found at http://www.synergy-marketing.com/search.html. A free program called Word 
Market will collect search terms from the search engines, and is available at 
http://www.softwaresolutions.net/free.htm. The Canadian Email Business Network provides a Meta 
Tags/Keywords Search Engine at http://www.cebn.com/metatags.htm which allows searches 
through thousands of recent search engine queries. 
  
  
Ranking 
See Positioning. 
  
RealNames 
An alternate website address system in operation at Altavista. Brand names used in searches are 
mapped directly to the appropriate website, usually because the company owning the brand-name 
has paid a fee to RealNames. http://www.realnames.com 
  
Referrer 
The URL of the web page from which a visitor came. The server's referrer log file will indicate this. 
If a visitor came directly from a search engine listing, the query used to find the page will usually be 
encoded in the referrer URL, making it easy to see which keywords are bringing visitors. The referrer 
information can also be accessed as document.referrer within JavaScript or via the 
HTTP_REFERER environment variable (accessible from scripting languages). The single-r spelling 
"referer" is the one used by the HTTP standard itself, which is why the variable name keeps it. 
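
As a rough sketch of how the query might be pulled out of a referrer URL, the Python below looks for 
a search phrase in the URL's query string. The parameter names in QUERY_PARAMS and the example 
URL are assumptions made for illustration - each search engine chooses its own parameter name, so 
check your own referrer log before relying on them.

```python
from urllib.parse import urlparse, parse_qs

# Assumed names of the query-string parameter that carries the search phrase.
# Each engine chooses its own, so treat this tuple as a guess to be adjusted.
QUERY_PARAMS = ("q", "query", "p")


def search_terms(referrer):
    """Return the search phrase encoded in a referrer URL, or None."""
    params = parse_qs(urlparse(referrer).query)
    for name in QUERY_PARAMS:
        if name in params:
            return params[name][0]
    return None


# Hypothetical referrer, as it might appear in a server's referrer log.
print(search_terms("http://search.example.com/find?q=search+engine+glossary"))
# prints: search engine glossary
```

A referrer with no recognised parameter (an ordinary link from another site, for example) returns 
None.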
  
Refresh Tag 
See the paragraph about HTTP-EQUIV under Meta Tag. 
  
Registration 
The process of informing a search engine or directory that a new web page or web site should be 
indexed. 
  
Relevancy Algorithm 
The method a search engine or directory uses to match the keywords in a query with the content of 
each web page, so that the web pages found can be ordered suitably in the query results. Each 
search engine or directory is likely to use a different algorithm, and to change or improve its 
algorithm from time to time. 
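
The search engines do not publish their algorithms, so the Python below is only a toy illustration of 
the general idea: it scores each page by counting how often the query keywords occur in its text, 
then orders the pages by score. The page names and text are invented, and no real engine is 
claimed to work this simply.

```python
def score(query, page_text):
    """Count how often each query keyword occurs in the page text."""
    words = page_text.lower().split()
    return sum(words.count(keyword) for keyword in query.lower().split())


def rank(query, pages):
    """Return page names ordered from highest to lowest score."""
    return sorted(pages, key=lambda name: score(query, pages[name]), reverse=True)


# Invented pages: name -> text content.
pages = {
    "a.html": "dance classes and dance shoes",
    "b.html": "shoes for every occasion",
    "c.html": "dance dance studio",
}

print(rank("dance shoes", pages))
# prints: ['a.html', 'c.html', 'b.html']
```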
  
Re-submission 
Repeating the search engine registration process one or more times for the same page or site. Under 
certain circumstances, this is regarded with suspicion by the search engines, as it could indicate that 
someone is experimenting with spamming techniques.

The Infoseek and Altavista search engines are particularly vulnerable to spamming because they list 
sites very quickly, and are thus easy to experiment with. Both engines de-list sites for repeated re-
submission and Infoseek, for example, does not allow more than one submission of the same page 
in a 24 hour period. Occasional re-submission of changed pages is not normally a problem. 
  
Robot 
Any browser program which follows hypertext links and accesses web pages but is not directly 
under human control. Examples are the search engine spiders, the "harvesting" programs which 
extract e-mail addresses and other data from web pages and various intelligent web searching 
programs. A database of web robots is maintained by Webcrawler. 
  
robots.txt 
A text file stored in the top level directory of a web site to deny access by robots to certain pages 
or sub-directories of the site. Only robots which comply with the Robots Exclusion Standard will 
read and obey the commands in this file. Robots will read this file on each visit, so that pages or 
areas of sites can be made public or private at any time by changing the content of robots.txt before 
re-submitting to the search engines. The simple example below attempts to prevent all robots from 
visiting the /secret directory:

User-agent: *
Disallow: /secret 
  
For more information, please refer to the Altavista robots.txt page. 
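
To see how a compliant robot might interpret the example above, the sketch below feeds the same 
two lines to the robots.txt parser in Python's standard library. The robot name and the 
www.example.com URLs are made up for illustration. Note that the parser matches by prefix, so the 
rule also covers any path that merely begins with /secret.

```python
from urllib.robotparser import RobotFileParser

# The same two lines as the example robots.txt above.
ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /secret",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)

# "ExampleBot" and the URLs are hypothetical.
print(parser.can_fetch("ExampleBot", "http://www.example.com/secret/plans.html"))
# prints: False
print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))
# prints: True
```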


Search Engine Terms S-Z

Scooter 
The name of the Altavista search engine's spider. 
  
Search Engine 
A server or a collection of servers dedicated to indexing internet web pages, storing the results and 
returning lists of pages which match particular queries. The indexes are normally generated using 
spiders. Some of the major search engines are Altavista, Excite, Hotbot, Infoseek, Lycos, 
Northern Light and Webcrawler. Note that Yahoo is a directory, not a search engine. The term 
Search Engine is also often used to describe both directories and search engines. 
  
Searchking 
A smaller search engine which allows visitors to vote on the relevance of the pages returned by their 
queries, thus ranking sites based on the opinions of searchers. Unlike some of the major search 
engines, it offers good customer support. http://www.searchking.com. 
  
Search Term 
See Query. 
  
Server 
A computer, program or process which responds to requests for information from a client. On the 
internet, all web pages are held on servers. This includes those parts of the search engines and 
directories which are accessible from the internet. 
  
Sidewinder 
The name of the Infoseek search engine's spider. 
  
Siphoning 
The use of various means to steal another site's traffic. Techniques used include the wholesale 
copying of web pages (with the copied page altered slightly to direct visitors to a different site, and 
then registered with the search engines) and the use of keywords or keyword phrases "belonging" to 
other organisations, companies or web sites. 
  
Site Hit 
See hit. 
  
Skewing 
Artificially changing search engine results so that, for example, popular queries will return artificially 
created listings. Infoseek is currently experimenting with this technique, using a small group of 
reviewers to artificially force higher relevance for certain sites. 
  
Slurp 
The name of the spider used by Inktomi. 
  
Snap! 
A large directory. The URL is http://www.snap.com. 
  
Sniffer 
The name of the filter program used by the Infoseek search engine to prevent spamdexing. It detects 
multiple mirror pages, font and background spoofs, multiple title tags, keyword stuffing and possibly 
other types of spamdexing. 
  
Spamdexing 
The alteration or creation of a document with intent to deceive an electronic catalog or filing system. 
Any technique that increases the potential position of a site at the expense of the quality of the 
search engine's database can also be regarded as spamdexing - also known as spamming or 
spoofing. 
  
Spamming 
See spamdexing. Spamming is also used more generally to refer to the sending of unsolicited bulk 
electronic mail, and the search engine use is derived from this term. 
  
Spider, Spyder 
That part of a search engine which surfs the web, storing the URLs and indexing the keywords and 
text of each page it finds. Please refer to the Search Engine Watch SpiderSpotting Chart for details 
of individual spiders. See also Robot. 
  
Spidering 
The process of surfing the web, storing URLs and indexing keywords, links and text. 
  
Typically, even the largest search engines cannot spider all of the pages on the net. This is due to the 
huge amount of data available, the speed at which new data appears, the use of politeness 
windows and practical limits on the number of pages that can be visited in a given time. The search 
engines have to make compromises in order to visit as many sites as possible, and they do this in 
different ways. For example, some only index the home pages of each site, some only visit sites 
they're explicitly told about, and some make judgements about the importance of sites (from number 
and quality of inbound links) before "digging deeper" into the subpages of a site. 
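
The effect of a page limit can be sketched in Python as follows. The site is a made-up, in-memory 
table of links (no real pages are fetched), and the max_pages limit stands in for whatever practical 
limits a real spider works under. Pages are visited breadth-first, so deeper subpages are the first to 
be left out when the limit is reached.

```python
from collections import deque

# Hypothetical site: each URL maps to the links found on that page.
LINKS = {
    "/": ["/products.html", "/about.html"],
    "/products.html": ["/products/widget.html", "/"],
    "/about.html": [],
    "/products/widget.html": ["/"],
}


def spider(start, links, max_pages):
    """Visit pages breadth-first, stopping once max_pages have been visited."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in links.get(url, []):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


print(spider("/", LINKS, max_pages=3))
# prints: ['/', '/products.html', '/about.html']
```

With a limit of three pages the subpage /products/widget.html is never reached, although it would 
be found if the limit were raised.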
  
Splash page 
Similar to a gateway page but provides an initial display which must be viewed before a visitor 
reaches the main page. This usually acts as a kind of "opening title" sequence, and can be extremely 
annoying. 
  
Spoofing 
See spamdexing. 
  
SSI 
Server Side Includes. Used (for example) to add dynamically generated content to a web page. 
  
Stealth Script 
A CGI script which switches page content depending on who or what is accessing the page. See 
agent name delivery. 
  
Stemming 
A function of some search engines and directories which allows results to be returned from some or 
all keywords based on the same stem as the keyword entered as a search term. For example, when 
stemming is switched on, a search for the word dance will return matches for any word whose stem 
is danc-, matching the keywords dance, dancer and dancing. 
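
A very crude stemmer can be sketched in Python by stripping a few common endings. The suffix list 
here is an assumption chosen just to make the dance example work - real stemming functions are 
considerably more careful.

```python
# Assumed suffix list, checked in this order; real stemmers use far richer rules.
SUFFIXES = ("ing", "ers", "er", "es", "ed", "e", "s")


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def matches(search_term, keywords):
    """Return the keywords which share a stem with the search term."""
    target = stem(search_term)
    return [keyword for keyword in keywords if stem(keyword) == target]


print(stem("dance"))
# prints: danc
print(matches("dance", ["dance", "dancer", "dancing", "danish"]))
# prints: ['dance', 'dancer', 'dancing']
```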
  
Stop Word 
A word which is ignored in a query because the word is so commonly used that it makes no 
contribution to relevancy. Examples are common net words such as computer and web, and 
general words like get, I, me, the and you. 
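
As a minimal sketch, removing stop words from a query might look like the Python below. The stop 
list contains only the examples given above; each search engine will have its own (usually much 
longer) list.

```python
# Stop list built from the examples in this definition only.
STOP_WORDS = {"computer", "web", "get", "i", "me", "the", "you"}


def strip_stop_words(query):
    """Return the words of a query which are not stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


print(strip_stop_words("Get me the web dance lessons"))
# prints: ['dance', 'lessons']
```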
  
Submission Service 
Any agent which submits your site to many search engines and directories. Useful to get listed with 
many of the minor search engines, but don't rely on such services to get listed with the major search 
engines. Many of these services are automatic and run from web sites. Others run off line. Some are 
free. Beware of supplying your email address to the so called FFA (free for all) services - you may 
receive lots of spam. 
  

Title 
The text contained between the start and end HTML tags of the same name. This text is associated 
with (but not displayed in) the web page containing these tags, and is displayed in a special position 
(usually at the top of the window) by the web browser. 
  
Title text is important because it normally forms the link to the page from the search engine listings, 
and because the search engines pay special attention to the title text when indexing the page. 
  
Don't confuse this text with heading text within the web page which often looks like the title. Usually 
this will be rendered either using the HTML heading tags or just rendered with a large font size. 
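
To illustrate the difference between title text and heading text, the Python sketch below uses the 
standard library HTML parser to pick out only the text between the title tags. The sample page is 
invented, and this is not a claim about how any particular search engine reads titles.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect only the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_title(html):
    """Return the title text of an HTML page (empty string if none)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Invented page: the heading looks like a title but is not one.
SAMPLE = ("<html><head><title>Search Engine Terms</title></head>"
          "<body><h1>Glossary</h1></body></html>")

print(page_title(SAMPLE))
# prints: Search Engine Terms
```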
  
Traffic 
The visitors to a web page or web site. Also refers to the number of visitors, hits, accesses etc. over 
a given period. 
  

Unique Visitor 
A real visitor to a web site. 
  
Web servers record the IP address of each visitor, and this is used to estimate the number of 
real people who have visited a web site. 
  
If, for example, someone visits twenty pages within a web site, the server will count only one unique 
visitor (because the page accesses are all associated with the same IP address) but twenty page 
accesses. 
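
The counting described above can be sketched in Python as follows. The log entries are invented, 
and the sketch assumes (as the definition does) that one IP address corresponds to one person.

```python
# Hypothetical access-log entries: (visitor IP address, page requested).
# One visitor views twenty pages; a second visitor views one.
accesses = [("192.0.2.10", "/page%d.html" % n) for n in range(1, 21)]
accesses.append(("192.0.2.77", "/index.html"))


def summarise(log):
    """Return (unique visitors, page accesses) for a list of log entries."""
    unique_visitors = len({ip for ip, _page in log})
    return unique_visitors, len(log)


print(summarise(accesses))
# prints: (2, 21)
```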
  
See also hit and page view. 
  
URL 
Uniform Resource Locator. An address which can specify any internet resource uniquely. The 
beginning of the address indicates the type of resource - e.g. http: for web pages, ftp: for file 
transfers, telnet: for computer login sessions or mailto: for e-mail addresses. 
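
As a small illustration, the Python standard library can split off the beginning of an address (the 
scheme) to show the type of resource. The example.com addresses are placeholders.

```python
from urllib.parse import urlparse


def resource_type(url):
    """Return the scheme at the start of a URL, e.g. 'http' or 'mailto'."""
    return urlparse(url).scheme


# Placeholder addresses, one for each type mentioned above.
for url in ("http://www.yahoo.com",
            "ftp://ftp.example.com/pub/file.txt",
            "telnet://host.example.com",
            "mailto:someone@example.com"):
    print(resource_type(url), url)
```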
  
URL Submission 
See Registration. 
  

Virtual Domain 
A domain hosted by a virtual server account. 
  
Virtual Server 
An account on a hosting company server, usually linked to its own domain. This provides an 
inexpensive way to run a web site with its own top level domain, and is usually indistinguishable from 
having a separate physical server, except that the virtual server may share an IP address with other 
virtual servers on the same machine. A virtual server account is fine for most uses, but will often be 
slower to respond than a physically separate server, and physical access to the machine will seldom 
be allowed. The cost of a virtual server account is a small fraction of that needed to run a real 
server, mainly because of the expense of the dedicated line needed to connect the server 
continuously to the rest of the net. 

Web Copywriting 
The writing of text especially for a web page. Similar to the writing of copy for any other type of 
publication, good web copywriting can have a great effect on search engine positioning, so it forms 
a major part of optimization. 
  
Webcrawler 
One of the largest search engines. The URL is http://www.webcrawler.com. 
  

XML 
Extensible Markup Language. A new language which promises more efficient data delivery over the 
web. XML does nothing itself - it must be implemented using 'parser' software or XSL. 
  
XSL 
Extensible Stylesheet Language - an XML style sheet language supported by the newer web 
browsers Internet Explorer 5 and Netscape 5. 

Yahoo 
Similar to a search engine, but with a database generated by hand, this is the world's most used 
directory of web sites. The main URL is http://www.yahoo.com. It is notoriously difficult to get 
listed in Yahoo and, once listed, even more difficult to get your listing changed or to get out! To 
increase the odds of getting listed, try the following: 
- Select the three categories you want to be listed in very carefully. Consider the regional 
  categories. Ensure that the categories match the content of your site. 
- Apply to one of their local subsidiaries for your own country or city. 
- Make sure that your site is well-designed and easy to navigate. 
- Ensure your site has no dead links. 
- Ensure that your pages download quickly. 
- Provide good contact information on your site. 
If you manage to get listed, keep the e-mail they send you. You can e-mail the same person 
subsequently to get your listing changed. 
