archive:internet:archie.man
archie> manpage
ARCHIE(1L) MISC. REFERENCE MANUAL PAGES ARCHIE(1L)
NAME
archie - Internet archive server listing service
SYNOPSIS
archie
DESCRIPTION
This manual page describes Version 3 of the archie system. This Internet information service allows the user to query a database containing a list of files which are available on hosts connected to the Internet. Software located through this service can be obtained by means of ftp(1); for hosts with access to BITNET/NetNorth/EARN, it can be obtained by electronic mail through the Princeton bitftp (1L) service. Send mail to
bitftp@pucc.princeton.edu
Other Internet users who are not directly connected may use
the services of various ftp-by-mail servers including
ftpmail@decwrl.dec.com
Some archie systems track archive sites globally, others only track the archive sites in their country, region or continent in order to reduce the load on trans-oceanic links. There are a number of archie hosts serving different continental user communities. The servers command will list the most up-to-date information on archie servers worldwide.
archie.au* Australia archie.edvz.uni-linz.ac.at* Austria archie.univie.ac.at* Austria archie.uqam.ca* Canada archie.funet.fi Finland archie.th-darmstadt.de* Germany archie.ac.il* Israel archie.unipi.it* Italy archie.wide.ad.jp Japan archie.kr* Korea archie.sogang.ac.kr* Korea archie.rediris.es* Spain archie.luth.se* Sweden archie.switch.ch* Switzerland archie.ncu.edu.tw* Taiwan archie.doc.ic.ac.uk* United Kingdom archie.unl.edu USA (NE) archie.internic.net* USA (NJ) archie.rutgers.edu* USA (NJ) archie.ans.net* USA (NY) archie.sura.net* USA (MD) Sites marked with an asterisk '*' run archie version 3.0
Sun Release 4.1 Last change: 12 Apr 1993 1
ARCHIE(1L) MISC. REFERENCE MANUAL PAGES ARCHIE(1L)
archie can be accessed interactively, via electronic mail or through archie client programs.
Using the Interactive (telnet) Interface In order to use the interactive system you should use the following procedure:
1) telnet to the archie system closest to you. Do not use ftp for this, it will not work.
2) Login as user archie (no capitals, no password is required). The system should print a banner message and status report before presenting you with the command prompt.
3) Type help for complete information on the system.
For full details, refer to the section entitled ARCHIE COM- MANDS which appears below.
Using the Electronic Mail Interface In order to use the email interface, send requests to:
archie@<archie server>
where <archie server> is one of the hosts listed above, or one returned by the servers command. Send the word help in a message to obtain a list of available commands and features. This is a completely automated interface, acting without human intervention.
For full details, refer to the section entitled ARCHIE COM- MANDS which appears below.
Using the archie clients
The source code for a variety of archie client programs can
be obtained via anonymous ftp(1) from any of the archie
server hosts listed above. They are stored in the
archie/clients or pub/archie/clients directories. These
clients communicate via the Prospero distributed file system
protocol with archie servers, which perform the specified
queries and return the results to the user. Currently there
are command line and X(1) clients available. These clients
should run on most Unix platforms as well as other systems
with compatibility libraries. For more information on Pros-
pero send your queries to info-prospero-request@isi.edu
Communicating with the Database Administrators Mail to archie administrators at a particular archie server should be sent to the address
archie-admin@<archie server>
where <archie server> is one of the hosts listed above.
To send mail to the implementors of the archie system, please send mail to
archie-group@bunyip.com
The archie server system is a product of Bunyip Information Systems.
Requests for additions to the set of hosts surveyed for the
database, modifications to the Software Description Data-
base, or other administrative matters, should be sent to:
archie-admin@bunyip.com
ARCHIE COMMANDS
In the archie system version 3 the telnet and email clients accept a common set of commands. Additionally, there are specialized commands specfic to the particular interfaces. See THE INTERACTIVE INTERFACE and THE EMAIL INTERFACE sec- tions below for a list of these commands.
Note that some archie server sites may disable some of the commands for reasons particular to their site. As well some sites limit the number of concurrent interactive (telnet) sessions to better utilize limited resources.
Commands Arguments to commands shown in square brackets '[]' are optional; all others are mandatory.
find <pattern>
prog <pattern> This command produces a list of files matching the pat- tern <pattern>. The <pattern> may be interpreted as a simple substring, a case sensitive substring, an exact string or a regular expression, depending on the value of the search variable. The output normally contains such information as the file name that was matched, the directory path leading to it, the site containing it and the time at which that site was last updated. The format of the output can be selected through the output format variable. The results are sorted accord- ing to the value of the sortby variable, and are lim- ited in number by the maxhits variable. prog is identical to find. It is included for backward compatibility with older versions of the system.
help [<topic> [<subtopic>] ...] Invokes the help system and presents help on the speci- fied topic. A list of words is considered to be one topic, not a list of individual topics. Thus,
help set maxhits
requests help on the subtopic maxhits of topic set, not on two separate topics. After help is presented the user is placed in the help system at the deepest level containing subtopics.
For example, after typing
help set maxhits
and being shown the information for that topic the user is placed at the level set in the help hierarchy.
list [<pattern>] Produce a list of sites whose contents are contained in the archie database. With no argument all the sites are listed. If given, the <pattern> argument is interpreted as a regular expression (See "REGULAR EXPRESSIONS" below) against which to match site names: only those names matching are printed. The format of the output can be selected through the output format variable.
Note that the numerical (IP) address associated with a site name was valid at the last time the site was updated in the archie database but may have been changed subsequently. Furthermore, the listed IP address is the primary address as listed in the Domain Name System (secondary addresses are not stored).
Example:
list
lists all sites in the database, while
list .de$
lists all German sites.
mail <address> Mail the result of the last command that produced out- put (eg. find, whatis, list) to <address>. This must be a vaid email address.
manpage [ roff | ascii ] Display the archie manual page (this file). The optional arguments specify the format of the returned document. roff specifies UNIX troff(1) format while ascii specifies plain, preformatted ASCII output. With no arguments it defaults to ascii.
domains Asks the current server for the list of the archie psuedo-domains that it supports. See the entry for the match domain variable below. This command takes no arguments.
Example:
domains
requests the list of pseudo-domains from the server. The result looks (in part) something like this:
africa Africa za anzac OZ & New Zealand au:nz asia Asia kr:hk:sg:jp:cn:my:tw:in centralamerica Central America sv:gt:hn easteurope Eastern Europe bg:hu:pl:cs:ro:si:hr mideast Middle East eg:.il:kw:sa northamerica North America usa:ca:mx scandinavia Scandinavia no:dk:se:fi:ee:is southamerica South American ar:bo:br:cl:co:cr:cu:ec:pe usa United States edu:com:mil:gov:us westeurope Western Europe westeurope1:westeurope2 world The World world1:world2
The first column gives the names of pseduo-domains sup-
ported by the server. The second gives the "natural
language" description of the pseudo-domain and the
third column is the actual definitions of those
domains. Thus here the "asia" domain is comprised of
the Domain Name System country codes for Korea ("kr"),
Hong Kong ("hk"), Singapore ("sg") etc. Pseudo-domains
may also be constructed from other pseudo-domains: thus
one component of the the "northamerica" domain is
itself constructed from the "usa" pseudo-domain.
motd Re-display the "message of the day", which is normally printed when the user initially logs on to the client (in the case of the interactive interface) or at the start of the returned message (in the email interface).
servers Display a list of all publicly accessible archie servers worldwide. The names of the hosts, their IP addresses and geographical locations are listed.
set <variable-name> [<value>] Set the specified variable. Variables are used to con- trol various aspects of the way archie operates; the interpretation of <pattern> arguments, the format of output from various commands, etc. See the section below on variables for a description of each one as well as the entries for unset and show.
show [<variable-name> ...] Without any argument, display the status of all the user-settable variables, including such information as its type (boolean, numeric, string), whether or not it is set and its current value (if its type requires a value). Otherwise show the status of each of the specified arguments.
Example:
show maxhits
site <sitename> This command is currently unimplemented under version 3 of the archie system.
unset variable Remove any value associated with the specified vari- able. This may cause counter-intuitive behavior in some cases; for example, if maxhits is not defined by the user, the find command will print the internal default number of matches rather than an unlimited number of matches.
version Print the current version of the client.
whatis <substring> Search the Software Description Database for the given substring, ignoring case. This database consists of names and short descriptions of many software packages, documents (like RFCs and educational material), and data files stored on the Internet.
Example:
whatis uucp
in part gives as a result:
findpath.sh UUCP Pathfinder logfile-stats UUCP LOGFILE analyzer mapstats UUCP map statistics pro- gram
Variable Types The behavior of archie can be modified by certain variables, the values of which may be changed using the set command, or removed entirely by the unset command. There are three variable types:
boolean (Set or unset)
numeric (Integer within a defined range)
string (String of characters which may or may not be restricted).
If the value of a string variable should con- tain leading or trailing spaces then it should be quoted. Two ways of quoting text are to surround it with a pair of double quotes (`"'), or to precede individual char- acters with a backslash (`\'). (A double quote, or a backslash may itself be quoted by preceding it by a backslash.) The resulting value is that of the string with the quotes stripped off.
Numeric Variables maxhits Allow the find command to generate at most the speci- fied number of matches (permissible range: 0-1000; default: 100).
Example:
set maxhits 100
halts prog after 100 matches have been found in total.
maxhitspm Across all the anonymous FTP archives on the Internet (and even on one single anonymous FTP archive) many files will have the same name. For example, if you search for a very common filename like "README" you can get hundreds even thousands of matches. You can limit the number of files with the same name through this variable. For example,
set maxhitspm 100
tells the system only 100 files with the same name. Note that the overall maximum number of files returned is still controlled with the 'maxhits' variable.
maxmatch This variable will limit the number filenames returned. For example, if maxmatch is set to 2 and you perform a substring search for the string "etc", and the database contains filenames "etca", "betc" and "detc" only the filenames "etca" and "betc" will be returned. However, depending on the values of maxhitspm and maxhits you will get back a number of actual files with those names. Example:
set maxmatch 20
max split size Approximate maximum size, in bytes, of a file to be mailed to the user. Any output larger than this will be split in pieces of about this size. This can be set by the user in the range 1024 to ~2Gb with a default of 51200 bytes.
String Variables compress The kind of data compression the user can specify when mailing back output. Currently allowed values are none and compress (standard UNIX compress(1),withadefaultof
encode The type of post-compression encoding the user can specify when mailing back output. Currently allowed values are none and uuencode, with a default of none. Note that this variable is ignored unless compression is enabled (via the compress) variable.
language Allows the user to specify the language in which the help, etc. is presented. Currently the default value is english.
mailto If the mail command is issued with no arguments, mail the output of the last command to the address specified by this string variable. Initially this variable is unset.
Example:
set mailto user@frobozz.com
Conventional Internet addressing styles are understood. BITNET sites should use the convention:
user@sitename.bitnet
UUCP addresses can be specified as
user@sitename.uucp
match domain This variable allows users to restrict the scope of their search based upon the Fully Qualified Domain Names (FQDN) of the anonymous FTP sites being searched. In this way, the user can specify a colon-separated list of domain names to which all returned sites must match. Each component in the list is taken as the rightmost part of the FQDN. For example,
set match domain ca:internic.net:harvard.edu
means that the names of all returned sites must end in "ca" (Canada), "internic.net" (sites in the Internet NIC) or "harvard.edu" (sites at Harvard University).
While these are all real domain names, listing all pos- sible combinations for say, the USA, would quickly become tedious (and if you think that is bad, try list- ing all the countries on the Internet in Europe). To aid in this problem, the archie system has the concept of pseudo-domains to allow users to use a shorthand notation when using this facility. These pseudo-domains are defined on a server-by-server basis and you can use the domains command to query your current server for its list of predefined pseudo-domains.
A pseudo-domain is a list of real DNS domain names
and/or a list of other pseudo-domains. For example, the
archie administrator on the server could define the
pseudo-domain
"usa"
to be
"edu:mil:com:gov:us"
If this definition existed on the server, then you could
set match domain usa
which would be the same as saying
set match domain edu:mil:com:gov:us
In addition, the server administrator may define
"northamerica"
to be
"usa:ca:mx"
meaning that "northamerica" is composed of the psuedo- domain "usa" and the real domains "ca" (Canada) and "mx" (Mexico). This process can be repeated for 20 lev- els (more than sufficient for any naming scheme). By using the domains command you can determine what pseudo-domains your current server supports.
match path Sometimes you only would like your search (using the find) command to look for files or directories with a certain set of names in their full path.
For example, many anonymous FTP site administrators will put software packages for the MacIntosh in a path containing the name "mac" or "macintosh". Another exam- ple is when a document exists in several formats and you are only looking for the PostScript version. You can guess that the file may end in ".ps" or it maybe in a directory called "ps" or "PostScript".
This is usually guesswork, but is is useful to have the archie system only look for files or directories with particular components in their path name.
This variable allows you to do this. The arguments are a colon-separated list of possible path name com- ponents. In the last example above, saying
set match path ps:postscript
will restrict the search only to match those files or directories which have the strings "ps" or "postscript" in their path.
The comparison is always case-insensitive (regardless of the value of the match variable) and there is a log- ical OR connecting the components so that the above statement says: "find only files which have 'ps' OR 'postscript' in their path". If either component matches then the condition is satisfied.
output format Affects the way the output of find and list is displayed. User settable, with valid values of machine (machine readable format), terse and verbose, with a default of verbose.
search The type of search done by the find (or prog) command. User settable with a range of exact, regex, sub, sub- case, exact regex, exact sub and exact subcase with a default of sub. (The exact <x> types cause it to try exact first, then fall back to type <x> if no matches are found). The values have the following meanings:
exact Exact match (the fastest method). A match occurs if the file (or directory) name in the database corresponds exactly to the user-given substring (including case).
For example, this type of search could be used to locate all files called xlock.tar.Z
regex Allow user-specified (search) strings to take the form of ed(1) regular expressions.
Note: unless specifically anchored to the begin- ning (with ^) or end (with $) of a line, ed(1) regular expressions (effectively) have ``.*'' prepended and appended to them. For example, it is not necessary to type
find .*xnlock.*
because
find xnlock
suffices. In this instance, the regex match is equivalent a simple substring match. Those unfam- iliar with regular expressions should refer to the section entitled REGULAR EXPRESSIONS which appears below.
sub Substring (case insensitive). A match occurs if the file (or directory) name in the database con- tains the user-given substring, without regard to case.
Example:
The pattern: is
matches any of the following:
islington this poison
subcase Substring (case sensitive). As above, but taking case as significant.
Example:
The pattern:
TeX
will match:
LaTeX
but neither of the following:
Latex TExTroff
server the Prospero server to which the client connects when find or list commands are invoked. User settable, with a default value of localhost.
sortby Set the method of sorting to be applied to output from the find command. Typing the keyboard interrupt char- acter (generally Cntl-C on UNIX hosts) aborts a search. Unlike previous versions of the archie system, version 3 does not allow partial results. The output phase may be aborted by typing the abort character a second time. The five permitted methods (and their associated reverse orders) are:
none Unsorted (default; no reverse order, though rnone is accepted)
filename Sort files/directories by name, using lexical order (reverse order: rfilename)
hostname Sort on the archive hostname, in lexical order (reverse order: rhostname)
size Sort by size, largest files/directories first (reverse order: rsize)
time Sort by modification time, with the most recent file/directory names first (reverse order: rtime)
THE INTERACTIVE (TELNET) INTERFACE
The interactive interface accepts the following commands and variables in addtion to those listed above.
Commands stty [[<option> <character>] ...] This command allows the user to change the interpreta- tion of specified characters, in order to match their particular terminal type. At the moment only erase is recognized as an <option>. (Typically, <character> is a control character and may be specified as a pair of characters (e.g. control-h as the pair '^' followed by 'h'), the character itself (literal), or as a quoted pair or literal.
Without any arguments the command displays the current values of the recognized options.
mail [<address>] The output of the previous successful command (i.e. an invocation of find, list or whatis that produced out- put) is mailed to the specified electronic mail address. If no <address> is given the contents of the mailto variable are used. If this variable is not set then an error occurs, and nothing is mailed, although the output is still available to be mailed.
Example:
mail user1@hello.edu
Conventional Internet addressing styles are understood. BITNET sites should use the convention:
user@sitename.bitnet
UUCP addresses can be specified as
user@sitename.uucp
pager This command is included only for backward compatibil- ity. It has the same effect as set pager. Its use is discouraged and it will be removed in a future release.
nopager This command is included only for backward compatibil- ity. It has the same effect as unset pager. Its use is discouraged and it will be removed in a future release.
Variables autologout Set the length of idle time (in minutes) allowed before automatic logout (permissible range: 1-300; default: 60).
Example:
set autologout 45
logs the user out after 45 minutes of idle time.
pager Filter all output through the default pager (default: unset). When using the pager you may also want to set the term variable to your terminal type (see term vari- able).
Example:
set pager
status When set this variable will cause the system to report the position in the queue of your request on the server. In addition, it will display the estimated time to completion of your request. This estimate is based in an average of the amount of times similar queries have taken in the past several minutes. The variable also controls the display of a "spinner" during the database search, which indicates that we are awaiting results from the Prospero server. Set by default.
term Specify the type of terminal in use (and optionally, its size in rows and columns). This information is used by the pager.
The usage is:
set term <terminal-type> [<#rows> [<#columns>]]
The terminal type is mandatory, but the number of rows and columns is optional; specify either rows only, or both rows and columns (default: 24 rows, 80 columns). The default value for this variable is dumb. However it may be set automatically through the telnet protocol negotiation. Examples:
set term vt100 set term xterm 60 set term xterm 24 100
THE EMAIL INTERFACE
The archie email interface currently accepts the following commands in addition to those listed in the COMMANDS section above.
path <address> is an alias for
set mailto <address>
quit Ignore any further lines past this point in the mail. This is generally not needed, but can be used to prevent the system from interpreting signatures etc. as archie commands.
The Subject: line in incoming mail is processed as if it were part of the main message body.
A message not containing a valid request will be treated as a help request.
REGULAR EXPRESSIONS
Regular expressions follow the conventions of the ed(1) com- mand, allowing sophisticated pattern matching. In the fol- lowing discussion, the string containing a regular expres- sion will be called the ``pattern'', and the string against which it is to be matched is called the ``reference string''. Regular expressions imbue certain characters with special meaning, providing a quoting mechanism to remove this special meaning when required.
The rules governing regular expression are:
c A character c matches itself unless it has been assigned a special meaning as listed below. A special character loses its special meaning when preceded by the character '\'. This does not apply to '{', which is non-special until it is so treated. Thus, although '*' normally has special meaning, the string '\*' matches itself.
Example:
The pattern
acdef
matches any of the following:
s83acdeffff acdefsecs acdefsecs
but neither of the following:
accdef aacde1f
Example:
Normally the characters '*' and '$' are special, but the pattern
a\*bse\$
acts as above. Any reference string containing:
a*bse$
as a substring will be flagged as a match.
. A period (known as a wildcard character) matches any character except the newline character.
Example:
The pattern
....
will match any 4 characters in the reference string, except a newline character.
^ A caret (^) appearing at the beginning of a pattern requires that the reference string must start with the specified pattern (an escaped caret, or a caret appear- ing elsewhere in the pattern, is treated as a non- special character).
Example:
The pattern
^efghi
The pattern will match only those reference strings starting with efghi; thus, it will match either of the following:
efghi efghijlk
but not:
abcefghi
$ A dollar sign ($) appearing at the end of a pattern requires that the pattern appear at the end of a refer- ence string (an escaped dollar sign, or a dollar sign appearing elsewhere, is treated as a regular charac- ter).
Example:
The pattern
efghi$
Will match either of the following:
efghi abcdefghi
but not:
efghijkl
[string]
Match any single character within the brackets. The
caret (^) has a special meaning if it is the first
character in the series: the pattern will match any
character other than one in the list.
Example:
The pattern
[^abc]
Will match any character except one of:
a b c
To match a right bracket (]) in the list, put it first, as in:
[]ab01]
A caret appearing anywhere but the in first position is treated as a regular character. The minus (-) character is special within square brack- ets. It is used to define a range of ASCII characters to be matched. For example, the pattern:
[a-z]
matches any lower case letter. The minus can be made non-special by placing it first or last within the square brackets. The characters '$', '*' and '.' are not special within square brackets.
Example:
The pattern
[ab01]
matches a single occurrence of a character from the set:
a b 0 1
Example:
The pattern
[^ab01]
will match any single character other than one from the
set:
a b 0 1
Example :
The pattern
[a0-9b]
matches one of the characters:
a b
or a digit between 0 and 9, inclusive. Example :
The pattern
[^a0-9b.$]
matches any single character which is not in the set:
a b . $
or a digit between 0 and 9, inclusive.
- Match zero or more occurrences of an immediately
preceding regular expression.
Example:
The pattern
a*
matches zero or more occurrences of the character:
a
Example:
The pattern
[A-Z]*
matches zero or more occurrences of the upper case alphabet.
\{m\} Match exactly m occurrences of a preceding regular expression, where m is a non-negative integer between 0 and 255 (inclusive).
Example:
The pattern
ab\{3\}
matches any substring in the reference string consist- ing of the character `a' followed by exactly three `b' characters.
\{m,\} Match at least m occurrences of the preceding regular expression.
Example:
The pattern
ab\{3,\}
matches any substring in the reference string of the character `a' followed by at least three `b' charac- ters.
\{m,n\} Match between m and n occurrences of the preceding reg- ular expression (where n is a non-negative integer between 0 and 255, and n>m).
Example:
The pattern
ab\{3,5\}
matches any substring in the reference string consist- ing of the character `a' followed by at least three but at most five `b' characters.
Tips for Using Regular Expressions 1) When matching a substring it is not necessary to use the wildcard character to match the part of the refer- ence string preceding and following the substring.
Example:
The pattern
abcd
will match any reference string containing this pat- tern. It is not necessary to use
.*abcd.*
as the pattern.
2) In order to constrain a pattern to the entire reference pattern, use the construction:
^pattern$
3) The '[]' operator provides an easy mechanism to obtain case insensitivity. For example, to match the word:
hello
regardless of case, use the pattern:
[Hh][Ee][Ll][Ll][Oo]
THE ARCHIE DATABASE
The archie database subsystem maintains a list of about 900 Internet anonymous ftp(1) archive sites of approximately 2.1 million files containing 170 Gigabytes (that is, 170,000,000,000 bytes) of information. The current database requires about 250 MB of disk storage.
SEE ALSO
bitftp (1L), ftp(1), telnet(1), archie(1), xarchie(1)
AUTHORS
Bunyip Information Systems Inc., Montreal Canada (info@bunyip.com). Original manual page by R. P. C. Rodgers, UCSF School of Pharmacy, San Francisco, California 94143 (rodgers@maxwell.mmwb.ucsf.edu), Nelson H. F. Beebe (beebe@math.utah.edu), and Alan Emtage (bajan@bunyip.com).
archie>
/home/gen.uk/domains/wiki.gen.uk/public_html/data/pages/archive/internet/archie.man.txt · Last modified: 2002/06/22 04:02 by 127.0.0.1