GENWiki

Premier IT Outsourcing and Support Services within the UK

User Tools

Site Tools


rfc:rfc1625

Network Working Group M. St. Pierre Request for Comments: 1625 WAIS, Inc. Category: Informational J. Fullton

                                                                 CNIDR
                                                             K. Gamiel
                                                                 CNIDR
                                                            J. Goldman
                                               Thinking Machines Corp.
                                                              B. Kahle
                                                            WAIS, Inc.
                                                              J. Kunze
                                                           UC Berkeley
                                                             H. Morris
                                                            WAIS, Inc.
                                                      F. Schiettecatte
                                                         FS Consulting
                                                             June 1994
                       WAIS over Z39.50-1988

Status of this Memo

 This memo provides information for the Internet community.  This memo
 does not specify an Internet standard of any kind.  Distribution of
 this memo is unlimited.

1. Introduction

 The network publishing system, Wide Area Information Servers (WAIS),
 is designed to help users find information over a computer network.
 The principles guiding WAIS development are:
       1. A wide-area networked-based information system for searching,
          browsing, and publishing.
       2. Based on standards.
       3. Easy to use.
       4. Flexible and growth oriented.
 From this basis, a large group of developers, publishers, standards
 bodies, libraries, government agencies, schools, and users have been
 helping further the WAIS system.
 The WAIS software architecture has four main components: the client,
 the server, the database, and the protocol.  The WAIS client is a
 user-interface program that sends requests for information to local
 or remote servers.  Clients are available for most popular desktop
 environments.  The WAIS server is a program that services client

IIIR Working Group [Page 1] RFC 1625 WAIS over Z39.50-1988 June 1994

 requests, and is available on a variety of UNIX platforms.  The
 server generally runs on a machine containing one or more information
 sources, or WAIS databases.  The protocol, Z39.50-1988, is used to
 connect WAIS clients and servers and is based on the 1988 Version of
 the NISO Z39.50 Information Retrieval Service and Protocol Standard.
 The goal of the WAIS network publishing system is to create an open
 architecture of information clients and servers by using a standard
 computer-to-computer protocol that enables clients to communicate
 with servers.
 WAIS development began in October 1989 with the first Internet
 release occurring in April 1991.  From the beginning, WAIS committed
 to use the Z39.50-1988 standard as the information retrieval protocol
 between WAIS clients and servers.  The implementation is still in use
 today by existing WAIS clients and servers resulting in over 50,000
 users of Z39.50-1988 on the Internet.

2. Purpose

 The purpose of this memo is to initiate a discussion for a migration
 path of the WAIS technology from Z39.50-1988 Information Retrieval
 Service Definitions and Protocol Specification for Library
 Applications [1] to Z39.50-1992 [2] and then to Z39.50-1994 [3].  The
 purpose of this memo is not to provide a detailed implementation
 specification, but rather to describe the high-level design goals and
 functional assumptions made in the WAIS implementation of Z39.50-
 1988.  WAIS use of Z39.50-1992 and Z39.50-1994 standards will be the
 subject of future RFCs.

3. Historical Design Goals of WAIS

 As an aid to understanding the original WAIS implementation and its
 use of Z39.50-1988, the historical design goals of WAIS are presented
 in this section.  Included with each goal is a brief description of
 the assumptions used to meet these design goals.
       1. Provide users access to bibliographic and non-bibliographic
          information, including full-text and images.
 Because Z39.50-1988 grew out of the bibliographic community,
 additional assumptions with the protocol were required to serve non-
 bibliographic information.  They were also necessary to serve
 documents existing in multiple formats (e.g., rtf, postscript, gif,
 etc.).
       2. Keep the client/server interface simple and independent of
          changes in the functionality of the server.

IIIR Working Group [Page 2] RFC 1625 WAIS over Z39.50-1988 June 1994

 To achieve this, the text string entered by the user was transmitted
 to the server without parsing the string into a Type-1 RPN (reverse-
 polish notation) query, as is common for bibliographic applications.
 Instead WAIS defined a new Type-3 query containing the text string.
 In this way, knowledge of the Z39.50 Attributes supported by the
 server was no longer required by the client or the user, as is true
 of many existing Z39.50 implementations.  In addition, the client
 software did not require modification to support the evolving
 functionality of the server.
       3. Provide relevance feedback capability.
 Relevance feedback is the ability to select a document, or portion of
 a document, and find a set of documents similar to the selection.
 WAIS included documents used in relevance feedback as part of the
 Type-3 query.
       4. Permit the server to operate in a stateless manner.
 A WAIS server was designed to be "stateless", meaning that search
 result sets were not stored by the server.  In Z39.50 terms, the
 server exercised its right to unilaterally delete a result set as
 soon as it sent the search response.  For this reason, the Present
 Facility of Z39.50 was not used, and retrievals were performed using
 the Search Facility.  Relaxing this constraint in future
 implementations may prove the most prudent path.
       5. Provide the ability for a client to retrieve documents in
          pieces.
 Because retrieval of a portion of a document could be done several
 ways with Z39.50-1988, specific assumptions were made to implement
 this functionality.  Accessing a portion of a document was required
 for both retrieval and for relevance feedback.
       6. Run over TCP.
 The Z39.50-1988 standard was designed to run in the application layer
 using the presentation services provided by the Open Systems
 Interconnection (OSI) Reference Model.  Due to the popularity of
 TCP/IP and the Internet, WAIS was designed to run over TCP.  Use of
 Z39.50 over TCP is described in [4].

4. WAIS Implementation of Z39.50-1988

 By working with the Z39.50 Implementors Group (ZIG), the WAIS
 developers used a recommended subset of Z39.50-1988 and specific
 assumptions to fulfill its requirements.  Over time, many of these

IIIR Working Group [Page 3] RFC 1625 WAIS over Z39.50-1988 June 1994

 requirements have then gone into the definition of subsequent
 versions of Z39.50.  As new requirements become apparent, WAIS will
 document any additional assumptions and work with the ZIG in
 developing extensions.
 WAIS supported the Init and Search Facilities of Z39.50-1988.  Both
 search and retrieval were implemented using the Search Facility, as
 described in this section.
 Search was initiated by the client with a Search Request APDU
 (Application Protocol Data Unit) using a Type-3 query.  The query
 contained two main fields:
       1. The "seed words", or text, typed by the user.
       2. A list of document objects, where a document object is a
          full document, or portion thereof, to be used in relevance
          feedback.  Each document object contains a document
          identifier (Doc-ID) [5], type, chunk-code, and start and
          end locations.  The Doc-ID and type specify the location and
          format, respectively, of the document.  The chuck-code
          determines the unit of measure for the start and end
          locations.  Examples of chunk-codes used include
          byte, line, paragraph, and full document.  If the chunk code
          is a full document, the start and end locations are ignored.
 A Search Response APDU returned by the server contained a relevance
 ranked list of records, or WAIS Citations.  A WAIS Citation refers to
 a document on the server.  Each WAIS Citation contains the following
 fields:
       1. Headline - a set of words that convey the main idea of the
          document.
       2. Rank - the numerical score of the document based on its
          relevance to the query, normalized to a top score of 1000.
       3. List of available formats - e.g. text, postscript, tiff, etc.
       4. Doc-ID - the location of the document.
       5. Length - the length of the document in bytes.
 The number of WAIS Citations returned was limited by the preferred
 message size negotiated during the Init.
 Retrieval of a document was initiated by the client with a Search
 Request APDU using a Type-1 query.  The query contained up to four
 terms:
       1. Term: Doc-ID
          Use Attribute: system-control-number      code = "un"
          Relation Attribute: equal                 code = "re"

IIIR Working Group [Page 4] RFC 1625 WAIS over Z39.50-1988 June 1994

       2. Term: the requested document format
          Use Attribute: data-type                  code = "wt"
          Relation Attribute: equal                 code = "re"
       3. Term: the start location
          Use Attribute: paragraph, line, byte      code = "wp", "wl",
                                                           "wb"
          Relation Attribute: greater-than-or-equal code = "ro"
       4. Term: the end location
          Use Attribute: paragraph, line, byte      code = "wp", "wl",
                                                           "wb"
          Relation Attribute: less-than             code = "rl"
 Because full-text and images were often larger in size than the
 receive buffer of the client, clients were designed to optionally
 retrieve documents in chunks, specifying the start and end positions
 of the chunk in the query.  An example of a fully-specified retrieval
 query is:
 query = ( ( use = "un", relation = "re", term = <Doc-ID> )
           AND
           ( use = "wt", relation = "re", term = postscript )
           AND
           ( use = "wb", relation = "ro", term = 0 )
           AND
           ( use = "wb", relation = "ro", term = 2000 )
          )
 A retrieval response was issued by the server with a Search Response
 APDU. In this case a single record corresponding to the requested
 document, or portion thereof, was returned in the specified format.

5. Security Considerations

 Security issues are not discussed in this memo.

6. References

 [1] National Information Standards Organization (NISO).  American
     National Standard Z39.50, Information Retrieval Service
     Definition and Protocol Specifications for Library Applications,
     New Brunswick, NJ, Transaction Publishers; 1988.
 [2] ANSI/NISO Z30.50-1992 (version 2) Information Retrieval Service
     and Protocol: American National Standard, Information Retrieval
     Application Service Definition and Protocol Specification for
     Open Systems Interconnection, 1992.

IIIR Working Group [Page 5] RFC 1625 WAIS over Z39.50-1988 June 1994

 [3] Z39.50 Version 3: Draft 8", October 1993.  Maintenance Agency
     Reference: Z39.50MA-034.
 [4] Lynch, C., "Using the Z39.50 Information Retrieval Protocol
     in the Internet Environment", Work in Progress, November 1993.
 [5] "Document Identifiers, or International Standard Book Numbers
     for the Electronic Age", Brewster Kahle, Thinking Machines
     Corporation, see URL=<ftp://wais.com/pub/protocol/doc-ids.txt>,
     September 1991.

7. Authors' Addresses

 Margaret St. Pierre
 WAIS Incorporated
 1040 Noel Drive
 Menlo Park, California  94025
 Phone: (415) 327-WAIS
 Fax:   (415) 327-6513
 EMail: saint@wais.com
 Jim Fullton
 Clearinghouse for Networked Information
 Discovery & Retrieval
 3021 Cornwallis Road
 Research Triangle Park, North Carolina  27709-2889
 Phone: (919)-248-9247
 Fax:   (919)-248-1101
 EMail: jim.fullton@cnidr.org
 Kevin Gamiel
 Clearinghouse for Networked Information
 Discovery & Retrieval
 3021 Cornwallis Road
 Research Triangle Park, North Carolina  27709-2889
 Phone: (919)-248-9247
 Fax:   (919)-248-1101
 EMail: kevin.gamiel@cnidr.org

IIIR Working Group [Page 6] RFC 1625 WAIS over Z39.50-1988 June 1994

 Jonathan Goldman
 Thinking Machines Corporation
 1010 El Camino Real, Suite 310
 Menlo Park, California  94025
 Phone: (415) 329-9300 x229
 Fax:   (415) 329-9329
 EMail: jonathan@think.com
 Brewster Kahle
 WAIS Incorporated
 1040 Noel Drive
 Menlo Park, California  94025
 Phone: (415) 327-WAIS
 Fax:   (415) 327-6513
 EMail: brewster@wais.com
 John A. Kunze
 UC Berkeley
 289 Evans Hall
 Berkeley, California  94720
 Phone: (510) 642-1530
 Fax: (510) 643-5385
 EMail: jak@violet.berkeley.edu
 Harry Morris
 WAIS Incorporated
 1040 Noel Drive
 Menlo Park, California  94025
 Phone: (415) 327-WAIS
 Fax:   (415) 327-6513
 EMail: morris@wais.com
 Francois Schiettecatte
 FS Consulting
 435 Highland Avenue
 Rochester, New York  14620
 Phone: (716) 256-2850
 EMail: francois@wais.com

IIIR Working Group [Page 7]

/home/gen.uk/domains/wiki.gen.uk/public_html/data/pages/rfc/rfc1625.txt · Last modified: 1994/06/07 21:01 by 127.0.0.1

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki