Network Working Group                                           R. Moats
Request for Comments: 2517                                      R. Huber
Category: Informational                                             AT&T
                                                           February 1999

       Building Directories from DNS: Experiences from WWWSeeker

Status of this Memo

 This memo provides information for the Internet community.  It does
 not specify an Internet standard of any kind.  Distribution of this
 memo is unlimited.

Copyright Notice

 Copyright (C) The Internet Society (1999).  All Rights Reserved.

Abstract

 There has been much discussion and several documents written about
 the need for an Internet Directory.  Recently, this discussion has
 focused on ways to discover an organization's domain name without
 relying on use of DNS as a directory service.  This memo discusses
 lessons that were learned during InterNIC Directory and Database
 Services' development and operation of WWWSeeker, an application that
 finds a web site given information about the name and location of an
 organization.  The back end database that drives this application was
 built from information obtained from domain registries via WHOIS and
 other protocols.  We present this information to help future
 implementors avoid some of the blind alleys that we have already
 explored.  This work builds on the Netfind system that was created by
 Mike Schwartz and his team at the University of Colorado at Boulder
 [1].

1. Introduction

 Over time, there have been several RFCs [2, 3, 4] about approaches
 for providing Internet Directories.  Many of the earlier documents
 discussed white pages directories that supply mappings from a
 person's name to their telephone number, email address, etc.

 More recently, there has been discussion of directories that map from
 a company name to a domain name or web site.  Many people are using
 DNS as a directory today to find this type of information about a
 given company.  Typically when DNS is used, users guess the domain
 name of the company they are looking for and then prepend "www.".
 This makes it highly desirable for a company to have an easily
 guessable name.

 There are two major problems here.  As the number of assigned names
 increases, it becomes more difficult to get an easily guessable name.
 Also, the TLD must be guessed as well as the name.  While many users
 just guess ".COM" as the "default" TLD today, there are many two-
 letter country code top-level domains in current use as well as other
 gTLDs (.NET, .ORG, and possibly .EDU) with the prospect of additional
 gTLDs in the future.  As the number of TLDs in general use increases,
 guessing gets more difficult.

 Between July 1996 and our shutdown in March 1998, the InterNIC
 Directory and Database Services project maintained the Netfind search
 engine [1] and the associated database that maps organization
 information to domain names. This database thus acted as the type of
 Internet directory that associates company names with domain names.
 We also built WWWSeeker, a system that used the Netfind database to
 find web sites associated with a given organization.  The experience
 gained from maintaining and growing this database provides valuable
 insight into the issues of providing a directory service.  We present
 it here to allow future implementors to avoid some of the blind
 alleys that we have already explored.

2. Directory Population

2.1 What to do?

 There are two issues in populating a directory: finding all the
 domain names (building the skeleton) and associating those domains
 with entities (adding the meat).  These two issues are discussed
 below.

2.2 Building the skeleton

 In "building the skeleton", it is popular to suggest using a variant
 of a "tree walk" to determine the domains that need to be added to
 the directory.  Our experience is that this is neither a reasonable
 nor an efficient proposal for maintaining such a directory.  Except
 for some infrequent and long-standing DNS surveys [5], DNS "tree
 walks" tend to be discouraged by the Internet community, especially
 given that the frequency of DNS changes would require a new tree walk
 monthly (if not more often).  Instead, our experience has shown that
 data on allocated DNS domains can usually be retrieved in bulk
 fashion with FTP, HTTP, or Gopher (we have used each of these for
 particular TLDs).  This has the added advantage of both "building the
 skeleton" and "adding the meat" at the same time.  Our favorite
 method for finding a server that has allocated DNS domain information
 is to start with the list maintained at
 http://www.alldomains.com/countryindex.html and go from there.
 Before this was available, it was necessary to hunt for a registry
 using trial and error.

 When maintaining the database, existing domains may be verified via
 direct DNS lookups rather than a "tree walk." "Tree walks" should
 therefore be the choice of last resort for directory population, and
 bulk retrieval should be used whenever possible.
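
 As a minimal sketch of verifying existing entries by direct lookup
 rather than by a tree walk, the Python fragment below checks each
 known name individually.  The function names and the injectable
 "resolver" parameter are illustrative assumptions, not part of the
 system described in this memo.  An address lookup is also only an
 approximation: a delegated domain may have no address record of its
 own, so a production check would more likely query NS or SOA records.

```python
import socket


def domain_resolves(name, resolver=socket.getaddrinfo):
    """Return True if `name` currently resolves to at least one address."""
    try:
        return len(resolver(name, None)) > 0
    except socket.gaierror:
        # Resolution failed: treat the name as no longer verifiable.
        return False


def verify_domains(names, resolver=socket.getaddrinfo):
    """Split known domain names into (still_resolving, failed) lists."""
    alive, dead = [], []
    for name in names:
        (alive if domain_resolves(name, resolver) else dead).append(name)
    return alive, dead
```

 Names that land in the second list are candidates for re-checking
 against the registry data before they are dropped, since a single
 failed lookup may be transient.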

2.3 Adding the meat

 A possibility for populating a directory ("adding the meat") is to
 use an automated system that makes repeated queries using the WHOIS
 protocol to gather information about the organization that owns a
 domain.  The queries would be made against a WHOIS server located
 with the above method. At the conclusion of the InterNIC Directory
 and Database Services project, our backend database contained about
 2.9 million records built from data that could be retrieved via
 WHOIS.  The entire database contained 3.25 million records, with the
 additional records coming from sources other than WHOIS.

 In our experience this information contains many factual and
 typographical errors and requires further examination and processing
 to improve its quality.  Further, TLD registrars that support WHOIS
 typically only support WHOIS information for second level domains
 (e.g. ne.us) as opposed to lower level domains (e.g.
 windrose.omaha.ne.us).  Also, there are TLDs without registrars, TLDs
 without WHOIS support, and still other TLDs that use other methods
 (HTTP, FTP, gopher) for providing organizational information.  Based
 on our experience, an implementor of an internet directory needs to
 support multiple protocols for directory population.  An automated
 WHOIS search tool is necessary but not sufficient.
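
 The automated WHOIS query step can be sketched as follows.  This is
 an illustration under stated assumptions, not the tool used for the
 project: the function names are hypothetical, and the "Key: value"
 reply layout assumed by parse_fields is only one of many formats that
 registries return, so a real tool would need per-registry parsing.
 The wire exchange itself (TCP port 43, one query line ended by CRLF,
 reply read until the server closes the connection) follows the WHOIS
 protocol.

```python
import socket


def whois_query(server, query, port=43, timeout=10.0):
    """Send one WHOIS query and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    # latin-1 maps every byte to a character, so decoding cannot fail.
    return b"".join(chunks).decode("latin-1")


def parse_fields(reply):
    """Collect 'Key: value' lines from a reply into a dict of lists."""
    fields = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields
```

 Keeping the parsing separate from the transport makes it easier to
 feed the same record-building code from the other sources mentioned
 above (HTTP, FTP, gopher).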

3. Directory Updating: Full Rebuilds vs Incremental Updates

 Given the size of our database in April 1998 when it was last
 generated, a complete rebuild of the database that is available from
 WHOIS lookups would require between 134.2 and 167.8 days just for
 WHOIS lookups from a Sun SPARCstation 20. This estimate does not
 include other considerations (for example, inverting the token tree
 required about 24 hours processing time on a Sun SPARCstation 20)
 that would increase the amount of time to rebuild the entire
 database.
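
 The arithmetic behind an estimate of this kind can be sketched in a
 few lines of Python.  The per-lookup times of 4 and 5 seconds used
 here are an assumption chosen because they reproduce the range above
 for about 2.9 million WHOIS-derived records; they are not
 measurements reported in this memo.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rebuild_days(records, seconds_per_lookup):
    """Days needed to perform one sequential lookup per record."""
    return records * seconds_per_lookup / SECONDS_PER_DAY


# Assumed inputs: about 2.9 million records, 4 to 5 seconds per lookup.
low = rebuild_days(2_900_000, 4.0)
high = rebuild_days(2_900_000, 5.0)
print(f"{low:.1f} to {high:.1f} days")
```

 With exactly 2.9 million records this gives roughly 134.3 and 167.8
 days, which matches the stated range to within the rounding of the
 record count.
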
 Whether this is feasible depends on the frequency of database updates
 provided.  Because of the rate of growth of allocated domain names
 (150K-200K new allocated domains per month in early 1998), we
 provided monthly updates of the database. To rebuild the database
 each month (based on the above time estimate) would require between 3
 and 5 machines to be dedicated full time (independent of machine
 architecture).  Instead, we checkpointed the allocated domain list
 and rebuilt on an incremental basis during one weekend of the month.
 This allowed us to complete the update on between 1 and 4 machines (3
 Sun SPARCstation 20s and a dual-processor Sparcserver 690) without
 full dedication over a couple of days.  Further, by coupling
 incremental updates with periodic refresh of existing data (which can
 be done during another part of the month and doesn't require full
 dedication of machine hardware), older records would be periodically
 updated when the underlying information changes.  The tradeoff is
 timeliness and accuracy of data (some data in the database may be
 old) against hardware and processing costs.
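
 One way to picture the incremental approach is as a comparison of
 the checkpointed domain list against the current one.  The sketch
 below is a simplified illustration; the function name and the
 three-way split are assumptions and do not describe the actual
 scripts used.

```python
def plan_incremental_update(checkpoint, current):
    """Compare the checkpointed domain list with the current one."""
    old, new = set(checkpoint), set(current)
    return {
        "add": sorted(new - old),     # newly allocated: need a full lookup
        "remove": sorted(old - new),  # no longer allocated: drop the record
        "keep": sorted(old & new),    # unchanged: candidates for refresh
    }
```

 Only the "add" list requires lookups during the update weekend; the
 "keep" list can be refreshed in batches at other times of the month,
 which is where the tradeoff between timeliness and cost arises.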

4. Directory Presentation: Distributed vs Monolithic

 While a distributed directory is a desirable goal, we maintained our
 database as a monolithic structure.  Given past growth, it is not
 clear at what point migrating to a distributed directory becomes
 actually necessary to support customer queries.  Our last database
 contained over 3.25 million records in a flat ASCII file.  Searching
 was done by a Perl script over an inverted tree (also produced by a
 Perl script).  While admittedly primitive, this configuration
 supported over 200,000 database queries per month from our production
 servers.

 Increasing the database size only requires more disk space to hold
 the database and inverted tree. Of course, using database technology
 would probably improve performance and scalability, but we had not
 reached the point where this technology was required.
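
 The general idea of searching a flat file of records through an
 inverted token structure can be sketched as follows.  This Python
 fragment uses a simple in-memory inverted index (token to record
 numbers) as a stand-in; the layout of the actual inverted tree is not
 described in this memo, and the names and sample record format are
 assumptions made for illustration.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


def build_index(records):
    """Map each token to the set of record numbers that contain it."""
    index = defaultdict(set)
    for number, record in enumerate(records):
        for token in tokenize(record):
            index[token].add(number)
    return index


def search(index, records, query):
    """Return the records that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return [records[n] for n in sorted(hits)]
```

 Because the index is built once and queries only intersect small
 sets, lookup cost depends mainly on the number of matching records
 and not on the size of the flat file.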

5. Security Considerations

 The underlying data for the type of directory discussed in this
 document is already generally available through WHOIS, DNS, and other
 standard interfaces.  No new information is made available by using
 these techniques, though many types of search become much easier.  To
 the extent that easier access to this data makes it easier to find
 specific sites or machines to attack, security may be decreased.

 The protocols discussed here do not have built-in security features.
 If one source machine is spoofed while the directory data is being
 gathered, substantial amounts of incorrect and misleading data could
 be pulled into the directory and be spread to a wider audience.

 In general, building a directory from registry data will not open any
 new security holes since the data is already available to the public.
 Existing security and accuracy problems with the data sources are,
 however, likely to be amplified.

6. Acknowledgments

 The work described in this document was partially supported by the
 National Science Foundation under Cooperative Agreement NCR-9218179.

7. References

 [1] M. F. Schwartz, C. Pu.  "Applying an Information
     Gathering Architecture to Netfind: A White Pages Tool for a
     Changing and Growing Internet", University of Colorado Technical
     Report CU-CS-656-93.  December 1993, revised July 1994.
     URL:ftp://ftp.cs.colorado.edu/pub/cs/techreports/schwartz/Netfind
 [2] Sollins, K., "Plan for Internet Directory Services", RFC 1107,
     July 1989.
 [3] Hardcastle-Kille, S., Huizer, E., Cerf, V., Hobby, R. and S.
     Kent, "A Strategic Plan for Deploying an Internet X.500 Directory
     Service", RFC 1430, February 1993.
 [4] Postel, J. and  C. Anderson, "White Pages Meeting Report", RFC
     1588, February 1994.
 [5] M. Lottor, "Network Wizards Internet Domain Survey", available
     from http://www.nw.com/zone/WWW/top.html

8. Authors' Addresses

 Ryan Moats
 AT&T
 15621 Drexel Circle
 Omaha, NE 68135-2358
 USA
 EMail: jayhawk@att.com

 Rick Huber
 AT&T
 Room C3-3B30, 200 Laurel Ave. South
 Middletown, NJ 07748
 USA
 EMail: rvh@att.com

9. Full Copyright Statement

 Copyright (C) The Internet Society (1999).  All Rights Reserved.

 This document and translations of it may be copied and furnished to
 others, and derivative works that comment on or otherwise explain it
 or assist in its implementation may be prepared, copied, published
 and distributed, in whole or in part, without restriction of any
 kind, provided that the above copyright notice and this paragraph are
 included on all such copies and derivative works.  However, this
 document itself may not be modified in any way, such as by removing
 the copyright notice or references to the Internet Society or other
 Internet organizations, except as needed for the purpose of
 developing Internet standards in which case the procedures for
 copyrights defined in the Internet Standards process must be
 followed, or as required to translate it into languages other than
 English.

 The limited permissions granted above are perpetual and will not be
 revoked by the Internet Society or its successors or assigns.

 This document and the information contained herein is provided on an
 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
