GENWiki

Premier IT Outsourcing and Support Services within the UK

User Tools

Site Tools


rfc:rfc1439

Network Working Group C. Finseth Request for Comments: 1439 University of Minnesota

                                                            March 1993
                The Uniqueness of Unique Identifiers

Status of this Memo

 This memo provides information for the Internet community.  It does
 not specify an Internet standard.  Distribution of this memo is
 unlimited.

Abstract

 This RFC provides information that may be useful when selecting a
 method to use for assigning unique identifiers to people.

1. The Issue

 Computer systems require a way to identify the people associated with
 them.  These identifiers have been called "user names" or "account
 names."  The identifers are typically short, alphanumeric strings.
 In general, these identifiers must be unique.
 The uniqueness is usually achieved in one of three ways:
 1) The identifiers are assigned in a unique manner without using
 information associated with the individual.  Example identifiers are:
         ax54tv
         cs00034
 This method was often used by large timesharing systems.  While it
 achieved the uniqueness property, there was no way of guessing the
 identifier without knowing it through other means.
 2) The identifiers are assigned in a unique manner where the bulk of
 the identifier is algorithmically derived from the individual's name.
 Example identifers are:
         Craig.A.Finseth-1
         Finseth1
         caf-1
         fins0001
 3) The identifiers are in general not assigned in a unique manner:
 the identifier is algorithmically derived from the individual's name

Finseth [Page 1] RFC 1439 Uniqueness of Unique Identifiers March 1993

 and duplicates are handled in an ad-hoc manner.  Example identifiers
 are:
         Craig.Finseth
         caf
 Now that we have widespread electronic mail, an important feature of
 an identifier system is the ability to predict the identifier based
 on other information associated with the individual.  This other
 information is typically the person's name.
 Methods two and three make such predictions possible, especially if
 you have one example mapping from a person's name to the identifier.
 Method two relies on using some or all of the name and
 algorithmically varying it to ensure uniqueness (for example, by
 appending an integer).  Method three relies on using some or all of
 the name and selects an alternate identifier in the case of a
 duplication.
 For both methods, it is important to minimize the need for making the
 adjustments required to ensure uniqueness (i.e., an integer that is
 not 1 or an alternate identifier).  The probability that an
 adjustment will be required depends on the format of the identifer
 and the size of the organization.

2. Identifier Formats

 There are a number of popular identifier formats.  This section will
 list some of them and supply both typical and maximum values for the
 number of possible identifiers.  A "typical" value is the number that
 you are likely to run into in real life.  A "maximum" value is the
 largest number of possible (without getting extreme about it) values.
 All ranges are expressed as a number of bits.

2.1 Initials

 There are three popular formats based on initials: those with one,
 two, or three letters.  (The number of people with more than three
 initials is assumed to be small.)  Values:
         format                  typical         maximum
         I                       4               5
         II                      8               10
         III                     12              15

Finseth [Page 2] RFC 1439 Uniqueness of Unique Identifiers March 1993

 You can also think of these as first, middle, and last initials:
         I                       4               5
         F L                     8               10
         F M L                   12              15

2.2 Names

 Again, there are three popular formats based on using names: those
 with the first name, last name, and both first and last names.
 Values:
         format                  typical         maximum
         First                   8               14
         Last                    9               13
         First Last              17              27

2.3 Combinations

 I have seen these combinations in use ("F" is first initial, "M" is
 middle initial, and "L" is last initial):
         format                  typical         maximum
         F Last                  13              18
         F M Last                17              23
         First L                 12              19
         First M Last            21              32

2.4 Complete List

 Here are all possible combinations of nothing, initial, and full name
 for first, middle, and last.  The number of Middle names is assumed
 to be the same as the number of First names.  Values:
         format                  typical         maximum
         _ _ _                   0               0
         _ _ L                   4               5
         _ _ Last                9               13
         _ M _                   4               5
         _ M L                   5               10
         _ M Last                13              18
         _ Middle _              8               14
         _ Middle L              12              19

Finseth [Page 3] RFC 1439 Uniqueness of Unique Identifiers March 1993

         _ Middle Last           17              27
         F _ _                   4               5
         F _ L                   5               10
         F _ Last                13              18
         F M _                   5               10
         F M L                   12              15
         F M Last                17              23
         F Middle _              12              19
         F Middle L              16              24
         F Middle Last           21              32
         First _ _               8               14
         First _ L               12              19
         First _ Last            17              27
         First M _               12              19
         First M L               16              24
         First M Last            21              32
         First Middle _          16              28
         First Middle L          20              33
         First Middle Last       26              40

3. Probabilities of Duplicates

 As can be seen, the information content in these identifiers in no
 case exceeds 40 bits and the typical information content never
 exceeds 26 bits.  The content of most of them is in the 8 to 20 bit
 range.  Duplicates are thus not only possible but likely.
 The method used to compute the probability of duplicates is the same
 as that of the well-known "birthday" problem.  For a universe of N
 items, the probability of duplicates in X members is expressed by:
         N   N-1   N-2         N-(X-1)
         - x --- x --- x ... x -------
         N    N     N             N
 A program to compute this function for selected values of N is given
 in the appendix, as is its complete output.
 The "1%" column is the number of items (people) before an
 organization of that (universe) size has a 1% chance of a duplicate.
 Similarly for 2%, 5%, 10%, and 20%.

Finseth [Page 4] RFC 1439 Uniqueness of Unique Identifiers March 1993

         bits       universe     1%      2%      5%      10%     20%
          6                 64   2       3       4       5       6
          7                128   3       3       5       6       8
          8                256   3       4       6       8       12
          9                512   4       6       8       11      16
         10              1,024   6       7       11      16      22
         11              2,048   7       10      15      22      31
         12              4,096   10      14      21      30      44
         13              8,192   14      19      30      43      61
         14             16,384   19      27      42      60      86
         15             32,768   27      37      59      84      122
         16             65,536   37      52      83      118     172
         17            131,072   52      74      117     167     243
         18            262,144   74      104     165     236     343
         19            524,288   104     147     233     333     485
         20          1,048,576   146     207     329     471     685
         21          2,097,152   206     292     465     666     968
         22          4,194,304   291     413     657     941     1369
         23          8,388,608   412     583     929     1330    1936
         24         16,777,216   582     824     1313    1881    2737
         25         33,554,432   822     1165    1856    2660    3871
         26         67,108,864   1162    1648    2625    3761    5474
         27        134,217,728   1644    2330    3712    5319    7740
         28        268,435,456   2324    3294    5249    7522    10946
         29        536,870,912   3286    4659    7422    10637   15480
         30      1,073,741,824   4647    6588    10496   15043   21891
         31      2,147,483,648   6571    9316    14844   21273   30959
 For example, assume an organization were to select the "First Last"
 form.  This form has 17 bits (typical) and 27 bits (maximum) of
 information.  The relevant line is:
         17            131,072   52      74      117     167     243
 For an organization with 100 people, the probability of a duplicate
 would be between 2% and 5% (probably around 4%).  If the organization
 had 1,000 people, the probability of a duplicate would be much
 greater than 20%.

Appendix: Reuse of Identifiers and Privacy Issues

 Let's say that an organization were to select the format:
         First.M.Last-#
 as my own organization has.  Is the -# required, or can one simply
 do:

Finseth [Page 5] RFC 1439 Uniqueness of Unique Identifiers March 1993

         Craig.A.Finseth
 for the first one and
         Craig.A.Finseth-2
 (or -1) for the second?  The answer is "no," although for non-obvious
 reasons.
 Assume that the organization has made this selection and a third
 party wants to send e-mail to Craig.A.Finseth.  Because of the
 Electronic Communications Privacy Act of 1987, an organization must
 treat electronic mail with care.  In this case, there is no way for
 the third party user to reliably know that sending to Craig.A.Finseth
 is (may be) the wrong party.  On the other hand, if the -# suffix is
 always present and attempts to send mail to the non-suffix form are
 rejected, the third party user will realize that they must have the
 suffix in order to have a unique identifier.
 For similar reasons, identifiers in this form should not be re-used
 in the life of the mail system.

Appendix: Perl Program to Compute Probabilities

 #!/usr/local/bin/perl
 for $bits (6..31) {
         &Compute($bits);
         }
 exit(0);
 # ------------------------------------------------------------
 sub Compute {
         $bits = $_[0];
         $num = 1 << $bits;
         $cnt = $num;
         print "bits $bitsnumber $num:0;
         for ($prob = 1; $prob > 0.99; ) {
                 $prob *= $cnt / $num;
                 $cnt--;
                 }
         print "", $num - $cnt, "$prob0;
         for (; $prob > 0.98; ) {

Finseth [Page 6] RFC 1439 Uniqueness of Unique Identifiers March 1993

                 $prob *= $cnt / $num;
                 $cnt--;
                 }
         print "", $num - $cnt, "$prob0;
         for (; $prob > 0.95; ) {
                 $prob *= $cnt / $num;
                 $cnt--;
                 }
         print "", $num - $cnt, "$prob0;
         for (; $prob > 0.90; ) {
                 $prob *= $cnt / $num;
                 $cnt--;
                 }
         print "", $num - $cnt, "$prob0;
         for (; $prob > 0.80; ) {
                 $prob *= $cnt / $num;
                 $cnt--;
                 }
         print "", $num - $cnt, "$prob0;
         print "0;
         }

Appendix: Perl Program Output

 bits 6  number 64:
         2       0.984375
         3       0.95361328125
         4       0.90891265869140625
         5       0.85210561752319335938
         6       0.78553486615419387817
 bits 7  number 128:
         3       0.9766845703125
         3       0.9766845703125
         5       0.92398747801780700684
         6       0.88789421715773642063
         8       0.79999355674331695809
 bits 8  number 256:
         3       0.988311767578125
         4       0.97672998905181884766
         6       0.94268989971169503406
         8       0.89542306910786462204
         12      0.76969425214152431547

Finseth [Page 7] RFC 1439 Uniqueness of Unique Identifiers March 1993

 bits 9  number 512:
         4       0.98832316696643829346
         6       0.97102570187075798458
         8       0.94652632751096643648
         11      0.89748056780293572476
         16      0.78916761796439427457
 bits 10 number 1024:
         6       0.98543241551841020964
         7       0.97965839745873206645
         11      0.94753115178840541244
         16      0.88888866335604777014
         22      0.79677613655632184564
 bits 11 number 2048:
         7       0.98978773152834598203
         10      0.97823367137821537476
         15      0.94990722378677450166
         22      0.89298119682681720288
         31      0.79597589885472519455
 bits 12 number 4096:
         10      0.98906539062491305447
         14      0.97800426773009718762
         21      0.94994111694430838355
         30      0.89901365764115603874
         44      0.79312138620093930452
 bits 13 number 8192:
         14      0.98894703242829806733
         19      0.97932692503837115439
         30      0.94822407309193512681
         43      0.89545741661906652631
         61      0.7993625840767998314
 bits 14 number 16384:
         19      0.98961337517641645434
         27      0.97879319536756481668
         42      0.94876352395820107155
         60      0.89748107890372830209
         86      0.79973683158771624591
 bits 15 number 32768:
         27      0.98934263776790121181
         37      0.97987304880641035165
         59      0.94909471808051404373
         84      0.89899774209805793923
         122     0.79809378598190949816

Finseth [Page 8] RFC 1439 Uniqueness of Unique Identifiers March 1993

 bits 16 number 65536:
         37      0.98988724065590050216
         52      0.97996496661944154649
         83      0.94937874420413270737
         118     0.89996948010355670711
         172     0.79884228150816105618
 bits 17 number 131072:
         52      0.98993311138884398925
         74      0.97960010416289267088
         117     0.94952974978505377823
         167     0.89960828942716541956
         243     0.79894309171178368167
 bits 18 number 262144:
         74      0.98974844864797828503
         104     0.97977315557223210174
         165     0.94968621078621640041
         236     0.8995926348279144058
         343     0.7994422793765953994
 bits 19 number 524288:
         104     0.98983557888923057178
         147     0.97973841652874515962
         233     0.94974719445364064185
         333     0.89991342619657743729
         485     0.79936749144148444568
 bits 20 number 1048576:
         146     0.98995567500195758015
         207     0.97987072919607220989
         329     0.94983990872655321702
         471     0.89980857451706741656
         685     0.79974215234216872172
 bits 21 number 2097152:
         206     0.98998177463778547214
         292     0.97994400939715686771
         465     0.94985589918092261374
         666     0.89978055267663470396
         968     0.79994886751736571373
 bits 22 number 4194304:
         291     0.98999013137747737812
         413     0.97991951242142538714
         657     0.94991674892578203959
         941     0.89991652739633254399
         1369    0.79989205747440361716

Finseth [Page 9] RFC 1439 Uniqueness of Unique Identifiers March 1993

 bits 23 number 8388608:
         412     0.98995762604049764022
         583     0.97997846530691334888
         929     0.94991024716640248826
         1330    0.89999961063320443877
         1936    0.79987028265451087794
 bits 24 number 16777216:
         582     0.98997307486745211857
         824     0.97999203469417239809
         1313    0.94995516684099989835
         1881    0.89997049960675035152
         2737    0.79996700222056416063
 bits 25 number 33554432:
         822     0.98999408609360783906
         1165    0.9799956928177964155
         1856    0.9499899669674316538
         2660    0.8999664414095410736
         3871    0.79992328289672998132
 bits 26 number 67108864:
         1162    0.98999884535478044345
         1648    0.9799801637652703068
         2625    0.94997437525354821997
         3761    0.89999748465616635773
         5474    0.79993922903192515861
 bits 27 number 134217728:
         1644    0.9899880636014986024
         2330    0.97998730103356856969
         3712    0.94997727934463771504
         5319    0.89998552434244594167
         7740    0.79999591580103557309
 bits 28 number 268435456:
         2324    0.98999458855588851058
         3294    0.97999828329325222587
         5249    0.94998397932368705554
         7522    0.89998576049206902017
         10946   0.79999058777500076101
 bits 29 number 536870912:
         3286    0.98999717306002099626
         4659    0.97999160965267329004
         7422    0.94999720388831232487
         10637   0.89999506567702891591
         15480   0.7999860979665908145

Finseth [Page 10] RFC 1439 Uniqueness of Unique Identifiers March 1993

 bits 30 number 1073741824:
         4647    0.98999674474047760775
         6588    0.97999531736215383937
         10496   0.94999806770951356061
         15043   0.89999250738244507275
         21891   0.79999995570982085358
 bits 31 number 2147483648:
         6571    0.98999869761078929109
         9316    0.97999801528523688976
         14844   0.94999403283519279206
         21273   0.89999983631135749285
         30959   0.79999272222201334159

References

 Bruce Lansky (1984).  The Best Baby Name Book.  Deephaven, MN:
 Meadowbrook.  ISBN 0-671-54463-2.
 Lareina Rule (1988).  Name Your Baby.  Bantam.  ISBN 0-553-27145-8.

Security Considerations

 Security issues are not discussed in this memo.

Author's Address

 Craig A. Finseth
 Networking Services
 Computer and Information Services
 University of Minnesota
 130 Lind Hall
 207 Church St. SE
 Minneapolis, MN 55455-0134
 EMail: Craig.A.Finseth-1@umn.edu or
        fin@unet.umn.edu
 Phone: +1 612 624 3375
 Fax: +1 612 626 1002

Finseth [Page 11]

/home/gen.uk/domains/wiki.gen.uk/public_html/data/pages/rfc/rfc1439.txt · Last modified: 1993/03/24 22:20 by 127.0.0.1

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki