GENWiki

Premier IT Outsourcing and Support Services within the UK

User Tools

Site Tools


rfc:rfc1456

Network Working Group Vietnamese Standardization Working Group Request for Comments: 1456 May 1993

          Conventions for Encoding the Vietnamese Language
    VISCII: VIetnamese Standard Code for Information Interchange
           VIQR: VIetnamese Quoted-Readable Specification
                            Revision 1.1

Status of this Memo

 This memo provides information for the Internet community.  It does
 not specify an Internet standard.  Distribution of this memo is
 unlimited.

Abstract

 This document provides information to the Internet community on the
 currently used conventions for encoding Vietnamese characters into
 7-bit US ASCII and in an 8-bit form.  These conventions are widely
 used by the overseas Vietnamese who are on the Internet and are
 active in USENET.  This document only provides information and
 specifies no level of standard.

1. Introduction

 In this paper we describe two conventions for representing Vietnamese
 characters.  VISCII (pronounced "visky") is an 8-bit character
 encoding that is similar to that used with ISO-8859.  VIQR
 (pronounced "vicker") is a mnemonic encoding of Vietnamese characters
 into US ASCII for use on 7-bit systems.  There is substantial
 existing online freely distributable software that implements these
 conventions for UNIX and personal computers.  These encodings enable
 Vietnamese-language users to take full advantage of powerful tools
 already developed for the English-speaking world, eliminating
 unnecessary reinvention.  This paper describes these conventions in
 part so that MIME-compliant software might also support the
 Vietnamese language.
 NOTE: The accented Vietnamese letters are herein represented by their
 VIQR equivalents, offset by enclosing angle brackets.  For example,
 the single letter "a acute" is written as <a'>, where the apostrophe
 is the mnemonic symbol for the acute.

2. LINGUISTIC OVERVIEW

 As a romanized language, Vietnamese appears to lend itself readily to
 integration into existing English-based systems.  To cite a simple

Vietnamese Standardization Working Group [Page 1] RFC 1456 Conventions for Encoding Vietnamese May 1993

 example, consider implementing support for French in such systems.
 One can allocate code positions in the 8-bit space necessary for
 accented letters such as <e^> or <e'>, then provide a means for users
 to access these codes through the keyboard.  The required number of
 "extra" code positions is small (see, e.g., ISO-8859/Latin-1 [1]),
 and the relatively low frequency of occurrence of accented letters
 does not place heavy demand on efficient keyboard input schemes.  The
 same things cannot be said for Vietnamese, where both the number and
 occurrence frequency of accented letters are large.  Apart from the
 alphabetics already available in ASCII, Vietnamese requires an
 additional 134 combinations of a letter and diacritical symbols.
 Note that one can resort to a composite encoding scheme to reduce
 this requirement, but that would mean giving up on integration into
 today's computing platforms which for the most part do not support
 such schemes.  In addition, the heavy use of diacritical marks in
 Vietnamese text calls for a keyboard input scheme that does not
 require extra keystrokes such as a special "compose" key to generate
 accented letters.  Because of the large number of possible
 combinations, the scheme should also be easily learned and memorized.
 Finally, to integrate Vietnamese into current electronic mail systems
 which are still limited to 7 bits, there should be a representation
 for Vietnamese text that is readily readable in its 7-bit form.
 The Viet-Std group, an electronic standardization roundtable, has
 worked over the past few years to draft proposals addressing these
 issues.  This has culminated in the conventions to be described
 briefly in the next two sections.  The detailed technical
 considerations have been reported elsewhere [2].  In this memo we
 give a brief outline of the working standards and describe supporting
 software availability.

3. SPECIFICATION OF VISCII

 VISCII stands for VIetnamese Standard Code for Information
 Interchange, an 8-bit encoding specification.  Its salient features
 are:
  1.  Encoding of all Vietnamese letters as single units
      rather than separating base vowels and diacritical
      marks.
  2.  Retention of the complete ASCII graphics repertoire
      in order to facilitate integration.
  3.  Encoding the 6 least-often-used upper-case letters into
      6 least problematic C0 (control) characters.

Vietnamese Standardization Working Group [Page 2] RFC 1456 Conventions for Encoding Vietnamese May 1993

  4.  Character placement have been designed with
      consideration for Unix/X integration, ISO-8859/Latin-1
      compatibility, coexistence with a wide array of
      existing software, including provisions for single-
      and double-line drawing characters in the IBM graphic
      character set.
 The 8-bit VISCII encoding is shown below.  Because of the limitations
 of the 7-bit US ASCII character set, here we use the mnemonic form to
 represent Vietnamese glyphs.  See the VIQR specification below for
 clarification of how diacritical marks are applied.  The online
 PostScript version of reference [2] may also be useful as it does
 display each character correctly.
             Table 1.  VISCII 8-bit Encoding Table (v1.1)

*=======================================================================*

0x 1x 2x 3x 4x 5x 6x 7x 8x 9x Ax Bx Cx Dx Ex Fx
======================================================================
x0 nul dle sp 0 @ P ` p A. O` O~ o` A` DD a` dd
x1 soh dc1 ! 1 A Q a q A(' O? a(' o? A' u+' a' u+.
x2 A(? dc2 " 2 B R b r A(` O~ a(` o~ A O` a o`
x3 etx dc3 # 3 C S c s A(. O. a(. O+~ A~ O' a~ o'
x4 eot Y? $ 4 D T d t A' O+. a' O+ A? O a? o
x5 A(~ nak % 5 E U e u A` O+' a` o. A( a. a( o~
x6 A~ syn & 6 F V f v A? O+` a? o+` a(? y? u+~ o?
x7 bel etb ' 7 G W g w A. O+? a. o+? a(~ u+` a~ o.
x8 bs can ( 8 H X h x E~ I. e~ i. E` u+? e` u.
x9 ht Y~ ) 9 I Y i y E. O? e. U+. E' U` e' u`
xA lf sub * : J Z j z E' O. e' U+' E U' e u'
xB vt esc + ; K [ k { E` I? e` U+` E? y~ e? u~
xC ff fs , < L \ l E? U? e? U+? I` y. i` u?
xD cr gs - = M ] m } E~ U~ e~ o+ I' Y' i' y'
xE so Y. . > N n ~ E. U. e. o+' I~ o+~ i~ o+.
xF si us / ? O _ o DEL O' Y` o' U+ y` u+ i? U+~

*=======================================================================*

4. SPECIFICATION OF VIQR MNEMONICS

 VIQR, VIetnamese Quoted-Readable specification, is not an encoding
 convention but is rather a convention for typing, reading, and
 transferring Vietnamese data using only the 7-bit ASCII character
 set.  With VIQR, accented Vietnamese letters are represented by the
 vowel followed by ASCII characters whose appearances resemble those
 of the corresponding Vietnamese diacritical marks.  For example, the
 phrase "N<u+><o+'>c Vi<e^.>t Nam" is represented in 7-bits by
 "Nu+o+'c Vie^.t Nam".  The complete list of diacritical mark
 equivalents is given in Table 2.  There is also provision in the VIQR
 specification to prevent undesirable composition, for example, to

Vietnamese Standardization Working Group [Page 3] RFC 1456 Conventions for Encoding Vietnamese May 1993

 avoid getting "How are you?" composed into "How are yo<u?>".  For
 details, please see [2].  VIQR therefore serves the following
 purposes:
1.  It provides for a mnemonic, readable representation of
    Vietnamese in 7-bit form, which makes it easy to
    transfer Vietnamese electronic mail without special
    conversion.  The originator and recipient can
    communicate in Vietnamese without the need for an
    8-bit environment at any point in the data chain.
2.  It provides a bridge for translation between 7- and 8-bit
    environments.  In this context, typing in both 7-bit
    and 8-bit systems requires exactly the same keystrokes,
    the only difference is that the 8-bit user gets to see
    actual Vietnamese on-screen, whereas the 7-bit user
    sees a mnemonic representation thereof.  The same
    options are available for the 7-bit and 8-bit recipients
    of Vietnamese text.
 Because of its mnemonic nature, the VIQR typing method is easy to
 learn and remember.  In pure 8-bit environments, special-purpose
 software developers may wish to devise more efficient input schemes,
 but the intent is for all Vietnamese keyboard software to support the
 basic VIQR method to minimize learning time for Vietnamese who will
 already be familiar with the mnemonic method described here.
           Table 2.  VIQR Mnemonics for Vietnamese Diacritics
        *=====================================================*
        | Diacritic   | Char |  ASCII Code        | D<a^'>u   |
        |=====================================================|
        | breve       |  (   |  0x28, left paren  | tr<a(>ng  |
        | circumflex  |  ^   |  0x5E, caret       | m<u~>     |
        | horn        |  +   |  0x2B, plus sign   | m<o'>c    |
        |-------------+------+--------------------+-----------|
        | acute       |  '   |  0x27, apostrophe  | s<a('>c   |
        | grave       |  `   |  0x60, backquote   | huy<e^`>n |
        | hook above  |  ?   |  0x3F, question    | h<o?>i    |
        | tilde       |  ~   |  0x7E, tilde       | ng<a~>    |
        | dot below   |  .   |  0x2E, period      | n<a(.>ng  |
        |-------------+------+--------------------+-----------|
        | d bar       |  dd  |  (repeated d)      | <dd>      |
        | D bar       |  DD  |  (repeated D)      | <DD>      |
        *=====================================================*

Vietnamese Standardization Working Group [Page 4] RFC 1456 Conventions for Encoding Vietnamese May 1993

5. SUPPORTING SOFTWARE

 VISCII & VIQR have been successfully implemented on various
 platforms.  The work has been carried out primarily by the TriChlor
 software group, a non-profit spin-off from Viet-Std.  Software by
 other individuals and groups have also been developed.  In addition,
 commercial software entities have indicated that they would support
 the standards in the form of VISCII-compliant keyboards and fonts.
 The current software selection from the TriChlor group enables users
 to use Vietnamese on existing Unix, MS-DOS, and Windows systems,
 including such operations as Vietnamese file naming, Vietnamese
 keyboarding within any application, electronic mail and news filters
 for Unix, printing to various printer languages, incorporating
 Vietnamese in such document preparation systems as TeX, Word for
 Windows, WordPerfect, using Vietnamese in databases (e.g., Paradox)
 and spreadsheets (e.g., SC on Unix or Excel in Windows).
 Vietnamese-specific applications are also available and include a
 large song lyric database, several poetry collections in hypertext
 format, a Windows-based fortune teller, a text-based multiple-choice
 test program in Vietnamese, etc.  In short, software exists that
 supports thorough integration of Vietnamese into existing platforms,
 allowing Vietnamese users to take advantage of all the powerful tools
 already available in English-only environments.
 Translation between 8-bit VISCII 1.1 and other character sets,
 particularly ISO-10646/Unicode 1.1, has been included in the Plan 9
 operating systems' tcs utility that has been made available by Andrew
 Hume of AT&T Bell Laboratories.

6. MIME CONSIDERATIONS

 For use with MIME-compliant software, the value "VISCII" has been
 registered as a charset with the Internet Assigned Numbers Authority
 for the VISCII encoding convention described above, and the value
 "VIQR" has been registered with the Internet Assigned Numbers
 Authority as a charset for the VIQR mnemonic encoding convention
 described above.  Implementation of support for these two MIME
 character set types is not mandatory to comply with RFC-1341.  If the
 encoding conventions described above are used in MIME email or news,
 the appropriate MIME character set type value should be used to label
 the body-part containing such text.

7. SECURITY CONSIDERATIONS

 Security issues are not discussed in this memo.

Vietnamese Standardization Working Group [Page 5] RFC 1456 Conventions for Encoding Vietnamese May 1993

REFERENCES

   [1] International Organization for Standardization. ISO 8859/x: 8-
       bit International Code Sets.  ISO, 1977.
   [2] Viet-Std, "A Unified Framework for Vietnamese Information
       Processing-v1.1," published on the Internet, available for FTP
       from Sonygate.Sony.COM:tin/viet-std, September 1992.

Vietnamese Standardization Working Group [Page 6] RFC 1456 Conventions for Encoding Vietnamese May 1993

AUTHORS' ADDRESSES

 Cuong T. Nguyen
 Center for Integrated Systems
 CIS 062--MC 4070
 Stanford, CA 94305-4070
 Phone: (415) 725-3721
 Email: cuong@haydn.Stanford.EDU
 Hoc D. Ngo
 Vista Research, Inc.
 100 View St, Suite 200
 P.O. Box 998
 Mountain View, CA 94042
 Phone: (415) 966-1171
 Email:  uunet!vri280!hoc
 Cuong M. Bui
 National Semiconductor Corp.
 3388 Burgundy Dr.
 San Jose, CA 95132
 Phone: (408) 721-6873
 Email: bui@berlioz.nsc.com
 Thanh van Nguyen
 Roche Image Analysis Systems
 95 First Str Suite 110
 Los Altos, CA 94022
 Phone: 415-917-2022
 Fax:   415-917-2025
 Email: thanh@rias.com
 For more information, please contact the authors at:
 viet-std@haydn.stanford.edu

Vietnamese Standardization Working Group [Page 7]

/data/webs/external/dokuwiki/data/pages/rfc/rfc1456.txt · Last modified: 1993/05/06 20:10 (external edit)