Network Working Group                                          U. Choi
Request for Comments: 1557                                     K. Chon
Category: Informational                                          KAIST

                                                               H. Park
                                                   Solvit Chosun Media
                                                         December 1993

          Korean Character Encoding for Internet Messages

Status of this Memo

 This memo provides information for the Internet community.  This memo
 does not specify an Internet standard of any kind.  Distribution of
 this memo is unlimited.

Introduction

 This document describes the encoding method used to represent
 Korean characters in both the header and body parts of Internet mail
 messages [RFC822].  This encoding method was specified in 1991 and
 has been in use since then.  It is now widely used in Korean IP
 networks.

 This document also specifies the name of the encoding method, for
 use with the message header and body formats of MIME [MIME1, MIME2].

 This document describes only the encoding method for plain text.
 Other text subtypes, rich text and similar forms of text, are beyond
 the scope of this document.

Description

 It is assumed that the starting code of the message is ASCII.  ASCII
 and Korean characters are distinguished by use of the shift
 function.  For example, the code SO indicates that the bytes that
 follow are Korean characters as defined in KSC 5601.  The SI code is
 used to return to ASCII.

 Therefore, the escape sequence, shift function and character set used
 in a message are as follows:

         SO           KSC 5601
         SI           ASCII
         ESC $ ) C    Appears once in the beginning of a line
                          before any appearance of SO characters.

Choi, Chon & Park [Page 1] RFC 1557 Korean Character Encoding December 1993

 The KSC 5601 [KSC5601] character set, which includes Hangul, Hanja
 (Chinese ideographic characters), graphic and foreign characters,
 etc., uses two bytes for each character.

 For more information about Korean character sets, please refer to the
 KSC 5601-1987 document.  For more detailed information about the
 escape sequence and the shift function, refer to the ISO 2022
 [ISO2022] document.
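
 The description above can be illustrated with a minimal encoder
 sketch in Python.  It is not part of the original specification.  The
 function name encode_iso2022_kr is hypothetical, and the sketch
 assumes that Python's "euc_kr" codec is an acceptable source of
 KSC 5601 code points (the 7-bit form is obtained by clearing the 8th
 bit of each EUC-KR byte).

```python
# Illustrative sketch only; names below are assumptions, not from the RFC.
DESIGNATOR = b"\x1b$)C"  # ESC $ ) C
SO = b"\x0e"             # shift out: KSC 5601 follows
SI = b"\x0f"             # shift in: back to ASCII


def encode_iso2022_kr(lines):
    """Encode text lines (given without line endings) as a message body."""
    out = bytearray()
    designated = False
    for line in lines:
        # Emit the designator once, at the beginning of the first line
        # that needs it, before any SO appears.
        if not designated and any(ord(ch) > 0x7F for ch in line):
            out += DESIGNATOR
            designated = True
        shifted_out = False
        for ch in line:
            if ord(ch) < 0x80:
                if ch in "\x0e\x0f\x1b":
                    raise ValueError("SO, SI and ESC may not appear in text")
                if shifted_out:
                    out += SI
                    shifted_out = False
                out += ch.encode("ascii")
            else:
                pair = ch.encode("euc_kr")  # raises if not encodable
                if len(pair) != 2:
                    raise ValueError("not a single two-byte character")
                if not shifted_out:
                    out += SO
                    shifted_out = True
                # Clear the 8th bit to get the 7-bit two-byte form.
                out += bytes(b & 0x7F for b in pair)
        if shifted_out:
            out += SI  # return to ASCII before the end of the line
        out += b"\r\n"
    return bytes(out)


# "\ud55c\uae00" is the word "Hangul" written in Hangul.
encoded = encode_iso2022_kr(["\ud55c\uae00 test"])
```

 A body that contains only ASCII is passed through unchanged apart
 from the added CRLF line endings, since no designator is needed when
 SO never appears.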

Formal Syntax

 Where this document in its formal syntax does not agree with the
 description part, priority should be given to the formal syntax of
 the document.

 The notation used in this section follows that of STD 11, RFC 822
 [RFC822], with the same meaning.

 * (asterisk) has the following meaning:

       l*m "anything"

 The above means that "anything" has to be used at least l times and
 at most m times.  Default values for l and m are 0 and infinity,
 respectively.

 body            = *e-line *1( designator *( e-line / h-line ))

 designator      = ESC "$" ")" "C"

 e-line          = *text CRLF

 h-line          = *text 1*( segment *text ) CRLF

 segment         = SO 1*( one-of-94 one-of-94 ) SI

                                             ; ( Octal, Decimal.)

 ESC             = <ISO 2022 ESC, escape>    ; ( 33, 27.)

 SO              = <ASCII SO, shift out>     ; ( 16, 14.)

 SI              = <ASCII SI, shift in>      ; ( 17, 15.)

 SP              = <ASCII SP, space>         ; ( 40, 32.)

 one-of-94       = <any char in 94-char set> ; (41-176, 33.-126.)

 CHAR            = <any ASCII character>     ; ( 0-177, 0.-127.)

 text            = <any CHAR, including bare CR & bare LF, but NOT
                    including CRLF, and not including ESC, SI, SO>
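
 The grammar above can be exercised with a small decoder sketch in
 Python.  It is not part of the original specification.  The function
 name decode_iso2022_kr is hypothetical, and Python's "euc_kr" codec
 is assumed as the KSC 5601 mapping table.  The sketch is stricter
 than the grammar in one place (every byte between SO and SI must be
 a one-of-94 value, so a line break inside a segment is rejected) and
 looser in another (it does not check that the designator starts a
 line).

```python
# Illustrative sketch only; names below are assumptions, not from the RFC.
ESC, SO, SI = 0x1B, 0x0E, 0x0F
DESIGNATOR = b"\x1b$)C"  # ESC $ ) C


def decode_iso2022_kr(data):
    """Decode a message body (bytes) into a Python string."""
    out = []
    i = 0
    designated = False
    shifted_out = False
    while i < len(data):
        b = data[i]
        if b == ESC:
            if data[i:i + 4] != DESIGNATOR:
                raise ValueError("unsupported escape sequence")
            designated = True
            i += 4
        elif b == SO:
            if not designated:
                raise ValueError("SO seen before the designator")
            shifted_out = True
            i += 1
        elif b == SI:
            shifted_out = False
            i += 1
        elif shifted_out:
            pair = data[i:i + 2]
            # Both bytes must be one-of-94 values (33.-126.).
            if len(pair) != 2 or not all(0x21 <= x <= 0x7E for x in pair):
                raise ValueError("invalid byte inside a segment")
            # Set the 8th bit to obtain the EUC-KR form, then map it.
            out.append(bytes(x | 0x80 for x in pair).decode("euc_kr"))
            i += 2
        else:
            if b > 0x7F:
                raise ValueError("8-bit byte in 7-bit text")
            out.append(chr(b))
            i += 1
    return "".join(out)


decoded = decode_iso2022_kr(b"\x1b$)C\x0eGQ1[\x0f test\r\n")
```

 Tracking the designator and the shift state separately mirrors the
 grammar, where the designator appears at most once while segments
 may occur many times per line.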

MIME and RFC 1522 Considerations

 The name to be used for the Hangul encoding scheme in the message
 contents is "ISO-2022-KR".  Used in a MIME message, this name
 appears as:

              Content-Type: text/plain; charset=iso-2022-kr

 Since the Hangul encoding is 7-bit by nature, the
 Content-Transfer-Encoding header does not need to be used.  Note,
 however, that current Hangul message software does not support
 Base64 or Quoted-Printable encoding applied to already encoded
 Hangul messages.

 Hangul in the header part of the message is encoded in Korean EUC
 [EUC-KR].  In the EUC-KR encoding, bytes with the 8th bit set are
 recognized as KSC 5601 characters.  To use Hangul in the header part,
 the EUC-KR encoded Hangul is "B" or "Q" encoded according to the
 method proposed in RFC 1522.  When doing so, the name to be used is
 EUC-KR.
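
 As an illustration of the header case, the following Python sketch
 builds an RFC 1522 encoded-word using the "B" encoding.  It is not
 part of the original specification.  The function name
 encode_header_word is hypothetical, and the sketch does not enforce
 the RFC 1522 limit of 75 characters per encoded-word, so longer text
 would need to be split by the caller.

```python
# Illustrative sketch only; the helper name is an assumption.
import base64


def encode_header_word(text):
    """Return text as a single "B" encoded-word with charset EUC-KR."""
    raw = text.encode("euc_kr")  # 8-bit EUC-KR bytes
    b64 = base64.b64encode(raw).decode("ascii")
    return "=?EUC-KR?B?" + b64 + "?="


# "\ud55c\uae00" is the word "Hangul" written in Hangul.
word = encode_header_word("\ud55c\uae00")
subject_line = "Subject: " + word
```

 Unlike the body, the header carries no SO, SI or escape sequence:
 the 8-bit EUC-KR bytes are wrapped directly by the "B" encoding.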

Background Information

 The Hangul encoding system is based on the ISO 2022 [ISO2022]
 environment according to its 4/4 announcement.  However, the Hangul
 encoding does not include the announcement's escape sequence.

 The KSC 5601 used in this document is, in definition, identical to
 the KSC 5601-1987, KSC 5601-1989 and KSC 5601-1992's 94x94 octet
 definition.  Therefore, any revision that refers to KSC-5601 after
 1992 is to be considered as having the same meaning.

 At present, the Hangul encoding system is based on the experience
 acquired from "N-Byte Hangul", which was formerly in wide use among
 UNIX users.  In fact, "N-Byte Hangul", which uses SO and SI, was the
 encoding method used in SDN before KSC 5601 was made a national
 standard.

 This code is intended to be used for the information interchange of
 Hangul messages; any other use of the code is not considered
 appropriate.

References

 [ASCII] American National Standards Institute, "Coded character set
         -- 7-bit American national standard code for information
         interchange", ANSI X3.4-1968.

 [ISO2022] International Organization for Standardization (ISO),
           "Information processing -- ISO 7-bit and 8-bit coded
           character sets -- Code extension techniques",
           International Standard, 1986, Ref. No. ISO 2022-1986 (E).

 [KSC5601] Korea Industrial Standards Association, "Code for
           Information Interchange (Hangul and Hanja)," Korean
           Industrial Standard, 1987, Ref. No. KS C 5601-1987.

 [EUC-KR] Korea Industrial Standards Association, "Hangul Unix
          Environment," Korean Industrial Standard, 1992, Ref. No.
          KS C 5861-1992.

 [RFC822] Crocker, D., "Standard for the Format of ARPA Internet
          Text Messages", STD 11, RFC 822, UDEL, August 1982.

 [MIME1] Borenstein, N., and N. Freed, "MIME (Multipurpose
         Internet Mail Extensions): Part One: Mechanisms for
         Specifying and Describing the Format of Internet Message
         Bodies", RFC 1521, Bellcore, Innosoft, September 1993.

 [MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
         Part Two: Message Header Extensions for Non-ASCII Text",
         RFC 1522, University of Tennessee, September 1993.

Security Considerations

 Security issues are not discussed in this memo.

Acknowledgments

 The authors want to thank all the people who assisted in writing
 this document.  In particular, we thank Erik von der Poel, Felix M.
 Villarreal, Ienup Sung, Kyoung Namgoong, and Kyuho Kim.

Authors' Addresses

 Uhhyung Choi
 Korea Advanced Institute of Science and Technology
 Department of Computer Science
 Taejon, 305-701, Republic of Korea

 Phone: +82-42-869-8718
 Fax: +82-42-869-3510
 EMail: uhhyung@kaist.ac.kr

 Kilnam Chon
 Korea Advanced Institute of Science and Technology
 Department of Computer Science
 Taejon, 305-701, Republic of Korea

 Phone: +82-42-869-3514
 Fax: +82-42-869-3510
 EMail: chon@cosmos.kaist.ac.kr

 Hyunje Park
 Solvit Chosun Media, Inc.
 748-16 Yeoksam-Dong, Kangnam-Gu
 Seoul, 135-080, Republic of Korea

 Phone: +82-2-561-0361
 Fax: +82-2-569-4847
 EMail: hjpark@dino.media.co.kr
