.ASIA ZH IDN Language Table - Updated v1.1

.ASIA ZH IDN Language Table

Version: 1.1
Effective Date: 2011-05-04

Summary of IDN Policy Profile:
IDN Language Tag: ZH
IDN Language Description: Chinese
Minimum Length: A-Label: 3; U-Label: 1
Maximum Length: A-Label: 63
Valid Characters: 19,720
Additional Contextual Rules: A Domain Name Applied For must include at least one non-LDH character
IDN Variants: IDN Package includes Primary Domain and Preferred Variant(s) to be activated
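
The length and contextual rules in this summary can be sketched as a simple pre-check. This is a minimal illustration, not the registry's implementation: the names `to_a_label` and `check_label` are hypothetical, the A-Label is approximated with Python's built-in `punycode` codec plus the `xn--` prefix, and neither IDNA normalization nor a check against the table's valid characters is performed.

```python
import string

# LDH = ASCII letters, digits and hyphen.
LDH = set(string.ascii_letters + string.digits + "-")


def to_a_label(u_label):
    """Approximate the A-Label: 'xn--' plus the Punycode form (no IDNA mapping)."""
    return "xn--" + u_label.encode("punycode").decode("ascii")


def check_label(u_label):
    """Return a list of rule violations; an empty list means the pre-check passed."""
    problems = []
    if len(u_label) < 1:
        problems.append("U-Label must be at least 1 character")
        return problems
    if all(ch in LDH for ch in u_label):
        # Contextual rule: a pure-LDH label is not accepted under this table.
        problems.append("label must include at least one non-LDH character")
        return problems
    a_label = to_a_label(u_label)
    if not 3 <= len(a_label) <= 63:
        problems.append("A-Label must be 3 to 63 characters")
    return problems


if __name__ == "__main__":
    for candidate in ["\u56ed", "asia", "\u56ed" * 70]:
        print(repr(candidate[:5]), check_label(candidate))
```

Because the `xn--` prefix alone is four characters, the minimum A-Label length is always met in this sketch; the check is kept only to mirror the profile as stated.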

Registry: DotAsia Organisation
Contact: DotAsia Admin Contact <admin@iana.whois.asia>
Contact: DotAsia Tech Contact <tech@iana.whois.asia>
Address: 15/F, 6 Knutsford Terrace, Tsim Sha Tsui, Hong Kong
TEL: +852.35202635
FAX: +852.35202634
Website: http://www.registry.asia

Relevant Policy Document URLs:
- .ASIA IDN Policies for CJK (Chinese, Japanese & Korean): http://dot.asia/policies/DotAsia-CJK-IDN-Policies-COMPLETE--2011-05-04.pdf
- .ASIA IDN Sunrise Policies: http://dot.asia/policies/DotAsia-IDN-Sunrise-Policies-COMPLETE--2011-03-11.pdf


This language table is developed for the implementation of Chinese IDN registrations at the .ASIA gTLD in coordination with the CDNC (Chinese Domain Name Consortium), based on the CDNC IDN Language tables from CNNIC (zh-CN: http://www.iana.org/domains/idn-tables/tables/cn_zh-cn_4.0.html) and TWNIC (zh-TW: http://www.iana.org/domains/idn-tables/tables/tw_zh-tw_4.0.1.html) respectively.

DotAsia understands the importance of maintaining the integrity of Chinese IDN registrations through appropriate implementation of character variants for Simplified Chinese (SC) and Traditional Chinese (TC). Unlike ccTLDs (e.g. .CN or .TW), where one form of the Chinese characters is predominantly used, .ASIA as a gTLD is inherently global and must consider an environment where some users will be using SC (e.g. in Mainland China, Singapore, etc.), while others will be using TC (e.g. in Hong Kong, Macau, Taipei, etc.). Therefore, the DotAsia Chinese IDN Language Table implementation combines the zh-CN and zh-TW tables into an integrated table for "ZH". More specifically, the following table includes 2 Preferred Variant columns: 1. the zh-CN Preferred Variant column (Preferred SC); and 2. the zh-TW Preferred Variant column (Preferred TC).
Furthermore, in this version, ahead of the coordinated updates from CDNC for zh-CN and zh-TW, some additional Hong Kong characters have been included in the table with special remarks. Certain Chinese characters that are commonly used in Hong Kong (and in Cantonese-speaking communities) were not included in the previous versions of the CDNC tables. To add to the CDNC tables, this table took into consideration the Hong Kong Supplementary Character Set (HKSCS: http://www.ogcio.gov.hk/ccli/eng/hkscs/introduction.html) maintained by the Office of the Government Chief Information Officer (OGCIO) of the Hong Kong Special Administrative Region, which has been adopted into Unicode, along with the International Ideographs Core (IICORE: http://www.ogcio.gov.hk/ccli/eng/structure/iicore.html), which specifies a more commonly used subset of Chinese characters for day-to-day use. In addition, character variants were studied based on the Easily Confused Chinese Characters table (http://www.ogcio.gov.hk/ccli/unicode/structure/download/Easily_Confused_Chinese_Characters.pdf), accompanied by advice from language experts on the subject.
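
To illustrate how the two Preferred Variant columns can be used, the sketch below derives the Preferred SC and Preferred TC forms of a label from a tiny lookup table. Only two rows are included, copied from the corrected entries in the Version 1.1 change list of this document. The names `PREFERRED` and `preferred_forms` are hypothetical, and keeping an unlisted character unchanged is an assumption made for illustration only; a real implementation would be expected to reject characters that are not in the table.

```python
# ValidCodePoint -> (Preferred SC, Preferred TC).
# Two rows only, copied from the v1.1 change list; not the full table.
PREFERRED = {
    "\u56ed": ("\u56ed", "\u5712"),
    "\u8f6c": ("\u8f6c", "\u8f49"),
}


def preferred_forms(label):
    """Return (preferred SC form, preferred TC form) of a label."""
    # Assumption: characters missing from the lookup are passed through unchanged.
    sc = "".join(PREFERRED.get(ch, (ch, ch))[0] for ch in label)
    tc = "".join(PREFERRED.get(ch, (ch, ch))[1] for ch in label)
    return sc, tc


if __name__ == "__main__":
    sc_form, tc_form = preferred_forms("\u56ed\u8f6c")
    print(sc_form, tc_form)
```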

The following table is based on the RFC3743-defined format (except that, as explained above, 2 Preferred Variant columns are utilized), and is in compliance with CDNC policies, the ICANN Guidelines for IDN registration, and the requirements for publication in the IANA Repository of IDN Practices.

Changes from Version 1.0 to 1.1:

Updated 6 clerical errors:
U+56ED;U+56ED;U+5712;U+8598 --updated-to--> U+56ED;U+56ED;U+5712;U+8597
U+7AC8;U+7076;U+7076;U+7AC4 --updated-to--> U+7AC8;U+7076;U+7076;U+7AC3
U+8544;U+840C;U+8544;U+8421 --updated-to--> U+8544;U+840C;U+8544;U+8420
U+8F6C;U+8F6C;U+8F49;U+8EE3 --updated-to--> U+8F6C;U+8F6C;U+8F49;U+8EE2
U+9A7F;U+9A7F;U+9A5B;U+99C6 --updated-to--> U+9A7F;U+9A7F;U+9A5B;U+99C5
U+9F84;U+9F84;U+9F61;U+9F63 --updated-to--> U+9F84;U+9F84;U+9F61;U+9F62
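
Each change record above differs from its original in a single field. The following sketch shows one way to locate that field. The helper name `changed_fields` and the field labels are hypothetical, chosen to match the column order described in this document (valid code point, Preferred SC, Preferred TC, character variants).

```python
FIELD_NAMES = ("ValidCodePoint", "Preferred SC", "Preferred TC", "CharacterVariant")


def changed_fields(record):
    """Return (field name, old value, new value) for each field that differs."""
    old, new = (side.strip() for side in record.split("--updated-to-->"))
    old_fields = [f.strip() for f in old.split(";")]
    new_fields = [f.strip() for f in new.split(";")]
    return [
        (name, before, after)
        for name, before, after in zip(FIELD_NAMES, old_fields, new_fields)
        if before != after
    ]


if __name__ == "__main__":
    record = "U+56ED;U+56ED;U+5712;U+8598 --updated-to--> U+56ED;U+56ED;U+5712;U+8597"
    print(changed_fields(record))
```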


Our special thanks to the language experts from Hong Kong, Professors Lu Qin and K H Cheung, as well as the team at OGCIO for their work to make the addition of the HKSCS characters to the CDNC tables possible.


Please refer to the references cited for the zh-CN and zh-TW tables respectively for characters included in the CDNC table version 4. The following are 4 additional reference notes for the added HKSCS characters:

Reference 1 (HK) -- A total of 160 new entries were added to the CDNC table. These characters were added to the CDNC table based on the HKSCS and IICORE standards. 103 entries did not include further variants.

Reference 2 (HK:CV) -- 57 new entries added involved additional character variants.

Reference 3 (HK:U+xxxx) -- These characters are amended to existing CDNC entries as a result of character variant set intersection between newly added characters with variants (HK:CV) and existing characters. A total of 97 existing entries were affected.

Reference 4 (HKIRC:U+xxxx) -- 4 pairs of characters (total of 8 entries) were identified by HKIRC to be character variants, and have not been implemented into the CDNC tables yet.

The syntax of the following table is as follows:

VariantEntry = ValidCodePoint ";"
    PreferredVariant(SC) ";" PreferredVariant(TC) ";"
    CharacterVariant [ Comment ]
ValidCodePoint = CodePoint
RefList = RefNo 0*( "," RefNo )
PreferredVariant = CodePointSet 0*( "," CodePointSet )
CharacterVariant = CodePointSet 0*( "," CodePointSet )
CodePointSet = CodePoint 0*( SP CodePoint )
CodePoint = 4*8DIGIT [ "(" Reflist ")" ]
Comment = "#" *VCHAR
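
As a rough illustration of the syntax above, the following sketch parses a single table entry into its four fields. It is not an official parser. The names `Entry`, `parse_set_list` and `parse_entry` are hypothetical, and the acceptance of an optional `U+` prefix with hexadecimal digits is an assumption based on how entries are written in the Version 1.1 change list. References in parentheses are skipped rather than interpreted.

```python
import re
from dataclasses import dataclass

# Optional "U+" prefix, 4 to 8 hex digits, optional "(refs)" suffix (assumed notation).
CODE_POINT = re.compile(r"(?:U\+)?([0-9A-Fa-f]{4,8})(?:\(([^)]*)\))?")


@dataclass
class Entry:
    valid: int           # ValidCodePoint
    preferred_sc: list   # list of code point sequences (tuples of ints)
    preferred_tc: list
    variants: list
    comment: str = ""


def parse_set_list(text):
    """Parse 'CodePointSet 0*("," CodePointSet)' into a list of int tuples."""
    sets = []
    # Split on commas that are not inside a parenthesised reference list.
    for chunk in re.split(r",(?![^()]*\))", text):
        points = tuple(int(m.group(1), 16) for m in CODE_POINT.finditer(chunk))
        if points:
            sets.append(points)
    return sets


def parse_entry(line):
    """Parse one VariantEntry line; raise ValueError if it is malformed."""
    body, _, comment = line.partition("#")
    fields = [f.strip() for f in body.strip().split(";")]
    if len(fields) != 4:
        raise ValueError("expected 4 ';'-separated fields, got %d" % len(fields))
    valid = parse_set_list(fields[0])
    if len(valid) != 1 or len(valid[0]) != 1:
        raise ValueError("ValidCodePoint must be a single code point")
    return Entry(
        valid=valid[0][0],
        preferred_sc=parse_set_list(fields[1]),
        preferred_tc=parse_set_list(fields[2]),
        variants=parse_set_list(fields[3]),
        comment=comment.strip(),
    )


if __name__ == "__main__":
    print(parse_entry("U+56ED;U+56ED;U+5712;U+8597"))
```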

Full Table: [ PDF ] (10MB)

