3 posts tagged 'codepage'
- 2009.02.18 Online sample of a CharSet property for conversion texts and files. 1
- 2009.02.18 Character set code table lookup site
- 2009.02.18 Code page conversion 2
Code Pages, Character Encodings from Software Vendors and Standards Bodies
Here you can find character set and code page information from software vendors (Microsoft, HP, IBM, Sun, etc.) and international standards organizations (e.g. ISO, ECMA, INCITS, etc.). Push any "button" and you will be taken either to the chart of a code page provided by the vendor, or the vendor's web page of links to code page charts. This gives you fast access to popular code pages, as well as access to more complete lists of code page charts.
Organization
The links are (mostly) organized by vendor or standards organization. Some code pages are listed redundantly, usually because the code page is described by different vendors. Sometimes the difference is important. For example, one vendor's view of a code page may be different from another's. Certainly character conversion or mapping tables may be very different. Sometimes a code page has been updated and one vendor is still referring to an earlier version of the code page.
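As a hedged illustration of how two views of nominally the same encoding can differ, the sketch below decodes one Shift-JIS byte pair with two Python standard-library codecs: `shift_jis` (which follows the JIS mapping) and `cp932` (Microsoft's code page 932). The function name `decode_with_each` is hypothetical, and the behavior shown is that of CPython's bundled codecs, not of any vendor chart linked from this page.

```python
import unicodedata


def decode_with_each(data: bytes, codecs_to_try=("shift_jis", "cp932")) -> dict:
    """Decode the same bytes with several codecs and describe each result."""
    results = {}
    for name in codecs_to_try:
        text = data.decode(name)
        # Record the code point and Unicode name of every decoded character.
        results[name] = [
            (f"U+{ord(ch):04X}", unicodedata.name(ch, "?")) for ch in text
        ]
    return results


if __name__ == "__main__":
    # The byte pair 0x81 0x60 is a tilde-like symbol in both tables,
    # but each table assigns it a different Unicode code point.
    for codec, chars in decode_with_each(b"\x81\x60").items():
        print(codec, chars)
```

Because the two tables map the same bytes to different code points, a round trip through the "wrong" table can silently change a character.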
Character Encodings, Transformation Formats, Double-Byte, Multi-byte, UTF...
Note that a "code page" is also known by various other names: codepage, encoding, charset, character set, coded character set (CCS), graphic character set, character map, and others. Some have more specific names, such as DBCS (double-byte character set) and MBCS (multi-byte character set). Some encodings are the result of transformations and are known as transformation formats; examples include Unicode UTF-8, UTF-16, and UTF-32.
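To make the transformation formats concrete, here is a minimal sketch that encodes the same characters as UTF-8, UTF-16, and UTF-32 using Python's built-in codecs. The helper name `encoded_forms` is an assumption for illustration, and big-endian byte order is chosen only so that no byte order mark is emitted.

```python
def encoded_forms(ch: str) -> dict:
    """Return the byte sequences of one character in three Unicode formats."""
    return {
        "UTF-8": ch.encode("utf-8"),       # 1 to 4 bytes per character
        "UTF-16": ch.encode("utf-16-be"),  # 2 bytes, or 4 for supplementary characters
        "UTF-32": ch.encode("utf-32-be"),  # always 4 bytes
    }


if __name__ == "__main__":
    # An ASCII letter, the euro sign, and a supplementary character.
    for ch in ("A", "\u20ac", "\U0001F600"):
        forms = {name: data.hex() for name, data in encoded_forms(ch).items()}
        print(f"U+{ord(ch):04X}", forms)
```

All three formats encode the same code points; only the byte layout differs.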
Unicode UTF-16 Surrogate Code Points, or Supplementary Characters
If you are interested in UTF-16 surrogate code points, or supplementary characters, see
Setting up Microsoft Windows NT, 2000 or Windows XP to Support Unicode Supplementary Characters and
Conversion Table: Unicode Surrogates to Scalar Value/UTF-32.
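The relationship between a surrogate pair and its scalar value (the same number UTF-32 stores) is simple arithmetic. The sketch below shows it in both directions; the function names are hypothetical, and the constants are the standard UTF-16 surrogate ranges.

```python
def surrogates_to_scalar(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a Unicode scalar value."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("high surrogate must be in D800..DBFF")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("low surrogate must be in DC00..DFFF")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def scalar_to_surrogates(scalar: int) -> tuple:
    """Split a supplementary code point into its high and low surrogates."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("only supplementary code points have surrogates")
    offset = scalar - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    print(hex(surrogates_to_scalar(0xD83D, 0xDE00)))
    print([hex(unit) for unit in scalar_to_surrogates(0x1F600)])
```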
Other Unicode pages on this site that may be of interest include: Cheat Sheet: Unicode-Enabling Microsoft C/C++ Source Code, Hiragana Characters, Hebrew Characters, Benefits of the Unicode Standard, and the Compelling Unicode Demo.
- Assorted Web Pages
- Unicode Charts
- Standards Organizations: RFC 1556 defines ISO-8859-6-e, ISO-8859-6-i, ISO-8859-8-e, ISO-8859-8-i.
- The Go To Guys
- Czyborra's Site: www.czyborra.com/charsets is offline. Fortunately, Kevin Atkinson has mirrored it at aspell.net/charsets. These buttons now link to his mirror. Thanks Kevin. So vat's Unicode? Chicken soup?
- Great Sites
- GB18030 Web Pages
- Hong Kong Supplementary Character Set (HKSCS)
- MARC Bibliographic
Here are many transcoding tables expressed in XML files using the Character Mapping Markup Language (CharMapML, UTR 22). The encoding conversion data is used in the International Components for Unicode (ICU) open source library.
- IBM ICU
- IBM Character Data
- IBM
In the following web pages, leadbytes are indicated by light gray background shading. Each of these leadbytes links to a new page showing the 256 character block associated with that leadbyte. Unused leadbytes are identified by a darker gray background.
- Microsoft
- Microsoft Windows
- Microsoft's
- IBM DOS Code pages
- Microsoft OEM
- IBM Asian Code pages
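The leadbytes described above decide whether a byte stands alone or begins a two-byte character. A minimal sketch of that logic follows, assuming the lead-byte ranges of code page 932 (0x81 to 0x9F and 0xE0 to 0xFC); the names `CP932_LEAD_RANGES`, `is_lead_byte`, and `split_characters` are hypothetical, and other double-byte code pages use different ranges.

```python
# Assumed lead-byte ranges for code page 932 (Shift-JIS as used by Windows).
CP932_LEAD_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))


def is_lead_byte(byte: int) -> bool:
    """Return True if the byte starts a two-byte character in code page 932."""
    return any(low <= byte <= high for low, high in CP932_LEAD_RANGES)


def split_characters(data: bytes) -> list:
    """Split a code page 932 byte string into per-character byte sequences."""
    pieces = []
    i = 0
    while i < len(data):
        # A lead byte consumes the following trail byte, if one is present.
        width = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        pieces.append(data[i:i + width])
        i += width
    return pieces


if __name__ == "__main__":
    encoded = "\u65e5\u672cA".encode("cp932")  # two kanji followed by "A"
    for piece in split_characters(encoded):
        print(piece.hex(), piece.decode("cp932"))
```

Scanning from the start matters here: a trail byte can fall in the ASCII range, so a byte cannot be classified reliably without knowing whether the byte before it was a lead byte.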
The Unicode conversion filter offers conversions between the following code pages:
For more information on code pages, please see:
- Microsoft Locale-Specific Code Page Information
- Character Sets And Code Pages At The Push Of A Button
Code-Page Identifiers
(*) The list of available code pages may be different on your system. You can install additional code pages using Control Panel\Regional Options.
Identifier | Name |
---|---|
037 | IBM EBCDIC - U.S./Canada |
437 | OEM - United States |
500 | IBM EBCDIC - International |
708 | Arabic - ASMO 708 |
709 | Arabic - ASMO 449+, BCON V4 |
710 | Arabic - Transparent Arabic |
720 | Arabic - Transparent ASMO |
737 | OEM - Greek (formerly 437G) |
775 | OEM - Baltic |
850 | OEM - Multilingual Latin I |
852 | OEM - Latin II |
855 | OEM - Cyrillic (primarily Russian) |
857 | OEM - Turkish |
858 | OEM - Multilingual Latin I + Euro symbol |
860 | OEM - Portuguese |
861 | OEM - Icelandic |
862 | OEM - Hebrew |
863 | OEM - Canadian-French |
864 | OEM - Arabic |
865 | OEM - Nordic |
866 | OEM - Russian |
869 | OEM - Modern Greek |
870 | IBM EBCDIC - Multilingual/ROECE (Latin-2) |
874 | ANSI/OEM - Thai (similar to ISO 8859-11 / TIS-620) |
875 | IBM EBCDIC - Modern Greek |
932 | ANSI/OEM - Japanese, Shift-JIS |
936 | ANSI/OEM - Simplified Chinese (PRC, Singapore) |
949 | ANSI/OEM - Korean (Unified Hangeul Code; superset of EUC-KR) |
950 | ANSI/OEM - Traditional Chinese (Taiwan; Hong Kong SAR, PRC) |
1026 | IBM EBCDIC - Turkish (Latin-5) |
1047 | IBM EBCDIC - Latin 1/Open System |
1140 | IBM EBCDIC - U.S./Canada (037 + Euro symbol) |
1141 | IBM EBCDIC - Germany (20273 + Euro symbol) |
1142 | IBM EBCDIC - Denmark/Norway (20277 + Euro symbol) |
1143 | IBM EBCDIC - Finland/Sweden (20278 + Euro symbol) |
1144 | IBM EBCDIC - Italy (20280 + Euro symbol) |
1145 | IBM EBCDIC - Latin America/Spain (20284 + Euro symbol) |
1146 | IBM EBCDIC - United Kingdom (20285 + Euro symbol) |
1147 | IBM EBCDIC - France (20297 + Euro symbol) |
1148 | IBM EBCDIC - International (500 + Euro symbol) |
1149 | IBM EBCDIC - Icelandic (20871 + Euro symbol) |
1200 | Unicode UCS-2 Little-Endian (BMP of ISO 10646) |
1201 | Unicode UCS-2 Big-Endian |
1250 | ANSI - Central European |
1251 | ANSI - Cyrillic |
1252 | ANSI - Latin I |
1253 | ANSI - Greek |
1254 | ANSI - Turkish |
1255 | ANSI - Hebrew |
1256 | ANSI - Arabic |
1257 | ANSI - Baltic |
1258 | ANSI/OEM - Vietnamese |
1361 | Korean (Johab) |
10000 | MAC - Roman |
10001 | MAC - Japanese |
10002 | MAC - Traditional Chinese (Big5) |
10003 | MAC - Korean |
10004 | MAC - Arabic |
10005 | MAC - Hebrew |
10006 | MAC - Greek I |
10007 | MAC - Cyrillic |
10008 | MAC - Simplified Chinese (GB 2312) |
10010 | MAC - Romania |
10017 | MAC - Ukraine |
10021 | MAC - Thai |
10029 | MAC - Latin II |
10079 | MAC - Icelandic |
10081 | MAC - Turkish |
10082 | MAC - Croatia |
12000 | Unicode UCS-4 Little-Endian |
12001 | Unicode UCS-4 Big-Endian |
20000 | CNS - Taiwan |
20001 | TCA - Taiwan |
20002 | Eten - Taiwan |
20003 | IBM5550 - Taiwan |
20004 | TeleText - Taiwan |
20005 | Wang - Taiwan |
20105 | IA5 IRV International Alphabet No. 5 (7-bit) |
20106 | IA5 German (7-bit) |
20107 | IA5 Swedish (7-bit) |
20108 | IA5 Norwegian (7-bit) |
20127 | US-ASCII (7-bit) |
20261 | T.61 |
20269 | ISO 6937 Non-Spacing Accent |
20273 | IBM EBCDIC - Germany |
20277 | IBM EBCDIC - Denmark/Norway |
20278 | IBM EBCDIC - Finland/Sweden |
20280 | IBM EBCDIC - Italy |
20284 | IBM EBCDIC - Latin America/Spain |
20285 | IBM EBCDIC - United Kingdom |
20290 | IBM EBCDIC - Japanese Katakana Extended |
20297 | IBM EBCDIC - France |
20420 | IBM EBCDIC - Arabic |
20423 | IBM EBCDIC - Greek |
20424 | IBM EBCDIC - Hebrew |
20833 | IBM EBCDIC - Korean Extended |
20838 | IBM EBCDIC - Thai |
20866 | Russian - KOI8-R |
20871 | IBM EBCDIC - Icelandic |
20880 | IBM EBCDIC - Cyrillic (Russian) |
20905 | IBM EBCDIC - Turkish |
20924 | IBM EBCDIC - Latin-1/Open System (1047 + Euro symbol) |
20932 | JIS X 0208-1990 & 0212-1990 |
20936 | Simplified Chinese (GB2312) |
21025 | IBM EBCDIC - Cyrillic (Serbian, Bulgarian) |
21027 | Extended Alpha Lowercase |
21866 | Ukrainian (KOI8-U) |
28591 | ISO 8859-1 Latin I |
28592 | ISO 8859-2 Central Europe |
28593 | ISO 8859-3 Latin 3 |
28594 | ISO 8859-4 Baltic |
28595 | ISO 8859-5 Cyrillic |
28596 | ISO 8859-6 Arabic |
28597 | ISO 8859-7 Greek |
28598 | ISO 8859-8 Hebrew (ISO-Visual) |
28599 | ISO 8859-9 Latin 5 |
28605 | ISO 8859-15 Latin 9 |
29001 | Europa 3 |
38598 | ISO 8859-8 Hebrew (ISO-Logical) |
50220 | ISO 2022 Japanese with no halfwidth Katakana |
50221 | ISO 2022 Japanese with halfwidth Katakana |
50222 | ISO 2022 Japanese JIS X 0201-1989 |
50225 | ISO 2022 Korean |
50227 | ISO 2022 Simplified Chinese |
50229 | ISO 2022 Traditional Chinese |
50930 | Japanese (Katakana) Extended |
50931 | US/Canada and Japanese |
50933 | Korean Extended and Korean |
50935 | Simplified Chinese Extended and Simplified Chinese |
50936 | Simplified Chinese |
50937 | US/Canada and Traditional Chinese |
50939 | Japanese (Latin) Extended and Japanese |
51932 | EUC - Japanese |
51936 | EUC - Simplified Chinese |
51949 | EUC - Korean |
51950 | EUC - Traditional Chinese |
52936 | HZ-GB2312 Simplified Chinese |
54936 | Windows XP: GB18030 Simplified Chinese (4 Byte) |
57002 | ISCII Devanagari |
57003 | ISCII Bengali |
57004 | ISCII Tamil |
57005 | ISCII Telugu |
57006 | ISCII Assamese |
57007 | ISCII Oriya |
57008 | ISCII Kannada |
57009 | ISCII Malayalam |
57010 | ISCII Gujarati |
57011 | ISCII Punjabi |
65000 | Unicode UTF-7 |
65001 | Unicode UTF-8 |
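A conversion between two of the code pages listed above can be sketched with Python's standard codecs by decoding to Unicode and re-encoding. The mapping `PYTHON_CODEC_FOR_CODE_PAGE` and the function `convert` are hypothetical names, and the mapping covers only a handful of identifiers from the table; it is an assumption about which Python codec corresponds to each identifier, not part of the conversion filter described here.

```python
# Partial, assumed mapping from code-page identifiers to Python codec names.
PYTHON_CODEC_FOR_CODE_PAGE = {
    437: "cp437",
    949: "cp949",
    1200: "utf-16-le",
    1201: "utf-16-be",
    1252: "cp1252",
    20127: "ascii",
    20866: "koi8-r",
    28591: "iso8859-1",
    51949: "euc-kr",
    65001: "utf-8",
}


def convert(data: bytes, source_cp: int, target_cp: int) -> bytes:
    """Re-encode bytes from one code page to another via Unicode."""
    try:
        source = PYTHON_CODEC_FOR_CODE_PAGE[source_cp]
        target = PYTHON_CODEC_FOR_CODE_PAGE[target_cp]
    except KeyError as exc:
        raise ValueError(f"unsupported code page: {exc.args[0]}") from None
    # Strict error handling: fail rather than silently drop characters
    # that the target code page cannot represent.
    return data.decode(source).encode(target)


if __name__ == "__main__":
    korean = "\ud55c\uae00".encode("cp949")   # Korean text in code page 949
    print(convert(korean, 949, 65001).hex())  # the same text as UTF-8
    print(convert(b"\x82", 437, 1252).hex())  # e-acute: OEM 437 to ANSI 1252
```

Going through Unicode avoids needing a direct table for every pair of code pages, at the cost of failing when the target cannot represent a character.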