3 posts tagged 'codepage'

2009.02.18 Online sample of a CharSet property for conversion texts and files.
2009.02.18 Character set code chart lookup site
2009.02.18 Code page conversion

Online sample of a CharSet property for conversion texts and files.

http://www.motobit.com/util/charset-codepage-conversion.asp

This online sample demonstrates the ByteArray class converting between several code pages/charsets. With this script you can convert text or multibyte data in any available code page to another code page or to Unicode.
The Form.SizeLimit is 1,000,000 bytes. Please do not post more source data than that.
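
The conversion described above can be sketched in Python as a decode step followed by an encode step. This is only an illustration of the idea, not the ByteArray class the sample page uses; the function name `convert_codepage` and the codec names are assumptions chosen for the example.

```python
def convert_codepage(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes from one character set to another, going through Unicode."""
    text = data.decode(source)   # bytes in the source code page -> Unicode string
    return text.encode(target)   # Unicode string -> bytes in the target code page


# Hypothetical input: Korean text stored in code page 949.
korean_bytes = "한글".encode("cp949")
utf8_bytes = convert_codepage(korean_bytes, "cp949", "utf-8")
print(utf8_bytes.decode("utf-8"))
```

Both steps raise an error (UnicodeDecodeError or UnicodeEncodeError) when the data does not fit the named character set, which is usually preferable to silently producing garbled text.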

The form takes its source data from one of two places: text typed into a textbox (which uses the charset of the document itself), or an uploaded file whose character set you select from a list or enter as a custom charset.

You then select the destination character set and choose where the output goes: to a textbox (as a string), or exported to a file with a filename you supply.

Note: The source file is handled as text data in the specified character set. The textbox is handled as string data; its default character set is the same as the charset of this document.
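
The file case in the note above amounts to reading the file as text in a declared charset and writing it back in another. A minimal Python sketch, assuming a hypothetical helper `convert_file` and made-up file names (none of this comes from the sample page):

```python
import tempfile
from pathlib import Path


def convert_file(src, dst, source_charset, target_charset):
    """Read src as text in source_charset and write it to dst in target_charset."""
    # newline="" keeps the line endings exactly as they are in the source file.
    with open(src, "r", encoding=source_charset, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding=target_charset, newline="") as fout:
        fout.write(text)


# Usage with throwaway files (the names are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source.txt"
    dst = Path(tmp) / "converted.txt"
    src.write_bytes("café\r\n".encode("cp1252"))
    convert_file(src, dst, "cp1252", "utf-8")
    print(dst.read_bytes())
```

The source charset has to be supplied by the caller because a plain text file does not record it, which is why the form asks for it explicitly.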

Posted by 삼스

Character set code chart lookup site

http://www.i18nguy.com/unicode/codepages.html#msftiso

Character Sets And Code Pages At The Push Of A Button

Code Pages, Character Encodings from Software Vendors and Standards Bodies

Here you can find character set and code page information from software vendors (Microsoft, HP, IBM, Sun, etc.) and international standards organizations (e.g. ISO, ECMA, INCITS, etc.). Push any "button" and you will be taken either to the chart of a code page provided by the vendor, or the vendor's web page of links to code page charts. This gives you fast access to popular code pages, as well as access to more complete lists of code page charts.

Organization

The links are (mostly) organized by vendor or standard organization. Some code pages are listed redundantly, usually because the code page is being described by different vendors. Sometimes the difference is important. For example, one vendor's view of a code page may be different from another's. Certainly character conversion or mapping tables may be very different. Sometimes a code page has been updated and one vendor is still referring to an earlier version of the code page.

Character Encodings, Transformation Formats, Double-Byte, Multi-byte, UTF...

Note that a "code page" is also known by various other names: codepage, encoding, charset, character set, coded character set (CCS), graphic character set, character map, et al. Some of these have more specific names, such as DBCS (double-byte character set) and MBCS (multi-byte character set). Some encodings are the result of transformations and are known as transformation formats; examples include Unicode UTF-8, UTF-16, and UTF-32.
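
To make the difference between a multi-byte code page and the Unicode transformation formats concrete, the following Python sketch counts how many bytes a single character takes in each. The helper name `encoded_lengths` and the choice of code page 949 as the multi-byte example are assumptions made for illustration.

```python
def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes one character occupies in several encodings."""
    codecs_to_try = ("cp949", "utf-8", "utf-16-le", "utf-32-le")
    lengths = {}
    for name in codecs_to_try:
        try:
            lengths[name] = len(ch.encode(name))
        except UnicodeEncodeError:
            lengths[name] = None  # the character does not exist in this code page
    return lengths


for ch in ("A", "한", "😀"):
    print(repr(ch), encoded_lengths(ch))
```

In code page 949 an ASCII letter takes one byte and a Hangul syllable takes two, which is what makes it a multi-byte character set; UTF-32 always takes four bytes, and UTF-8 and UTF-16 vary with the code point.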

Unicode UTF-16 Surrogate Code Points, or Supplementary Characters

If you are interested in UTF-16 surrogate code points, or supplementary characters, see "Setting up Microsoft Windows NT, 2000 or Windows XP to Support Unicode Supplementary Characters" and "Conversion Table: Unicode Surrogates to Scalar Value/UTF-32".
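
The surrogate-to-scalar conversion that such a table lists follows from the UTF-16 definition and can be computed directly. A short Python sketch, with the function names `to_surrogates` and `from_surrogates` invented for this example:

```python
def to_surrogates(scalar: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("only supplementary code points need surrogates")
    offset = scalar - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a high and low surrogate into the scalar value (the UTF-32 value)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F600)
print(f"U+1F600 -> {high:04X} {low:04X}")   # prints "U+1F600 -> D83D DE00"
```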

Other Unicode pages on this site that may be of interest include: Cheat Sheet: Unicode-Enabling Microsoft C/C++ Source Code, Hiragana Characters, Hebrew Characters, Benefits of the Unicode Standard, and the Compelling Unicode Demo.

TABLE OF CONTENTS
Unicode; Standards Organizations; Assorted web pages; The Go To Guys; Czyborra's Site; Great Sites; China's GB18030; Hong Kong Supplementary Character Set (HKSCS); Library of Congress MAchine Readable Catalog (MARC)
Microsoft's ISO code pages; Microsoft Windows code pages; Microsoft double-byte character sets; Microsoft DOS code pages
IBM ICU Character Conversion Data; IBM's ISO code pages; IBM Windows code pages; IBM Asian code pages; IBM DOS code pages

Push A Button To Get Code Page Information

Assorted Web Pages
I18n Guy's Hiragana Unicode Chart; Dik Winter's Character Set History; Piotr Trzcionkowski's Polish code page site (in Polish); Cyrillic.com Character Sets; I18nGuru's Character Sets page; VT320, VT102, VT52, Heath-19; DEC Terminals VT100, VT220, VT320; Kostis' Character Sets; Kostis' Apple Macintosh Roman; Japanese Encoding Differences; Koichi Yasuoka's Character Tables

Unicode Charts
Unicode Charts; Unicode character name index; UTF-32 (TR-19); Character Encoding Model (TR-17); Basic Latin; Latin-1 Supplement; Latin Extended-A; Combining Diacritical Marks; Greek; Cyrillic; Hebrew; I18n Guy's Hebrew Unicode Chart; Arabic; Currency Symbols; Hangul Jamo; Hiragana; I18n Guy's Hiragana Unicode Chart; Katakana

Standards Organizations
ISO; INCITS; ECMA Standards; ISO 6429 = ECMA-48 (pdf) (Control codes); ISO/IEC International register of coded character sets to be used with escape sequences (Links to many code page charts!); IANA Character Set Registry; RFC Index; RFC 1555 Hebrew Character Encoding for Internet Messages; RFC 1556 Handling of Bi-directional Texts in MIME (RFC 1556 defines ISO-8859-6-e, ISO-8859-6-i, ISO-8859-8-e, ISO-8859-8-i); Armenian Character Sets ArmSCII; Thai TIS 620-2533 (in Thai 620-2533); Annotated reference to the Thai implementations

The Go To Guys
Michael Everson's site; Ken Lunde's CJK.inf; Ken Lunde's Character set server; Mark Davis's site

Czyborra's Site
www.czyborra.com/charsets is offline. Fortunately, Kevin Atkinson has mirrored it at aspell.net/charsets. These buttons now link to his mirror. Thanks Kevin.
Roman Czyborra's site; Czyborra's Vendor Codepages; Czyborra's Vietnamese page; Czyborra's ASCII/ISO 646 page; Czyborra's ISO 8859 Alphabet Soup; So vat's Unicode? Chicken soup?

Great Sites
Frank da Cruz's Character Sets; Frank da Cruz's Character Set Tables; Korpela's Tutorial on character code issues; Korpela's Character and encoding site

GB18030 Web Pages
ICU's Markus Scherer on GB18030; Sun on GB18030-2000; Microsoft GB18030 Support Package (in GB2312); (Adobe) Dirk Meyer's Summary of GB18030

Hong Kong Supplementary Character Set (HKSCS)
Hong Kong Supplementary Character Set (HKSCS); Hong Kong ITF on ISO 10646

MARC
Bibliographic MARC 21; MARC-8; MARC UCS (Unicode); MARC Code Tables

IBM ICU Character Conversion Data
Here are many transcoding tables expressed in XML files using the Character Mapping Markup Language (CharMapML, UTR 22). The encoding conversion data is used in the International Components for Unicode (ICU) open source library.
IBM Character Data; IBM Code pages (Appendix F); IBM Character lists (Appendix I); IBM Sort Sequences (Appendix C)

IBM ISO Code Pages
CP 00819 (ISO 8859-1) Latin Alphabet No. 1; CP 00813 (ISO 8859-7) Greece; CP 00916 (ISO 8859-8) Hebrew; CP 00920 (ISO 8859-9) Turkey

IBM Windows Code pages
CP 01250 (Windows) Latin 2; CP 01252 (Windows) Latin 1; CP 01253 (Windows) Greek; CP 01254 (Windows) Turkish; CP 01255 (Windows) Hebrew; CP 01256 (Windows) Arabic; CP 01257 (Windows) Baltic Rim

Microsoft Double-Byte Character Sets
In the following web pages, leadbytes are indicated by light gray background shading. Each of these leadbytes links to a new page showing the 256 character block associated with that leadbyte. Unused leadbytes are identified by a darker gray background.
I18n Guy's Hiragana Unicode Chart; Japanese Shift-JIS (CP 932); Conversion Problems CP932 & Unicode; Simplified Chinese GBK (CP 936); Korean (CP 949); Traditional Chinese Big5 (CP 950); Hong Kong Character Set (HKSCS)

Microsoft Windows Code Pages
Microsoft's Windows code pages; Microsoft's Windows code pages by country; Windows CP 1250 (Central Europe); Windows CP 1251 (Cyrillic); Windows CP 1252 (Latin I); Windows CP 1253 (Greek); Windows CP 1254 (Turkish); Windows CP 1255 (Hebrew); Windows CP 1256 (Arabic); Windows CP 1257 (Baltic); Windows CP 1258 (Viet Nam); Windows CP 874 (Thai)

Microsoft's ISO Code Page Charts
Globalization site: GlobalDev; ISO Code Pages at Microsoft's site; ISO/IEC 8859-1 (Latin 1); ISO/IEC 8859-2 (Latin 2); ISO/IEC 8859-3 (Latin 3); ISO/IEC 8859-4 (Baltic); ISO/IEC 8859-5 (Cyrillic); ISO/IEC 8859-6 (Arabic); ISO/IEC 8859-7 (Greek); ISO/IEC 8859-8 (Hebrew); ISO/IEC 8859-9 (Turkish); ISO/IEC 8859-15 (Latin 9)

IBM DOS Code pages
CP 00437 (IBM PC) USA; CP 00850 (IBM PC) Multilingual; CP 00851 (IBM PC) Greece; CP 00852 Latin-2 PC; CP 00855 (IBM PC) Cyrillic; CP 00856 (IBM PC) Hebrew; CP 00857 (IBM PC) Turkey; CP 00860 (IBM PC) Portugal; CP 00861 (IBM PC) Iceland; CP 00862 (IBM PC) Israel; CP 00863 (IBM PC) Canadian French; CP 00864 (IBM PC) Arabic; CP 00865 (IBM PC) Nordic; CP 00866 (IBM PC) Cyrillic #2; CP 00869 (IBM PC) Greece; CP 00870 Latin-2 Multilingual; CP 00874 (IBM PC) Thai Extended

Microsoft OEM (DOS) Code Pages
Microsoft's OEM code pages; DOS CP 437 (US); DOS CP 720 (Arabic); DOS CP 737 (Greek); DOS CP 775 (Baltic); DOS CP 850 (Western Europe); DOS CP 852 (Central Europe); DOS CP 855 (Cyrillic); DOS CP 857 (Turkish); DOS CP 862 (Hebrew); DOS CP 866 (Cyrillic II)

IBM Asian Code pages
I18n Guy's Hiragana Unicode Chart; CP 00290 (EBCDIC) Japanese (Katakana) Non-extended; CP 00290 (EBCDIC) Japanese (Katakana) Extended; CP 00833 (EBCDIC) Korea Extended; CP 00836 (EBCDIC) Simplified Chinese Extended; CP 00891 (IBM PC) Korea; CP 00895 Japan 7-Bit; CP 00897 (IBM PC) Japan PC #1; CP 00903 (IBM PC) People's Republic of China (PRC); CP 00904 (IBM PC) Republic of China (ROC); CP 00905 (EBCDIC) Turkey Extended; CP 01027 (EBCDIC) Japanese (Latin) Extended; CP 01040 (IBM PC) Korean Extended; CP 01041 (IBM PC) Japanese Extended; CP 01042 (IBM PC) Simplified Chinese Extended; CP 01043 (IBM PC) Traditional Chinese; CP 01088 (IBM PC) Korean; CP 01114 Traditional Chinese (Big5); CP 01115 Simplified Chinese (GB)
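
The leadbyte idea mentioned for the Microsoft double-byte charts can be observed from code: in a double-byte code page, a lead byte produces no character until its trail byte arrives. The Python sketch below uses the standard incremental decoder to split a byte string into per-character chunks. The function name `split_characters` is invented for the example, and the approach is an illustration rather than how the charts themselves were generated.

```python
import codecs


def split_characters(data: bytes, codec: str) -> list:
    """Split encoded bytes into one chunk per decoded character."""
    decoder = codecs.getincrementaldecoder(codec)()
    chunks = []
    pending = b""
    for i in range(len(data)):
        one_byte = data[i:i + 1]
        pending += one_byte
        # A lead byte yields an empty string: the decoder waits for the trail byte.
        if decoder.decode(one_byte):
            chunks.append(pending)
            pending = b""
    decoder.decode(b"", final=True)  # raises UnicodeDecodeError if a trail byte is missing
    return chunks


# "A" is a single byte in code page 949; each Hangul syllable is lead byte + trail byte.
for chunk in split_characters("A한글".encode("cp949"), "cp949"):
    print(chunk.hex(), "lead byte + trail byte" if len(chunk) == 2 else "single byte")
```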

Posted by 삼스

Code page conversion

The Unicode conversion filter offers conversions between the following code pages:

For more information on code pages, please see Code-Page Identifiers.

(*) The list of available code pages may be different on your system. You can install additional code pages using Control Panel\Regional Options.

Identifier	Name
037	IBM EBCDIC - U.S./Canada
437	OEM - United States
500	IBM EBCDIC - International
708	Arabic - ASMO 708
709	Arabic - ASMO 449+, BCON V4
710	Arabic - Transparent Arabic
720	Arabic - Transparent ASMO
737	OEM - Greek (formerly 437G)
775	OEM - Baltic
850	OEM - Multilingual Latin I
852	OEM - Latin II
855	OEM - Cyrillic (primarily Russian)
857	OEM - Turkish
858	OEM - Multilingual Latin I + Euro symbol
860	OEM - Portuguese
861	OEM - Icelandic
862	OEM - Hebrew
863	OEM - Canadian-French
864	OEM - Arabic
865	OEM - Nordic
866	OEM - Russian
869	OEM - Modern Greek
870	IBM EBCDIC - Multilingual/ROECE (Latin-2)
874	ANSI/OEM - Thai (Windows-874, based on TIS-620; not the same as 28605, ISO 8859-15)
875	IBM EBCDIC - Modern Greek
932	ANSI/OEM - Japanese, Shift-JIS
936	ANSI/OEM - Simplified Chinese (PRC, Singapore)
949	ANSI/OEM - Korean (Unified Hangeul Code; a superset of EUC-KR)
950	ANSI/OEM - Traditional Chinese (Taiwan; Hong Kong SAR, PRC)
1026	IBM EBCDIC - Turkish (Latin-5)
1047	IBM EBCDIC - Latin 1/Open System
1140	IBM EBCDIC - U.S./Canada (037 + Euro symbol)
1141	IBM EBCDIC - Germany (20273 + Euro symbol)
1142	IBM EBCDIC - Denmark/Norway (20277 + Euro symbol)
1143	IBM EBCDIC - Finland/Sweden (20278 + Euro symbol)
1144	IBM EBCDIC - Italy (20280 + Euro symbol)
1145	IBM EBCDIC - Latin America/Spain (20284 + Euro symbol)
1146	IBM EBCDIC - United Kingdom (20285 + Euro symbol)
1147	IBM EBCDIC - France (20297 + Euro symbol)
1148	IBM EBCDIC - International (500 + Euro symbol)
1149	IBM EBCDIC - Icelandic (20871 + Euro symbol)
1200	Unicode UCS-2 Little-Endian (BMP of ISO 10646)
1201	Unicode UCS-2 Big-Endian
1250	ANSI - Central European
1251	ANSI - Cyrillic
1252	ANSI - Latin I
1253	ANSI - Greek
1254	ANSI - Turkish
1255	ANSI - Hebrew
1256	ANSI - Arabic
1257	ANSI - Baltic
1258	ANSI/OEM - Vietnamese
1361	Korean (Johab)
10000	MAC - Roman
10001	MAC - Japanese
10002	MAC - Traditional Chinese (Big5)
10003	MAC - Korean
10004	MAC - Arabic
10005	MAC - Hebrew
10006	MAC - Greek I
10007	MAC - Cyrillic
10008	MAC - Simplified Chinese (GB 2312)
10010	MAC - Romania
10017	MAC - Ukraine
10021	MAC - Thai
10029	MAC - Latin II
10079	MAC - Icelandic
10081	MAC - Turkish
10082	MAC - Croatia
12000	Unicode UCS-4 Little-Endian
12001	Unicode UCS-4 Big-Endian
20000	CNS - Taiwan
20001	TCA - Taiwan
20002	Eten - Taiwan
20003	IBM5550 - Taiwan
20004	TeleText - Taiwan
20005	Wang - Taiwan
20105	IA5 IRV International Alphabet No. 5 (7-bit)
20106	IA5 German (7-bit)
20107	IA5 Swedish (7-bit)
20108	IA5 Norwegian (7-bit)
20127	US-ASCII (7-bit)
20261	T.61
20269	ISO 6937 Non-Spacing Accent
20273	IBM EBCDIC - Germany
20277	IBM EBCDIC - Denmark/Norway
20278	IBM EBCDIC - Finland/Sweden
20280	IBM EBCDIC - Italy
20284	IBM EBCDIC - Latin America/Spain
20285	IBM EBCDIC - United Kingdom
20290	IBM EBCDIC - Japanese Katakana Extended
20297	IBM EBCDIC - France
20420	IBM EBCDIC - Arabic
20423	IBM EBCDIC - Greek
20424	IBM EBCDIC - Hebrew
20833	IBM EBCDIC - Korean Extended
20838	IBM EBCDIC - Thai
20866	Russian - KOI8-R
20871	IBM EBCDIC - Icelandic
20880	IBM EBCDIC - Cyrillic (Russian)
20905	IBM EBCDIC - Turkish
20924	IBM EBCDIC - Latin-1/Open System (1047 + Euro symbol)
20932	JIS X 0208-1990 & 0212-1990
20936	Simplified Chinese (GB2312)
21025	IBM EBCDIC - Cyrillic (Serbian, Bulgarian)
21027	Extended Alpha Lowercase
21866	Ukrainian (KOI8-U)
28591	ISO 8859-1 Latin I
28592	ISO 8859-2 Central Europe
28593	ISO 8859-3 Latin 3
28594	ISO 8859-4 Baltic
28595	ISO 8859-5 Cyrillic
28596	ISO 8859-6 Arabic
28597	ISO 8859-7 Greek
28598	ISO 8859-8 Hebrew (visual order)
28599	ISO 8859-9 Latin 5
28605	ISO 8859-15 Latin 9
29001	Europa 3
38598	ISO 8859-8 Hebrew (logical order, ISO-8859-8-I)
50220	ISO 2022 Japanese with no halfwidth Katakana
50221	ISO 2022 Japanese with halfwidth Katakana
50222	ISO 2022 Japanese JIS X 0201-1989
50225	ISO 2022 Korean
50227	ISO 2022 Simplified Chinese
50229	ISO 2022 Traditional Chinese
50930	Japanese (Katakana) Extended
50931	US/Canada and Japanese
50933	Korean Extended and Korean
50935	Simplified Chinese Extended and Simplified Chinese
50936	Simplified Chinese
50937	US/Canada and Traditional Chinese
50939	Japanese (Latin) Extended and Japanese
51932	EUC - Japanese
51936	EUC - Simplified Chinese
51949	EUC - Korean
51950	EUC - Traditional Chinese
52936	HZ-GB2312 Simplified Chinese
54936	Windows XP: GB18030 Simplified Chinese (4 Byte)
57002	ISCII Devanagari
57003	ISCII Bengali
57004	ISCII Tamil
57005	ISCII Telugu
57006	ISCII Assamese
57007	ISCII Oriya
57008	ISCII Kannada
57009	ISCII Malayalam
57010	ISCII Gujarati
57011	ISCII Punjabi
65000	Unicode UTF-7
65001	Unicode UTF-8
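
When working with these identifiers outside Windows, it helps to map them to codec names. The Python sketch below is a hand-picked, partial mapping. The dictionary `WINDOWS_TO_PYTHON` and the function `codec_for` are assumptions made for this example, and Python's codecs are not guaranteed to match the Windows conversion tables byte for byte.

```python
import codecs

# Hand-picked subset: identifiers whose Python codec name is not simply "cp<number>".
WINDOWS_TO_PYTHON = {
    37: "cp037",
    1200: "utf-16-le",
    1201: "utf-16-be",
    12000: "utf-32-le",
    12001: "utf-32-be",
    20127: "ascii",
    20866: "koi8-r",
    21866: "koi8-u",
    28591: "iso8859-1",
    28605: "iso8859-15",
    51949: "euc-kr",
    54936: "gb18030",
    65000: "utf-7",
    65001: "utf-8",
}


def codec_for(code_page: int) -> str:
    """Return a Python codec name for a Windows code page identifier."""
    name = WINDOWS_TO_PYTHON.get(code_page, f"cp{code_page}")
    codecs.lookup(name)  # raises LookupError if Python has no such codec
    return name


for cp in (437, 949, 1252, 65001):
    print(cp, codec_for(cp))
```

Identifiers that Python does not know raise LookupError instead of being guessed at, so a missing codec shows up immediately.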

Posted by 삼스

고 투 더 멘토