KOI8-U
Language(s) | Ukrainian, Russian, Bulgarian |
---|---|
Classification | 8-bit KOI, extended ASCII |
Extends | KOI8-B |
Based on | KOI8-R |
Other related encoding(s) | KOI8-RU, KOI8-F |
KOI8-U (RFC 2319) is an 8-bit character encoding, designed to cover Ukrainian, which uses a Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight box drawing characters with four Ukrainian letters Ґ, Є, І, and Ї in both upper case and lower case.
KOI8-RU is closely related, but adds Ў for Belarusian. In both, the letter allocations match those in KOI8-E, except for Ґ which is added to KOI8-F.
In Microsoft Windows, KOI8-U is assigned the code page number 21866. In IBM, KOI8-U is assigned code page/CCSID 1168.[1][2][3]
KOI8 remains much more commonly used than ISO 8859-5, which never really caught on.[citation needed] Another common Cyrillic character encoding is Windows-1251. In the future, both may eventually give way to Unicode.
KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит) which means "Code for Information Exchange, 8 bit".
The KOI8 character sets have the property that the Cyrillic letters are in pseudo-Latin alphabetic order rather than Cyrillic alphabetical order as in ISO 8859-5. This has the useful effect that if the eighth bit is stripped and the text is presented in any character set based on ASCII including the KOI8 sets themselves, the text is still reasonably human readable as a case-reversed transliteration. For instance, the "KOI" acronym "Код Обмена Информацией" becomes kOD oBMENA iNFORMACIEJ.
Character set
[edit]The following table shows the KOI8-U encoding.[1][4] Each character is shown with its equivalent Unicode code point.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F | |
0x | ||||||||||||||||
1x | ||||||||||||||||
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | |
8x | ─ 2500 |
│ 2502 |
┌ 250C |
┐ 2510 |
└ 2514 |
┘ 2518 |
├ 251C |
┤ 2524 |
┬ 252C |
┴ 2534 |
┼ 253C |
▀ 2580 |
▄ 2584 |
█ 2588 |
▌ 258C |
▐ 2590 |
9x | ░ 2591 |
▒ 2592 |
▓ 2593 |
⌠ 2320 |
■ 25A0 |
∙ 2219 |
√ 221A |
≈ 2248 |
≤ 2264 |
≥ 2265 |
NBSP | ⌡ 2321 |
° 00B0 |
² 00B2 |
· 00B7 |
÷ 00F7 |
Ax | ═ 2550 |
║ 2551 |
╒ 2552 |
ё 0451 |
є 0454 |
╔ 2554 |
і 0456 |
ї 0457 |
╗ 2557 |
╘ 2558 |
╙ 2559 |
╚ 255A |
╛ 255B |
ґ 0491 |
╝ 255D |
╞ 255E |
Bx | ╟ 255F |
╠ 2560 |
╡ 2561 |
Ё 0401 |
Є 0404 |
╣ 2563 |
І 0406 |
Ї 0407 |
╦ 2566 |
╧ 2567 |
╨ 2568 |
╩ 2569 |
╪ 256A |
Ґ 0490 |
╬ 256C |
© 00A9 |
Cx | ю 044E |
а 0430 |
б 0431 |
ц 0446 |
д 0434 |
е 0435 |
ф 0444 |
г 0433 |
х 0445 |
и 0438 |
й 0439 |
к 043A |
л 043B |
м 043C |
н 043D |
о 043E |
Dx | п 043F |
я 044F |
р 0440 |
с 0441 |
т 0442 |
у 0443 |
ж 0436 |
в 0432 |
ь 044C |
ы 044B |
з 0437 |
ш 0448 |
э 044D |
щ 0449 |
ч 0447 |
ъ 044A |
Ex | Ю 042E |
А 0410 |
Б 0411 |
Ц 0426 |
Д 0414 |
Е 0415 |
Ф 0424 |
Г 0413 |
Х 0425 |
И 0418 |
Й 0419 |
К 041A |
Л 041B |
М 041C |
Н 041D |
О 041E |
Fx | П 041F |
Я 042F |
Р 0420 |
С 0421 |
Т 0422 |
У 0423 |
Ж 0416 |
В 0412 |
Ь 042C |
Ы 042B |
З 0417 |
Ш 0428 |
Э 042D |
Щ 0429 |
Ч 0427 |
Ъ 042A |
Although RFC 2319 says that character 0x95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in Windows-1251.
Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
See also
[edit]References
[edit]- ^ a b "SBCS code page information - CPGID: 01168 / Name: Ukrainian KOI8-U". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. IBM. C-H 3-3220-050. Archived from the original on 2017-02-18. Retrieved 2017-02-18. [1] [2]
- ^ "CCSID information document; CCSID 1168; KOI8-U". IBM. Archived from the original on 2017-02-18. Retrieved 2017-02-18.
- ^ International Components for Unicode (ICU), ibm-1168_P100-2002.ucm, 2002-12-03
- ^ Verdy, Philippe; Richter, Helmut (2016-01-04) [2008-10-13]. "KOI8-U.TXT". 2.0. Retrieved 2016-12-09.
Further reading
[edit]- Flohr, Guido (2016) [2006]. "Locale::RecodeData::KOI8_U - Conversion routines for KOI8-U". CPAN libintl-perl. 1.1. Archived from the original on 2017-01-15. Retrieved 2017-01-15.
- RFC 2319
- "KOI8-U (RFC 2319)". Kermit. Columbia University. Retrieved 2020-06-24.
- Leishner, Mark (2008) [1999-12-20]. "KOI8-U Belorussian/Ukrainian Cyrillic to Unicode 2.1 mapping table - Based on RFC 2319". Department of Mathematical Sciences, New Mexico State University. Archived from the original on 2017-02-19. Retrieved 2017-02-19.
- Kornai, Andras; Birnbaum, David J.; da Cruz, Frank; Davis, Bur; Fowler, George; Paine, Richard B.; Paperno, Slava; Simonsen, Keld J.; Thobe, Glenn E.; Vulis, Dimitri; van Wingen, Johan W. (1993-03-13). "CYRILLIC ENCODING FAQ Version 1.3". 1.3. Archived from the original on 2017-02-18. Retrieved 2020-06-24.
External links
[edit]- Czyborra, Roman (1998-11-30) [1998-05-25]. "The Cyrillic Charset Soup". Archived from the original on 2016-12-03. Retrieved 2016-12-03.
- Hohlov, Yu. E. "Cyrillic Information Representation in Electronic Form - Character Set (Code Page) Tables". Archived from the original on 2016-12-05. Retrieved 2016-12-05.
- Nechayev, Valentin (2013) [2001]. "Review of 8-bit Cyrillic encodings universe". Archived from the original on 2016-12-05. Retrieved 2016-12-05.
- https://web.archive.org/web/20050206230944/http://www.net.ua/KOI8-U/