ABAP Character Sets
Application Server ABAP supports both Unicode and non-Unicode systems:
- Unicode systems are ABAP systems based on a Unicode character representation with a code page for Unicode.
- Non-Unicode systems are ABAP systems based on code pages for single-byte code and double-byte code rather than on a Unicode character representation, with a corresponding underlying operating system and database.
For use in a Unicode system, a program must fulfill certain prerequisites and be identified as a Unicode program. Unicode programs work in both Unicode systems and non-Unicode systems. This makes non-Unicode programs redundant and obsolete.
Before Unicode, SAP used a range of encodings to represent the characters of different scripts: single-byte code pages such as ASCII and EBCDIC, and double-byte code pages:
- ASCII (American Standard Code for Information Interchange) encodes every character with one byte, so a maximum of 256 characters can be represented. Strictly speaking, standard ASCII uses only 7 bits per character and can therefore represent only 128 characters; the extension to 8 bits is introduced with ISO-8859. Examples of common code pages are ISO-8859-1 for Western European scripts and ISO-8859-5 for Cyrillic scripts.
- EBCDIC (Extended Binary Coded Decimal Interchange Code) also encodes each character using one byte, and can therefore also represent 256 characters. For example, EBCDIC 0697/0500 is an IBM format that has been used on the AS/400 platform (now known as IBM System i) for Western European scripts.
- Double-byte code pages require one or two bytes per character. This enables the representation of up to 65,536 characters, of which only 10,000 to 15,000 are normally used. For example, the code page SJIS is used for Japanese and BIG5 for traditional Chinese.
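The byte widths described in the list above can be checked with a short sketch. It is written in Python rather than ABAP, purely as an illustration, and the codec names (iso8859_1, iso8859_5, cp500, shift_jis, big5) are Python's names, assumed here to be close equivalents of the code pages mentioned. The helper byte_length is hypothetical.

```python
# Python codec names, assumed to correspond to the code pages named in the text.
CODE_PAGES = {
    "ISO-8859-1": "iso8859_1",   # Western European, single byte
    "ISO-8859-5": "iso8859_5",   # Cyrillic, single byte
    "EBCDIC 500": "cp500",       # IBM EBCDIC, single byte
    "SJIS": "shift_jis",         # Japanese, one or two bytes
    "BIG5": "big5",              # Traditional Chinese, one or two bytes
}


def byte_length(char, code_page):
    """Return how many bytes one character occupies in the given code page."""
    return len(char.encode(CODE_PAGES[code_page]))


latin_a = "A"
e_acute = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE
zhe = "\u0416"           # CYRILLIC CAPITAL LETTER ZHE
kanji_day = "\u65e5"     # CJK ideograph "day"
hanzi_middle = "\u4e2d"  # CJK ideograph "middle"

# Single-byte code pages: every character they contain takes one byte.
single_byte = [
    byte_length(e_acute, "ISO-8859-1"),
    byte_length(zhe, "ISO-8859-5"),
    byte_length(latin_a, "EBCDIC 500"),
]

# Double-byte code pages: Latin letters take one byte, ideographs take two.
double_byte = [
    byte_length(latin_a, "SJIS"),
    byte_length(kanji_day, "SJIS"),
    byte_length(hanzi_middle, "BIG5"),
]
```

Note that ASCII and EBCDIC assign different byte values to the same letter: "A" is 0x41 in ISO-8859-1 but 0xC1 in EBCDIC 500.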
Using these character sets, all languages can be handled individually in one AS ABAP. Difficulties arise if texts from different incompatible character sets are mixed in one central system. The exchange of data between systems with incompatible character sets can also lead to problems.
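The kind of difficulty that arises when incompatible code pages meet can be sketched as follows. This is an illustrative Python example, not ABAP, and the function names are hypothetical. It assumes one system writes text in ISO-8859-5 and another reads the same bytes as ISO-8859-1.

```python
def reinterpret(text, written_as, read_as):
    """Encode text in one code page and decode the same bytes in another."""
    return text.encode(written_as).decode(read_as)


def convert_lossy(text, target):
    """Convert text to a target code page, replacing unmappable characters."""
    return text.encode(target, errors="replace")


zhe = "\u0416"  # CYRILLIC CAPITAL LETTER ZHE, byte 0xB6 in ISO-8859-5

# The receiving system sees byte 0xB6 and shows the ISO-8859-1 character
# at that position (the pilcrow sign, U+00B6) instead of the Cyrillic letter.
misread = reinterpret(zhe, "iso8859_5", "iso8859_1")

# A Western European code page has no slot for the Cyrillic letter at all,
# so a conversion can only substitute a placeholder.
lost = convert_lossy(zhe, "iso8859_1")
```

In the first case the data arrives intact but is displayed incorrectly. In the second case the original character is no longer recoverable.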
The solution to this problem is a character set that includes all characters at once. This is realized by Unicode (ISO/IEC 10646) with the character set UCS. Several character formats are possible for the Unicode character set, for example UTF-8, in which a character occupies between one and four bytes, or UCS-2, in which a character always occupies two bytes. The system code page of a Unicode system is UTF-16, and the programming language supports the character format UCS-2. UCS-2 mostly matches the UTF-16 format and includes all its characters except those from the surrogate area. The restriction to UCS-2 means that a character is always assumed to have a length of two bytes. This generally causes problems only if character strings are truncated in the middle of a character representation or if individual characters are compared in character string processing.
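The truncation problem can be made concrete with a small sketch. It is written in Python, not ABAP, and only models the two-byte view of UTF-16 data; the function names are hypothetical. The example character U+20BB7 lies outside the range covered by UCS-2 and is therefore stored in UTF-16 as a surrogate pair of two 2-byte units.

```python
def utf16_units(text):
    """Return the 2-byte UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def cut_splits_character(units, length):
    """Return True if keeping the first `length` units separates a surrogate pair."""
    if length <= 0 or length >= len(units):
        return False
    return is_high_surrogate(units[length - 1]) and is_low_surrogate(units[length])


text = "a\U00020BB7"       # two characters: "a" and a CJK ideograph beyond U+FFFF
units = utf16_units(text)  # three 2-byte units: 0x0061, 0xD842, 0xDFB7

# A two-byte view counts three "characters" where there are only two,
# and truncating to a length of 2 leaves half of the surrogate pair behind.
unit_count = len(units)
unsafe_cut = cut_splits_character(units, 2)
safe_cut = cut_splits_character(units, 1)
```

Under these assumptions, a length-based truncation is safe only where it does not fall between a high and a low surrogate.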
Using Unicode offers the following benefits:
- A single Unicode system can cover all business processes in all countries concerned.
- Data can be transferred between different Unicode systems without loss of information.
- Unicode systems can show more than one language at once on a single user interface.