ABAP Character Set

Application Server ABAP supports only Unicode systems in the current release.

  • A Unicode system is an AS ABAP that is based on Unicode character representation, uses a Unicode code page, and runs on an appropriate operating system.

Unicode (ISO/IEC 10646), with its character set UCS, is designed to cover the characters of all existing writing systems. The Unicode character set can be encoded in several character formats, such as UTF-8 (in which a character occupies between one and four bytes) or UCS-2 (in which every character occupies exactly two bytes).
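The difference between the formats can be checked outside ABAP. The following is a minimal illustrative sketch in Python, not ABAP; the helper name byte_lengths and the sample characters are chosen for this example only. It reports how many bytes a single character occupies in UTF-8 and in UTF-16.

```python
# Sample characters, written as escapes so the file encoding does not matter:
# "A", U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji outside the BMP).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def byte_lengths(ch):
    """Return (UTF-8 length, UTF-16 length) in bytes for one character."""
    # "utf-16-le" is used so that no byte order mark is added to the result.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in samples:
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

Only the last sample needs four bytes in UTF-16, because it lies outside the range that a single two-byte unit can address.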

  • The ABAP programming language supports the character representation UCS-2, which fundamentally matches the UTF-16 representation and covers its characters (except the characters in the surrogate area).

The restriction to UCS-2 means that ABAP always assumes a character to be two bytes long. This generally causes problems only if a character string is truncated in the middle of a character from the UTF-16 surrogate area (which is represented by two 16-bit units), or if character string processing compares such characters individually against a set of characters.
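The truncation problem can be reproduced with any UTF-16 encoder. The sketch below is illustrative Python, not ABAP, and its helper names (code_units, truncate_units, is_valid_utf16) are hypothetical. It treats a string as a sequence of two-byte units, as a UCS-2 view would, and cuts it at a unit boundary that falls inside a surrogate pair.

```python
# U+1F600 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
text = "a\U0001F600b"


def code_units(s):
    """Number of 16-bit units, i.e. the length a UCS-2 view would report."""
    return len(s.encode("utf-16-le")) // 2


def truncate_units(s, n):
    """Cut the UTF-16 form after n 16-bit units, ignoring character boundaries."""
    return s.encode("utf-16-le")[: 2 * n]


def is_valid_utf16(raw):
    """True if the byte sequence decodes as well-formed UTF-16 (little endian)."""
    try:
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


print(len(text), code_units(text))              # code points vs. 16-bit units
print(is_valid_utf16(truncate_units(text, 2)))  # cut between the two surrogates
print(is_valid_utf16(truncate_units(text, 3)))  # cut after the complete pair
```

Cutting after two units leaves a lone high surrogate, which is no longer a valid character; cutting after three units keeps the pair intact.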

In a Unicode system, an ABAP program must have the ABAP language version Standard ABAP (Unicode). Programs with the obsolete language version Non-Unicode ABAP can no longer be used in a Unicode system.


Notes

  • Before Unicode, SAP used various code pages to represent the characters of different scripts: single-byte code pages such as ASCII and EBCDIC, and double-byte code pages:

  • ASCII (American Standard Code for Information Interchange) code pages encode every character in one byte, so at most 256 characters can be represented. Strictly speaking, standard ASCII uses only 7 bits per character and can therefore represent just 128 characters; the extension to 8 bits was introduced in ISO-8859. Examples of common code pages are ISO-8859-1 for Western European scripts and ISO-8859-5 for Cyrillic scripts.

  • EBCDIC (Extended Binary Coded Decimal Interchange Code) also encodes each character in one byte and can therefore also represent 256 characters. For example, EBCDIC 0697/0500 is an IBM format that has been used on the AS/400 platform (now known as IBM System i) for Western European scripts.

  • Double-byte code pages require one or two bytes per character. This allows up to 65536 characters to be represented, of which only 10000 to 15000 are normally used. For example, the code page SJIS is used for Japanese and BIG5 for traditional Chinese scripts.

Using these character sets, each language could be handled individually in one AS ABAP. Problems generally occurred when texts from different, incompatible character sets were mixed in a central system. The exchange of data between systems with incompatible character sets was also a potential source of problems.
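Why mixing incompatible code pages causes problems can be seen by decoding the same byte under several of them. The following is a hedged Python sketch, not ABAP; it uses Python's codecs latin_1 (ISO-8859-1), iso8859_5, and cp500, where treating cp500 as a stand-in for the EBCDIC format named above is an assumption of this example.

```python
# One and the same byte value, as it might arrive from another system.
raw = bytes([0xE4])

# Codec names are Python's; cp500 is assumed here to represent EBCDIC 500.
code_pages = ["latin_1", "iso8859_5", "cp500"]

# Decode the identical byte under each single-byte code page.
decoded = {cp: raw.decode(cp) for cp in code_pages}

for cp, ch in decoded.items():
    print(f"{cp}: U+{ord(ch):04X}")
```

The byte itself carries no information about its code page, so a receiver that assumes the wrong one silently obtains a different character.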

  • In earlier non-Unicode systems, the system code pages were defined in the database table TCPDB. In non-Unicode single code page systems, there was only one system code page. In the obsolete MDMP systems, there were multiple system code pages.

  • Before Unicode support was introduced, many ABAP programmers assumed that one character corresponded to one byte. Therefore, before a non-Unicode system is converted to Unicode, ABAP programs must be changed wherever an explicit or implicit assumption is made about the internal length of a character. This mainly affects the following:

Before a program is switched to Unicode, the ABAP language version Standard ABAP (Unicode) or higher must be configured in the program attributes. For these versions, the Unicode checks run in non-Unicode systems too. The transaction UCCHECK supports the activation of these checks for existing programs. The program RSUNISCAN_FINAL can be used instead of transaction UCCHECK.
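The kind of implicit assumption that such checks are meant to uncover can be illustrated with a fixed-width record that is read by byte offset. This is a minimal Python sketch, not ABAP; the record layout, the constants NAME_LEN and YEAR_LEN, and the function year_by_bytes are hypothetical.

```python
# Hypothetical fixed-width record: a 10-character name followed by a 4-character year.
NAME_LEN, YEAR_LEN = 10, 4
record = "M\u00dcLLER".ljust(NAME_LEN) + "2024"


def year_by_bytes(raw, encoding, bytes_per_char):
    """Read the year field by byte offset, given an assumed character width."""
    start = NAME_LEN * bytes_per_char
    end = start + YEAR_LEN * bytes_per_char
    return raw[start:end].decode(encoding)


latin = record.encode("latin_1")    # single-byte code page
utf16 = record.encode("utf-16-le")  # two bytes per character

print(year_by_bytes(latin, "latin_1", 1))          # assumption holds
print(year_by_bytes(utf16, "utf-16-le", 2))        # width adjusted for Unicode
print(repr(year_by_bytes(utf16, "utf-16-le", 1)))  # old assumption, wrong field
```

With the one-byte assumption, the same offsets applied to the two-byte representation land inside the name field instead of the year.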