OPEN DATASET - encoding
Syntax
... ENCODING { DEFAULT
             | {UTF-8 [SKIPPING|WITH BYTE-ORDER MARK]}
             | NON-UNICODE } ...
Alternatives
1. ... DEFAULT
2. ... UTF-8 [SKIPPING|WITH BYTE-ORDER MARK]
3. ... NON-UNICODE
Effect
The additions after the mandatory addition ENCODING define the character representation in which the content of a text file is handled.
Programming Guideline
Write text files in UTF-8 and with a byte order mark.
Note
It is best to always write files in UTF-8 (if all readers can process this format). Otherwise, the code page can depend on the text environment and it is difficult to identify the code page from the file content.
Alternative 1
... DEFAULT
Effect
If specified, DEFAULT is the same as specifying UTF-8.
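The following is a minimal sketch of this behavior, not an official example. It assumes a Unicode system, reuses the file name test.dat from the other examples on this page, and assumes that the application server allows the file to be created. One character is written with ENCODING DEFAULT and the file is then read back in binary mode. If DEFAULT is handled like UTF-8 as described above, the displayed byte string should be C384, the UTF-8 representation of Ä.

```abap
DATA(dset) = 'test.dat'.

" Write one non-ASCII character without an end-of-line marker
OPEN DATASET dset FOR OUTPUT IN TEXT MODE ENCODING DEFAULT.
TRANSFER 'Ä' TO dset NO END OF LINE.
CLOSE DATASET dset.

" Read the raw bytes back to inspect the encoding that was used
DATA xstr TYPE xstring.
OPEN DATASET dset FOR INPUT IN BINARY MODE.
READ DATASET dset INTO xstr.
CLOSE DATASET dset.

cl_demo_output=>display( xstr ).
DELETE DATASET dset.
```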
Alternative 2
... UTF-8 [SKIPPING|WITH BYTE-ORDER MARK]
Addition
... SKIPPING|WITH BYTE-ORDER MARK
Effect
The characters in the file are handled in accordance with the Unicode character representation UTF-8.
Notes
- The class CL_ABAP_FILE_UTILITIES contains the method CHECK_UTF8 for determining whether a file is a UTF-8 file.
- A UTF-16 file can only be opened as a binary file.
Example
Opens a text file as a UTF-8 file and writes a string containing German umlaut characters to the file.
The file is read to a byte string and this byte string is converted from UTF-8 to a character string.
This is done using an object created by the class CL_ABAP_CONV_CODEPAGE and the method CONVERT of the interface IF_ABAP_CONV_IN.
DATA(dset) = 'test.dat'.
OPEN DATASET dset FOR OUTPUT IN TEXT MODE ENCODING UTF-8.
TRANSFER 'ÄäÖöÜü' TO dset.
CLOSE DATASET dset.
DATA xstr TYPE xstring.
OPEN DATASET dset FOR INPUT IN BINARY MODE.
READ DATASET dset INTO xstr.
CLOSE DATASET dset.
cl_demo_output=>display(
  cl_abap_conv_codepage=>create_in( )->convert( xstr ) ).
DELETE DATASET dset.
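The method CHECK_UTF8 mentioned in the notes above might be called as sketched below. The parameter names (FILE_NAME, BOM, ENCODING), the constant ENCODING_UTF8, and the exception classes are written from memory of the class interface and should be treated as assumptions; check them in the class builder before use. The file name is declared as a string here on the assumption that FILE_NAME is typed as STRING.

```abap
DATA(dset) = `test.dat`.

" Create a small UTF-8 file that contains non-ASCII characters
OPEN DATASET dset FOR OUTPUT IN TEXT MODE ENCODING UTF-8.
TRANSFER 'ÄäÖöÜü' TO dset.
CLOSE DATASET dset.

TRY.
    " Assumed signature: FILE_NAME is passed in, BOM and ENCODING are returned
    cl_abap_file_utilities=>check_utf8(
      EXPORTING file_name = dset
      IMPORTING bom       = DATA(bom)
                encoding  = DATA(encoding) ).
    IF encoding = cl_abap_file_utilities=>encoding_utf8.
      cl_demo_output=>display( 'File was recognized as UTF-8' ).
    ELSE.
      cl_demo_output=>display( 'File was not recognized as UTF-8' ).
    ENDIF.
  CATCH cx_sy_file_open cx_sy_file_authority cx_sy_file_io.
    cl_demo_output=>display( 'File could not be checked' ).
ENDTRY.

DELETE DATASET dset.
```

A check like this is a heuristic based on the file content, so it is best combined with the recommendation to write a BOM whenever the writing program is under your control.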
Addition
... SKIPPING|WITH BYTE-ORDER MARK
Effect
This addition defines how the byte order mark (BOM), with which a file encoded in the UTF-8 format can begin, is handled. The BOM is a sequence of three bytes that indicates that a file is encoded in UTF-8.
- SKIPPING BYTE-ORDER MARK is only permitted if the file is opened for reading or changing using FOR INPUT or FOR UPDATE. If there is a BOM at the start of the file, it is ignored and the file pointer is set after it. Without the addition, the BOM is handled as normal file content.
- WITH BYTE-ORDER MARK is only permitted if the file is opened for writing using FOR OUTPUT. When the file is opened, a BOM is inserted at the start of the file. Without the addition, no BOM is inserted.
The addition BYTE-ORDER MARK cannot be used together with the AT POSITION addition.
Notes
- When opening UTF-8 files for reading, it is best to always specify the addition SKIPPING BYTE-ORDER MARK to prevent a BOM from being handled as file content.
- It is recommended that a file for writing is always opened as a UTF-8 file using the addition WITH BYTE-ORDER MARK (as long as all readers can process this format).
- The method CREATE_UTF8_FILE_WITH_BOM in the system class CL_ABAP_FILE_UTILITIES can be used to create a file with BOM.
Example
The binary content of the text file opened using WITH BYTE-ORDER MARK is EFBBBF616263. EFBBBF is the BOM inserted at the start of the file. It is followed by the UTF-8 representation 616263 of the actual characters abc.
DATA(dset) = 'test.dat'.
OPEN DATASET dset FOR OUTPUT IN TEXT MODE
                  ENCODING UTF-8 WITH BYTE-ORDER MARK.
TRANSFER 'abc' TO dset NO END OF LINE.
CLOSE DATASET dset.
DATA xstr TYPE xstring.
OPEN DATASET dset FOR INPUT IN BINARY MODE.
READ DATASET dset INTO xstr.
CLOSE DATASET dset.
cl_demo_output=>display( xstr ).
DELETE DATASET dset.
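As a complement, the sketch below shows the reading side under the same assumptions as the example above (a Unicode system and a writable file test.dat). The file is written with a BOM and then opened as a text file using SKIPPING BYTE-ORDER MARK, so the string that is read should contain only abc and not the BOM.

```abap
DATA(dset) = 'test.dat'.

" Write a line preceded by a UTF-8 BOM
OPEN DATASET dset FOR OUTPUT IN TEXT MODE
                  ENCODING UTF-8 WITH BYTE-ORDER MARK.
TRANSFER 'abc' TO dset.
CLOSE DATASET dset.

" Read the line back; the file pointer is set after the BOM
DATA text TYPE string.
OPEN DATASET dset FOR INPUT IN TEXT MODE
                  ENCODING UTF-8 SKIPPING BYTE-ORDER MARK.
READ DATASET dset INTO text.
CLOSE DATASET dset.

cl_demo_output=>display( text ).
DELETE DATASET dset.
```

If the second OPEN DATASET statement is used without SKIPPING BYTE-ORDER MARK, the BOM is handled as normal file content, as described under Effect.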
Alternative 3
... NON-UNICODE
Effect
The characters of the file are handled in accordance with the non-Unicode code page that would be assigned when reading or writing data in a non-Unicode system (as specified by the entry in the database table TCP0C in the current text environment).
Example
Writes German umlaut characters to a file using the non-Unicode code page. This code page is then read from the database table TCP0C and used to open the file as a legacy text file.
DATA(dset) = 'test.dat'.
OPEN DATASET dset FOR OUTPUT IN TEXT MODE ENCODING NON-UNICODE.
TRY.
    TRANSFER 'ÄäÖöÜü' TO dset.
  CATCH cx_sy_conversion_codepage INTO DATA(exc).
    cl_demo_output=>display( 'Error writing to non-unicode codepage' ).
    RETURN.
ENDTRY.
CLOSE DATASET dset.
SELECT SINGLE charco
       FROM tcp0c
       WHERE platform = @sy-opsys AND
             langu    = @sy-langu AND
             country  = ' '
       INTO @DATA(cp).
DATA text TYPE string.
OPEN DATASET dset FOR INPUT IN LEGACY TEXT MODE CODE PAGE cp.
READ DATASET dset INTO text.
CLOSE DATASET dset.
cl_demo_output=>display( text ).
DELETE DATASET dset.