OPEN DATASET - encoding

Syntax


... ENCODING { DEFAULT
             | {UTF-8 [SKIPPING|WITH BYTE-ORDER MARK]}
             | NON-UNICODE } ...

Alternatives

1. ... DEFAULT

2. ... UTF-8 [SKIPPING|WITH BYTE-ORDER MARK]

3. ... NON-UNICODE

Effect

The additions after the mandatory addition ENCODING define the character representation in which the content of a text file is handled.

Programming Guideline

Write text files in UTF-8 and with a byte order mark.

Note

It is best to always write files in UTF-8 (if all readers can process this format). Otherwise, the code page can depend on the text environment and it is difficult to identify the code page from the file content.

Alternative 1

... DEFAULT

Effect

Specifying DEFAULT has the same effect as specifying UTF-8.
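
Because DEFAULT and UTF-8 are described as equivalent, the bytes written with either addition should be identical. The following is a minimal sketch of such a check, not a definitive test. The file name test.dat is taken from the other examples in this section, and the variable names xstr_default and xstr_utf8 are illustrative only.

```abap
DATA(dset) = 'test.dat'.
DATA xstr_default TYPE xstring.
DATA xstr_utf8    TYPE xstring.

" Write a single umlaut using ENCODING DEFAULT
OPEN DATASET dset FOR OUTPUT IN TEXT MODE ENCODING DEFAULT.
TRANSFER 'Ä' TO dset NO END OF LINE.
CLOSE DATASET dset.

" Read the raw bytes back in binary mode
OPEN DATASET dset FOR INPUT IN BINARY MODE.
READ DATASET dset INTO xstr_default.
CLOSE DATASET dset.

" Write the same character using ENCODING UTF-8
" (FOR OUTPUT replaces the existing file content)
OPEN DATASET dset FOR OUTPUT IN TEXT MODE ENCODING UTF-8.
TRANSFER 'Ä' TO dset NO END OF LINE.
CLOSE DATASET dset.

OPEN DATASET dset FOR INPUT IN BINARY MODE.
READ DATASET dset INTO xstr_utf8.
CLOSE DATASET dset.

" Display whether both byte strings match
cl_demo_output=>display( xsdbool( xstr_default = xstr_utf8 ) ).

DELETE DATASET dset.
```

In UTF-8, the character Ä is encoded as the two bytes C3 84, so both byte strings would be expected to contain this value.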

Alternative 2

... UTF-8 [SKIPPING|WITH BYTE-ORDER MARK]

Addition

... SKIPPING|WITH BYTE-ORDER MARK

Effect

The characters in the file are handled in accordance with the Unicode character representation UTF-8.

Notes

The class CL_ABAP_FILE_UTILITIES contains the method CHECK_UTF8 for determining whether a file is a UTF-8 file.
A UTF-16 file can only be opened as a binary file.

Example

Opens a text file as a UTF-8 file and writes a string containing German umlaut characters to it. The file is then read into a byte string in binary mode, and this byte string is converted from UTF-8 to a character string. The conversion uses an object created by the class CL_ABAP_CONV_CODEPAGE and the method CONVERT of the interface IF_ABAP_CONV_IN.

DATA(dset) = 'test.dat'. 

OPEN DATASET dset FOR OUTPUT IN TEXT MODE ENCODING UTF-8. 
TRANSFER 'ÄäÖöÜü' TO dset. 
CLOSE DATASET dset. 

DATA xstr TYPE xstring. 
OPEN DATASET dset FOR INPUT IN BINARY MODE. 
READ DATASET dset INTO xstr. 
CLOSE DATASET dset. 

cl_demo_output=>display( 
  cl_abap_conv_codepage=>create_in( )->convert( xstr ) ). 

DELETE DATASET dset.

Addition

... SKIPPING|WITH BYTE-ORDER MARK

Effect

This addition defines how the byte order mark (BOM), with which a file encoded in UTF-8 can begin, is handled. The BOM is the three-byte sequence EF BB BF (hexadecimal) at the start of a file and indicates that the file is encoded in UTF-8.

SKIPPING BYTE-ORDER MARK
is only permitted if the file is opened for reading or changing using FOR INPUT or FOR UPDATE. If there is a BOM at the start of the file, it is ignored and the file pointer is set after it. Without the addition, the BOM is handled as normal file content.
WITH BYTE-ORDER MARK
is only permitted if the file is opened for writing using FOR OUTPUT. When the file is opened, a BOM is inserted at the start of the file. Without the addition, no BOM is inserted.

The addition BYTE-ORDER MARK cannot be used together with the AT POSITION addition.
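
To see whether a file actually begins with a BOM, its first three bytes can be read in binary mode and compared with EFBBBF. The following is a minimal sketch under that assumption. The constant utf8_bom and the variable head are illustrative names, and the sample file is created only so that the check has something to read.

```abap
DATA(dset) = 'test.dat'.
CONSTANTS utf8_bom TYPE x LENGTH 3 VALUE 'EFBBBF'.
DATA head TYPE xstring.

" Create a sample file that starts with a BOM
OPEN DATASET dset FOR OUTPUT IN TEXT MODE
                  ENCODING UTF-8 WITH BYTE-ORDER MARK.
TRANSFER 'abc' TO dset.
CLOSE DATASET dset.

" Read only the first three bytes in binary mode
OPEN DATASET dset FOR INPUT IN BINARY MODE.
READ DATASET dset INTO head MAXIMUM LENGTH 3.
CLOSE DATASET dset.

" Compare the bytes read with the UTF-8 BOM
IF head = utf8_bom.
  cl_demo_output=>display( 'File starts with a UTF-8 BOM' ).
ELSE.
  cl_demo_output=>display( 'No UTF-8 BOM found' ).
ENDIF.

DELETE DATASET dset.
```

A missing BOM does not mean that a file is not UTF-8, so this check only helps decide whether SKIPPING BYTE-ORDER MARK has anything to skip.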

Notes

When opening UTF-8 files for reading, it is best to always specify the addition SKIPPING BYTE-ORDER MARK to prevent a BOM from being handled as file content.
It is recommended that a file for writing is always opened as a UTF-8 file using the addition WITH BYTE-ORDER MARK (as long as all readers can process this format).
The method CREATE_UTF8_FILE_WITH_BOM in the system class CL_ABAP_FILE_UTILITIES can be used to create a file with BOM.

Example

The binary content of the text file opened using WITH BYTE-ORDER MARK is EFBBBF616263. EFBBBF is specified as a BOM at the start of the file. This is followed by the UTF-8 representation 616263 of the actual characters abc.

DATA(dset) = 'test.dat'. 

OPEN DATASET dset FOR OUTPUT IN TEXT MODE 
                 ENCODING UTF-8 WITH BYTE-ORDER MARK. 
TRANSFER 'abc' TO dset NO END OF LINE.
CLOSE DATASET dset. 

DATA xstr TYPE xstring. 
OPEN DATASET dset FOR INPUT IN BINARY MODE. 
READ DATASET dset INTO xstr. 
CLOSE DATASET dset. 

cl_demo_output=>display( xstr ). 

DELETE DATASET dset.
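
As a complement to the example above, the following minimal sketch writes a file with a BOM and then reads it back in text mode using SKIPPING BYTE-ORDER MARK, so that the BOM does not appear in the result. The variable name text is illustrative, and the sy-subrc checks are an added precaution rather than part of the original example.

```abap
DATA(dset) = 'test.dat'.
DATA text TYPE string.

" Write the file with a leading BOM
OPEN DATASET dset FOR OUTPUT IN TEXT MODE
                  ENCODING UTF-8 WITH BYTE-ORDER MARK.
IF sy-subrc <> 0.
  RETURN. " file could not be opened
ENDIF.
TRANSFER 'abc' TO dset.
CLOSE DATASET dset.

" Read the file as text and position the file pointer after the BOM
OPEN DATASET dset FOR INPUT IN TEXT MODE
                  ENCODING UTF-8 SKIPPING BYTE-ORDER MARK.
IF sy-subrc <> 0.
  RETURN. " file could not be opened
ENDIF.
READ DATASET dset INTO text.
CLOSE DATASET dset.

" The first line is expected to contain only the characters written
cl_demo_output=>display( text ).

DELETE DATASET dset.
```

If the file were opened for reading without SKIPPING BYTE-ORDER MARK, the BOM would be handled as normal file content and would precede the characters abc in text.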

Alternative 3

... NON-UNICODE

Effect

The characters of the file are handled in accordance with the non-Unicode code page that would be assigned when reading or writing data in a non-Unicode system (as specified by the entry in the database table TCP0C in the current text environment).

Example

Writes German umlaut characters to a text file in a non-Unicode code page. This code page is then read from the database table TCP0C and used to open the file as a legacy text file.

DATA(dset) = 'test.dat'. 

OPEN DATASET dset FOR OUTPUT IN TEXT MODE ENCODING NON-UNICODE. 
TRY. 
    TRANSFER 'ÄäÖöÜü' TO dset. 
  CATCH cx_sy_conversion_codepage INTO DATA(exc). 
    cl_demo_output=>display( 'Error writing to non-unicode codepage' ). 
    RETURN. 
ENDTRY. 
CLOSE DATASET dset. 

SELECT SINGLE charco
       FROM tcp0c
       WHERE platform = @sy-opsys AND
             langu    = @sy-langu AND
             country  = ' '
       INTO @DATA(cp).

DATA text TYPE string. 
OPEN DATASET dset FOR INPUT IN LEGACY TEXT MODE CODE PAGE cp. 
READ DATASET dset INTO text. 
CLOSE DATASET dset. 

cl_demo_output=>display( text ). 

DELETE DATASET dset.