Unit synachar

Description

Charset conversion support

This unit contains routines for a large number of charset conversions.

It uses either built-in conversion tables or the external Iconv library. Iconv is used when the Iconv library supports the requested conversion. When the Iconv library is not found, or does not support the requested conversion, the internal routines are used instead. (You can also disable Iconv support from your program.)

The internal routines know all major charsets for Europe and America. For East Asian charsets you must use the Iconv library.

Overview

Functions and Procedures

function CharsetConversion(const Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeChar): AnsiString;
function CharsetConversionEx(const Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeChar; const TransformTable: array of Word): AnsiString;
function CharsetConversionTrans(Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeChar; const TransformTable: array of Word; Translit: Boolean): AnsiString;
function GetCurCP: TMimeChar;
function GetCurOEMCP: TMimeChar;
function GetCPFromID(Value: AnsiString): TMimeChar;
function GetIDFromCP(Value: TMimeChar): AnsiString;
function NeedCharsetConversion(const Value: AnsiString): Boolean;
function IdealCharsetCoding(const Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeSetChar): TMimeChar;
function GetBOM(Value: TMimeChar): AnsiString;
function StringToWide(const Value: AnsiString): WideString;
function WideToString(const Value: WideString): AnsiString;

Types

TMimeChar = (...);
TMimeSetChar = set of TMimeChar;

Constants

IconvOnlyChars: set of TMimeChar = [UTF_16, UTF_16LE, UTF_32, UTF_32LE, C99, JAVA, ISO_8859_16, KOI8_U, KOI8_RU, CP862, CP866, MAC, MACCE, MACICE, MACCRO, MACRO, MACCYR, MACUK, MACGR, MACTU, MACHEB, MACAR, MACTH, ROMAN8, NEXTSTEP, ARMASCII, GEORGIAN_AC, GEORGIAN_PS, KOI8_T, MULELAO, CP1133, TIS620, CP874, VISCII, TCVN, ISO_IR_14, JIS_X0201, JIS_X0208, JIS_X0212, GB1988_80, GB2312_80, ISO_IR_165, ISO_IR_149, EUC_JP, SHIFT_JIS, CP932, ISO_2022_JP, ISO_2022_JP1, ISO_2022_JP2, GB2312, CP936, GB18030, ISO_2022_CN, ISO_2022_CNE, HZ, EUC_TW, BIG5, CP950, BIG5_HKSCS, EUC_KR, CP949, CP1361, ISO_2022_KR, CP737, CP775, CP853, CP855, CP857, CP858, CP860, CP861, CP863, CP864, CP865, CP869, CP1125];
NoIconvChars: set of TMimeChar = [CP895, UTF_7mod];
Replace_None: array[0..0] of Word = (0);
Replace_Czech: array[0..59] of Word = ( $00E1, $0061, $010D, $0063, $010F, $0064, $010E, $0044, $00E9, $0065, $011B, $0065, $00ED, $0069, $0148, $006E, $00F3, $006F, $0159, $0072, $0161, $0073, $0165, $0074, $00FA, $0075, $016F, $0075, $00FD, $0079, $017E, $007A, $00C1, $0041, $010C, $0043, $00C9, $0045, $011A, $0045, $00CD, $0049, $0147, $004E, $00D3, $004F, $0158, $0052, $0160, $0053, $0164, $0054, $00DA, $0055, $016E, $0055, $00DD, $0059, $017D, $005A );

Variables

DisableIconv: Boolean = False;
IdealCharsets: TMimeSetChar = [ISO_8859_1, ISO_8859_2, ISO_8859_3, ISO_8859_4, ISO_8859_5, ISO_8859_6, ISO_8859_7, ISO_8859_8, ISO_8859_9, ISO_8859_10, KOI8_R, KOI8_U, GB2312, EUC_KR, ISO_2022_JP, EUC_TW];

Description

Functions and Procedures

function CharsetConversion(const Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeChar): AnsiString;

Converts Value from one charset to another. See: CharsetConversionEx.
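A minimal sketch of a basic conversion, written for Free Pascal in Delphi mode with the Synapse sources on the unit path. The input is built from raw bytes so that the example does not depend on the encoding of the source file. UTF_8 is assumed to be a member of TMimeChar; the enumeration values are elided on this page.

```pascal
program ConvertDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}

uses
  synachar;

var
  Latin2, Utf8: AnsiString;
begin
  // Byte $E8 is "c with caron" in ISO-8859-2.
  Latin2 := AnsiChar($E8) + AnsiString('aj');

  // Assumption: UTF_8 is a valid TMimeChar value.
  Utf8 := CharsetConversion(Latin2, ISO_8859_2, UTF_8);

  WriteLn('Input length:  ', Length(Latin2));
  WriteLn('Output length: ', Length(Utf8));
end.
```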

function CharsetConversionEx(const Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeChar; const TransformTable: array of Word): AnsiString;

Converts Value from one charset to another, applying an additional character replacement table during the conversion. See: Replace_None and Replace_Czech.

function CharsetConversionTrans(Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeChar; const TransformTable: array of Word; Translit: Boolean): AnsiString;

Converts Value from one charset to another, applying an additional character replacement table during the conversion. This function is similar to CharsetConversionEx, but it lets you disable transliteration of unconvertible characters.
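The difference between the two extended variants can be sketched as follows. This is an illustration under assumptions, not a statement about the unit's internals: it only uses the signatures and constants listed on this page, and it does not assert what the output looks like.

```pascal
program TransDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}

uses
  synachar;

var
  Src, Plain, Raw: AnsiString;
begin
  // Byte $F8 is "r with caron" in ISO-8859-2, a letter ISO-8859-1 lacks.
  Src := AnsiChar($F8) + AnsiString('eka');

  // Apply the Czech replace table while converting.
  Plain := CharsetConversionEx(Src, ISO_8859_2, ISO_8859_1, Replace_Czech);

  // No replace table, and transliteration switched off.
  Raw := CharsetConversionTrans(Src, ISO_8859_2, ISO_8859_1, Replace_None, False);

  WriteLn('With Replace_Czech:      ', Plain);
  WriteLn('Without transliteration: ', Raw);
end.
```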

function GetCurCP: TMimeChar;

Returns the charset used by the operating system.

function GetCurOEMCP: TMimeChar;

Returns the charset that the operating system uses as its OEM charset (for example, in a Windows DOS box).
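As a quick check of what these two functions report on a given machine, a small sketch like the following prints both values by name. It relies only on functions listed on this page; the printed names depend on the system.

```pascal
program ShowSystemCharsets;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}

uses
  synachar;

begin
  // GetIDFromCP turns the enumeration value into a printable charset name.
  WriteLn('System charset: ', GetIDFromCP(GetCurCP));
  WriteLn('OEM charset:    ', GetIDFromCP(GetCurOEMCP));
end.
```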

function GetCPFromID(Value: AnsiString): TMimeChar;

Converts a string containing a charset name to the corresponding TMimeChar value.

function GetIDFromCP(Value: TMimeChar): AnsiString;

Converts a TMimeChar value to a string containing the name of the charset.
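A typical use of this pair is to take a charset name from outside the program, such as a MIME header, and convert the accompanying text to the local charset. The sketch below assumes the name 'ISO-8859-2' is recognised by GetCPFromID; how unknown names are handled is not documented on this page and is not shown.

```pascal
program HeaderCharsetDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}

uses
  synachar;

var
  CharsetName, Body, LocalText: AnsiString;
  Source: TMimeChar;
begin
  // Hypothetical input: a charset name taken from a message header.
  CharsetName := 'ISO-8859-2';
  Source := GetCPFromID(CharsetName);
  WriteLn('Parsed as: ', GetIDFromCP(Source));

  // Byte $E8 is "c with caron" in ISO-8859-2.
  Body := AnsiChar($E8) + AnsiString('aj');

  // Convert from the declared charset to the system charset.
  LocalText := CharsetConversion(Body, Source, GetCurCP);
  WriteLn('Converted length: ', Length(LocalText));
end.
```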

function NeedCharsetConversion(const Value: AnsiString): Boolean;

Returns True when Value needs to be converted, that is, when it is not pure 7-bit ASCII.
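One way to use this check is as a guard that skips conversion for plain ASCII text. In the sketch below, ToUtf8IfNeeded is a hypothetical helper written for this example, and UTF_8 is assumed to be a member of TMimeChar.

```pascal
program GuardDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}

uses
  synachar;

// Hypothetical helper: convert only when the text is not pure ASCII.
function ToUtf8IfNeeded(const S: AnsiString; Source: TMimeChar): AnsiString;
begin
  if NeedCharsetConversion(S) then
    Result := CharsetConversion(S, Source, UTF_8) // UTF_8 is an assumption
  else
    Result := S;
end;

var
  Ascii, Accented: AnsiString;
begin
  Ascii := 'plain ascii';
  // Byte $E8 is "c with caron" in ISO-8859-2.
  Accented := AnsiChar($E8) + AnsiString('aj');

  WriteLn('ASCII text needs conversion:    ', NeedCharsetConversion(Ascii));
  WriteLn('Accented text needs conversion: ', NeedCharsetConversion(Accented));
  WriteLn('Guarded result length: ', Length(ToUtf8IfNeeded(Accented, ISO_8859_2)));
end.
```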

function IdealCharsetCoding(const Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeSetChar): TMimeChar;

Finds the best target charset within the CharTo set: the one that leaves the fewest unconvertible characters when Value is converted from CharFrom.
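The selection can be tried with either a hand-picked candidate set or the default IdealCharsets variable. The following sketch only prints which charset was chosen; it makes no claim about which one that will be for this input.

```pascal
program IdealDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}

uses
  synachar;

var
  Src, Converted: AnsiString;
  Best: TMimeChar;
begin
  // Byte $E8 is "c with caron" in ISO-8859-2.
  Src := AnsiChar($E8) + AnsiString('aj');

  // Pick the best fit from an explicit candidate set.
  Best := IdealCharsetCoding(Src, ISO_8859_2, [ISO_8859_1, ISO_8859_2, KOI8_R]);
  WriteLn('Chosen from custom set:    ', GetIDFromCP(Best));

  // Convert to the chosen charset.
  Converted := CharsetConversion(Src, ISO_8859_2, Best);
  WriteLn('Converted length: ', Length(Converted));

  // The same call with the unit's default candidate set.
  Best := IdealCharsetCoding(Src, ISO_8859_2, IdealCharsets);
  WriteLn('Chosen from IdealCharsets: ', GetIDFromCP(Best));
end.
```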

function GetBOM(Value: TMimeChar): AnsiString;

Returns the BOM (Byte Order Mark) for the given Unicode charset.
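To see what GetBOM returns for a particular charset, the bytes can be dumped in hexadecimal, as in this sketch. UTF_8 is assumed to be a member of TMimeChar; the loop simply prints whatever bytes come back, which may be none for charsets without a BOM.

```pascal
program BomDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}

uses
  SysUtils, synachar;

var
  Bom: AnsiString;
  I: Integer;
begin
  // Assumption: UTF_8 is a valid TMimeChar value.
  Bom := GetBOM(UTF_8);

  Write('BOM bytes:');
  for I := 1 to Length(Bom) do
    Write(' ', IntToHex(Ord(Bom[I]), 2));
  WriteLn;
end.
```

When writing a file, the returned string would typically be written before the converted text.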

function StringToWide(const Value: AnsiString): WideString;

Converts a binary string with Unicode content to a WideString.

function WideToString(const Value: WideString): AnsiString;

Converts a WideString to a binary string with Unicode content.
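The two functions are described as inverses of each other, which a round trip can illustrate. The sketch below prints the lengths rather than assuming a particular byte layout, since the exact binary format is not specified on this page.

```pascal
program WideDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}

uses
  synachar;

var
  W, Back: WideString;
  Bin: AnsiString;
begin
  W := 'abc';

  // WideString -> binary string -> WideString.
  Bin := WideToString(W);
  Back := StringToWide(Bin);

  WriteLn('WideString length: ', Length(W));
  WriteLn('Binary length:     ', Length(Bin));

  if Back = W then
    WriteLn('Round trip preserved the text')
  else
    WriteLn('Round trip changed the text');
end.
```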

Types

TMimeChar = (...);

Enumeration of all supported charsets.

TMimeSetChar = set of TMimeChar;

A set of charsets.

Constants

IconvOnlyChars: set of TMimeChar = [UTF_16, UTF_16LE, UTF_32, UTF_32LE, C99, JAVA, ISO_8859_16, KOI8_U, KOI8_RU, CP862, CP866, MAC, MACCE, MACICE, MACCRO, MACRO, MACCYR, MACUK, MACGR, MACTU, MACHEB, MACAR, MACTH, ROMAN8, NEXTSTEP, ARMASCII, GEORGIAN_AC, GEORGIAN_PS, KOI8_T, MULELAO, CP1133, TIS620, CP874, VISCII, TCVN, ISO_IR_14, JIS_X0201, JIS_X0208, JIS_X0212, GB1988_80, GB2312_80, ISO_IR_165, ISO_IR_149, EUC_JP, SHIFT_JIS, CP932, ISO_2022_JP, ISO_2022_JP1, ISO_2022_JP2, GB2312, CP936, GB18030, ISO_2022_CN, ISO_2022_CNE, HZ, EUC_TW, BIG5, CP950, BIG5_HKSCS, EUC_KR, CP949, CP1361, ISO_2022_KR, CP737, CP775, CP853, CP855, CP857, CP858, CP860, CP861, CP863, CP864, CP865, CP869, CP1125];

Set of charsets supported only by the Iconv library.

NoIconvChars: set of TMimeChar = [CP895, UTF_7mod];

Set of charsets supported only by the internal routines.

Replace_None: array[0..0] of Word = (0);

Null character replace table. (Use it to disable character replacement.)

Replace_Czech: array[0..59] of Word = ( $00E1, $0061, $010D, $0063, $010F, $0064, $010E, $0044, $00E9, $0065, $011B, $0065, $00ED, $0069, $0148, $006E, $00F3, $006F, $0159, $0072, $0161, $0073, $0165, $0074, $00FA, $0075, $016F, $0075, $00FD, $0079, $017E, $007A, $00C1, $0041, $010C, $0043, $00C9, $0045, $011A, $0045, $00CD, $0049, $0147, $004E, $00D3, $004F, $0158, $0052, $0160, $0053, $0164, $0054, $00DA, $0055, $016E, $0055, $00DD, $0059, $017D, $005A );

Character replace table for removing Czech diacritics.
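Judging from the values in Replace_Czech, a replace table appears to be a flat list of pairs: a Unicode code point followed by its replacement ($00E1, a with acute, is followed by $0061, plain a). Under that assumption, a custom table can be sketched as below. Replace_Umlauts is a hypothetical name invented for this example and is not part of the unit.

```pascal
program CustomTableDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}

uses
  synachar;

const
  // Hypothetical table, assuming (code point, replacement) pairs.
  Replace_Umlauts: array[0..5] of Word = (
    $00E4, $0061,   // a with diaeresis -> a
    $00F6, $006F,   // o with diaeresis -> o
    $00FC, $0075);  // u with diaeresis -> u

var
  Src, Plain: AnsiString;
begin
  // Byte $FC is "u with diaeresis" in ISO-8859-1.
  Src := AnsiString('M') + AnsiChar($FC) + AnsiString('ller');

  Plain := CharsetConversionEx(Src, ISO_8859_1, ISO_8859_2, Replace_Umlauts);
  WriteLn(Plain);
end.
```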

Variables

DisableIconv: Boolean = False;

Set this variable to True to disable Iconv support globally, or to False to enable it.
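With Iconv disabled, only the internal routines remain, so it can be useful to check a target charset against IconvOnlyChars before converting. The sketch below shows one way to do that; Report is a hypothetical helper, and the example does not assert what a conversion to an Iconv-only charset returns in that situation.

```pascal
program NoIconvDemo;
{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}

uses
  synachar;

// Hypothetical helper: say whether a charset depends on Iconv.
procedure Report(Target: TMimeChar);
begin
  if Target in IconvOnlyChars then
    WriteLn(GetIDFromCP(Target), ': listed as Iconv-only')
  else
    WriteLn(GetIDFromCP(Target), ': not listed as Iconv-only');
end;

begin
  // Force the internal conversion routines for the rest of the program.
  DisableIconv := True;

  Report(ISO_8859_2);
  Report(KOI8_U);
end.
```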

IdealCharsets: TMimeSetChar = [ISO_8859_1, ISO_8859_2, ISO_8859_3, ISO_8859_4, ISO_8859_5, ISO_8859_6, ISO_8859_7, ISO_8859_8, ISO_8859_9, ISO_8859_10, KOI8_R, KOI8_U, GB2312, EUC_KR, ISO_2022_JP, EUC_TW];

Default set of charsets for the IdealCharsetCoding function.


Generated by PasDoc 0.9.0 on 2012-04-23 21:38:58