1
NAAM
Oracle Character sets
Aino Andriessen
2
Demo1
4
nls_length_semantics
Initialization parameter
CHAR or BYTE (default)
Applies to multibyte character sets
Defines how the length of character columns and variables is measured
alter session set nls_length_semantics=CHAR;
Not retroactive; PL/SQL may need a recompile
alter system
5
nls_length_semantics 2
Specify the length of character columns and variables explicitly
create table demo (naam varchar2(4 char));
create table demo (naam varchar2(4 byte));
t_naam varchar2(4 char);
t_naam demo2.naam%TYPE;
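The difference between the two declarations only shows up once a value contains a multibyte character. The following is a minimal Python sketch of the length check, not Oracle's implementation: `fits` is a hypothetical helper, and UTF-8 is assumed as the database character set.

```python
def fits(value, limit, semantics, encoding="utf-8"):
    """Return True if value fits in a column declared with the given length.

    semantics is "CHAR" (count characters) or "BYTE" (count encoded bytes).
    Hypothetical helper: it only mimics the length check, not Oracle itself.
    """
    if semantics == "CHAR":
        size = len(value)
    elif semantics == "BYTE":
        size = len(value.encode(encoding))
    else:
        raise ValueError("semantics must be CHAR or BYTE")
    return size <= limit


name = "Ren\u00e9"  # "René": 4 characters, 5 bytes in UTF-8
print(fits(name, 4, "CHAR"))  # True
print(fits(name, 4, "BYTE"))  # False
```

Under these assumptions the same four-letter name fits in varchar2(4 char) but not in varchar2(4 byte).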
6
Demo2
8
Character encoding
9
Character set
A character set defines the mapping between the binary/hexadecimal code and the character
UTF8, WE8MSWIN1252, WE8ISO8859P1, JA16EUC, US7ASCII, WE8DEC, ...
Code pages
IBM / Windows terminology, roughly analogous to a character set
One code page per language
10
Character sets 2
ASCII: 1 byte, 128 characters, the standard English letters without accents
ISO 8859 and Latin-1: 1 byte (8 bits), 256 characters
CP-1252: Windows variant of Latin-1
UTF8: variable length, multibyte, max 4 bytes, ~100,000 characters
• ~1 million available, multilingual, ASCII codes are identical
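Two of the claims above can be checked with a few lines of Python: ASCII text has the same bytes in all of these character sets, and CP-1252 differs from Latin-1 in the 0x80 range. This is a sketch using Python's own codec names (`ascii`, `latin-1`, `cp1252`, `utf-8`), which are assumed here to correspond to the character sets on the slide.

```python
# Plain ASCII text encodes to the same bytes in every one of these encodings.
text = "Oracle"
encodings = ["ascii", "latin-1", "cp1252", "utf-8"]
distinct_encodings = {text.encode(enc) for enc in encodings}
print(len(distinct_encodings))  # 1

# Byte 0x80 is a control character in Latin-1 but the euro sign in CP-1252.
latin1_char = bytes([0x80]).decode("latin-1")
cp1252_char = bytes([0x80]).decode("cp1252")
print(repr(latin1_char))       # '\x80'
print(cp1252_char == "\u20ac")  # True

# Unicode code points run from 0 to 0x10FFFF, the "~1 million available".
print(0x10FFFF + 1)  # 1114112
```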
11
Examples
Character set    Hexadecimal code - Euro (€)
AL32UTF8         E282AC
WE8MSWIN1252     80
ASCII            -
WE8ISO8859P1     -
WE8ISO8859P15    A4 (164)

Character set    Hexadecimal code - é
AL32UTF8         C3A9 (50089)
WE8MSWIN1252     E9 (233)
ASCII            -
WE8ISO8859P1     E9
WE8ISO8859P15    E9
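The values in these tables can be reproduced outside the database. The sketch below assumes that the Python codecs in `CODECS` match the Oracle character sets named on the slide; that mapping is an illustration, not something taken from Oracle documentation.

```python
# Assumed mapping from the Oracle names on the slide to Python codec names.
CODECS = {
    "AL32UTF8": "utf-8",
    "WE8MSWIN1252": "cp1252",
    "ASCII": "ascii",
    "WE8ISO8859P1": "latin-1",
    "WE8ISO8859P15": "iso8859-15",
}


def hex_code(char, codec):
    """Return the encoded bytes of char as uppercase hex, or '-' if unmapped."""
    try:
        return char.encode(codec).hex().upper()
    except UnicodeEncodeError:
        return "-"


for char in ("\u20ac", "\u00e9"):  # euro sign, e with acute accent
    print(char)
    for oracle_name, codec in CODECS.items():
        print(f"  {oracle_name:<14} {hex_code(char, codec)}")
```

The euro sign has no code in ASCII or ISO 8859-1, which is why those rows show a dash.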
12
Unicode / UTF 8 example
The image shows the number of bytes needed to store different kinds of characters in the UTF-8 character set. The ASCII characters (C, t, and d) require one byte. The Latin and Greek characters (á, ö, and Ø) require 2 bytes. The Asian character requires 3 bytes. The supplementary character (treble clef sign) requires 4 bytes of storage.
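The byte counts described above can be verified with a short Python sketch. The slide does not say which Asian character the image uses, so U+5B57 is an assumed stand-in; the treble clef is U+1D11E.

```python
# Characters from the description; U+5B57 is an assumed example of an
# Asian character, U+1D11E is the musical G clef (treble clef).
samples = ["C", "t", "d", "\u00e1", "\u00f6", "\u00d8", "\u5b57", "\U0001D11E"]

# Number of bytes each character needs in UTF-8.
utf8_sizes = [len(ch.encode("utf-8")) for ch in samples]
print(utf8_sizes)  # [1, 1, 1, 2, 2, 2, 3, 4]
```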
13
Diacritics and special characters
Diacritics are accents placed at (above, below, or even through) a letter to change its pronunciation and so give the modified letter a language-specific sound. àÿęňĜş etc.
Special characters: ßæ¿
14
Diacritics and special characters
Single-byte character sets
1 byte for a composed character
Not all combinations are possible
Code pages
UTF-8
A diacritic has its own encoding
A composed character has its own encoding
• usually (always) a composition of the original character + diacritic
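The relation between a composed character and its base letter plus diacritic can be made visible with Unicode normalization. This is a sketch using Python's standard `unicodedata` module. It shows the Unicode view only and says nothing about how a particular database stores the value.

```python
import unicodedata

composed = "\u00e9"  # é as one precomposed code point
decomposed = unicodedata.normalize("NFD", composed)  # e + combining acute

print(decomposed == "e\u0301")            # True
print(composed.encode("utf-8").hex())     # c3a9
print(decomposed.encode("utf-8").hex())   # 65cc81

# A single-byte character set only has the composed form.
print(composed.encode("latin-1").hex())   # e9
try:
    decomposed.encode("latin-1")
    combining_fits_latin1 = True
except UnicodeEncodeError:
    combining_fits_latin1 = False
print(combining_fits_latin1)              # False
```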
15
Database functions
Character functions
substr - substrb - substrc - substr2
instr - ...
length - lengthb
chr(n): returns the character corresponding to the number passed in as the argument, in the database character set
select chr(50089) from dual;
dump: returns a VARCHAR2 value containing the datatype code, length in bytes, and internal representation of expr. The returned result is always in the database character set.
select dump(naam, 1017) from demo2;
convert: converts a character string from one character set to another
utl_raw
select utl_raw.cast_to_raw(naam) from demo2;
unistr(x): converts the characters in x to the national language character set
select unistr('Ren\00e9') from dual;
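To get a feel for what these functions return, here are rough Python analogues. They are a sketch under the assumption of a UTF-8 database character set. The helper names `oracle_chr`, `dump_hex` and `unistr` are hypothetical, and the output format of `dump_hex` is only loosely modelled on dump, not copied from Oracle.

```python
import re


def oracle_chr(n, encoding="utf-8"):
    """Interpret n as the byte value(s) of one character, like chr(n)."""
    size = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(size, "big").decode(encoding)


def dump_hex(value, encoding="utf-8"):
    """Show byte length and hex bytes of value (loosely modelled on dump)."""
    raw = value.encode(encoding)
    return "Len=%d: %s" % (len(raw), ",".join(format(b, "x") for b in raw))


def unistr(text):
    """Replace \\XXXX escapes (four hex digits) by the Unicode character."""
    return re.sub(r"\\([0-9A-Fa-f]{4})", lambda m: chr(int(m.group(1), 16)), text)


name = unistr("Ren\\00e9")
print(name)                              # René
print(oracle_chr(50089) == "\u00e9")     # True
print(len(name), len(name.encode("utf-8")))  # 4 5  (length vs lengthb)
print(dump_hex(name))                    # Len=5: 52,65,6e,c3,a9
```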
16
Demo3
18
nls_lang
Client character set
When the client NLS_LANG character set is set to the same value as the database character set, Oracle assumes that the data being sent or received already has the same (correct) encoding, so for performance reasons no conversion or validation may take place. The data is simply stored as delivered by the client, bit by bit.
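The consequence of this pass-through is easiest to see when the client setting does not match what the client really sends. The sketch below is a simulation in Python, not Oracle behaviour traced from a real session. `store_without_conversion` is a hypothetical stand-in for the database, and the client is assumed to send CP-1252 bytes while claiming UTF-8.

```python
def store_without_conversion(client_bytes):
    """Hypothetical stand-in for the database: keep the bytes untouched."""
    return client_bytes


# The client really works in CP-1252 but is configured as if it were UTF-8,
# so the database stores the CP-1252 bytes unchanged.
stored = store_without_conversion("Ren\u00e9".encode("cp1252"))
print(stored.hex())  # 52656ee9

# A correct UTF-8 reader now finds an invalid byte sequence.
read_as_utf8 = stored.decode("utf-8", errors="replace")
print(read_as_utf8 == "Ren\ufffd")  # True

# The opposite mismatch: real UTF-8 bytes shown by a CP-1252 reader.
read_as_cp1252 = "Ren\u00e9".encode("utf-8").decode("cp1252")
print(read_as_cp1252)  # RenÃ©
```

The same client will still read its own data back correctly, which is why such a mismatch can go unnoticed for a long time.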
19
nls_lang 2
language_country.character set
american_america.UTF8
dutch_the netherlands.WE8MSWIN1252
american_THE NETHERLANDS.WE8MSWIN1252
Environment variable, nls_lang
The Windows GUI (WE8MSWIN1252) and the command line (WE8PC850) use different character sets
Not used by Java clients
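The difference between the Windows GUI and the command line matters because the same character has a different byte value in each. This sketch assumes the Python codecs `cp1252` and `cp850` correspond to WE8MSWIN1252 and WE8PC850.

```python
e_acute = "\u00e9"  # é

gui_bytes = e_acute.encode("cp1252")  # what a GUI tool would send
cmd_bytes = e_acute.encode("cp850")   # what a command-line tool would send
print(gui_bytes.hex(), cmd_bytes.hex())  # e9 82

# Interpreting the bytes with the other code page yields a different character.
gui_seen_by_cmd = gui_bytes.decode("cp850")
cmd_seen_by_gui = cmd_bytes.decode("cp1252")
print(gui_seen_by_cmd)  # Ú
print(cmd_seen_by_gui)  # ‚ (single low-9 quotation mark)
```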
20
Demo4
22
National character set
Support for another character set next to the database character set
e.g. to allow Japanese in a MSWIN1252 or ISO8859 database
Less necessary in a UTF8 database
Multibyte
nvarchar2, nclob, etc.
23
Case
The TELETEX character set no longer exists in Oracle
select convert(naam,'TELETEX','UTF8') from tabel;
Locale builder
25
sql> select name from emp;
sql> select utl_raw.cast_to_varchar2 (utl_raw.cast_to_raw (name)) from emp@db;
sql> select utl_raw.cast_to_varchar2 (utl_raw.cast_to_raw@db (name)) from emp@db;
sql> select name from emp@db;
26
Question
Diacritic-insensitive search
Case-insensitive search
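One general way to approach both questions is to compare normalized search keys instead of the raw strings. The following Python sketch shows the idea; it is not the answer given in the presentation. `search_key` is a hypothetical helper. It removes combining marks after Unicode decomposition, so letters without a decomposition, such as æ, are left as they are.

```python
import unicodedata


def search_key(text):
    """Return a key that ignores diacritics and letter case."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (accents); keep base letters and everything else.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


names = ["Ren\u00e9", "RENE", "rene", "Renate"]
matches = [n for n in names if search_key(n) == search_key("ren\u00e9")]
print(matches)  # ['René', 'RENE', 'rene']
```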
27
Summary
nls_length_semantics
Always explicitly define a character column with its type (CHAR or BYTE)
Oracle performs automatic character set conversion
wysinawyg (what you see is not always what you get)
Use a Java client
Working with character sets can be confusing
UTF8 is often the preferred character set
28
References
Unicode and UltraEdit: http://www.ultraedit.com/support/tutorials_power_tips/ultraedit/unicode.html
nls_lang: http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm
Oracle globalization support: http://download.oracle.com/docs/cd/B28359_01/server.111/b28298/toc.htm
Wikipedia