At first we check character-set on both databases:
SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET’;Source db - RU8PC866 (this was set in that database before unicode invented :) )
Remote db - AL32UTF8
So, we face the problem as Unicode data retrieved from non-Unicode database. The same question marks appear all the time and client's NLS_LANG variable does not help.
Let see what those questions marks are:
SELECT dump(convert(name, 'RU8PC866' , 'AL32UTF8' ) , 1016) from remotetable@db_link; Typ=1 Len=5 CharacterSet=AL32UTF8: 3f,3f,3f,3f,3f Typ=1 Len=6 CharacterSet=AL32UTF8: 3f,3f,3f,3f,3f,3f Typ=1 Len=6 CharacterSet=AL32UTF8: 3f,3f,3f,3f,3f,3fWe did explicit conversion where hexadecimal 3F is 63 in decimal and question mark symbol in ASCII. That's it.
dump function returns a varchar2 value that includes the datatype code, the length in bytes, and the internal representation of the expression. In our case AL32UTF8.
convert function converts a character string from one character set to another. In our case from AL32UTF8 into RU8PC866. The same conversion takes place implicitly when we select remote table.
Let’s try another conversion:
SELECT dump(convert(name, 'UTF8' , 'AL32UTF8' ) , 1016) from remotetable@db_link; Typ=1 Len=15 CharacterSet=AL32UTF8: e1,83,93,e1,83,94,e1,83,9b,e1,83,9c,e1,83,90 Typ=1 Len=18 CharacterSet=AL32UTF8: e1,83,a4,e1,83,a0,e1,83,98,e1,83,93,e1,83,9d,e1,83,9c Typ=1 Len=18 CharacterSet=AL32UTF8: e1,83,92,e1,83,a0,e1,83,98,e1,83,92,e1,83,9d,e1,83,9aHere, we see correct codes in UTF-8.
For example: code “e1,83,93” corresponds to Georgian character ”დ” (don). However UTF8 representation has a little problem with our further development. The reason is that we are going to use function UNISTR which takes input argument in UCS-2 ( UTF-16 nowadays) format as two code units, the first from the high-surrogates range (U+D800 to U+DBFF), and the second from the low-surrogates range (U+DC00 to U+DFFF). In UTF-8 representation “e1,83,93” equals to pair 10, D3 in UTF-16.
So, correct conversion retrieved in oracle's AL16UTF16:
SELECT dump(convert(name, AL16UTF16, 'AL32UTF8' ) , 1016) from remotetable@db_link; Typ=1 Len=10 CharacterSet=AL32UTF8: 10,d3,10,d4,10,db,10,dc,10,d0 Typ=1 Len=12 CharacterSet=AL32UTF8: 10,e4,10,e0,10,d8,10,d3,10,dd,10,dc Typ=1 Len=12 CharacterSet=AL32UTF8: 10,d2,10,e0,10,d8,10,d2,10,dd,10,daAt this point we simply use UNISTR passing to it a value returned from dump function.
For example: UNISTR ('\10D3') results Georgian character "დ" (don).
Only problem left is to parse dump results in well formatted string of codes understandable for function UNISTR.
For this purpose we created a small function taking a string as an argument, performing transformation and returning formatted code sequence for UNISTR function
pseudo-function:
f(Typ=1 Len=12 CharacterSet=AL32UTF8: 10,d2,10,e0,10,d8,10,d2,10,dd,10,da) = "\10d2\10e0\10d8\10d2\10dd\10da".Here is the real function:
CREATE OR REPLACE function PORTA_MNP_APP.unihex (pString varchar2) return varchar2 is v_pos1 number; v_pos2 number; v_res varchar2(2000); v_res1 varchar2(2000); v_char char(1); v_char1 varchar2(2000); v_len number; begin if pString is null then return null; end if; SELECT dump(convert(pString, 'AL16UTF16' , 'AL32UTF8' ) , 16) into v_res from dual; v_res := v_res || ','; v_pos1 :=instr(v_res, ':',1); v_len:=length( v_res ); v_pos2:=0; FOR i IN v_pos1+2 .. v_len LOOP v_char := substr( v_res, i, 1 ); if v_char =',' then if length(v_char1)=1 then v_res1:=v_res1 || '0' || v_char1; v_char1:=''; v_pos2:=v_pos2+1; else v_res1:=v_res1 || v_char1; v_char1:=''; end if; else v_char1:=v_char1 || v_char; if mod(v_pos2, 4)=0 and v_pos2<> v_len then v_res1:=v_res1 || '\'; end if; v_pos2:=v_pos2+1; end if; END LOOP; return v_res1; exception WHEN OTHERS THEN --DBMS_OUTPUT.Put_Line(pString); return null; end; /After compiled we created a view in remote database as follows:
CREATE OR REPLACE VIEW remoteview ( ID, REQUEST_ID, CUSTOMER_TYPE, NAME, SURNAME, CUSTOMER_FULLNAME, ADDRESS, ) AS SELECT ID, REQUEST_ID, CUSTOMER_TYPE, unihex (NAME) NAME, unihex (SURNAME) SURNAME, unihex (CUSTOMER_FULLNAME) CUSTOMER_FULLNAME, unihex (ADDRESS) ADDRESS, FROM np_history_custinfo;Finally, in source database session we run:
select unistr(name), unistr(surname), unistr(customer_fullname), unistr(address) FROM remoteview@db_linkOutput is pretty fine and formatted well in Georgian language. Unfortunately we can’t post final result here as it has sensitive data. :)
Thanks,
No comments:
Post a Comment