Google it ....

Showing posts with label unicode. Show all posts
Showing posts with label unicode. Show all posts

Friday, January 25, 2013

Textual description of firstImageUrl

Retrieve Unicode data in non-unicode database using db_link (Georgian)

Recently we created a database link on user’s request wondered to run ad-hoc query with complex join including demographic data (in Georgian language) residing on the remote database. Db link provided, user issued a query and got the results shown on picture 1. Question marks, caused by wrong character conversion.



















At first we check character-set on both databases:
SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET’;
Source db - RU8PC866 (this was set in that database before unicode invented :) )
Remote db  - AL32UTF8
So, we face the problem as Unicode data retrieved from non-Unicode database. The same question marks appear all the time and client's NLS_LANG variable does not help.
Let see what those questions marks are:
SELECT dump(convert(name, 'RU8PC866' , 'AL32UTF8' ) , 1016) from remotetable@db_link;

Typ=1 Len=5 CharacterSet=AL32UTF8: 3f,3f,3f,3f,3f
Typ=1 Len=6 CharacterSet=AL32UTF8: 3f,3f,3f,3f,3f,3f
Typ=1 Len=6 CharacterSet=AL32UTF8: 3f,3f,3f,3f,3f,3f
We did explicit conversion where hexadecimal 3F is 63 in decimal and question mark symbol in ASCII. That's it.
dump function returns a varchar2 value that includes the datatype code, the length in bytes, and the internal representation of the expression. In our case AL32UTF8.
convert function converts a character string from one character set to another. In our case from AL32UTF8 into RU8PC866.  The same conversion takes place implicitly when we select remote table.
Let’s try another conversion:
SELECT dump(convert(name, 'UTF8' , 'AL32UTF8' ) , 1016) from remotetable@db_link;

Typ=1 Len=15 CharacterSet=AL32UTF8: e1,83,93,e1,83,94,e1,83,9b,e1,83,9c,e1,83,90
Typ=1 Len=18 CharacterSet=AL32UTF8: e1,83,a4,e1,83,a0,e1,83,98,e1,83,93,e1,83,9d,e1,83,9c
Typ=1 Len=18 CharacterSet=AL32UTF8: e1,83,92,e1,83,a0,e1,83,98,e1,83,92,e1,83,9d,e1,83,9a
Here, we see correct codes in UTF-8.
For example: code “e1,83,93” corresponds to Georgian character ”დ” (don). However UTF8 representation has a little problem with our further development. The reason is that we are going to use function UNISTR which takes input argument in UCS-2  ( UTF-16 nowadays) format  as two code units, the first from the high-surrogates range (U+D800 to U+DBFF), and the second from the low-surrogates range (U+DC00 to U+DFFF). In UTF-8 representation “e1,83,93” equals to pair 10, D3 in UTF-16.
So, correct conversion retrieved in oracle's AL16UTF16:
SELECT dump(convert(name, AL16UTF16, 'AL32UTF8' ) , 1016) from remotetable@db_link;

Typ=1 Len=10 CharacterSet=AL32UTF8: 10,d3,10,d4,10,db,10,dc,10,d0
Typ=1 Len=12 CharacterSet=AL32UTF8: 10,e4,10,e0,10,d8,10,d3,10,dd,10,dc
Typ=1 Len=12 CharacterSet=AL32UTF8: 10,d2,10,e0,10,d8,10,d2,10,dd,10,da
At this point we simply use UNISTR passing to it a value returned from dump function.
For example: UNISTR ('\10D3') results Georgian character "დ" (don).
Only problem left is to parse dump results in well formatted string of codes understandable for function UNISTR.
For this purpose we created a small function taking a string as an argument, performing transformation and returning formatted code sequence for UNISTR function
pseudo-function:
f(Typ=1 Len=12 CharacterSet=AL32UTF8: 10,d2,10,e0,10,d8,10,d2,10,dd,10,da) = 
"\10d2\10e0\10d8\10d2\10dd\10da".
Here is the real function:
CREATE OR REPLACE function PORTA_MNP_APP.unihex (pString varchar2)
   return varchar2
is
   v_pos1 number;
   v_pos2 number;
   v_res varchar2(2000); 
   v_res1 varchar2(2000); 
   v_char  char(1);
   v_char1  varchar2(2000);
   v_len number; 
 begin

 if  pString is null  then 
    return null;
 end if; 

 SELECT dump(convert(pString, 'AL16UTF16' , 'AL32UTF8' ) , 16) into v_res from dual; 

v_res := v_res || ','; 
v_pos1 :=instr(v_res, ':',1); 
v_len:=length( v_res ); 
v_pos2:=0; 

FOR i IN v_pos1+2 .. v_len  LOOP 
     v_char := substr( v_res, i, 1 ); 
     if v_char =',' then 
      if  length(v_char1)=1 then 
         v_res1:=v_res1 || '0' || v_char1; 
         v_char1:=''; 
          v_pos2:=v_pos2+1;  
              else 
            v_res1:=v_res1 ||  v_char1; 
            v_char1:='';
            end if; 
      else 
        v_char1:=v_char1 || v_char; 
            if  mod(v_pos2, 4)=0 and v_pos2<> v_len then   
                    v_res1:=v_res1 || '\'; 
               end if;  
            v_pos2:=v_pos2+1;  

       end if; 
  END LOOP; 

  return v_res1; 

 exception  
    WHEN OTHERS THEN  
      --DBMS_OUTPUT.Put_Line(pString); 
      return null;
   end;
/
After compiled we created a view in remote database as follows:
CREATE OR REPLACE VIEW remoteview
(
ID,
REQUEST_ID,
CUSTOMER_TYPE,
NAME,
SURNAME,
CUSTOMER_FULLNAME,
ADDRESS,
)
AS
SELECT ID,
REQUEST_ID,
CUSTOMER_TYPE,
unihex (NAME) NAME,
unihex (SURNAME) SURNAME,
unihex (CUSTOMER_FULLNAME) CUSTOMER_FULLNAME,
unihex (ADDRESS) ADDRESS,
FROM np_history_custinfo;
Finally, in source database session we run:
select unistr(name), unistr(surname), unistr(customer_fullname), unistr(address)
FROM remoteview@db_link
Output is pretty fine and formatted well in Georgian language. Unfortunately we can’t post final result here as it has sensitive data. :)
Thanks,