HI + IM = Nulli

Nulli experts share their Human Information + Identity Management knowledge

UTF-8 and Oracle Access Manager 10g

OAM supports UTF-8 in incoming data, and can generate HTML pages encoded with UTF-8, but what about internally? Is UTF-8 data available in plugins? In HTTP header variables?

We tested 10.1.4.3 on Windows and were surprised to find that our UTF-8 data was being interpreted incorrectly in our managed plugins (though exec ppp plugins worked as expected).

The character Û (U with a circumflex) has a code point value of 219 (all numbers are decimal). In UTF-8 this is encoded as the bytes 195 and 155. However, when this text reaches our plugin it appears as Û (A with tilde and single right-pointing angle quotation mark). In .NET, strings are Unicode (UTF-16), so we know the identity server is re-interpreting the bytes 195 and 155 in some other encoding and then handing us the result as a Unicode string. That encoding turns out to be Windows-1252, the default code page on our Windows system: 195 is Ã, while 155 is ›. Luckily there is a simple workaround: we get the Windows-1252 bytes of the string and then interpret those bytes as UTF-8.

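// Requires: using System.Text;
Encoding encoding_1252 = Encoding.GetEncoding("Windows-1252");
// Recover the raw bytes from the mis-decoded string, then decode them as UTF-8.
string utf8String = Encoding.UTF8.GetString(encoding_1252.GetBytes(windows1252String));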
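To make the round trip concrete, here is a minimal sketch that first reproduces the garbling (UTF-8 bytes decoded as Windows-1252) and then reverses it with the workaround. The class and method names (`MojibakeDemo`, `Garble`, `Repair`) are made up for illustration and are not part of OAM. The sketch assumes a recent .NET runtime, where code page 1252 has to be registered through `CodePagesEncodingProvider`; on the classic .NET Framework that registration should not be needed.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    static MojibakeDemo()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // such as 1252 are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Simulates the suspected fault: the UTF-8 bytes of the original text
    // are decoded as if they were Windows-1252.
    public static string Garble(string original)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
        return Encoding.GetEncoding(1252).GetString(utf8Bytes);
    }

    // The workaround: recover the raw bytes via Windows-1252,
    // then decode those bytes as UTF-8.
    public static string Repair(string garbled)
    {
        byte[] rawBytes = Encoding.GetEncoding(1252).GetBytes(garbled);
        return Encoding.UTF8.GetString(rawBytes);
    }
}
```

For the example character, `MojibakeDemo.Garble("\u00DB")` yields the two characters U+00C3 and U+203A (Ã and ›), and `MojibakeDemo.Repair` applied to that result gives back U+00DB (Û).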


Using Reflector, I can see a few calls to StringToHGlobalAnsi in the managed library, and I would guess that a similar call such as PtrToStringAnsi is used when converting between unmanaged and managed memory. On Windows these ANSI conversions go through the system's default code page, so this may be a cause of the issue.

This issue also exists in the Access Server. If you want to send a UTF-8 attribute value in a header, OAM is smart enough to Base64 encode it (according to RFC 2047). So our value should be encoded using this format: "=?UTF-8?B?" base64-encoded-text "?=". Unfortunately, the text being encoded is incorrect: the Access Server is Base64 encoding the Windows-1252 interpretation of the UTF-8 bytes. You'll need to Base64 decode the header text and then use the re-encoding code shown earlier to get the real value.
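The two header steps (Base64 decode, then re-interpret) could be combined roughly as follows. This is a sketch under assumptions, not OAM's API: `EncodedWordRepair` and `Decode` are hypothetical names, the header is assumed to hold a single encoded-word with the UTF-8 charset and "B" encoding as described above, and the provider registration assumes a recent .NET runtime.

```csharp
using System;
using System.Text;

public static class EncodedWordRepair
{
    private const string Prefix = "=?UTF-8?B?";
    private const string Suffix = "?=";

    static EncodedWordRepair()
    {
        // Assumption: .NET Core / .NET 5+, where code page 1252 must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes one "=?UTF-8?B?...?=" value and undoes the Windows-1252 garbling.
    // Values that do not look like an encoded-word are returned unchanged.
    public static string Decode(string headerValue)
    {
        if (!headerValue.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase) ||
            !headerValue.EndsWith(Suffix, StringComparison.Ordinal) ||
            headerValue.Length < Prefix.Length + Suffix.Length)
        {
            return headerValue;
        }

        string base64 = headerValue.Substring(
            Prefix.Length, headerValue.Length - Prefix.Length - Suffix.Length);

        // Step 1: Base64 decode; the payload is UTF-8 text, but it is the garbled text.
        string garbled = Encoding.UTF8.GetString(Convert.FromBase64String(base64));

        // Step 2: same workaround as in the plugin, Windows-1252 bytes read as UTF-8.
        byte[] rawBytes = Encoding.GetEncoding(1252).GetBytes(garbled);
        return Encoding.UTF8.GetString(rawBytes);
    }
}
```

For the Û example, the garbled text Û would arrive as `=?UTF-8?B?w4PigLo=?=`, and `EncodedWordRepair.Decode` turns that back into Û. A header containing several encoded-words, or the "Q" encoding, would need a fuller RFC 2047 parser than this.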

One thing to note is that if your default code page is something other than Windows-1252, you'll probably have to interpret the string using that code page instead.
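A variant of the workaround that takes the code page as a parameter might look like this. `CodePageRepair` is a hypothetical helper, and the code page number is something you would supply yourself; on the classic .NET Framework, `Encoding.Default` reports the system's ANSI code page, but on newer runtimes it is always UTF-8, so it cannot be relied on there. We only tested Windows-1252, so other code pages are untested guesses.

```csharp
using System.Text;

public static class CodePageRepair
{
    static CodePageRepair()
    {
        // Assumption: .NET Core / .NET 5+, where legacy code pages must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Recovers the raw bytes using the given ANSI code page (for example 1252 or 1250)
    // and decodes them as UTF-8.
    public static string Repair(string garbled, int codePage)
    {
        Encoding ansi = Encoding.GetEncoding(codePage);
        return Encoding.UTF8.GetString(ansi.GetBytes(garbled));
    }
}
```

For instance, on a system whose default code page is Windows-1250, bytes 195 and 155 would show up as Ă›, and `CodePageRepair.Repair("Ă›", 1250)` should return Û.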

Disclaimer: This information is provided "AS IS" without warranty of any kind, either expressed or implied. The entire risk as to the quality and performance of the information is with you.