| Author | Message | 
		
		  | BCBS | 
			  
				|  Posted: Sun Jul 18, 2010 12:58 am    Post subject: [RESOLVED] Unable to store/retrieve Chinese char's in db |   |  | 
		
		  |  Apprentice
 
 
 Joined: 12 Jul 2006Posts: 37
 
 
 | 
			  
				| I am trying to work with a message that contains Chinese characters. All I want to do is insert an xml message (whole) in a database as CLOB. 
 Process for insert into database, retrieval from database is happening without exceptions. But after I retrieve the message and examine, the Chinese characters seem to be damaged.
 
 Environment:
 Message Broker 6.0.10
 Oracle 10G (support for UTF-8 characters enabled)
 
 Insert:
 DECLARE msgBitStream BLOB InputRoot.BLOB.BLOB;
 INSERT  INTO Database.AGGR_DATA_T (
 ....
 AGGR_MSG_BODY_IMG,
 ..........
 ) VALUES (
 ........
 msgBitStream,
 .........
 );
 _________________
 _________________________________
 
 Last edited by BCBS on Thu Jul 22, 2010 1:16 pm; edited 1 time in total
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | fjb_saper | 
			  
				|  Posted: Sun Jul 18, 2010 5:18 am    Post subject: |   |  | 
		
		  |  Grand High Poobah
 
 
 Joined: 18 Nov 2003Posts: 20767
 Location: LI,NY
 
 | 
			  
				| Well, 
 Before you insert your BLOB into the CLOB field of the DB which CCSID ist it in? Hint: if the CCSID of the BLOB is not 1208, you may want to make it so before inserting it into the DB.
  _________________
 MQ & Broker admin
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | smdavies99 | 
			  
				|  Posted: Sun Jul 18, 2010 5:27 am    Post subject: |   |  | 
		
		  |  Jedi Council
 
 
 Joined: 10 Feb 2003Posts: 6076
 Location: Somewhere over the Rainbow this side of Never-never land.
 
 | 
			  
				| You need to trace/dump your message (in hex format) from inside broker. This will tell you if the data you are receiving is correctly formatted.
 
 As other posts on this topic in this forum will show, you need to make sure that the data your receive is really correctly formatted before it gets into broker.
 
 For example,
 A Webapp (portal) portled gets a character field from the user. The default CCSID for that is ISO 8859-1 ( aka 817). This is wrapped in a SOAP UTF-8 envelope but the data in the character is not converted from 817 to 1208 before the SOAP message is sent.
 Most of the time, this is not a problem but there are cases (eg GBP sign & Euro) that make the receiving flow throw its toys out of the pram.
 
 So, please check that the Chinese Characters are correctly escaped in the incoming UTF-8 message ( the Endian also matters!). It does mean decoding the Hex though but that is the only way to be sure.
 _________________
 WMQ User since 1999
 MQSI/WBI/WMB/'Thingy' User since 2002
 Linux user since 1995
 
 Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | rekarm01 | 
			  
				|  Posted: Sun Jul 18, 2010 5:55 am    Post subject: Re: Unable to store/retrieve Chinese char's in database |   |  | 
		
		  | Grand Master
 
 
 Joined: 25 Jun 2008Posts: 1415
 
 
 | 
			  
				| 
   
	| BCBS wrote: |  
	| But after I retrieve the message and examine, the Chinese characters seem to be damaged. |  How are they damaged?  Be more specific.
 
 
 
   
	| BCBS wrote: |  
	| Environment: Message Broker 6.0.10
 Oracle 10G (support for UTF-8 characters enabled)
 |  How was support for UTF-8 enabled in Oracle 10G?  Be more specific.
 
 Maybe this topic will help.
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | rekarm01 | 
			  
				|  Posted: Sun Jul 18, 2010 6:11 am    Post subject: Re: Unable to store/retrieve Chinese char's in database |   |  | 
		
		  | Grand Master
 
 
 Joined: 25 Jun 2008Posts: 1415
 
 
 | 
			  
				| 
   
	| smdavies99 wrote: |  
	| For example, A Webapp (portal) portled gets a character field from the user. The default CCSID for that is ISO 8859-1 (aka 817).
 |  aka 819.
 
 
   
	| smdavies99 wrote: |  
	| So, please check that the Chinese Characters are correctly escaped in the incoming UTF-8 message (the Endian also matters!). |  UTF-8 doesn't have endianness.
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | smdavies99 | 
			  
				|  Posted: Sun Jul 18, 2010 6:19 am    Post subject: |   |  | 
		
		  |  Jedi Council
 
 
 Joined: 10 Feb 2003Posts: 6076
 Location: Somewhere over the Rainbow this side of Never-never land.
 
 | 
			  
				| indeed utf-8 does not have endianess. But once you escape into double byte (or 4 byte) you do. _________________
 WMQ User since 1999
 MQSI/WBI/WMB/'Thingy' User since 2002
 Linux user since 1995
 
 Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | BCBS | 
			  
				|  Posted: Sun Jul 18, 2010 11:06 am    Post subject: |   |  | 
		
		  |  Apprentice
 
 
 Joined: 12 Jul 2006Posts: 37
 
 
 | 
			  
				| Thanks all for your responses. 
 
 
   
	| fjb_saper wrote: |  
	| Well, 
 Before you insert your BLOB into the CLOB field of the DB which CCSID ist it in? Hint: if the CCSID of the BLOB is not 1208, you may want to make it so before inserting it into the DB.
  |  Yes the CCSID is 1208, and ENCODING is 273. To make sure these values are correct, I just ran the message thru simple flow including a trace node, and it clearly showed these values in InputRoot.Properties.
 
 Also, I am wondering, here in the forum, I am trying to post the big response (wanted to post ESQL for retrieval), and every time I am getting webpage error (service may be temporarily down...). Any idea? Its been long time I visited this site, wondering if there is any limit on the size of the posting.
 _________________
 _________________________________
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | smdavies99 | 
			  
				|  Posted: Sun Jul 18, 2010 12:01 pm    Post subject: |   |  | 
		
		  |  Jedi Council
 
 
 Joined: 10 Feb 2003Posts: 6076
 Location: Somewhere over the Rainbow this side of Never-never land.
 
 | 
			  
				| Have you checked that the HEX coding of the message (BLOB.BLOB) is correct? 
 I've experience the same issue as you when I've tried to post code/trace output etc.
 When I hit the submit button the error occurs.
 
 Perhaps this is an issue for the site admins?
 
 I normally post from Firefox.(in case it is browser specific)
 _________________
 WMQ User since 1999
 MQSI/WBI/WMB/'Thingy' User since 2002
 Linux user since 1995
 
 Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | rekarm01 | 
			  
				|  Posted: Sun Jul 18, 2010 3:57 pm    Post subject: Re: Unable to store/retrieve Chinese char's in database |   |  | 
		
		  | Grand Master
 
 
 Joined: 25 Jun 2008Posts: 1415
 
 
 | 
			  
				| 
   
	| BCBS wrote: |  
	| Yes the CCSID is 1208, and ENCODING is 273. |  ENCODING is for binary numeric data.  It's not relevant for character data (such as XML).
 
 
 
   
	| BCBS wrote: |  
	| To make sure these values are correct, I just ran the message thru simple flow including a trace node, and it clearly showed these values in InputRoot.Properties. |  To make sure these values are correct, it's necessary to check them against the actual bytes in InputRoot.BLOB.BLOB, (as smdavies99 suggests).
 
 
 
   
	| BCBS wrote: |  
	| I am trying to post the big response (wanted to post ESQL for retrieval), and every time I am getting webpage error (service may be temporarily down...). Any idea? Its been long time I visited this site, wondering if there is any limit on the size of the posting. |  Whether there is a site-imposed limit or not, try to pare down the posts to just the relevant bits, as a courtesy to the other participants whose help you're asking for.
 
 Enclose any ESQL, usertrace data, or other formatted data in [code] tags to make it easier to read.
 
 Try not to post the entire contents of Root.BLOB.BLOB; the portions that represent the XML declaration and any non-ASCII data are usually sufficient.
 
 Add line breaks to the posted output as needed, to minimize horizontal scrolling.  Use the 'Preview' button before posting.
 
 
 
   
	| rekarm01 wrote: |  
	| How was support for UTF-8 enabled in Oracle 10G? Be more specific. 
 Maybe this topic will help.
 |  Did that topic help?
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | kimbert | 
			  
				|  Posted: Mon Jul 19, 2010 12:12 am    Post subject: |   |  | 
		
		  |  Jedi Council
 
 
 Joined: 29 Jul 2003Posts: 5543
 Location: Southampton
 
 | 
			  
				| 
  I hesitate to disagree with someone who clearly knows more about this subject than I do, but... 
	| Quote: |  
	| ENCODING is for binary numeric data. It's not relevant for character data (such as XML). |  
 http://en.wikipedia.org/wiki/UTF-16/UCS-2#Byte_order_encoding_schemes
 
 I'm fairly sure that endianness does need to be factored into the equation when encoding and decoding streams of UTF-16 and UTF-32 data. Sometimes the encoding implies the endianness ( UTF-16BE, UTF-16LE) and sometimes it gets supplied by a BOM. But in the absence of either of those, the endianness must be supplied in some other way.
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | rekarm01 | 
			  
				|  Posted: Mon Jul 19, 2010 3:37 am    Post subject: |   |  | 
		
		  | Grand Master
 
 
 Joined: 25 Jun 2008Posts: 1415
 
 
 | 
			  
				| 
   
	| kimbert wrote: |  
	| 
  I hesitate to disagree with someone who clearly knows more about this subject than I do, but... 
	| Quote: |  
	| ENCODING is for binary numeric data. It's not relevant for character data (such as XML). |  |  To be more precise, for the WMB Root.Properties header (and other related headers):
 Related headers should have a Format field, to identify which bytes represent numeric data, and which message bytes represent character data.  Messages with Format=MQFMT_STRING contain only character data, so the Encoding field is not relevant.  This should include any incoming (or outgoing) XML messages; XML parsers encode any numeric or other binary data as characters.the Encoding field describes the byte encoding for numeric data in a message (binary integers, packed-decimal integers, and floating-point numbers)the CodedCharSetId field describes the byte encoding for character data in a message (similar to the XML encoding declaration, or MIME/SGML charset parameter, or the Unicode character map)
 
 
 
   
	| kimbert wrote: |  
	| http://en.wikipedia.org/wiki/UTF-16/UCS-2#Byte_order_encoding_schemes 
 I'm fairly sure that endianness does need to be factored into the equation when encoding and decoding streams of UTF-16 and UTF-32 data. Sometimes the encoding implies the endianness ( UTF-16BE, UTF-16LE) and sometimes it gets supplied by a BOM. But in the absence of either of those, the endianness must be supplied in some other way.
 |  The CCSID describes the character encoding scheme (including endianness):
 ccsid=1208: UTF-8ccsid=1200: UTF-16BEccsid=1202: UTF-16LEccsid=1204: UTF-16 with BOMccsid=1232: UTF-32BEccsid=1234: UTF-32LEccsid=1236: UTF-32 with BOM
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | BCBS | 
			  
				|  Posted: Mon Jul 19, 2010 9:42 am    Post subject: |   |  | 
		
		  |  Apprentice
 
 
 Joined: 12 Jul 2006Posts: 37
 
 
 | 
			  
				| Hello ALL, I have been little late in responding or trying out all the valuable suggestions provided byyou guys. Its Monday morning and I am fresh to work on this.
 
 I will update you all with my findings. Thanks again for great help on this so far.
 _________________
 _________________________________
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | BCBS | 
			  
				|  Posted: Thu Jul 22, 2010 1:30 pm    Post subject: |   |  | 
		
		  |  Apprentice
 
 
 Joined: 12 Jul 2006Posts: 37
 
 
 | 
			  
				| The issue resolved after we changed the settings on MB server. 
 
 
   
	| rekarm01 wrote: |  
	| Did that topic help? |  Yes. Its the root cause. Thanks much 'rekarm01'
 
 Solution:
 1) MB’s ODBC data source definition updated - ColumnSizeAsCharacter=1.
 2) Set the variable NLS_LANG in the broker environment to NLS_LANG=AMERICAN_AMERICA.AL32UTF8
 3) No changes on DB except characterset for the database is set to UTF8.
 
 
 
   
	| smdavies99 wrote: |  
	| Have you checked that the HEX coding of the message (BLOB.BLOB) is correct? |  Before we made the changes, the hex coding was not matching for store with retrieval '
  ' After the change, it matched  '
  ' 
 Once again, thank you all for your valuable input and suggestions.
 _________________
 _________________________________
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  |  |