MQSeries.net :: View topic - ResetContentDescriptor node error parsing CCSID 1200

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » ResetContentDescriptor node error parsing CCSID 1200

ResetContentDescriptor node error parsing CCSID 1200

angka

Posted: Wed Mar 25, 2009 1:41 am Post subject:

Chevalier

Joined: 20 Sep 2005
Posts: 406

Hi,

I added the 0xFFFE and it works. But based on the W3C standard, it should also be able to work by reading the <?xm of the XML document.

Quoting from "http://www.w3.org/TR/REC-xml/#charencoding":
Without a Byte Order Mark:

3C 00 3F 00 UTF-16LE or little-endian ISO-10646-UCS-2 or other encoding with a 16-bit code unit in little-endian order and ASCII characters encoded as ASCII values (the encoding declaration must be read to determine which)

Why is MQ message broker not following this standard?

Any ideas?

Thanks

mqjeff

Posted: Wed Mar 25, 2009 2:03 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Does the <?xml tag include the *correct* encoding declaration?

rekarm01

Posted: Wed Mar 25, 2009 9:09 pm Post subject: Re: ResetContentDescriptor node error parsing CCSID 1200

Grand Master

Joined: 25 Jun 2008
Posts: 1415

angka wrote:

Btw the xml document encoding is set as utf-16.

The XML parsers do not use the xml encoding declaration to parse the input message; they use the input ccsid.

angka wrote:

running a trace cannot capture anything because it is all in BLOB

A BLOB is useful for comparing the message bytes against the message headers; for example, with utf-16, they can at least show the byte order.

angka wrote:

I added the 0xFFFE and it works. But based on the W3C standard, it should also be able to work by reading the <?xm of the XML document.

Quoting from "http://www.w3.org/TR/REC-xml/#charencoding":

... actually, that's quoted from http://www.w3.org/TR/REC-xml/#sec-guessing:

Quote:

Without a Byte Order Mark:

3C 00 3F 00 : UTF-16LE or little-endian ISO-10646-UCS-2 or other encoding with a 16-bit code unit in little-endian order and ASCII characters encoded as ASCII values (the encoding declaration must be read to determine which)

"3C 00 3F 00 ..."? Is that how the input message bitstream starts?

angka wrote:

Why is MQ message broker not following this standard?

That portion of the XML standard describes how to "guess" a suitable character encoding, in the absence of any other information, in order to be able to read the XML declaration. It is non-normative; the message broker is free to ignore it and implement some other method of "guessing" the character encoding (such as using the input ccsid in the message header).
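As an illustration of that "guessing" step, here is a minimal Python sketch that classifies the first four bytes of an XML entity. It covers only a subset of the byte patterns listed in the non-normative appendix of the XML standard; the function name and the returned labels are made up for this example and are not part of WMB or any parser.

```python
def guess_xml_encoding(first_bytes: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML entity.

    Hypothetical helper: handles only some of the patterns from the
    non-normative auto-detection appendix of the XML standard.
    """
    b = first_bytes[:4]
    if b.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 (BOM)"
    # Check UCS-4 BOMs before UTF-16 BOMs: FF FE is a prefix of FF FE 00 00.
    if b in (b"\x00\x00\xfe\xff", b"\xff\xfe\x00\x00"):
        return "UCS-4 (BOM)"
    if b.startswith(b"\xfe\xff"):
        return "UTF-16BE (BOM)"
    if b.startswith(b"\xff\xfe"):
        return "UTF-16LE (BOM)"
    # No BOM: look for "<?" encoded in 16-bit code units.
    if b == b"\x00\x3c\x00\x3f":
        return "UTF-16BE family (no BOM)"
    if b == b"\x3c\x00\x3f\x00":
        return "UTF-16LE family (no BOM)"
    # "<?xm" in an ASCII-compatible encoding; the declaration must be read.
    if b == b"\x3c\x3f\x78\x6d":
        return "ASCII-compatible (read the declaration)"
    return "unknown"


print(guess_xml_encoding(bytes.fromhex("3c003f00")))
# prints: UTF-16LE family (no BOM)
```

A message that starts with "3C 00 3F 00" would fall into the little-endian, no-BOM case.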

The input message, however, should follow the normative part of the XML standard:

Quote:

Entities encoded in UTF-16 MUST ... begin with the Byte Order Mark ...

[Edit: The underlying issue is that there are some differences between how .NET and IBM MQ interpret ccsid=1200. windows-1200 is little-endian, but ibm-1200 is a bit more complicated.]

Last edited by rekarm01 on Sun Nov 26, 2017 2:32 pm; edited 1 time in total

angka

Posted: Wed Mar 25, 2009 10:57 pm Post subject:

Chevalier

Joined: 20 Sep 2005
Posts: 406

Hi,

The CCSID is 1200 when it reaches the Broker. I stopped the flow and looked at the message on the queue.

The message CCSID is 1200 and the encoding declaration is UTF-16. 1200 is the .NET default CCSID, but Windows is little-endian, so will there be a conflict based on "ccsid=1200 (UTF-16BE) - big-endian (no BOM)"? If so, why does .NET set the default to 1200?

Yes, "3C 00 3F 00 ..." is how the input message bitstream starts.

Thanks.

rekarm01

Posted: Thu Mar 26, 2009 2:19 am Post subject: Re: ResetContentDescriptor node error parsing CCSID 1200

Grand Master

Joined: 25 Jun 2008
Posts: 1415

angka wrote:

1200 is the .NET default CCSID, but Windows is little-endian, so will there be a conflict based on "ccsid=1200 (UTF-16BE)"?

If adding a BOM fixes the problem, then do that. Or try changing the CCSID in .NET to 1202 (little-endian), or 1204 (BOM-endian).

Failing that, the not-so-recommended solution, (last resort, really), would be for the message flow to try to repair incoming messages itself, before parsing them. For example, the message flow could implement auto-detection of character encodings, as described in the XML standard, overwriting ccsids, and adding/removing BOMs, as needed.
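A rough sketch of what such a repair step might look like, written in Python purely for illustration (a real message flow would do this in a compute node, and none of these names come from WMB). It assumes the only case to repair is a message labelled ccsid=1200 whose body has no BOM and starts with "<" in little-endian order; the choice between adding a BOM and relabelling as 1202 is a hypothetical parameter.

```python
from typing import Tuple

BOM_LE = b"\xff\xfe"  # byte order mark as it appears in little-endian UTF-16
BOM_BE = b"\xfe\xff"  # byte order mark as it appears in big-endian UTF-16


def repair_utf16_xml(payload: bytes, ccsid: int,
                     prefer_bom: bool = True) -> Tuple[bytes, int]:
    """Return (payload, ccsid) adjusted so both agree on byte order.

    Hypothetical helper: only handles ccsid 1200 with a BOM-less,
    little-endian XML body; everything else is passed through unchanged.
    """
    if ccsid != 1200:
        return payload, ccsid
    if payload.startswith((BOM_LE, BOM_BE)):
        # A BOM is already present, so the byte order is explicit.
        return payload, ccsid
    if payload.startswith(b"\x3c\x00"):
        # "<" as a little-endian 16-bit code unit, with no BOM.
        if prefer_bom:
            return BOM_LE + payload, ccsid  # option 1: add the BOM
        return payload, 1202                # option 2: relabel as little-endian
    return payload, ccsid


body = "<?xml".encode("utf-16-le")
fixed, new_ccsid = repair_utf16_xml(body, 1200)
print(fixed[:4].hex(), new_ccsid)
# prints: fffe3c00 1200
```

Either branch changes what downstream subscribers receive, which is one reason to treat this as a last resort.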

[Edit: In this case, it's the WMB broker that recognizes the BOM and alternate ccsids, not IBM MQ or .NET]

Last edited by rekarm01 on Sun Nov 26, 2017 3:22 pm; edited 1 time in total

angka

Posted: Thu Mar 26, 2009 2:28 am Post subject:

Chevalier

Joined: 20 Sep 2005
Posts: 406

Hi,

But MQ doesn't have CCSID 1202 and 1204. Thanks

rekarm01

Posted: Sat Mar 28, 2009 1:45 pm Post subject: Re: ResetContentDescriptor node error parsing CCSID 1200

Grand Master

Joined: 25 Jun 2008
Posts: 1415

angka wrote:

But WMQ doesn't have CCSID 1202 and 1204.

And ... ? Is that causing any problems? If so, more details would have helped here.

To be more precise, WMQ doesn't support character conversion to/from CCSIDs 1202 or 1204, but that shouldn't prevent it from getting and putting unconverted messages. Fortunately, applications can use WMB, (or perhaps .NET, database, or even the OS itself), to carry out such conversions when needed, instead of WMQ.
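For instance, an application could change the byte order itself before putting the message (or after getting it unconverted). A minimal Python sketch, assuming the payload really is BOM-less little-endian UTF-16; the function name is made up for this example:

```python
def utf16le_to_utf16be(payload: bytes) -> bytes:
    """Transcode BOM-less little-endian UTF-16 bytes to big-endian.

    Hypothetical helper; raises UnicodeDecodeError if the input is not
    valid little-endian UTF-16.
    """
    return payload.decode("utf-16-le").encode("utf-16-be")


print(utf16le_to_utf16be(bytes.fromhex("3c003f00")).hex())
# prints: 003c003f
```

The message ccsid would still need to be set to match whatever byte order the payload ends up in.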

Ultimately, applications are responsible for keeping track of the character encoding associated with messages, as the messages move from point to point.

angka

Posted: Tue Mar 31, 2009 1:32 am Post subject:

Chevalier

Joined: 20 Sep 2005
Posts: 406

Hi,

I understand that strings in .NET use UTF-16 in little-endian, and that when I set the CCSID of the message to 1200, it does not convert little-endian to big-endian.

That is why, when WMB tries to parse the message, it reads the CCSID as 1200, but the message body is not big-endian, so it fails.

Now my question is: what CCSID should the sender set so that it tells the XML parser in WMB that the message content is UTF-16 and little-endian?

Thanks

kimbert

Posted: Wed Apr 01, 2009 12:56 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Please read this: http://unicode.org/faq/utf_bom.html
It was the first hit when I Googled for 'UTF-16 BOM'

angka

Posted: Wed Apr 01, 2009 6:52 pm Post subject:

Chevalier

Joined: 20 Sep 2005
Posts: 406

Hi,

I have read this before. What I need now is a CCSID for little-endian.

Thanks

kimbert

Posted: Wed Apr 01, 2009 11:25 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

http://www-01.ibm.com/software/globalization/cdra/appendix_k.jsp

angka

Posted: Thu Apr 02, 2009 1:36 am Post subject:

Chevalier

Joined: 20 Sep 2005
Posts: 406

Hi,

I saw this before, but MQ does not support those CCSIDs. Or is there a way to change .NET strings to BE instead?

Thanks

kimbert

Posted: Thu Apr 02, 2009 2:01 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Message broker supports code pages 1202/1203 ('UTF16-LE'). What's the problem?

angka

Posted: Fri Apr 03, 2009 2:04 am Post subject:

Chevalier

Joined: 20 Sep 2005
Posts: 406

Hi,

I already tested using WMB to change the CCSID to 1202, and it works. However, I would like the sender to set the CCSID that represents the message they are sending.

Actually, the receiving QMgr where WMB resides is a Data Exchange. If I change the sender's CCSID just for the XML parsing, I will need to change it back to 1200 to route the message to the other subscribers. Besides, I will need to know the message encoding for each and every publisher who connects and sends messages to me.

If MQ supported 1202, I would only need to tell those who publish to me to set their CCSID according to their message encoding.

Thanks.

rekarm01

Posted: Sun Apr 05, 2009 3:40 pm Post subject: Re: ResetContentDescriptor node error parsing CCSID 1200

Grand Master

Joined: 25 Jun 2008
Posts: 1415

angka wrote:

I understand that strings in .NET use UTF-16 in little-endian

.NET strings themselves don't really have an endian order. For UTF-16, endian refers to the byte order that results from a character->byte encoding scheme. .NET strings consist of characters, not bytes; they don't use a character->byte encoding scheme. Other methods do, when they convert strings to a sequence of bytes.
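The same distinction can be shown in Python, whose strings are likewise sequences of characters rather than bytes (this is an analogy, not .NET code). The byte order only appears once an encoding scheme is chosen:

```python
text = "<?xml"

# The same characters, encoded with two different byte orders (no BOM added).
le = text.encode("utf-16-le")
be = text.encode("utf-16-be")

print(le.hex())  # prints: 3c003f0078006d006c00
print(be.hex())  # prints: 003c003f0078006d006c
```

The little-endian result starts with "3C 00 3F 00", matching the bitstream reported earlier in the thread.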

angka wrote:

Now my question is: what CCSID should the sender set so that it tells the XML parser in WMB that the message content is UTF-16 and little-endian?

So adding a BOM works, or setting ccsid=1202 works. Do either of these options cause problems for the other subscribers?

angka wrote:

I saw this before, but MQ does not support those CCSIDs.

MQ accepts any 16-bit unsigned ccsid, (even user-defined ccsids).

MQ does not support conversion to/from every combination of ccsids. Don't use MQ convert-on-get in those cases.

.NET MQMessage read/write methods do not support conversion to/from every ccsid. Don't use MQMessage convert-on-read/convert-on-write methods in those cases.

[Edit: Because .NET, IBM MQ, WMB, and the CDRA, all interpret endianness for ccsid=1200 differently, there probably aren't any better options for UTF-16.]

Last edited by rekarm01 on Mon Nov 27, 2017 3:47 am; edited 2 times in total
