Author |
Message
|
angka |
Posted: Wed Mar 25, 2009 1:41 am Post subject: |
|
|
Chevalier
Joined: 20 Sep 2005 Posts: 406
|
Hi,
I added the 0xFFFE and it works. But based on the W3C standard, it should also be able to work by reading the <?xm at the start of the XML document.
The standard states (quoted from "http://www.w3.org/TR/REC-xml/#charencoding"):
Without a Byte Order Mark:
3C 00 3F 00 UTF-16LE or little-endian ISO-10646-UCS-2 or other encoding with a 16-bit code unit in little-endian order and ASCII characters encoded as ASCII values (the encoding declaration must be read to determine which)
Why is MQ message broker not following this standard?
Any ideas?
Thanks |
|
mqjeff |
Posted: Wed Mar 25, 2009 2:03 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Does the <?xml tag include the *correct* encoding declaration? |
|
rekarm01 |
Posted: Wed Mar 25, 2009 9:09 pm Post subject: Re: ResetContentDescriptor node error parsing CCSID 1200 |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
angka wrote: |
Btw the xml document encoding is set as utf-16. |
The XML parsers do not use the xml encoding declaration to parse the input message; they use the input ccsid.
angka wrote: |
running a trace cannot capture anything because it is all in BLOB |
A BLOB is useful for comparing the message bytes against the message headers; for example, with utf-16, they can at least show the byte order.
angka wrote: |
I added the 0xFFFE and it works. But based on the W3C standard, it should also be able to work by reading the <?xm at the start of the XML document.
The standard states (quoted from "http://www.w3.org/TR/REC-xml/#charencoding"): |
... actually, that's quoted from http://www.w3.org/TR/REC-xml/#sec-guessing:
Quote: |
Without a Byte Order Mark:
3C 00 3F 00 : UTF-16LE or little-endian ISO-10646-UCS-2 or other encoding with a 16-bit code unit in little-endian order and ASCII characters encoded as ASCII values (the encoding declaration must be read to determine which) |
"3C 00 3F 00 ..."? Is that how the input message bitstream starts?
angka wrote: |
Why is MQ message broker not following this standard? |
That portion of the XML standard describes how to "guess" a suitable character encoding, in the absence of any other information, in order to be able to read the XML declaration. It is non-normative; the message broker is free to ignore it, and implement some other method of "guessing" the character encoding, (such as using the input ccsid in the message header).
The input message, however, should follow the normative part of the XML standard:
Quote: |
Entities encoded in UTF-16 MUST ... begin with the Byte Order Mark ... |
[Edit: The underlying issue is that there are some differences between how .NET and IBM MQ interpret ccsid=1200. windows-1200 is little-endian, but ibm-1200 is a bit more complicated.]
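To illustrate the kind of "guessing" that the non-normative part of the XML standard describes, here is a minimal sketch in Python. It is not how the message broker works (the broker uses the input ccsid), and it covers only a subset of the byte patterns listed in the standard. The function name `guess_xml_encoding` and the returned labels are made up for this example.

```python
def guess_xml_encoding(head: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML entity.

    Covers only some of the patterns from the XML standard's
    auto-detection appendix; the labels returned are illustrative.
    """
    # UTF-32 BOMs are checked first, because FF FE 00 00 also starts
    # with the UTF-16LE BOM (FF FE).
    if head.startswith(b"\x00\x00\xfe\xff"):
        return "utf-32-be"
    if head.startswith(b"\xff\xfe\x00\x00"):
        return "utf-32-le"
    # With a Byte Order Mark.
    if head.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if head.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    # Without a Byte Order Mark: look for "<?" in 16-bit code units.
    if head.startswith(b"\x3c\x00\x3f\x00"):
        return "utf-16-le"
    if head.startswith(b"\x00\x3c\x00\x3f"):
        return "utf-16-be"
    # "<?xm" in an ASCII-compatible encoding; the encoding declaration
    # would still have to be read to know which one.
    if head.startswith(b"\x3c\x3f\x78\x6d"):
        return "utf-8"
    return "unknown"


# The bitstream discussed in this thread starts with 3C 00 3F 00.
print(guess_xml_encoding(bytes.fromhex("3c003f00")))
```

A guess like this only selects a byte order well enough to read the XML declaration; it does not replace the ccsid in the message header.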
Last edited by rekarm01 on Sun Nov 26, 2017 2:32 pm; edited 1 time in total |
|
angka |
Posted: Wed Mar 25, 2009 10:57 pm Post subject: |
|
|
Chevalier
Joined: 20 Sep 2005 Posts: 406
|
Hi,
The CCSID is 1200 when it reaches the Broker. I stopped the flow and looked at the message on the queue.
The message CCSID is 1200 and the encoding declaration is UTF-16. 1200 is the .NET default CCSID, but Windows is little-endian, so will there be a conflict based on "ccsid=1200 (UTF-16BE) - big-endian (no BOM)"? If so, why does .NET set the default to 1200?
Yes, "3C 00 3F 00 ..." is how the input message bitstream starts.
Thanks. |
|
rekarm01 |
Posted: Thu Mar 26, 2009 2:19 am Post subject: Re: ResetContentDescriptor node error parsing CCSID 1200 |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
angka wrote: |
1200 is the .NET default CCSID, but Windows is little-endian, so will there be a conflict based on "ccsid=1200 (UTF-16BE)"? |
If adding a BOM fixes the problem, then do that. Or try changing the CCSID in .NET to 1202 (little-endian), or 1204 (BOM-endian).
Failing that, the not-so-recommended solution, (last resort, really), would be for the message flow to try to repair incoming messages itself, before parsing them. For example, the message flow could implement auto-detection of character encodings, as described in the XML standard, overwriting ccsids, and adding/removing BOMs, as needed.
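As a rough illustration of the "repair before parsing" idea, the sketch below prepends a BOM to a UTF-16 XML payload that lacks one. It is plain Python, not broker code (a real message flow would do this in its own language against the BLOB), and the helper name `add_utf16_bom` is hypothetical. It assumes the payload is an XML document that starts with "<".

```python
import codecs


def add_utf16_bom(payload: bytes) -> bytes:
    """Return the payload with a UTF-16 BOM, adding one only if missing.

    Assumes the payload is a UTF-16 XML document whose first
    character is '<' (0x3C).
    """
    # Already has a BOM: leave the payload untouched.
    if payload.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return payload
    # '<' as a little-endian 16-bit code unit is 3C 00.
    if payload.startswith(b"\x3c\x00"):
        return codecs.BOM_UTF16_LE + payload
    # '<' as a big-endian 16-bit code unit is 00 3C.
    if payload.startswith(b"\x00\x3c"):
        return codecs.BOM_UTF16_BE + payload
    raise ValueError("payload does not look like UTF-16 XML")


# A little-endian message without a BOM, like the one in this thread.
raw = '<?xml version="1.0" encoding="utf-16"?><a/>'.encode("utf-16-le")
fixed = add_utf16_bom(raw)
# Python's generic "utf-16" codec uses the BOM to pick the byte order.
print(fixed.decode("utf-16"))
```

The check on the first two bytes is the weak point: it works only because an XML document must start with "<", which is one reason this approach is a last resort.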
[Edit: In this case, it's the WMB broker that recognizes the BOM and alternate ccsids, not IBM MQ or .NET]
Last edited by rekarm01 on Sun Nov 26, 2017 3:22 pm; edited 1 time in total |
|
angka |
Posted: Thu Mar 26, 2009 2:28 am Post subject: |
|
|
Chevalier
Joined: 20 Sep 2005 Posts: 406
|
Hi,
But MQ doesn't have CCSID 1202 and 1204. Thanks |
|
rekarm01 |
Posted: Sat Mar 28, 2009 1:45 pm Post subject: Re: ResetContentDescriptor node error parsing CCSID 1200 |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
angka wrote: |
But WMQ doesn't have CCSID 1202 and 1204. |
And ... ? Is that causing any problems? If so, more details would have helped here.
To be more precise, WMQ doesn't support character conversion to/from CCSIDs 1202 or 1204, but that shouldn't prevent it from getting and putting unconverted messages. Fortunately, applications can use WMB, (or perhaps .NET, database, or even the OS itself), to carry out such conversions when needed, instead of WMQ.
Ultimately, applications are responsible for keeping track of the character encoding associated with messages, as the messages move from point to point. |
|
angka |
Posted: Tue Mar 31, 2009 1:32 am Post subject: |
|
|
Chevalier
Joined: 20 Sep 2005 Posts: 406
|
Hi,
I understand that strings in .NET use UTF-16 in little-endian order, and that when I set the CCSID of the message to 1200, the bytes are not converted from little-endian to big-endian.
That is why, when WMB tries to parse the message, it reads CCSID 1200, but the message body is not big-endian, so parsing fails.
My question now is: what CCSID should the sender set so that it tells the XML parser in WMB that the message content is UTF-16 in little-endian order?
Thanks |
|
kimbert |
Posted: Wed Apr 01, 2009 12:56 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5543 Location: Southampton
|
|
angka |
Posted: Wed Apr 01, 2009 6:52 pm Post subject: |
|
|
Chevalier
Joined: 20 Sep 2005 Posts: 406
|
Hi,
I have read this before. What I need now is a little-endian CCSID.
Thanks |
|
kimbert |
Posted: Wed Apr 01, 2009 11:25 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5543 Location: Southampton
|
|
angka |
Posted: Thu Apr 02, 2009 1:36 am Post subject: |
|
|
Chevalier
Joined: 20 Sep 2005 Posts: 406
|
Hi,
I saw this before, but MQ does not support those CCSIDs. Or is there a way to change .NET strings to BE instead?
Thanks |
|
kimbert |
Posted: Thu Apr 02, 2009 2:01 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5543 Location: Southampton
|
Message broker supports code pages 1202/1203 ( 'UTF16-LE'). What's the problem? |
|
angka |
Posted: Fri Apr 03, 2009 2:04 am Post subject: |
|
|
Chevalier
Joined: 20 Sep 2005 Posts: 406
|
Hi,
I already tested using WMB to change the CCSID to 1202, and it works. However, I would like the sender to set the CCSID that represents the message they are sending.
The receiving QMgr where WMB resides is actually a Data Exchange. If I change the CCSID for the sender just for the XML parsing, I will need to change it back to 1200 to route the message to other subscribers. Besides, I will need to know the encoding of the messages for each and every publisher who connects and sends messages to me.
If MQ supported 1202, I would only need to tell those who publish to me to set their CCSID according to their message encoding.
Thanks. |
|
rekarm01 |
Posted: Sun Apr 05, 2009 3:40 pm Post subject: Re: ResetContentDescriptor node error parsing CCSID 1200 |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
angka wrote: |
I understand that strings in .NET use UTF-16 in little-endian order |
.NET strings themselves don't really have an endian order. For UTF-16, endian refers to the byte order that results from a character->byte encoding scheme. .NET strings consist of characters, not bytes; they don't use a character->byte encoding scheme. Other methods do, when they convert strings to a sequence of bytes.
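The same distinction can be shown in a few lines of Python, used here only as a neutral illustration: a string has no byte order until an encoding scheme turns it into bytes. In .NET, the analogous choice is made by the encoding object used to produce the bytes (for example `Encoding.Unicode` for little-endian versus `Encoding.BigEndianUnicode` for big-endian); which of those the sender's code actually uses is an assumption that would need checking.

```python
# One string, no byte order yet.
text = "<?x"

# The byte order appears only when the characters are encoded to bytes.
le = text.encode("utf-16-le")  # 3c 00 3f 00 78 00
be = text.encode("utf-16-be")  # 00 3c 00 3f 00 78

print(le.hex(" "))
print(be.hex(" "))

# Both byte sequences decode back to the same characters,
# provided the matching encoding scheme is used.
same = le.decode("utf-16-le") == be.decode("utf-16-be")
print(same)
```

So the sender can produce big-endian bytes without changing the string at all, simply by choosing a different encoding when writing the message body.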
angka wrote: |
My question now is: what CCSID should the sender set so that it tells the XML parser in WMB that the message content is UTF-16 in little-endian order? |
So adding a BOM works, or setting ccsid=1202 works. Do either of these options cause problems for the other subscribers?
angka wrote: |
I saw this before, but MQ does not support those CCSIDs. |
MQ accepts any 16-bit unsigned ccsid, (even user-defined ccsids).
MQ does not support conversion to/from every combination of ccsids. Don't use MQ convert-on-get in those cases.
.NET MQMessage read/write methods do not support conversion to/from every ccsid. Don't use MQMessage convert-on-read/convert-on-write methods in those cases.
[Edit: Because .NET, IBM MQ, WMB, and the CDRA, all interpret endianness for ccsid=1200 differently, there probably aren't any better options for UTF-16.]
Last edited by rekarm01 on Mon Nov 27, 2017 3:47 am; edited 2 times in total |
|