MQSeries.net :: View topic - java.io.UTFDataFormatException whilst getting a message


MQSeries.net Forum Index » IBM MQ Java / JMS » java.io.UTFDataFormatException whilst getting a message


Vitor

Posted: Thu Sep 07, 2006 1:14 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

AH - idiot this end was looking in the 6.0 book, which has been rewritten somewhat. Got them now.

I think the best thing here is for us to agree that you're right, and I should just sit in a corner for a while writing out "I must not comment on Java matters" 100 times. Maybe that field IS the CCSID of the MQMD. I don't see how, but what do I know about Java? Really?

As a wise man says - "Grand Master just means posts too much". In this case once too often. Apologies.

_________________
Honesty is the best policy.
Insanity is the best defence.

deepu4u

Posted: Thu Sep 07, 2006 1:17 am

Apprentice

Joined: 20 Jun 2005
Posts: 37

Sorry if I hurt your feelings.
Well, I'm working with version 5.3.
CCSID is a variable in the MQEnvironment class.

Vitor

Posted: Thu Sep 07, 2006 1:20 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

No, no, my bad. Shouldn't talk on subjects I know nothing about! Certainly shouldn't have assumed you were on 6.0!!

fjb_saper

Posted: Thu Sep 07, 2006 2:06 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

deepu4u wrote:

Sorry if I hurt your feelings.
Well, I'm working with version 5.3.
CCSID is a variable in the MQEnvironment class.

Well, CCSID comes up in several places, depending on what effect you are going for.

The behavior of MQ is governed by three fields in the MQMD: Format, CodedCharSetId (CCSID) and Encoding.

Encoding determines how numeric data is represented (integer byte order such as big endian or little endian, floating-point format such as IEEE, etc.).
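To make the Encoding point concrete, here is a small plain-Java sketch (no MQ classes involved; EncodingDemo, encodeInt and decodeInt are made-up names). It lays out the same int in big-endian and little-endian byte order and shows what happens when the bytes are read back under the wrong assumption. That mismatch is the kind of problem the MQMD Encoding field exists to describe, separately from the CCSID.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class EncodingDemo {
    /** Serializes an int using the given byte order. */
    public static byte[] encodeInt(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    /** Reads an int back, assuming the given byte order. */
    public static int decodeInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] big = encodeInt(258, ByteOrder.BIG_ENDIAN);       // 00 00 01 02
        byte[] little = encodeInt(258, ByteOrder.LITTLE_ENDIAN); // 02 01 00 00
        System.out.println(Arrays.toString(big));
        System.out.println(Arrays.toString(little));
        // Reading little-endian bytes as if they were big-endian gives a different number.
        System.out.println(decodeInt(little, ByteOrder.BIG_ENDIAN)); // prints 33619968
    }
}
```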

CCSID and Format govern how character data is handled.

The CCSID specified in MQEnvironment tells the queue manager the CCSID of the client. This enables the queue manager to convert a message with format MQSTR automatically from the CCSID in the MQMD to the client's CCSID.

If you leave the client's CCSID at 0, the queue manager you are connected to will assume the client uses its own CCSID, i.e. the value shown by "dis qmgr ccsid".
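As a conceptual model only (this is not the MQ classes for Java API, and the CCSID table is a deliberately tiny, hypothetical mapping), the conversion described above amounts to decoding the payload with the CCSID recorded in the MQMD and re-encoding it in the client's CCSID, falling back to the queue manager's CCSID when the client's value is 0. CcsidConversion, effectiveClientCcsid and convert are names invented for this sketch.

```java
import java.nio.charset.Charset;
import java.util.Map;

public class CcsidConversion {
    // Hypothetical, deliberately partial CCSID -> Java charset table.
    private static final Map<Integer, String> CHARSETS = Map.of(
            1208, "UTF-8",      // Unicode UTF-8
            819, "ISO-8859-1",  // ISO Latin-1
            37, "IBM037");      // EBCDIC (US/Canada)

    /** Models the rule from the post: a client CCSID of 0 means "use the qmgr's CCSID". */
    public static int effectiveClientCcsid(int clientCcsid, int qmgrCcsid) {
        return clientCcsid == 0 ? qmgrCcsid : clientCcsid;
    }

    private static Charset charsetFor(int ccsid) {
        String name = CHARSETS.get(ccsid);
        if (name == null) {
            throw new IllegalArgumentException("No mapping for CCSID " + ccsid);
        }
        return Charset.forName(name);
    }

    /**
     * Decodes the payload with the message's CCSID and re-encodes it in the target CCSID.
     * Note: new String(byte[], Charset) silently replaces malformed input with U+FFFD.
     */
    public static byte[] convert(byte[] payload, int messageCcsid, int targetCcsid) {
        String text = new String(payload, charsetFor(messageCcsid));
        return text.getBytes(charsetFor(targetCcsid));
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(charsetFor(1208)); // 63 61 66 C3 A9
        int target = effectiveClientCcsid(0, 819);             // client left at 0 -> qmgr's 819
        byte[] latin1 = convert(utf8, 1208, target);           // 63 61 66 E9
        System.out.println(utf8.length + " bytes -> " + latin1.length + " bytes"); // prints "5 bytes -> 4 bytes"
    }
}
```

The point of the model is that the payload only makes sense together with the CCSID that travels with it; converting means decoding with one and encoding with the other.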

Enjoy

_________________
MQ & Broker admin

deepu4u

Posted: Thu Sep 07, 2006 8:43 pm

Apprentice

Joined: 20 Jun 2005
Posts: 37

Hi...
Please let me know if I've got this wrong.

Let's say:
Client Enc = c
QM Enc = q
Message Enc = m

So whenever the client puts a message on a queue, the QM translates the character fields in the MQMD header from c to q, keeping the payload unchanged, i.e. the character payload stays in m.

When the client gets a message from a queue, the QM translates the MQMD fields of the message to the client encoding, i.e. from q to c.

Why would the QM translate the message payload when the MQMD already records the encoding of the payload? Any application that wants to use this message can read the encoding from the MQMD.

Quote:

The CCSID specified in MQEnvironment tells the queue manager the CCSID of the client. This enables the queue manager to convert a message with format MQSTR automatically from the CCSID in the MQMD to the client's CCSID.

fjb_saper

Posted: Fri Sep 08, 2006 2:37 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

deepu4u wrote:

I take it you are using "encoding" in the XML sense, as in encoding="...". In MQ that translates into the CCSID. MQ Encoding is a different animal: it applies only to numeric representation, not to character sets.

fjb_saper wrote:

Because that is part of the MQ functionality, if you request it (the convert option, MQGMO_CONVERT, on MQGET). Not everybody has a character set translator...
If you read the JMS spec (or the broker documentation), it says that for SOAP over JMS the JMS "encoding", here the CCSID, takes precedence over whatever is put into the SOAP/XML header.

Enjoy


simon.starkie

Posted: Wed Sep 20, 2006 8:50 am

Disciple

Joined: 24 Mar 2002
Posts: 180

I got one of these also. My code was doing a "validate without schema" (i.e. external schema name is blank).

java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence.
at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.kp.nps.common.xmlutils.XMLParserImpl.validateXml(XMLParserImpl.java:157)
at org.kp.nps.common.xmlutils.XMLParserImpl.validateNoNamespaceWithInputFile(XMLParserImpl.java:253)


simon.starkie

Posted: Sat Sep 23, 2006 9:09 am Post subject: Yes, there was definitely a non-UTF-8 compliant byte!

Disciple

Joined: 24 Mar 2002
Posts: 180

Well, the XML document definitely contains bad data.
Using XVI32 (a hex editor), I saw an x'C2' byte in the middle of one of the spasRqrComment.commentText nodes.

Removing the offending x'C2' byte from the XML message solved the
java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence
problem.
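Instead of hunting for the byte in a hex editor, a few lines of plain Java can report the offset of the first byte sequence that is not well-formed UTF-8. This is a sketch using the JDK's strict CharsetDecoder; Utf8Scan and firstInvalidOffset are hypothetical names, and the offset it reports is the JDK decoder's view, which need not match the exact wording of the Xerces error.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Scan {
    /** Returns the offset of the first malformed UTF-8 sequence, or -1 if the bytes are valid. */
    public static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never yields more chars than input bytes, so this buffer cannot overflow.
        CharBuffer out = CharBuffer.allocate(data.length + 1);
        CoderResult result = decoder.decode(in, out, true);
        if (result.isError()) {
            // On an error, the input position is at the start of the bad sequence.
            return in.position();
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] sample = {'O', 'K', ' ', (byte) 0xC2, ' ', 'x'};
        int offset = firstInvalidOffset(sample);
        if (offset >= 0) {
            System.out.printf("bad byte x'%02X' at offset %d%n", sample[offset] & 0xFF, offset);
        }
    }
}
```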

An x'C2' byte is the character "B" in EBCDIC on the mainframe. Since the data originated on a mainframe, it seems likely that this particular byte was not converted somewhere along the line before it was stored in the Oracle staging table that the middleware layer (my code) extracts from.
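The EBCDIC explanation is easy to check with the JDK's IBM037 charset (EBCDIC CCSID 37, US/Canada). The sketch below, with made-up names, encodes "B" and "BOB" in EBCDIC and shows that the resulting bytes are not valid UTF-8. That fits the theory of an unconverted mainframe byte; it does not prove that is what happened here. IBM037 lives in the JDK's extended charsets (the jdk.charsets module), so a trimmed-down runtime might not include it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EbcdicCheck {
    // EBCDIC CCSID 37, provided by the JDK's extended charsets (jdk.charsets module).
    static final Charset EBCDIC_037 = Charset.forName("IBM037");

    public static byte[] toEbcdic(String text) {
        return text.getBytes(EBCDIC_037);
    }

    public static String fromEbcdic(byte[] bytes) {
        return new String(bytes, EBCDIC_037);
    }

    /** Strict UTF-8 check: a fresh decoder reports malformed input instead of replacing it. */
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] b = toEbcdic("B");     // x'C2'
        byte[] bob = toEbcdic("BOB"); // x'C2D6C2'
        System.out.printf("EBCDIC 'B' = x'%02X'%n", b[0] & 0xFF);
        System.out.println("EBCDIC \"BOB\" valid as UTF-8? " + isValidUtf8(bob));
        System.out.println("Same text in UTF-8 valid? "
                + isValidUtf8("BOB".getBytes(StandardCharsets.UTF_8)));
    }
}
```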
And x'C2' is apparently allowed in an Oracle column defined as VARCHAR2, so my SELECT statement receives a String containing the x'C2' byte, which ends up in a Java VO (value object). My code then transforms the VO into an XML document and validates it with Xerces, and the x'C2' byte raises the UTFDataFormatException.

So the short-term solution was to remove the bad x'C2' byte from the Oracle table and re-run the middleware extract (actually, in this case, editing the XML to remove the x'C2' from the document and putting it back on the queue with RFHUTIL was quicker).

The longer term solution for me will be to:
1. Try to persuade the owners of the application upstream from me to validate their commentString (varchar) field for valid characters before storing it in Oracle. In this case the field involved is just free-form, user-defined actuarial comments, so there is no apparent need for non-ASCII special characters such as umlauts. A simple check for A-Z, a-z and 0-9 should be sufficient.
2. Enhance my middleware J2EE code to report exactly which field in the XML message fails during validation. This will help pinpoint the field involved in these large XML documents during problem determination, should this type of problem recur.
3. Enhance my middleware error management system so it can tolerate messages with invalid content without breaking the parser. This will probably involve wrapping the original message in CDATA tags. This change can be implemented via a new error management system JAR, which all of the middleware apps use for exception processing. This should avoid System Exceptions in the error management layer, which currently percolate back to the middleware application layer and cause more serious problems, such as the MDB listener being stopped (as is required when System Exceptions are thrown).
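If the upstream application were written in Java (an assumption; the post doesn't say), the character check from point 1 could be a simple whitelist. The allowed set below goes a little beyond the post's A-Z, a-z and 0-9 by also admitting space and some common punctuation, which free-form comments usually need; it is a guess to be adjusted to the real field rules. CommentCharCheck is a hypothetical name. Returning the index of the first rejected character also serves point 2, since it tells you where in the field the problem is.

```java
import java.util.regex.Pattern;

public class CommentCharCheck {
    // Assumed whitelist: ASCII letters, digits, space and a few punctuation marks.
    private static final Pattern ALLOWED_CHAR = Pattern.compile("[A-Za-z0-9 .,;:'()?!-]");

    /** Returns the index of the first character outside the whitelist, or -1 if none. */
    public static int firstDisallowed(String comment) {
        for (int i = 0; i < comment.length(); i++) {
            if (!ALLOWED_CHAR.matcher(String.valueOf(comment.charAt(i))).matches()) {
                return i;
            }
        }
        return -1;
    }

    /** True if the comment is non-null and every character is on the whitelist. */
    public static boolean isAllowed(String comment) {
        return comment != null && firstDisallowed(comment) == -1;
    }

    public static void main(String[] args) {
        String comment = "Rate review due 2007 \u00c2 see notes"; // contains U+00C2
        int bad = firstDisallowed(comment);
        if (bad >= 0) {
            System.out.printf("Rejected: U+%04X at index %d%n", (int) comment.charAt(bad), bad);
        }
    }
}
```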

fjb_saper

Posted: Sat Sep 23, 2006 10:30 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

Well, I wonder why that is...
Is Oracle sending UTF-8 data but not setting the CCSID of the message to 1208?
Is Oracle pretending to send UTF-8 data while it has non-UTF-8 bytes embedded in it?

Are you not requesting the data with CCSID 1208?

Please tell us what is going on.

Thanks.
F.J.
