MQSeries.net :: View topic - Japanese characters problem with fixed length message set

Japanese characters problem with fixed length message set

er_pankajgupta84

Posted: Tue Feb 09, 2010 9:25 pm Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

I agree with you. But I cannot restrict my message set to lengths in characters, because the source (mainframe) is fixed by bytes. The mainframe will always send me a fixed number of bytes. For example, 100 bytes may hold 100 characters, or 80 characters, or any other number.

If I receive 100 bytes from the mainframe and one character in them is 2 bytes long (i.e. 99 characters), then when I add one byte (say, a space) it works fine. But I want to measure length in bytes.

The question here is: can we specify lengths in bytes in the regular expression used to define the data pattern? If so, that may solve the problem.

fjb_saper

Posted: Tue Feb 09, 2010 9:31 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20767
Location: LI,NY

Seems what you have is a freeform text field with a limited number of bytes allocatable, and probably space filled.

This should be relatively easy, as long as your field does not bust the constraints...

If you are sending to the mainframe, just make sure the number of bytes does not bust the max for the field... Best would probably be to set the CCSID to the one the MF is reading it with... and no it's not going to be 37...

The big question is what do you do if the data is larger than the field??

Have fun

_________________
MQ & Broker admin

kimbert

Posted: Wed Feb 10, 2010 1:46 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

I think i missed this stuff.

My message set definition consists of RECORDS which further have fields. Each RECORD is fixed length and I have specified the length of each field in the field itself.

Thanks - I was going to ask you for the details about your message format and how you modelled it.

Quote:

I have used "Use Pattern" as Data Element Separator for each RECORD. As each RECORD is a fixed length record so in pattern I have given the number of characters to be read.

That's a strange way to model this data. Why not use Data Element Separation="Fixed Length"?

Quote:

I think this is forcing broker to fail for length validation even if I specify "Bytes" as length Unit on the fields.

Maybe...
You ( and Vitor ) should be aware that the maxLength constraint always and only applies to the parsed characters. The length units property always and only applies to the extraction of the raw data.
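To make the distinction between length units and maxLength concrete, here is a minimal Python sketch. It is not broker code. Python ships no codec for CCSID 1399, so Shift_JIS is used as an assumed stand-in that also mixes 1-byte and 2-byte characters, and the function names are hypothetical.

```python
# Stand-in code page (assumption): Python has no CCSID 1399 codec, so
# Shift_JIS is used only to get a mix of 1-byte and 2-byte characters.
SOURCE_ENCODING = "shift_jis"


def extract_field(raw: bytes, offset: int, length_in_bytes: int) -> str:
    """Extraction step: cut a fixed number of BYTES, then decode them."""
    return raw[offset:offset + length_in_bytes].decode(SOURCE_ENCODING)


def within_max_length(value: str, max_length: int) -> bool:
    """Validation step: the limit is checked against parsed CHARACTERS."""
    return len(value) <= max_length


# A 20-byte field: 4 double-byte characters (8 bytes) padded with 12 spaces.
raw_field = "日本語名".encode(SOURCE_ENCODING).ljust(20, b" ")
value = extract_field(raw_field, 0, 20)

print(len(raw_field))                # 20 bytes were extracted
print(len(value))                    # 16 characters were parsed
print(within_max_length(value, 20))  # True
```

The field always occupies 20 bytes on the wire, but the character count that a maxLength-style check sees depends on how many double-byte characters it contains.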

Quote:

I cannot change the Data Element Separator of my records as there is no other available.

There are many other Data Element Separation types...but I'm sure you know that. Would you care to explain what you *really* meant?

Quote:

I only specified the length on the field itself. It's a CSV message set. Along with that I do have a MAX length constraint which defines it to be 20, but there is no MIN length constraint.

Please quote the full text of the 'validation' error that you are getting. If it's a maxLength facet, then you should be OK. You can get the full text from a user trace, or from Windows Event Viewer.

er_pankajgupta84

Posted: Wed Feb 10, 2010 6:49 am Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

I use "Use Data Pattern" as the Data Element Separation (DES) at record level for the following reasons:

1. Each record is fixed length but has multiple occurrences.
2. There is no tag present at the beginning of the record.
3. Each record can be identified by its 5th to 10th characters; characters 1-5 can be any digits.

That's why I use a data pattern like [0-9]{5}00010.{220}

Within the record I use "Fixed Length" as the DES to identify fields.

Quote:

You ( and Vitor ) should be aware that the maxLength constraint always and only applies to the parsed characters. The length units property always and only applies to the extraction of the raw data.

Does this mean that the length comes into the picture only while parsing, and maxLength comes into the picture only when I set validation to "Content and Value" or "Content" in the RCD node?

I am getting the error "Not all buffer was used", and the user trace gives the same error.

er_pankajgupta84

Posted: Wed Feb 10, 2010 9:15 am Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

I found this on : http://www.regular-expressions.info/unicode.html

All Unicode regex engines discussed in this tutorial treat any single Unicode code point as a single character. When this tutorial tells you that the dot matches any single character, this translates into Unicode parlance as "the dot matches any single Unicode code point". In Unicode, à can be encoded as two code points: U+0061 (a) followed by U+0300 (grave accent). In this situation, . applied to à will match a without the accent. ^.$ will fail to match, since the string consists of two code points. ^..$ matches à.

This makes me wonder how the broker's regular expression engine matches a dot (.).
Because if a dot matches a Unicode code point, then my regular expression [0-9]{5}00010.{220} should work with bytes, not with characters.

I did another POC just to test how "Length Unit" works, and it went fine. A CSV message definition is evaluated in bytes if Length Unit is set to Bytes.
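The behaviour described in the quoted tutorial can be checked with a short Python sketch. It uses Python's own `re` engine, which is an assumption made for illustration and says nothing definite about the broker's engine. It shows that a code point is a unit of characters, not of bytes.

```python
import re
import unicodedata

decomposed = "a\u0300"  # 'a' followed by COMBINING GRAVE ACCENT: 2 code points
composed = unicodedata.normalize("NFC", decomposed)  # U+00E0: 1 code point

one_dot = re.fullmatch(r".", decomposed)    # None: one dot cannot cover both
two_dots = re.fullmatch(r"..", decomposed)  # matches: one dot per code point

print(one_dot is None)                 # True
print(two_dots is not None)            # True
print(len(composed))                   # 1 code point
print(len(composed.encode("utf-8")))   # 2 bytes for that single code point
```

So "the dot matches one code point" does not mean "the dot matches one byte": the single code point U+00E0 already takes two bytes in UTF-8.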

kimbert

Posted: Wed Feb 10, 2010 2:25 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

2. There is no tag present at the beginning of the record.
3. Each record can be identified by its 5th to 10th characters; characters 1-5 can be any digits.

That's why I use a data pattern like [0-9]{5}00010.{220}

Fair enough - Use Data Pattern is the correct choice in those circumstances.

Quote:

Does this mean that the length comes into the picture only while parsing, and maxLength comes into the picture only when I set validation to "Content and Value" or "Content" in the RCD node?

Correct. Parsing and validation are completely separate in message broker.

Quote:

I am getting the error "Not all buffer was used", and the user trace gives the same error.

I hate to say it, but it would have saved a lot of time if you had mentioned that in your first post. That error is nothing to do with maxLength validation. The parser is complaining that it has finished parsing, but it has not used up all of the input bitstream.

Quote:

This makes me wonder how the broker's regular expression engine matches a dot (.).
Because if a dot matches a Unicode code point, then my regular expression [0-9]{5}00010.{220} should work with bytes, not with characters.

All regular expressions work on *characters*. Never on bytes. A code point is not a byte - check this: http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf#G2212

The next step is for you to take a debug-level user trace. It should contain a record of everything that the TDS parser did with your message. By looking at that, you should be able to work out what has gone wrong.
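A hedged illustration of why a character-counting pattern can leave data unconsumed when records are fixed in bytes. This is plain Python, not the TDS parser, and the debug trace remains the authority on what actually happened. Assumptions: Shift_JIS stands in for CCSID 1399 (Python has no codec for it), the payload is shortened to 10 bytes instead of 220, and `build_record` and `consume` are hypothetical helpers.

```python
import re

SOURCE_ENCODING = "shift_jis"  # stand-in for CCSID 1399 (assumption)
PAYLOAD_BYTES = 10             # shortened from 220 to keep the example small
RECORD = re.compile(r"[0-9]{5}00010.{%d}" % PAYLOAD_BYTES)


def build_record(seq, text):
    """Build one record whose payload is padded to a fixed number of BYTES."""
    payload = text.encode(SOURCE_ENCODING).ljust(PAYLOAD_BYTES, b" ")
    return f"{seq:05d}00010".encode("ascii") + payload


def consume(bitstream):
    """Match records back to back; return (records matched, unconsumed text)."""
    text = bitstream.decode(SOURCE_ENCODING)
    pos = 0
    count = 0
    while True:
        match = RECORD.match(text, pos)
        if match is None:
            break
        pos = match.end()
        count += 1
    return count, text[pos:]


ascii_only = build_record(1, "ABC") + build_record(2, "DEF")
mixed = build_record(1, "日本") + build_record(2, "DEF")

print(consume(ascii_only))  # (2, ''): every character is one byte
print(consume(mixed))       # 1 record matched, the rest is left over
```

In the mixed case the first record holds 10 payload bytes but only 8 payload characters, so `.{10}` swallows the first two characters of the next record, the next match fails, and part of the input is never consumed.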

er_pankajgupta84

Posted: Wed Feb 10, 2010 4:15 pm Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

Thanks Kimbert,

Your comments gave me the logical reasons for all the exceptions I am getting.

I will update you with the trace once I have it, as I do not have access to run a trace myself. By the way, I now know the reasons for all the exceptions that are happening.

But here is some more detail about this problem.

1. The data that the mainframe is sending contains some Japanese characters.
2. Those Japanese characters are encoded with CCSID 1399, so the data the mainframe sends is in 1399 (which is Japanese EBCDIC).
3. I receive a fixed length (in terms of bytes) in the broker.
4. I am converting the message on the MQInput node.

So far all good.

Now the problematic part:

I cannot keep the length fixed by characters in the broker, because the mainframe sends data that is fixed by bytes, i.e. I have to use "Bytes" as the Length Unit.

Consider a field defined as 20 bytes in the broker message set. We receive 20 bytes for it from the mainframe, and those include Japanese characters. When I convert those 20 bytes from 1399 to 1208, the number of bytes I end up with varies depending on the data.

For example, some Japanese characters that take 2 bytes in 1399 also take 2 bytes in 1208 (with a different hex code), while other 2-byte characters in 1399 take 3 bytes in 1208. So the length becomes variable in terms of bytes as well.

So I cannot keep my message set fixed even by bytes.
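The effect can be reproduced outside the broker with a small Python sketch. Python has no codec for CCSID 1399, so Shift_JIS is used here as an assumed stand-in source code page; UTF-8 corresponds to CCSID 1208. The helper names and sample values are hypothetical.

```python
SOURCE_ENCODING = "shift_jis"  # stand-in for CCSID 1399 (assumption)
TARGET_ENCODING = "utf-8"      # CCSID 1208
FIELD_BYTES = 20


def make_field(text):
    """Build a field padded with spaces to a fixed number of source BYTES."""
    return text.encode(SOURCE_ENCODING).ljust(FIELD_BYTES, b" ")


def converted_length(field):
    """Byte length of the same field after conversion to the target code page."""
    return len(field.decode(SOURCE_ENCODING).encode(TARGET_ENCODING))


samples = ["TOKYO", "Ω", "東京", "東京都千代田区"]
for sample in samples:
    field = make_field(sample)
    print(len(field), converted_length(field))
# 20 20  (single-byte data keeps its length)
# 20 20  (a 2-byte character that is still 2 bytes after conversion)
# 20 22  (two 2-byte characters that become 3 bytes each)
# 20 27  (seven 2-byte characters that become 3 bytes each)
```

Every field is exactly 20 bytes before conversion, yet the converted length depends on the content, which is why byte offsets stop being usable after conversion.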

So this problem has become worse. We don't see any way to keep it fixed length. I think we have to go with variable length.

There is one option with fixed length that we are going to try: doing the conversion from 1399 to 1208 on the mainframe and then making the result fixed in length by bytes. But that is just a ray of hope.

Any suggestions??

smdavies99

Posted: Thu Feb 11, 2010 12:33 am Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

er_pankajgupta84 wrote:

Thanks Kimbert,

4. I am converting the message on MQInput node.

If you were to read some of the other posts of this type here, you would find that one of the pieces of advice given is NOT to convert in the MQInput node.

Have you tried this?
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.

er_pankajgupta84

Posted: Thu Feb 11, 2010 7:08 am Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

That's one of the options I have tried.
But why not convert on the MQInput node? As far as I know, if the broker is supposed to do the conversion, then it can be done either at the MQInput node or on the QM channels. Doing the conversion on the channel would affect all the messages, which is not ideal in many situations.

The other option is to do the conversion on the source side. If we have to do that, then we are losing out on what Message Broker is useful for.

Can somebody comment on this?

Anyway, this is not relevant to the problem that I am looking to solve here.

kimbert

Posted: Thu Feb 11, 2010 7:30 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

Any suggestions?

Not sure that I can add anything. I think you understand exactly why the problem is occurring now. What more can we do?

fjb_saper

Posted: Thu Feb 11, 2010 7:00 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20767
Location: LI,NY

er_pankajgupta84 wrote:

Setting the convert flag on the MQInput nodes tells the qmgr to convert to its CCSID before passing the data to the broker. In your case this is counterproductive because I doubt very much that the converted data still fits your COBOL copybook... The main reason being that conversion turns fixed byte lengths into variable byte lengths, which breaks positional parsing.

So you do no conversion at all. The broker will get the CCSID for parsing from InputRoot.Properties.CodedCharSetId.
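A rough Python sketch of the two approaches, assuming a hypothetical three-field layout and using Shift_JIS as a stand-in for CCSID 1399 (Python has no codec for it; a real mixed EBCDIC code page may also carry shift control bytes, which this stand-in does not model). It slices the unconverted bytes by position and decodes each field afterwards, then shows what happens to the same offsets once the whole record has been converted first.

```python
SOURCE_ENCODING = "shift_jis"  # stand-in for CCSID 1399 (assumption)

# Hypothetical copybook-style layout: (field name, byte offset, byte length)
LAYOUT = [("id", 0, 5), ("name", 5, 10), ("city", 15, 5)]


def parse_record(raw, encoding=SOURCE_ENCODING):
    """Slice on byte offsets first, decode each field afterwards."""
    return {
        name: raw[offset:offset + length].decode(encoding).rstrip()
        for name, offset, length in LAYOUT
    }


record = (
    b"00042"
    + "山田".encode(SOURCE_ENCODING).ljust(10, b" ")
    + b"OSAKA"
)

# Parsing in the original code page: the byte offsets still line up.
print(parse_record(record))

# Converting the whole record first shifts every later offset.
converted = record.decode(SOURCE_ENCODING).encode("utf-8")
city_after_conversion = converted[15:20]
print(len(record), len(converted))  # 20 22
print(city_after_conversion)        # b'  OSA' instead of b'OSAKA'
```

Parsing the unconverted bytes keeps the positional layout intact, and the decoded field values can then be written out in any code page.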

Have fun

_________________
MQ & Broker admin

er_pankajgupta84

Posted: Fri Feb 12, 2010 7:03 am Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

Either I do not understand what you are trying to say, or you are on a different track.

Quote:

Setting the convert flag on the MQInput nodes tells the qmgr to convert to its CCSID before passing the data to the broker.

Why would it convert to the QMgr CCSID? We can always specify which CCSID the data should be converted to. The thing to note here is that the underlying data must specify its original CCSID correctly. This is working fine in our case as well.

Quote:

In your case this is counterproductive because I doubt very much that the converted data still fits your COBOL copybook... The main reason being that conversion turns fixed byte lengths into variable byte lengths, which breaks positional parsing.

When you do a conversion, the number of bytes in the new character set may increase or decrease. That's why people recommend parsing by characters instead of bytes, and that's the default behavior of the broker as well. But we can make the broker parse by bytes too.

Now, coming to my problem:

Data coming from the mainframe is fixed neither by bytes nor by characters when it contains non-ASCII data. They can only send data fixed by bytes in their CCSID, for example 37 or 1399.

As a solution to this we are trying the following things on the mainframe side:

1. Try to make the message fixed by characters. This would be the optimal solution and can work with any CCSID.
2. If the first is not possible, then try to get data fixed by bytes in CCSID 1208 from the mainframe.
3. If the first two are not possible, then process the data in CCSID 1399 in the broker and convert the final output to 1208. The final output is XML, so there is no length constraint.
4. If none of the above is possible, then use a delimiter instead of fixed-length records.

kimbert

Posted: Fri Feb 12, 2010 8:10 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

3. is the simplest and most natural solution.
Why bother converting on the mainframe ( before you send ) or in the broker's QM ( on the other end of the channel ) when broker is easily able to parse the message in its original code page?
As you rightly say, broker can set the code page of the output to anything.

This is exactly what message broker is designed for.

er_pankajgupta84

Posted: Fri Feb 12, 2010 12:22 pm Post subject:

Master

Joined: 14 Nov 2008
Posts: 203
Location: charlotte,NC, USA

Well, there are other problems related to 1399.

1. We are auditing the data, i.e. storing the incoming and outgoing payloads in a readable format in the DB. This is a common service, and it won't work with data in 1399.

2. Similarly, for exception logging we are storing the failed payload in the DB in a readable format. Data in 1399 would cause problems for that utility too.

kimbert

Posted: Fri Feb 12, 2010 2:11 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Auditing and exception logging are *outputs* from your flow. Use Unicode for your outputs. If you want to store the exact original bit stream, store it as a BLOB ( and store the code page with it, so that you can re-parse it )
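One possible shape for such an audit entry, sketched in Python under stated assumptions: the JSON layout, field names and helper functions are hypothetical, base64 stands in for a database BLOB column, and Shift_JIS stands in for CCSID 1399 because Python has no codec for it.

```python
import base64
import json


def audit_entry(raw, encoding):
    """Keep the exact original bytes plus the code page needed to re-parse them."""
    return json.dumps(
        {
            "encoding": encoding,  # stored next to the payload
            "payload_b64": base64.b64encode(raw).decode("ascii"),
            "readable": raw.decode(encoding),  # Unicode copy for people to read
        },
        ensure_ascii=False,
    )


def restore(entry):
    """Recover the original bit stream from a stored audit entry."""
    document = json.loads(entry)
    return base64.b64decode(document["payload_b64"])


original = b"00042" + "山田".encode("shift_jis")  # stand-in for CCSID 1399 data
entry = audit_entry(original, "shift_jis")

print(json.loads(entry)["readable"])  # 00042山田
print(restore(entry) == original)     # True: the stored bytes round-trip
```

The readable copy serves the audit and exception-logging screens, while the stored bytes and code page allow the message to be re-parsed exactly as it arrived.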
