zpat
Posted: Tue Dec 22, 2015 11:06 pm
Jedi Council
Joined: 19 May 2001 Posts: 5867 Location: UK
maurito wrote:
> zpat wrote:
> > When I say character, I mean each byte.
> > I assume the OP knows what he is looking for?
> So let's suppose the input contains the euro currency sign, which in UTF-8 is x'e282ac'. What are you proposing to do by looking at the bytes individually?
> x'e2' on its own is "a" with a circumflex accent (â) in ISO-8859-1.
Me? I am not proposing to do anything.
But don't be pedantic: when I say "at the character level", of course you can look for a string of 3 bytes.
_________________
Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.
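To make the byte-versus-character point concrete, here is a minimal Python sketch (Python is used only as an illustration, not ESQL; `find_euro` is a hypothetical helper). It looks for the whole 3-byte UTF-8 sequence x'e282ac' in a raw payload and shows that the lead byte x'e2', read alone as ISO-8859-1, is just "â".

```python
# Minimal sketch (illustration only, not ESQL): find a multi-byte
# character by searching for its whole byte sequence.

EURO_UTF8 = "\u20ac".encode("utf-8")  # the euro sign in UTF-8: e2 82 ac


def find_euro(payload: bytes) -> list:
    """Return the byte offsets where the 3-byte UTF-8 euro sign starts."""
    offsets = []
    start = payload.find(EURO_UTF8)
    while start != -1:
        offsets.append(start)
        # resume the search just past the sequence we found
        start = payload.find(EURO_UTF8, start + len(EURO_UTF8))
    return offsets


if __name__ == "__main__":
    payload = "Price: 10\u20ac, fee: 2\u20ac".encode("utf-8")
    print(find_euro(payload))  # byte offsets, not character positions
    # The lead byte alone, read as ISO-8859-1, is just 'â' (U+00E2).
    print(bytes([0xE2]).decode("iso-8859-1"))
```

The offsets returned are byte positions: the second euro sign starts at byte 20 of the payload but is character 18 of the decoded text.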
maurito
Posted: Wed Dec 23, 2015 1:25 am
Partisan
Joined: 17 Apr 2014 Posts: 358
zpat wrote:
> maurito wrote:
> > zpat wrote:
> > > When I say character, I mean each byte.
> > > I assume the OP knows what he is looking for?
> > So let's suppose the input contains the euro currency sign, which in UTF-8 is x'e282ac'. What are you proposing to do by looking at the bytes individually?
> > x'e2' on its own is "a" with a circumflex accent (â) in ISO-8859-1.
> Me? I am not proposing to do anything.
> But don't be pedantic: when I say "at the character level", of course you can look for a string of 3 bytes.
I am not being pedantic. I just want you to decide which one you mean:
when you say character you mean byte, yet then you want to look at a string of 3 bytes (maybe a character?).
Why a string of 3 bytes? Why not 2 bytes if the input is in UTF-16? How would you know? Why not a single byte?
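The question of "how many bytes?" has no single answer without knowing the encoding. A minimal Python sketch (illustration only; the codec names are Python's, and `encode_all` is a hypothetical helper) encodes the same euro sign several ways. `utf-16-be` is used so that no byte-order mark is added.

```python
# Sketch: the byte length of one character depends on the encoding.
# Codec names are Python's; "utf-16-be" is used so no byte-order mark is added.

EURO = "\u20ac"
ENCODINGS = ["utf-8", "utf-16-be", "iso8859_15", "cp1252"]


def encode_all(text: str) -> dict:
    """Map each encoding name to the bytes `text` becomes in that encoding."""
    return {enc: text.encode(enc) for enc in ENCODINGS}


if __name__ == "__main__":
    for enc, data in encode_all(EURO).items():
        print(f"{enc:<11} {len(data)} byte(s): {data.hex()}")

    # ISO-8859-1 has no euro sign at all, so encoding fails outright.
    try:
        EURO.encode("iso-8859-1")
    except UnicodeEncodeError as exc:
        print("iso-8859-1 cannot encode it:", exc.reason)
```

The same character is 3 bytes in UTF-8, 2 in UTF-16, and 1 in ISO-8859-15 (x'a4') or Windows-1252 (x'80'). Conversely, x'a4' read as ISO-8859-1 is the generic currency sign "¤", so identical bytes mean different characters unless the encoding is known.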
zpat
Posted: Wed Dec 23, 2015 5:02 am
Jedi Council
Joined: 19 May 2001 Posts: 5867 Location: UK
UTF multi-byte characters can be detected by their bit patterns. That's how any other software detects them!
Stop asking me what the OP wants to do. I am just pointing out that ESQL can handle any string manipulation desired.
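For UTF-8 specifically, the bit patterns in question are in the high-order bits of each byte: 0xxxxxxx is a 1-byte character, 110xxxxx / 1110xxxx / 11110xxx start 2-, 3- and 4-byte sequences, and 10xxxxxx marks a continuation byte. A minimal Python sketch of that classification (illustration only; the function names are hypothetical, and overlong forms or code points above U+10FFFF are deliberately not rejected, as a real validator would do). UTF-16 uses a different scheme (surrogate pairs in the range D800-DFFF) that this sketch does not cover.

```python
# Sketch: classify UTF-8 bytes by their high-order bit patterns.
# Simplification: overlong forms and code points above U+10FFFF are not rejected.


def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with `lead` spans,
    or 0 if `lead` cannot start a sequence (e.g. a 10xxxxxx continuation byte)."""
    if (lead & 0b1000_0000) == 0b0000_0000:  # 0xxxxxxx: ASCII, 1 byte
        return 1
    if (lead & 0b1110_0000) == 0b1100_0000:  # 110xxxxx: 2-byte sequence
        return 2
    if (lead & 0b1111_0000) == 0b1110_0000:  # 1110xxxx: 3-byte sequence
        return 3
    if (lead & 0b1111_1000) == 0b1111_0000:  # 11110xxx: 4-byte sequence
        return 4
    return 0  # 10xxxxxx continuation byte, or 11111xxx (never valid)


def is_continuation(b: int) -> bool:
    """True for a 10xxxxxx byte."""
    return (b & 0b1100_0000) == 0b1000_0000


def split_utf8(data: bytes) -> list:
    """Split UTF-8 bytes into one chunk per character using only bit patterns."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        tail = data[i + 1:i + n]
        # reject a bad lead byte, a truncated sequence, or a non-10xxxxxx tail
        if n == 0 or len(tail) != n - 1 or not all(is_continuation(b) for b in tail):
            raise ValueError(f"malformed UTF-8 at byte offset {i}")
        chunks.append(data[i:i + n])
        i += n
    return chunks


if __name__ == "__main__":
    print(split_utf8("a\u20acb".encode("utf-8")))
```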
maurito
Posted: Wed Dec 23, 2015 5:23 am
Partisan
Joined: 17 Apr 2014 Posts: 358
zpat wrote:
> UTF multi-byte characters can be detected by their bit patterns. That's how any other software detects them!
> Stop asking me what the OP wants to do. I am just pointing out that ESQL can handle any string manipulation desired.
The only thing I am asking is that you learn the difference between a character and a byte, since you seem to use the two terms very loosely, as if they were interchangeable.
zpat
Posted: Wed Dec 23, 2015 6:15 am
Jedi Council
Joined: 19 May 2001 Posts: 5867 Location: UK
If you ask for the 3rd character of a string, you will get the 3rd byte in most programming languages.
The terms are often used interchangeably, but in pedantic terms you are right.
mqjeff
Posted: Wed Dec 23, 2015 6:29 am
Grand Master
Joined: 25 Jun 2008 Posts: 17447
zpat wrote:
> If you ask for the 3rd character of a string, you will get the 3rd byte in most programming languages.
> The terms are often used interchangeably, but in pedantic terms you are right.
If you're asking for the 3rd character of a normal (single-byte) string, yes.
If you're asking for the 3rd character of a Unicode string, you shouldn't get the 3rd byte, which I think is maurito's point: if you access the 3rd byte of a Unicode string, you probably aren't getting a character.
_________________
chmod -R ugo-wx /
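The difference is easy to see in a language that keeps text and raw bytes as separate types. A minimal Python sketch (illustration only; the variable names are hypothetical) indexes the same text as characters and as bytes:

```python
# Sketch: the same index means different things for text and for raw bytes.

text = "a\u20acbc"              # 4 characters: a, euro sign, b, c
utf8 = text.encode("utf-8")     # 6 bytes: 61 e2 82 ac 62 63

third_char = text[2]            # str indexing counts characters
third_byte = utf8[2]            # bytes indexing counts bytes (returns an int)

print(len(text), len(utf8))     # 4 6
print(repr(third_char))         # 'b'
print(hex(third_byte))          # 0x82: a UTF-8 continuation byte, not a character

single = text.encode("cp1252")  # 4 bytes: one byte per character
print(single[2] == ord(third_char))  # True: in a single-byte encoding they coincide
```

In a single-byte encoding such as Windows-1252 the 3rd byte and the 3rd character line up; in UTF-8 the 3rd byte here lands in the middle of the euro sign.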
smdavies99
Posted: Wed Dec 23, 2015 11:36 am
Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
The original question has got me thinking.
As someone who has to deal with Arabic on a daily basis, I got to wondering whether there was an easy way to detect the presence of Arabic in a message.
So, assuming the original message was delivered in a CCSID that can handle both English and Arabic:
1) Create an exception handler.
2) Parse the message using a CCSID that can't handle anything beyond English and the other Western European languages, say ISO-8859-1 or ISO-8859-15.
3) If the message parses correctly, no further action is needed, because there are no Arabic/Chinese/Korean/Thai/etc. characters in it.
4) If the parse fails, look at the individual fields for Arabic characters.
This avoids examining every element of every message received; only messages that fail the parse need a field-by-field check.
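The steps above can be sketched in Python (illustration only, not ESQL or a broker flow). Assumptions, none from the post: the message has already been parsed into a dict of field name to text, `fields_with_arabic` and its helpers are hypothetical, and "Arabic" means the Unicode ranges listed. One detail worth noting: in Python, decoding bytes as ISO-8859-1 never fails, because all 256 byte values are defined, so this sketch performs the check in the other direction (converting text to ISO-8859-1 bytes), where characters outside Latin-1 do raise an error. How a particular parser or CCSID conversion reports such a failure is not shown here.

```python
# Sketch of the "cheap whole-message check first, field scan only on failure"
# idea. Assumptions (not from the post): the message has already been parsed
# into a dict of field name -> text, and "Arabic" means the Unicode ranges below.

ARABIC_RANGES = [
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFC),  # Arabic Presentation Forms-B (stops before U+FEFF, the BOM)
]


def fits_latin1(text: str) -> bool:
    """Steps 2-3: True if every character can be represented in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


def is_arabic(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_RANGES)


def fields_with_arabic(message: dict) -> list:
    """Step 4: names of fields containing Arabic, scanned only when needed."""
    if fits_latin1("".join(message.values())):
        return []  # whole message is Latin-1, so it cannot contain Arabic
    return [name for name, value in message.items()
            if any(is_arabic(ch) for ch in value)]


if __name__ == "__main__":
    print(fields_with_arabic({"name": "Ali", "city": "Dubai"}))
    print(fields_with_arabic({"name": "\u0639\u0644\u064a", "city": "Dubai"}))
```

A message that fails the Latin-1 check because of, say, Chinese text still returns no Arabic fields, which matches step 4: the failure only tells you that something outside Latin-1 is present, and the per-field scan decides whether it is Arabic.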
Just my 2p/2c worth.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.