| Author | 
		  Message
		 | 
		
		  | Testo | 
		  
		    
			  
				 Posted: Tue Feb 01, 2005 12:13 pm    Post subject: SOLVED - XML message: UTF-8 and invalid character | 
				     | 
			   
			 
		   | 
		
		
		    Centurion
 
 Joined: 26 Feb 2003 Posts: 120 Location: Italy - Milan 
  | 
		  
		    
			  
I'm receiving a large XML file (2.3 MB) and I have to wrap it in a SOAP envelope to be passed to a .NET web service.
 
 
The message I am receiving is declared as UTF-8, but it appears to contain some UTF-16 characters.
 
 
The .NET environment does not seem to mind much, but the WBIMB CSD 4 parser does, so I'm wondering what the best approach is:
 
 
- Transform it into a UTF-16 message with the XML Transformation node? This would probably double its size...
 
 
- Parse it as a BLOB, then search within it with SUBSTRING? (I just need to find a certain tag to get an ID that is passed to the web service together with the whole message.)
 
 
- Escape it? I would not know how, actually...
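
The BLOB option in the list above can be sketched outside the broker. This is a minimal Python illustration rather than ESQL; the element name `OrderId`, the sample message, and the helper `extract_tag_text` are all hypothetical, and the sketch assumes the tag carries no attributes or namespace prefix and that the payload uses an ASCII-compatible encoding.

```python
from typing import Optional


def extract_tag_text(payload: bytes, tag: str) -> Optional[bytes]:
    """Return the raw bytes between <tag> and </tag>, or None if absent.

    Works on undecoded bytes, so an invalid character elsewhere in the
    payload does not stop the search. Assumes the tag has no attributes
    and no namespace prefix (a simplification for this sketch).
    """
    open_tag = f"<{tag}>".encode("ascii")
    close_tag = f"</{tag}>".encode("ascii")
    start = payload.find(open_tag)
    if start == -1:
        return None
    start += len(open_tag)
    end = payload.find(close_tag, start)
    if end == -1:
        return None
    return payload[start:end]


# Hypothetical message: the stray 0xA3 byte is not valid UTF-8,
# but the byte-level search never has to decode it.
message = b"<Order><OrderId>A-1001</OrderId><Note>\xa3 5</Note></Order>"
order_id = extract_tag_text(message, "OrderId")
```

Because the search never decodes the payload, it sidesteps the parser error, but it also skips every well-formedness check, so it only makes sense when the ID tag is known to be simple and unique.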
 
 
Any architectural hint would be much appreciated.
 
 
Thanks in advance,
 
Andrea Tedone
 
IBM IT Specialist
  Last edited by Testo on Wed Feb 02, 2005 8:06 am; edited 1 time in total | 
			   
			 
		   | 
		
		
		
		
		  | JLRowe | 
		  
		    
			  
				 Posted: Tue Feb 01, 2005 2:44 pm    Post subject:  | 
				     | 
			   
			 
		   | 
		
		
		    Yatiri
 
 Joined: 25 May 2002 Posts: 664 Location: South East London 
  | 
		  
		    
			  
| Please detail the problem the parser is having with UTF-16 documents. How is the encoding specified in the XML declaration? | 
			   
			 
		   | 
		
		
		
		
		  | Testo | 
		  
		    
			  
				 Posted: Tue Feb 01, 2005 2:49 pm    Post subject: details | 
				     | 
			   
			 
		   | 
		
		
  | 
		  
		    
			  
				Hi timna.
 
 
The broker cannot parse the XML message because it contains an invalid character (e.g. '£'). The XML declaration specifies UTF-8, even though the '£' appears to be encoded as a UTF-16 character. As a result, if you save the XML tree to a file and open it with, for instance, Internet Explorer, this character is rendered as a little box because the browser cannot interpret it.
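
For reference, '£' (U+00A3) is representable in UTF-8 as the two bytes C2 A3, so a UTF-8 parser rejects it only when the bytes actually on the wire are something else. One way to check is to look for the first byte sequence that is not valid UTF-8. The sketch below is a generic Python illustration, not something the broker does; the helper name `first_invalid_utf8` and the sample payloads are hypothetical.

```python
from typing import Optional, Tuple


def first_invalid_utf8(payload: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (offset, offending bytes) for the first sequence that is not
    valid UTF-8, or None if the whole payload decodes cleanly."""
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, payload[err.start:err.end]
    return None


pound_utf8 = "\u00a3".encode("utf-8")      # '£' as two bytes: C2 A3
pound_latin1 = "\u00a3".encode("latin-1")  # '£' as one byte: A3

good = b"<Price>" + pound_utf8 + b"5</Price>"
bad = b"<Price>" + pound_latin1 + b"5</Price>"

print(first_invalid_utf8(good))
print(first_invalid_utf8(bad))
```

Running the same kind of check on a dump of the message before and after each hop shows at which point the bytes stop matching the declared encoding.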
 
 
Cheers,
 
Andrea | 
			   
			 
		   | 
		
		
		
		
		  | JLRowe | 
		  
		    
			  
				 Posted: Wed Feb 02, 2005 4:31 am    Post subject:  | 
				     | 
			   
			 
		   | 
		
		
  | 
		  
		    
			  
| Is the declaration right or wrong, then? If it says UTF-8 and there is UTF-16 encoding within the document, then perhaps it is wrong. | 
			   
			 
		   | 
		
		
		
		
		  | Testo | 
		  
		    
			  
				 Posted: Wed Feb 02, 2005 7:16 am    Post subject: Additional info | 
				     | 
			   
			 
		   | 
		
		
  | 
		  
		    
			  
				Some additional info.
 
 
The invalid character in the XML message seems to be introduced by WBIMB itself.
 
 
Let me explain the scenario better: message flow A calls a .NET web service and receives a 2.3 MB response, encoded in UTF-8 and containing only valid characters.
 
 
This message is then put on a queue and picked up by message flow B, which wraps it in a SOAP envelope and calls another .NET web service.
 
 
While moving this large XMLNS message from one flow to the other, the WBIMB parser seems not to respect the UTF-8 encoding and ends up modifying a single character.
 
 
Has anyone had a similar experience?
 
 
Cheers,
 
Andrea | 
			   
			 
		   | 
		
		
		
		
		  | jefflowrey | 
		  
		    
			  
				 Posted: Wed Feb 02, 2005 7:57 am    Post subject:  | 
				     | 
			   
			 
		   | 
		
		
		   Grand Poobah
 
 Joined: 16 Oct 2002 Posts: 19981
  
  | 
		  
		    
			  
				I would look at how message flow A is writing the data to the queue, and how message flow B is reading the data from the queue. _________________ I am *not* the model of the modern major general. | 
			   
			 
		   | 
		
		
		
		
		  | Testo | 
		  
		    
			  
				 Posted: Wed Feb 02, 2005 8:06 am    Post subject: SOLVED! | 
				     | 
			   
			 
		   | 
		
		
  | 
		  
		    
			  
				It was simply a question of CCSID. 
 
 
As the XML response from the .NET WS is UTF-8, we forced the CCSID to 1208 (UTF-8) instead of 437, the default on our Windows 2003 Server.
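
The effect of the wrong CCSID can be reproduced in a few lines. This is a sketch in Python, not broker code: the `CCSID_TO_CODEC` table, the `read_message` helper, and the sample body are assumptions made for illustration, covering only the two CCSIDs mentioned above (1208 for UTF-8 and 437 for the PC code page).

```python
# Assumed mapping from the two CCSIDs named in the post to Python codec names.
CCSID_TO_CODEC = {1208: "utf-8", 437: "cp437"}


def read_message(body: bytes, ccsid: int) -> str:
    """Decode a message body the way a reader trusting the CCSID label would."""
    return body.decode(CCSID_TO_CODEC[ccsid])


# Hypothetical flow A output: the web-service response really is UTF-8.
body = "<Price>\u00a35</Price>".encode("utf-8")

mislabelled = read_message(body, 437)   # label claims code page 437
correct = read_message(body, 1208)      # label claims UTF-8

print(mislabelled)
print(correct)
```

The ASCII markup survives under either label, which is why only a single non-ASCII character looked wrong in a 2.3 MB message: the two UTF-8 bytes of '£' are read as two unrelated code page 437 characters.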
 
 
Thanks to a couple of my colleagues who put me on the right track...
 
 
Cheers
 
Andrea | 
			   
			 
		   | 
		
		
		
		
		  | kirani | 
		  
		    
			  
				 Posted: Fri Feb 04, 2005 12:09 am    Post subject:  | 
				     | 
			   
			 
		   | 
		
		
		   Jedi Knight
 
 Joined: 05 Sep 2001 Posts: 3779 Location: Torrance, CA, USA 
  | 
		  
		    
			  
We had similar problems when working with .NET and WBIMB. The solution was similar to yours, Andrea. _________________ Kiran
 
 
 
IBM Cert. Solution Designer & System Administrator - WBIMB V5
 
IBM Cert. Solutions Expert - WMQI
 
IBM Cert. Specialist - WMQI, MQSeries
 
IBM Cert. Developer - MQSeries
 
 
 | 
			   
			 
		   | 
		
		
		
		
		  | Testo | 
		  
		    
			  
				 Posted: Fri Feb 04, 2005 12:23 am    Post subject: Hope | 
				     | 
			   
			 
		   | 
		
		
  | 
		  
		    
			  
Kiran, I hope you didn't spend a whole working day solving it, as we did!
 
 
Cheers,
 
Andrea | 
			   
			 
		   | 
		
		
		
		
		  | kirani | 
		  
		    
			  
				 Posted: Fri Feb 04, 2005 12:25 am    Post subject:  | 
				     | 
			   
			 
		   | 
		
		
  | 
		  
		    
			  
Yeah, but the folks over here spent their time blaming MQ. _________________ Kiran
 
 
 
 
 
 | 
			   
			 
		   | 
		
		
		
		
		  | martinrydman | 
		  
		    
			  
				 Posted: Fri Feb 04, 2005 1:11 am    Post subject:  | 
				     | 
			   
			 
		   | 
		
		
		    Centurion
 
 Joined: 30 Jan 2004 Posts: 139 Location: Gothenburg, Sweden 
  | 
		  
		    
			  
				Hi,
 
 
I'm just glad to hear that even grand masters struggle with these darn code page issues. No matter how long I do this work, I feel like I'll never stop tripping over one CCSID issue or another.
 
 
Why can't everybody use Swedish?   
 
 
/Martin | 
			   
			 
		   | 
		
		
		
		
		  | Testo | 
		  
		    
			  
				 Posted: Fri Feb 04, 2005 1:22 am    Post subject: Common problems over IT generations... | 
				     | 
			   
			 
		   | 
		
		
  | 
		  
		    
			  
Carlo Randone, the IBM IT Architect (.NET/WS/interoperability guru) working with me on the project, says that once he retires he will write a book about the common problems that affect the IT population over and over again: date fields, encoding, and CCSID issues.
 
 
    
 
 
Cheers,
 
Andrea | 
			   
			 
		   | 
		
		