| Author | Message | 
		
		  | Avocado's number | 
			  
				|  Posted: Fri Apr 12, 2013 4:00 am    Post subject: Issue in parsing characters such as °, ³ (superscript) |   |  | 
		
		  | Newbie
 
 
 Joined: 12 Apr 2013Posts: 4
 
 
 | 
			  
				| I am using a CSV file as an input to a FileInput node in my flow. The records present in the file are inserted into the database (DB2 9.7). 
 One of the records present in the CSV carries an entry °C. Initially, I used the MRM parser at the FileInput node and the record inserted into the database looked like -C without the °. Later, I switched to the DFDL parser for the same file and used a FileOutput node to verify the output. This time, the output generated was °C, however, an insert into the database changed to C.
 Is there a way to represent special characters such as ° or ³, as is, in the database via a message flow?
 Thanks for your help!
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | McueMart | 
			  
				|  Posted: Fri Apr 12, 2013 4:28 am    Post subject: |   |  | 
		
		  |  Chevalier
 
 
 Joined: 29 Nov 2011Posts: 490
 Location: UK...somewhere
 
 | 
			  
				| Sounds like your database can't deal with the  ° character to me! What Character set do you have set for the field which you are storing the data in? |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | fjb_saper | 
			  
				|  Posted: Fri Apr 12, 2013 4:30 am    Post subject: |   |  | 
		
		  |  Grand High Poobah
 
 
 Joined: 18 Nov 2003Posts: 20767
 Location: LI,NY
 
 | 
			  
				| you'd have to store the message in UTF-8 to be sure that all special characters are conserved... 
 Have fun
  _________________
 MQ & Broker admin
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | rekarm01 | 
			  
				|  Posted: Sun Apr 14, 2013 11:26 am    Post subject: Re: Issue in parsing characters such as °, ³ (superscript) |   |  | 
		
		  | Grand Master
 
 
 Joined: 25 Jun 2008Posts: 1415
 
 
 | 
			  
				| 
   
	| Avocado's number wrote: |  
	| Is there a way to represent special characters such as ° or ³, as is, in the database via a message flow? |  That depends on the DB2 code set used when creating the database, and the database column type.  Check the DB2 documentation for more information.
 
 
 
   
	| fjb_saper wrote: |  
	| you'd have to store the message in UTF-8 to be sure that all special characters are conserved |  It's true that UTF-8 will preserve non-ASCII characters, but DB2 string functions use byte indexing, not character indexing, which can affect position and length calculations in certain circumstances.  The InfoCenter has more information, here and here.
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | fjb_saper | 
			  
				|  Posted: Sun Apr 14, 2013 8:56 pm    Post subject: |   |  | 
		
		  |  Grand High Poobah
 
 
 Joined: 18 Nov 2003Posts: 20767
 Location: LI,NY
 
 | 
			  
				| I was being purposefully being generic there. In the particular case of DB2 you can store the text as BLOB in UTF-8. This way no risk to do a positional byte instead of character as you'd have to import to WMB before going positional on it (DFDL?) 
 Have fun
  _________________
 MQ & Broker admin
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | Avocado's number | 
			  
				|  Posted: Sun Apr 14, 2013 10:25 pm    Post subject: |   |  | 
		
		  | Newbie
 
 
 Joined: 12 Apr 2013Posts: 4
 
 
 | 
			  
				| The database was using a code page of 1252. This has been changed to 1208 and now the representation of °C has become ┬░C. Also, the data type of the column in DB2 is CHARACTER. Any pointers, please? |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | smdavies99 | 
			  
				|  Posted: Sun Apr 14, 2013 11:31 pm    Post subject: |   |  | 
		
		  |  Jedi Council
 
 
 Joined: 10 Feb 2003Posts: 6076
 Location: Somewhere over the Rainbow this side of Never-never land.
 
 | 
			  
				| Did you delete the row and re-insert it with the new DB CCSID? Just changing the DB CCSID won't fix the problem. (AFAIK)
 _________________
 WMQ User since 1999
 MQSI/WBI/WMB/'Thingy' User since 2002
 Linux user since 1995
 
 Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | Avocado's number | 
			  
				|  Posted: Mon Apr 15, 2013 1:34 am    Post subject: |   |  | 
		
		  | Newbie
 
 
 Joined: 12 Apr 2013Posts: 4
 
 
 | 
			  
				| I dropped and re-created the database to change the codepage, then created the tables and tried the insert via my flow. 
 Here is a summary of what I have tried so far-
 1. Parsed the input file correctly against the DFDL test client parser.
 2. Attached trace node after the FileInput node- Does not show the °C character correctly.
 3. Flow wired to a FileOutput node- The output file has the correct character for °C.
 4. Flow wired to a Compute node for database insert- Does not show the °C character correctly.
 5. Changed the database codepage to 1208- Does not show the °C character correctly.
 6. Changed ODBC settings to configure 'ConnectCodepage' for DSN to use 1208- Does not show the °C character correctly.
 7. A direct insert of values into DB2 from control centre- Value is represented correctly in database.
 8. Hard- coded '°C' against the column value in insert statement in ESQL- Value is represented correctly in database.
 9. Passed '°C' in a character variable to the insert statement in ESQL- Value is represented correctly in database.
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | kimbert | 
			  
				|  Posted: Mon Apr 15, 2013 2:20 am    Post subject: |   |  | 
		
		  |  Jedi Council
 
 
 Joined: 29 Jul 2003Posts: 5543
 Location: Southampton
 
 | 
			  
				| 
  There's your problem. It looks as if the DFDL parser has been told to use the wrong text encoding when parsing this field. 
	| Quote: |  
	| 1. Parsed the input file correctly against the DFDL test client parser. 2. Attached trace node after the FileInput node- Does not show the °C character correctly.
 |  That's one theory, anyway. A debug-level user trace should reveal what the DFDL parser is doing.
 The most important question at this point is: what is the text encoding of the input file?
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | Avocado's number | 
			  
				|  Posted: Mon Apr 15, 2013 3:58 am    Post subject: |   |  | 
		
		  | Newbie
 
 
 Joined: 12 Apr 2013Posts: 4
 
 
 | 
			  
				| The text encoding for the input file is UTF-8. However, the text encoding for this particular field was set to <dynamically set>. I changed this to UTF-8 and I am able to receive a correct representation of all special characters now (both in the flow and in the database). Thanks much for your inputs.
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  | kimbert | 
			  
				|  Posted: Mon Apr 15, 2013 4:55 am    Post subject: |   |  | 
		
		  |  Jedi Council
 
 
 Joined: 29 Jul 2003Posts: 5543
 Location: Southampton
 
 | 
			  
				| I'm glad you got it working. 
 'Dynamically set' causes the DFDL property 'encoding' to be set to '$dfdl:encoding'. That is a reference to a predefined variable. In WMB, the predefined variable $dfdl:encoding is set to the 'transport encoding'. In MQ that means the CCSID from the MQMD header. For a FileInput node it is probably the platform default encoding.
 |  | 
		
		  | Back to top |  | 
		
		  |  | 
		
		  |  |