		
yaakovd
Posted: Thu Feb 11, 2010 9:16 am    Post subject: Urgent! Very big XML processing
Partisan
Joined: 20 Jan 2003    Posts: 319
Location: Israel
			  
Hi,
I have a few scenarios that require processing or generating huge XML files (300-700 MB). In my experience, even a 4 MB XML message requires a huge memory allocation in Message Broker (MB).

I would appreciate best practices and patterns for handling:

1. Reading huge XML (in portions?)
2. Generating big XML (e.g. from a flat file)
3. Sorting within the XML or the generated output

Additional fact: the client is Windows-oriented and prefers the Starter Edition (limited to 2 CPUs and a single execution group).
 _________________
 Best regards.
 Yaakov
 SWG, IBM Commerce, Israel
		
Gaya3
Posted: Thu Feb 11, 2010 9:28 am    Post subject: Re: Urgent! Very big XML processing
Jedi
Joined: 12 Sep 2006    Posts: 2493
Location: Boston, US
			  
yaakovd wrote:
"1. Reading huge XML (in portions?)"

Reading the XML in portions, or splitting it into a number of smaller portions, is possible, but first we have to understand the business data the XML carries.

For example, if a single XML contains a number of records, we could think of dividing it into those records.
 _________________
 Regards
 Gayathri
 -----------------------------------------------
 Do Something Before you Die
		
Vitor
Posted: Thu Feb 11, 2010 10:28 am    Post subject: Re: Urgent! Very big XML processing
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093
Location: Texas, USA
			  
yaakovd wrote:
"I have a few scenarios that require processing or generating huge XML files (300-700 MB). In my experience, even a 4 MB XML message requires a huge memory allocation in Message Broker (MB)."

This has been discussed a few times here (The Search Facility Is Your Friend), and there's a developerWorks article somewhere that covers it.

In summary: make sure parsing is set to on demand, don't use [index] to access the XML (which you shouldn't really be doing anyway), and prune the tree once you've processed a given section.
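Outside the broker, the same three ideas can be sketched in Python as a rough analogy (this is not ESQL or a Message Broker API). The sketch assumes a hypothetical layout where the root holds a flat run of `<record>` elements, each with an `<amount>` child, and the `total_amounts` helper is illustrative only. `xml.etree.ElementTree.iterparse` builds the tree incrementally as the input is read, records are visited in document order and looked up by name rather than by position, and `root.clear()` prunes each record once it has been processed.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Tuple


def total_amounts(source: BinaryIO, record_tag: str = "record") -> Tuple[int, int, int]:
    """Stream a document whose root holds a flat run of <record> elements.

    Returns (record_count, total_of_amounts, peak_unpruned_records).
    The layout and element names are hypothetical.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    count = total = peak = 0
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            peak = max(peak, len(root))  # records parsed so far but not yet pruned
            amount = elem.findtext("amount")  # look up by name, not by position
            total += int(amount) if amount else 0
            count += 1
            root.clear()  # prune: drop every record processed so far
    return count, total, peak


if __name__ == "__main__":
    sample = b"<root>" + b"<record><amount>2</amount></record>" * 3 + b"</root>"
    print(total_amounts(io.BytesIO(sample)))
```

The returned `peak` is there only to show the effect of pruning: the number of unpruned records stays bounded by how much input the parser consumes at a time, rather than growing with the file size.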
 
 Have fun.
  _________________
 Honesty is the best policy.
 Insanity is the best defence.
		
Vitor
Posted: Thu Feb 11, 2010 10:30 am    Post subject: Re: Urgent! Very big XML processing
			  
Gaya3 wrote:
"For example, if a single XML contains a number of records, we could think of dividing it into those records."

You'd still need to bring the entire message in so that you could PROPAGATE the individual records. But yes, this is a good way of handling the situation if there's no affinity between XML stanzas, and it doesn't contradict what I said above (in this example you'd remove each record once it had been propagated).
		
kimbert
Posted: Thu Feb 11, 2010 1:45 pm
Jedi Council
Joined: 29 Jul 2003    Posts: 5543
Location: Southampton
		
		
Vitor
Posted: Thu Feb 11, 2010 1:54 pm
			  
kimbert wrote:
"http://www-128.ibm.com/developerworks/websphere/library/techarticles/0505_storey/0505_storey.html"

This time I must remember to bookmark this!
		
yaakovd
Posted: Thu Feb 11, 2010 3:50 pm
			  
Hi all,
thanks for the replies and for the basics of working with the message tree.
It really helps with 5 MB messages.

Of course, I tried the search first to find something helpful.

My question is whether anybody has experience working with 500 MB messages.

Any idea how long it might take on a 2 CPU / 8 GB Windows machine, if it works at all?
I am also considering a SAX-based input plugin...
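The SAX idea can be illustrated with Python's standard `xml.sax` module; this is an analogy for the event-driven style, not a Message Broker input plugin or any broker API. The handler reacts to start, end and character events and keeps only the record it is currently reading. The `<record>` element name, the assumption that fields are leaf elements, and the `RecordHandler` class are hypothetical.

```python
import io
import xml.sax
from typing import Callable, Dict, List


class RecordHandler(xml.sax.ContentHandler):
    """Hands each <record> to a callback as a dict, then forgets it.

    Assumes (hypothetically) that each record's fields are leaf elements.
    """

    def __init__(self, on_record: Callable[[Dict[str, str]], None]) -> None:
        super().__init__()
        self.on_record = on_record
        self.in_record = False
        self.fields: Dict[str, str] = {}
        self.current = ""            # name of the field element being read
        self.buffer: List[str] = []  # text for that field, possibly in pieces

    def startElement(self, name, attrs):
        if name == "record":
            self.in_record = True
            self.fields = dict(attrs.items())  # keep record attributes such as id
        elif self.in_record:
            self.current = name
            self.buffer = []

    def characters(self, content):
        if self.current:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "record":
            self.on_record(self.fields)  # the only record held in memory
            self.fields = {}
            self.in_record = False
        elif name == self.current:
            self.fields[name] = "".join(self.buffer)
            self.current = ""


if __name__ == "__main__":
    sample = b"<root><record id='1'><amount>5</amount></record></root>"
    xml.sax.parse(io.BytesIO(sample), RecordHandler(print))
```

Because nothing but the current record is retained, memory use does not depend on the document size; the trade-off is that any cross-record logic (sorting, lookups across records) has to be built on top of the callback.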
		
fjb_saper
Posted: Thu Feb 11, 2010 7:16 pm
Grand High Poobah
Joined: 18 Nov 2003    Posts: 20767
Location: LI,NY
			  
yaakovd wrote:
"My question is whether anybody has experience working with 500 MB messages.
Any idea how long it might take on a 2 CPU / 8 GB Windows machine, if it works at all?"

In my experience, a 500 MB message seldom contains a single atomic transaction. Cut your message down to single-atomic-transaction size and put those pieces onto the input queue of the real flow...
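A minimal sketch of that hand-off in Python, purely as an analogy: a bounded `queue.Queue` stands in for the MQ input queue and a thread stands in for the real flow. The `<transaction>` element name, the `split_to_queue` and `real_flow` helpers, and the `None` sentinel are hypothetical. Each transaction leaves the splitter as its own small XML message, so the consumer never has to hold the whole 500 MB document.

```python
import io
import queue
import threading
import xml.etree.ElementTree as ET
from typing import BinaryIO, List


def split_to_queue(source: BinaryIO, out: queue.Queue, txn_tag: str = "transaction") -> int:
    """Put each <transaction> element on `out` as its own small XML message."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    sent = 0
    for event, elem in context:
        if event == "end" and elem.tag == txn_tag:
            elem.tail = None  # don't carry the whitespace after the element along
            out.put(ET.tostring(elem))  # blocks while the bounded queue is full
            sent += 1
            root.clear()  # the big document is never held as a whole tree
    out.put(None)  # sentinel: tells the consumer there is nothing more
    return sent


def real_flow(inbox: queue.Queue, handled: List[str]) -> None:
    """Hypothetical stand-in for the downstream flow: one small message at a time."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        txn = ET.fromstring(msg)
        handled.append(txn.get("id", ""))


if __name__ == "__main__":
    q: queue.Queue = queue.Queue(maxsize=100)  # bounded, like a queue with a max depth
    handled: List[str] = []
    worker = threading.Thread(target=real_flow, args=(q, handled))
    worker.start()
    doc = b"<batch><transaction id='a'/><transaction id='b'/></batch>"
    sent = split_to_queue(io.BytesIO(doc), q)
    worker.join()
    print(sent, handled)
```

Bounding the queue makes the splitter wait whenever the consumer falls behind, so memory stays flat on both sides instead of a backlog of parsed messages piling up.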
 
If you cannot use a FileInput node, do as Jeff and Vitor said (see their link): parse on demand only, use references, and prune each parsed node from the tree after propagation.
 Have fun
  _________________
 MQ & Broker admin
		
jzhang2009
Posted: Fri Feb 12, 2010 5:05 pm    Post subject: re: large XML
Newbie
Joined: 12 Feb 2010    Posts: 1
			  
Have you looked at VTD-XML? It sounds like you definitely want to check it out.
		
		
Amitha
Posted: Sat Feb 13, 2010 6:01 am
Voyager
Joined: 20 Nov 2009    Posts: 80
Location: New York
			  
VTD-XML seems to improve XML parsing performance and memory usage compared with DOM or SAX. I think the WMB XMLNSC parser performs very well, and it is a C++ engine. In my view, the VTD-XML parser is something WESB can make use of, not WMB.
		
		
newtobroker
Posted: Sat Feb 13, 2010 10:23 am
Novice
Joined: 04 Feb 2010    Posts: 23
			  
One option we are trying is to dynamically delete the elements of huge XMLs as we finish processing them... not sure whether it applies to your business requirements.
Thanks,
c*
		