Large volumes in Transmission Queue
MQSeries.net forum, General IBM MQ Support (page 2 of 2)
fjb_saper
Posted: Wed Feb 04, 2009 7:38 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20767
Location: LI,NY

Vitor wrote:
PeterPotkay wrote:
So you are saying that when the new flows are active, dequeue rates across multiple queues on the QM, even unrelated queues, drop? And when the flows are stopped, the rates return to normal?


After an apparently random period of normality, yes. It's a subtle effect as we generate far more audit than we do useful messages, but that seems to be the case. Once the dequeue rate falls on this queue (easily noticed by the rapidly increasing depth), dequeue rates drop across the queue manager.


I'd say that's a symptom of a rapidly filling destination queue. There are some parameters that can be set on the channel to reduce the number of retries and the interval between them, so that the messages go to the DLQ faster.

Check the DLQ on the destination system for messages with reason 2053 (MQRC_Q_FULL). What queue are they for? The audit queue?
_________________
MQ & Broker admin
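To put rough numbers on the retry effect fjb_saper describes, here is a simplified Python model of what message retries cost a channel. It assumes the receiving channel handles messages one at a time, so every retry pause stalls everything queued behind the failing message; the function names and traffic figures are hypothetical. On a real receiver channel the relevant settings are the message-retry count and interval (MRRTY and MRTMR in MQSC); check the documentation for your MQ version before changing them.

```python
"""Rough model of channel throughput when some puts keep failing (e.g. MQRC_Q_FULL).

Assumption: the receiving channel handles messages serially, so time spent
retrying one message is time in which no other message on the channel is delivered.
"""


def stall_seconds(retry_count: int, retry_interval_ms: int) -> float:
    """Time one persistently failing message holds the channel before going to the DLQ."""
    return retry_count * retry_interval_ms / 1000.0


def delivered_per_second(ok_msgs: int, failing_msgs: int, base_cost_s: float,
                         retry_count: int, retry_interval_ms: int) -> float:
    """Average channel throughput (messages/second) for a batch of traffic."""
    busy_s = (ok_msgs + failing_msgs) * base_cost_s
    busy_s += failing_msgs * stall_seconds(retry_count, retry_interval_ms)
    return (ok_msgs + failing_msgs) / busy_s


if __name__ == "__main__":
    # Hypothetical traffic: 1 ms per message, 2 out of every 3 messages hit a full queue.
    for count, interval_ms in [(10, 10_000), (3, 1_000), (0, 0)]:
        rate = delivered_per_second(100, 200, 0.001, count, interval_ms)
        print(f"retries={count:>2} interval={interval_ms:>6} ms -> {rate:10.3f} msg/s")
```

The exact figures matter less than the shape: with ten 10-second retries per failing message, the retry pauses swamp the real work, which is why lowering the retry count or interval gets failing messages to the DLQ faster.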
exerk
Posted: Wed Feb 04, 2009 7:41 am

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

fjb_saper wrote:
...I'd say that's a symptom of a rapidly filling destination queue...


Would that have a 'global' effect on other queues in the queue manager?
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
Vitor
Posted: Wed Feb 04, 2009 7:43 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

fjb_saper wrote:
Check the DLQ on the destination system for messages with reason 2053 (MQRC_Q_FULL). What queue are they for? The audit queue?


DLQ on the destination system is empty, and the audit database (the final resting place for these messages) shows some updates for the times in question.

I repeat, the channel remains running throughout.
_________________
Honesty is the best policy.
Insanity is the best defence.
fjb_saper
Posted: Wed Feb 04, 2009 7:46 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20767
Location: LI,NY

exerk wrote:
fjb_saper wrote:
...I'd say that's a symptom of a rapidly filling destination queue...


Would that have a 'global' effect on other queues in the queue manager?

Imagine you have a flood that needs to go through a funnel.
The funnel examines each message and directs it to its slot.
Now, for 2/3 of the messages coming through the funnel, you need to look at the message, try, pause for 10 seconds, try again, repeat 10 times, and then put it to the DLQ.

What do you think that will do to your throughput on the channel?
All destinations on the remote qmgr will be affected by the one destination.
You can alleviate that somewhat by sending the operational messages at a higher priority than the audit messages. What you really need is to scale the app reading the audit messages off the queue, and get a larger maximum queue depth to accommodate spikes.
_________________
MQ & Broker admin
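The priority suggestion relies on the transmission queue delivering messages by priority (MSGDLVSQ(PRIORITY) in MQSC) rather than FIFO. The toy Python class below only models that ordering, not MQ's implementation: highest priority first, arrival order within the same priority. The class name and message bodies are hypothetical. As fjb_saper says, this only alleviates the problem; if the receiving end is stuck retrying an audit message, everything behind it still waits.

```python
import heapq
import itertools


class ToyPriorityQueue:
    """Model of priority delivery: highest priority first, FIFO within a priority."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order per priority

    def put(self, body: str, priority: int) -> None:
        if not 0 <= priority <= 9:
            raise ValueError("MQ message priorities run from 0 to 9")
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), body))

    def get(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    xmitq = ToyPriorityQueue()
    # Hypothetical traffic: audit at priority 0, operational at priority 5.
    xmitq.put("audit-1", 0)
    xmitq.put("order-1", 5)
    xmitq.put("audit-2", 0)
    xmitq.put("order-2", 5)
    print([xmitq.get() for _ in range(len(xmitq))])
```

The operational messages jump ahead of the audit backlog on the sending side, but the channel still delivers them one at a time behind whatever the receiver is currently retrying.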
PeterPotkay
Posted: Wed Feb 04, 2009 7:51 am

Poobah

Joined: 15 May 2001
Posts: 7723

Vitor wrote:
PeterPotkay wrote:
So you are saying that when the new flows are active, dequeue rates across multiple queues on the QM, even unrelated queues, drop? And when the flows are stopped, the rates return to normal?


After an apparently random period of normality, yes. It's a subtle effect as we generate far more audit than we do useful messages, but that seems to be the case. Once the dequeue rate falls on this queue (easily noticed by the rapidly increasing depth), dequeue rates drop across the queue manager.

Could it be possible that after a period of time the flows have put so much under syncpoint and not committed that you get into a QM rolling back log scenario?

Or after a period of time the new flows decide to do something that is super I/O- or CPU-intensive, starving the server of resources?
_________________
Peter Potkay
Keep Calm and MQ On
fjb_saper
Posted: Wed Feb 04, 2009 7:54 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20767
Location: LI,NY

PeterPotkay wrote:
Could it be possible that after a period of time the flows have put so much under syncpoint and not committed that you get into a QM rolling back log scenario?

Or after a period of time the new flows decide to do something that is super I/O- or CPU-intensive, starving the server of resources?

Uncommitted messages do not fit the scenario: if the messages were uncommitted, they would not be able to remove them using qload. The scenario rather fits a queue-full condition on the remote qmgr.

Server starved of resources is more interesting to pursue...
_________________
MQ & Broker admin
PeterPotkay
Posted: Wed Feb 04, 2009 7:57 am

Poobah

Joined: 15 May 2001
Posts: 7723

fjb_saper wrote:
PeterPotkay wrote:
Could it be possible that after a period of time the flows have put so much under syncpoint and not committed that you get into a QM rolling back log scenario?

Or after a period of time the new flows decide to do something that is super I/O- or CPU-intensive, starving the server of resources?

Uncommitted messages do not fit the scenario: if the messages were uncommitted, they would not be able to remove them using qload. The scenario rather fits a queue-full condition on the remote qmgr.


A queue-full condition on a remote queue would drop the dequeue rate on the one transmission queue, not across the board for multiple queues.

This particular XMITQ doesn't have to be the one that is filling up the logs with uncommitted messages. Having said that, I would think this type of problem would have shown up in the QM logs, and Vitor says there is nothing odd there, so I guess this ain't it.

What if a bad message gets into the flow and the flow starts looping, using all the CPU or I/O? No mention in this thread yet of these stats while the problem is happening.
_________________
Peter Potkay
Keep Calm and MQ On
Vitor
Posted: Wed Feb 04, 2009 8:04 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

PeterPotkay wrote:
What if a bad message gets into the flow and the flow starts looping, using all the CPU or I/O? No mention in this thread yet of these stats while the problem is happening.


We have not ruled out one of the new flows having a bad loop in it, but can't find anything as yet. It's theoretically possible that one is receiving a reply which is making it repeat the question, but that's not easy to determine.

Another thing hard to determine is the utilisation of the box at the times of problem. The best numbers I have are that CPU at server level is around 70% at the time, and I/O does not vary much from the "normal" levels.

I'm trying to obtain something a bit more scientific.
_________________
Honesty is the best policy.
Insanity is the best defence.
Vitor
Posted: Thu Feb 05, 2009 6:23 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

As most experienced hands on the forum could have predicted, there was not a single cause of this, but the outcome (for the record) was thus:

1) Audit records were changed to be written to a local queue and moved on; this helped but not very much.
2) The 7 audit points in the flow were reduced to a more manageable number, reducing the overall number of messages to deal with.
3) A bug was identified while looking into the transactional / non-transactional question: a flow wrote out a request message and used an MQGet node to read the reply. Regrettably, the MQOutput node was in the same UOW as the flow, so there was never a reply because the request was never committed. Hence the flow sat for 15 seconds waiting for the get to expire, then went through some complicated failure processing. A high number of these failures caused the execution group to lock resources, run out of threads and all sorts of bad things, leading to high resource usage on the server.
4) The practice of having a single EG (execution group) holding every single production flow has been called into question.

I thank all concerned for their valuable input.
_________________
Honesty is the best policy.
Insanity is the best defence.
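Point 3 is a classic request/reply trap. The toy Python model below shows why it can never succeed: a message put under syncpoint is invisible to other applications until the unit of work commits, so a reply that depends on it cannot arrive inside the same unit of work. All class and function names are hypothetical, and the responder runs inline rather than as a separate application. The usual way out is to put the request outside the flow's UOW; in Message Broker that is assumed here to be the MQOutput node's transaction mode property, so check the node documentation for your version.

```python
from typing import Optional


class ToyQueue:
    """Queue with just enough syncpoint semantics for the demo."""

    def __init__(self) -> None:
        self.visible: list[str] = []   # committed messages other apps can get
        self.pending: list[str] = []   # puts made under syncpoint, not yet committed

    def put(self, msg: str, syncpoint: bool) -> None:
        (self.pending if syncpoint else self.visible).append(msg)

    def commit(self) -> None:
        self.visible.extend(self.pending)
        self.pending.clear()

    def backout(self) -> None:
        self.pending.clear()

    def get(self) -> Optional[str]:
        # None stands in for "wait interval expired" (MQRC_NO_MSG_AVAILABLE).
        return self.visible.pop(0) if self.visible else None


def run_flow(request_in_flow_uow: bool) -> str:
    """One pass of a hypothetical request/reply flow."""
    requests, replies = ToyQueue(), ToyQueue()

    # MQOutput: send the request, either inside or outside the flow's unit of work.
    requests.put("REQ-1", syncpoint=request_in_flow_uow)

    # Responding application: it can only see committed requests.
    req = requests.get()
    if req is not None:
        replies.put("REPLY-" + req, syncpoint=False)

    # MQGet: wait for the reply (the real flow waited 15 seconds here).
    reply = replies.get()
    if reply is None:
        requests.backout()  # failure path: the flow's UOW, including the request, rolls back
        return "timeout"
    requests.commit()
    return reply


if __name__ == "__main__":
    print("request in flow UOW:     ", run_flow(request_in_flow_uow=True))
    print("request outside flow UOW:", run_flow(request_in_flow_uow=False))
```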
mqjeff
Posted: Thu Feb 05, 2009 6:28 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Vitor wrote:
3) A bug was identified while looking into the transactional / non-transactional question: a flow wrote out a request message and used an MQGet node to read the reply. Regrettably, the MQOutput node was in the same UOW as the flow, so there was never a reply because the request was never committed. Hence the flow sat for 15 seconds waiting for the get to expire, then went through some complicated failure processing. A high number of these failures caused the execution group to lock resources, run out of threads and all sorts of bad things, leading to high resource usage on the server.

This would also lead to reserved space on the MQ transaction logs during the timeout, which when there was enough of it would cause the generation of additional secondary logs if possible - and if using circular logs could cause any transaction the queue manager is participating in to grind to a halt.
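mqjeff's point about reserved log space can be illustrated with a deliberately crude Python model of a circular log. The class name, sizes and transaction names are hypothetical, and the model ignores checkpoints, log extents and everything else the queue manager really does. It keeps only one idea: log records written after the oldest still-open unit of work cannot be reused, so one long-lived UOW can stop otherwise unrelated short transactions from being logged.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCircularLog:
    capacity: int                                            # bytes of log space (primary + secondary)
    head: int = 0                                            # total bytes ever written
    first_lsn: dict[str, int] = field(default_factory=dict)  # open UOW -> where its records start

    def write(self, uow: str, nbytes: int) -> bool:
        """Log a record for uow; return False if the circular log has no reusable space."""
        start = self.first_lsn.get(uow, self.head)
        oldest = min([start, *self.first_lsn.values()])
        if self.head + nbytes - oldest > self.capacity:
            return False  # everything since `oldest` is still needed, so nothing can wrap
        self.first_lsn.setdefault(uow, start)
        self.head += nbytes
        return True

    def end_uow(self, uow: str) -> None:
        """Commit or backout: the UOW no longer pins log space."""
        self.first_lsn.pop(uow, None)


def short_transactions_logged(stuck_uow: bool, attempts: int = 20) -> int:
    """Count how many short, immediately committed transactions fit in the log."""
    log = ToyCircularLog(capacity=1_000)
    if stuck_uow:
        log.write("flow-waiting-for-reply", 100)  # never committed during the demo
    done = 0
    for i in range(attempts):
        if log.write(f"tx{i}", 100):
            log.end_uow(f"tx{i}")
            done += 1
    return done


if __name__ == "__main__":
    print("no stuck UOW:  ", short_transactions_logged(stuck_uow=False))
    print("one stuck UOW: ", short_transactions_logged(stuck_uow=True))
```

In this model a single stuck UOW only pins space until it ends; in the thread, the pinning came from many flow instances each holding an uncommitted request through a 15-second MQGet wait.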
Vitor
Posted: Thu Feb 05, 2009 6:31 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

mqjeff wrote:
and if using circular logs could cause any transaction the queue manager is participating in to grind to a halt.


Leading to the poor dequeue performance we were seeing.
_________________
Honesty is the best policy.
Insanity is the best defence.
PeterPotkay
Posted: Thu Feb 05, 2009 7:01 am

Poobah

Joined: 15 May 2001
Posts: 7723

mqjeff wrote:
Vitor wrote:
3) A bug was identified while looking into the transactional / non-transactional question: a flow wrote out a request message and used an MQGet node to read the reply. Regrettably, the MQOutput node was in the same UOW as the flow, so there was never a reply because the request was never committed. Hence the flow sat for 15 seconds waiting for the get to expire, then went through some complicated failure processing. A high number of these failures caused the execution group to lock resources, run out of threads and all sorts of bad things, leading to high resource usage on the server.

This would also lead to reserved space on the MQ transaction logs during the timeout, which when there was enough of it would cause the generation of additional secondary logs if possible - and if using circular logs could cause any transaction the queue manager is participating in to grind to a halt.

Wouldn't there be corresponding errors in the QM Error logs?
_________________
Peter Potkay
Keep Calm and MQ On
PeterPotkay
Posted: Thu Feb 05, 2009 7:07 am

Poobah

Joined: 15 May 2001
Posts: 7723

Vitor wrote:
4) The practice of having a single EG (execution group) holding every single production flow has been called into question.


Without any other criteria to go by, shoot for 1 EG for every CPU core your server has, and divvy up the flows between them as best you can. I also dedicate one of my EGs to any flows that deal with batch jobs. That way, when a flood of transactions comes through and drives that EG to 100% CPU, it is only driving one of the CPU cores to 100%, leaving the other cores to service the other EGs that are doing more timely non-batch work.

Or, as you have painfully seen, if one EG is housing a bad flow that uses a lot of resources, hopefully the other EGs will have access to the other CPUs and not be impacted.
_________________
Peter Potkay
Keep Calm and MQ On
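PeterPotkay's rule of thumb can be written down as a small Python planner. It is a hypothetical helper, not a broker tool: it reserves one execution group for batch flows, spreads the remaining flows round-robin over the other EGs (about one EG per core overall), and knows nothing about how heavy each flow is, so treat its output as a starting point to adjust by hand. The EG and flow names are made up for illustration.

```python
import os
from typing import Optional


def plan_execution_groups(flows: list[str], batch_flows: list[str],
                          cores: Optional[int] = None) -> dict[str, list[str]]:
    """Assign flow names to EGs: about one EG per core, one of them reserved for batch."""
    cores = cores or os.cpu_count() or 1
    # If batch flows exist, one core's worth of EG goes to them; keep at least one online EG.
    online = max(cores - 1, 1) if batch_flows else cores
    plan: dict[str, list[str]] = {f"EG{n + 1}": [] for n in range(online)}
    for i, flow in enumerate(flows):
        plan[f"EG{i % online + 1}"].append(flow)  # simple round-robin by flow count
    if batch_flows:
        plan["EG_BATCH"] = list(batch_flows)
    return plan


if __name__ == "__main__":
    # Hypothetical flow names on a 4-core server.
    for eg, assigned in plan_execution_groups(
            ["orders", "payments", "audit", "pricing", "inventory"],
            ["nightly_extract"], cores=4).items():
        print(eg, assigned)
```

Round-robin by count is the crudest possible split; weighting flows by observed CPU or message volume would be the obvious refinement once those numbers are available.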
Vitor
Posted: Thu Feb 05, 2009 7:12 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

PeterPotkay wrote:
Or, as you have painfully seen, if one EG is housing a bad flow that uses a lot of resources, hopefully the other EGs will have access to the other CPUs and not be impacted.


The bittersweet part of this is watching the great and the good asking who decided to lump all these flows into the default EG, and getting responses ranging from "it's always been like that" to "I think it was <insert name of long departed employee> who decided that", with all shades in between.

How many times in the average organisation do you find design decisions which were never actually made but were arrived at through inertia?

(This question is intended to be rhetorical. If you actually wish to discuss it, please start a new thread!!! )
_________________
Honesty is the best policy.
Insanity is the best defence.
Copyright © MQSeries.net. All rights reserved.