zpat
Posted: Sun Sep 27, 2020 12:31 am Post subject: How to solve message retry delays impacting others
Jedi Council
Joined: 19 May 2001 Posts: 5859 Location: UK
If you have a channel from one QM to another (whether a standard or cluster sender), it's possible for a full queue at the destination to cause the channel to enter message retry.
The default (before the DLQ is used) is 10 retries at 1-second intervals, causing a 10-second delay before the message goes to the DLQ. This is then repeated for every message intended for the full queue.
It seems that during this retry period nothing else can use the channel, so messages intended for other queues that are not full are also delayed behind the ones in retry.
For high-volume, low-latency messaging applications this is a very serious issue, as thousands of messages each retried for 10 seconds essentially block the channel for long periods, even for the apps whose queues have plenty of space.
I can't see an obvious solution other than turning off message retries entirely, but I am interested in what other people do to avoid this issue. _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.
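For reference, a minimal MQSC sketch of where this retry behaviour lives and how it can be switched off; the channel name is hypothetical, and MRRTY/MRTMR are set on the receiving end of the channel (RCVR, RQSTR or CLUSRCVR):
Code:
* Check the current message-retry settings on the receiving channel
DISPLAY CHANNEL(TO.DEST.QM) MRRTY MRTMR
* The defaults are MRRTY(10) and MRTMR(1000), i.e. the 10 retries at
* 1-second intervals described above before a message goes to the DLQ
* Setting the retry count to zero sends failing messages straight to the DLQ
ALTER CHANNEL(TO.DEST.QM) CHLTYPE(RCVR) MRRTY(0)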
bruce2359
Posted: Sun Sep 27, 2020 4:12 am
Poobah
Joined: 05 Jan 2008 Posts: 9442 Location: US: west coast, almost. Otherwise, enroute.
Yes, turn off retries.
One option: Let the messages destined for the full queue go directly to the DLQ, and let the dead-letter queue handler deal with them out-of-band.
Another option, to handle (avert) queue-full conditions: enable and monitor depth events. When queue depth reaches 80%, increase (alter) maxdepth by 25%.
If the underlying problem is an insufficient number of concurrent consumers, here's an opportunity for TRIGTYPE(EVERY). _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
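A minimal MQSC sketch of the depth-event and trigger-every suggestions above; the queue, process and threshold values are hypothetical, and the monitoring that reacts to the event is assumed to exist separately:
Code:
* Performance events must be switched on at the queue manager
ALTER QMGR PERFMEV(ENABLED)
* Emit a queue depth high event when the queue reaches 80% full
ALTER QLOCAL(APP.DEST.QUEUE) QDEPTHHI(80) QDPHIEV(ENABLED)
* Monitoring reads the event from SYSTEM.ADMIN.PERFM.EVENT and reacts,
* for example by raising MAXDEPTH by 25% (here 100000 -> 125000)
ALTER QLOCAL(APP.DEST.QUEUE) MAXDEPTH(125000)
* If too few consumers is the real problem, trigger a consumer per message
ALTER QLOCAL(APP.DEST.QUEUE) TRIGGER TRIGTYPE(EVERY) PROCESS(APP.CONSUMER.PROC) INITQ(SYSTEM.DEFAULT.INITIATION.QUEUE)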
gbaddeley
Posted: Sun Sep 27, 2020 3:11 pm
Jedi Knight
Joined: 25 Mar 2003 Posts: 2527 Location: Melbourne, Australia
Do everything possible to avoid a queue-full situation. Set the max depth to a very high value. Monitor the queue depth. Monitor the app that is supposed to be consuming the messages. _________________ Glenn
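A short MQSC sketch of the same advice, with a hypothetical queue name:
Code:
* Set MAXDEPTH well above any realistic backlog
ALTER QLOCAL(APP.DEST.QUEUE) MAXDEPTH(5000000)
* CURDEPTH shows the backlog; IPPROCS shows whether any consumer is connected
DISPLAY QSTATUS(APP.DEST.QUEUE) TYPE(QUEUE) CURDEPTH IPPROCS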
PeterPotkay
Posted: Sun Sep 27, 2020 5:26 pm
Poobah
Joined: 15 May 2001 Posts: 7719
Turn off Message Retry. How many times does it actually accomplish its mission? Almost always the reason for the failed PUTs is not going to resolve itself in a few seconds, so why waste time retrying? Sure, once in a blue moon it will pull it off, delivering messages that would otherwise have gone to a dead-letter queue. But it's not worth the risk of impacting other messages on a shared channel.
One place I absolutely use Message Retry is on my Edge Queue Managers that have channels from other companies coming to us. On those dedicated channels between our company and one other company, if there is any funny business occurring that causes my RCVR to send to my DLQ, I want Message Retry kicking in to throttle what's coming across. If my RCVR is sending to the DLQ, something is seriously wrong, maybe even something malicious. I want that RCVR slowing waaaaay down so we have time to react to the alert for the messages arriving on the DLQ. _________________ Peter Potkay
Keep Calm and MQ On
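A minimal MQSC sketch of the two contrasting policies described above; the channel names are hypothetical:
Code:
* Shared internal channel: no message retry, failing messages go straight to the DLQ
ALTER CHANNEL(INTERNAL.TO.APPQM) CHLTYPE(CLUSRCVR) MRRTY(0)
* Dedicated partner-facing receiver: retry slowly (100 retries at 10-second
* intervals) to throttle inbound traffic while the DLQ alert is investigated
ALTER CHANNEL(PARTNER.TO.EDGEQM) CHLTYPE(RCVR) MRRTY(100) MRTMR(10000)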
zpat
Posted: Mon Sep 28, 2020 1:30 am
Jedi Council
Joined: 19 May 2001 Posts: 5859 Location: UK
The destination queue in this case is a z/OS shared queue using a Coupling Facility structure.
These are memory-based and inherently limited in size. We actually hit the CF structure full condition before the max queue depth.
We have QDEPTHHI event alerts at 50%, but it filled very quickly.
So I am going to get the CF structure made bigger, but it's the collateral impact on other applications (using the same channel) that caused most grief.
I agree that message retry rarely has any value; it's one of those MQ defaults that probably belongs in a museum now. (There is a nice museum in Hursley!) _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.
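For the shared-queue case, a hedged sketch (z/OS only, structure name hypothetical) of watching the CF structure itself rather than just queue depth, since the structure can fill before MAXDEPTH is ever reached:
Code:
* Show how full the application structure currently is
DISPLAY CFSTATUS(APPSTRUC) TYPE(SUMMARY)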
bruce2359
Posted: Mon Sep 28, 2020 3:19 am
Poobah
Joined: 05 Jan 2008 Posts: 9442 Location: US: west coast, almost. Otherwise, enroute.
So, the consuming app (apps) can't keep up with message arrival rate. Why? What is the bottleneck? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
zpat
Posted: Mon Sep 28, 2020 3:47 am
Jedi Council
Joined: 19 May 2001 Posts: 5859 Location: UK
Against my advice, the consuming application was not set up with HA or at least some kind of automated restart.
They relied on manual alerting, which failed. However, this is not really the issue, since there could also be planned downtime.
My concern is avoiding impact on unrelated (and often more important) applications when some lesser application fills up its queue. _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.
bruce2359
Posted: Mon Sep 28, 2020 5:15 am
Poobah
Joined: 05 Jan 2008 Posts: 9442 Location: US: west coast, almost. Otherwise, enroute.
Messages of lesser importance should not be sent across the same channel as the important messages.
CF storage is real, not virtual, and therefore limited real estate. Not likely the CF admins will provision much more (way more) structure storage. Does an SMDS data set back up the offending queue? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
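A small MQSC sketch of how to answer that question; the structure name is hypothetical, and SMDS offload requires the structure to be at CFLEVEL(5):
Code:
* Show whether this application structure can offload message data to SMDS
DISPLAY CFSTRUCT(APPSTRUC) CFLEVEL OFFLOAD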
zpat
Posted: Mon Sep 28, 2020 6:13 am
Jedi Council
Joined: 19 May 2001 Posts: 5859 Location: UK
Non-persistent messages don't go to the SMDS. The queue that fills is not that important; it's the other applications that matter more.
It's easy to say "don't use the same channel", but hard to achieve without creating a new cluster.
Even then, there will never be one cluster per queue, so collateral damage is still possible.
Message priority is another option I am considering. _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.
bruce2359
Posted: Mon Sep 28, 2020 9:08 am
Poobah
Joined: 05 Jan 2008 Posts: 9442 Location: US: west coast, almost. Otherwise, enroute.
zpat wrote:
Message priority is another option I am considering.
Where exactly? Sending-side XMITQ? Destination queue? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
zpat
Posted: Mon Sep 28, 2020 11:07 am
Jedi Council
Joined: 19 May 2001 Posts: 5859 Location: UK
MQMD.Priority is set by the original MQPUT, so it would be the sending-side application (possibly using an attribute on a queue alias) that sets a higher priority for more critical applications (or vice versa). _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.
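A minimal MQSC sketch of the queue-alias idea; the object names are hypothetical, and it assumes the application puts with MQPRI_PRIORITY_AS_Q_DEF (Priority = -1) rather than an explicit value:
Code:
* The critical app puts to the alias and its messages inherit DEFPRTY(9);
* less critical apps keep using the base queue and its lower default priority
DEFINE QALIAS(CRITICAL.APP.ALIAS) TARGET(APP.DEST.QUEUE) DEFPRTY(9)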
bruce2359
Posted: Mon Sep 28, 2020 11:36 am
Poobah
Joined: 05 Jan 2008 Posts: 9442 Location: US: west coast, almost. Otherwise, enroute.
And how will this prevent the queue-full condition?
What about additional consumers? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
zpat
Posted: Mon Sep 28, 2020 1:01 pm
Jedi Council
Joined: 19 May 2001 Posts: 5859 Location: UK
It won't prevent queue full. But it will mean higher-priority messages are sent in preference over the channel (if the SCTQ is priority-sequenced), so after one lower-priority message retry it would send all the higher-priority messages before attempting to send another lower-priority message.
We can never guarantee the queue won't get full, and I can't make them run multiple consumers if they refuse to do so. But I can find a way to stop them impacting more important applications. _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.
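A one-line MQSC check of the assumption in parentheses above, i.e. that the cluster transmission queue delivers in priority order rather than FIFO:
Code:
* MSGDLVSQ(PRIORITY) means the channel drains higher-priority messages first
DISPLAY QLOCAL(SYSTEM.CLUSTER.TRANSMIT.QUEUE) MSGDLVSQ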
gbaddeley
Posted: Mon Sep 28, 2020 5:56 pm
Jedi Knight
Joined: 25 Mar 2003 Posts: 2527 Location: Melbourne, Australia
FWIW, our DR planning looks at queue depth and storage usage that could accumulate during the expected time for DR restoration. We set maxdepth and storage allocation to cope with this situation.
Short outages or higher than normal peaks during normal operation do not even approach the settings we have for DR, so we very rarely see queue full conditions.
There is an argument that max depth, max message length and storage allocation should not stand in the way of normal or abnormal app message processing, when there is no good reason to do so. We had a couple of instances where app messages crept up over 4 MB in length and broke several interfaces, due to arbitrary max message lengths set by MQ admins years before on queues and channels. We made an executive decision to set everything to 100 MB. _________________ Glenn
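A minimal MQSC sketch of that decision, with hypothetical object names; 104857600 bytes is MQ's 100 MB maximum message length:
Code:
* Raise the limit at every point a message passes through
ALTER QMGR MAXMSGL(104857600)
ALTER QLOCAL(APP.DEST.QUEUE) MAXMSGL(104857600)
ALTER CHANNEL(TO.DEST.QM) CHLTYPE(SDR) MAXMSGL(104857600)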
zpat
Posted: Mon Sep 28, 2020 11:05 pm
Jedi Council
Joined: 19 May 2001 Posts: 5859 Location: UK
Getting off topic, but setting channel max message length to 100 MB can seriously consume CHIN storage on z/OS.
As this queue of ours is QSG-shared, the CF (real) storage has to be available, and that's quite expensive compared to standard disk (which I agree is always worth over-allocating rather than having to deal with queue-full conditions).
Most of our critical queues are on z/OS, and MQ on z/OS is many times more difficult and inconvenient to administer than distributed MQ, as you have to worry about CF size, page set size, SMDS size, buffer pool size, CHIN region size and all the other joys of "Ye Olde" MVS. _________________ Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.