Author |
Message
|
Jeff.VT |
Posted: Fri Aug 03, 2018 10:18 am Post subject: AMQSCLM for Remote Queues? |
|
|
Acolyte
Joined: 02 Mar 2017 Posts: 71
|
I came across AMQSCLM and it looks great... but my message consumers aren't applications, they're other 3rd party queue managers that I don't have access to.
For high availability we have multiple connections to these 3rd parties. Sometimes, for various reasons, one connection will work and another one won't.
Currently we manually re-route these queues to bypass the down connections.
I found AMQSCLM, and it seems to do exactly what I need, except it is only for local queues. Is there any way I could have, say, 3 clustered remote queues and something similar to AMQSCLM, so that if a queue starts backing up because the sender channel isn't connected, messages would automatically be routed to one of the other 2 clustered remote queues?
In short - is there an AMQSCLM for remote queues? |
|
|
|
Jeff.VT |
Posted: Fri Aug 03, 2018 11:40 am Post subject: |
|
|
Acolyte
Joined: 02 Mar 2017 Posts: 71
|
|
|
|
Vitor |
Posted: Fri Aug 03, 2018 11:48 am Post subject: |
|
|
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
So moved. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
|
PeterPotkay |
Posted: Mon Aug 06, 2018 4:52 pm Post subject: Re: AMQSCLM for Remote Queues? |
|
|
Poobah
Joined: 15 May 2001 Posts: 7717
|
Jeff.VT wrote: |
I came across AMQSCLM and it looks great... but my message consumers aren't applications, they're other 3rd party queue managers that I don't have access to.
For high availability we have multiple connections to these 3rd parties. Sometimes, for various reasons, one connection will work and another one won't.
Currently we manually re-route these queues to bypass the down connections.
I found AMQSCLM, and it seems to do exactly what I need, except it is only for local queues. Is there any way I could have, say, 3 clustered remote queues and something similar to AMQSCLM, so that if a queue starts backing up because the sender channel isn't connected, messages would automatically be routed to one of the other 2 clustered remote queues?
In short - is there an AMQSCLM for remote queues? |
MQ Clustering out of the box should do this for you, unless the message is specifically addressed to the instance of the queue on the QM that can't be reached. _________________ Peter Potkay
Keep Calm and MQ On |
|
|
|
Jeff.VT |
Posted: Wed Aug 08, 2018 10:48 am Post subject: |
|
|
Acolyte
Joined: 02 Mar 2017 Posts: 71
|
I might not have explained it well.
I don't control all of the queue managers in the spiderweb.
Here's an image that explains what I'm asking about.
So to get around this, I was setting up a Clustered Remote Queue that points to the 3rd party queue on my gateways.
Let's say I'd like App Istanbul to go out through Shanghai now that the link to the 3rd party is down in Gateway Istanbul.
From my testing, the cluster does not do this automatically.
What the cluster DOES do automatically is this: if one of the light blue lines (the channels or connections inside my cluster) goes down, it will find the next-priority queue and use that instead.
But unless I somehow added the 3rd party queue manager itself to my cluster, the cluster doesn't check whether the final target is up or down. It only checks the initial hop.
Or am I missing something about clusters?
------------
My solution was to have something watch the 3rd party link and, if the channel starts to error or messages pile up in the transmit queue, lower the CLWL priority by 5, then restore it once the connection has been restored.
-----------
I will further note, that the queue at the 3rd party, the queue manager, and the channel might not be identically named across all 3 gateways. But they still generally don't care from which country I send my data - they'll route it just fine either way. It's just faster & cheaper for us to send as locally as we can. |
|
|
|
gbaddeley |
Posted: Wed Aug 08, 2018 5:56 pm Post subject: |
|
|
Jedi Knight
Joined: 25 Mar 2003 Posts: 2527 Location: Melbourne, Australia
|
Design is OK, 3rd parties are not in MQ cluster.
Quote: |
My solution was to have something watch the 3rd party link and, if the channel starts to error or messages pile up in the transmit queue, lower the CLWL priority by 5, then restore it once the connection has been restored. |
Looks like while Istanbul gateway can't push thru msgs via its SDR channel to 3rd party, you want msgs to be routed via the other gateways.
Do you have identically named Clustered Remote Queue object(s) on multiple gateway qmgrs, that are used for routing out to the 3rd party? If yes, you can temporarily PUT DISABLE the affected object(s) on Istanbul gateway qmgr. The MQ cluster will then use the other gateways to push new msgs thru.
Note that there could be msgs marooned on the SDR channel's transmission queue until the SDR can go into Running status. _________________ Glenn |
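A minimal sketch of the PUT DISABLE suggestion above, written as a Python helper that only builds the MQSC text. It assumes the routing object is a QREMOTE definition; the queue name in the usage note is borrowed from later in this thread, and the function name is hypothetical. The resulting line would still need to be fed to `runmqsc` on the affected gateway by whatever automation is used.

```python
def put_state_command(queue_name: str, enabled: bool) -> str:
    """Build the MQSC command that put-enables or put-disables a remote queue definition.

    Assumption: the clustered routing object is a QREMOTE. For an alias queue
    the verb would be ALTER QALIAS instead.
    """
    state = "ENABLED" if enabled else "DISABLED"
    return f"ALTER QREMOTE('{queue_name}') PUT({state})"
```

For example, `put_state_command("FROM.MIDDLEEAST.TO.3RDPT", enabled=False)` produces the command to take the Istanbul instance out of rotation, and calling it again with `enabled=True` produces the command to put it back. As noted above, this does nothing for messages already sitting on the transmission queue.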
|
|
|
PeterPotkay |
Posted: Wed Aug 08, 2018 8:01 pm Post subject: |
|
|
Poobah
Joined: 15 May 2001 Posts: 7717
|
The picture is helpful, thanks.
This is a tricky one.
gbaddeley's ideas would work, but as he states, you will have the risk of one or more marooned messages in the down SNDR channel path that will require intervention no matter how fast your "trick" kicks in to make the alternate path priority.
I guess you could take the source code for AMQSCLM and tune it for your situation. It would need additional help to stop that retrying SNDR channel and get-enable the XMITQ so the messages on it can be rerouted.
Ugh. No easy pretty solutions that I can come up with.
I don't think adding the single 3rd party box into your cluster would solve anything anyway. _________________ Peter Potkay
Keep Calm and MQ On |
|
|
|
Jeff.VT |
Posted: Thu Aug 09, 2018 7:54 am Post subject: |
|
|
Acolyte
Joined: 02 Mar 2017 Posts: 71
|
I just thought if there was an out-of-the-box sample application for local queues, there might be something similar for remote queues that I was missing.
If you guys care, what I was going to do...
Each gateway will have 3 clustered remote queues
FROM.MIDDLEEAST.TO.3RDPT
FROM.ASIAPACIFIC.TO.3RDPT
FROM.NORTHAMER.TO.3RDPT
Istanbul, MiddleEast will be CLWL priority 9
Istanbul, AsiaPacific will be CLWL priority 8
Istanbul, NorthAmer will be CLWL priority 7
Shanghai, MiddleEast will be CLWL priority 8
etc. So the priority is to send locally; the further away a gateway is, the less likely it is to be used.
Disabling PUT is an option, but it's equally likely that the third-party queue manager connection will be down for all 3 regions at the same time, and if that's the case, I'd rather messages queue up locally...
So I was going to make my own app similar to AMQSCLM running on each Gateway. And when any channel goes down, it checks for clustered remote queues utilizing that channel. And if it's down for longer than <x> time, it will simply modify CLWL priority to <value>-5...
So if Istanbul > 3rd party is down, its CLWL will change from 9 to 4.
This means if all 3 are down, all of the CLWL will change on all of them, and each region will continue to send to their local region. But if one stays up, it will be higher priority and go through there.
When the connection is restored, the CLWL is restored to the 'correct' value, and messages are immediately re-routed.
I do sort of wish I had more than 0-9 in that value - but I only currently have 3 global gateways - and I doubt that'll be changing anytime soon.
Thanks for the help & input ^_^ |
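A minimal sketch of the priority scheme described in this post, in Python. It models only the arithmetic (base priority, minus 5 while the third-party link is down) and which gateway would then win. The `Route` type, the function names, and the "NorthAmerica" gateway label are illustrative assumptions, not MQ objects or APIs.

```python
from dataclasses import dataclass
from typing import List

DOWN_PENALTY = 5  # from the post: a down link drops its CLWL priority by 5


@dataclass(frozen=True)
class Route:
    gateway: str        # gateway queue manager hosting the clustered remote queue
    base_priority: int  # CLWL priority while the third-party link is healthy (0-9)
    link_up: bool       # True if the sender channel to the third party is running


def effective_priority(route: Route) -> int:
    """Priority to apply right now, kept inside the 0-9 range CLWL priority allows."""
    if route.link_up:
        return route.base_priority
    return max(0, route.base_priority - DOWN_PENALTY)


def preferred_gateway(routes: List[Route]) -> str:
    """Gateway whose route currently carries the highest effective priority."""
    return max(routes, key=effective_priority).gateway
```

With base priorities of 9, 8 and 7 for the Middle East queue, losing only the Istanbul link gives 4, 8 and 7, so traffic moves to the second gateway. Losing all three links gives 4, 3 and 2, so the local gateway is still preferred and messages queue up locally, which is the behaviour the post asks for.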
|
|
|
PeterPotkay |
Posted: Thu Aug 09, 2018 8:57 am Post subject: |
|
|
Poobah
Joined: 15 May 2001 Posts: 7717
|
I think your solution will work for new messages.
How often will you be checking the status? Although any interval greater than 0.0000000 seconds means there is that much time for a message to get sent down a path that is not up.
How will you handle messages put since the last time you checked that got routed down a path that is currently down? They will be on the XMITQ that is get-inhibited and/or opened exclusively by the retrying sender channel to the 3rd party, or they will be in the channel's batch. It's solvable; you just need to consider and code for it. _________________ Peter Potkay
Keep Calm and MQ On |
|
|
|
Jeff.VT |
Posted: Thu Aug 09, 2018 10:31 am Post subject: |
|
|
Acolyte
Joined: 02 Mar 2017 Posts: 71
|
PeterPotkay wrote: |
I think your solution will work for new messages.
How often will you be checking the status? Although any interval greater than 0.0000000 seconds means there is that much time for a message to get sent down a path that is not up.
How will you handle messages put since the last time you checked that got routed down a path that is currently down? They will be on the XMITQ that is get-inhibited and/or opened exclusively by the retrying sender channel to the 3rd party, or they will be in the channel's batch. It's solvable; you just need to consider and code for it. |
At the moment, we essentially wait for us to make a determination and then manually re-route it. So really any rules I set up would be better than that.
I'm certainly no Roger, and TBH I don't trust my development team at this point to not break things. So I'll probably make something rather crude.
I was considering checking every minute and requiring the channel to be down for more than 5 minutes for it to auto-re-route.
It's not meant to fix blips as much as it's meant to route around issues that don't seem to be resolving themselves without having to involve as much human interaction.
Now that I'm thinking about it... I'll probably need a configuration that each of those values is changeable per link type - some might want to re-route far more quickly than others.
Also, some way to determine if this is a primary outlet (other than just istanbul & istanbul - since sometimes we don't have 3 links to a 3rd party, just 1 or 2). Then adjust CLWL for non-primary links before primary ones... say after just 60 seconds of downtime.
That way if there's a lag time between checks, things aren't re-routed around a global outage if it doesn't need to be. hmm.
Maybe something as easy as: 5 minutes for CLWL 9s, 3 minutes for CLWL 8s and 1 minute for CLWL 7s... |
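The tiered timing idea above could be sketched like this in Python. The 5/3/1 minute values come from the post; the table name, the function name, the fallback for other priorities, and the use of "at least" rather than "strictly more than" the threshold are assumptions made for illustration.

```python
# Seconds a link must stay down before its route is demoted, keyed by the
# route's normal CLWL priority (values from the post: 5, 3 and 1 minutes).
REROUTE_AFTER_SECONDS = {9: 300, 8: 180, 7: 60}

# Assumption: any priority not listed waits as long as a primary link does.
DEFAULT_REROUTE_AFTER_SECONDS = 300


def should_demote(base_priority: int, seconds_down: float) -> bool:
    """True once a link has been down long enough for its priority tier."""
    threshold = REROUTE_AFTER_SECONDS.get(base_priority, DEFAULT_REROUTE_AFTER_SECONDS)
    return seconds_down >= threshold
```

Because non-primary routes are demoted first, a global outage seen by staggered one-minute checks lowers the 7s and 8s before the 9s, so traffic is not shuffled to another region only to find that link down as well.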
|
|
|
gbaddeley |
Posted: Thu Aug 09, 2018 6:16 pm Post subject: |
|
|
Jedi Knight
Joined: 25 Mar 2003 Posts: 2527 Location: Melbourne, Australia
|
Whatever checking and altering work-around you implement, it will still not guarantee that all msgs will go thru in an expected timeframe. It may break in unexpected ways. Solutions like this need to be solidly tested to ensure that you have a full understanding of their behavior in all the channel failure / retry scenarios across all the gateways. _________________ Glenn |
|
|
|
Jeff.VT |
Posted: Fri Oct 05, 2018 8:30 am Post subject: |
|
|
Acolyte
Joined: 02 Mar 2017 Posts: 71
|
So I made a little script that does this.
I might be the only person in IBMland who has MQ on Windows and actually uses the MQ Powershell pack - but when you get past the few bugs here and there, it does make things pretty easy.
What I ended up with:
Get all the Clustered Alias Queues on a specified queue manager. Determine the channel they point to.
If the channel has been running for >5 minutes and the priority isn't set to the 'high' priority for that queue type, set the priority to the high priority.
If the channel isn't running and has 0 'short retries' remaining, and the priority isn't set to the 'low' priority for that queue type, set it to the low priority.
If the queue has 'DNP' (Do not prioritize) in the description, always have it set to the low priority.
-----------
If anybody would like to see the script, ask.
But I'd guess the venn diagram of people that use MQ Powershell pack and people who are on MQSeries.net is probably 1... me... |
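The actual script was not posted, so the following is only a Python sketch of the decision rules listed above, not the PowerShell implementation. All parameter names are hypothetical, and it assumes the "DNP" rule overrides the other two because the post says "always".

```python
from typing import Optional

STABLE_SECONDS = 300  # the channel must have been running for more than 5 minutes


def next_priority(current: int, high: int, low: int, description: str,
                  channel_running: bool, seconds_running: float,
                  short_retries_left: int) -> Optional[int]:
    """Return the CLWL priority to set, or None if the queue should be left alone."""
    if "DNP" in description:
        target = low   # "Do not prioritize": pinned to the low priority
    elif channel_running and seconds_running > STABLE_SECONDS:
        target = high  # link has been stable long enough to trust again
    elif not channel_running and short_retries_left == 0:
        target = low   # short retries exhausted: treat the link as down
    else:
        return None    # still retrying, or only just recovered: wait
    return target if target != current else None
```

Returning `None` when the priority is already correct mirrors the "isn't set to ..." conditions in the list, so the queue definition is only altered when something actually changes.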
|
|
|
gbaddeley |
Posted: Sun Oct 07, 2018 2:49 pm Post subject: |
|
|
Jedi Knight
Joined: 25 Mar 2003 Posts: 2527 Location: Melbourne, Australia
|
Do you also have monitoring / alerting in place, in case the powershell script fails / stops running / takes inappropriate action?
Note the SupportPac MO74 disclaimer: "Category 2 SupportPacs are provided in good faith and AS-IS. There is no warranty or further service implied or committed and any supplied sample code is not supported via IBM product service channels." _________________ Glenn |
|
|
|
Jeff.VT |
Posted: Mon Oct 08, 2018 8:39 am Post subject: |
|
|
Acolyte
Joined: 02 Mar 2017 Posts: 71
|
Yes, but two general points... I set it up so it's being controlled by MS Failover Cluster. So if it fails, MSFC should restart it and send a SCOM alert that it failed.
But in general, even if it DOES fail, it's not a huge problem at the moment. We're not relying on this re-routing; it's more of a nice-to-have. I don't have any contracts or anything saying I have to keep connectivity alive through any means. They are contracted to go out through "Istanbul"... if that connection fails, I'm not required to re-route to a comparable connection in Shanghai. We just do it to be nice... and because I hate seeing errors.
Currently I'm also using the Powershell Support Pack to archive old messages from queues to flat-file logs each night. Most of these messages are sort of known junk messages or things we couldn't route... and we have logs of them in our message moving applications already.
That's been running for several months without any issues so far. |
|
|
|
|