hughson
Posted: Sun Apr 19, 2020 4:42 pm
Padawan
Joined: 09 May 2013 Posts: 1948 Location: Bay of Plenty, New Zealand
Received the following image and text in an email from the OP.
zrux wrote:
Please find the picture attached. I have edited your diagram and tried to make it look like the topology I had in mind.
The basic issue I am trying to address at this design stage is:
How can the external QM, which is not part of the cluster, send to the GW QMs in DC1 and DC2, taking into account that if connectivity to DC1 or its infrastructure is down, the messages should be able to go to DC2?
The earlier design you sent assumes the GW QM in DC1 is always up and running. I understand that the GW QM in DC1 can be made multi-instance (MI) or put under a VCS cluster, but even so we cannot assume that the GWQM in DC1 will always be reachable by the external QM; there might be cases where DC1 is not reachable due to network issues, an NFS failure in DC1, etc.
I think the SDRs from DC1/DC2 to the external QMs need to be uniquely named, to rule out the SDRs going out of sync.
The RCVRs at DC1/DC2 need to be named the same, so that the remote queue on the EXT QM can use the same SDR channel to send messages to either DC1 or DC2.
Scripts in DC1/DC2 would reset the sequence numbers if they see an error in the error logs, using the expected number reported there.
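For reference, the reset those scripts would perform is the standard RESET CHANNEL command; a minimal sketch, with hypothetical channel and queue manager names:
Code:
    # Reset the sender's sequence number so it matches the receiving end.
    # TO.EXT.QM and GWQM1 are hypothetical names.
    echo "RESET CHANNEL(TO.EXT.QM) SEQNUM(1)" | runmqsc GWQM1

    # The matching RCVR end of the pair (same channel name) can be reset
    # the same way on the external queue manager if needed.
    echo "RESET CHANNEL(TO.EXT.QM) SEQNUM(1)" | runmqsc EXTQM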
Have asked for confirmation about which queue managers have the same name, as this is not shown on the picture, but no response as yet.
Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
fjb_saper
Posted: Mon Apr 20, 2020 6:56 am
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20729 Location: LI, NY
This is all very distressing.
Reading the OP you'd think that both DCs participate in a single cluster.
However, this is not at all reflected in the picture.
It looks here as if both DCs are completely independent...
Count me confused...
_________________
MQ & Broker admin
hughson
Posted: Mon Apr 20, 2020 5:36 pm
Padawan
Joined: 09 May 2013 Posts: 1948 Location: Bay of Plenty, New Zealand
Got an answer to my email question about same named queue managers.
zrux wrote:
Initially my thinking was that GWQM(1/2) could have the same name, but the cluster won't like that. So they have been renamed GWQM1 and GWQM2. I am open to suggestions on this.
I am also confused by this since the diagram doesn't have GWQM1 and GWQM2 in the same cluster.
However, it is very good practice to avoid designing something with same-named queue managers, whether they are in the same cluster or not, so I am glad to hear that this is now the case.
@zrux, if you see this post, please can you help us understand the two different clusters, DC1 and DC2, on your diagram? Are they all one cluster, or two different clusters?
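Incidentally, an easy way to check this on the queue managers themselves is DISPLAY CLUSQMGR; a minimal sketch, assuming a hypothetical cluster and queue manager name:
Code:
    # Run against either gateway; lists every queue manager the cluster
    # knows about and whether it holds a full repository.
    # DC.CLUSTER and GWQM1 are hypothetical names.
    echo "DISPLAY CLUSQMGR(*) CLUSTER(DC.CLUSTER) QMTYPE STATUS" | runmqsc GWQM1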
Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
zrux
Posted: Tue Apr 21, 2020 12:18 am
Apprentice
Joined: 21 May 2006 Posts: 37 Location: UK
The QMs in DC1 and DC2 are part of the same cluster.
pcelari
Posted: Wed Apr 22, 2020 6:39 am
Chevalier
Joined: 31 Mar 2006 Posts: 411 Location: New York
Can we assume your DC1 and DC2 are geographically so far apart (say DC1 in NY, DC2 in LA) that a multi-instance QM as the GWQM is not feasible due to latency?
If that is the case, the discussion needs to continue; I'm sure you're not the first one encountering this need...
If it's NOT the case, and multi-instance is an option for you, you can readily put an F5 GTM in front of the two DCs so the external parties retrieve the currently active IP based on which one has the active GWQM. Or you can keep a copy of the GWQM synchronized using a metro-cluster solution at the O/S and network level, without using multi-instance.
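For context, "multi-instance" here means two instances of the same queue manager sharing networked storage, with automatic takeover; a minimal sketch of how the pair is started, assuming a hypothetical queue manager name:
Code:
    # Node A - becomes the active instance (holds the file lock):
    strmqm -x GWQM

    # Node B - becomes the standby, and takes over if node A fails:
    strmqm -x GWQM

    # Either node - show which instance is active and which is standby:
    dspmq -x -m GWQM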
It's an interesting discussion...
_________________
pcelari
-----------------------------------------
- a master of always being a newbie
Vitor
Posted: Wed Apr 22, 2020 7:28 am
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
pcelari wrote:
Can we assume your DC1 and DC2 are geographically so far apart (say DC1 in NY, DC2 in LA) that a multi-instance QM as the GWQM is not feasible due to latency?
If that is the case, the discussion needs to continue; I'm sure you're not the first one encountering this need...
...this is exactly the situation I find myself in, and my earlier post refers. We have one (1) gateway queue manager with one (1) external-facing IP address that runs in any of our data centers, but only ever in one at any given time. So in the event of a failure it looks from the outside like our gateway queue manager went down and, a short time later, came back. From the inside it's a completely different instance of the same queue manager running somewhere different.
_________________
Honesty is the best policy.
Insanity is the best defence.
zrux
Posted: Wed Apr 22, 2020 9:29 am
Apprentice
Joined: 21 May 2006 Posts: 37 Location: UK
@pcelari - The DCs are thousands of miles apart, and hence cannot be under MQ MI.
I will need to explore the "metro-cluster solution at the O/S" level. If you have any more info on this, please share. Also, has anyone successfully implemented this?
-------------
@Vitor - As I said earlier, the setup I am preparing needs to take into account that the external QM may not be able to connect to the DC1 GWQM; there might be cases where DC1 is not reachable due to network issues, an NFS failure on the DC1 GWQM (MQ MI), etc.
The external QM should be able to connect to DC2 if the network link / HA gateway QM in DC1 is not available.
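One channel-level way to get that behaviour (an illustration only, not necessarily the right design here) is a comma-separated connection list on the external QM's sender channel, so it tries DC2's gateway when DC1's does not answer. A minimal sketch, with hypothetical channel names, hosts and ports; this relies on the RCVR channels in DC1/DC2 sharing a name, as described earlier:
Code:
    # Defined on the external queue manager. MQ tries each address in
    # CONNAME order every time the channel starts.
    runmqsc EXTQM <<EOF
    DEFINE CHANNEL(TO.DC.GW) CHLTYPE(SDR) TRPTYPE(TCP) +
           CONNAME('dc1-gw.example.com(1414),dc2-gw.example.com(1414)') +
           XMITQ(DC.GW.XMITQ)
    EOF
Note the caveat the thread keeps circling: if the two gateways are different queue managers (GWQM1/GWQM2), the channel sequence numbers will not match after a switch, which is exactly why the OP is scripting RESET CHANNEL.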
Vitor
Posted: Wed Apr 22, 2020 10:03 am
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
zrux wrote:
@Vitor - As I said earlier, the setup I am preparing needs to take into account that the external QM may not be able to connect to the DC1 GWQM; there might be cases where DC1 is not reachable due to network issues, an NFS failure on the DC1 GWQM (MQ MI), etc.
The external QM should be able to connect to DC2 if the network link / HA gateway QM in DC1 is not available.
I get all of that. That's the assumption in my setup.
So I have a queue manager called GWQM which all my external clients connect to at 256.300.218.1, and it runs in DCA in New York. It doesn't run in DC1 (because that's your data center), nor does it run in DCB (Austin), DCC (San Francisco) or DCE (Toronto).
One day Godzilla comes up out of the Hudson and stomps DCA flat, doing some damage to the rest of New York to cover up the fact he's working for a rival financial institution bent on sabotage. The data center is a total loss: no connectivity, nothing. The external client channels go into retry.
A brief period of time later, GWQM starts up again and starts receiving traffic on 256.300.218.1. Our clients relax and start sending us stuff. They don't know that the queue manager is now running in DCB with the IP address redirected, and they don't care. If any compute capacity survives in DCA and we can reconnect to it once the cable's fixed, then we can start sending cluster traffic from DCB to DCA in the same way we used to send it from DCA to DCB.
If Godzilla takes a road trip and starts smashing up Austin, then we move the IP address again and use the queue manager in one of the surviving data centers, including (in extremis) a queue manager running in Azure or AWS.
But there's still only one queue manager called GWQM in our topology.
_________________
Honesty is the best policy.
Insanity is the best defence.
zrux
Posted: Wed Apr 22, 2020 12:02 pm
Apprentice
Joined: 21 May 2006 Posts: 37 Location: UK
@Vitor - So with your setup, which have you got? Is it:
1) GWQM on DCA and GWQM on DCB, with the QM names being the same (unlikely, as a cluster doesn't like same-named QMs)
or
2) GWQM on DCA and GWQMx on DCB, with the QM names being different, but with a QM alias created on DCB/DCA that external QMs resolve when they connect (see the sketch below)?
or
3) You are moving the GWQM's files to DCB in the event of a failure on DCA
or
4) Something else (not sure what...)
If you are doing 2), and the QM at DCB is a different instance altogether, resolved by an alias, are you resetting the SDR/RCVR channel sequence numbers when a failover occurs from DCA to DCB?
Or is there another way you are managing the failovers?
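For option 2, the "QM alias" would be a remote-queue definition with a blank RNAME that maps the DCA queue manager name onto the local one; a minimal sketch of that common pattern, with hypothetical names:
Code:
    # On GWQMX in DCB: accept messages that external QMs addressed to
    # queue manager 'GWQM' by aliasing that name to the local QM.
    echo "DEFINE QREMOTE(GWQM) RNAME('') RQMNAME(GWQMX)" | runmqsc GWQMX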
Vitor
Posted: Wed Apr 22, 2020 12:30 pm
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
zrux wrote:
3) You are moving the GWQM's files to DCB in the event of a failure on DCA
Though we're not moving them in the event of a failure; we're replicating them in case there's a failure. One data center is the designated failover site and gets the files in real time; the rest are replicated as and when (typically less than 60 seconds later).
This isn't an MQ thing, this is a "what do we do if we lose all or part of DCA" thing. As a practice we try to distribute everything over all the data centers, but there are a few things for which this is not feasible (the gateway queue manager being a good example).
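Vitor doesn't say what tooling does the copying, and in practice this is usually storage- or block-level replication rather than file copies; purely to fix ideas, the "as and when" copies to the non-designated sites might look something like this (hypothetical paths and host):
Code:
    # Asynchronous catch-up copy of the queue manager's data and logs.
    # NOT crash-consistent on its own - shown only as an illustration;
    # the real-time copy to the designated failover site would be done
    # at the storage layer, not by a script.
    rsync -a --delete /var/mqm/qmgrs/GWQM/ dr-site:/var/mqm/qmgrs/GWQM/
    rsync -a --delete /var/mqm/log/GWQM/   dr-site:/var/mqm/log/GWQM/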
Note that you don't mention changing the external IP address so it resolves to DCB, not DCA.
_________________
Honesty is the best policy.
Insanity is the best defence.
pcelari
Posted: Thu Apr 23, 2020 6:33 am
Chevalier
Joined: 31 Mar 2006 Posts: 411 Location: New York
Vitor wrote:
... we're replicating them in case there's a failure. One data center is the designated failover site and gets the files in real time; the rest are replicated as and when (typically less than 60 seconds later).
@Vitor, this is a brilliant design. It seems your failover site is a relay point for real-time data synchronization - a hub for replicating to the other sites in the event the primary site goes down. The other sites stay synchronized with a slight, acceptable delay. Presumably your failover site must be somewhere in the middle geographically.
But given the physical distance, a single IP point of entry will likely result in significant latency for clients at the far end of the continent. How do you remedy that?
@zrux, Metro-Cluster data synchronization will not work for you, as your DCs are too far apart. It works only for DCs within a "metropolitan area", say 50 miles apart.
_________________
pcelari
-----------------------------------------
- a master of always being a newbie
Vitor
Posted: Thu Apr 23, 2020 6:58 am
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
pcelari wrote:
But given the physical distance, a single IP point of entry will likely result in significant latency for clients at the far end of the continent. How do you remedy that?
Dedicated fiber and telling the clients to suck it up.
Seriously, we have the best backbone money can lease, and we do packet-shaping at the network level to give priority to external clients with low SLAs.
But (and don't tell anyone on this forum) this is one reason we're moving away from MQ as a transport protocol and towards REST. It gives a lot more flexibility in routing clients to geographically adjacent data centers, and external clients with low SLAs / poor tolerance for latency are being encouraged to move away from MQ so they can use a more local DC.
A good whack of our available bandwidth is used (and reserved) for replication as we chase the Holy Grail of multiple synchronized data centers over which the entire workload can be evenly distributed. We also have a ton of automation and monitoring controlling the replication and failover of components, most of which works most of the time, but we still make sacrifices to the gods of business continuity during the quarterly failover testing.
It's also worth pointing out that we are a North American institution with some Canada and a spot of Mexico, so we're not trying to do this globally, which limits the possible physical distance to "gosh" rather than "yikes".
_________________
Honesty is the best policy.
Insanity is the best defence.
zrux
Posted: Wed Jun 17, 2020 7:57 am
Apprentice
Joined: 21 May 2006 Posts: 37 Location: UK
Has anyone tried RDQM (available from v9 onwards) instead of copying the files for the gateway QM to the second site?
exerk
Posted: Wed Jun 17, 2020 8:45 am
Jedi Council
Joined: 02 Nov 2006 Posts: 6339
zrux wrote:
Has anyone tried RDQM (available from v9 onwards) instead of copying the files for the gateway QM to the second site?
Will the link between the DCs meet the latency requirements for RDQM?
The KC states:
RDQM HA
Quote:
If you do choose to locate the nodes in different data centers, then be aware of the following limitations:
* Performance degrades rapidly with increasing latency between data centers. Although IBM will support a latency of up to 5 ms, you might find that your application performance cannot tolerate more than 1 to 2 ms of latency.
RDQM DR
Quote:
You should be aware of the following limitations:
* Performance degrades rapidly with increasing latency between data centers. IBM will support a latency of up to 5 ms for synchronous replication and 50 ms for asynchronous replication.
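If the latency numbers do work out, a DR RDQM is created on each node with crtmqm's replication options, and rdqmdr/rdqmstatus manage and show the roles. A rough sketch with hypothetical IPs, port and filesystem size; check the KC for the exact syntax at your MQ level:
Code:
    # DC1 node - create the queue manager as the DR primary:
    crtmqm -rr p -rl 10.1.0.1 -ri 10.2.0.1 -rn dc2node -rp 7001 -fs 10 GWQM

    # DC2 node - create the matching DR secondary:
    crtmqm -rr s -rl 10.2.0.1 -ri 10.1.0.1 -rn dc1node -rp 7001 -fs 10 GWQM

    # After losing DC1, promote the DC2 copy and start it:
    rdqmdr -m GWQM -p
    strmqm GWQM

    # Check replication and role status at any time:
    rdqmstatus -m GWQM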
_________________
It's puzzling, I don't think I've ever seen anything quite like this before... and it's hard to soar like an eagle when you're surrounded by turkeys.