Controlling QMGR failover during network outage
Abhi
Posted: Sun Apr 25, 2021 5:42 pm    Post subject: Controlling QMGR failover during network outage

Novice

Joined: 10 Mar 2011
Posts: 19

Hi,

We have an HA qmgr setup where the MQ data folders are mounted on remote NFS shares. During network outages (even ones shorter than 5 seconds), i.e. when we lose access to the NFS share, our qmgrs fail over once the connection comes back. The failover is unexpected, since during an outage neither the active nor the passive host has access to the remote NFS share. While analysing the logs I fail to understand the points below and am looking for any help explaining them:

Quote:
The failover behaviour is random, i.e. not all qmgrs fail over, and it is not the same queue manager that fails over every time.

Quote:
When the connection recovers after these outages, the queue manager first recovers as normal, i.e. the active instance comes back as active and the standby comes back as standby. Seconds later an FFST file is generated (details below) on the active node, and a minute later the qmgr fails over.

Quote:
At the point when NFS access is gone and the qmgr tries to recover, is there a policy that controls which node gets the qmgr lock when the connection comes back, or is it random?

Quote:
What happens if the outage lasts longer than the time specified for FileLockHeartBeatLen?

Quote:
Can this behaviour be controlled with scripts, for example by stopping the standby instances during outages (making the setup non-HA) and then restarting the standby instances when things are back to normal?


FFST Details:
Code:

Probe Id          :- HL206037                                               |
Application Name  :- MQM                                                    |
Component         :- mqloWritevFile
Effective UserID  :- 1500 (mqm)                                             |
Real UserID       :- 1500 (mqm)                                             |
Program Name      :- amqzmuc0                                               |
Arguments         :- -m QMGR                                              |
Addressing mode   :- 64-bit
Major Errorcode   :- xecF_E_UNEXPECTED_RC                                   |
Minor Errorcode   :- hrcE_MQLO_DERR                                         |
Probe Type        :- MSGAMQ6118                                             |
Probe Severity    :- 1                                                      |
Probe Description :- AMQ6118S: An internal IBM MQ error has occurred        |
(20806826) 


IBM MQ Version: 9.1.3.0
Platform: RHEL

Regards,
Abhi
bruce2359
Posted: Sun Apr 25, 2021 6:55 pm

Poobah

Joined: 05 Jan 2008
Posts: 9405
Location: US: west coast, almost. Otherwise, enroute.

Next time, please post more of the errors logged.

Was an FDC created for this event?

Did you research error message AMQ6118S? Did you open a PMR with IBM?

Other than posting here, what did you do?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
Andyh
Posted: Tue Apr 27, 2021 12:43 pm

Master

Joined: 29 Jul 2010
Posts: 237

mqloWritevFile is the MQ function that writes to the recovery log.
hrcE_MQLO_DERR is the error reported when an EIO error is returned to this function.
The QMgr doesn't attempt to handle this error and will end abruptly after receiving it, because after an EIO error the QMgr doesn't know whether some of the requested write data made it to disk. The speed with which the QMgr terminates abruptly following such an error is much improved in MQ 9.2.1.

You might like to check that the NFS mount options for the file systems hosting the MQ data are correct, for example that they use a hard mount.
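
As a rough way to check the hard/soft point on a Linux host, the sketch below parses text in the /proc/mounts format and lists NFS mounts that do not carry the "hard" option. It assumes the kernel reports "hard" or "soft" in the options column for NFS mounts; the server name and the /MQHA paths in the sample are hypothetical, not taken from the original post, and the function names are illustrative only.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fstype, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        entries.append((mount_point, fstype, set(options.split(","))))
    return entries


def nfs_mounts_not_hard(text):
    """Return mount points of NFS file systems whose options lack 'hard'."""
    return [
        mount_point
        for mount_point, fstype, options in parse_mounts(text)
        if fstype in ("nfs", "nfs4") and "hard" not in options
    ]


# Hypothetical sample; on a real host pass open("/proc/mounts").read() instead.
SAMPLE = """\
nfsserver:/export/mqdata /MQHA/data nfs4 rw,relatime,vers=4.1,hard,proto=tcp 0 0
nfsserver:/export/mqlogs /MQHA/logs nfs4 rw,relatime,vers=4.1,soft,proto=tcp 0 0
/dev/sda1 / xfs rw,relatime 0 0
"""

if __name__ == "__main__":
    for mount_point in nfs_mounts_not_hard(SAMPLE):
        print("not hard-mounted:", mount_point)
```

This only reports what the mount table says; which other NFS options are appropriate for multi-instance queue managers should be taken from the IBM MQ documentation for your release.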