Switch QM between nominal / rescue
hughson
Posted: Thu Jun 17, 2021 9:12 pm

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

Bad wrote:
"Does it still take 1 hour to fail over now?" Yes, it takes 1 hour or more to fail over now.

What does the AMQERR01.LOG show, if anything, that the queue manager is doing during this hour?
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software
Bad
Posted: Mon Jun 21, 2021 5:30 am

Novice

Joined: 15 Jun 2021
Posts: 14

hughson wrote:
Bad wrote:
"Does it still take 1 hour to fail over now?" Yes, it takes 1 hour or more to fail over now.

What does the AMQERR01.LOG show, if anything, that the queue manager is doing during this hour?


Thank you for your reply.

I have nothing in the AMQERR01.LOG during the switchover.
However, when I switch with endmqm -s QMA, QMA goes into running as standby quickly. It is QMB that takes 1 hour or more to go from the STARTING state to RUNNING, with one process (amqrrmfa) at 100% during this hour.
bruce2359
Posted: Mon Jun 21, 2021 11:55 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

hughson wrote:
What does the AMQERR01.LOG show, if anything, that the queue manager is doing during this hour?


Bad wrote:
I have nothing in the AMQERR01.LOG during the switchover.


Please be a bit more precise. Do you mean that AMQERR01 is empty? Nothing whatsoever is logged after the fail?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
hughson
Posted: Mon Jun 21, 2021 1:31 pm

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

Bad wrote:
hughson wrote:
Bad wrote:
"Does it still take 1 hour to fail over now?" Yes, it takes 1 hour or more to fail over now.

What does the AMQERR01.LOG show, if anything, that the queue manager is doing during this hour?


Thank you for your reply.

I have nothing in the AMQERR01.LOG during the switchover.
However, when I switch with endmqm -s QMA, QMA goes into running as standby quickly. It is QMB that takes 1 hour or more to go from the STARTING state to RUNNING, with one process (amqrrmfa) at 100% during this hour.

There will be various messages output to the AMQERR01.LOG in your qmgr directory as each of the processes that form the queue manager start up. Are you sure you are looking in the queue manager's error log?
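
One way to see what the queue manager was doing during the slow start is to tally the message identifiers written to AMQERR01.LOG in that period. The Python sketch below is only an illustration, not an IBM tool: it assumes identifiers look like AMQ followed by four digits and an optional severity letter, and the function names are made up for this example.

```python
import re
from collections import Counter

# Assumption: message identifiers are "AMQ" + four digits, optionally
# followed by one severity letter (for example AMQ9410 or AMQ9410I).
AMQ_ID = re.compile(r"\bAMQ\d{4}[A-Z]?\b")


def count_message_ids(log_text):
    """Count how often each AMQ message identifier appears in the text."""
    return Counter(AMQ_ID.findall(log_text))


def summarize(log_text, top=10):
    """Return the most frequent identifiers as (id, count), highest first."""
    return count_message_ids(log_text).most_common(top)
```

Passing it the text of the error log from the queue manager's errors directory gives a quick frequency list. An identifier that repeats thousands of times during the hour would be the first thing to look up.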

amqrrmfa is the cluster repository manager. Could it be processing all the messages on the cluster command queue that you told us about?

Once it gets up and running is the cluster command queue finally empty?

If you switch over again does it take another hour?

Cheers,
Morag
Bad
Posted: Tue Jun 22, 2021 11:03 am

Novice

Joined: 15 Jun 2021
Posts: 14

Thanks for your reply.

"There will be various messages output to the AMQERR01.LOG in your qmgr directory as each of the processes that form the queue manager start up. Are you sure you are looking in the queue manager's error log?"

Yes, we looked at it with our MQSeries expert. There are a lot of entries in the log, but we did not find any severe or critical error in the file.

"amqrrmfa is the cluster repository manager. Could it be processing all the messages on the cluster command queue that you told us about?"

"Once it gets up and running is the cluster command queue finally empty?"

No, the SYSTEM.CLUSTER.COMMAND.QUEUE is not empty. For example, we made a switch last Tuesday and we are currently at 206,966 messages. It sometimes takes a week or more after a switch for the queue to empty (after some switches we went up to more than 1 million messages).


"If you switch over again does it take another hour?"

Yes, it takes another hour.
Bad
Posted: Tue Jun 22, 2021 11:07 am

Novice

Joined: 15 Jun 2021
Posts: 14

gbaddeley wrote:
Can you bring your MQ expert into this chat?


Sorry, I can't.
fjb_saper
Posted: Tue Jun 22, 2021 2:37 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20696
Location: LI,NY

Bad wrote:
gbaddeley wrote:
Can you bring your MQ expert into this chat?


Sorry, I can't.

Are you sure you don't have a poison message in the cluster command queue?
You did not specify the size of your cluster, but it seems to me that you have a huge number of messages there...
_________________
MQ & Broker admin
hughson
Posted: Tue Jun 22, 2021 3:00 pm

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

Bad wrote:
"Once it gets up and running is the cluster command queue finally empty?"

No, the SYSTEM.CLUSTER.COMMAND.QUEUE is not empty. For example, we made a switch last Tuesday and we are currently at 206,966 messages. It sometimes takes a week or more after a switch for the queue to empty (after some switches we went up to more than 1 million messages).


The issue with your cluster command queue would appear to be causing the issue with the startup time.

Your comment implies that the cluster command queue only gets lots of messages on it when you switch? Are you saying that the switch causes the large numbers of messages? Or do you have large numbers of messages on your cluster command queue regardless of when you switch?

Why do you have so many messages on your cluster command queue? Have you issued a refresh cluster or something like that? Perhaps more than once?

Is the depth increasing or decreasing? Normally the cluster command queue depth should be tending to zero. Having a million messages on there is not the normal expectation.
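
Whether the depth is tending to zero can be judged from two or more depth readings taken some time apart. A minimal Python sketch, assuming you record (seconds, CURDEPTH) pairs yourself; the function names are hypothetical and nothing here talks to the queue manager.

```python
def depth_trend(samples):
    """Net change in messages per hour between first and last sample.

    samples: list of (seconds, curdepth) pairs, oldest first.
    A negative result means the queue is draining.
    """
    (t0, d0), (t1, d1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing times")
    return (d1 - d0) * 3600.0 / (t1 - t0)


def hours_to_empty(current_depth, rate_per_hour):
    """Estimated hours until the depth reaches zero, or None if not draining."""
    if rate_per_hour >= 0:
        return None
    return current_depth / -rate_per_hour
```

For example, the reported reading of 206,966 followed by a hypothetical 205,966 one hour later gives a rate of -1000 messages per hour, which is roughly 206 hours to drain at that pace. A positive rate means the queue is still filling faster than it is being processed.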

Can you tell us more about your cluster? How many queue managers? How many queues?

Do you have an idea of where all the messages on the cluster command queue are coming from? Are they all from one queue manager for example?

Cheers,
Morag
Bad
Posted: Sun Jun 27, 2021 6:32 am

Novice

Joined: 15 Jun 2021
Posts: 14

Thanks for your reply.

"Your comment implies that the cluster command queue only gets lots of messages on it when you switch?" Yes.
"Are you saying that the switch causes the large numbers of messages?" Yes.

"Is the depth increasing or decreasing?" The depth is increasing; we now have 1,843,000 messages in the SYSTEM.CLUSTER.COMMAND.QUEUE.

"Why do you have so many messages on your cluster command queue? Have you issued a refresh cluster or something like that? Perhaps more than once?" We have not done a refresh cluster for more than 2 months, to avoid generating more messages.

"How many queue managers?" 60
"How many queues?" 180
"Do you have an idea of where all the messages on the cluster command queue are coming from?" The messages come from 2 FRs (full repositories).
Bad
Posted: Sun Jun 27, 2021 6:34 am

Novice

Joined: 15 Jun 2021
Posts: 14

fjb_saper wrote:
Bad wrote:
gbaddeley wrote:
Can you bring your MQ expert into this chat?


Sorry, I can't.

Are you sure you don't have a poison message in the cluster command queue?
You did not specify the size of your cluster, but it seems to me that you have a huge number of messages there...


How could I locate this poison message in my queue?
fjb_saper
Posted: Sun Jun 27, 2021 11:37 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20696
Location: LI,NY

Set the max retry on the SYSTEM.CLUSTER.COMMAND.QUEUE and maybe set a BACKOUT queue and you should find the poison message on the backout queue. The processing of legitimate messages should also considerably speed up.
Bad
Posted: Mon Jun 28, 2021 1:29 am

Novice

Joined: 15 Jun 2021
Posts: 14

fjb_saper wrote:
Set the max retry on the SYSTEM.CLUSTER.COMMAND.QUEUE and maybe set a BACKOUT queue and you should find the poison message on the backout queue. The processing of legitimate messages should also considerably speed up.


Thanks for your replies.

I looked at the first 5000 messages on the SYSTEM.CLUSTER.COMMAND.QUEUE and none of them show any retries.

On one of my FRs I noticed a G8 "cache switch" message in the first position, in front of the RFQR message, on the SYSTEM.CLUSTER.REPOSITORY.QUEUE. Is this a poison message?
bruce2359
Posted: Mon Jun 28, 2021 6:19 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

Have you or your expert opened a PMR with IBM?
bruce2359
Posted: Mon Jun 28, 2021 8:34 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

Bad wrote:
exerk wrote:
And another question - which file system was full?

It was the queue file SYSTEM\!CLUSTER\!COMMAND\!QUEUE/, at more than 3 GB.

The cause was a partner who continued to send messages to another partner who had closed his database for 3 days ...

A closed/unavailable database should not normally result in cluster administrative messages.

How many transactions per day comprise 3 days workload? What is the length of the transaction messages? What does the app in question do when it discovers that the db is not available to complete a transaction? Does the app put-inhibit an application cluster-queue for each app message, for example?
gbaddeley
Posted: Mon Jun 28, 2021 3:27 pm

Jedi

Joined: 25 Mar 2003
Posts: 2492
Location: Melbourne, Australia

bruce2359 wrote:
Have you or your expert opened a PMR with IBM?

Agree. You first need to investigate and resolve the issue with a ridiculously large number of messages on the SCCQ. Not only is the cluster command process unable to process them at a reasonable rate, but something is generating a massive flood of them as well.

Is there 1 IPPROC on the queue, by process amqrrmfa?
Does the first message have a non-zero BackoutCount?
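
The first question can be checked from runmqsc with DISPLAY QSTATUS(SYSTEM.CLUSTER.COMMAND.QUEUE) TYPE(QUEUE) CURDEPTH IPPROCS. The backout count lives in the message descriptor, so it needs a browse of the first message instead (for example with the amqsbcg sample). The Python sketch below only parses the NAME(VALUE) pairs of the DISPLAY output once you have captured it as text; the helper names are hypothetical and the output layout is an assumption.

```python
import re

# Assumption: MQSC DISPLAY output lists attributes as NAME(VALUE) pairs.
PAIR = re.compile(r"([A-Z]+)\(([^)]*)\)")


def parse_qstatus(output):
    """Turn captured DISPLAY QSTATUS text into a dict of attribute strings."""
    return {name: value.strip() for name, value in PAIR.findall(output)}


def describe_reader(status):
    """Summarise the depth and how many processes have the queue open for input."""
    ipprocs = int(status.get("IPPROCS", 0))
    depth = int(status.get("CURDEPTH", 0))
    if ipprocs == 0:
        return f"depth {depth}: no process has the queue open for input"
    if ipprocs == 1:
        return f"depth {depth}: one process has the queue open for input"
    return f"depth {depth}: {ipprocs} processes have the queue open for input"
```

To confirm that the single reader is amqrrmfa, DISPLAY QSTATUS with TYPE(HANDLE) reports the application tag and process ID of each open handle, and its output can be parsed the same way.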
_________________
Glenn