Old MQ died after huge amount of data - AMQ4059

Buster
Posted: Sat Apr 27, 2024 8:22 am

Newbie

Joined: 27 Apr 2024
Posts: 8
Location: Sweden

We have an old test system that has run for years in a Linux environment. It has never been changed because it has always worked.

During recent tests, an unusually large amount of test data was sent. After that, MQ became unresponsive and I cannot connect to it with MQ Explorer from Windows 11. Nothing has changed in the configuration, firewalls or anything else. MQ Explorer reports "reason 2538 (AMQ4059)".

I have tried to restart the MQ:

Code:
 
$ endmqlsr
$ endmqm MYMQ
$ strmqm MYMQ
$ runmqlsr -r -m MYMQ -t TCP -p 1414 &


but I still cannot connect with MQ Explorer.

The AMQERR01.LOG shows:

Code:

04/27/2024 12:30:08 PM - Process(2398.1) User(mqm) Program(runmqlsr)
                    Host(MyHOST)
AMQ6125: An internal WebSphere MQ error has occurred.

EXPLANATION:
An internal error has occurred with identifier 10805014.  This message is
issued in association with other messages.
----- amqxfdcx.c : 785 --------------------------------------------------------
04/27/2024 12:30:08 PM - Process(2398.1) User(mqm) Program(runmqlsr)
                    Host(MyHOST)
AMQ6184: An internal WebSphere MQ error has occurred on queue manager MYMQ.

EXPLANATION:
An error has been detected, and the WebSphere MQ error recording routine has
been called. The failing process is process 2398.
----- amqxfdcx.c : 824 --------------------------------------------------------


Since this is a test server, the data sent is of no value, but how do I get my MQ responsive again?

Is there a way to empty all queues/channels without affecting any configuration?

Do you have any other recommendation for me to try?
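
On the question of emptying queues without touching the configuration: the MQSC command `CLEAR QLOCAL` removes the messages from a local queue and leaves the queue definition alone. The following is a minimal sketch, assuming the queue manager can be started and `runmqsc` is run by an MQ administrator. The helper name `mqsc_clear_cmds` and the queue names are made up for illustration. `CLEAR QLOCAL` fails on a queue that an application still has open or that holds uncommitted messages, so it may not succeed on every queue.

```shell
#!/bin/sh
# Hypothetical helper: read queue names (one per line) on stdin and print one
# MQSC CLEAR QLOCAL command per name, skipping blank lines and SYSTEM.* queues.
mqsc_clear_cmds() {
    while IFS= read -r q; do
        case "$q" in
            ''|SYSTEM.*) continue ;;
            *) printf "CLEAR QLOCAL('%s')\n" "$q" ;;
        esac
    done
}

# Example with made-up queue names; review the output before using it.
printf '%s\n' TEST.IN TEST.OUT SYSTEM.DEFAULT.LOCAL.QUEUE | mqsc_clear_cmds
```

The generated commands could then be piped into `runmqsc MYMQ`. To see which queues actually hold messages first, `echo "DISPLAY QLOCAL(*) CURDEPTH" | runmqsc MYMQ` lists the current depth of each local queue.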

bruce2359
Posted: Sat Apr 27, 2024 12:15 pm

Poobah

Joined: 05 Jan 2008
Posts: 9406
Location: US: west coast, almost. Otherwise, enroute.

Before you do anything to restore the qmgr, look in the error logs for more detail as to what occurred before the qmgr became unresponsive. Look for FDCs related to the error. Post results here.
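
A quick way to survey the FDC files is to print only their header fields. This is a rough sketch: the helper name `fdc_summary` is made up, and the default directory `/var/mqm/errors` is an assumption about where this installation keeps its FDC files.

```shell
#!/bin/sh
# Hypothetical helper: print selected header fields of every FDC file in a
# directory. The default path is an assumption; pass another one if needed.
fdc_summary() {
    dir=${1:-/var/mqm/errors}
    for f in "$dir"/*.FDC; do
        [ -f "$f" ] || continue    # no FDC files: the glob stays literal
        printf '== %s\n' "$f"
        grep -E '^(Date/Time|Component|Program Name|Major Errorcode|Probe Description) +:-' "$f" || true
    done
}

# Usage (not run here): fdc_summary /var/mqm/errors
```

Sorting the files by modification time (for example with `ls -ltr`) before reading them helps to find the first failure rather than the most recent one.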
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

Buster
Posted: Sat Apr 27, 2024 9:14 pm

bruce2359 wrote:
Before you do anything to restore the qmgr, look in the error logs for more detail as to what occurred before the qmgr became unresponsive. Look for FDCs related to the error. Post results here.


Thank you for your response.

The test server has run unattended for years. When the fault was first observed, this was the first message in errors/AMQERR01.LOG:
Code:

----- amqxfdcx.c : 824 --------------------------------------------------------
04/26/2024 08:12:05 PM - Process(23017.1) User(mqm) Program(runmqlsr)
                    Host(MyHOST)
AMQ6125: An internal WebSphere MQ error has occurred.

EXPLANATION:
An internal error has occurred with identifier 10805014.  This message is
issued in association with other messages.


The file errors/AMQ23017.0.FDC, which seems to have been created when the fault first occurred, contains the following lines. I have extracted what I think may be relevant:
Code:

WebSphere MQ First Failure Symptom Report
=========================================
Date/Time         :- Fri April 26 2024 20:12:05 UTC
Component         :- zupSetProcessStatus
SCCS Info         :- lib/zu/amqzuxp0.c, 1.26.2.1
Line Number       :- 1961
Build Date        :- Jul 20 2016
Program Name      :- runmqlsr
Major Errorcode   :- zrcX_PROCESS_NOT_RUNNING
Minor Errorcode   :- OK
Probe Type        :- INCORROUT
Probe Severity    :- 3
Probe Description :- AMQ6125: An internal WebSphere MQ error has occurred.
FDCSequenceNumber :- 0


If I go through the entire dump in the errors/AMQ23017.0.FDC file, the only thing that I can interpret as a potential problem is this:
Code:

Data: 0x00000006
----} cccProcessDisconnect rc=OK
----{ xcsReaddir
----} xcsReaddir rc=OK
----{ xcsTerminate
-----{ xcsRequestThreadMutexSem
-----} xcsRequestThreadMutexSem rc=OK
-----{ xihRemoveQueueManager
-----} xihRemoveQueueManager rc=Unknown(1)
-----{ xcsDisconnectSharedSubpool
-----} xcsDisconnectSharedSubpool rc=OK
-----{ xcsReleaseThreadMutexSem
-----} xcsReleaseThreadMutexSem rc=OK
----} xcsTerminate rc=OK
----{ xcsFreeMemFn
----} xcsFreeMemFn rc=OK
---} ccxQueryListeners rc=OK
---{ rrxError
---} rrxError rc=rrcE_LISTENER_ALREADY_RUNNING
--} ccxCheckOtherListeners rc=rrcE_LISTENER_ALREADY_RUNNING
-} cciTcpListenConv rc=rrcE_LISTENER_ALREADY_RUNNING
} ccxListenConv rc=rrcE_LISTENER_ALREADY_RUNNING
{ zupSetProcessReturn
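
To pick out the interesting lines from a long FDC trace like this one, it can help to filter for function exits whose return code is not OK. A small sketch, with a made-up helper name `fdc_nonok_rc`, that assumes the trace format shown above (a closing brace, the function name, then `rc=`):

```shell
#!/bin/sh
# Hypothetical helper: read an FDC trace on stdin and print only the function
# exits ("}" lines) whose return code is something other than OK.
fdc_nonok_rc() {
    grep -E '^-*\} .* rc=' | grep -v ' rc=OK$' || true
}

# Example using a few lines from the excerpt above.
fdc_nonok_rc <<'EOF'
-----} xcsRequestThreadMutexSem rc=OK
-----} xihRemoveQueueManager rc=Unknown(1)
---} rrxError rc=rrcE_LISTENER_ALREADY_RUNNING
{ zupSetProcessReturn
EOF
```

In this excerpt the return codes that are not OK are `Unknown(1)` and `rrcE_LISTENER_ALREADY_RUNNING`. Taken at face value, the second name suggests that `runmqlsr` found a listener already active when it was started.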

bruce2359
Posted: Sun Apr 28, 2024 4:08 am

Quote:
---- amqxfdcx.c : 824 --------------------------------------------------------
04/26/2024 08:12:05 PM - Process(23017.1) User(mqm) Program(runmqlsr)
Host(MyHOST)
AMQ6125: An internal WebSphere MQ error has occurred.

EXPLANATION:
An internal error has occurred with identifier 10805014. This message is
issued in association with other messages.

This is telling you to look backward in the AMQERR01.LOG to find the cause of the failure.

Buster
Posted: Sun Apr 28, 2024 5:13 am

bruce2359 wrote:
This is telling you to look backward in the AMQERR01.LOG to find the cause of the failure.


The two messages that I listed are the first from 2024. The one before is from last year. Any other place I can look?

bruce2359
Posted: Sun Apr 28, 2024 5:43 am

Buster wrote:
The two messages that I listed are the first from 2024. The one before is from last year. Any other place I can look?

Is the test qmgr left RUNNING when not being used?

What is the last changed date on the qm.ini file for this qmgr? What change was made?

Buster
Posted: Sun Apr 28, 2024 6:19 am

bruce2359 wrote:

Is the test qmgr left RUNNING when not being used?

What is the last changed date on the qm.ini file for this qmgr? What change was made?


The test qmgr is always running. Test messages are normally sent every day.

The ini file was last changed in 2018.

bruce2359
Posted: Sun Apr 28, 2024 7:04 am

Buster wrote:
The two messages that I listed are the first from 2024.

Really? There were zero log entries from the first three months of 2024 thru these error messages? And the qmgr is used daily? No checkpoints logged? No channels started or ended?

Was this qmgr stopped and restarted? O/S rebooted?

Was this qmgr backed up and a prior backup restored? Odd that there's a months long gap in your logs.

Buster
Posted: Sun Apr 28, 2024 9:05 am

bruce2359 wrote:

Really? There were zero log entries from the first three months of 2024 thru these error messages? And the qmgr is used daily? No checkpoints logged? No channels started or ended?

Was this qmgr stopped and restarted? O/S rebooted?

Was this qmgr backed up and a prior backup restored? Odd that there's a months long gap in your logs.


The last OS reboot was on 2023-10-20, and nothing was written to AMQERR01.LOG between that date and 2024-04-26. Surprisingly, the messages written to the log at reboot were the same as the ones we see now, but MQ worked flawlessly from the reboot until this Friday.

The server isn't used for much besides routing a few messages between two groups of test servers. Normally this is a few hundred per day. On Friday, when we ran into problems, 600 K messages were sent. A while after that, MQ stopped responding. We have plenty of disk, so the volume shouldn't be a problem. No backup/restore was done. The qmgr is never stopped, since it has always worked until now. Only now, after the problems, have I tried to restart the qmgr a number of times with the commands shown above.
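
The free-space assumption can be confirmed on the filesystem that holds the MQ data. A minimal sketch: the helper name `fs_use_pct` is made up, and `/var/mqm` is only an assumed data path for this installation.

```shell
#!/bin/sh
# Hypothetical helper: read `df -P` output on stdin and print the use
# percentage (without the % sign) of the first filesystem listed.
fs_use_pct() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# /var/mqm is an assumed data path; fall back to / if it does not exist.
path=/var/mqm
[ -d "$path" ] || path=/
df -P "$path" | fs_use_pct
```

If queue files and recovery logs live on separate filesystems, each of them would need the same check.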

bruce2359
Posted: Sun Apr 28, 2024 11:03 am

Hmmmm. What are the create date/time for AMQERR01.LOG, AMQERR02.LOG and AMQERR03.LOG?

Since this is a test qmgr and of minimal value, I'd suggest deleting and recreating it with the same attributes.

gbaddeley
Posted: Sun Apr 28, 2024 5:46 pm

Jedi

Joined: 25 Mar 2003
Posts: 2499
Location: Melbourne, Australia

I suggest ending the qmgr and then killing all mqm processes associated with it.
endmqm -i YOURQMNAME
ps -ef | grep mqm
kill -9 processidnumber

strmqm YOURQMNAME

Check that the listener has started
ps -ef | grep runmqlsr
netstat -an | grep LISTEN | grep yourportnumber

Investigate errors in qmgrs/YOURQMNAME/errors/AMQERR01.LOG
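
The listener check above can be wrapped in a small function so that the result is a clear yes or no. This is a sketch only: the helper name `listening_on_port` is made up, and it assumes the Linux `netstat -an` column layout (local address in column 4, state in column 6).

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -an` output on stdin and succeed if some
# socket is in LISTEN state on the given TCP port.
listening_on_port() {
    awk -v suffix=":$1" '
        $6 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
        END { exit !found }
    '
}

# Sample line in Linux netstat format (made up for illustration).
sample='tcp        0      0 0.0.0.0:1414            0.0.0.0:*               LISTEN'
if printf '%s\n' "$sample" | listening_on_port 1414; then
    echo "something is listening on 1414"
fi
```

On the real server this would be run as `netstat -an | listening_on_port 1414`. A positive result only shows that some process holds the port, so `ps -ef | grep runmqlsr` is still needed to see whether that process is the MQ listener.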
_________________
Glenn

Buster
Posted: Sun Apr 28, 2024 9:01 pm

bruce2359 wrote:
Hmmmm. What are the create date/time for AMQERR01.LOG, AMQERR02.LOG and AMQERR03.LOG?

Since this is a test qmgr and of minimal value, I'd suggest deleting and recreating it with the same attributes.


The last modification times are:

    AMQERR01.LOG - 2024-04-29 (I have tried to restart it today as well)
    AMQERR02.LOG - 2023-06-12
    AMQERR03.LOG - 2018-06-14


I haven't tried to delete and recreate it yet since I am not sure I can do that correctly.

Buster
Posted: Sun Apr 28, 2024 9:10 pm

gbaddeley wrote:
I suggest ending the qmgr and then killing all mqm processes associated with it.
endmqm -i YOURQMNAME
ps -ef | grep mqm
kill -9 processidnumber

strmqm YOURQMNAME

Check that the listener has started
ps -ef | grep runmqlsr
netstat -an | grep LISTEN | grep yourportnumber

Investigate errors in qmgrs/YOURQMNAME/errors/AMQERR01.LOG


I have tried that and there are no errors in AMQERR01.LOG, but also no change in behavior: MQ Explorer cannot connect and shows "reason 2538 (AMQ4059)".

It looks as if the commands that I used to restart MQ duplicate each other. I have been told to always do:
Code:

$ strmqm MYMQ
$ nohup runmqlsr -r -m MYMQ -t TCP -p 1414 &

but it seems that both commands do roughly the same thing, and that the second command creates the errors in AMQERR01.LOG that we have been struggling with.

Am I guessing correctly?

bruce2359
Posted: Mon Apr 29, 2024 4:09 am

Buster wrote:
It looks as if the commands that I used to restart MQ duplicate each other. I have been told to always do:
Code:

$ strmqm MYMQ
$ nohup runmqlsr -r -m MYMQ -t TCP -p 1414 &

but it seems that both commands do roughly the same thing, and that the second command creates the errors in AMQERR01.LOG that we have been struggling with.

Am I guessing correctly?

No. The nohup command starts a listener in the background.

Glenn asked you to issue these shell commands. What were the results of each command? Post the results here.

Quote:
endmqm -i YOURQMNAME
ps -ef | grep mqm
kill -9 processidnumber

strmqm YOURQMNAME

ps -ef | grep runmqlsr
netstat -an | grep LISTEN | grep yourportnumber
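
For background: `runmqlsr` starts a listener process by hand, while a listener defined as an MQ object with `CONTROL(QMGR)` is started and stopped together with the queue manager. Whether this queue manager has such an object is not known from the thread. The sketch below (the helper name `listener_mqsc` is made up) prints the MQSC commands that show listener definitions and listener status, and runs them only if `runmqsc` is available.

```shell
#!/bin/sh
# Hypothetical helper: print the MQSC commands that display listener
# definitions and the status of listener objects.
listener_mqsc() {
    printf '%s\n' \
        "DISPLAY LISTENER(*) ALL" \
        "DISPLAY LSSTATUS(*) ALL"
}

# Only call runmqsc when it is on the PATH; MYMQ is the name used in the thread.
if command -v runmqsc >/dev/null 2>&1; then
    listener_mqsc | runmqsc MYMQ
else
    listener_mqsc
fi
```

If the output shows a listener object on port 1414 with `CONTROL(QMGR)`, a second listener started with `runmqlsr` on the same port would have nothing left to bind to.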