Author |
Message
|
yanaK |
Posted: Tue Jun 02, 2020 8:28 am Post subject: |
|
|
Acolyte
Joined: 28 May 2020 Posts: 69
|
But even after all of this, I still won't know what the root cause was, right?
Last edited by yanaK on Tue Jun 02, 2020 2:24 pm; edited 1 time in total |
|
Back to top |
|
|
PeterPotkay |
Posted: Tue Jun 02, 2020 8:34 am Post subject: |
|
|
Poobah
Joined: 15 May 2001 Posts: 7717
|
yanaK wrote: |
@PeterPotkay so it seems the q mgr restart is necessary - will try that - also how did you come up with the numbers for nofiles and nproc?
|
Trial and error.
A case with IBM pointed us to these being the problem. They could not, or would not, give us specific values to use.
We wrote a little test app that would grab client connections from our desktop, as many as we asked it to. We had multiple people on our team use this tool concurrently. We were able to reproduce the problem, and after upping the parameters AND restarting the QM we were able to drive the number of concurrent client connections up to what the QM had set for MaxActiveChannels. _________________ Peter Potkay
Keep Calm and MQ On |
|
|
|
yanaK |
Posted: Fri Jun 05, 2020 1:01 pm Post subject: |
|
|
Acolyte
Joined: 28 May 2020 Posts: 69
|
The issue is happening now and I see none of the resources are strained:
Code: |
mqconfig: Analyzing Red Hat Enterprise Linux Server release 7.8 (Maipo)
settings for WebSphere MQ V7.1
System V Semaphores
semmsl (sem:1) 500 semaphores IBM>=500 PASS
semmns (sem:2) 12 of 256000 semaphores (0%) IBM>=256000 PASS
semopm (sem:3) 250 operations IBM>=250 PASS
semmni (sem:4) 5 of 1024 sets (0%) IBM>=1024 PASS
System V Shared Memory
shmmax 18446744073692774399 bytes IBM>=268435456 PASS
shmmni 40 of 4096 sets (0%) IBM>=4096 PASS
shmall 58741 of 18446744073692774399 pages (0%) IBM>=2097152 PASS
System Settings
file-max 11104 of 583180 files (1%) IBM>=524288 PASS
tcp_keepalive_time 300 seconds IBM<=300 PASS
Current User Limits (mqm)
nofile (-Hn) 10240 files IBM>=10240 PASS
nofile (-Sn) 10240 files IBM>=10240 PASS
nproc (-Hu) 88 of 47673 processes (0%) IBM>=4096 PASS
nproc (-Su) 88 of 47673 processes (0%) IBM>=4096 PASS |
What else can I look into? What's the command to check active channel usage?
Error:
Code: |
----- amqxfdcx.c : 876 --------------------------------------------------------
06/05/2020 01:50:33 PM - Process(97344.459) User(mqm) Program(amqrmppa)
Host(z6532) Installation(Installation1)
VRMF(7.1.0.7) QMgr(QPT4)
AMQ9208: Error on receive from host p0073729 (10.112.404.126).
EXPLANATION:
An error occurred receiving data from p0073729 (10.112.404.126) over TCP/IP.
This may be due to a communications failure.
ACTION:
The return code from the TCP/IP read() call was 104 (X'68'). Record these
values and tell the systems administrator.
----- amqccita.c : 3832 -------------------------------------------------------
06/05/2020 01:50:33 PM - Process(97344.459) User(mqm) Program(amqrmppa)
Host(z6532) Installation(Installation1)
VRMF(7.1.0.7) QMgr(QPT4)
AMQ9999: Channel 'PS.SVR' to host '10.112.404.126' ended abnormally.
EXPLANATION:
The channel program running under process ID 97344 for channel 'PS.SVR'
ended abnormally. The host name is '10.112.404.126'; in some cases the host name
cannot be determined and so is shown as '????'.
ACTION:
Look at previous error messages for the channel program in the error logs to
determine the cause of the failure. Note that this message can be excluded
completely or suppressed by tuning the "ExcludeMessage" or "SuppressMessage"
attributes under the "QMErrorLog" stanza in qm.ini. Further information can be
found in the System Administration Guide.
----- amqrmrsa.c : 939 --------------------------------------------------------
06/05/2020 01:50:58 PM - Process(68029.1) User(mqm) Program(runmqlsr)
Host(z6532) Installation(Installation1)
VRMF(7.1.0.7) QMgr(QPT4)
AMQ6184: An internal WebSphere MQ error has occurred on queue manager QPT4
.
EXPLANATION:
An error has been detected, and the WebSphere MQ error recording routine has
been called. The failing process is process 68029.
ACTION:
Use the standard facilities supplied with your system to record the problem
identifier and to save any generated output files. Use either the MQ Support
site: http://www.ibm.com/software/integration/wmq/support/, or IBM Support
Assistant (ISA): http://www.ibm.com/software/support/isa/, to see whether a
solution is already available. If you are unable to find a match, contact your
IBM support center. Do not discard these files until the problem has been
resolved.
----- amqxfdcx.c : 916 --------------------------------------------------------
06/05/2020 01:50:58 PM - Process(68029.1) User(mqm) Program(runmqlsr)
Host(z6532) Installation(Installation1)
VRMF(7.1.0.7) QMgr(QPT4)
AMQ6119: An internal WebSphere MQ error has occurred ('11 - Resource
temporarily unavailable' from pthread_create.)
EXPLANATION:
MQ detected an unexpected error when calling the operating system. The MQ error
recording routine has been called.
ACTION:
Use the standard facilities supplied with your system to record the problem
identifier and to save any generated output files. Use either the MQ Support
site: http://www.ibm.com/software/integration/wmq/support/, or IBM Support
Assistant (ISA): http://www.ibm.com/software/support/isa/, to see whether a
solution is already available. If you are unable to find a match, contact your
IBM support center. Do not discard these files until the problem has been
resolved.
----- amqxfdcx.c : 876 --------------------------------------------------------
06/05/2020 01:53:37 PM - Process(72311.785) User(mqm) Program(amqrmppa)
Host(z6532) Installation(Installation1)
VRMF(7.1.0.7) QMgr(QPT4)
AMQ9772: MQCTL failed with MQRC=2534.
EXPLANATION:
The indicated WebSphere MQ API call failed for the specified reason code.
ACTION:
Refer to the Application Programming Reference manual for information about
Reason Code 2534.
----- cmqxrstf.c : 2683 ------------------------------------------------------- |
Failure report:
Code: |
+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Fri June 05 2020 13:50:58 PDT |
| UTC Time :- 1591390258.657512 |
| UTC Time Offset :- -420 (PST) |
| Host Name :- z6532 |
| Operating System :- Linux 3.10.0-1127.el7.x86_64 |
| PIDS :- 5724H7230 |
| LVLS :- 7.1.0.7 |
| Product Long Name :- WebSphere MQ for Linux (x86-64 platform) |
| Vendor :- IBM |
| Installation Path :- /opt/mqm |
| Installation Name :- Installation1 (1) |
| Probe Id :- XC035040 |
| Application Name :- MQM |
| Component :- xcsCreateThread |
| SCCS Info :- lib/cs/unix/linux_2/amqxprmx.c, 1.176.1.1 |
| Line Number :- 1974 |
| Build Date :- Nov 4 2015 |
| CMVC level :- p710-007-151104 |
| Build Type :- IKAP - (Production) |
| Effective UserID :- 3732 (mqm) |
| Real UserID :- 3732 (mqm) |
| Program Name :- amqrmppa |
| Arguments :- -m "QPT4 " |
| Addressing mode :- 64-bit |
| LANG :- en_US.UTF-8 |
| Process :- 51562 |
| Process(Thread) :- 51562 |
| Thread :- 1 |
| ThreadingModel :- PosixThreads |
| QueueManager :- QPT4 |
| UserApp :- FALSE |
| ConnId(1) IPCC :- 4169 |
| ConnId(3) QM-P :- 1721 |
| ConnId(4) App :- 0 |
| Last HQC :- 1.0.0-191744 |
| Last HSHMEMB :- 0.0.0-0 |
| Major Errorcode :- xecF_E_UNEXPECTED_SYSTEM_RC |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ6119 |
| Probe Severity :- 2 |
| Probe Description :- AMQ6119: An internal WebSphere MQ error has occurred |
| ('11 - Resource temporarily unavailable' from pthread_create.) |
| FDCSequenceNumber :- 30 |
| Arith1 :- 11 (0xb) |
| Comment1 :- '11 - Resource temporarily unavailable' from |
| pthread_create. |
| |
+-----------------------------------------------------------------------------+ |
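A note on the FDC above: it reports `pthread_create` failing with errno 11 (EAGAIN) while mqconfig shows nproc at 0%. The Linux pthread_create(3) man page lists three limits that can produce EAGAIN: the RLIMIT_NPROC soft limit (which counts processes and threads for a real user ID), `/proc/sys/kernel/threads-max`, and `/proc/sys/kernel/pid_max`. If the mqconfig figure counts processes rather than threads, it could understate real usage, since a single amqrmppa can hold hundreds of threads (the second FDC in this thread shows Thread 936). Below is a minimal sketch, not an IBM tool, that sums threads per UID from `/proc/<pid>/status`. The function names and the `proc_root` parameter are made up for illustration.

```python
import os

def parse_status(text):
    """Return (real_uid, thread_count) from the text of a /proc/<pid>/status file."""
    uid = threads = None
    for line in text.splitlines():
        if line.startswith("Uid:"):
            uid = int(line.split()[1])       # first value on the line is the real UID
        elif line.startswith("Threads:"):
            threads = int(line.split()[1])
    return uid, threads

def threads_for_uid(target_uid, proc_root="/proc"):
    """Sum the thread counts of every process whose real UID is target_uid."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue                         # only numeric entries are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                uid, threads = parse_status(fh.read())
        except OSError:
            continue                         # process exited while we were scanning
        if uid == target_uid and threads:
            total += threads
    return total
```

On the server, `threads_for_uid(3732)` (3732 being the mqm UID shown in the FDC header) gives a number that can be compared with the nproc soft limit and with the values in `/proc/sys/kernel/threads-max` and `/proc/sys/kernel/pid_max`.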
|
|
|
|
yanaK |
Posted: Fri Jun 05, 2020 3:06 pm Post subject: |
|
|
Acolyte
Joined: 28 May 2020 Posts: 69
|
Another FDC header from around that time (there is a similar one with probe ID ZS401070 too):
+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Fri June 05 2020 13:45:28 PDT |
| UTC Time :- 1591389928.316601 |
| UTC Time Offset :- -420 (PST) |
| Host Name :- z6532 |
| Operating System :- Linux 3.10.0-1127.el7.x86_64 |
| PIDS :- 5724H7230 |
| LVLS :- 7.1.0.7 |
| Product Long Name :- WebSphere MQ for Linux (x86-64 platform) |
| Vendor :- IBM |
| Installation Path :- /opt/mqm |
| Installation Name :- Installation1 (1) |
| Probe Id :- ZS401010 |
| Application Name :- MQM |
| Component :- zstStartAsyncConsumeThread |
| SCCS Info :- lib/zst/amqzstd0.c, 1.230.1.7 |
| Line Number :- 415 |
| Build Date :- Nov 4 2015 |
| CMVC level :- p710-007-151104 |
| Build Type :- IKAP - (Production) |
| Effective UserID :- 3732 (mqm) |
| Real UserID :- 3732 (mqm) |
| Program Name :- amqrmppa |
| Arguments :- -m "QPT4 " |
| Addressing mode :- 64-bit |
| LANG :- en_US.UTF-8 |
| Process :- 51562 |
| Process(Thread) :- 34172 |
| Thread :- 936 |
| ThreadingModel :- PosixThreads |
| QueueManager :- QPT4 |
| UserApp :- FALSE |
| ConnId(1) IPCC :- 980255 |
| Last HQC :- 1.0.0-3465208 |
| Last HSHMEMB :- 0.0.0-0 |
| Major Errorcode :- xecP_E_PROC_LIMIT |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ6026 |
| Probe Severity :- 2 |
| Probe Description :- AMQ6026: A resource shortage prevented the creation of |
| a WebSphere MQ process. |
| FDCSequenceNumber :- 28 |
| |
+-----------------------------------------------------------------------------+ |
|
|
|
fjb_saper |
Posted: Fri Jun 05, 2020 7:40 pm Post subject: |
|
|
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20729 Location: LI,NY
|
yanaK wrote: |
The issue is happening now and I see none of the resources are strained:
Code: |
mqconfig: Analyzing Red Hat Enterprise Linux Server release 7.8 (Maipo)
settings for WebSphere MQ V7.1
System V Semaphores
semmsl (sem:1) 500 semaphores IBM>=500 PASS
semmns (sem:2) 12 of 256000 semaphores (0%) IBM>=256000 PASS
semopm (sem:3) 250 operations IBM>=250 PASS
semmni (sem:4) 5 of 1024 sets (0%) IBM>=1024 PASS
System V Shared Memory
shmmax 18446744073692774399 bytes IBM>=268435456 PASS
shmmni 40 of 4096 sets (0%) IBM>=4096 PASS
shmall 58741 of 18446744073692774399 pages (0%) IBM>=2097152 PASS
System Settings
file-max 11104 of 583180 files (1%) IBM>=524288 PASS
tcp_keepalive_time 300 seconds IBM<=300 PASS
Current User Limits (mqm)
nofile (-Hn) 10240 files IBM>=10240 PASS
nofile (-Sn) 10240 files IBM>=10240 PASS
nproc (-Hu) 88 of 47673 processes (0%) IBM>=4096 PASS
nproc (-Su) 88 of 47673 processes (0%) IBM>=4096 PASS |
What else can I look into? What's the command to check active channel usage? |
Looks like you've reached the max of both your nofile parameters. What happens if you, say, double those? _________________ MQ & Broker admin |
|
|
|
yanaK |
Posted: Sat Jun 06, 2020 1:22 pm Post subject: |
|
|
Acolyte
Joined: 28 May 2020 Posts: 69
|
Thanks fjb_saper.
Do you mean
Code: |
file-max 11104 of 583180 files (1%) IBM>=524288 PASS
|
exceeding 10240?
What causes the nofile maximum to be reached?
I can try doubling them. |
|
|
|
yanaK |
Posted: Sat Jun 06, 2020 4:12 pm Post subject: |
|
|
Acolyte
Joined: 28 May 2020 Posts: 69
|
Code: |
mqm@ z6532:/export/home/mqm$ lsof | wc -l
597768 |
This is interesting too! |
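A caution on reading that number: `lsof | wc -l` is not directly comparable to nofile. nofile (RLIMIT_NOFILE) is a per-process limit, while lsof output covers every process on the host, includes entries that are not file descriptors (current directory, memory-mapped libraries), and may repeat rows per thread depending on the lsof version. A rough sketch of a closer comparison is to count the entries under `/proc/<pid>/fd` for each process. The function names and the `proc_root` parameter below are illustrative only.

```python
import os

def open_fd_count(pid, proc_root="/proc"):
    """Number of file descriptors currently open in one process."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))

def process_name(pid, proc_root="/proc"):
    """Short command name of the process, as recorded in /proc/<pid>/comm."""
    with open(os.path.join(proc_root, str(pid), "comm")) as fh:
        return fh.read().strip()

def top_fd_users(proc_root="/proc", limit=10):
    """Return [(fd_count, pid, name)] for the processes with the most open fds."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            rows.append((open_fd_count(entry, proc_root), int(entry),
                         process_name(entry, proc_root)))
        except OSError:
            continue  # process exited, or we may not read its fd directory
    return sorted(rows, reverse=True)[:limit]
```

Run as mqm (or root), `top_fd_users()` shows whether any single amqrmppa or runmqlsr process is approaching the 10240 limit, which is what matters for nofile.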
|
|
|
yanaK |
Posted: Mon Jun 08, 2020 9:47 am Post subject: |
|
|
Acolyte
Joined: 28 May 2020 Posts: 69
|
Can anyone point out what might be causing the nofile maximum to be reached? Does MQ open a lot of files? If so, why and how?
Thanks |
|
|
|
bruce2359 |
Posted: Mon Jun 08, 2020 10:32 am Post subject: |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9442 Location: US: west coast, almost. Otherwise, enroute.
|
PaulClarke wrote: |
So, am I right in thinking that you are trying to run a lot of clients in to your queue manager ? |
yanaK wrote: |
Theoretically yes (but I'd expect it to get distributed over all the q mgrs)
|
All apps connect to a queue manager, not to a cluster. Are your apps client-bindings? If so, they connect to a SVRCONN channel. So, back to Paul's question: How many clients connect to this queue manager? Do you see the error when the connection rate is high?
At the client platforms, does the app make use of a CCDT? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
|
yanaK |
Posted: Mon Jun 08, 2020 10:58 am Post subject: |
|
|
Acolyte
Joined: 28 May 2020 Posts: 69
|
How many clients connect to this queue manager?
How do I know ? Is there a command I can run to see ? Theoretically 12k can connect.
Are your apps client-bindings?
What do I need to check in client side? I know they specify all the 4 queue manager names in the client config (wish there was a way to connect to a cluster)
does the app make use of a CCDT?
I am not sure - if you can point me on what to check I can do that.
Thanks! |
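On the "is there a command" question: the MQSC command `DISPLAY CHSTATUS(*)` lists the current channel instances, and each running SVRCONN instance corresponds to one client connection. One way to capture it is `echo "DISPLAY CHSTATUS(*)" | runmqsc QPT4 > chstatus.txt`. The sketch below tallies that text per channel and per client address. It assumes the usual `CHANNEL(...)` and `CONNAME(...)` attribute layout of runmqsc output, and the function name is hypothetical.

```python
import re
from collections import Counter

def count_instances(chstatus_text):
    """Tally channel instances per channel name and per client address.

    Expects the text produced by DISPLAY CHSTATUS(*) in runmqsc.
    """
    # Stop at the first parenthesis or space so 'host(1414)' style values yield 'host'.
    channels = Counter(re.findall(r"\bCHANNEL\(([^()\s]+)", chstatus_text))
    clients = Counter(re.findall(r"\bCONNAME\(([^()\s]+)", chstatus_text))
    return channels, clients
```

The sum of the channel counts can then be compared with the MaxActiveChannels value mentioned earlier in the thread, and `clients.most_common(5)` shows whether a few hosts hold most of the connections.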
|
|
|
bruce2359 |
Posted: Mon Jun 08, 2020 11:55 am Post subject: |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9442 Location: US: west coast, almost. Otherwise, enroute.
|
yanaK asks: How many clients connect to this queue manager?
How do I know ? Is there a command I can run to see ? Theoretically 12k can connect.
Bruce replies: You could find out by asking the developers how many users they expect to use the app in a given time period.
Are your apps client-bindings?
What do I need to check in client side? I know they specify all the 4 queue manager names in the client config (wish there was a way to connect to a cluster)
Bruce replies: (see below)
does the app make use of a CCDT?
I am not sure - if you can point me on what to check I can do that.
Bruce replies: At the SERVER end, do you (or anyone else) create/alter CLNTCONN channel definitions? If yes, then a CCDT is created (for use by clients to determine which qmgr/channel to connect to). If yes, at the client end, what procedure/script is executed to run the client app? Look at the script for environment variables that have MQ as part of the environment variable name.
Further, what gets installed on a client platform? You will likely need to ask the developers and/or deployers.
Thanks! _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
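For the environment variable check suggested above, a small sketch the client team could run in the same shell or wrapper that launches the app. The MQ client variables most relevant here are `MQSERVER` (a single channel definition in the form ChannelName/TransportType/ConnectionName) and `MQCHLLIB` / `MQCHLTAB` (directory and file name of a CCDT). The helper name is made up for illustration.

```python
import os

def mq_environment(env=None):
    """Return the environment variables that have 'MQ' in their name."""
    env = os.environ if env is None else env
    return {name: value for name, value in sorted(env.items())
            if "MQ" in name.upper()}

if __name__ == "__main__":
    for name, value in mq_environment().items():
        print(f"{name}={value}")
```

An empty result does not prove that no CCDT is used: the connection details may also be set in application code or in a client configuration file, so this only narrows things down.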
|
|
|
yanaK |
Posted: Mon Jun 08, 2020 2:35 pm Post subject: |
|
|
Acolyte
Joined: 28 May 2020 Posts: 69
|
Thanks. I've put the questions to our client team. One thing I missed earlier: the clients that see an error because of this are all on the response path (not when the request is being placed on the queue), and they are seeing
Quote: |
Connection to queue manager lost |
Could this help explain why open files are shooting up so high (and on this server)? |
|
|
|
bruce2359 |
Posted: Mon Jun 08, 2020 3:03 pm Post subject: |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9442 Location: US: west coast, almost. Otherwise, enroute.
|
yanaK wrote: |
Thanks. I've put the questions to our client team. One thing I missed earlier: the clients that see an error because of this are all on the response path (not when the request is being placed on the queue), and they are seeing
Quote: |
Connection to queue manager lost |
Could this help explain why open files are shooting up so high (and on this server)? |
Where precisely does this error appear? Please look at the AMQERR01.LOG file in the errors directory on the client platform. Please post the complete error, including any related errors (from TCP, for example).
I suspect that the lost connections leave resources allocated, and that repeated client reconnection is driving up demand for resources. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
Last edited by bruce2359 on Mon Jun 08, 2020 5:56 pm; edited 1 time in total |
|
|
|
yanaK |
Posted: Mon Jun 08, 2020 3:33 pm Post subject: |
|
|
Acolyte
Joined: 28 May 2020 Posts: 69
|
The error appears in the application log.
I didn't know there was an AMQERR01.LOG on the client side too.
Let me ask the client team.
Thanks for all the help. |
|
|
|
PeterPotkay |
Posted: Mon Jun 08, 2020 5:51 pm Post subject: |
|
|
Poobah
Joined: 15 May 2001 Posts: 7717
|
We wanted to be able to get to 5000 concurrent client connections.
We upped nofile to 20480 to be able to do this in our environment.
Stop the QM
Increase nofile to 20480 for the mqm account
Log out of mqm
Log back into mqm
Start up the queue manager
Test
We wrote a test harness to launch as many client connections as we needed and thru trial and error we got to 20480 for nofile to get to 5000 client connections.
Your numbers may vary. _________________ Peter Potkay
Keep Calm and MQ On |
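Why the stop, log out, log in, start sequence matters: on Linux a process inherits its resource limits when it is started, so queue manager processes that were already running keep the old nofile and nproc values even after the configuration is raised. The limits a running process actually has are visible in `/proc/<pid>/limits`. Here is a minimal sketch for checking them after the restart. The function names are illustrative, and values are kept as strings because a limit can be `unlimited`.

```python
import re

def parse_limits(text):
    """Map limit name -> (soft, hard) from the text of /proc/<pid>/limits."""
    limits = {}
    for line in text.splitlines()[1:]:            # skip the header row
        # Columns are separated by runs of spaces; names contain single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits

def running_limits(pid):
    """Read the limits in effect for an already-running process."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_limits(fh.read())
```

For example, `running_limits(pid)["Max open files"]` for an amqrmppa PID should show the new value (20480 in the post above) only after the queue manager has been restarted from a fresh mqm login.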
|
|
|
|