MQSeries.net Forum Index » General IBM MQ Support » Sporadic timeouts on request/reply

Sporadic timeouts on request/reply
squidward
Posted: Thu Aug 05, 2010 2:39 pm    Post subject: Sporadic timeouts on request/reply

Novice

Joined: 27 Mar 2009
Posts: 10

I have the following issue. Any guidance on likely root cause or troubleshooting steps would be appreciated.

I have a request/reply setup where I select the reply message based on correlation id. I get sporadic timeouts on the reply; perhaps 2% of the requests result in timeouts. However, when I inspect the reply queue after the fact I can see that:
a) the reply is present
b) the correlation id is present and correct
c) the message timestamp is well within the timeout period, 7 seconds before the timeout was logged

So in theory the timeout should never have happened. Anybody seen anything like this?

My configuration:
MQ client v7.0.1 using JMS, against MQ 6.0.2.6 on Wintel. This qmgr exchanges the requests/replies with a remote qmgr on CICS (I don't have that version).

I have an 8-second timeout, and responses normally come in under 1 second.

Thanks in advance for any advice.
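
For readers unfamiliar with selecting a reply by correlation id in JMS: the receiver is normally created with a message selector string of the form `JMSCorrelationID = '...'`. Below is a minimal sketch in plain Java (no MQ or JMS classes on the classpath) that only builds that string. The class name `CorrelationSelector`, the method name, and the sample id are hypothetical, not taken from the poster's code.

```java
/** Hypothetical helper, for illustration only. */
final class CorrelationSelector {

    /** Builds a JMS message selector that matches exactly one correlation id. */
    static String forCorrelationId(String correlationId) {
        // Selector string literals are delimited by single quotes;
        // an embedded single quote is written as two single quotes.
        String escaped = correlationId.replace("'", "''");
        return "JMSCorrelationID = '" + escaped + "'";
    }

    public static void main(String[] args) {
        // "REQ-0001" is a made-up id, not one from the thread.
        System.out.println(forCorrelationId("REQ-0001"));
    }
}
```

In a real client the resulting string would be passed to something like `QueueSession.createReceiver(queue, selector)` before calling `receive(timeout)`; how the poster's code actually does this is not shown in the thread.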
jeevan
Posted: Thu Aug 05, 2010 3:50 pm    Post subject: Re: Sporadic timeouts on request/reply

Grand Master

Joined: 12 Nov 2005
Posts: 1432

For the sake of troubleshooting, increase the timeout value and see what happens.

Also, you can display CURDEPTH and IPPROCS for the reply queue while the message is being put, and see whether the CURDEPTH shows up before or after the IPPROCS disappears.
bruce2359
Posted: Thu Aug 05, 2010 3:59 pm

Poobah

Joined: 05 Jan 2008
Posts: 9486
Location: US: west coast, almost. Otherwise, enroute.

Exactly what do you mean by timeout?

Does your app issue an MQGET with WAIT? Did the WAIT expire?

Is there a CICS transaction involved? Does it time out?

How do you know something timed out? Was there an error logged in CICS? MQ?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
squidward
Posted: Thu Aug 05, 2010 5:20 pm

Novice

Joined: 27 Mar 2009
Posts: 10

Yes, I will try increasing the timeout. Not the best solution, since I still don't have a root cause, and there is a user waiting on the transaction who will have to sit there for even longer.

By timeout, I mean I issue a QueueReceiver.receive(8000) call that returns null after 8 seconds, even though according to the MQ logs the message was put 7.8 seconds before. The only thing I can think of is that somehow the remote QMGR is not committing the put to my local qmgr, so I'm not able to retrieve the message even though the put has already occurred.

The CICS transaction is not timing out -- the remote qmgr is CICS:


(MY CLIENT) <-> (WINTEL QMGR) <-> (MAINFRAME QMGR) <-> CICS SYSTEM

Thanks again.
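
The arithmetic in this post can be made explicit. Here is a minimal sketch in plain Java (no MQ classes), assuming you have the time the receive started, the reply's put timestamp, and the timeout. `ReplyWindow`, `classify`, and the `skewTolerance` parameter are hypothetical names for illustration. The put timestamp and the client's timeout log entry may come from clocks on different machines, so the sketch treats any margin smaller than an assumed worst-case clock offset as inconclusive. Note also that a put timestamp says when the message was put, not when it became visible on the local reply queue.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: places a reply's put timestamp relative to the receive deadline. */
final class ReplyWindow {

    enum Verdict { PUT_BEFORE_DEADLINE, PUT_AFTER_DEADLINE, INCONCLUSIVE }

    /**
     * @param receiveStarted client clock, taken just before the blocking receive
     * @param replyPut       put timestamp read from the reply message
     * @param timeout        receive timeout (8 seconds in the post)
     * @param skewTolerance  assumed worst-case offset between the two clocks
     */
    static Verdict classify(Instant receiveStarted, Instant replyPut,
                            Duration timeout, Duration skewTolerance) {
        Instant deadline = receiveStarted.plus(timeout);
        // Positive margin: the reply was put before the receive deadline.
        Duration margin = Duration.between(replyPut, deadline);
        if (margin.compareTo(skewTolerance) > 0) {
            return Verdict.PUT_BEFORE_DEADLINE;
        }
        if (margin.negated().compareTo(skewTolerance) > 0) {
            return Verdict.PUT_AFTER_DEADLINE;
        }
        // Within the clock-offset tolerance of the deadline: cannot tell.
        return Verdict.INCONCLUSIVE;
    }

    public static void main(String[] args) {
        // Made-up start time; the 200 ms offset mirrors "put 7.8 seconds before" an 8 s timeout.
        Instant start = Instant.parse("2010-08-05T14:00:00Z");
        Instant put = start.plusMillis(200);
        System.out.println(classify(start, put, Duration.ofSeconds(8), Duration.ofSeconds(1)));
    }
}
```

If the verdict stays `PUT_BEFORE_DEADLINE` even with a generous tolerance, the delay is more likely between the put and the message becoming gettable on the local queue than in the back-end response itself.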
bruce2359
Posted: Thu Aug 05, 2010 6:05 pm

Poobah

Joined: 05 Jan 2008
Posts: 9486
Location: US: west coast, almost. Otherwise, enroute.

Quote:
(MY CLIENT) <-> (WINTEL QMGR) <-> (MAINFRAME QMGR) <-> CICS SYSTEM

Time to get your z/OS and CICS sysprogs to look at SMF performance data.

I'm going to go out on a limb and speculate that CICS and the z/OS qmgr are not the likely culprits. 8 seconds on a mainframe is, well, 8 seconds on a mainframe. CICS is capable of easily doing thousands of transactions per second.

Is there a database involved in the transaction on the client? On the Wintel qmgr? On the z/OS qmgr? The CICS app?

Are all other apps experiencing delays?

Is the JMS code from the CICS transaction/application? Or from an MQ application? In either case, some tuning of the JVM might be in order.
fjb_saper
Posted: Thu Aug 05, 2010 8:08 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20767
Location: LI,NY

You might also want to open a PMR and ship some trace logs to IBM.
The trouble is catching the trace log at the point in time when the problem happens.

Are you using a lot of temporary dynamic reply queues? There might be a caching problem in the channel where the dynamic pool is cached too long.
The PMR will tell you what the tuning parameter is ...

Have fun
_________________
MQ & Broker admin
PeterPotkay
Posted: Sat Aug 07, 2010 10:28 am

Poobah

Joined: 15 May 2001
Posts: 7723

Sporadic timeouts with no other explanation when a Windows QM is involved? Sounds like what I am grappling with now myself. After all applications involved have proved they are processing quickly, after the SAN and server guys have confirmed no I/O problems, after the network traces showed no errors, we think it's related to a new version of antivirus software that is causing similar problems in other service areas as well. Apparently this thing is causing multiple virus definitions to be downloaded per day, multiple scans to run, NICs to get hung up, files to be scanned that shouldn't be, etc. Don't know if it's a bug or if they misconfigured the thing or what.

I've identified the XMITQ between the Windows QMs backing up for a minute or two every few hours with no errors at all, and it recovers on its own. We are going to have them back off that antivirus upgrade and see if it helps like it did in other areas.


There can be a lot of reasons for your symptoms. But if you've exhausted all other options, check whether the antivirus software has recently been updated.
_________________
Peter Potkay
Keep Calm and MQ On
bruce2359
Posted: Sat Aug 07, 2010 10:43 am

Poobah

Joined: 05 Jan 2008
Posts: 9486
Location: US: west coast, almost. Otherwise, enroute.

I've also seen odd and seemingly inexplicable behaviors when sysadmins installed multiple antivirus products on the same o/s.
squidward
Posted: Tue Aug 10, 2010 7:30 am

Novice

Joined: 27 Mar 2009
Posts: 10

Glad to hear I'm not the only one banging my head against the wall.

Not using dynamic queues or anything, just normal request queue / reply queue.

I don't think it's a Windows server issue, because I have a separate standalone queue manager on the same Windows box, which never shows timeouts despite vastly higher volumes of data and the same client codebase. So I think it must be issue on channel from the mainframe.

I did increase the timeout value and do not see any more timeouts. The problem is not solved, though, since I know the remote system is responding in subsecond times.

Guess nothing left but to have a look into the trace options.
gbaddeley
Posted: Tue Aug 10, 2010 3:49 pm

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

You should also take a close look at sporadic slow response in the back end application.
_________________
Glenn
mvic
Posted: Tue Aug 10, 2010 4:37 pm

Jedi

Joined: 09 Mar 2004
Posts: 2080

squidward wrote:
Guess nothing left but to have a look into the trace options.

I always jump to this point if a client reports long end-to-end times. There's no substitute for knowing where the time is being lost in the total transaction.

MQ trace can help here, if you know enough about your message (MsgId, put time etc.) that you can find it in the trace. MQ will have written precise timestamps for you (approx. microsecond resolution) saying what it was doing with each message, and when. You can then correlate with data from other elements of your message's round trip.

BUT be aware that trace can slow down your system even more, which might make a bad situation worse if you are already in danger of breaching SLAs!
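
To line trace timestamps up with what the client saw, it helps if the client records its own timestamps per request. Here is a minimal sketch in plain Java, assuming the blocking receive can be wrapped in a `Supplier`. `TimedCall`, `Sample`, and `measure` are hypothetical names, and the supplier in `main` is a stand-in for a real receive call, not MQ code.

```java
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical helper: times one blocking call and tags it with a correlation id. */
final class TimedCall {

    /** One client-side observation for a single request. */
    static final class Sample<T> {
        final String correlationId; // same id used to select the reply
        final Instant startedAt;    // wall clock, for matching against trace timestamps
        final long elapsedNanos;    // monotonic clock, for the duration itself
        final T result;             // null models a receive that timed out

        Sample(String correlationId, Instant startedAt, long elapsedNanos, T result) {
            this.correlationId = correlationId;
            this.startedAt = startedAt;
            this.elapsedNanos = elapsedNanos;
            this.result = result;
        }

        boolean timedOut() {
            return result == null;
        }

        @Override
        public String toString() {
            return correlationId + " started=" + startedAt
                    + " elapsedMs=" + (elapsedNanos / 1_000_000)
                    + " timedOut=" + timedOut();
        }
    }

    /** Runs the blocking call once and records when it started and how long it took. */
    static <T> Sample<T> measure(String correlationId, Supplier<T> blockingCall) {
        Instant startedAt = Instant.now();
        long t0 = System.nanoTime();
        T result = blockingCall.get();
        long elapsed = System.nanoTime() - t0;
        return new Sample<>(correlationId, startedAt, elapsed, result);
    }

    public static void main(String[] args) {
        // Stand-in for a receive that returns null on timeout; the id is made up.
        Sample<String> sample = measure("ID:example", () -> null);
        System.out.println(sample);
    }
}
```

Logging one such line per request gives a start time and correlation id to search for in the trace, without tracing every message.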
bruce2359
Posted: Tue Aug 10, 2010 5:46 pm

Poobah

Joined: 05 Jan 2008
Posts: 9486
Location: US: west coast, almost. Otherwise, enroute.

Quote:
...it must be issue on channel from the mainframe.

Spoken (typed) like a true newbie. I gather that you are not a mainframe person.
gbaddeley
Posted: Wed Aug 11, 2010 3:46 pm

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

mvic wrote:
BUT be aware that trace can slow down your system even more, which might make a bad situation worse if you are already in danger of breaching SLAs!

and use up disk space very quickly on production systems! I suggest that you start MQ trace at a time of day when a slow message is most likely and then stop it immediately after. Make sure you have the approval of the application managers.