Confusion over Heartbeat |
Author |
Message
|
mvic |
Posted: Wed Nov 25, 2009 6:42 am Post subject: |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
mqjeff wrote: |
Regardless of situation, a needlessly running channel doesn't have "zero impact". So if the app can't show any real gain from having the channel running all of the time, you are still wasting some resources for no gain. |
DISCINT 0 keeps your channel "up" but it burns negligible CPU while waiting for the next message. One thing that I've seen interfere with long-running channels that go idle is when there is some active piece of the network (firewall etc.) that chops the connection. Other than this, if the system capacity is up to it (RAM, kernel capacity, MaxActiveChannels etc.) then DISCINT 0 shouldn't give any problems.
Quote: |
And any process runs the risk of failure over an extended period of uptime. You still IPL your mainframe, right? So why not quiesce your channels for the same reasons. |
No need to do so unless there are actually problems being seen. In which case I'm sure IBM would want to help solve that. |
|
fjb_saper |
Posted: Wed Nov 25, 2009 4:29 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20767 Location: LI,NY
|
mvic wrote: |
mqjeff wrote: |
Regardless of situation, a needlessly running channel doesn't have "zero impact". So if the app can't show any real gain from having the channel running all of the time, you are still wasting some resources for no gain. |
DISCINT 0 keeps your channel "up" but it burns negligible CPU while waiting for the next message. One thing that I've seen interfere with long-running channels that go idle is when there is some active piece of the network (firewall etc.) that chops the connection. Other than this, if the system capacity is up to it (RAM, kernel capacity, MaxActiveChannels etc.) then DISCINT 0 shouldn't give any problems.
Quote: |
And any process runs the risk of failure over an extended period of uptime. You still IPL your mainframe, right? So why not quiesce your channels for the same reasons. |
No need to do so unless there are actually problems being seen. In which case I'm sure IBM would want to help solve that. |
The most common problem I have seen is the sender channel stuck in retrying mode because the receiver channel never realized that the connection was broken (between distributed and mainframe). Using adopt new MCA could certainly alleviate some of that. I'd expect that a SVRCONN channel is not subject to the same adopt new MCA rules, as that would raise a whole other set of questions.
Manual intervention used to be required in the above case, and we had to force-stop the receiver channel on the mainframe. I haven't seen any of those problems in years, though (not since using a disconnect interval on the channel).
_________________
MQ & Broker admin |
|
jcv |
Posted: Sat Nov 28, 2009 4:33 am Post subject: |
|
|
 Chevalier
Joined: 07 May 2007 Posts: 411 Location: Zagreb
|
mqjeff wrote: |
In those in-between cases... |
Let me explain what I meant and how I see this scenario. Say our application has two kinds of idle periods. For the first kind, for example weekday nights, we cannot tell exactly when the period begins or ends; we can only estimate its maximum length. We add some tolerance to that maximum and use the result as a discint that we are fairly sure will not be reached during such periods, and certainly not during the busy daytime. For the second kind, for example non-working days, we know exactly when the period ends, so we can use scheduling tools to start the channels sufficiently before application activity begins. I would guess that for this second kind we also know when the period begins, so we can use scheduling tools to stop the channels sufficiently after application activity ends, rather than waiting for the estimated discint to expire, and save more resources that way. That would not mean much more work, since one way or another we have to schedule the start in order to meet the SLA. As for the potential leak of some kind that was mentioned, which fortunately I have never faced even though I have a history of running immature versions too, I would say that ending the channel instance would solve such problems in 99% of cases. If we want to be sure that exactly the branch of code required by that recommendation is executed, the same schedule can alter discint to 1, start the channel, and, after it immediately ages out, alter it back. Back to 0, I would say, because in this scenario there is no need to avoid that value, and it saves us the effort of estimating discint. Naturally, mature software should be able to run whenever needed, for as long as needed, free of any leaks. Hence such altering should hardly ever be needed, and it is hardly an argument against keeping discint at 0.
Now, in that scenario, our channels are up non-stop for, let's say, 5 or 6 days out of 7, regardless of whether discint is set to 0 or to the estimate. When a channel encounters a retryable error during that period it will go into retry regardless of discint; when it encounters a non-retryable error it will stop regardless of discint. So if we set an appropriate long retry count, what is the benefit of avoiding 0 with respect to avoiding manual interventions to restart channels? None.
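As a rough illustration of what an "appropriate" long retry count could mean, the sketch below estimates how many long retries a channel needs so that it is still retrying when an outage of a given length ends. The parameter names mirror the channel attributes SHORTRTY, SHORTTMR, LONGRTY and LONGTMR; the numbers used are example values only, not recommendations, and the model is a simplification that ignores the time each connection attempt itself takes.

```python
import math


def retry_window_seconds(shortrty, shorttmr, longrty, longtmr):
    """Approximate time a channel spends retrying before it gives up.

    Simplified model: short retries run first, then long retries,
    each separated by its timer value (all values in seconds).
    """
    return shortrty * shorttmr + longrty * longtmr


def long_retries_needed(outage_seconds, shortrty, shorttmr, longtmr):
    """Smallest long retry count whose retry window covers the outage."""
    remaining = outage_seconds - shortrty * shorttmr
    if remaining <= 0:
        return 0  # the short retries alone outlast the outage
    return math.ceil(remaining / longtmr)


if __name__ == "__main__":
    four_hours = 4 * 3600  # hypothetical outage length
    needed = long_retries_needed(four_hours, shortrty=10, shorttmr=60, longtmr=1200)
    print(needed)
    print(retry_window_seconds(10, 60, needed, 1200) >= four_hours)
```

If the retry window is shorter than the outage, the channel ends up stopped and needs a manual restart, which is exactly the intervention being discussed.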
PeterPotkay wrote: |
Shirley, see my previous comments about how a SNDR channel can auto recover since it initiates work, but the RCVR channel just sits there. |
Excuse me, but I don't see the relevance here, because at that point we were already discussing discint, while this problem must be solved by heartbeats and adoptnewmca. discint cannot help the RCVR during 6 days out of 7, since the channel may not rest. Nor is it obliged to help, because those two do that instead. Am I missing something here?
In a scenario in which there are only nights (type 1) and all days are working days (no type 2), avoiding 0 is even more pointless, since the channel never rests: it doesn't have to be busy all the time, it just cannot be shut down or aged out. Abstractly speaking, there can't be idle periods of a type 3, for which you know when they end but not when they start.
mvic wrote: |
One thing that I've seen interfere with long-running channels that go idle is when there is some active piece of the network (firewall etc.) that chops the connection. |
Isn't that always a recoverable error if heartbeats and adoptnewmca are used?
I'm also not clear about retryable and non-retryable errors being cleared while the channel is inactive and does not notice them. I have seen people emphasize that benefit as a reason to let channels go inactive. To gain it, which teams usually have to do something manually, or is it usually automatic nowadays? Are both types of errors equally solvable without any manual intervention? Obviously, if the MQ admin team has to intervene, then there is no actual benefit gained from the channels being inactive.
Thanks in advance for answers to my questions and for any corrections of my thoughts. |
|
bruce2359 |
Posted: Sat Nov 28, 2009 6:37 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9482 Location: US: west coast, almost. Otherwise, enroute.
|
Paragraph 1 may be the longest ever written in a post on this site. It is very long, and its train of thought is very difficult to follow.
Having said all that... and in summary:
The disconnect interval enables a channel to go inactive when there are no more messages to transfer. Heartbeats allow the channel ends to keep track of their peer's health, up to the point of disconnect. Triggered channels allow the next message that arrives on an xmit queue to restart the inactive channel, without the need for external automation (job schedulers).
Given the nature of network hardware and software (message workloads that vary over time, routers, packets, firewalls, cables, NICs, back-hoes, etc.), channels sometimes fail. WMQ offers some tools to keep channels alive (there is a Hursley post of a similar name that is worth reading).
In an ideal world (99% error-free), 99% error-free would be unacceptable. Rather, we deal with the tools we have.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
PeterPotkay |
Posted: Sat Nov 28, 2009 7:55 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7723
|
With proper use of AdoptNewMCA, a long or 0 DISCINT is no longer as problematic as it used to be, since the RCVR channel can recover from blocking on an orphaned socket.
There are still reasons I will never code 0 for DISCINT.
Consider Queue Manager A that has RCVR channels from 1000 other QMs. If one of those 1000 other QMs permanently goes away, and someone doesn't follow due process and clean everything up, do you want that orphaned RCVR channel running forever and ever? Use a DISCINT of 999,999 instead of zero if you must; at least that will allow these types of things to clean up.
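For a sense of scale: DISCINT is expressed in seconds, so the 999,999 mentioned above works out to roughly eleven and a half days of idleness before the channel ends. A minimal sketch of the arithmetic (the constant name is just an illustrative label):

```python
from datetime import timedelta

# DISCINT is given in seconds; 999,999 is the value suggested in the post above.
LARGE_DISCINT_SECONDS = 999_999


def discint_as_duration(discint_seconds):
    """Convert a DISCINT value in seconds to a readable duration."""
    return timedelta(seconds=discint_seconds)


if __name__ == "__main__":
    print(discint_as_duration(LARGE_DISCINT_SECONDS))  # prints "11 days, 13:46:39"
```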
Some shops monitor channel status and alert on retrying channels. Consider a channel that gets no traffic from midnight to 6 AM. At 1 AM the cleaning crew spills coffee on a router, causing a network outage that is fixed by 5 AM. The MQ Admin whose DISCINT is set to allow that channel to end on its own stays sleeping all night. The MQ Admin who has DISCINT set to 0 is getting paged for a retrying channel.
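The coffee scenario can be sketched as a small timeline check. This is a simplified model, not how the queue manager is implemented: it assumes that a channel still running when the outage starts notices the failure (for example through heartbeats) and goes into retry, that no messages arrive during the outage, and that monitoring pages on any retrying channel. The function names and the sample DISCINT values are hypothetical.

```python
def still_running_at(discint, last_message_at, moment):
    """True if an idle channel has not yet disconnected at `moment`.

    All times are seconds since midnight. A discint of 0 means the
    channel never disconnects because of idleness.
    """
    if discint == 0:
        return True
    return moment < last_message_at + discint


def admin_gets_paged(discint, last_message_at, outage_start):
    """A channel that is still running when the network fails goes retrying."""
    return still_running_at(discint, last_message_at, outage_start)


if __name__ == "__main__":
    midnight, one_am = 0, 3600
    for discint in (0, 1800, 6000):
        print(discint, admin_gets_paged(discint, midnight, one_am))
```

In this model a DISCINT of 1800 lets the channel end at 12:30 AM, so the outage goes unnoticed, while 0 (or any value that keeps the channel up past 1 AM) results in a page.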
And mqjeff's point: a QM with 1000 RCVR channels, all of them running unnecessarily, is a waste of resources, however small. If anything, it makes my DIS CHSTATUS(*) command take a lot longer to run, and its output a lot bigger than it needs to be, when I'm chasing down some other problem.
It's kind of like turning out the light when you leave the room. Is it some great crime if you don't? No. But that doesn't mean it's not the right thing to do if you aren't going to be in the room for a while. There are lots of factors in deciding whether it's right to turn out the light if you are only going to be out of the room for a minute. I'm talking about when you are leaving for a while, like an hour or more. Same thing for MQ channels: I don't have my DISCINT set to some ridiculously small number either. That's as bad as or worse than DISCINT 0.
_________________
Peter Potkay
Keep Calm and MQ On |
|
jcv |
Posted: Sat Nov 28, 2009 3:11 pm Post subject: |
|
|
 Chevalier
Joined: 07 May 2007 Posts: 411 Location: Zagreb
|
Let me repeat what my point was and what it was not. I never said that idle channels of any type should be running unnecessarily if there is no reason (a tight SLA) that forces it. I just said that in your scenario it makes no difference whether you keep your channels running by setting discint to 0 or to an estimated value that is sufficient for the idle night period. Now, if you have 1000 idle channels that must be running because of a tight SLA, I really don't see what you can do about it. I don't have that, because I don't have such a tight SLA. I would say the same thing about monitoring retrying status. That is more likely a problem for you than for an MQ admin who doesn't have that tight SLA and who can use, for example, the default DISCINT. |
|
bruce2359 |
Posted: Sat Nov 28, 2009 4:19 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9482 Location: US: west coast, almost. Otherwise, enroute.
|
jcv wrote: |
I really don't see what you can do about it... and who can use for example default DISCINT. |
There's a general consensus:
1) networks are prone to failures from various sources
2) WMQ provides some tools for managing your channels (disconnect interval, heartbeats, retry counts and intervals, and other channel attributes, triggering)
3) there are other tools available from IBM (Tivoli)
4) there are other tools available from 3rd-party vendors
5) we keep trying this and that until it gets better, gets worse, or stays the same.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|