Confusion over Heartbeat
JosephGramig
Posted: Tue Nov 17, 2009 5:10 am

Grand Master

Joined: 09 Feb 2006
Posts: 1244
Location: Gold Coast of Florida, USA

imho...

It is a bad idea to set the DISCINT=0 because eventually the IP connection will fail and the channel will stop. When it is set to 0, it will not restart automatically.

Whether to set it small or large depends on things like how sensitive your application is to channel startup times. You might consider a cron job that starts the channel shortly before it is needed (if that time is knowable).

On channel stanzas, my favorites are:
Code:

CHANNELS:
   MaxChannels=<some large number>
   MaxActiveChannels=<some large number>
   AdoptNewMCACheck=ALL
   AdoptNewMCA=ALL
   AdoptNewMCATimeout=60
TCP:
   KeepAlive=Yes
   ListenerBacklog=256
QMErrorLog:
   ErrorLogSize=1048576

I invite debate about the AdoptNewMCA settings...
PeterPotkay
Posted: Tue Nov 17, 2009 5:13 am

Poobah

Joined: 15 May 2001
Posts: 7723

We have one application with super tight SLAs. The total request / reply has to traverse about 6 QMs. The time it takes for a SNDR channel to trigger and start up is enough to push them over the SLA if 3 or 4 of the SNDR channels in the roundtrip have to start up. For this scenario, I set DISCINT on the channels to a number big enough to keep the channel running overnight when there might not be any traffic for 6-8 hours, so that the channel would be running first thing each morning ready for transaction #1 to zoom thru.

Even though I wanted the channels to "always" stay running, I still didn't pick zero. I just picked a # that got me thru the night. There are cluster channels involved, and an MQ god a few years ago drilled it into my head that, especially in clusters, you want to give the channels the opportunity to age out gracefully and end on their own. The exact scenario escapes me on why this was needed in clusters, or if it's still even required in current versions. But one thing is certain - I don't need or want 0 for DISCINT as there is no channel in this world that needs to run forever, although zero is an easy (lazy?) way of stating I want this channel running all the time. It is very rare for this to be an actual requirement.
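
To put a number on "big enough to get thru the night": DISCINT is given in seconds, so the idle window only needs converting, plus some slack. What follows is a minimal Python sketch and not anything taken from the setup described above. The function name and the 25% margin are illustrative assumptions, and the 999999 ceiling is the documented DISCINT maximum as I understand it, so check it against your MQ version.

```python
# DISCINT is expressed in seconds; 0 means "never disconnect".
MAX_DISCINT = 999_999  # assumed upper bound for DISCINT, in seconds


def discint_for_idle_window(idle_hours: float, margin: float = 1.25) -> int:
    """Return a DISCINT value (seconds) that outlasts an idle window.

    margin is a hypothetical safety factor, not an MQ attribute.
    """
    if idle_hours <= 0:
        raise ValueError("idle_hours must be positive")
    seconds = int(idle_hours * 3600 * margin)
    # Clamp to the valid range; never return 0, which would disable the timeout.
    return max(1, min(seconds, MAX_DISCINT))


if __name__ == "__main__":
    # An 8 hour overnight lull with 25% slack.
    print(discint_for_idle_window(8))
```

The result would then go into the channel definition as DISCINT(n). The point of clamping away from 0 is the one made in the post: a long finite value still lets the channel age out eventually.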


Apply the same logic to Expiry. Really, -1? You need that message to be there 34 or more years from now?
_________________
Peter Potkay
Keep Calm and MQ On
bruce2359
Posted: Tue Nov 17, 2009 6:36 am

Poobah

Joined: 05 Jan 2008
Posts: 9482
Location: US: west coast, almost. Otherwise, enroute.

Quote:
Because, this sounds a bit like it is absolutely bad choice under any circumstances.

There have been many, many posts here about disconnect interval, and its uses and misunderstandings.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
bruce2359
Posted: Tue Nov 17, 2009 7:49 am

Poobah

Joined: 05 Jan 2008
Posts: 9482
Location: US: west coast, almost. Otherwise, enroute.

Channel misunderstanding #1: a channel in RUNNING state means that the channel is either transmitting messages OR is capable of transmitting messages.

Channel reality #1: a channel in RUNNING state means that as of the precise instant (that you did the DIS CHS command, for example), the MCA was transmitting messages OR believed that it could.

This is a subtle difference. If the channel is INACTIVE (disconnect interval expired), only when the MCA next attempts to send messages will it discover whether it can send them or not. This subtle difference is not well explained in the manuals.

So, if, as Mr. Potkay describes, the channel is in RUNNING state AND disconnect interval has not expired (or is set to zero to never expire), the MCA believes it can transmit messages.

This is why a long (or zero) DISCINT value doesn't always result in never-ending channels. Historically (on old, slow, and under-provisioned servers), DISCINT expiry allowed the release of valuable resources - the aggregate goal of conservation of server resources for the good of all. This is less applicable to today's multi-processor, large-RAM servers.

Heartbeats allow the channel ends to keep current on the health of their partners - up to DISCINT. But, then again, heartbeats allow the channels to disconnect (become INACTIVE) when there are no messages to flow - thus preventing the channel partners from discovering each other's current health. Odd, isn't it?

There is balance between disconnect interval and heartbeat interval, and the usual desire to keep channels successfully transmitting messages.

There is no good or bad choice here - just a choice.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
jcv
Posted: Mon Nov 23, 2009 5:44 am

Chevalier

Joined: 07 May 2007
Posts: 411
Location: Zagreb

Peter, your scenario seems to me a pretty solid reason for big discint usage. I don't know then why you actually said there should be no such reasons, maybe you think there should be alternative ways to fulfill such SLA without using big discint, maybe you have some in mind which you can already use, maybe there is something else that you didn't mention. I also can't see the benefit in the fact that you picked a # that got you through idle periods, instead of 0. What practical reason do you have for avoiding 0 for a channel which needs to always stay running?
exerk
Posted: Mon Nov 23, 2009 5:49 am

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

jcv wrote:
...What practical reason do you have for avoiding 0 for a channel which needs to always stay running?


Because, as JosephGramig has stated: "...It is a bad idea to set the DISCINT=0 because eventually the IP connection will fail and the channel will stop. When it is set to 0, it will not restart automatically...."
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
PeterPotkay
Posted: Mon Nov 23, 2009 6:02 am

Poobah

Joined: 15 May 2001
Posts: 7723

jcv wrote:
Peter, your scenario seems to me a pretty solid reason for big discint usage. I don't know then why you actually said there should be no such reasons, maybe you think there should be alternative ways to fulfill such SLA without using big discint, maybe you have some in mind which you can already use, maybe there is something else that you didn't mention.

I never said that you should never use a large DISCINT.

jcv wrote:
I also can't see the benefit in the fact that you picked a # that got you through idle periods, instead of 0. What practical reason do you have for avoiding 0 for a channel which needs to always stay running?

Read the 11-17-09 08:13 AM post again.
_________________
Peter Potkay
Keep Calm and MQ On
SAFraser
Posted: Mon Nov 23, 2009 2:56 pm

Shaman

Joined: 22 Oct 2003
Posts: 742
Location: Austin, Texas, USA

JosephGramig wrote:


It is a bad idea to set the DISCINT=0 because eventually the IP connection will fail and the channel will stop. When it is set to 0, it will not restart automatically.



Respectfully.... That is not how we see it work at our site. We have channels set to DISCINT=0 that go down every Sunday because the target machine is taken down for a cold backup. The first error is "remote channel unavailable" and the next is "the attempt to allocate a conversation using TCP/IP to host 'xxxx' was not successful".

The channel does not stop, it goes into retrying mode. When the target machine comes back up some hours later, the sender's retry succeeds. The channel comes back up without human intervention.

I am not saying that DISCINT=0 is good. (Though I've never agreed that it is as evil as others say!) I am just sharing our current experience.
PeterPotkay
Posted: Mon Nov 23, 2009 4:10 pm

Poobah

Joined: 15 May 2001
Posts: 7723

Shirley, see my previous comments about how a SNDR channel can auto recover since it initiates work, but the RCVR channel just sits there.
_________________
Peter Potkay
Keep Calm and MQ On
JosephGramig
Posted: Tue Nov 24, 2009 10:36 am

Grand Master

Joined: 09 Feb 2006
Posts: 1244
Location: Gold Coast of Florida, USA

Shirley, it recovered because it broke and knew it, and because it didn't run out of both short retries and long retries.

So short and long retries are another thing to consider about channels unless you take our advice and set it to a very long DISCINT instead of 0.

So, do you really love seeing all the retry messages in your logs and how they obscure all the other issues that could be happening?
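
For the short and long retries mentioned above, the question is whether the retry budget outlasts the outage. A rough Python sketch follows, assuming the sender makes SHORTRTY attempts spaced SHORTTMR seconds apart and then LONGRTY attempts spaced LONGTMR seconds apart, and ignoring how long each failed connection attempt itself takes. The function names are hypothetical, and the values in the example are illustrative rather than anyone's actual channel settings.

```python
def retry_window_seconds(shortrty: int, shorttmr: int,
                         longrty: int, longtmr: int) -> int:
    """Approximate time a sender keeps retrying before it gives up."""
    return shortrty * shorttmr + longrty * longtmr


def survives_outage(outage_seconds: int, shortrty: int, shorttmr: int,
                    longrty: int, longtmr: int) -> bool:
    """True if the retry budget is at least as long as the outage."""
    window = retry_window_seconds(shortrty, shorttmr, longrty, longtmr)
    return window >= outage_seconds


if __name__ == "__main__":
    four_hours = 4 * 3600
    # 10 short retries at 60 s plus only 5 long retries at 1200 s: 6600 s in total.
    print(retry_window_seconds(10, 60, 5, 1200))
    print(survives_outage(four_hours, 10, 60, 5, 1200))
    # A very large LONGRTY effectively retries for as long as it takes.
    print(survives_outage(four_hours, 10, 60, 999999999, 1200))
```

With a small LONGRTY, a multi-hour backup window like the Sunday one described earlier would exhaust the retries and leave the channel stopped; with a very large one the sender keeps trying, at the cost of the log noise complained about above.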
jcv
Posted: Tue Nov 24, 2009 4:44 pm

Chevalier

Joined: 07 May 2007
Posts: 411
Location: Zagreb

PeterPotkay wrote:
Read the 11-17-09 08:13 AM post again.

The only thing there that might look substantial, is that in certain versions (probably older ones) because of a certain scenario, certain channel types (especially cluster ones?) require occasional (regular?) graceful aging out, that is, ending on their own. Reported to you, by an MQ god, a few years ago. Now, shall I build my practice based on such information?
Could you be more specific about that detail before I do that?
Even if that's true, or even if it sounds logical that a certain piece of software likes to be gracefully ended from time to time, this doesn't mean that you cannot apply that kind of maintenance to a channel in such a way that, most of the time, its effective DISCINT setting is 0.
mqjeff
Posted: Tue Nov 24, 2009 5:14 pm

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Either there is some period of time in which your MQ channels are inactive enough that they can shut down without causing significant business impact - and it is a good idea to allow them to do so because they are otherwise sitting there using network bandwidth, CPU and memory which means more ENERGY and HEAT generated for no actual measurable value.

In this case you should not have DISCINT=0.

Or there is no period of time at all in which your MQ channels can be that inactive.

In this case it doesn't matter what DISCINT is because there is always too much traffic moving over them for them to quiesce in the first place.

In those in-between cases where an app can't tolerate waiting for the channel to start, but is otherwise not sending messages all of the time, you should still set DISCINT to a period of time to shut the channels down during the application's inactivity, and then use scheduling tools to start the channels sufficiently before the application activity starts.
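
The "start the channels sufficiently before the application activity starts" idea comes down to subtracting a lead time from the known start of business. A minimal Python sketch, assuming the application's start time is known in advance; the function name, the 15 minute lead and the channel name QM1.TO.QM2 are all hypothetical. START CHANNEL is the ordinary MQSC command, and how it gets fed to runmqsc by your scheduler is deliberately left out.

```python
from datetime import datetime, timedelta
from typing import Tuple


def channel_start_plan(app_start: datetime, lead_minutes: int,
                       channel: str) -> Tuple[datetime, str]:
    """Return when to start the channel and the MQSC command to issue."""
    if lead_minutes < 0:
        raise ValueError("lead_minutes must not be negative")
    when = app_start - timedelta(minutes=lead_minutes)
    return when, f"START CHANNEL({channel})"


if __name__ == "__main__":
    # Application traffic begins at 08:00; warm the channel 15 minutes earlier.
    when, command = channel_start_plan(datetime(2009, 11, 25, 8, 0), 15, "QM1.TO.QM2")
    print(when.isoformat(), command)
```

The lead time only needs to cover channel startup plus a safety margin, and the DISCINT chosen for the channel has to be longer than that lead so it does not time out again before the first message arrives.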

Any other sufficiently complicated edge cases should likewise be answered with a sufficient business justification why the business loses more money from the SLA impact of *one* message than it does by having channels using valuable system resources for no purpose.
bruce2359
Posted: Tue Nov 24, 2009 5:16 pm

Poobah

Joined: 05 Jan 2008
Posts: 9482
Location: US: west coast, almost. Otherwise, enroute.

Quote:
Reported to you, by an MQ god, a few years ago. Now, shall I build my practice based on such information?

Without knowing the God of whom you speak, and the version/release to which He was referring, I'd categorize this as a ROT (Rule of Thumb) that has likely become an IROT (Irrelevant Rule of Thumb). This frequently happens as software matures.

Here at mqseries.net there is lots of experience. You've been offered some historical perspective, as well as some guidelines. The WMQ Intercommunication manual is full of great technical stuff about the current versions of MQ, and how channels work and fail.

Beyond all of that, you get to make a decision about disconnect, heartbeat, retries, and a fleet of other stuff. Each IT shop has differing business requirements. You should document what you do; then revisit the issue from time to time to see if what you've chosen is working for you and your organization.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
bruce2359
Posted: Tue Nov 24, 2009 5:39 pm

Poobah

Joined: 05 Jan 2008
Posts: 9482
Location: US: west coast, almost. Otherwise, enroute.

Quote:
...it is a good idea to allow them to do so because they are otherwise sitting there using network bandwidth, CPU and memory which means more ENERGY and HEAT generated for no actual measurable value.

Uh...

Channels not currently transmitting messages use very little bandwidth and CPU. RAM utilization (by MQ) does go down some. However, the heat generated/not-generated, and energy used/not-used will be marginal, at best.

If the server is under-provisioned, the CPU and RAM saved might be a benefit to aggregate throughput.

But I'd be hard-pressed to recommend these 'savings' to management as reasons to let channels quiesce.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
mqjeff
Posted: Tue Nov 24, 2009 6:48 pm

Grand Master

Joined: 25 Jun 2008
Posts: 17447

bruce2359 wrote:
Quote:
...it is a good idea to allow them to do so because they are otherwise sitting there using network bandwidth, CPU and memory which means more ENERGY and HEAT generated for no actual measurable value.

Uh...

Channels not currently transmitting messages use very little bandwidth and CPU. RAM utilization (by MQ) does go down some. However, the heat generated/not-generated, and energy used/not-used will be marginal, at best.

If the server is under-provisioned, the CPU and RAM saved might be a benefit to aggregate throughput.

But I'd be hard-pressed to recommend these 'savings' to management as reasons to let channels quiesce.


A single MQ channel? yes. Depending on HBINT and KAINT.

All cluster channels in a cluster that covers 1,000 queue managers over a large geographic area? Not so negligible.
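
A back-of-envelope way to size the "not so negligible" part: count heartbeat flows across all idle running channels. This Python sketch assumes one heartbeat flow per HBINT seconds per idle channel and ignores TCP keepalive traffic (KAINT) and any reply flows, so treat the result as a rough lower bound. The function name is hypothetical, and the channel count is whatever your cluster actually has running, which is not necessarily the number of queue managers.

```python
def idle_heartbeats_per_hour(channels: int, hbint_seconds: int) -> float:
    """Estimate heartbeat flows per hour across idle running channels."""
    if channels < 0:
        raise ValueError("channels must not be negative")
    if hbint_seconds <= 0:
        # HBINT(0) means no heartbeat exchange takes place.
        return 0.0
    return channels * 3600 / hbint_seconds


if __name__ == "__main__":
    # One idle channel versus a thousand, both with HBINT(300).
    print(idle_heartbeats_per_hour(1, 300))
    print(idle_heartbeats_per_hour(1000, 300))
```

The per-channel figure is tiny, which is the point made in the quoted post; the aggregate grows linearly with the number of channels left running, which is the point made in the reply.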

Regardless of situation, a needlessly running channel doesn't have "zero impact". So if the app can't show any real gain from having the channel running all of the time, you are still wasting some resources for no gain.

And any process runs the risk of failure over an extended period of uptime. You still IPL your mainframe, right? So why not quiesce your channels for the same reasons?