MQSeries.net :: View topic - sender channel status: retrying


sender channel status: retrying


Vitor

Posted: Thu Jul 19, 2007 4:42 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

jcv wrote:

I'm still confused. And how do you obtain, in that case, LSTLUWID from the receiving side, if you dropped the qmgr on the receiving side? You need it, to compare it to a CURLUWID on a sending (in-doubt) side.

Or you need to determine, using external means, whether the messages in that unit of work were received. Then determine whether the messages can be, or should be, discarded or resent to a replacement queue manager.

This is why recovery can't be automatic. The software has no means of performing this kind of investigation. If it was as easy as checking 2 numbers the MCAs would do it.

_________________
Honesty is the best policy.
Insanity is the best defence.

jcv

Posted: Thu Jul 19, 2007 7:08 am Post subject:

Chevalier

Joined: 07 May 2007
Posts: 411
Location: Zagreb

O.K., I agree with you, now that you have mentioned external means of investigation. I was confused at a certain point in the discussion because it seemed to me that you objected to my conclusion that LUWID comparison (plain and simple checking of two numbers) is not applicable to Nigel's example. It cannot be applicable when it is not possible.
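For readers following the LUWID argument, here is a minimal sketch of the "checking two numbers" step as the IBM in-doubt procedure describes it: matching IDs suggest RESOLVE CHANNEL with ACTION(COMMIT), differing IDs suggest ACTION(BACKOUT). The function name and the use of `None` for "receiver value unobtainable" are assumptions made for illustration; the sketch does not query a queue manager, and the `None` case is exactly the situation discussed above where only external investigation can help.

```python
from typing import Optional


def resolve_action(sender_curluwid: str, receiver_lstluwid: Optional[str]) -> Optional[str]:
    """Suggest an ACTION for RESOLVE CHANNEL from the two LUWID values.

    sender_curluwid   -- CURLUWID shown by the in-doubt sending side
    receiver_lstluwid -- LSTLUWID shown by the receiving side, or None when
                         it cannot be obtained (e.g. the queue manager is gone)
    """
    if receiver_lstluwid is None:
        # No second number to compare: the decision needs external means.
        return None
    if sender_curluwid.strip().upper() == receiver_lstluwid.strip().upper():
        # Receiver already committed this unit of work.
        return "COMMIT"
    # Receiver never committed it, so the sender keeps the messages.
    return "BACKOUT"
```

Called with identical IDs it returns "COMMIT", with different IDs "BACKOUT", and with no receiver value it returns None rather than guessing.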

jcv

Posted: Thu Jul 19, 2007 10:07 am Post subject:

Chevalier

Joined: 07 May 2007
Posts: 411
Location: Zagreb

Now that I have studied these two topics:

http://www.mqseries.net/phpBB2/viewtopic.php?p=159310
http://www.mqseries.net/phpBB2/viewtopic.php?t=33617

I'm even more confused.

Quote:

Do the assembled:

a) agree with this interpretation;
b) believe that it's still valid for the failing channel to be checked for in-doubt status in the situation described in the original post?
c) If not, and RESOLVE CHANNEL is only for use with in-doubt channels where the target is no longer serviceable, how would the decision be made between COMMIT & BACKOUT? What would be the point of COMMIT?

Isn't your option c) from topic t=33617 practically the same kind of question I asked Nigelg, when I concluded that LUWID comparison would not be applicable to what seems to be presented as a typical example of the unusual conditions where a resolve is necessary? And you didn't get an answer there. So I will rephrase my question: why did they prepare a whole chapter in the IBM documentation about in-doubt channels, with some proposed procedures in it, if it's not applicable to the main real-life scenario?

http://publib.boulder.ibm.com/infocenter/wmqv6/v6r0/topic/com.ibm.mq.csqzae.doc/ic11350_.htm

Vitor

Posted: Thu Jul 19, 2007 11:36 pm Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

As you will have seen from the 2 threads you mention, it's a complex and contentious subject. I'll underline the fact that I don't have divine authority, nor do I work for IBM, so my opinions are just that & my experiences are my own.

As I intimated further up the thread, channels need to be resolved in unusual and exceptional circumstances. Since the early fixes of v5.3 the MCAs have become a lot better at automatic recovery (one of the points raised in the other discussions). Under the majority of circumstances (I'll accept Nigel's 99.9%) channels never need to be resolved.

The reason there's a chapter in the manual is because some situations are not (as you put it) main real life. The facility exists in the product in case you need it; you may never need it, you should never need it, if you need it on a regular basis you have a serious configuration or infrastructure problem. I've used the example of message id; another example is Morse code. All pilots and emergency workers are required to know it to a given standard, even in these days of modern communication, phones, GPS and whatever. They will in all certainty never need the knowledge, but if they ever need it they will really need it.

Let me pose a question back at you: what are you driving at? Is your assertion that the RESOLVE CHANNEL be removed as redundant and all channel resolution be left to the automated processes?
_________________
Honesty is the best policy.
Insanity is the best defence.

jcv

Posted: Fri Jul 20, 2007 1:47 am Post subject:

Chevalier

Joined: 07 May 2007
Posts: 411
Location: Zagreb

Vitor wrote:

I'll underline the fact that I don't have divine authority, nor do I work for IBM, so my opinions are just that & my experiences are my own.

I fully understand your position, as well as Nigel's. And you understand that my position is that of a service provider in some cases and a service consumer in others, this one in particular, so I must understand you both.

Vitor wrote:

Let me pose a question back at you: what are you driving at? Is your assertion that the RESOLVE CHANNEL be removed as redundant and all channel resolution be left to the automated processes?

No. I'm driving at a clear and cleanly documented situation. Would you agree that extensive elaboration on things that are not so relevant, together with a lack of information on what is relevant in a given case and what is not, blurs the documentation? Why do you think that your knowledge should be the result of long and exhausting discussions, instead of simply looking at a manual that concisely depicts the whole situation? I had to say that, even though I enjoy discussing things with you and appreciate your every post.
My assertion is that this discussion would never take place, if the defaults were like this:

LONGRTY(999999999) LONGTMR(120)
SHORTRTY(10) SHORTTMR(60)

or simply like this:

RTY(999999999) TMR(60)

instead of:

LONGRTY(999999999) LONGTMR(1200)
SHORTRTY(10) SHORTTMR(60)

I'm missing the reasons for IBM's choice, and I would like to know them.
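To put numbers on the two sets of values above, here is a rough arithmetic sketch. It assumes the timers are in seconds, that every attempt fails, and that the time taken by the connection attempt itself is negligible; the helper name `retry_window` is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def retry_window(short_rty: int, short_tmr: int, long_rty: int, long_tmr: int):
    """Return (seconds spent in short retries, seconds spent in long retries)."""
    return short_rty * short_tmr, long_rty * long_tmr


# (SHORTRTY, SHORTTMR, LONGRTY, LONGTMR) as quoted in the post above.
settings = {
    "shipped defaults": (10, 60, 999_999_999, 1200),
    "proposed values": (10, 60, 999_999_999, 120),
}

for name, (srty, stmr, lrty, ltmr) in settings.items():
    short_s, long_s = retry_window(srty, stmr, lrty, ltmr)
    print(
        f"{name}: short phase lasts {short_s // 60} min, "
        f"then one attempt every {ltmr // 60} min "
        f"for about {long_s / SECONDS_PER_YEAR:,.0f} years"
    )
```

Under those assumptions both variants spend ten minutes in short retries and then keep retrying for thousands of years, so the only practical difference is whether the long-phase attempts come every 20 minutes or every 2 minutes.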

Vitor

Posted: Fri Jul 20, 2007 2:03 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

jcv wrote:

Fair enough. My, perhaps jaded, experience is that technical documentation is never clear and IBM documentation shares this attribute. It is on this truth that this forum has its existence. It's also my experience that life seldom presents clear and clean situations. For my part, I'd sooner have facilities in the software which are seldom used and logically unnecessary in case I need them, rather than rely on situations conforming to a pattern.

Maybe I'm getting too old for all this.

jcv wrote:

My assertion is that this discussion would never take place, if the defaults were like this:

LONGRTY(999999999) LONGTMR(120)
SHORTRTY(10) SHORTTMR(60)

or simply like this:

RTY(999999999) TMR(60)

instead of:

LONGRTY(999999999) LONGTMR(1200)
SHORTRTY(10) SHORTTMR(60)

I'm missing the reasons for IBM's choice, and I would like to know them.

Well, as I've said, I can't speak for IBM. There are a couple of theoretical explanations:

- The defaults were set back in the day. Changing them now would affect every site which has not deliberately set them.

- The settings you postulate mean the channel will attempt to recover automatically almost endlessly. If an exceptional situation does arrive and recovery is impossible, as an MQ administrator I'd like to be notified by the system sooner rather than a mob of enraged users later. Hence the defaults allow for IBM's view of a "reasonable" attempt at recovery; if this fails the channel stops to highlight the issue to a human.

Maybe a passing IBMer can give a more authoritative view.
_________________
Honesty is the best policy.
Insanity is the best defence.

jefflowrey

Posted: Fri Jul 20, 2007 3:50 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

jcv wrote:

I'm missing the reasons for IBM's choice, and I would like to know them.

Get used to disappointment?

I think you will find that there's not a large history or precedent for the people who frequent this place, AND work for IBM, AND are responsible for making decisions about how the product works... to take the time to explain themselves to people who are second-guessing them.

Which is really just another way of saying that this is not an official IBM forum, and everyone here contributes based on their own whims and desires. If you're feeling the need for an official justification, I believe the official channels start with your sales rep.

In general, my experience has been that if there is a reasonable sounding technical explanation for something, then it's probably a good part of the actual decision making process.

But this is just my experience and my opinion.
_________________
I am *not* the model of the modern major general.

jcv

Posted: Fri Jul 20, 2007 7:40 am Post subject:

Chevalier

Joined: 07 May 2007
Posts: 411
Location: Zagreb

Vitor wrote:

My, perhaps jaded, experience is that technical documentation is never clear and IBM documentation shares this attribute. It is on this truth that this forum has its existence.

I don't see it that way; I think that things are always the way you accept them. I don't see any reason why IBM's documentation should share this attribute. You can't say of any software product, particularly the one we are dealing with: it's a great product, it's just not sufficiently well documented. You can say: if it's obscurely documented, it's no good. But hey, this is just my opinion.

Vitor wrote:

- The settings you postulate mean the channel will attempt to recover automatically almost endlessly. If an exceptional situation does arrive and recovery is impossible, as an MQ administrator I'd like to be notified by the system sooner rather than a mob of enraged users later. Hence the defaults allow for IBM's view of a "reasonable" attempt at recovery; if this fails the channel stops to highlight the issue to a human.

My first postulate does not mean that kind of change in behaviour compared to the present defaults. It just means more frequent long retry attempts, which you might perhaps qualify only as a waste of resources. Also, I don't see why it would mean you can't be notified by the system before you receive a call from an enraged user. Could you explain it a bit more, please?

Vitor

Posted: Fri Jul 20, 2007 10:04 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

jcv wrote:

If a channel is retrying, there's no way (obviously) to tell if it can resolve the issue. During this process no messages are flowing. If the channel is left to keep retrying on the assumption that the problem will eventually resolve itself (which in 99.9% of cases it will do), this can lead to a significant interruption of service to the users.

Consider - there's a network failure (dead router). Once the router is repaired or replaced, the MCAs will recover the channel automatically, and the channel will be retrying until the repair is made. No problem.

If the network problem goes undetected for a period of time, the non-transmission of MQ messages will cause problems for the users, who will complain to the administrator (me) that MQ is "not working". A brief investigation will show the "missing" messages in the XMITQ and reveal the culprit. MQ will still take the blame.

If on the other hand the channel retries for a period then stops, this will be reported to the administrator, who can in turn ask the network people if they're taking action. Now when the users complain, I can say "Not MQ, we've already identified the network issue and it's being worked on by that guy over there".

Maybe your site doesn't play the blame game. Maybe it's just my jaded nature again.

_________________
Honesty is the best policy.
Insanity is the best defence.

jcv

Posted: Sat Jul 21, 2007 1:06 am Post subject:

Chevalier

Joined: 07 May 2007
Posts: 411
Location: Zagreb

Of course it does, every site plays that game. What I was driving at is that you can perhaps flag a channel that is already in retrying state as an issue. For example, QFLEX lets you choose Channel as the Monitor object type and then choose, as the Triggering condition (among others), the Channel is Retrying or Channel is Not Running option. Although Instrumentation Events don't define such an event among the channel and bridge events (I'm afraid to ask why), which is probably why you said:

Quote:

If on the other hand the channel retries for a period then stops, this will be reported to the administrator

there are hopefully ways to tell the network guys to check things before the channel stops. Anyway, when I said that my postulate does not mean that kind of change in behaviour, that was because the LONGRTY default already means the channel will attempt to recover automatically almost endlessly. I suppose you always change it to a lower value than 999999999 for your channels? You probably noticed that I said "My first postulate"; that was because I wasn't sure what happens when SHORTRTY is spent, in terms of events, so I checked the documentation and saw that nothing happens. The channel only goes from Retrying to Stopped after LONGRTY attempts, and afterwards a channel event is generated.
Did you also notice that I said it would be reasonable to stop the channel so as not to allow it to go in-doubt, and that I didn't emphasize the quiesce mode option? That was not because it is the default; I wasn't even aware that there are other modes which might lead to an in-doubt state. On the other hand, if you stop the qmgr on the receiver's side with the quiesce option, without stopping the channel first, that would probably not lead to an in-doubt state?
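The retry behaviour described in this post (nothing observable when SHORTRTY is spent, a stop only after LONGRTY attempts) can be sketched as a small timeline generator. This is a simplified model, not MQ's implementation: it assumes every attempt fails, that the timer wait precedes each attempt, and the phase labels are invented for the example.

```python
from typing import Iterator, Tuple


def retry_schedule(
    short_rty: int, short_tmr: int, long_rty: int, long_tmr: int
) -> Iterator[Tuple[int, str]]:
    """Yield (elapsed_seconds, phase) per failed attempt, then a STOPPED entry."""
    elapsed = 0
    for _ in range(short_rty):
        elapsed += short_tmr
        yield elapsed, "SHORT_RETRY"
    # No entry is produced at the short-to-long boundary, mirroring the
    # observation that no event is raised when SHORTRTY is used up.
    for _ in range(long_rty):
        elapsed += long_tmr
        yield elapsed, "LONG_RETRY"
    # Only at this point would the channel stop and an event be generated.
    yield elapsed, "STOPPED"


# Deliberately tiny counts so the whole timeline fits on screen.
for elapsed, phase in retry_schedule(2, 60, 3, 1200):
    print(f"{elapsed:>5} s  {phase}")
```

With the small counts used here the channel stops after 3720 seconds; with LONGRTY(999999999) the final STOPPED entry is effectively never reached, which is the point being made about the default.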

Vitor

Posted: Sat Jul 21, 2007 4:13 pm Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

- not all sites have QFLEX or other monitoring tools. Of those sites that do, not all chose to purchase the relevant MQ component. The only thing which may be relied upon is the built-in functionality, which is the channel event on STOP. The most obvious other method is to monitor xmitq depth, but this is highly reliant on performance and can lead to false positives or non-detections.

Naturally if you have access to more sophisticated monitoring tools, you'll take that into account when defining MQ objects.

- If the queue manager is set to quiesce, part of the orderly closedown is to complete the current batch of channel work. Hence the channel will not be in doubt.
_________________
Honesty is the best policy.
Insanity is the best defence.

jcv

Posted: Sun Jul 22, 2007 4:24 am Post subject:

Chevalier

Joined: 07 May 2007
Posts: 411
Location: Zagreb

We don't yet have such a tool, but we are considering one. My point regarding QFLEX was: when you (your site) considered such tools, which rely on Real-time Monitoring, what were the pros and cons, and why did you decide to stick to Instrumentation events? When I said "perhaps" and "hopefully", that was because I'm not sure whether it's advisable to use those tools, not because I thought I had discovered something new here. As I understand it, such tools connect as MQ clients and talk to the command server, issuing PCFs at regular intervals to determine, for example, channel status. Do you think there is something questionable about this approach in terms of resource consumption, reliability or something else? I'm asking because IBM didn't offer such a tool itself, except for MO71 (AFAIK), which is not part of the standard MQ package for some reason, and I would like to know this reason.
I'm not looking for an official explanation through the wrong channel; this forum is public, and as such, I'm looking for a public opinion from everyone who is willing to give one.
Regarding our problem, channel in retrying state, it would be interesting to leave it to recover automatically endlessly, and to be notified about it from the start. Do you agree?

Vitor

Posted: Sun Jul 22, 2007 5:58 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

jcv wrote:

You're asking questions with as many answers as people giving them. If you want to obtain views on monitoring, I would recommend you start a new thread in the monitoring section. If you want a brief personal view, developing your own monitoring with PCF or the like is reinventing the wheel; the decision on my current site to not purchase was entirely financial.

But as I say, this is better posed in the Monitoring section. It's off topic a bit here.

jcv wrote:

Regarding our problem, channel in retrying state, it would be interesting to leave it to recover automatically endlessly, and to be notified about it from the start. Do you agree?

If you say so. I'd sooner know when automatic recovery has failed than be pestered every time the network hiccups. But that's just me.
_________________
Honesty is the best policy.
Insanity is the best defence.

jefflowrey

Posted: Sun Jul 22, 2007 8:15 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Vitor wrote:

the decision on my current site to not purchase was entirely financial.

I'd be interested to know if that decision included costs for developing and maintaining software, as well as the risks associated with monitoring software being developed by people who haven't done it before.
_________________
I am *not* the model of the modern major general.

jcv

Posted: Sun Jul 22, 2007 12:07 pm Post subject:

Chevalier

Joined: 07 May 2007
Posts: 411
Location: Zagreb

Vitor wrote:

If you say so. I'd sooner know when automatic recovery has failed than be pestered every time the network hiccups. But that's just me.

And how would the existence of a currently non-existent SHORTRTY_COUNT_SPENT channel event change your picture of that? I'm not sure and I'll check, but I think that QFLEX can be configured to filter out network hiccups, meaning that I don't receive a notification immediately when the channel goes into retrying state, only if it stays there for a longer time.
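The "filter out network hiccups" idea can be sketched independently of any product. The following is an illustration only, not a description of how QFLEX works: it assumes some poller already produces timestamped channel status samples, and the function name, the status strings and the grace period are all hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def retrying_alerts(
    samples: Iterable[Tuple[float, str]], grace_seconds: float
) -> List[float]:
    """Return the timestamps at which an alert would be raised.

    One alert is raised per retrying episode, at the first sample where the
    channel has been continuously RETRYING for at least grace_seconds.
    """
    alerts: List[float] = []
    retry_since: Optional[float] = None
    alerted = False
    for ts, status in samples:
        if status == "RETRYING":
            if retry_since is None:
                # A new retrying episode starts at this sample.
                retry_since = ts
                alerted = False
            if not alerted and ts - retry_since >= grace_seconds:
                alerts.append(ts)
                alerted = True
        else:
            # Any other status ends the episode and resets the timer.
            retry_since = None
    return alerts


polled = [
    (0, "RUNNING"), (30, "RETRYING"), (60, "RUNNING"),       # short hiccup
    (90, "RETRYING"), (120, "RETRYING"), (420, "RETRYING"),  # real outage
    (450, "RUNNING"),
]
print(retrying_alerts(polled, grace_seconds=300))
```

With a 300-second grace period the 30-second hiccup is ignored and only the longer outage is reported; setting the grace period to zero gives the "notify from the start" behaviour instead.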
