MQ/Queue Manager Backup and Recovery |
Author |
Message
|
bruce2359 |
Posted: Thu Apr 15, 2010 6:43 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9482 Location: US: west coast, almost. Otherwise, enroute.
|
Back to the OP:
Quote: |
...or should my first step always be to attempt to re-apply the log/data directory (and registry?) from backup? |
Your first step should be to assess the extent of damage. Your next step might be to attempt qmgr restart. (Think first, do second.)
Will the qmgr restart successfully? If so, the restart process will replay the logs, and bring the qmgr back to its last consistent state.
If the qmgr fails to restart, then you will need to go through a manual recovery process. The steps needed for this will depend on the extent of damage.
Restoring portions or all of the file-system (or Registry) may or may not work. Again, it depends on what is damaged.
You are asking for a quick and simple answer to resolving a possible loss or corruption.
A former boss once complained to me that after an outage we took far too long to do problem-determination, compared to how quickly we recovered from the outage. Over and over, I tried to explain that (for the bulk of outages) each outage was different from those that preceded it.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
bruce2359 |
Posted: Thu Apr 15, 2010 7:29 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9482 Location: US: west coast, almost. Otherwise, enroute.
|
Problem-determination (p-d) is a skill set developed through knowledge and experience. Understanding the WMQ product, what it does and how it behaves, is a prerequisite to effective p-d. IBM documentation and WMQ administration and programming courses can provide a good base.
There are a few rules for effective problem management.
The first rule: Do no harm. This means DO NOT begin by making changes. The changes may mask the underlying problem and/or make the underlying problem worse.
The second rule: Investigate. Gather information about the problem and its symptoms. Document what you have found. Effective problem-resolution means taking the right actions based on the symptoms presented.
The third rule: Plan a strategy for resolving the underlying problem - a fall-forward plan. Without a plan (written, preferably), you will likely miss one or more steps. More likely still, given the stress of an outage, you will make mistakes in the rush to get MQ fixed quickly.
The fourth rule: Have a backout plan - a fall-back plan. Before you take action, understand how you will recover from your attempt to resolve the problem, should it fail. This gets back to Rule 1.
The fifth rule: Take appropriate action. This, I believe, is at the core of your OP. Slamming a solution into place, hoping that it works, is a recipe for further disaster, and pretty much ensures that you will not learn anything meaningful, consistent and pervasive.
The Triad: In all things, you have three balancing factors to contend with: Time, Quality, Cost. You (your organization) can only have two of the three in the Triad.
If your management is only concerned with Time (get the qmgr back up as fast as possible), then they will sacrifice Quality and/or Cost. Yes, it's up and running, but with (possibly) lost messages.
If management is only concerned with Quality (no lost messages), it will sacrifice Time and/or Cost.
I/we do appreciate that you want an exact answer to 'will this work?' It's never this simple. |
|
John89011 |
Posted: Thu Apr 15, 2010 3:25 pm Post subject: |
|
|
Voyager
Joined: 15 Apr 2009 Posts: 94
|
Yeah, when I am on the outage calls all they want to know is "is it fixed yet?" They don't care what the problem is/was. I'd like to spend time identifying/analyzing the issue rather than just solving it or making the problem worse, especially when they claim (which is often) that it's an MQ issue when it is really a result of something else. |
|
gbaddeley |
Posted: Thu Apr 15, 2010 4:02 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
bruce2359 wrote: |
Problem-determination (p-d) is a skill set developed through knowledge and experience. Understanding the WMQ product, what it does and how it behaves, is a prerequisite to effective p-d. IBM documentation and WMQ administration and programming courses can provide a good base.
There are a few rules for effective problem management.
The first rule: Do no harm. This means DO NOT begin by making changes. The changes may mask the underlying problem and/or make the underlying problem worse.
... |
Yes. If a queue manager won't start, it is usually a good idea to take a backup copy of the MQ directories (qmgrs, logs, errors) and config files *before* you start trying to fix the problem. This allows you to do selective backouts if you make a mistake or you want to try something else while attempting to get MQ back up. It also allows you to do some root cause analysis after service has been restored.
_________________
Glenn |
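As a rough illustration of the "copy first, fix second" idea above, here is a minimal Python sketch that copies a set of directories into a fresh timestamped folder before any repair is attempted. The function name `snapshot_dirs`, the `mq-snapshot-` folder prefix, and the example paths in the trailing comment are all assumptions made for illustration; they are not from the thread, and the real locations depend on your platform and installation.

```python
import shutil
import time
from pathlib import Path


def snapshot_dirs(sources, dest_root, label=None):
    """Copy each existing source directory into a new snapshot folder.

    sources   -- mapping of a short name to a directory path (hypothetical layout)
    dest_root -- directory under which the snapshot folder is created
    label     -- optional suffix; defaults to the current local timestamp

    Returns the path of the snapshot folder.
    """
    label = label or time.strftime("%Y%m%d-%H%M%S")
    snapshot = Path(dest_root) / f"mq-snapshot-{label}"
    # exist_ok=False: refuse to write into an earlier snapshot
    snapshot.mkdir(parents=True, exist_ok=False)

    for name, src in sources.items():
        src = Path(src)
        if not src.is_dir():
            # A damaged installation may be missing pieces; record and move on.
            print(f"skipped (not found): {src}")
            continue
        # copytree copies files with shutil.copy2 by default, keeping timestamps
        shutil.copytree(src, snapshot / name)
        print(f"copied: {src} -> {snapshot / name}")
    return snapshot


# Example call with assumed paths (adjust to your own installation):
# snapshot_dirs(
#     {"qmgrs": "/var/mqm/qmgrs/QM1", "log": "/var/mqm/log/QM1", "errors": "/var/mqm/errors"},
#     "/backup/mq",
# )
```

Keying the copies by a short name rather than by the source folder name avoids a collision when two source directories end in the same queue manager name.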
|
bruce2359 |
Posted: Thu Apr 15, 2010 4:09 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9482 Location: US: west coast, almost. Otherwise, enroute.
|
Quote: |
They don't care what the problem is/was. |
Wow. What an attitude. End-users and management often state this.
I'm guessing that there might be messages in queues at the instant of an outage, and that these messages might be lost as a result, and that the messages are of some dollar value to the organization.
That said, your organization has chosen Time (fix it as quickly as possible), at the expense of Quality (possible loss of messages) and Cost (missed funds-transfers, lost sales orders, etc.)
I'm guessing (hoping) that your organization is not a bank. My bank.
[edit] Cleaned up my typos and punctuation. |
|
John89011 |
Posted: Thu Apr 15, 2010 8:41 pm Post subject: |
|
|
Voyager
Joined: 15 Apr 2009 Posts: 94
|
All the messages can be re-sent in this case, and no, it's not a bank.
It's a wireless carrier, one of the big ones. |
|
fjb_saper |
Posted: Thu Apr 15, 2010 11:38 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20767 Location: LI,NY
|
John89011 wrote: |
All the messages can be re-sent in this case, and no, it's not a bank.
It's a wireless carrier, one of the big ones. |
Does that mean that my wireless payment might get sent twice?
_________________
MQ & Broker admin |
|
John89011 |
Posted: Fri Apr 16, 2010 12:14 am Post subject: |
|
|
Voyager
Joined: 15 Apr 2009 Posts: 94
|
That would never happen... I can resubmit your order 50 times and you would still get charged once. It comes down more to this: if you bought a phone, you won't be able to use it until it's provisioned on the switch. |
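The thread does not say how the carrier's systems make resubmission safe. One common way to get "resubmit 50 times, charged once" behavior is to deduplicate on an order identifier, sketched below in Python. The class `ChargeLedger`, its methods, and the order id format are hypothetical names chosen for illustration, and a real system would keep this record in durable storage rather than in memory.

```python
class ChargeLedger:
    """Records at most one charge per order id, however often the order arrives."""

    def __init__(self):
        # order_id -> amount already charged (in-memory only; an assumption for the sketch)
        self._charged = {}

    def process(self, order_id, amount):
        """Return True if a charge was made, False if this order was already charged."""
        if order_id in self._charged:
            return False  # duplicate delivery: do nothing
        self._charged[order_id] = amount
        return True

    def total(self):
        """Sum of all charges actually made."""
        return sum(self._charged.values())


ledger = ChargeLedger()
results = [ledger.process("order-1", 50) for _ in range(50)]
# Only the first submission is charged; the other 49 are ignored.
```

With this shape, replaying messages after a queue manager outage is harmless to the customer, which is the property the post relies on.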
|
bruce2359 |
Posted: Fri Apr 16, 2010 5:28 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9482 Location: US: west coast, almost. Otherwise, enroute.
|
Quote: |
All the messages can be re-sent in this case |
If messages are lost, they cannot be re-sent.
I gather from your statement that your developers have an app that can recreate messages from source data in case of catastrophic failure of a qmgr. |
|