MQSeries.net Forum Index » General IBM MQ Support » 9.3 Linear Logging Issue
edelmann77
PostPosted: Mon Oct 02, 2023 6:58 pm    Post subject: 9.3 Linear Logging Issue

We have linear logging with automatic log management set for a Linux RHEL 7 queue manager: IMGINTVL 120, IMGRCOVO YES, IMGRCOVQ YES, IMGSCHED YES. All of a sudden tonight, the /var/mqmlogs mount point (15 GB max, usually around 10-12% used) hit 100%, while the /var/mqm/data location for the queues sat at around 14%. I cleared out all log extents except those required for RESTART, and within 5 minutes it was back at 100% utilized. No queues had any depths to speak of. I then got the mount point back down to about 85% and stopped and restarted the qmgr. At that point the logging stopped going crazy. It's still a bit high, but it has stabilized. Any ideas what may have caused this, and how to better figure out what was actually chewing through the log files?

LOG PRIMARY set to 50 secondary 20, logfilepages 8192, log bufferpages 2048.
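
For reference, those image settings can be confirmed in MQSC with something like the following (QM1 stands in for the real queue manager name):

Code:
    echo "DISPLAY QMGR IMGINTVL IMGRCOVO IMGRCOVQ IMGSCHED" | runmqsc QM1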


bruce2359
PostPosted: Tue Oct 03, 2023 4:37 am

What messages were written to the queue manager error log AMQERR01.LOG?

How did you decide on LOG PRIMARY set to 50 secondary 20, logfilepages 8192, log bufferpages 2048? Is this a production qmgr?
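
(For anyone following along: on a default Linux installation the error logs normally live under paths like these; QM1 stands in for the queue manager name.)

Code:
    # queue-manager-specific error log
    tail -n 100 /var/mqm/qmgrs/QM1/errors/AMQERR01.LOG
    # installation-wide error log
    tail -n 100 /var/mqm/errors/AMQERR01.LOG
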
edelmann77
PostPosted: Tue Oct 03, 2023 6:19 am

It's production "like" actually. It's possible that we didn't size things appropriately given those log attributes. This particular queue manager supports other agency testing, not "our" testing so from that point of view, it is only a notch down from general production.

The only messages in the error log were effectively "log space full". Connections that use the qmgr failed after a while, but I was able to use runmqsc to access the qmgr and issue a display qmstatus all.
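
(For reference, the log-related fields can be pulled out with something like the following; QM1 is a placeholder. RECLOG and MEDIALOG name the oldest extents still needed for restart and media recovery, and LOGINUSE/LOGUTIL give a utilization figure.)

Code:
    echo "DISPLAY QMSTATUS CURRLOG RECLOG MEDIALOG LOGINUSE LOGUTIL" | runmqsc QM1
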
bruce2359
PostPosted: Tue Oct 03, 2023 7:00 am

edelmann77 wrote:
...but I was able to use runmqsc to access the qmgr and issue a display qmstatus all.

Please post the results here.
Andyh
PostPosted: Tue Oct 03, 2023 7:51 am

You could use dmpmqlog to format the content of one or more of the recovery log extents that filled up very quickly, in order to see what the queue manager was writing to the logs.
Note that the queue manager has to be stopped before you can format the logs (there are ways around this if you really need to avoid stopping the QMgr, but as this is "production like" I hope that's not necessary).

If everything you've said is accurate then it sounds very much like you should be opening a PMR with MQ support. Every now and again the QMgr will write the persistent content of the queues to the log (in order to support media recovery). This activity should generally be spread out over time when the QMgr itself is scheduling the media images, but even if it all happened in one go it shouldn't involve writing more data to the recovery log than is stored in the data file system.
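
A minimal sketch of that, assuming the logs are in their default location and using QM1 as an illustrative queue manager name (the dump can be very large, so redirect it to a file):

Code:
    endmqm -w QM1
    dmpmqlog -m QM1 -b > /tmp/QM1.log.dump    # -b dumps from the base LSN; use -f <dir> if the logs are elsewhere
    strmqm QM1
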
edelmann77
PostPosted: Tue Oct 03, 2023 6:08 pm

Actually, perhaps we stumbled across a logical error in changing the interval for the automatic media image from the 15 minutes we used to have to 2 hours. It seems that with a 2-hour image interval, ongoing large-message traffic through the qmgr can fill a lot of log extents while the qmgr is still holding on to the logs behind the 2-hour-old media image. Once the 2 hours are up, a new media image is written and the old extents are removed (automatically). But if you have a fair amount of large-message traffic, and the messages are sitting on cluster transmit queues due to latency in actual delivery to a remote qmgr, the log space gets consumed by more and more linear log extents.

I reverted to a 15-minute image interval and things appear to be holding their own. However, a 15-minute interval doesn't work well either if a huge amount of message data gets written into the logs every 15 minutes (for example, because a lot (>5000) of large 5+ MB msgs are sitting on a cluster xmitq). So it's definitely a balancing act.
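
The revert itself was just an MQSC attribute change, roughly like this (QM1 is a placeholder; IMGINTVL is in minutes):

Code:
    echo "ALTER QMGR IMGINTVL(15)" | runmqsc QM1
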
gbaddeley
PostPosted: Wed Oct 04, 2023 2:56 pm

Why are there a large number of messages on the cluster xmit queue? In normal operation, the depth should be zero most of the time.

Was there a reason to try increasing the auto record image interval to 2 hrs?
Andyh
PostPosted: Thu Oct 05, 2023 2:27 am

At the time that automatic log management and media imaging were added, a number of changes were made to improve the efficiency of taking the media images.
As has been discussed here many times, MQ is primarily a data transport rather than a data store. If a large volume of persistent data is stored in MQ, then that data will be dumped to the recovery log every time a media image is taken. Conversely, if ALL of the persistent data was only ever very transiently hosted in a queue, then none of that data would be dumped into the log when the media image of that queue was taken.
So at a 100,000-foot level one could argue for frequent media imaging where queues are predominantly empty (leading to lower disk space utilization) and less frequent imaging where the queues are deep (with correspondingly high disk space requirements).
The fly in the ointment is that one of the main reasons for using asynchronous messaging is to be able to handle a downstream outage more gracefully. When such an incident occurs the queues typically become deep, and at that point high-frequency media imaging can become an issue. Indeed, if some queue (usually a transmit queue to a downstream service that is currently offline) becomes very deep, then it's a good time to consider suspending automatic media imaging (which implies having sufficient disk space allocated to the filesystem hosting the MQ recovery logs).
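
A sketch of what suspending (and later resuming) automatic media imaging could look like; the queue manager and queue names are purely illustrative:

Code:
    # stop the queue manager from scheduling media images itself
    echo "ALTER QMGR IMGSCHED(MANUAL)" | runmqsc QM1

    # later, once the backlog has drained: record an image manually, then resume automatic imaging
    rcdmqimg -m QM1 -t queue APP.FILE.XMITQ
    echo "ALTER QMGR IMGSCHED(AUTO)" | runmqsc QM1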

Using linear logging with MQ used to carry very significant performance and admin overheads (not only when the media images were being taken), but the V9 changes have eliminated most of those overheads. Hopefully the major consideration now is the recoverability requirements on the data.
There's more redundancy in a system using linear logging, and in the event of some unexpected issue (often operator error!) there are more options available to a customer using linear logging. That being said, I would hope that the vast majority of customers NEVER have to recover an object from the recovery logs (rcrmqobj).
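For completeness, recreating a damaged object from its media image is a one-liner; the names here are illustrative:

Code:
    rcrmqobj -m QM1 -t queue DAMAGED.LOCAL.QUEUE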

Note also that much of the MQ technology is now thirty years old. Thirty years ago a TB of disk space would have been unheard of, while now it's as cheap as chips. A need to minimize MQ disk space utilization would have me questioning my choice of storage provider! If you've got terabytes of data unexpectedly buffered in MQ queues then you've probably got bigger issues to worry about.
gbaddeley
PostPosted: Thu Oct 05, 2023 2:30 pm

Hi Andy,
I concur with your comments about disk space. Most of our prod MQ systems have hundreds of GB of space.
For DR purposes, we provision enough disk space to store at least 24 hours of transaction messages. In normal operation we are lucky to use 1% of this at any one time for queued data.
edelmann77
PostPosted: Wed Oct 18, 2023 11:24 am    Post subject: Thanks

Thanks everyone - I totally agree that the deep xmitq is the root of the problem. We don't typically operate that way, but we had a particular situation during this episode where "large" 5+ MB msgs were being sent over the WAN from our qmgr in AWS WEST to another queue manager not collocated there at all, so we were experiencing huge latency on the channels moving the msgs, hence the buildup. It would literally take a few minutes to move messages of that size off the xmitq (15-20 per minute tops, while hundreds were being dumped onto the queue). We can achieve 3-4000 msgs per minute across the WAN from other MQ partners just fine, as long as they are reasonably sized.

The channels involved here are dedicated to "file" sized msgs, to keep any backlog (should it ever occur) from impacting normal higher-priority msg traffic. We thus have two clusters: the normal one, and this "file" based cluster that is used to route larger msgs. The problem occurred when a particular customer decided to send GBs of a large database file that their application had chunked up into 5 MB msgs across the pipes, so no one expected this. Eventually I even unshared the alias in the cluster so that the application putting these big msgs to the queues would get an error instead of adding to the dilemma. Everyone, it turned out, was getting upset at everyone else as a result of the impact being caused by this unruly customer. Obviously we explained how MFT is a far better solution for these sorts of things, but customers will do what they will with their queues, unbeknownst to the MQ admins.

Is there a specific recommendation for qmgr-to-qmgr sender/receiver channels (cluster) that would be used to send 5+ MB msgs regularly?
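
(For anyone curious, "unsharing" the alias was just removing it from the cluster, roughly like this; the queue manager and alias names are illustrative:)

Code:
    echo "ALTER QALIAS(FILE.TRAFFIC.ALIAS) CLUSTER(' ')" | runmqsc QM1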