Shared Memory Problems
		
fordcam (Apprentice)
Joined: 28 Mar 2002 Posts: 35 Location: MGIC
Posted: Fri May 30, 2003 12:03 pm    Post subject: Shared Memory Problems
		  
		    
			  
We've got a large Solaris machine running MQSeries 5.2, CSD1. It runs a lot of homegrown applications that are mostly written in Perl. This machine reboots every morning, and is generally trouble free. But for the last week, MQSeries has been dying around 22:00 or so each night. Connect attempts get a 2059 (MQRC_Q_MGR_NOT_AVAILABLE), and the queue manager doesn't respond to runmqsc or endmqm commands.
 
 
I get a huge number of FDCs - 450 were created last time. This is partly because no one caught that the queue manager was unusable, and an FDC was created every time an app tried to connect, right up until reboot time.
 
 
This makes it difficult to find the root cause of the problem. But I'm seeing a lot of lines like this:
 
 
--} xstAllocBlockInExtent rc=xecsS_E_NO_MEM
 
 
so I think I'm exhausting shared memory.
 
 
We use really large settings for the kernel shared memory parameters, much bigger than the manual's minimum requirements. And we are at our busiest during the day. Yet the lockups occur at night, when it's not as busy, so I suspect that there's a leak of some sort. There haven't been any noteworthy application changes lately, so it could just be volume related. Perhaps we were barely making it to reboot time in the past and I didn't know it.
 
 
We had planned to go to 5.3 this summer. I can accelerate that, but I need to be able to keep this queue manager alive for 24 hours at a time until then. Any suggestions as to how I can do this? I would even settle for being able to anticipate the problem, and recycle 'just in time'. It's disruptive, but better than what we have now.
			   
			 
		
tillywern (Centurion)
Joined: 28 Jan 2003 Posts: 109 Location: Colorado
Posted: Mon Jun 16, 2003 1:22 pm    Post subject: ipcs is your best friend.
		  
		    
			  
Well, MQ is really bad at cleaning up shared memory on Solaris. At least that is what I have seen. I can't say that I have seen your specific problem, but I can say I have seen systems run out of shared memory, semaphores, and the like.
 
 
You can run ipcs to get a list of the IPC elements held by the system. I generally run:
 
 
ipcs |grep mqm
 
 
It should give you a pretty good list.
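
To turn that listing into numbers you can log, here is a minimal sketch that counts entries per IPC type for one owner. It assumes the usual Solaris `ipcs` layout (type letter `q`, `m` or `s` in the first column, owner in the fifth); the function name and the sample text are made up for illustration, so check the column positions against your own output.

```python
from collections import Counter

# Solaris `ipcs` marks each entry with a type letter in the first column.
IPC_KINDS = {"q": "message_queues", "m": "shared_memory", "s": "semaphores"}

# Invented sample in the assumed layout: T ID KEY MODE OWNER GROUP
SAMPLE = """\
IPC status from <running system>
T         ID      KEY        MODE        OWNER    GROUP
Message Queues:
Shared Memory:
m          0   0x00000001 --rw-rw----      mqm      mqm
m          1   0x00000002 --rw-rw----      mqm      mqm
m          2   0x00000003 --rw-r--r--     root     root
Semaphores:
s          0   0x00000004 --ra-ra----      mqm      mqm
"""


def count_ipc_by_owner(ipcs_output, owner="mqm"):
    """Count IPC entries per type for one owner in captured `ipcs` text."""
    counts = Counter()
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Header and section-title lines fail this test and are skipped.
        if len(fields) >= 5 and fields[0] in IPC_KINDS and fields[4] == owner:
            counts[IPC_KINDS[fields[0]]] += 1
    return counts


print(dict(count_ipc_by_owner(SAMPLE)))
```

In practice you would feed it the captured output of `ipcs` from cron and append the counts to a log with a timestamp.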
 
 
You may have to track the number of items in the ipcs output for a general upward trend. If you do this for a while you will probably understand how far you can go before MQ starts to crap out.
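
One way to make "a general upward trend" concrete is a straight-line fit over the logged counts. The sketch below is only an illustration: the function names are hypothetical, the samples are `(hours_since_start, count)` pairs you would collect yourself, and the `limit` is whatever count you have observed MQ failing at, not a documented value.

```python
def growth_per_hour(samples):
    """Least-squares slope of count versus time for (hours, count) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        raise ValueError("samples must cover more than one point in time")
    return sum((t - mean_t) * (c - mean_c) for t, c in samples) / spread


def hours_until_limit(samples, limit):
    """Rough time left before the count reaches `limit`, or None if not growing."""
    slope = growth_per_hour(samples)
    if slope <= 0:
        return None
    _, last_count = samples[-1]
    return max(0.0, (limit - last_count) / slope)


# Invented readings: 100 segments at start, growing by 10 per hour.
readings = [(0, 100), (1, 110), (2, 120)]
print(growth_per_hour(readings), hours_until_limit(readings, 150))
```

A steadily positive slope during quiet hours would support the leak theory; a flat line would point elsewhere.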
 
 
It is safe to say that if all queue managers are shut down there should be no entries in ipcs for mqm. Often, after shutting down a queue manager, you will find leftovers there.

You can use ipcrm to remove the items left. It may require root access to remove all of them. If so, coordinate with your sysadmin.

Since IPC resources are used for inter-process communication, it is likely that you have a lot of MQ processes running. Check your process table for channel receivers that have been in the table for too long. The channel receiver processes get created in large numbers and are known for how long they can stay around.

Depending on how your channels are managed, look for long disconnect intervals; channels may be lingering for a long time after they are used. This is especially true if you are using server-connection channels. These processes could be holding resources that the system needs.

I used to run a script from cron that would look for channel receiver processes and remove them if they were really old. Define "really old" in terms of how long a transaction takes in your enterprise.
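
A sketch of the "find the really old ones" step, not the original cron script. It assumes you capture `ps -eo pid,etime,args`, where elapsed time is printed as `[[dd-]hh:]mm:ss`, and that your receivers can be recognised by a name in the command line; `amqcrsta` below is an assumption, so use whatever name your process table actually shows. It only reports PIDs and leaves the decision to kill anything to you.

```python
def etime_to_seconds(etime):
    """Convert a ps elapsed time of the form [[dd-]hh:]mm:ss to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:          # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def old_processes(ps_output, name, max_age_seconds):
    """Return PIDs whose command line contains `name` and exceed the age limit."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue               # skips the header and malformed lines
        pid, etime, args = fields
        if name in args and etime_to_seconds(etime) > max_age_seconds:
            pids.append(int(pid))
    return pids
```

For example, `old_processes(text, "amqcrsta", 4 * 3600)` would list receivers older than four hours, if four hours is your definition of "really old".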
 
 
I would start by writing a script that monitors the MQSeries-specific IPC elements on the machine and looks for the creation of FDC files with respect to time. This will at least give you an idea of what is occurring and when. From there you should be able to see whether you are maintaining a constant level of IPC objects or whether they are steadily increasing. I would also pull a copy of all processes in the process table that are associated with mqm. This will also show you if you have a steadily increasing number of processes that aren't going away.
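
For the FDC half of that script, a rough sketch: bucket the FDC files by the hour of their modification time, so the first hour with a burst of files stands out. The function names are hypothetical, `/var/mqm/errors` is the usual error directory but is still an assumption about your install, and modification time is only an approximation of when each failure happened.

```python
import os
import time
from collections import Counter


def bucket_by_hour(mtimes):
    """Count timestamps (seconds since the epoch) per local clock hour."""
    counts = Counter()
    for mtime in mtimes:
        hour = time.strftime("%Y-%m-%d %H:00", time.localtime(mtime))
        counts[hour] += 1
    return counts


def fdc_mtimes(errors_dir="/var/mqm/errors"):
    """Collect modification times of the *.FDC files in the error directory."""
    return [
        os.path.getmtime(os.path.join(errors_dir, name))
        for name in os.listdir(errors_dir)
        if name.upper().endswith(".FDC")
    ]


def report(errors_dir="/var/mqm/errors"):
    """Print one line per hour, oldest first."""
    for hour, count in sorted(bucket_by_hour(fdc_mtimes(errors_dir)).items()):
        print(hour, count)
```

Lining this report up against the logged IPC counts should show whether the first FDCs appear when the counts peak.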
 
 
I hope this starts you on your way.
			   
			 
		
gperera (Newbie)
Joined: 30 May 2001 Posts: 8 Location: Minneapolis, MN
Posted: Thu Jul 10, 2003 6:37 am    Post subject: Solaris Kernel Parameters for MQSeries
		  
		    
			  
Have you tuned your Solaris kernel per the Quick Beginnings manual? We received this from Level 2 support about 2 years ago because at the time the manual was incomplete/inaccurate.
 
 
 
 
                  Solaris Kernel Parameters for MQSeries
 
 
 
     MQSeries makes extensive use of IPC (Inter-Process Communication) resources, including shared memory, semaphores, and message queues (the IPC kind).  Many Solaris systems will require some adjustment of the kernel parameters which govern these resources in order to be able to run MQSeries comfortably, or to support heavily-used MQSeries installations. Indications that MQSeries lacks enough IPC resources may be an inability to start MQSeries, or difficulty in running many MQSeries programs concurrently.  Furthermore, MQSeries may generate FDC files in /var/mqm/errors which contain error messages from IPC-related functions like semget, shmget, or shmat.
 
 
     In order to make more IPC resources available to MQSeries, it is necessary to modify the kernel parameters on your machine using facilities like configure and idtune.  Use the values given in this note in preference over those listed in the MQSeries Quick Beginnings for Solaris book.  In cases where this note mentions new parameters, or overlooks some listed in the Quick Beginnings book, again give preference to this note.  For more information on modifying your kernel, refer to your Solaris documentation or contact Solaris support.
 
 
     We strongly urge you to save your current kernel configuration before trying to make any changes.  When you make changes, realise that other programs (databases, for example) which make much use of IPC resources  may force you to modify these parameters so that both MQSeries and those programs will run.  The values msgmax, msgmnb, msgssz, semaem, semume, semvmx, shmmax, and shmseg should not in general require augmentation if you are running databases or other IPC-intensive programs.  The values msgmap, msgmni, msgseg, msgtql, semmap, semmni, semmns, semmnu, and shmmni may require augmentation depending on the other programs running on the system.  Refer to the meaning of each parameter listed below and other vendors' instructions to help you with that determination.
 
 
     In general, the values that follow are only policing values.  In other words, they can usually be over-allocated without causing harm to your system.  This means that if your existing programs are not already running up against the limits you have specified, they will not use more kernel resources after modifying your kernel parameters.
 
 
 
==IPC Message Queue Parameters ==================================================
 
 
mesg       1        This should not be changed.

msgmap     1026     This is the number of entries in the kernel's message map
                    table.  This value should equal msgtql+2, and it should
                    always be less than msgseg.  A value roughly half of msgseg
                    should be good.

msgmax     4096     This is the maximum size of a single message in bytes.

msgmnb     4096     This is the maximum number of bytes that all the messages on
                    a single message queue can occupy.

msgmni     50       This is the maximum number of message queues allowed on the
                    system at any time.

msgseg     2048     This is the number of memory segments allocated by the
                    kernel at system startup to hold messages.  Each system will
                    have a limit on the total memory allocated (msgseg*msgssz),
                    often 128KB.

msgssz     8        This is the size in bytes of the memory segments used for
                    storing messages.  Valid values must be multiples of 4.

msgtql     1024     This is the number of system message headers which the
                    kernel can store, which is effectively the maximum number of
                    unread messages at any time.
 
 
 
==IPC Semaphore Parameters ======================================================
 
 
sema       1        This should not be changed.

semaem     16384    This is the maximum adjust-on-exit value for a semaphore.
                    It can be set to 32767 if necessary, but MQSeries does not
                    require this.

semmap     1026     This is the size of the kernel's map of semaphore sets.
                    This value should equal semmni+2.

semmni     1024     This is the maximum number of semaphore sets that can exist
                    on the system at any time.

semmns     32768    This is the maximum number of semaphores in the system.  A
                    value of 16384 will generally work for a small MQSeries
                    installation, but setting it to 32768 is advisable for
                    larger systems.

semmnu     2048     This is the number of semaphore undo structures allocated
                    by the system.

semmsl     128      This is the maximum number of semaphores per semaphore set.

semopm     128      This is the maximum number of semaphore operations that can
                    be done by one semop() call.  If this is set to semmsl,
                    one semop() call can operate on every semaphore in a
                    semaphore set, although MQSeries does not require this.

semume     256      This is the number of semaphore undo entries for each
                    process.

semvmx     32767    This is the maximum value that a semaphore can have.
 
 
 
 
==IPC Shared Memory Parameters ==================================================
 
 
shmem      1        This should not be changed.

shmmax     4194304  This is the maximum size in bytes of a shared memory
                    segment.

shmmni     1024     This is the maximum number of shared memory segments that
                    can exist on the system at any time.

shmseg     1024     This is the maximum number of shared memory segments that a
                    single process can have at any time.  It should always be
                    less than or equal to shmmni.
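
The note states several relations between parameters (msgmap = msgtql+2, msgmap < msgseg, msgssz a multiple of 4, semmap = semmni+2, shmseg <= shmmni). A small sketch that checks a candidate set of values against just those stated relations is below; the function and dictionary names are made up for illustration, and it does not read or write any system configuration.

```python
# The values listed in the note, keyed by parameter name.
NOTE_VALUES = {
    "msgmap": 1026, "msgseg": 2048, "msgssz": 8, "msgtql": 1024,
    "semmap": 1026, "semmni": 1024,
    "shmmni": 1024, "shmseg": 1024,
}


def check_relations(params):
    """Return a list of violated relations; an empty list means all hold."""
    problems = []
    if params["msgmap"] != params["msgtql"] + 2:
        problems.append("msgmap should equal msgtql+2")
    if params["msgmap"] >= params["msgseg"]:
        problems.append("msgmap should be less than msgseg")
    if params["msgssz"] % 4 != 0:
        problems.append("msgssz must be a multiple of 4")
    if params["semmap"] != params["semmni"] + 2:
        problems.append("semmap should equal semmni+2")
    if params["shmseg"] > params["shmmni"]:
        problems.append("shmseg should be less than or equal to shmmni")
    return problems


print(check_relations(NOTE_VALUES))
```

Running it over your own values before a reboot is a cheap way to catch a typo in one of the paired settings.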
 
 
 
==Miscellaneous Parameters ======================================================
 
 
 
maxusers   32       This controls the number of users which can log in to the
                    system.  More importantly, it controls other system values
                    which limit the number of processes that can run at once.
 
 
Rather than changing maxusers, we would recommend that you alter the nproc and maxuprc values as follows:
 
 
  nproc: The maximum number of processes on the system

           1 for each non-MQSeries process on the system         PLUS
           3 for each MQSeries queue manager (strmqm)            PLUS
           2 for each MQSeries receiver or svrconn channel       PLUS
           1 for each MQSeries sender channel                    PLUS
           1 for each other MQSeries process (runmqtrm, etc.)

  maxuprc: The maximum number of processes for a single user

           1 for each non-MQSeries process run by 'mqm'          PLUS
           3 for each MQSeries queue manager (strmqm)            PLUS
           2 for each MQSeries receiver or svrconn channel       PLUS
           1 for each MQSeries sender channel                    PLUS
           1 for each other MQSeries process (runmqtrm, etc.)
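
Both sums above have the same shape, so one helper covers them: pass system-wide counts for nproc, or the counts for user 'mqm' for maxuprc. This is a sketch of the arithmetic only; the function name and the example counts are invented.

```python
def estimate_process_limit(other_processes, queue_managers,
                           receiver_or_svrconn_channels, sender_channels,
                           other_mq_processes):
    """Apply the note's weights: 1, 3, 2, 1 and 1 per item respectively."""
    return (other_processes
            + 3 * queue_managers
            + 2 * receiver_or_svrconn_channels
            + sender_channels
            + other_mq_processes)


# Invented example: 200 other processes, 2 queue managers, 50 receiver or
# svrconn channels, 10 sender channels, 5 other MQSeries processes.
print(estimate_process_limit(200, 2, 50, 10, 5))
```

The result is a floor for the setting, so leave headroom above it for peak channel counts.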
 
 
 
     Users of Sun Solaris 2.5.1 or better may wish to verify that they are not in fact using more than 25% of their kernel resources for semaphore structures.  In order to calculate this in bytes, use the formula given below.  Also, if you are letting the kernel determine nproc for you, you can find this value by typing 'sysdef | grep v_proc':
 
 
  kernel_memory = semmns * 16 +
                  nproc * 16 +
                  semmni * 92 +
                  semmnu * ((semume + 1) * 16) * 4
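
The semaphore formula transcribes directly into a function. In the sketch below the semaphore values are the ones from this note, while `nproc=1000` is a made-up figure; substitute the number reported by `sysdef | grep v_proc` on your machine.

```python
def semaphore_kernel_bytes(semmns, nproc, semmni, semmnu, semume):
    """Bytes of kernel memory used by semaphore structures, per the note."""
    return (semmns * 16
            + nproc * 16
            + semmni * 92
            + semmnu * ((semume + 1) * 16) * 4)


# Note values for the semaphore parameters; nproc is a hypothetical example.
total = semaphore_kernel_bytes(semmns=32768, nproc=1000, semmni=1024,
                               semmnu=2048, semume=256)
print(total)  # 34320000 bytes, roughly 32.7 MB
```

Almost all of that total comes from the semmnu term, so semmnu and semume are the values to revisit if the result is too large.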
 
 
     Solaris 2.5.1 users must also be certain that they are not using more than 25% of their kernel resources for shared memory structures.  In order to calculate this in bytes, use the formula given below:
 
 
  kernel_memory = shmmni * 120
 
 
     Of course, simply calculating the bytes needed for shared memory and semaphore structures is not terribly useful if you don't know what the overall kernel resources are.  Kernel memory is limited by your kernel architecture as well as by your available RAM.  Type 'uname -m' to see what your kernel architecture is.  The maximum kernel memory that common Sun architectures can use today is given below:
 
 
     Kernel  Resources  Machines
     ======  =========  ===============================================
     sun4m   256 MB     ------
     sun4d   576 MB     SS1000, SC2000
     sun4u   4 GB       UltraSPARC
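
To put the 25% guideline and the architecture table together, here is a minimal sketch. It uses the table's figures as upper bounds only (the note says available RAM also limits kernel memory, which this ignores), and the names are hypothetical.

```python
MB = 1024 * 1024

# Maximum kernel memory per architecture, taken from the table in the note.
KERNEL_LIMIT_BYTES = {"sun4m": 256 * MB, "sun4d": 576 * MB, "sun4u": 4096 * MB}


def shared_memory_kernel_bytes(shmmni):
    """Bytes used by shared memory structures: shmmni * 120, per the note."""
    return shmmni * 120


def within_quarter_of_kernel(structure_bytes, arch):
    """True if the structures fit in 25% of the architecture's maximum."""
    return structure_bytes <= KERNEL_LIMIT_BYTES[arch] // 4


shm_bytes = shared_memory_kernel_bytes(1024)   # shmmni value from the note
print(shm_bytes, within_quarter_of_kernel(shm_bytes, "sun4m"))
```

Use the architecture string printed by `uname -m` as the `arch` argument, and apply the same check to the semaphore total.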
			   
			 
		
Michael Dag (Jedi Knight)
Joined: 13 Jun 2002 Posts: 2607 Location: The Netherlands (Amsterdam)
Posted: Thu Jul 10, 2003 8:02 am
		  
		    
			  
Also check the Probe Id in the FDC files and see whether this is a known issue on the IBM support website. The current CSD level for 5.2 is 6.
			   
			 