This document is an introduction to Postfix queue congestion analysis. It explains how the qshape(1) program can help to track down the reason for queue congestion. qshape(1) is bundled with Postfix 2.1 and later source code, under the "auxiliary" directory. This document describes qshape(1) as bundled with Postfix 2.4.

When mail is draining slowly or the queue is unexpectedly large, run qshape(1) as the super-user (root) to help zero in on the problem. The qshape(1) program displays a tabular view of the Postfix queue contents.

When the output is a terminal, intermediate results showing the top 20 domains (-n option) are displayed after every 1000 messages (-N option), and the final output also shows only the top 20 domains. This makes qshape useful even when the "deferred" queue is very large and it would otherwise take prohibitively long to read the entire "deferred" queue.

Large numbers in the qshape output represent a large number of messages that are destined to (or alleged to come from) a particular domain. It should be possible to tell at a glance which domains dominate the queue sender or recipient counts, approximately when a burst of mail started, and when it stopped.

When a site you send a lot of email to is down or slow, mail messages will rapidly build up in the "deferred" queue, or worse, in the "active" queue. The qshape output will show large numbers for the destination domain in all age buckets that overlap the starting time of the problem.
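
To make the idea of age buckets concrete, here is a minimal Python sketch of how messages could be tallied per domain into doubling age buckets (5, 10, 20, ... minutes, with an open-ended last bucket). It is an illustration only, not qshape's actual code: the function names, the inclusive bucket boundaries, and the sample data are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def bucket_bounds(first=5, count=10):
    """Upper bounds in minutes for all but the last, open-ended bucket.

    With the assumed defaults this yields 5, 10, 20, ..., 1280.
    """
    return [first * 2 ** i for i in range(count - 1)]


def bucket_label(age_minutes, bounds):
    """Return the label of the first bucket whose bound covers the age."""
    for bound in bounds:
        if age_minutes <= bound:  # inclusive boundary is an assumption
            return str(bound)
    return f"{bounds[-1]}+"


def tally(messages, bounds):
    """Count (domain, age_minutes) pairs per domain and bucket, plus a TOTAL row."""
    table = defaultdict(Counter)
    for domain, age in messages:
        label = bucket_label(age, bounds)
        table[domain][label] += 1
        table["TOTAL"][label] += 1
    return table


# Hypothetical sample: (destination domain, message age in minutes).
SAMPLE_MESSAGES = [
    ("example.com", 3),
    ("example.com", 130),
    ("example.com", 150),
    ("example.net", 2000),
]
```

Calling `tally(SAMPLE_MESSAGES, bucket_bounds())` places the two example.com messages aged 130 and 150 minutes in the "160" bucket. A destination that went down about two hours ago would show up the same way: its counts cluster in the buckets that cover the start of the outage.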

All new mail entering the Postfix queue is written by the cleanup(8) service into the "incoming" queue. New queue files are created owned by the "postfix" user with an access bitmask (or mode) of 0600. Once a queue file is ready for further processing, the cleanup(8) service changes the queue file mode to 0700 and notifies the queue manager of new mail arrival. The queue manager ignores incomplete queue files whose mode is 0600, as these are still being written by cleanup.
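
The mode-based handshake described above can be sketched in a few lines of Python. This is only an illustration of the idea (a file is "ready" once its owner permission bits are 0700), not Postfix's implementation. The names `is_ready` and `demo` are hypothetical, and the demo works on a throwaway temporary file rather than a real queue directory.

```python
import os
import stat
import tempfile

WRITING_MODE = 0o600  # file still being written by the producer
READY_MODE = 0o700    # file complete and ready to be picked up


def is_ready(path):
    """Return True when the owner permission bits of the file are exactly rwx."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & stat.S_IRWXU) == READY_MODE


def demo():
    """Create a file as 0600, then flip it to 0700, checking readiness each time."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "queuefile")
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, WRITING_MODE)
        os.close(fd)
        os.chmod(path, WRITING_MODE)  # make the mode independent of the umask
        before = is_ready(path)       # producer is still writing
        os.chmod(path, READY_MODE)    # producer signals completion
        after = is_ready(path)
    return before, after
```

On a POSIX system `demo()` reports that the file is skipped while its mode is 0600 and accepted after the change to 0700. The file's own mode marks it as complete, so a consumer scanning the directory needs no separate lock file.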

Note that whenever the queue manager is restarted, there may already be messages in the "active" queue directory, but the "real" "active" queue in memory is empty. In order to recover the in-memory state, the queue manager moves all the "active" queue messages back into the "incoming" queue, and then uses its normal "incoming" queue scan to refill the "active" queue. The process of moving all the messages back and forth, redoing transport table (trivial-rewrite(8) resolve service) lookups, and re-importing the messages back into memory is expensive. At all costs, avoid frequent restarts of the queue manager (e.g. via frequent execution of "postfix reload").

There are several ways to inspect the mail queue on Postfix, such as:

    mailq
    postqueue -p

qshape is another useful tool: it shows how many messages are in the mail queue and which domains they are waiting to be delivered to. To check individual queues, run:

    qshape hold
    qshape active
    qshape defer
    qshape deferred

These commands display the message counts for the hold, active, defer and deferred queue directories, respectively. (The "defer" directory holds per-message deferral logs rather than queued messages, so "deferred" is normally the one to look at.) If running one of these commands produces a "Qshape not found" message, qshape needs to be installed on the server, which can be done with the following package:

    yum install postfix-perl-scripts

Then run the qshape command again and it will produce its output.
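
For scripted checks, the tabular qshape output can be turned into a dictionary. The Python sketch below assumes the usual layout: a header row starting with "T" followed by the bucket labels, then one whitespace-separated row per domain, with a TOTAL row first. The sample text is invented for illustration and is not real output. `parse_qshape` and `busiest` are hypothetical helper names.

```python
def parse_qshape(text):
    """Parse qshape-style tabular text into {row_name: {column: count}}."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header = lines[0]  # e.g. ["T", "5", "10", ..., "1280+"]
    rows = {}
    for fields in lines[1:]:
        name = fields[0]
        counts = [int(value) for value in fields[1:]]
        rows[name] = dict(zip(header, counts))
    return rows


def busiest(rows):
    """Return domain names ordered by total message count, largest first."""
    domains = [name for name in rows if name != "TOTAL"]
    return sorted(domains, key=lambda name: rows[name]["T"], reverse=True)


# Invented sample in the assumed layout; the numbers are not from a real queue.
SAMPLE_OUTPUT = """\
              T  5 10 20 40 80 160 320 640 1280 1280+
      TOTAL  30  0  0  2  4  4   0   0   0    0    20
example.com  22  0  0  2  0  0   0   0   0    0    20
example.net   8  0  0  0  4  4   0   0   0    0     0
"""
```

With this sample, `parse_qshape(SAMPLE_OUTPUT)["example.com"]["1280+"]` gives the number of example.com messages older than 1280 minutes, and `busiest(...)` lists example.com before example.net.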

Note: Using mailx to send test emails from a single host is sufficient for the purpose of this lab. In a production environment, you should use the registered domain that you configured in /etc/postfix/ within the sender email address instead, for example

postqueue -s domain.tld should cause the backup relay machine to flush all the email for your site. The default Postfix setup enables per-site flushing for all domains in relay_domains. postqueue -f will do this too, but it will also push out mail for external sites, i.e. it does more than you need.

Make sure the cache file in /etc/snmp/postfixdetailed is some place that snmpd can write to. This file is used to track how the various values change from one call by snmpd to the next. Also make sure the path for pflogsumm is correct.

Run /etc/snmp/postfixdetailed to create the initial cache file so you don't end up with some crazy initial starting value. Please note that each time /etc/snmp/postfixdetailed is run, the cache file is updated, so if it is run in between two LibreNMS polls, the values will be thrown off for that polling period.
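
The effect of the cache file can be illustrated with a small Python sketch of delta tracking. This is not the logic of the postfixdetailed script itself: the JSON cache format, the `poll` function, and the "received" counter name are assumptions chosen for the example.

```python
import json


def poll(cache_path, current):
    """Return the change in each counter since the previous call.

    The current values are then written to the cache, so every call
    (by the poller or by hand) moves the baseline forward.
    """
    try:
        with open(cache_path) as handle:
            previous = json.load(handle)
    except FileNotFoundError:
        previous = {}  # no cache yet: deltas are measured from zero
    deltas = {key: value - previous.get(key, 0) for key, value in current.items()}
    with open(cache_path, "w") as handle:
        json.dump(current, handle)
    return deltas
```

With no cache file, the first call reports the full counter value as the delta, which is the inflated starting value that priming the cache avoids. A manual call between two polls consumes part of the interval's change, so the next poll reports a smaller delta than really occurred.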

The postfix_policy_time_limit key is set because by default the Postfix spawn(8) daemon kills its child process after 1000 seconds. This is too short for a policy daemon that might run as long as an SMTP client is connected to an SMTP server process.

