Thoughts on reliable syslog

One of the most important issues in a logging system (e.g. syslog) is reliability. But sometimes I get the impression most people asking for reliability do not really want it. IMO a reliability requirement has to be tested against the following extreme case: suppose the logging system fails and you enter a command — should the command be executed even though it cannot be logged? If the answer is “Yes” then you do not really want reliable logging (at least not for the command you thought about).

I admit that this constructs a dichotomy that might not be necessary. But so far I have not heard of any “weak” or “semi-reliability”, so the binary distinction only reflects the state of discourse on the subject. I also admit that most related real-life problems would be solved if the syslog world finally threw out UDP in favor of TCP or TLS. So this whole reasoning is not about pressing needs; it is rather abstract and mainly a response to people claiming “just implement a rate-limit, then everything is solved”.

At first sight the idea of rate-limiting seems fine: just do not send more data than either the channel can transmit or the receiver can process. Then (just like in a file transfer with flow control) we never use more resources than are available and can log happily ever after ;-). But there is a big difference: logging is an open system that cannot be totally controlled. You cannot throttle log messages to an optimal rate (the way flow control achieves optimal network utilization); you have to process them as fast as they are generated.

The whole situation basically boils down to a producer-consumer problem: you have a buffer (i.e. shared memory, a local socket, or a network connection), one or more producers sending messages into the buffer, and a consumer taking messages out of the buffer. Now the question is: what can we do if the producers generate more messages than the consumer takes out and the buffer fills up, either for a short time (during a peak) or constantly?

I see three possible reactions:

  • enlarge the buffer and/or
  • stop the producers and/or
  • throw away messages.

Enlarging the buffer is possible if the producers have a growable send buffer. If using a socket then the OS handles this up to a configurable limit (maxsockbuf); if implementing one's own program, one can allocate buffer memory up to a configured limit or until physical memory and swap are exhausted.
This is a good solution to bridge short network outages or load peaks, but it still has its limits and cannot ‘solve’ the problem.
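
As a minimal sketch of the socket case: the sender can ask the OS for a larger send buffer with the standard SO_SNDBUF option and then check what it was actually granted, since the kernel silently caps (or on some systems adjusts) the value at its configured limit. The helper name is made up for illustration.

    #include <sys/socket.h>

    /* Request a larger send buffer; returns the size actually granted
     * (which may be less than requested, or adjusted by the kernel),
     * or -1 on error. */
    int grow_sndbuf(int fd, int wanted)
    {
        int granted = 0;
        socklen_t len = sizeof(granted);

        if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &wanted, sizeof(wanted)) == -1)
            return -1;
        if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len) == -1)
            return -1;
        return granted;
    }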

Stopping the producer is the classical ‘operating systems lecture’ solution, since this is the desired behaviour for ‘normal’ programs. If one communicates over a stream socket or a pipe then the OS takes care of it and the send() call blocks, i.e. it does not return until the message can be buffered for sending, thus causing the program to sleep.
Whether this is a solution for syslog depends on the program. Under the assumption that reliability matters, this is the best strategy for network server applications (e.g. Postfix or Apache), since stopping/slowing down the server directly leads to no/fewer messages and solves the problem.
BUT not every program can or should be stopped in case of log problems: it is not feasible to stop the kernel, and it is not desirable to stop sshd, because the admin might have to log in to fix the cause of the problem.
Some programs might also have time constraints (e.g. it does not matter if the daily backup runs an hour late, but continuous blocking must not delay it more than, say, 6 hours past its schedule).
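
For programs with such time constraints, one compromise is to block on the log socket, but only up to a deadline. A minimal sketch using the standard SO_SNDTIMEO socket option (the function name and the timeout policy are my own illustration):

    #include <errno.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/time.h>

    /* Block on send() for at most 'seconds'. Returns 1 if the message
     * was sent, 0 if the deadline expired first, -1 on other errors. */
    int log_send_bounded(int fd, const char *msg, long seconds)
    {
        struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
        size_t len = strlen(msg);
        ssize_t n;

        if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1)
            return -1;
        n = send(fd, msg, len, 0);
        if (n == (ssize_t)len)
            return 1;            /* sent within the deadline */
        if (n >= 0 || errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;            /* deadline hit, message (partly) unsent */
        return -1;               /* real error */
    }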

Throwing away messages is the easiest way out. If the system is supposed to be reliable then this is to be avoided at all costs; it means you failed and could not meet the requirement.
If reliability is not an issue, then just throwing away all messages that cannot be taken care of is the best strategy: it is simple, efficient, and easy to implement, the sender does not need to care about the state of the buffer, and there are no locking issues or risk of deadlock. — This was the design of the original BSD syslog, which was intended to be just a debugging aid.
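
This strategy amounts to little more than a non-blocking send that ignores ‘buffer full’ errors. A rough sketch (the MSG_DONTWAIT flag and the drop counter stand in for whatever a real implementation would do):

    #include <errno.h>
    #include <string.h>
    #include <sys/socket.h>

    static unsigned long dropped;   /* count what we silently lose */

    /* Fire-and-forget: never block; if the buffer is full the message
     * is simply gone, roughly as in the original BSD syslog. */
    void log_best_effort(int fd, const char *msg)
    {
        if (send(fd, msg, strlen(msg), MSG_DONTWAIT) == -1 &&
            (errno == EAGAIN || errno == EWOULDBLOCK || errno == ENOBUFS))
            dropped++;
    }
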
But wait, there is a third alternative: the reliability requirement does not have to be binary. If we look at syslog, every message has a severity indicating its importance; possible values range from debug to emergency. This suggests a new strategy: throw away as many messages as necessary, but only those of the lowest severities. Without knowing the future (i.e. in any real program ;) this clearly needs a mutable buffer, as it is not possible to ‘unsend’ a message. An implementation will need its own random-access buffer; then, once the buffer gets too full (over the high watermark), you start iterating over its content and throw away first all debug messages, then all informational messages, and so on until enough memory is freed (usage below the low watermark).
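
A minimal sketch of such a watermark purge, assuming a fixed-size slot buffer and the syslog(3) severity numbering where 7 (debug) is least and 0 (emergency) is most important; the slot layout, watermark values, and function names are invented for illustration:

    #include <string.h>

    #define BUF_SLOTS 1024
    #define HIGH_WM    900   /* start purging above this fill level */
    #define LOW_WM     600   /* purge until usage is below this     */

    struct logmsg { int used; int severity; char text[256]; };

    static struct logmsg buf[BUF_SLOTS];
    static int fill;

    /* Drop the least important messages first (7 = debug ... 0 =
     * emergency); emergencies are never dropped here. */
    static void purge_by_severity(void)
    {
        for (int sev = 7; sev > 0 && fill > LOW_WM; sev--)
            for (int i = 0; i < BUF_SLOTS && fill > LOW_WM; i++)
                if (buf[i].used && buf[i].severity == sev) {
                    buf[i].used = 0;
                    fill--;
                }
    }

    /* Returns 1 if the message was stored, 0 if even purging could
     * not make room (buffer full of high-severity messages). */
    int enqueue(int severity, const char *text)
    {
        if (fill >= HIGH_WM)
            purge_by_severity();
        for (int i = 0; i < BUF_SLOTS; i++)
            if (!buf[i].used) {
                buf[i].used = 1;
                buf[i].severity = severity;
                strncpy(buf[i].text, text, sizeof(buf[i].text) - 1);
                buf[i].text[sizeof(buf[i].text) - 1] = '\0';
                fill++;
                return 1;
            }
        return 0;
    }
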
Just like enlarging the buffer, this approach might make a difference (if there are lots of less severe events) and lead to a sufficient level of reliability where no message above a certain severity is lost. But with respect to the more abstract reasoning it is no solution: for every desired severity level there still might be so many messages that some have to be thrown away.

Conclusion:

  • Every attempt at a “perfectly reliable syslog” faces the dilemma of either blocking the system or still losing messages.
  • For “real-life syslog implementations” there are ways to become more robust and dependable. In practice every modern implementation chooses a trade-off among the three alternatives discussed here. And there is even a way to cheat and implement different trade-offs for different kinds of messages: just use more than one channel (e.g. most systems have one channel for the kernel and one for applications; FreeBSD also has a third one for privileged applications only).

One Response to “Thoughts on reliable syslog”

  1. Rainer Gerhards says:

    Hi, this is a really good summary of the problems encountered with reliable syslog. You just missed one thing, I think: a buffer need not exist only in main memory. When the in-core buffer runs out of space, you may also use a disk-based buffer, which offers much more capacity. Of course, even the largest disk-based buffer may be exhausted at some point, at which one needs to resort to other strategies. But a disk-based buffer is an excellent solution for temporary (but lengthy) receiver outages. As a side note, rsyslog has implemented all three buffer handling options. If you are interested, you may want to have a look at its queue module description (yes, this proves the point of being a producer-consumer problem ;)):

    http://www.rsyslog.com/doc-queues.html

    Rainer