“Yeah, Lennie. There’ll be msgids.”
In Solaris 8 (or Solaris 7 if you install the kernel patch), we changed the format of the messages logged by syslog by adding a new bracketed field at the front. The format of the new field is “[ID 123456 kernel.debug]”. The second part is obviously the facility and level of the message, but what is that first number?
That number is the msgid. The msgid is simply the result of a hash function applied to the format string passed to the syslog library function. When we added the field, we also made an attempt to give every message in Solaris a unique msgid. We were largely successful, although there are exceptions.
So, what can you do with the msgid? I am glad you asked. Suppose you found a message in the messages file that looked like this: “automountd: [ID 784820 daemon.error] server nfs-server1 not responding”, and you wanted to find out what it means. Well, along with the change in format, we added the “msgid” command, which takes strings on stdin and rewrites them with their hash to stdout. Classic Unix filter. So, what we can do is this:
strings /usr/lib/autofs/automountd | msgid | grep 784820
The result of which is
784820 server %s not responding
So, this tells us that the line of code that produces this message has the exact string “server %s not responding” on it. So, we go to the source and look for this string and lo and behold… it’s not there. So we try looking for just the beginning of the string and find that the message is really split across two source lines, in file autod_nfs.c at line 982. From there we can try to work out what it really means, in true “Use the source, Luke” fashion. And aren’t you glad you can use the source now?
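The split-string trap is easy to reproduce. Here is a minimal sketch, assuming a made-up stand-in file called sample.c; it is not the real autod_nfs.c, and the syslog call in it is only a guess at what a split literal might look like. Searching for the whole format string finds nothing, while searching for just the leading part finds the line where the literal starts:

```shell
# Hypothetical stand-in for a source file; the real autod_nfs.c is not reproduced here.
tmp=$(mktemp -d)
cat > "$tmp/sample.c" <<'EOF'
void warn(const char *host) {
	syslog(LOG_ERR, "server %s not "
	    "responding", host);
}
EOF

# Full format string: no match, because the literal is split across two lines.
grep -n 'server %s not responding' "$tmp/sample.c" || echo "no match for full string"

# Leading part only: matches the line where the literal starts.
hit=$(grep -n 'server %s' "$tmp/sample.c")
echo "$hit"

rm -r "$tmp"
```

The compiler joins adjacent string literals, which is why strings on the binary shows the whole message even though no single source line contains it.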
Now, suppose you didn’t want to use the source, but wanted to find out if this has been a problem for other users? Well, if you are a Sun contract customer you could enter “784820 automountd” into SunSolve to see if anyone else had the same problem, or you could check Google. Looking at some of the results of either search might give you a clue about the problem.
Now, I admit that this example is a little contrived, since the meaning of the message is reasonably clear and the format of the syslog call is pretty easy to figure out, but some messages are much more obscure. The technique is the same regardless.
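If you find yourself doing this for more than one message, pulling the msgid out of the log line is easy to script. A minimal sketch, assuming the bracketed field always looks like “[ID number facility.level]” as in the example above; the helper name extract_msgid is made up for illustration:

```shell
# Print the msgid from each syslog line on stdin (hypothetical helper name).
# Lines without an "[ID number facility.level]" field produce no output.
extract_msgid() {
	sed -n 's/.*\[ID \([0-9][0-9]*\) [^] ]*].*/\1/p'
}

line='automountd: [ID 784820 daemon.error] server nfs-server1 not responding'
id=$(printf '%s\n' "$line" | extract_msgid)
echo "$id"

# On a Solaris host, the ID could then be fed to the pipeline shown earlier:
#   strings /usr/lib/autofs/automountd | msgid | grep "$id"
```

The last step is left as a comment because the msgid command is Solaris-specific; the extraction itself uses nothing but sed.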