Logging Part 2
In my previous post I wrote about inline logging, that is, using logging in the code without a configuration file of some kind.
In this post I'm going to go over setting up a configuration file to support the various needs you may have for logging.
Previously I mentioned this scenario:
Perhaps the DevOps team wants robust logging messages on anything ERROR and above, but the application team wants to have INFO and above in a rotating file name schema, while the QA team needs to have the DEBUG and up output to standard out.
Before we get into how we might implement something like the scenario above, let's review the parts of the logging setup, which are:
Formatters
In a logging configuration file you can have multiple formatters specified. The example above doesn't state WHAT each team needs, so let's define it here:
- DevOps: They need to know when the error occurred, what the level was, and what module the error came from
- Application Team: They need to know when the error occurred, the level, what module and line
- The QA Team: They need to know when the error occurred, the level, what module and line, and they need a stack trace
For the DevOps team we can define a formatter as such¹:
'%(asctime)s - %(levelname)s - %(module)s'
The Application team would have a formatter like this:
'%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
while the QA team would have one like this:
'%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
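If you want to sanity check a format string before it goes into a configuration file, you can hand it to logging.Formatter directly. Here's a minimal sketch using the DevOps format; the logger name is made up for illustration, and the module name in the output will be whatever file you run it from:

import logging

# Attach the DevOps format string to a console handler and log one record.
devops_formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(module)s')

handler = logging.StreamHandler()
handler.setFormatter(devops_formatter)

logger = logging.getLogger('formatter_demo')
logger.addHandler(handler)

# Prints a line like: <timestamp> - ERROR - <module name>
# (the message text itself only appears if %(message)s is part of the format string)
logger.error('something went wrong')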
Handlers
The Handler controls where the data from the log is going to be sent. There are several kinds of handlers, but based on our requirements above we'll only be looking at three of them (see the documentation for more types of handlers).
From the example above we know that the DevOps team wants to save the output to a file, while the Application team wants the log data saved in a way that keeps the log files from growing too large. Finally, we know that the QA team wants the output to go directly to stdout.
We can handle all of these requirements via the handlers (a rough in-code sketch follows the list). In this case, we'd use:
- FileHandler for the DevOps team
- RotatingFileHandler for the Application team
- StreamHandler for the QA team
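Before moving these into a configuration file, here's roughly what the three handlers look like when created directly in code. The levels, filenames, and rotation settings mirror the configuration we'll build below:

import logging
import logging.handlers
import sys

# DevOps: plain file on disk, ERROR and above
devops_handler = logging.FileHandler('devops.log')
devops_handler.setLevel(logging.ERROR)

# Application team: rotate once the file reaches ~10,000 bytes, keep 3 backups
application_handler = logging.handlers.RotatingFileHandler(
    'application.log', mode='a', maxBytes=10000, backupCount=3
)
application_handler.setLevel(logging.INFO)

# QA: straight to standard out, DEBUG and above
qa_handler = logging.StreamHandler(stream=sys.stdout)
qa_handler.setLevel(logging.DEBUG)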
Configuration File
Above we defined the formatters and handlers. Now we start to put them together. The basic format of a logging configuration has three parts (as described above). The example I use below is YAML, but a dictionary or a conf file would also work.
Below we see five keys in our YAML file:
version: 1
formatters:
handlers:
loggers:
root:
  level:
  handlers:
The version key is there to allow for future versions of the configuration schema, in case any are introduced. As of this writing there is only one version ... and it's version: 1
Formatters
We defined the formatters above, so let's add them here and give them names that map to the teams:
version: 1
formatters:
  devops:
    format: '%(asctime)s - %(levelname)s - %(module)s'
  application:
    format: '%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
  qa:
    format: '%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
Right off the bat we can see that the formatters for application and qa are the same, so we can either keep them separate to allow for easier updates in the future (and to be more explicit) OR we can merge them into a single formatter to adhere to DRY principles.
I'm choosing to go with option 1 and keep them separate.
Handlers
Next, we add our handlers. Again, we give them names that map to the teams. For each handler we set a level (which maps to the level from the specs above) and point it at the formatter it should use.
Additionally, each handler has keys specific to the type of handler selected. For example, logging.FileHandler needs to have a filename specified, while logging.StreamHandler needs to specify where to send its output. When using logging.handlers.RotatingFileHandler we have to specify a few more items in addition to the filename so the handler knows how and when to rotate the log files.
version: 1
formatters:
  devops:
    format: '%(asctime)s - %(levelname)s - %(module)s'
  application:
    format: '%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
  qa:
    format: '%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
handlers:
  devops:
    class: logging.FileHandler
    level: ERROR
    formatter: devops
    filename: 'devops.log'
  application:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: application
    filename: 'application.log'
    mode: 'a'
    maxBytes: 10000
    backupCount: 3
  qa:
    class: logging.StreamHandler
    level: DEBUG
    formatter: qa
    stream: ext://sys.stdout
What the setup above does for the devops handler is send the log data to a file called devops.log, while the application handler writes to a rotating set of files based on application.log. The application.log file will hold a maximum of 10,000 bytes. Once the file is 'full', the handler renames it to application.log.1 (shifting any older backups up by one number) and starts a fresh application.log. Because backupCount is 3, it keeps up to three backups, giving the application team the following files (a small demo script follows the list):
- application.log
- application.log.1
- application.log.2
- application.log.3
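If you want to watch the rotation happen, a small throwaway script like this one (the logger name and messages are just for illustration) will write enough data to roll the file over a few times and produce the backups listed above:

import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler('application.log', maxBytes=10000, backupCount=3)

logger = logging.getLogger('rotation_demo')
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# Each message is tiny, so a couple of thousand of them pushes application.log
# past 10,000 bytes several times; application.log.1/.2/.3 show up alongside it.
for i in range(2000):
    logger.info('filler message number %d', i)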
Finally, the handler for QA will output directly to stdout.
Loggers
Now we can take all of the work we did above to create the formatters and handlers and use them in the loggers!
Below we see how the loggers are set up in the configuration file. It seems a bit redundant because I've named my formatters, handlers, and loggers with matching terms, but 🤷
The only new thing we see in the configuration below is propagate: no for each of the loggers. If there were parent loggers (we don't have any), this would prevent the log records from being passed 'up' the chain to the parent loggers. The documentation has a good diagram showing the workflow for how propagate works.
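To make propagation a little more concrete, here's a tiny standalone sketch (the logger names are invented for illustration). The child logger has no handler of its own; while propagate is left at its default, its records still bubble up to the parent's handler:

import logging
import sys

parent = logging.getLogger('app')
parent.setLevel(logging.DEBUG)
parent.addHandler(logging.StreamHandler(sys.stdout))  # only the parent has a handler

child = logging.getLogger('app.db')  # the dotted name makes 'app' its parent

# propagate defaults to True, so this record is handled by the parent's handler
child.warning('seen by the parent handler')

# With propagation turned off, the record never reaches the parent's handler
# (logging's last-resort handler prints the bare message to stderr instead)
child.propagate = False
child.warning('no longer passed up to the parent')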
Below we can see what the final, fully formed logging configuration looks like.
version: 1
formatters:
  devops:
    format: '%(asctime)s - %(levelname)s - %(module)s'
  application:
    format: '%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
  qa:
    format: '%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
handlers:
  devops:
    class: logging.FileHandler
    level: ERROR
    formatter: devops
    filename: 'devops.log'
  application:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: application
    filename: 'application.log'
    mode: 'a'
    maxBytes: 10000
    backupCount: 3
  qa:
    class: logging.StreamHandler
    level: DEBUG
    formatter: qa
    stream: ext://sys.stdout
loggers:
  devops:
    level: ERROR
    handlers: [devops]
    propagate: no
  application:
    level: INFO
    handlers: [application]
    propagate: no
  qa:
    level: DEBUG
    handlers: [qa]
    propagate: no
root:
  level: ERROR
  handlers: [devops, application, qa]
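As a quick preview of how a file like this gets used, here's a minimal sketch. It assumes the YAML above is saved as logging_config.yaml and that PyYAML is installed; both the filename and the use of PyYAML are my own choices for illustration:

import logging
import logging.config

import yaml  # PyYAML

# Read the YAML file and hand the resulting dictionary to dictConfig
with open('logging_config.yaml') as f:
    config = yaml.safe_load(f)

logging.config.dictConfig(config)

# Each team grabs its logger by the name defined under 'loggers'
qa_logger = logging.getLogger('qa')
qa_logger.debug('formatted with the qa formatter and written to stdout')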
In my next post I'll dig into using the above configuration file in more detail, so that the various teams get the log output they need.
¹ Full documentation on what is available for the formatters can be found here: https://docs.python.org/3/library/logging.html#logrecord-attributes
Logging Part 1
Last year I worked on an update to the package tryceratops with Gui Latrova to include a verbose flag for logging.
Honestly, Gui was a huge help and I wrote about my experience here, but I didn't really understand why what I did worked.
Recently I decided that I wanted to better understand logging, so I dove into some posts from Gui and sat down and read the documentation on the logging module from the standard library.
My goal with this was to (1) be able to use logging in my projects, and (2) write something that may be able to help others.
Full disclosure, Gui has a really good article explaining logging and I think everyone should read it. My notes below are a synthesis of his article, my understanding of the documentation from the standard library, and the Python HowTo written in a way to answer the Five W questions I was taught in grade school.
The Five W's
Who are the generated logs for?
Anyone trying to troubleshoot an issue, or monitor the history of actions that have been logged in an application.
What is written to the log?
The formatter determines what to display or store.
When is data written to the log?
The logging level determines when to log the issue.
Where is the log data sent to?
The handler determines where to send the log data, whether that's a file or stdout.
Why would I want to use logging?
To keep a history of the actions taken by your code.
How is the data sent to the log?
The loggers determine how to bundle all of it together through calls to various methods.
Examples
Let's say I want a logger called my_app_errors that captures all ERROR level incidents and higher to a file and tells me the date and time, level, message, and logger name, and gives me a stack trace of the error. I could do the following:
import logging
message='oh no! an error occurred'
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s - %(name)s')
logger = logging.getLogger('my_app_errors')
fh = logging.FileHandler('errors.log')
fh.setFormatter(formatter)
logger.addHandler(fh)
logger.error(message, stack_info=True)
The code above would generate something like this to a file called errors.log
2022-03-28 19:45:49,188 - ERROR - oh no! an error occurred - my_app_errors
Stack (most recent call last):
File "/Users/ryan/Documents/github/logging/test.py", line 9, in <module>
logger.error(message, stack_info=True)
If I want a logger that will do all of the above AND output debug information to the console I could:
import logging
message='oh no! an error occurred'
logger = logging.getLogger('my_app_errors')
logger.setLevel(logging.DEBUG)  # without this, the logger stays at the default WARNING and drops the debug call below
ch = logging.StreamHandler()
fh = logging.FileHandler('errors.log')
fh.setLevel(logging.ERROR)  # the file should still only capture ERROR and above
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s - %(name)s')
fh.setFormatter(formatter)
ch.setFormatter(formatter)
logger.addHandler(fh)
logger.addHandler(ch)
logger.error(message, stack_info=True)
logger.debug(message, stack_info=True)
Again, the code above would generate something like this to a file called errors.log
2022-03-28 19:45:09,406 - ERROR - oh no! an error occurred - my_app_errors
Stack (most recent call last):
File "/Users/ryan/Documents/github/logging/test.py", line 18, in <module>
logger.error(message, stack_info=True)
but it would also output to stderr in the terminal something like this (plus a similar DEBUG entry from the logger.debug call):
2022-03-27 13:18:45,367 - ERROR - oh no! an error occurred - my_app_errors
Stack (most recent call last):
File "<stdin>", line 1, in <module>
The above is a bit hard to scale, though. What happens when we want to have multiple formatters for different levels that get output to different places? We can incorporate all of that into something like what we see above, OR we can start to leverage logging configuration files.
Why would we want to have multiple formatters? Perhaps the DevOps team wants robust logging messages on anything ERROR and above, but the application team wants to have INFO and above in a rotating file name schema, while the QA team needs to have the DEBUG and up output to standard out.
You CAN do all of this inline with the code above, but would you really want to? Probably not.
Enter configuration files, which allow easier management of log output (and are a potential way to make everyone happy). I'll cover them in the next post.