Illustration by Dusan Mirkovic

Practical Log Viewers with Sanic and Elasticsearch - Designing CI/CD Systems

2019-10-26 python engineering api docker Cristian Medina

One of the critical pieces in a build system is the ability to view build and test output. Not only does it track progress as the build transitions through the various phases, it’s also an instrument for debugging.

This chapter in the continuous builds series covers how to build a simple log viewer. You’ll find details on retrieving log entries from Docker containers, serving them through Python, linking from a GitHub pull request, and highlighting the data for easy reading.

Creating a log viewer is not as complicated as you might think. You’ll need permanent storage for the logs, a REST API to retrieve them, and some web code to help highlight areas of interest and offer a “live reload” function.

Since our build system already uses a REST API to receive webhooks, let’s add an endpoint to that Sanic application, one that looks for log data and presents it to the user.

Even better, you can provide two endpoints: one to return raw text data for consumption by third-party automation, and one with “styled” data that returns a webpage for viewing in the browser.
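As a rough sketch of that split, the two functions below render the same log text in two ways. The names `render_raw` and `render_styled` are hypothetical and not part of the build system’s code; a real handler would wrap their return values in the framework’s text and HTML responses.

```python
from html import escape


def render_raw(log: str) -> str:
    """Return the log untouched, suitable for a text/plain response."""
    return log


def render_styled(log: str) -> str:
    """Wrap each log line in a paragraph so a browser can style it."""
    lines = log.split("\n")
    # Escape every line so characters such as < and & can't break the markup
    body = "".join(f"<p>{escape(line)}</p>" for line in lines)
    return f'<pre class="ansi">{body}</pre>'
```

Keeping the raw variant free of markup means scripts can consume it without parsing HTML, while the styled variant is free to grow more decoration.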

Storing logs

Before serving the logs, we need to decide what counts as a log and where to store it.

While any database can handle saving the output - I’ve used both SQL and NoSQL systems for this - it’s more important to determine how to manage live data first. You want a mechanism that allows users to watch execution as it happens, but avoid a complicated system that’s periodically pushing information from every container into this database. It just won’t scale.

Given that we’re using Docker containers to perform the actual builds and tests, you can leverage the log output from those containers as the data source. These include both the stdout and stderr of the commands we’re executing with subprocess.

Docker’s API can also retrieve the output from active containers, as well as those that already exited. It handles the live watcher use case and allows us to store the data when build execution finishes, before cleaning up the container.

In other words, the log viewer workflow is to check the database for the container log first. If it can’t find it, then assume it’s still executing and check the Docker service.

In terms of database technologies, there are a few things to consider in making your choice:

Disk space available and the ease with which to expand - will you need horizontal scaling?
Longevity - how long should you store the logs for? Do you need a rotation schedule?
Support for indexing and searching - will you go back through history and search through the data?
Granularity - do you need to store each line as a separate record? Or is one record per execution enough?
High availability - is it ok to lose this data if your build service dies? Is it ok to have an outage of the log subsystem while the rest of the build service continues?

Your choice varies greatly depending on the answer to those questions. For simple things, sqlite works just fine. For more searchability, you should look at Postgres, or check out Mongo for flexibility. Ultimately though, you’ll probably end up with Elasticsearch or something similar.
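To give a sense of how little is needed at the simple end of that range, here is a minimal sketch that keeps one record per container in SQLite using only the standard library. The table layout and the function names (`open_store`, `save_log`, `load_log`) are assumptions made for illustration, not something the build system defines.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with one row per container log."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS logs (
               container TEXT PRIMARY KEY,
               repo TEXT,
               sha TEXT,
               logged_at REAL,
               log TEXT
           )"""
    )
    return conn


def save_log(conn, container, repo, sha, log):
    """Insert or overwrite the stored output for a container."""
    conn.execute(
        "INSERT OR REPLACE INTO logs VALUES (?, ?, ?, ?, ?)",
        (container, repo, sha, time.time(), log),
    )
    conn.commit()


def load_log(conn, container):
    """Return the stored output, or None when the container is unknown."""
    row = conn.execute(
        "SELECT log FROM logs WHERE container = ?", (container,)
    ).fetchone()
    return row[0] if row else None
```

This covers the "one record per execution" case only; searching inside the log text or scaling across hosts is where the other options start to pay off.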

To store the data, it’s better to expose an endpoint that asks our system to fetch and store the output from a given container, rather than having the build script retrieve that output and send it to the endpoint. It saves time, compute, and network resources.

Here’s an example endpoint that pulls the info and puts it in Elasticsearch with some extra metadata:

@app.post('/logs', version=1)
async def store_log(request):
    """Store container logs permanently"""

    ts = datetime.now().timestamp()
    body = request.json
    container = str(body['container'])

    log = _get_container_log(container)
    if log is None:
        return response.json({'status': 404, 'code': 'NotFound', 'Description': f'No logs found for {container}.'}, headers=RESPONSE_HEADERS, status=404)

    es = Elasticsearch(ELASTICSEARCH_HOST)
    es.create(
        id=str(ts),
        index=ELASTICSEARCH_INDEX,
        body={
            'logged_at': ts,
            'container': container,
            'repo': body['repo'],
            'sha': body['sha'],
            'log': log,
            'host': request.ip,
        }
    )

    return response.json({}, headers=RESPONSE_HEADERS)

The function requires a POST body with the repo and commit hash that generated the logs, plus the container identifier to retrieve them from, all of which are available as environment variables. Our system configures the first two when creating the container, while Docker sets the last one under HOSTNAME.
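From the build script’s side, the call could look roughly like the sketch below. The environment variable names `FORGE_REPO` and `FORGE_SHA`, the example host, and the helper name `build_store_request` are assumptions for illustration; only `HOSTNAME` is set by Docker itself. The `/v1/logs` path follows from the `version=1` argument on the route.

```python
import json
import os
import urllib.request


def build_store_request(base_url, env=os.environ):
    """Build the POST request that asks the service to store this container's log."""
    payload = {
        "container": env["HOSTNAME"],  # set by Docker inside the container
        "repo": env["FORGE_REPO"],     # hypothetical variable name
        "sha": env["FORGE_SHA"],       # hypothetical variable name
    }
    return urllib.request.Request(
        f"{base_url}/v1/logs",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it is a single call once the build finishes:
# urllib.request.urlopen(build_store_request("http://forge.example.com"))
```

Note that the request carries only identifiers. The log text itself never travels from the container to the API, which is the saving described earlier.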

Note that the extra metadata added to the Elasticsearch document is meant to help find this record in the future. You’ll need it when writing the viewer function.

To retrieve the output from Elasticsearch and Docker, this is what you do:

def get_container_log(container, since=None):
    """Check Elasticsearch for the container log, try searching through each swarm node if it's not there"""

    # Check Elasticsearch first
    es = Elasticsearch(ELASTICSEARCH_HOST)
    hits = es.search(index=ELASTICSEARCH_INDEX, body={'query': {'match': {'container': container}}})['hits']

    if hits['total']['value'] > 0:
        # Found it
        return hits['hits'][0]['_source']['log']

    # Didn't find it, check Docker
    return _get_container_log(container, since)


def _get_container_log(container, since=None):
    """Check Docker Swarm nodes for the container log"""

    dock = docker.DockerClient(DOCKER_HOST)
    nodes = dock.nodes.list()

    # Iterate through the nodes looking for the log
    for node in nodes:
        try:
            nodeclient = docker.DockerClient(f"{socket.gethostbyname(node.attrs['Description']['Hostname'])}:{DOCKER_NODE_PORT}")
            return nodeclient.containers.get(container).logs(since=since).decode()

        except (docker.errors.NotFound, docker.errors.APIError, requests.exceptions.ConnectionError):
            # This node didn't list the container
            pass

    # Couldn't find it
    return None

The Docker Swarm portion of this code gets the list of nodes in the Swarm and goes through each of them until it finds the one hosting the container identifier you need.

If you’re wondering why there are two functions with the same name, look again. One has an underscore prefix. This is a Python convention that marks functions or variables as private. It tells other developers that the code is only meant to be used within this module.

Another observation is the Elasticsearch query. If you haven’t used this database before, you may find the body parameter passed into the search() method strange. That’s because ES is not a SQL database and has a query mechanism that works over a REST API. In this case, we’re asking it to search all records where the container field matches the identifier passed to our endpoint.

The response from Elasticsearch is a dictionary containing a hits field with the matches it found. That structure also holds a count of the results, which is a quick way of checking whether the query found anything.
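To make that shape concrete, the sketch below walks a trimmed-down response by hand. The sample dictionaries are made up for illustration, and the helper name `first_log` is hypothetical. The layout assumes Elasticsearch 7 or later, where the total is an object with a `value` key; earlier versions return a plain integer there.

```python
def first_log(response):
    """Return the log from the first hit, or None when nothing matched."""
    hits = response["hits"]
    if hits["total"]["value"] == 0:
        return None
    return hits["hits"][0]["_source"]["log"]


# Trimmed-down examples of what a search response looks like
sample = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {"_id": "1572100000.0", "_source": {"container": "abc123", "log": "build ok"}}
        ],
    }
}
empty = {"hits": {"total": {"value": 0, "relation": "eq"}, "hits": []}}

print(first_log(sample))
print(first_log(empty))
```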

Highlighting information

Providing a web version of the log viewer enables you to guide the user when reading. You can highlight important information and make it easier for them to scan through the data. Things worth pointing out are:

Any success or failure messages should appear green or red, respectively. For example, the output of a pytest run.
Warning messages also deserve some attention.
Any informational log entry written by the build system that shows the command being executed.

You can do all of this on the server side by returning HTML with CSS styles that assign the colors, or by embedding some JavaScript code that runs after the page loads and processes the elements in it.

Here’s what that would look like:

@app.get('/logs', version=1)
async def get_styled_log(request):
    """Return a pretty webpage with a container log. Use query parameters to specify the container and time to start from"""

    container = request.args.get('container')
    if container is None:
        return response.json({'status': 422, 'code': 'MissingParameter', 'Description': 'Missing "container" query parameter.'}, headers=RESPONSE_HEADERS, status=422)

    since = request.args.get('since')
    if since is not None:
        try:
            since = int(since)

        except ValueError:
            return response.json({'status': 422, 'code': 'InvalidParameter', 'Description': 'Invalid "since" query parameter.'}, headers=RESPONSE_HEADERS, status=422)

    # Search for the log
    log = get_container_log(container, since)

    if log is None:
        # Didn't find it
        return response.json({'status': 404, 'code': 'NotFound', 'Description': f'No logs found for {container}.'}, headers=RESPONSE_HEADERS, status=404)

    count = int(request.args.get('count', 0))

    output = []
    if since is None:
        # Document header, styles and base div elements
        output.append("""<html><head><style>.ansi pre,p,a {
            padding: 0px 10px 0px 0px;
            margin: 0px;
            background: #222222;
            color: #C0C0C0;
            font-style: normal;
            font-family: monospace;
            font-size: 12px;
            font-weight: normal;
            line-height: 20px;
            word-wrap: break-word;
            white-space: pre-wrap;
            box-sizing: border-box;
            min-width: 55px;
        }</style></head>
        <body>
        <div id="watch" style="float: right;
            background-color: #34495E;
            padding: 10px;
            box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);
            margin: 10px 10px 10px 180px;
            position: fixed;
            opacity: 0.8;
            font-weight: bold;
            font-size: 10px;
            font-family: courier;">
        Watch
        </div>
        <pre id="output" class="ansi">""")

    # Iterate through each line and add a new paragraph element with the data
    for i, line in enumerate(log.split("\n")):
        output.append('<p')

        if line == "":
            # Needed to show blank lines
            line = " "

        elif "::" in line:
            # When showing pytest output, set the line red for FAILED / ERROR, orange for SKIPPED and green for PASSED
            if "FAILED" in line:
                output.append(' style="color:#ff7272"')
            elif "ERROR" in line:
                output.append(' style="color:#ff7272"')
            elif "SKIPPED" in line:
                output.append(' style="color:orange"')
            elif "PASSED" in line:
                output.append(' style="color:#b9ff4c"')

        elif "[INFO]" in line:
            # Set the line color to white for output from logging.info()
            output.append(' style="color:white"')

        elif "[WARNING]" in line:
            # Orange for logging.warning()
            output.append(' style="color:orange"')

        elif "[ERROR]" in line:
            # Red for logging.error()
            output.append(' style="color:#ff7272"')

        elif "=====================" in line:
            # Found a break in the pytest report, set colors for failed or passed output
            if "failed" in line:
                output.append(' style="color:#ff7272"')
            elif "passed" in line:
                output.append(' style="color:#b9ff4c"')
            elif "test session starts" in line:
                # Highlight the line indicating where a pytest session starts
                output.append(' style="color:#ffff56"')

        # Add a line number and an anchor for easy viewing and linking
        output.append(f'><a id="L{i + count}" style="color:#616666;text-align:right;float:left;">{i + count}</a>')

        # Print the log entry, but don't forget to escape it so it doesn't break HTML
        output.append(f'<span style="display:block;margin-left:55px;">{escape(line)}</span></p>')

    if since is None:
        # If we're not pulling a subset of log data add some JavaScript that auto-refreshes the page and polls for new records
        output.append(f"""</pre><script>
        var frequency = 5000; // in milliseconds
        var interval = 0;
        var d = new Date();
        var startTime = d.getTime();

        function refresh()
        {{
            console.log("Refreshing");

            let output = document.getElementById("output")
            output.removeChild(output.lastChild)

            let count = parseInt(output.lastChild.childNodes[0].text) + 1

            let xhr = new XMLHttpRequest();
            xhr.addEventListener("load", function() {{
                output.appendChild(document.createRange().createContextualFragment(xhr.responseText));
                output.lastChild.scrollIntoView();
            }});
            xhr.open("GET", "{FORGE_URL}{LOG_VIEW_ENDPOINT}{container}&since=" + parseInt(startTime / 1000) + "&count=" + count);
            xhr.send();

            d = new Date();
            startTime = d.getTime();
        }}

        function startRefresh() {{
            if (interval > 0) clearInterval(interval);
            interval = setInterval(refresh, frequency);
        }}

        function stopRefresh() {{
            clearInterval(interval);
        }}

        document.getElementById("watch").onclick = function() {{
            console.log("Refresh")
            if (this.textContent == "Watch") {{
                startRefresh();
                this.textContent = "Stop Watching"
            }}
            else {{
                stopRefresh();
                this.textContent = "Watch"
            }}
        }}
        </script></body></html>""")

    return response.html(''.join(output))

Going through the code, you’ll first want a couple of checks that validate the request and return appropriate errors. Since the REST API is built on Sanic, you can lean on response.json() and response.html(), which accept the HTTP status code to return and automatically set the Content-Type header of the response.

Contrary to the usual web framework model, we’re not using a templating system to formulate the HTML response. There’s no need to add more dependencies on third-party modules just to avoid concatenating a long string.

The HTML response requires a short header with CSS that formats the output in a monospaced font, plus some styling for a floating div that serves as a button to start or stop a “live watch” session.

Since usability and readability are the focus of this response, the text style is paramount. The closer it looks to console output, the easier it is for a developer to debug.

Another useful improvement is including line numbers in the output and anchor tags so that users can share links to a given line easily.

You can add those in by splitting the output on newline characters and iterating with the built-in enumerate() function. It takes any iterable and yields two-item tuples in which the first item is the element’s index and the second is the element itself.
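Here is a small sketch of that numbering step in isolation. The helper name `number_lines` is hypothetical and the markup is reduced to the bare anchor and text, so treat it as an illustration of the loop rather than the endpoint’s exact output.

```python
from html import escape


def number_lines(log, start=0):
    """Yield one HTML paragraph per log line, each with a numbered anchor."""
    # enumerate() accepts a start value, which keeps numbering continuous
    # when a later request only returns the tail of the log
    for number, line in enumerate(log.split("\n"), start):
        yield f'<p><a id="L{number}">{number}</a><span>{escape(line)}</span></p>'


for paragraph in number_lines("first\nsecond", start=10):
    print(paragraph)
```

Passing a start value is an alternative to adding an offset to the index by hand, and both approaches produce the same anchors.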

While looping through each line, we then apply the highlight styles based on the contents. Since we’re primarily using Python and pytest for building and testing, the styles are optimized for Python logging output and the pytest reporting format.
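If the chain of conditions grows, one possible alternative is a small rule table. The sketch below is a simplification that covers only the logging markers, not the pytest handling, and is not how the endpoint above is written. The names `RULES` and `color_for` are hypothetical.

```python
# Each rule is (substring to look for, CSS color); the first match wins.
# The colors mirror the ones used for logging output in the endpoint above.
RULES = [
    ("[INFO]", "white"),
    ("[WARNING]", "orange"),
    ("[ERROR]", "#ff7272"),
]


def color_for(line):
    """Return the highlight color for a log line, or None for the default."""
    for marker, color in RULES:
        if marker in line:
            return color
    return None
```

Adding support for another tool’s output then means appending a rule rather than adding another branch.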

With some extra JavaScript, you can also add a polling system that periodically requests new output from the server. This works best if your REST endpoint takes a timestamp - the since parameter in our case - that can serve as the lower bound for the log. It saves network resources because you’ll transfer less data, less often.
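The same polling request can be built outside the browser, for example by a script that follows a build. The sketch below mirrors how the page’s JavaScript assembles its query string; the helper name `poll_url` and the example host are assumptions.

```python
from urllib.parse import urlencode


def poll_url(base, container, last_poll_ms, next_line):
    """Build the URL for the next poll, mirroring what the page's script does."""
    query = urlencode({
        "container": container,
        "since": last_poll_ms // 1000,  # the endpoint expects whole seconds
        "count": next_line,             # line number to resume numbering from
    })
    return f"{base}?{query}"


print(poll_url("http://forge.example.com/v1/logs", "abc123", 1572100000500, 42))
```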

Note that the JS code is not using the latest and greatest standards or any third-party modules. You don’t need external dependencies to implement a refresh mechanism with setInterval() and clearInterval() functions. Nor do you need fetch() to perform the HTTP GET that refreshes info.

Linking with GitHub

In the previous status reporting chapter, we covered how to use GitHub’s Status API to submit information when progressing through build steps.

One of the fields in those HTTP POST requests is the target_url, which can be used to show extended details. It can point to anything: a minimal text log, a web page, or any other downloadable file.

Point a GitHub status to the detailed log by sending a URL to the log viewer endpoint you just created. Include a parameter with the correct container identifier, which is available inside the execution script as the HOSTNAME environment variable.
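A minimal sketch of such a status follows. It only builds the URL and JSON body; sending it and authenticating are left out. The helper name `build_status`, the description text, the `forge/build` context label, and the example viewer URL are all assumptions for illustration.

```python
import json


def build_status(owner, repo, sha, state, container, viewer_url):
    """Return the Status API URL and JSON body for one commit status."""
    url = f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}"
    body = {
        "state": state,  # one of: error, failure, pending, success
        "target_url": f"{viewer_url}?container={container}",
        "description": "Build and test output",
        "context": "forge/build",  # hypothetical context label
    }
    return url, json.dumps(body)


url, body = build_status(
    "org", "repo", "deadbeef", "pending", "abc123",
    "http://forge.example.com/v1/logs",
)
```

Because the link carries the container identifier, the same URL works while the build is running and after its log has been stored.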

Security implications

As with almost everything discussed in this series, don’t forget to think about security.

People print all kinds of stuff in log messages to help them debug, especially if they think the build is internal and unavailable to customers.

You should actively discourage anyone from including passwords or secrets in their output. The risk isn’t just about an unauthorized user viewing the log, but also about the exposure in transferring that information over the network - even if it’s internal.
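As a safety net, and not as a replacement for keeping secrets out of the output in the first place, the service could mask obvious credentials before storing a log. The sketch below is an assumption about what such a filter might look like; the pattern and the `redact` name are hypothetical and will miss anything that doesn’t follow a key=value or key: value shape.

```python
import re

# Hypothetical pattern: key=value or key: value pairs whose key looks sensitive
SECRET = re.compile(r"(?i)\b(password|passwd|secret|token|api_key)\b(\s*[=:]\s*)(\S+)")


def redact(log):
    """Replace the value of anything that looks like a credential."""
    return SECRET.sub(r"\1\2[REDACTED]", log)
```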

What’s Next?

At this point, you have a fully functional build system that integrates with GitHub to manage the source and report progress, while using Docker to distribute work across compute. You’re storing log information, helping with debugging, and making it easily accessible to users.

There are many other subsystems that live around the periphery of a build process to provide credentials management, chat integrations, artifact storage, resource management, and other functions. Stay tuned for more articles in all of these spaces.
