1. An Architecture Blueprint for a Central Logging System

1.1. Introduction

Logging is mostly treated as a local affair: that of an application or solution, of a team, or even of a single developer or a group thereof. But the increasing complexity of software systems also increases the effort needed to draw the right conclusions from a large heap of heterogeneous logging records.

There is a clear trend from large monolithic application building blocks towards smaller, but therefore intensely interconnected, software components. For this reason s IT Solutions AT (the IT subsidiary of Erste Bank and Sparkassen in Austria, www.s-itsolutions.at/) started a project to build a central logging and journalling data lake.
The architectural pattern described here follows the lines of this "Central Logging & Journalling" (CLJ) solution of sIT. It is not targeted at smaller systems but tries to deal with enterprises and larger IT landscapes.

So, if you feel your path to wisdom by reading logs looks like this, you’re a happy developer already.
forest

But if it looks a bit more like this, further reading could maybe help you.
jungle

1.2. Goals

Logging is not an end in itself — it enables many use-cases that can be grouped into four partitions:

[A]. Support: Find out what the system did at runtime, in order to detect the source of problems or to give information to other stakeholders. Most use-cases here investigate exceptional program behaviour.

[B]. Compliance: The number of regulatory use-cases increases; the run-time behaviour and intermediate data of software must be documented, often for many years.

[C]. Monitoring & Alerting: When a stream of logging data exists, it is natural to also use this stream to determine the system state, detect problems, and report them via multiple channels.

[D]. Analytics & Intelligence: Sophisticated tools allow data mining, BI etc. to find ways to improve the business, be it by exploring customer behaviour, by predicting operations problems, or by something we don’t even dream of yet.

Table 1. Use-case groups

Support [A]:
* Customer Care
* Issue research
→ Access security
→ Searchable, near-time

Compliance [B]:
* Regulatory queries
→ Long-term
→ Safe data store
→ Ideally certified
→ Infrequent queries

Monitoring & Alerting [C]:
* Stream analysis
* Alerting endpoints
→ Needs rules
→ High-performing

Analytics & Intelligence [D]:
* Statistics
* Big Data Analysis
* Machine Learning
* Predictive Analysis
→ Highly specialized toolset

1.3. Architecture

A possible architecture could make use of the following building blocks:

logical arch
Figure 1. Logical architecture of CLJ

The functional as well as the non-functional (quality) requirements of the aforementioned use-case groups differ greatly from each other. Therefore it makes sense to use different software products to fulfill those requirements.

1.3.1. Messaging Brick

This building block provides a reliable (true 24/7) component into which the applications can upload their logging records. It is high-performing, lean and stable, and therefore capable of swallowing even extreme load peaks.

This building block also serves use-case group [C]. The type of product is queue-like; our implementation uses Apache Kafka (kafka.apache.org/). Another example would be Amazon’s Kinesis/Firehose in an AWS-based environment.

1.3.2. Online Research Store

This record store is responsible for structured and fast searches over log records, and for finding connections between them.

Our implementation uses ElasticSearch (www.elastic.co/de/) together with a self-written ReST service and an Angular-based front-end.

1.3.3. Compliance Store

Selected log records (defined by the solution) are persisted in this record store. It is very reliable, needs a back-up, is fast at writing, and stores the record before it can be tampered with. On the other hand it does not need a super-sophisticated query facility.

In our implementation we decided on Apache Cassandra (cassandra.apache.org/). Other possibilities would be to store the selected records in flat files and archive them, or to use an RDBMS.

1.3.4. The Client Side

Applications can either send their log records directly into the messaging brick, or have them harvested from the filesystem or another data store. Both methods have pros and cons.

Table 2. Harvesting methods

Direct transfer:
+ Fastest
+ Possibly eliminates one component (the file system)
- Technically a tight coupling

Logfile harvesting:
+ Non-intrusive to existing applications
- Needs another process (resources + monitoring)

Direct transfer

Possibilities for applications to integrate are:

  • Own client libs of messaging brick

  • APIs for creating messages that fit to the data model of CLJ

  • Appenders for existing logging frameworks (e.g. log4j2 in Java, or log4net for C#)

Generally it is a good idea to offer integration libraries that handle situations where the messaging brick suffers a failure. In those cases the using application must not be brought down by logging; this mitigates the tight-coupling issue.
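Such a fail-safe integration library can be sketched as a thin wrapper around the transport. The following Python sketch is illustrative only (the class name `FailSafeLogClient` and the injected `send` callable are assumptions, not part of CLJ): it buffers records while the brick is down and retries them on the next call, so a logging failure never propagates into the application.

```python
import json
from collections import deque

class FailSafeLogClient:
    """Wraps the transport to the messaging brick so that a broker
    outage can never bring down the application doing the logging."""

    def __init__(self, send, buffer_size=1000):
        # 'send' is any callable that delivers one serialized record
        # to the messaging brick (e.g. a Kafka producer's send method).
        self._send = send
        self._buffer = deque(maxlen=buffer_size)  # oldest records drop first

    def log(self, record: dict) -> bool:
        """Try to deliver the record; buffer it on failure. Never raises."""
        payload = json.dumps(record)
        try:
            self._flush()                 # retry anything buffered earlier
            self._send(payload)
            return True
        except Exception:
            self._buffer.append(payload)  # keep for a later retry
            return False

    def _flush(self):
        # Deliver buffered records in original order.
        while self._buffer:
            self._send(self._buffer[0])
            self._buffer.popleft()
```

A production library would additionally bound memory use (here a bounded deque that silently drops the oldest records) and retry in a background thread instead of piggybacking on the next log call.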

Logfile harvesting

There are a lot of tools for that use-case, ranging from light-weight native apps that are integrated into the operating system up to full-scale ETL tools (en.wikipedia.org/wiki/Extract,_transform,_load).

A few examples are Logstash, Flume, and rsyslog.

In certain architectures, some of these products could serve as the messaging brick itself.

2. Central Logging Datamodel

2.1. Partitioning of the log record space

Each record has its own id value, making it unique in all of the data stores.

For managing the stores (especially the online research store for [A]), though, it is necessary to organize the records along a number of dimensions. This separation then supports the determination of

  • Access rights/permissions

  • Retention times

  • Backup strategy

With that, the integrated applications can gain a lot of control and flexibility for their data.

The suggested dimensions are a combination, fitting the actual need, of the following fields:

  • tenant (in case of a real multi-tenant system with separated accounts)

  • environment (if environments are not separated physically or logically on the server side)

  • solution, which determines the organizational owner of the log records within a tenant

  • recordType, to distinguish between different needs of building blocks and types of logging and journal data.
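Taken together, these dimensions can be combined into a single partitioning key, e.g. for naming per-partition indices in the online research store or for attaching retention and permission rules. A minimal Python sketch, under the assumption that records are plain dicts using the field names above (the function name and the separator are illustrative):

```python
def partition_key(record: dict) -> str:
    """Builds a partitioning key from the suggested dimensions.
    Only the dimensions actually present in the record are used,
    so a single-tenant system simply omits 'tenant'."""
    dimensions = ("tenant", "environment", "solution", "recordType")
    parts = [str(record[d]) for d in dimensions if d in record]
    if not parts:
        raise ValueError("record carries none of the partitioning dimensions")
    return "-".join(parts).lower()

# Example: a session record of the (hypothetical) solution 'ebanking'
key = partition_key({
    "tenant": "201", "environment": "prod",
    "solution": "ebanking", "recordType": "session",
})  # "201-prod-ebanking-session"
```

The resulting key can then serve as an index name prefix, a topic suffix, or a lookup key into a table of retention periods.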

2.2. Fields

This list of fields is a comprehensive list of common values a log system could care for. Different applications in different contexts might use one or another subset of this enumeration, hardly ever setting all of them. But, and that is the main reason for this list, values with similar semantics in a log record store should be named identically, to make traversing logs of different applications easier.
Mandatory fields are printed bold.

Table 3. NDM fields
Type Field Name Short Description Long Description

String

id

Technical id for the log record

This can be set by the client (if trusted to care for uniqueness) or be omitted and then set by the server. The server allows the id to be reused (= update) for semantics like records of timespans (e.g. sessions). The proposed algorithm is UUID.

Header Fields, meta data of each record

String

recordType

Type of the record.

This is an unbounded enumeration; the solution is free to choose a value. It is nevertheless recommended to use a known value (see subpage) to make the semantics of the record easier to recognize. Record types can be shared between solutions; e.g. session, activity and techInfo are record types used by several applications. The record type is used for partitioning the CLJ data stores for the permission system, as well as a key for defining retention periods and the archiving strategy.

String

recordSubType

Additional field to identify the event

Can be used as the type of the source log record. For example, if the recordType is 'serverLog', the recordSubType could be "tomcat" or "weblogic".

String

tenant

Institute number

If needed, for organizations serving multiple jurisdictional tenants, this is the tenant code.

String

environment

Environment identifier

If needed, when the development, test, staging, production etc. environments are not separated by dedicated data store instances but merged into one, this identifier determines from which environment a log record originates.

DateTime-WithFractionSeconds

recordTimestamp

When the log record has been created

If the client does not provide this, or the given value cannot be parsed on server side, the processing engine will create a timestamp as next best guess.

Long

sequence

Determines order

Often the record timestamp is not sufficient to discriminate and order a set of log records; e.g. ElasticSearch does not care for finer granularity than milliseconds. In this case the sequence field can store micro- or nanoseconds. Another possibility is that a client uses this field for a gapless sequence, to be sure that no records are lost during transmission, processing and retrieval. A logging front-end can use this field as the sole default order attribute, or as a secondary order attribute after recordTimestamp.

String

logLevel

Level of importance, as provided by many low-level logging systems

This field is optional, it is also not normalized, meaning that whatever the client solution provides here will be taken as-is. A lot of logging libraries have their own mind on this topic.

User Info, information about the person or technical systems connected to the log record

String

user

Unique user id in its userType domain

This identifies the user or system uniquely within the domain given in "userType". This value gains importance in the context of current data protection laws.

String

userType

Domain this user account belongs to

Needed if different user domains should be distinguished, like internet users (customers) and intranet users (employees), or when the user domains of subsidiaries are not clearly separated by the user ids.

Source Info, which component wrote this log record

String

solutionCode

Unique identifier of a solution

Identifies the Solution as unit in the IT landscape.

String

solutionFunctionCode

Id of functional building block

If needed, more fine-grained organizational partitioning.

String

sourceApplication

Building block

More technical/architectural partitioning key.

String

sourceHostname

System name of the server initiating the logging call

e.g. DNS of physical or virtual system

String

sourceIp

Client IP, originator of the log

The value may differ depending on the nature of the originator (e.g. a browser-based application, or a batch)

String

userAgent

Software that initiated the call

This field is used when the software of the user/client, and its version, is relevant; e.g. in web front-ends this identifies the browser that has been used. The writing solution can provide any information if it thinks that information about its caller makes a difference.

String

agentVersion

TODO

deprecated, might be removed in the future.

String

serverInstanceName

Identifies the server instance

e.g. the docker pod

Initiating solution

String

clientId

Code from initiating system

Initiating systems are mostly user front-ends or batch processes.

Harvesting Info, where the log record was first persisted; might be different from the source solution

String

sourceType

Syntax of the incoming data

Syntax of the incoming data (into the messaging brick). 'generic' means using this data model in JSON; this is the default value. If the syntax is not 'generic', the central logging service might be able to do a proper transformation.

String

loggingHostname

Server Host Name

like sourceHostname

String

loggingHostIp

Server IP address

The system that provided the logging information, e.g. an Apache host for access logs, or any other harvesting service running Logstash, Flume, rsyslog or a similar tool.

String

logFile

file name and path from which the log record has been harvested, if applicable

If log records are not sent directly to the messaging building block but harvested from a logfile (by Logstash or similar software), the filename and path in the appropriate format (Windows, Unix, Mainframe, …) can be sent here if needed.

Context

String

parentId

Hierarchical predecessor of this log record.

Could be of a functional or sequential order. Here a key of a hierarchically higher-level record can be set, so a tree-like structure of log records can be created.

String

contextId1

Mapping context id field 1

Example: The id of a user session.

String

contextId2

Mapping context id field 2

Example: The (use case) id of a user’s activity.

String

contextId3

Mapping context id field 3

Example: The id of an explicit technical log record.

String

contextId4

Mapping context id field 4

DateTime-WithFractionSeconds

startDate

Start date of the record

For journalling records that have a time span, this field of the event signals the begin timestamp.

DateTime-WithFractionSeconds

endDate

End date of the record

For journalling records that have a time span, this field of the event signals the end timestamp.

String

correlationId

Correlation ID for a synchronous or quasi-synchronous call

Unique id that is created as early as possible (ideally by the initiator) and then passed along the whole call hierarchy to create traces of calls.

Unstructured and semistructured data

String

message

Log Message

All the information that is not part of other fields

String

additionalInfo

semi-structured data

Business or other data. Technically this is a text field. It is recommended, though, to use JSON syntax, because the front-end can interpret it and display a tree structure. Special case of additionalInfo: external links. These can be rendered in the UI as links with the following syntax: additionalInfo.extlink.ref is the URI of the external link; additionalInfo.extlink.name is the display name of the link.

Result section

String

resultCode

Code if the record represents a task of any kind

e.g. an HTTP response code, an exception, or an error

String

errorMessage

Error Message

Any standardized code or message the sending solution wants to log.

Boolean

businessError

Business Error

Sometimes business errors are stored as normal messages. It is up to the application to decide which message represents a business error. This value should be true for business errors.

Status

normalizedStatus

Status field red/yellow/green

This field is for the user, giving a hint about whether this log record represents an OK status, a warning or an error. enum Status { red yellow green }

Technical information

String

thread

Name of the server thread

String

logger

Software origin

Name of the class and (optionally) the method that logs this message

Long

durationMs

Duration of a call in milliseconds

String

logProcessingError

StackTrace of the log processing error.

This is not provided by the client solution but used if anything goes wrong in CLJ log record processing.
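To tie the field list together, the following Python sketch shows how a client could assemble a minimal 'generic' (JSON) record: id, recordTimestamp, sequence and sourceType are filled with defaults, and everything else from the field list can be passed through. The helper name `make_log_record` and its defaults are illustrative assumptions, not a prescribed CLJ API.

```python
import json
import time
import uuid
from datetime import datetime, timezone

def make_log_record(solutionCode, recordType, message, **optional):
    """Assembles a minimal CLJ-style log record as a dict.
    Any other field from the data model (user, correlationId,
    normalizedStatus, ...) can be passed as a keyword argument."""
    record = {
        "id": str(uuid.uuid4()),  # proposed algorithm: UUID
        "recordType": recordType,
        # DateTime with fraction seconds, here at millisecond precision
        "recordTimestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "sequence": time.time_ns() % 1_000_000,  # sub-millisecond tiebreaker
        "solutionCode": solutionCode,
        "message": message,
        "sourceType": "generic",  # default: this data model in JSON
    }
    record.update(optional)
    return record

# Example: a technical record of the (hypothetical) solution 'ebanking'
record = make_log_record(
    "ebanking", "techInfo", "login succeeded",
    user="jdoe", userType="internet", normalizedStatus="green",
)
payload = json.dumps(record)  # what would go into the messaging brick
```

A real client library would also validate field names against the data model and fall back to a server-side timestamp when the client clock is unreliable, as described for recordTimestamp above.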

3. About CLJ

CLJ is a proposal to harmonize logging in an environment where multiple software building blocks are working together in order to fulfill shared requirements.

CLJ is a design blueprint, a proposal for how to align a shared logging environment.

  • Which building blocks to position in order to have smooth operations

  • Which fields to care for, having a common naming convention

  • Think about the use-cases that support the organization

  • Grounded in a running system of a not-so-small bank subsidiary

  • Feedback and contribution highly appreciated.

  • Source: CLJ’s asciidoc sources are hosted at CLJ sources.

  • Twitter: @mcaviti

Authored by the CLJ team at s IT Solutions AT (www.s-itsolutions.at), led by Klemens Dickbauer.