1. An Architecture Blueprint for a Central Logging System

1.1. Introduction

Logging is mostly treated as a local affair: that of an application or solution, of a team, or even of a single developer or a group thereof. But the increasing complexity of software systems also increases the effort needed to draw the right conclusions from a large heap of heterogeneous log records.

The trend away from large monolithic application building blocks towards smaller — but therefore intensely interconnected — software components continues. Therefore s IT Solutions AT (the IT subsidiary of Erste Bank and Sparkassen in Austria, www.s-itsolutions.at/) started a project to provide a central logging and journalling data lake.
The architectural pattern described here follows this "Central Logging & Journalling" (CLJ) solution of s IT Solutions. It is not targeted at smaller systems but tries to deal with enterprises and larger IT landscapes.

So, if you feel your path to wisdom by reading logs looks like this, you’re a happy developer already.

But if it looks a bit more like this, further reading could maybe help you.

1.2. Goals

Logging is not an end in itself — it enables many use-cases that can be grouped into four partitions:

[A]. Support: Find out what the system did at runtime, in order to detect the source of problems or to give information to other stakeholders. Most use-cases here investigate exceptional program behaviour.

[B]. Compliance: The number of regulatory use-cases increases; the run-time behaviour and intermediate data of software must be documented, often for many years.

[C]. Monitoring & Alerting: When a stream of logging data exists, it is natural to also use this stream to determine the system state, detect problems, and report them via multiple channels.

[D]. Analytics & Intelligence: Sophisticated tools allow data mining, BI etc. to find ways to improve the business, be it by exploring customer behaviour, by predicting operations problems, or by something we don’t even dream of yet.

Table 1. Use-case groups

Support [A]:
* Customer Care
* Issue research
→ Access security
→ Searchable, near-time

Compliance [B]:
* Regulatory queries
→ Long term
→ Safe data store
→ Ideally certified
→ Infrequent queries

Monitoring & Alerting [C]:
* Stream analysis
* Alerting endpoints
→ Needs rules
→ High performing

Analytics & Intelligence [D]:
* Statistics
* Big Data Analysis
* Machine Learning
* Predictive Analysis
→ Highly specialized toolset

1.3. Architecture

A possible architecture could make use of the following building blocks:

Figure 1. Logical architecture of CLJ

The functional as well as the non-functional (quality) requirements of the aforementioned use-case groups differ greatly from each other. Therefore it makes sense to use different software products to fulfill them.

1.3.1. Messaging Brick

This building block provides a reliable (truly 24/7) component into which the applications can upload their log records. It is high-performing, lean, and stable, and therefore capable of swallowing even extreme load peaks.

This building block also serves use-case group [C]. The type of product is queue-like; our implementation uses Apache Kafka (kafka.apache.org/). Another example would be to use Amazon’s Firehose/Kinesis in an AWS-based environment.
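To make the upload path concrete, the following sketch builds the kind of JSON envelope a client might publish into the queue-like brick. It is a minimal Python illustration; the broker API is left out, and the field names (id, solution, recordType, recordTimestamp, message) mirror the data model described later but are assumptions, not a normative schema.

```python
import json
import uuid
from datetime import datetime, timezone

def make_log_envelope(solution, record_type, message):
    """Build a queue-ready JSON envelope for one log record.

    The field names are illustrative; align them with your own
    instance of the central logging data model.
    """
    record = {
        "id": str(uuid.uuid4()),             # unique id, UUID as proposed
        "solution": solution,                # owning solution
        "recordType": record_type,           # partitioning / retention key
        "recordTimestamp": datetime.now(timezone.utc).isoformat(),
        "message": message,
    }
    return json.dumps(record)

envelope = make_log_envelope("webshop", "techInfo", "checkout started")
```

The resulting string can then be handed to any producer client of the messaging brick (a Kafka producer, a Kinesis put call, and so on).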

1.3.2. Online Research Store

This record store is responsible for structured, fast searching of log records and for discovering connections between them.

Our implementation uses Elasticsearch (www.elastic.co/de/) together with a self-written ReST service and an Angular-based front-end.

1.3.3. Compliance Store

Selected log records (defined by the solution) are persisted in this record store. It is very reliable, needs a back-up, is fast at writing and persists the record before any tampering can occur. On the other hand, it does not need a sophisticated query facility.

In our implementation we chose Apache Cassandra (cassandra.apache.org/). Other possibilities would be to store the selected records in flat files and archive them, or to use an RDBMS.
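Since only selected records reach the compliance store, a routing step has to decide per record which stores receive it. A minimal sketch, assuming the selection is driven by recordType (the concrete set of compliance-relevant types is a hypothetical configuration):

```python
# Hypothetical configuration: which record types a solution has
# declared compliance-relevant.
COMPLIANCE_TYPES = {"journal", "transaction"}

def route(record):
    """Return the list of stores a record should be written to."""
    targets = ["online_research"]            # every record is searchable
    if record.get("recordType") in COMPLIANCE_TYPES:
        targets.append("compliance")         # long-term, tamper-proof store
    return targets
```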

1.3.4. The Client Side

Applications can send their log records either directly into the messaging brick or have them harvested from the filesystem or another data store. Both methods have pros and cons.

Table 2. Harvesting methods.

Direct transfer:
+ Fastest
+ Possibly eliminates one component (the file system)
- Technically a tight coupling

Logfile harvesting:
+ Non-intrusive to existing applications
- Needs another process (resources + monitoring)

Direct transfer

Possibilities for applications to integrate are:

  • Own client libs of messaging brick

  • APIs for creating messages that fit to the data model of CLJ

  • Appenders for existing logging frameworks (e.g. log4j2 in Java, or log4net for C#)

Generally it is a good idea to offer integration libraries that handle the situations where the messaging brick suffers a failure. In those cases the using application should not be brought down by logging; this mitigates the tight coupling issue.
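Such a fail-safe integration library can be sketched as a thin wrapper that never lets a broker failure propagate into the business application. The class and callback names here are illustrative assumptions:

```python
class SafeLogSender:
    """Shields the calling application from messaging-brick failures.

    `broker_send` would typically be a producer call of the real
    client library; `fallback` could append to a local spool file
    for later harvesting. Both are injected here for illustration.
    """

    def __init__(self, broker_send, fallback):
        self._broker_send = broker_send
        self._fallback = fallback

    def send(self, record):
        try:
            self._broker_send(record)
            return True
        except Exception:
            # Broker down: degrade gracefully instead of failing the caller.
            try:
                self._fallback(record)
            except Exception:
                pass  # last resort: drop the record, never raise
            return False
```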

Logfile harvesting

There are a lot of tools for this use-case, ranging from light-weight native apps that are integrated into the operating system up to full-scale ETL tools (en.wikipedia.org/wiki/Extract,_transform,_load).

A few examples:

In certain architectures, some of these products could serve as the messaging brick itself.

2. Central Logging Datamodel

2.1. Partitioning of the log record space

Each record has its own id value, making it unique in all of the data stores.

For managing the stores (especially the online research store for [A]), though, it is necessary to organize the records along a number of dimensions. This separation then supports the determination of

  • Access rights/permissions

  • Retention times

  • Backup strategy

With that, the integrated applications can gain a lot of control and flexibility for their data.

The suggested dimensions are a combination, fitted to the actual need, of these fields:

  • tenant (in case of a real multi-tenant system with separated accounts)

  • environment (if environments are not separated physically or logically on the server side)

  • solution, this determines the organizational owner of the log records within a tenant

  • recordType, to distinguish between different needs of building blocks and types of logging and journal data.
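How such a combination of dimensions can be turned into a single partitioning key (e.g. an index or topic name) is sketched below; the separator and ordering are illustrative choices, not part of the proposal:

```python
def partition_key(tenant=None, environment=None, solution=None, record_type=None):
    """Join the dimensions that are actually needed into one key."""
    parts = [p for p in (tenant, environment, solution, record_type) if p]
    return "-".join(parts)

# A multi-tenant setup with merged environments might use all four:
key = partition_key(tenant="bank01", environment="prod",
                    solution="webshop", record_type="session")
```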

2.2. Fields

This list of fields is a superset of the common values a log system could care for. Different applications in different contexts will use one subset or another of this enumeration, hardly ever setting them all. But, and that is the main reason for this list, values with similar semantics in a log record store should be named identically, to make traversing logs of different applications easier.
Mandatory fields are printed bold.

Table 3. NDM fields
Type Field Name Short Description Long Description



Technical id for the log record

This can be set by the client (if it is trusted to care for uniqueness) or be omitted and set by the server. The server allows the id to be reused (= update) for semantics like records of timespans (e.g. sessions). The proposed algorithm is UUID.

Header Fields, meta data of each record



Type of the record.

This is an unbounded enumeration; the solution is free to choose the value. It is recommended, though, to use a known value (see subpage) to make the semantics of the record easier to recognize. Record types can be shared between solutions, e.g. session, activity and techInfo are record types that are used by several applications. The record type is used for partitioning the CLJ data stores for the permission system, as well as a key in defining retention periods and the archiving strategy.



Additional field to identify the event

Can be used as a type of the source log record. For example, if the recordType is 'serverLog', the recordSubType could be "tomcat" or "weblogic".



Institute number

If needed, for organizations serving multiple jurisdictional tenants, this is the tenant code.



Environment identifier

If needed, when the development, test, staging, production etc. environments are not separated by dedicated data store instances but merged into one, this identifier determines from which environment a log record originates.



When the log record has been created

If the client does not provide this, or the given value cannot be parsed on the server side, the processing engine will create a timestamp as the next best guess.



Determines order

Often the record timestamp is not sufficient to discriminate and order a set of log records; e.g. Elasticsearch does not support finer granularity than milliseconds. In this case the sequence field can store micros or nanos. Another possibility is for a client to maintain a gapless sequence to be sure that no records are lost during transmission, processing and retrieval. A logging front-end can use this field as the sole default order attribute or as a secondary order attribute after recordTimestamp.



Level of importance, as provided by many low-level logging systems

This field is optional and not normalized: whatever the client solution provides here is taken as-is. A lot of logging libraries have their own opinions on this topic.

User Info, information about the person or technical systems connected to the log record



Unique user id in its userType domain

This identifies the user or system uniquely within the domain given in "userType". This value gains importance in the context of current data protection laws.



Domain this user account belongs to

Needed if different user domains should be distinguished — like internet users (customers) and intranet users (employees) — or when the user domains of subsidiaries are not clearly separated by their user ids.

Source Info, which component wrote this log record



Unique identifier of a solution

Identifies the Solution as unit in the IT landscape.



Id of functional building block

If needed, more fine-grained organizational partitioning.



Building block

More technical/architectural partitioning key.



System name of the server initiating the logging call

e.g. DNS of physical or virtual system



Client IP, originator of the log

The value may differ depending on the nature of the originator (e.g. a browser-based application, or a batch).



Software that initiated the call

This field is used when the software (and its version) of the user/client is relevant, e.g. in web front-ends this identifies the browser that has been used. The writing solution can give any information if it thinks that information about its caller makes a difference.




deprecated, might be removed in the future.



Identifies the server instance

e.g. the docker pod

Initiating solution



Code from initiating system

Initiating systems are mostly user front-ends or batch processes.

Harvesting Info, where the log record was first persisted; might be different from the source solution



Syntax of the incoming data

Syntax of the incoming data (into the messaging brick). 'generic' means using this data model in JSON; this is the default value. If the syntax is not 'generic' the central logging service might be able to do a proper transformation.



Server Host Name

like sourceHostname



Server IP address

The system that provided the logging information, e.g. the Apache host for access logs, or any other harvesting service running Logstash, Flume, rsyslog or a similar tool.



File name and path from which the log record has been harvested, if applicable

If log records are not sent directly to the messaging building block but harvested from a logfile (by Logstash or similar software), the filename and path in the appropriate format (Windows, Unix, Mainframe, …) can be sent here if needed.




Hierarchical predecessor of this log record.

Could be of a functional or sequential order. Here a key of a hierarchically higher-level record can be set, so that a tree-like structure of log records can be created.



Mapping context id field 1

Example: The id of a user session.



Mapping context id field 2

Example: The (use case) id of a user’s activity.



Mapping context id field 3

Example: The id of an explicit technical log record.



Mapping context id field 4



Start date of the record

For journalling records that have a time span, this field of the event signals the start timestamp.



End date of the session

For journalling records that have a time span, this field of the event signals the end timestamp.



Correlation ID for a synchronous or quasi-synchronous call

Unique id that is created as early as possible (ideally by the initiator) and then passed through the whole call hierarchy to create traces of calls.

Unstructured and semistructured data



Log Message

All the information that is not part of other fields



semi-structured data

Business or other data. Technically this is a text field. It is recommended, though, to use JSON syntax, because the front-end can interpret it and display a tree structure. Special case of additionalInfo: external links. These can be rendered in the UI as links with the following syntax: additionalInfo.extlink.ref: the URI for the external link; additionalInfo.extlink.name: the display name for the link.

Result section



Code if the record represents a task of any kind

e.g. an HTTP status code, an exception, or an error



Error Message

Any standardized code or message the sending solutions wants to log.



Business Error

Sometimes business errors are stored as normal messages. It is up to the application to decide whether a message represents a business error. This value should be true for business errors.



Status field red/yellow/green

This field is for the user, giving a hint whether this log record represents an OK status, a warning or an error: enum Status { red, yellow, green }

Technical information



Name of the server thread



Software origin

Name of the class and (optionally) the method which logs this message



Duration of a call in milliseconds



StackTrace of the log processing error.

This is not provided by the client solution but is set if anything goes wrong in CLJ log record processing.
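Putting the field groups above together, a complete record could look like the following JSON document. Every field name here is an illustrative assumption consistent with the descriptions in Table 3; the authoritative names are those of your own CLJ instance.

```python
import json

# Hypothetical complete log record; field names are assumptions
# derived from the descriptions in Table 3.
record = {
    # header
    "id": "3f2b9a6e-0c41-4f7e-9c1a-5d2e8b7f6a10",
    "recordType": "session",
    "environment": "test",
    "recordTimestamp": "2019-03-07T12:34:56.789Z",
    "sequence": 1,
    # user info
    "userId": "jdoe",
    "userType": "intranet",
    # source info
    "solution": "webshop",
    "sourceHostname": "app01.example.internal",
    # context / correlation
    "correlationId": "c0ffee-42",
    # payload: free text plus semi-structured JSON, including an
    # external link as described for additionalInfo
    "message": "user logged in",
    "additionalInfo": json.dumps(
        {"extlink": {"ref": "https://example.internal/case/1",
                     "name": "Case 1"}}),
    # result
    "status": "green",
}
payload = json.dumps(record)
```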

3. About CLJ

CLJ is a proposal to harmonize logging in an environment where multiple software building blocks are working together in order to fulfill shared requirements.

CLJ is a design blueprint, a proposal for how to align a shared logging environment:

  • Which building blocks to position in order to have smooth operations

  • Which fields to care for, having a common naming convention

  • Think about the use-cases that support the organization

  • Grounded in a running system of a not-so-small bank subsidiary

  • Feedback and contribution highly appreciated.

  • Source: CLJ’s asciidoc sources are hosted at CLJ sources.

  • Twitter: @mcaviti

Authored by the CLJ team at s IT Solutions AT (www.s-itsolutions.at), led by Klemens Dickbauer.