How to use Node.js and the node-grok (1.0.8) module to define and reuse robust regular expressions and to parse log files and other unstructured text data.

I was highly impressed with the grok filter module of the popular log collection framework Logstash. This module takes an arbitrary string and, using a regular expression written in a special extended syntax, transforms it into a nicely structured object. This syntax is extremely convenient for defining and reusing complex regular expressions. Another powerful feature is the ability to name different parts of an expression, which determines the destination field for each piece of matched data. Grok comes with a large (and extensible) collection of predefined regular expressions for almost any case you can think of.

Ever googled for things like “Regular expression for IP” or “Regular expression for ISO time” and then spent precious minutes trying to figure out which one works better? With grok syntax, you can write it as simply as "%{IPV4}" or "%{TIMESTAMP_ISO8601}". These are just two of the numerous built-in patterns you can use out of the box. Additionally, you can define your own patterns, either from plain regular expressions or by combining existing patterns.

As you can see, a pattern declaration in grok consists of a pattern name and a pattern definition (which can be a plain regular expression, references to other patterns, or a combination of both).
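To show what "references to other patterns" means in practice, here is a minimal JavaScript sketch that expands %{NAME} references recursively into a single plain regular expression. It is not node-grok's implementation (node-grok relies on Oniguruma). The pattern table, the simplified INT and IPV4 regexes, and the IPV4_PORT name are assumptions made for illustration only.

```javascript
// Illustrative grok-style expander, NOT node-grok's implementation.
// The regexes are simplified stand-ins for grok's built-in patterns
// (this IPV4, for instance, also accepts out-of-range octets like 999).
const patterns = {
  // Plain regular expressions
  INT: '[+-]?\\d+',
  IPV4: '(?:\\d{1,3}\\.){3}\\d{1,3}',
  // Hypothetical pattern built only from references to other patterns
  IPV4_PORT: '%{IPV4}:%{INT}',
};

// Recursively replace every %{NAME} with the (expanded) pattern it names,
// wrapping each one in a non-capturing group so quantifiers stay scoped.
function expand(expression, depth = 0) {
  if (depth > 20) throw new Error('Pattern nesting too deep (cycle?)');
  return expression.replace(/%\{(\w+)\}/g, (match, name) => {
    if (!(name in patterns)) throw new Error(`Unknown pattern: ${name}`);
    return `(?:${expand(patterns[name], depth + 1)})`;
  });
}

const regex = new RegExp(`^${expand('%{IPV4_PORT}')}$`);
console.log(regex.test('192.168.0.1:8080')); // true
console.log(regex.test('not an address'));   // false
```

The depth limit is a simple guard against a pattern that (directly or indirectly) refers to itself.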

Our final goal is to define a pattern with named fields that can be used to transform a string into a structured object. In grok syntax, a field name is attached to a pattern reference after a colon, as in "%{IPV4:client}". Let’s define a simple pattern for a combination of an IP address and an ISO-formatted time that will generate an object with the fields client and timestamp, respectively.

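Now, if we apply this pattern to a string containing an IP address and an ISO-formatted time, we get a nice object in return, with the matched values in its client and timestamp fields.

If we pass a string that doesn’t match the expected pattern, we get an empty object.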
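The named-field behaviour can be approximated in plain JavaScript with named capture groups (ES2018, Node.js 10 and later): each %{PATTERN:field} becomes (?<field>...). The sketch below is an illustration under that assumption, not node-grok's code, and its IPV4 and TIMESTAMP_ISO8601 regexes are simplified stand-ins for grok's built-ins.

```javascript
// Illustrative only: compile %{PATTERN:field} into named capture groups.
// The pattern bodies contain no nested %{...} references, so one pass suffices.
const patterns = {
  IPV4: '(?:\\d{1,3}\\.){3}\\d{1,3}',
  // Simplified ISO 8601: date, "T" or space, time, optional fraction and zone
  TIMESTAMP_ISO8601:
    '\\d{4}-\\d{2}-\\d{2}[T ]\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?(?:Z|[+-]\\d{2}:?\\d{2})?',
};

function compile(expression) {
  const source = expression.replace(/%\{(\w+)(?::(\w+))?\}/g, (m, name, field) => {
    const body = patterns[name];
    if (body === undefined) throw new Error(`Unknown pattern: ${name}`);
    // Named reference -> named group; unnamed reference -> plain group
    return field ? `(?<${field}>${body})` : `(?:${body})`;
  });
  return new RegExp(source);
}

// Returns the named fields on a match and an empty object otherwise.
function parse(regex, str) {
  const match = regex.exec(str);
  return match ? { ...match.groups } : {};
}

const pattern = compile('%{IPV4:client} %{TIMESTAMP_ISO8601:timestamp}');
console.log(parse(pattern, '10.0.0.1 2016-03-15T10:20:30Z'));
// { client: '10.0.0.1', timestamp: '2016-03-15T10:20:30Z' }
console.log(parse(pattern, 'hello world')); // {}
```

Copying match.groups into a fresh object gives callers an ordinary object instead of the null-prototype object that the regex engine returns.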

As you can see, grok is an awesome tool, with only one problem, in my opinion: it’s implemented in Ruby, and my platform of choice is Node.js. Adding Ruby to our software stack (and learning it) just for the sake of grok didn’t look like a good idea.

Goal

Bring functionality similar to the original grok module to Node.js.

Process

First, as I always do, I checked the available npm packages, but couldn’t find any ports or even modules with similar functionality.

That’s when the node-grok module was created. Initially, I tried to use JavaScript’s built-in regex engine, but very soon discovered that it lacked some important functionality (look-behind assertions, for example). After some research, I found and adopted Oniguruma, an excellent regex library (the same library the original grok module uses). It was a perfect fit; huge kudos to its developers, as well as to all contributors of node-grok.

Result

Goal achieved: the node-grok module was developed and published.

Node.js project example

Now, let’s create a simple Node.js project demonstrating the capabilities of the node-grok npm module. I assume you have a Unix-like OS with Node.js installed.

First, run the following commands to create and initialise a new project and open the vi editor.

Type in or copy-paste the following code into the editor.

Then save the changes and execute it

As you can see, everything works as expected.

More examples

Most methods in node-grok have synchronous counterparts. Here’s what a fully synchronous version of our previous example would look like.
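The difference between the two calling styles can be shown with a small hypothetical stand-in class, MiniPattern, whose parse(str, callback) and parseSync(str) methods mirror the signatures listed in the API description. It is not node-grok's GrokPattern; it just wraps a regular expression with named groups to contrast an error-first callback with a direct return value.

```javascript
// Hypothetical stand-in (NOT node-grok's GrokPattern) whose method names
// mirror the documented parse(str, callback) / parseSync(str) pair.
class MiniPattern {
  constructor(regex) {
    this.regex = regex; // no "g" flag, so exec() keeps no state between calls
  }

  // Synchronous: the result is returned directly (or an error is thrown).
  parseSync(str) {
    const match = this.regex.exec(str);
    return match ? { ...match.groups } : {};
  }

  // Asynchronous: same work, result delivered later through an
  // error-first callback(err, obj), the usual Node.js convention.
  parse(str, callback) {
    setImmediate(() => {
      let result;
      try {
        result = this.parseSync(str);
      } catch (err) {
        return callback(err);
      }
      callback(null, result); // called outside the try so it runs only once
    });
  }
}

const p = new MiniPattern(/(?<user>\w+)@(?<host>[\w.]+)/);

// Synchronous style: the value is available on the next line.
console.log(p.parseSync('alice@example.com')); // { user: 'alice', host: 'example.com' }

// Callback style: the value arrives after the current call stack finishes.
p.parse('bob@example.org', (err, obj) => {
  if (err) throw err;
  console.log(obj); // { user: 'bob', host: 'example.org' }
});
```

Synchronous calls keep short scripts linear; the callback form avoids blocking when parsing is interleaved with other I/O.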

In order to read and process a whole log file, we need to install the line-reader module.

And the code would look like this.
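As a dependency-free alternative sketch, the same line-by-line processing can be done with Node's built-in readline module instead of line-reader (a swapped-in technique, not the article's code). The log line format, the LINE regex and the parseLogFile helper are hypothetical; the regex simply stands in for a grok pattern with named fields.

```javascript
// Sketch: parse a log file line by line with Node's built-in readline
// module (used here in place of the line-reader package).
const fs = require('fs');
const os = require('os');
const path = require('path');
const readline = require('readline');

// Hypothetical line format: "<ip> <ISO date-time> <message>"
const LINE = /^(?<client>(?:\d{1,3}\.){3}\d{1,3}) (?<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) (?<message>.*)$/;

async function parseLogFile(file) {
  const rl = readline.createInterface({
    input: fs.createReadStream(file),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  const records = [];
  for await (const line of rl) {
    const match = LINE.exec(line);
    if (match) records.push({ ...match.groups }); // skip non-matching lines
  }
  return records;
}

// Usage: write a small sample file to the temp directory and parse it.
const sample = path.join(os.tmpdir(), 'grok-sample.log');
fs.writeFileSync(sample, [
  '10.0.0.1 2016-03-15T10:20:30Z GET /index.html',
  'garbage line',
  '10.0.0.2 2016-03-15T10:20:31Z POST /login',
].join('\n'));

parseLogFile(sample)
  .then((records) => console.log(records))
  .catch((err) => { console.error(err); process.exitCode = 1; });
```

Streaming the file keeps memory use flat even for large logs, since only one line is held at a time.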

API description

  • loadDefault(callback, [modules]) – initialises the module. The callback function receives a GrokCollection object with the built-in patterns preloaded. The optional modules argument (a string or an array) can be used to load only a subset of patterns
  • loadDefaultSync([modules]) – synchronously initialises the module and returns a GrokCollection object with the built-in patterns preloaded. The optional modules argument (a string or an array) can be used to load only a subset of patterns
  • GrokCollection() – a class representing a collection of patterns
    • createPattern(expression, [name]) – creates a new GrokPattern, adds it to the collection and returns it
    • getPattern(name) – returns an existing GrokPattern
    • load(filePath, callback) – loads patterns from a file. The callback is called when loading is complete. The method can be called multiple times to add more patterns to the collection
    • loadSync(filePath) – synchronously loads patterns from a file. The method can be called multiple times to add more patterns to the collection. It returns the number of newly loaded patterns
    • count() – returns the number of patterns in the collection
  • GrokPattern() – a class representing a single pattern. An instance can only be obtained from an existing GrokCollection object
    • parse(str, callback) – parses a string and passes the result to the callback function(err, obj)
    • parseSync(str) – synchronously parses a string and returns the resulting object
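The load() and loadSync() methods read pattern files. As a rough sketch, assuming the Logstash-style pattern-file format (one "NAME definition" pair per line, with blank lines and # comments ignored), the toy ToyCollection class below parses such text and counts newly added patterns. It is hypothetical and not node-grok's GrokCollection; in particular, loadText() takes the file's contents rather than a path.

```javascript
// Toy pattern collection (NOT node-grok's GrokCollection). It reads the
// Logstash-style pattern-file format: one "NAME definition" per line,
// with blank lines and lines starting with "#" ignored.
class ToyCollection {
  constructor() {
    this.patterns = new Map();
  }

  // Like loadSync(), returns the number of newly added patterns, but it
  // takes the file's text instead of a file path. Redefinitions overwrite
  // the old definition and are not counted as new.
  loadText(text) {
    let added = 0;
    for (const raw of text.split(/\r?\n/)) {
      const line = raw.trim();
      if (line === '' || line.startsWith('#')) continue;
      const m = /^(\w+)\s+(.+)$/.exec(line);
      if (!m) continue; // ignore malformed lines
      if (!this.patterns.has(m[1])) added += 1;
      this.patterns.set(m[1], m[2]);
    }
    return added;
  }

  getPattern(name) {
    return this.patterns.get(name);
  }

  count() {
    return this.patterns.size;
  }
}

const collection = new ToyCollection();
const added = collection.loadText([
  '# my custom patterns',
  'USERNAME [a-zA-Z0-9._-]+',
  'USER %{USERNAME}',
  '',
].join('\n'));

console.log(added);                         // 2
console.log(collection.count());            // 2
console.log(collection.getPattern('USER')); // %{USERNAME}
```

Because loadText() can be called repeatedly, several pattern files can be merged into one collection, in the same spirit as calling load() multiple times.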