The Power of a Graph Database

Written by: Dan Campbell

Updated: August 20, 2021

Recently, LogicGate's engineering team undertook the effort to migrate our application from a relational database to a graph database. To better understand this decision and its benefit to LogicGate users, it's important to know a little bit about LogicGate and a little bit about databases in general.

LogicGate: The origin of the company

LogicGate's founders, myself included, came from the world of management consulting, specifically within the regulatory and compliance space. We had seen that technology solutions for the market were lacking and saw an opportunity to provide value to organizations across many different industries. After a year of research, design, blood, sweat, and tears, we created the first version of LogicGate, a software-as-a-service (SaaS) web application. With the birth of a minimum viable product, and all the knowledge gained along the way, we began to see a clear path forward.

LogicGate would provide value to customers in a way that its competitors couldn't by continuing to pursue two pillars of the product: flexibility and visibility. The application needed complete flexibility to conform to the specifics of a changing organization. It also needed the ability to provide process owners clear insight and high-level visibility into their processes. Often software bolsters one at the expense of the other, but if LogicGate could excel in both it would be able to provide a truly valuable service for end users, managers, and executives across an organization.

Databases: The role of a database within a web application

In a typical web application, a database acts as the storage unit for all the information that can be presented to and manipulated by users. The first version of LogicGate was built with MySQL, a relational database management system (RDBMS). Historically, this type of database has been used broadly, regardless of application specifics, because viable alternatives were lacking. Popular relational database vendors are also widely supported, which makes them very accessible. In spite of this, a relational database isn't the right tool for all the problems that web applications attempt to solve.

LogicGate’s value is centered around a unique combination of flexibility and visibility, and it became clear that MySQL wasn’t the right database for LogicGate. Instead, we realized that using a graph database, specifically Neo4j, would most effectively align our technology with our application’s goals.

Flexibility: How a data model can be flexible and an advantage for LogicGate

Flexibility within a data model is an abstract discussion, so it can be helpful to think in terms of an analogy. Imagine you have been tasked with organizing a library. Before you begin, you're allowed to see everything you'll need to organize. You note that this library only houses books. There is a relatively even mix of fantasy, mystery, romance, and non-fiction content so you group the library into sections for each genre and sort each section by author and title. Everything seems great. People come to the library and recognize that the system is sensible and efficient. You are praised for your work.

Now, imagine you have the same task, except this time you aren't allowed to see what the library stores before you must decide how to organize it. After the success of your first attempt, you decide to use the same pattern. Unfortunately, when the library's content is revealed, you realize this isn't a library for books. Instead, this library is for animals. Some might call it a zoo. As you contemplate whether a tiger falls into the category of "fantasy" or "romance", you wonder how you got here.

Clearly, your organizational pattern needs to be flexible if you’re not able to predict what you'll have to organize. In a world where the library could receive something new and unique every day, a fixed system doesn't make sense. A relational database is a great fit for highly structured data that fits well into predetermined categories and types, like the library of books. In these situations, supporting flexibility would only add overhead. Unlike a relational database, Neo4j doesn't require that data be rigidly structured. It lets you adapt quickly and define dynamically how you want to organize your data. If your database can’t provide the flexibility you need, you could wind up sorting animals by genre or, more realistically, bending your process to fit your technology rather than the other way around.

At LogicGate, we realized this was beginning to happen with customers when we were using MySQL and we knew the flexibility of Neo4j would give LogicGate an edge.
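
The library analogy can be sketched in a few lines of Python. This is a minimal illustration of records that carry free-form labels and properties, in the spirit of a graph database's data model. The `Node` class, the `by_label` helper, and all sample values are hypothetical and are not Neo4j's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A record with free-form labels and properties (hypothetical, not Neo4j's API)."""
    labels: set
    properties: dict = field(default_factory=dict)


def by_label(nodes, label):
    """Return every node that carries the given label."""
    return [node for node in nodes if label in node.labels]


# Placeholder values, invented purely for illustration.
library = [
    Node({"Book"}, {"genre": "mystery", "author": "A. Author", "title": "Example Title"}),
]

# A tiger arrives: no schema change is needed and no "genre" is forced onto it.
library.append(Node({"Animal"}, {"species": "tiger", "habitat": "forest"}))

animals = by_label(library, "Animal")
```

The book and the tiger live side by side, each described only by the properties that make sense for it.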

Visibility: Six Degrees of Kevin Bacon

An exploration of the problem caused by a lack of data visibility.

To illustrate the concept of data visibility, consider another scenario. Although you’ve never seen a movie, you wind up participating in a game of “Six Degrees of Kevin Bacon”. The rules are simple: you link one actor to another through a film in which they both had roles. From a given actor, you must find a path to Kevin Bacon. Take, for example, actor Ian McKellen.

  • Ian McKellen was in X-Men: Days of Future Past with Michael Fassbender
  • Fassbender was in X-Men: First Class with Kevin Bacon

Therefore, Fassbender has a Bacon number of one, and McKellen has a Bacon number of two.

To help you connect the dots, you’re able to view filmographies for actors as well as cast lists for movies. This sounds great at first, but you quickly realize that without prior knowledge of the world’s actors and movies, this ability is quite limited. This limitation closely resembles the problem that graph databases like Neo4j were designed to solve. Let’s continue with the example to make the limitation more tangible.

You begin your first round with the actor Keanu Reeves. After browsing the impressive list of films in which he’s acted, you begin to feel overwhelmed. You navigate to the cast list for one of his films, The Matrix. You read through the actors, but alas, no Bacon. Should you go back and open the cast list for another one of Reeves’s classics, or should you explore further down the path you’ve begun with The Matrix by viewing Carrie-Anne Moss’s filmography? After hours of endless searching, you have no idea if you’re any closer to Bacon. You have no visibility down the path to Bacon because you can only see movies and actors one step away from you. You are lost.

Imagine if instead of just viewing filmographies and cast lists for specific actors and movies you had some way to view the paths that exist between them. Better yet, imagine if you had a system that let you specify two actors and it would find the path of movies between them for you. You’d be the champion of Bacon. Although this example is contrived, it highlights a problem that a graph database is able to solve.
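
Such a path-finding system can be sketched with a breadth-first search. The Python below is a toy illustration, not how Neo4j works internally. The cast lists contain only the films and actors named in this post, and the function names are hypothetical.

```python
from collections import deque

# Toy cast lists, limited to the films and actors mentioned in this post.
CASTS = {
    "X-Men: Days of Future Past": ["Ian McKellen", "Michael Fassbender"],
    "X-Men: First Class": ["Michael Fassbender", "Kevin Bacon"],
    "The Matrix": ["Keanu Reeves", "Carrie-Anne Moss"],
}


def build_costar_graph(casts):
    """Map each actor to a list of (co-star, shared film) pairs."""
    graph = {}
    for film, actors in casts.items():
        for actor in actors:
            graph.setdefault(actor, [])
            for other in actors:
                if other != actor:
                    graph[actor].append((other, film))
    return graph


def bacon_path(casts, start, target="Kevin Bacon"):
    """Breadth-first search for the shortest chain of (actor, film, co-star) hops.

    Returns None when the data holds no path from start to target.
    """
    graph = build_costar_graph(casts)
    if start not in graph or target not in graph:
        return None
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        actor, path = queue.popleft()
        if actor == target:
            return path
        for costar, film in graph[actor]:
            if costar not in seen:
                seen.add(costar)
                queue.append((costar, path + [(actor, film, costar)]))
    return None


path = bacon_path(CASTS, "Ian McKellen")
bacon_number = len(path)  # number of hops from Ian McKellen to Kevin Bacon
```

With this toy data the search reproduces the two-hop McKellen path from earlier. For Keanu Reeves it returns `None`, simply because the three cast lists above contain no connection from him to Bacon.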

Visibility: A LogicGate example of data visibility and its value to customers

Graph databases treat the relationships between data as first-class citizens, which allows you to explore data in a completely different way compared to a relational database. Using LogicGate, a process owner doesn’t need to step through each of her business processes to find out which risks pose the greatest threat to the organization. Instead, she can open a custom dashboard that summarizes this information. While a similar report could be developed in an application built on top of a relational database, the underlying logic would be more complex. Imagine a simple data set in line with this process: a business unit linked to its business processes, each of which is linked to its risks.

In order to retrieve the risks associated with the business unit, a relational database would require explicit querying to first locate the business unit, then find the associated business processes, and then find the risks associated with those business processes. Just like with the Bacon example, there is no way to see further than one step away at a time. Only after constructing the path step-by-step can a report be generated to provide the desired visibility. While this may not seem like an issue in such a simple scenario, the complexity of such searches can compound rapidly as data becomes more distantly linked and more tangled relationships are formed. Eventually, this will lead to slow or incomplete searches.

To demonstrate, let’s assume a small nuance is added to this process. Instead of just linking risks to business processes, some risks may be directly linked to parent business units.

Generating the risk dashboard has become more cumbersome in a relational database. It must first find the business unit, then find all the risks directly linked to the business unit, then find all the business processes linked to the business unit, then find all of the risks linked to those business processes, and finally combine the two sets of risks. To make matters worse, suppose the report wasn’t originally designed to find risks unless they had been linked through business processes. Now the report would be incomplete and misleading (ironically adding a new risk to the organization).
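
To make the relational version concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema, table names, and sample rows are all assumptions invented for illustration; they are not LogicGate's actual data model.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business_unit (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE business_process (
        id INTEGER PRIMARY KEY, name TEXT,
        unit_id INTEGER REFERENCES business_unit(id));
    CREATE TABLE risk (
        id INTEGER PRIMARY KEY, name TEXT,
        process_id INTEGER REFERENCES business_process(id),
        unit_id INTEGER REFERENCES business_unit(id));

    INSERT INTO business_unit VALUES (1, 'Finance');
    INSERT INTO business_process VALUES (10, 'Invoicing', 1);
    INSERT INTO risk VALUES (100, 'Duplicate payment', 10, NULL);  -- linked via a process
    INSERT INTO risk VALUES (101, 'Regulatory fine', NULL, 1);     -- linked directly to the unit
""")

# The original report: unit -> processes -> risks, spelled out join by join.
VIA_PROCESSES = """
    SELECT r.name FROM business_unit u
    JOIN business_process p ON p.unit_id = u.id
    JOIN risk r ON r.process_id = p.id
    WHERE u.name = ?
"""

# The extra query needed once risks can hang directly off a unit.
DIRECT = """
    SELECT r.name FROM business_unit u
    JOIN risk r ON r.unit_id = u.id
    WHERE u.name = ?
"""


def risks(query, *params):
    """Run a report query and return the risk names in sorted order."""
    return sorted(row[0] for row in conn.execute(query, params))


original_report = risks(VIA_PROCESSES, "Finance")
updated_report = risks(VIA_PROCESSES + " UNION " + DIRECT, "Finance", "Finance")
```

Here `original_report` contains only the risk linked through a process, while `updated_report` also picks up the directly linked one. Every new kind of link means another hand-written query to combine.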

A graph database, on the other hand, could use the same logic to build the report in both situations. Since it’s built to efficiently manage the relationships between data like this, you can simply ask it to find all of the risks reachable through any path from the business unit. It doesn’t matter if you don’t initially know how data might be connected because the database will figure it out for you. As a result, regardless of how complex processes and their associated relationships may become, LogicGate is able to provide visibility in a scalable way.
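
The "same logic in both situations" idea can be illustrated with a plain-Python traversal. This stands in for what a graph query does and is not Neo4j's API. The node names, labels, and the `reachable_risks` function are hypothetical.

```python
def reachable_risks(edges, labels, start):
    """Collect every node labeled 'Risk' reachable from start, whatever the path."""
    adjacency = {}
    for parent, child in edges:
        adjacency.setdefault(parent, []).append(child)

    seen, stack, found = {start}, [start], set()
    while stack:
        node = stack.pop()
        if labels.get(node) == "Risk":
            found.add(node)
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return sorted(found)


# Hypothetical sample data, invented for illustration.
labels = {
    "Finance": "BusinessUnit",
    "Invoicing": "BusinessProcess",
    "Duplicate payment": "Risk",
    "Regulatory fine": "Risk",
}

# First shape: risks hang off business processes only.
edges_v1 = [("Finance", "Invoicing"), ("Invoicing", "Duplicate payment")]
# Second shape: one risk is also linked directly to the business unit.
edges_v2 = edges_v1 + [("Finance", "Regulatory fine")]

report_v1 = reachable_risks(edges_v1, labels, "Finance")
report_v2 = reachable_risks(edges_v2, labels, "Finance")
```

The traversal function is unchanged between the two data shapes; only the data differs. In a graph database, the database itself performs this kind of traversal, so the report logic does not need to be rewritten when a new kind of link appears.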

Conclusion

These are just some of the ways LogicGate is able to leverage Neo4j’s power, and we are just beginning to unlock all of the exciting functionality and features we can build as a result. Is a graph database the right tool for everything? Absolutely not. The point is that the right database should be used for the right job. After realizing that LogicGate and Neo4j aligned around the goals of data visibility and flexibility, we made the switch.

Since then, LogicGate has truly been able to home in on the power of our platform, providing our customers with the ability to easily customize LogicGate’s risk management solution to align with their processes and increase visibility into risk and compliance activities throughout the organization.
