Thursday, February 9, 2017

Graph databases and rapid prototyping (part 2)

In this part, I briefly explain what a graph database is, with some examples of data modeling and querying. I also discuss the schema-less nature of graph databases and the advantages it gives you when you are just starting a project.

Part 1: How I got hooked on graph databases
Part 2: What is a graph database
Part 3: Rapid prototyping
"A fool tidies up, a genius rules over chaos" - Albert Einstein

A graph database looks like a bunch of nodes with properties (key-value pairs) and relationships between them. Imagine you gathered data from multiple sources about people, their tweets, posts, likes, friends and followers. Each node may have its own unique properties, while some nodes share similar ones.
Rihanna is friends with the monster, and she likes his posts
Similar nodes can be categorized using labels, such as "person", "monster" and "post". Relationships can be labeled too, like "friend", "likes" and "publishes" (Neo4j calls these relationship types).
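To make the model concrete, here is a minimal sketch in Python of a property graph held in memory. It is not how Neo4j stores data. The `Node` and `Rel` classes, the `by_label` helper and the sample names are hypothetical and exist only to mirror the Rihanna example.

```python
from dataclasses import dataclass, field


# eq=False keeps identity-based equality and hashing, so two nodes with the
# same labels and properties are still two distinct nodes.
@dataclass(eq=False)
class Node:
    labels: set                                  # e.g. {"person"}
    props: dict = field(default_factory=dict)    # arbitrary key-value pairs


@dataclass(eq=False)
class Rel:
    start: Node
    type: str                                    # Neo4j calls this the relationship type
    end: Node
    props: dict = field(default_factory=dict)    # relationships can carry properties too


def by_label(nodes, label):
    """Return the nodes that carry the given label."""
    return [n for n in nodes if label in n.labels]


# Hypothetical sample data mirroring the example above.
rihanna = Node({"person"}, {"name": "Rihanna"})
monster = Node({"monster"}, {"name": "Monster"})
post = Node({"post"}, {"title": "My first post"})
nodes = [rihanna, monster, post]
rels = [
    Rel(rihanna, "friend", monster),
    Rel(monster, "publishes", post),
    Rel(rihanna, "likes", post),
]

print([n.props["name"] for n in by_label(nodes, "person")])  # ['Rihanna']
```

Labels here are just a set of strings on each node, so a node can carry several of them at once, and nothing forces two "person" nodes to have the same properties.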

To retrieve information, you describe the shape of data you're looking for, and the database returns nodes and relationships that fit into this shape. To get posts that Rihanna liked, which were published by her monster friends, you can literally write something like this:
match (r:person)-[:likes]->(p:post)<-[:publishes]-(m:monster), (r)-[:friend]->(m)
where r.name = "Rihanna"
return p;
Did I just lose you? I apologize; I am still figuring out how to explain this. The good news is that I managed to bring it down from three code snippets to one. The point is that you can use labels and simple ASCII-art patterns to describe sophisticated queries and fish for insights in your data.
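To see what the query asks for, here is a hedged sketch that evaluates the same pattern by brute force over plain Python data. The node ids, the `name` and `title` properties and the `liked_posts_by_monster_friends` function are hypothetical. A real graph database would use indexes and follow relationships from node to node instead of looping over every relationship.

```python
# Hypothetical sample data. Nodes: id -> (label, properties).
nodes = {
    1: ("person", {"name": "Rihanna"}),
    2: ("monster", {"name": "Monster A"}),
    3: ("monster", {"name": "Monster B"}),
    4: ("post", {"title": "Post by A"}),
    5: ("post", {"title": "Post by B"}),
    6: ("post", {"title": "Another post by A"}),
}
# Relationships: (start id, type, end id). Rihanna is friends with A only.
rels = {
    (1, "friend", 2),
    (1, "likes", 4),
    (1, "likes", 5),
    (2, "publishes", 4),
    (2, "publishes", 6),
    (3, "publishes", 5),
}


def has_label(node_id, label):
    return nodes[node_id][0] == label


def liked_posts_by_monster_friends(name):
    """Brute-force version of the Cypher pattern in the text.

    A set is used, so each post is returned once (like `return distinct p`).
    """
    found = set()
    for r, rel_type, p in rels:
        # (r:person)-[:likes]->(p:post) where r.name = name
        if rel_type != "likes" or not has_label(r, "person") or not has_label(p, "post"):
            continue
        if nodes[r][1].get("name") != name:
            continue
        for m, other_type, target in rels:
            # (p)<-[:publishes]-(m:monster) and (r)-[:friend]->(m)
            if (other_type == "publishes" and target == p
                    and has_label(m, "monster") and (r, "friend", m) in rels):
                found.add(p)
    return sorted(found)


print(liked_posts_by_monster_friends("Rihanna"))  # [4]
```

Post 5 is liked but was published by a monster who is not Rihanna's friend, and post 6 is by her friend but not liked, so only post 4 fits the whole shape.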

If you add a user interface that visualizes nodes and relationships and shows their properties, you get a powerful tool for data exploration, analysis and modeling. This is exactly what a graph database like Neo4j provides out of the box.¹
"Let pedestrians define the walkways" - Anonymous
We can call such a graph database schema-less², as you do not have to define a schema (e.g., tables and columns in SQL Server) before working with your data. You do not need to design your data structure in advance or plan which queries you may need in the future. Instead, you can add properties, define relationships and assign labels as you go.
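The "as you go" workflow can be illustrated with plain Python dictionaries. This is a hypothetical toy, not Neo4j's API: `add_node`, `relate`, the `celebrity` label and the `followers` value are invented for illustration. In Neo4j you would do the same with Cypher statements such as `CREATE` and `SET`.

```python
# A toy graph: no schema is declared up front, only empty containers.
graph = {"nodes": {}, "rels": []}


def add_node(node_id, *labels, **props):
    """Create a node with any labels and any properties."""
    graph["nodes"][node_id] = {"labels": set(labels), "props": dict(props)}


def relate(start, rel_type, end):
    """Record a relationship of any type between two node ids."""
    graph["rels"].append((start, rel_type, end))


add_node("rihanna", "person", name="Rihanna")
add_node("monster", "monster", name="Monster")
relate("rihanna", "friend", "monster")

# Later, a new property shows up on one node only; nothing else has to change.
graph["nodes"]["rihanna"]["props"]["followers"] = 1000  # hypothetical value
# A new label can be assigned after the fact...
graph["nodes"]["rihanna"]["labels"].add("celebrity")
# ...and a brand-new relationship type introduced on the spot.
relate("monster", "follows", "rihanna")
```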

After some time, the structure of your graph will become more or less stable. Now you may want to clean up unused properties and relationships and revisit your labels. You may also add indexes to speed up frequent queries and define constraints for labels. Constraints ensure that labeled nodes and relationships have a specific property, or that the value of a property is unique.
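As a rough illustration of what an index and a uniqueness constraint do, here is a sketch in plain Python. The index maps a property value to the nodes that have it, so lookups avoid a full scan, and the constraint rejects any write that would duplicate an indexed value. The `Graph` class and its methods are hypothetical. In Neo4j you declare indexes and constraints with Cypher `CREATE INDEX` and `CREATE CONSTRAINT` statements, whose exact syntax differs between versions.

```python
from collections import defaultdict


class Graph:
    """Toy node store with (label, property) indexes and uniqueness constraints.

    Only inserts are handled; updates and deletes are out of scope.
    """

    def __init__(self):
        self.nodes = {}      # node id -> {"labels": set, "props": dict}
        self.indexes = {}    # (label, prop) -> {value: set of node ids}
        self.unique = set()  # (label, prop) pairs whose values must be unique

    def _build_index(self, label, prop):
        # Index the nodes that already exist.
        index = defaultdict(set)
        for node_id, node in self.nodes.items():
            if label in node["labels"] and prop in node["props"]:
                index[node["props"][prop]].add(node_id)
        return index

    def create_index(self, label, prop):
        self.indexes[(label, prop)] = self._build_index(label, prop)

    def create_unique_constraint(self, label, prop):
        # The constraint is backed by an index; refuse it if data already violates it.
        index = self._build_index(label, prop)
        for value, ids in index.items():
            if len(ids) > 1:
                raise ValueError(f"existing duplicates for {label}.{prop} = {value!r}")
        self.indexes[(label, prop)] = index
        self.unique.add((label, prop))

    def add_node(self, node_id, labels, props):
        # Check every uniqueness constraint before writing anything.
        for label, prop in self.unique:
            if label in labels and prop in props and self.indexes[(label, prop)].get(props[prop]):
                raise ValueError(f"{label}.{prop} = {props[prop]!r} already exists")
        self.nodes[node_id] = {"labels": set(labels), "props": dict(props)}
        # Keep every matching index up to date.
        for (label, prop), index in self.indexes.items():
            if label in labels and prop in props:
                index[props[prop]].add(node_id)

    def find(self, label, prop, value):
        index = self.indexes.get((label, prop))
        if index is not None:
            return set(index.get(value, ()))  # indexed lookup
        # No index: fall back to scanning every node.
        return {nid for nid, n in self.nodes.items()
                if label in n["labels"] and n["props"].get(prop) == value}


g = Graph()
g.add_node(1, {"person"}, {"name": "Rihanna"})
g.create_unique_constraint("person", "name")
print(g.find("person", "name", "Rihanna"))  # {1}
try:
    g.add_node(2, {"person"}, {"name": "Rihanna"})
except ValueError as err:
    print("rejected:", err)
```

The constraint is scoped to a label: a "monster" node named Rihanna would still be accepted, because only "person" names must be unique.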

Since indexes and constraints are schema elements, your database is no longer schema-less but rather a hybrid. In some systems, you can define the complete schema after the fact, resulting in a schema-full model.

1 Head to the Neo4j Getting Started page for more information. The documentation there is great and will help you get going in no time.
2 One may argue that a database with labels is not schema-less. I think that labels alone are just a convenience; you can use a property to achieve the same.
