Life after 1 year of using Neo4J

Written by a.nikishaev | Published 2017/02/12
Tech Story Tags: neo4j | database | graph | java | high-availability


A year ago, on one of my projects, we got the idea that migrating to Neo4j would be great, because we had data that seemed ideal for a graph. After that, our life changed forever.

I think Neo4j is like heroin: at first you think it's the most awesome thing you can imagine, but after a few months the euphoria wears off and you start to understand that maybe it was not the best choice of your life. On the Neo4j site you can see many big clients like eBay and LinkedIn, but honestly I don't know how and where they actually use this DB, so if some of their developers can share details in the comments, that would be great. For now I will only tell you about my personal experience with Neo4j.

Query language

Neo4j's query language is called Cypher. It's very simple, and after a few minutes of reading the docs you can already write some non-trivial queries. Like most DBs it has EXPLAIN and PROFILE commands that let you see what is happening under the hood of a query. But once you start writing more and more complex queries, you can no longer predict how a query will behave, and after every change you need to run PROFILE again. In SQL, for example, you know that adding a JOIN makes the query heavier, but in Neo4j you can reorder a few lines in a way that logically changes nothing and watch the runtime jump from 0.05 ms to 30 seconds. So writing hard queries is a kind of magic, and I think that's one of the reasons the Neo4j guys recommend splitting everything into small queries.
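
Since the planner is hard to predict, the only reliable way to understand a query is to profile it after every change. Here is a minimal sketch of doing that with the Java Bolt driver (1.x API); the URL, credentials and the query itself are placeholders:

```java
import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Session;
import org.neo4j.driver.v1.StatementResult;
import org.neo4j.driver.v1.summary.ProfiledPlan;
import org.neo4j.driver.v1.summary.ResultSummary;

public class ProfileExample {

    public static void main(String[] args) {
        Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
        try (Session session = driver.session()) {
            // PROFILE actually executes the query and attaches runtime statistics.
            StatementResult result = session.run(
                    "PROFILE MATCH (u:User)-[:FRIEND]->(f) RETURN f.name LIMIT 10");
            ResultSummary summary = result.consume();   // exhaust the result to get the summary
            if (summary.hasProfile()) {
                print(summary.profile(), "");
            }
        }
        driver.close();
    }

    // Walk the operator tree and print db hits per operator -- the number to watch
    // when a small reordering suddenly makes a query slow.
    private static void print(ProfiledPlan plan, String indent) {
        System.out.println(indent + plan.operatorType()
                + "  rows=" + plan.records() + "  dbHits=" + plan.dbHits());
        for (ProfiledPlan child : plan.children()) {
            print(child, indent + "  ");
        }
    }
}
```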

Query execution

**Query watcher.** So you write your first query and run it on the server, but unfortunately you make a little mistake and it returns not 5 nodes but 5M nodes. Most DBs have a watcher that looks for long-running queries, or queries that eat a lot of memory, and kills them to keep the DB from going down. According to the docs Neo4j has one too, but I never saw it actually work. In the best case you get an error like "undefined — undefined"; in the worst case your DB goes down, and maybe your server with it.

**Read all data first.** Remember: in Neo4j, no matter what you are doing, it will run a read first. In a relational DB, if you want to delete all records, the DB walks through the records and deletes them, and it doesn't matter how many you have. Neo4j will first try to load all the data from those records, and only then run the delete. In real life this means, for example, that you can't delete 1M records in one query, because you simply don't have enough RAM for it. To delete them all, you need a script that keeps running queries with LIMIT until every record is gone (a sketch of one is below). The problem is that you can't know how much data a node holds, so you can't know what limit will keep you from overflowing RAM.
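
Here is a minimal sketch of such a batch-delete script using the Java Bolt driver; the label `BigLabel`, the batch size, URL and credentials are placeholders you would tune for your own data:

```java
import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Session;
import org.neo4j.driver.v1.StatementResult;

public class BatchDelete {

    public static void main(String[] args) {
        Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
        try (Session session = driver.session()) {
            long deleted;
            do {
                // Delete at most 10k nodes (plus their relationships) per transaction,
                // so a single transaction never has to hold the whole dataset in RAM.
                StatementResult result = session.run(
                        "MATCH (n:BigLabel) WITH n LIMIT 10000 "
                      + "DETACH DELETE n RETURN count(*) AS deleted");
                deleted = result.single().get("deleted").asLong();
                System.out.println("deleted " + deleted + " nodes");
            } while (deleted > 0);
        }
        driver.close();
    }
}
```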

**Locking.** That is another fun part. Locking works differently here than in many relational DBs. In a relational DB, when you run an update query, the query executor knows it and takes a write lock on the field, record, etc., depending on the locking policy. In Neo4j there is only a write lock, and it is taken not when the query starts executing, but when part of the query actually tries to update something. So a query like MATCH (n:Test {id:1}) SET n.param=2 takes the write lock on the node only after the MATCH part has run. That means concurrent updates will give you problems. There is a big post on the Neo4j blog about how to handle them, but to me it reads like a collection of hot fixes. Here it is: https://neo4j.com/blog/advanced-neo4j-fiftythree-reading-writing-scaling/
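
One workaround that post describes is taking the write lock yourself by writing a dummy property before touching the values you are about to change. A rough sketch of the pattern with the Java driver (the `Test` label and the `_lock`/`counter` properties are made up for the example):

```java
import org.neo4j.driver.v1.AuthTokens;
import org.neo4j.driver.v1.Driver;
import org.neo4j.driver.v1.GraphDatabase;
import org.neo4j.driver.v1.Session;
import org.neo4j.driver.v1.Transaction;

public class LockedUpdate {

    public static void main(String[] args) {
        Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
        try (Session session = driver.session();
             Transaction tx = session.beginTransaction()) {
            // Writing a dummy property takes the write lock on the node right away,
            // so the read-then-write below cannot race with another transaction.
            tx.run("MATCH (n:Test {id: 1}) SET n._lock = true");
            // The increment happens while we still hold the lock.
            tx.run("MATCH (n:Test {id: 1}) SET n.counter = n.counter + 1 REMOVE n._lock");
            tx.success();   // mark the transaction to be committed when it is closed
        }
        driver.close();
    }
}
```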

Connections

Another problem appears when you go live: there is no connection proxy or balancer like you get with Pg, and there is also no way to limit the number of connections the DB will handle and drop the rest. This leads to using HAProxy and strange hand-written scripts to achieve that and make the DB stable enough for production.
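
For what it's worth, here is a rough sketch of that HAProxy layer, assuming a single instance speaking Bolt on port 7687; the addresses and connection limits are placeholders:

```
# haproxy.cfg fragment -- cap the number of client connections in front of Neo4j
frontend neo4j_bolt
    mode tcp
    bind *:7687
    maxconn 200                      # everything above this is queued/refused here
    default_backend neo4j_nodes

backend neo4j_nodes
    mode tcp
    server neo4j1 10.0.0.10:7687 maxconn 100 check
```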

High Availability

Neo4j has only one option here, and it's basically master-slave replication; it doesn't even have master-master. There is also no way to set a master priority per instance (which is a good thing to have when you run plugins; I will get to those later).

So, for example, you can't do DC-to-DC replication, you can't use the Blue-Green Deployment technique, and you can't run two clusters. All of this you have to build yourself with scripts, services, and kernel plugins for Neo4j. The Neo4j developers have also written in their blog that you should always check the sync between master and slaves, because sometimes it fails.

Extensions

There are three types of extensions for Neo4j: Unmanaged Extensions, Server Plugins, and Kernel Extensions. I have only worked with the third one. Kernel extensions are used when you need to add functionality to how Neo4j works internally. In my case I was working with TransactionEventHandler, which lets you hook into transaction events like beforeCommit, afterCommit, and afterRollback.

It seemed to me a good way of adding features that Neo4j doesn't have out of the box.

The first thing I understood is that there is almost no documentation for this, and I had to dig through a few existing plugins, StackOverflow, and other sites to put the pieces together and make my first attempt (maybe if I were a Java developer it would have gone much faster, but I mostly work with Python, and sometimes Android).

The second thing I found is that not all events work as they should. For example, in beforeCommit (which is supposed to run while the DB is still unchanged) you can't access the properties, labels, or relationships of deleted nodes, because they are already gone. Yeah... strange. And afterCommit (which is supposed to run after the transaction is committed and closed) is executed while the transaction is still open, which in some cases leads to a deadlock (without any info or exception) if you try to update your local DB from inside it.

The third thing is that every extension runs in a global environment, which leads to dependency collisions; if you have more than one plugin, you end up patching the plugins for your setup by hand.

I wrote a small Kernel Extension example that can help you start building your own plugin: https://github.com/creotiv/neo4j-kernel-plugin-example
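
To make the pitfalls above concrete, here is a minimal sketch of such a handler (the repo shows the full kernel-extension wiring around it; the class name and the logging are made up):

```java
import org.neo4j.graphdb.event.TransactionData;
import org.neo4j.graphdb.event.TransactionEventHandler;

// A minimal handler that only logs what it sees in each transaction phase.
public class ExampleHandler implements TransactionEventHandler<Void> {

    @Override
    public Void beforeCommit(TransactionData data) throws Exception {
        // Deleted nodes appear in data.deletedNodes(), but their properties and labels
        // are already gone at this point; removedNodeProperties()/removedLabels()
        // are the only way to see what was removed.
        data.removedNodeProperties().forEach(entry ->
                System.out.println("removed property: " + entry.key()));
        return null;
    }

    @Override
    public void afterCommit(TransactionData data, Void state) {
        // Careful here: the originating transaction may effectively still be open,
        // so writing back to the same graph from this callback can deadlock.
    }

    @Override
    public void afterRollback(TransactionData data, Void state) {
        // Nothing to clean up in this sketch.
    }
}
```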

My conclusion

So i don’t saying that Neo4j is not working, as i know it’s one of the best graph DB that you can use for free, but it still very raw. And you should understand that if you task not very trivial then you will get some overhead for making it work with Neo4j. Also it’s not very fit HL+HA requirements.

I would be glad if you guys could share your experience of working with Neo4j in the comments.

Read my new story

How to forge a man out of yourself. Story of my life. (medium.com)

