Recently I’ve been spending some time investigating graph databases, and Neo4j in particular. Graph databases first crept onto my radar a few months ago in connection with an ongoing side project I’m involved with, but at the time I didn’t have the cycles to look into them. More recently another potential application for a graph database came up in my regular day job. Paying attention to the cues I’m getting from life, it seemed like the time was right to get better acquainted with graph databases. Now that I’ve spent a bit of time on this, I thought it would be worth writing about.
So far I’ve mostly focused on Neo4j, which has been around for a while now. I have started to look at Titan a bit, and Giraph is also on my radar. I know there are other graph databases out there as well, but for the moment looking at these three is plenty to keep one busy. For what it is worth, I have been able to get up and running more quickly with Neo4j.
Setting up the environment
Neo4j, like Titan, is a Java-based server application, so you’ll want to have Java set up on your server. There are some pretty good installation instructions on the Neo4j site too, so mainly I’m documenting things for (and from) my own experience. In my case I was setting up Neo4j 2.1.2 to run on a Fedora Linux cloud server.
1) Install Java
You’ll need Java set up before you can do anything interesting. I’ve run Neo4j okay with OpenJDK, but it is probably better to run it with Oracle’s Java, so you may want to download and set that up instead. (In my case I wanted the full JDK, but if you just want to run things the JRE should be fine.)
To go with OpenJDK:
% yum install java-1.7.0-openjdk.x86_64
Or, to go with Oracle’s Java: I kinda like to get the .tar.gz and unpack it under /opt myself, though you can also get the RPM and install that. In the default Amazon Linux AMI, /usr/bin/java points to /etc/alternatives/java, which itself points to the OpenJDK JRE. I just change that link to point to Oracle’s Java instead.
% mv jdk-8u20-linux-x64.tar.gz /opt
% cd /opt
% gzip -d jdk-8u20-linux-x64.tar.gz
% tar -xf jdk-8u20-linux-x64.tar
% ln -s jdk1.8.0_20 jdk
% cd /etc/alternatives/
% mv java openjre-java
% ln -s /opt/jdk/bin/java java
% which java
/usr/bin/java
% java -version
java version "1.8.0_20"
Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)
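After swapping the /etc/alternatives link it is worth confirming which Java actually runs. Here is a small Python sketch (the function names are my own, not from any library) that runs `java -version` and extracts the major version, so a setup script could stop early on an unexpected JDK. It assumes the usual `version "..."` banner that both Oracle Java and OpenJDK print.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Return the major Java version found in `java -version` output.

    Handles the legacy "1.x" scheme (e.g. "1.8.0_20" -> 8) as well as
    the newer scheme (e.g. "11.0.2" -> 11).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy scheme: "1.8" means Java 8
    return first


def installed_java_major_version() -> int:
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return java_major_version(result.stderr)


if __name__ == "__main__":
    print("Java major version:", installed_java_major_version())
```

Fed the output shown above, `java_major_version` returns 8.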
If you work in Python, you may also want to install py2neo, a Python client library for Neo4j, while you are at it:
% yum install python-py2neo.noarch
2) Create a neo4j user on the system.
This is recommended practice, I think, but it isn’t strictly required; you can run the server under pretty much whatever user you like.
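The account itself is normally created as root with `useradd neo4j`. If you script the setup, a quick existence check avoids trying to create it twice, and it matters which account you pick because the limits.conf entries in step 4 are keyed on the user name. A minimal Python sketch using the standard `pwd` module (Unix only; `user_exists` is my own helper name):

```python
import pwd


def user_exists(name: str) -> bool:
    """Return True if a local account called `name` exists (Unix only)."""
    try:
        pwd.getpwnam(name)
    except KeyError:
        return False
    return True


if __name__ == "__main__":
    # "neo4j" is just the account name suggested above; use whatever you chose.
    if not user_exists("neo4j"):
        print("No neo4j user yet; create it (as root) with: useradd neo4j")
```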
3) Install Neo4j
Go and download the Community Edition of Neo4j appropriate for your environment. (There is also an Enterprise Edition, which requires a license, although if you fill in a form on the Neo4j site you can play around with it for free if you are just kicking the tires, doing a student project, etc.) The Community Edition is open source, released under the GPLv3. Unpack it somewhere convenient, perhaps /home/neo4j or /opt/neo4j, as suits your preference.
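Unpacking is a single tar command, but if you are scripting the install, Python’s standard `tarfile` module can do it too. A sketch, assuming the download is a .tar.gz with a single top-level directory (`unpack_neo4j` is a hypothetical helper name, not part of Neo4j):

```python
import tarfile
from pathlib import Path


def unpack_neo4j(tarball: str, dest: str = "/opt") -> Path:
    """Unpack a Neo4j .tar.gz under `dest` and return its top-level directory."""
    with tarfile.open(tarball, "r:gz") as tar:
        # Collect the first path component of every entry ("./" entries are skipped).
        top_level = set()
        for member in tar.getmembers():
            parts = Path(member.name).parts
            if parts:
                top_level.add(parts[0])
        if len(top_level) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_level)}")
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths, "..", device files, etc.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return Path(dest) / top_level.pop()
```

Usage would be something like `unpack_neo4j("neo4j-community-2.1.2-unix.tar.gz", "/opt")`, where the file name is only illustrative; use whatever you actually downloaded. You could then symlink the returned directory to /opt/neo4j, the same trick used for the JDK above.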
4) System adjustments
Increase the number of open files allowed.
If you start up the Neo4j server at this point, you will probably get a warning that looks like this:
[ec2-user@ip-172-31-22-223 neo4j]$ ./bin/neo4j start
WARNING: Max 1024 open files allowed, minimum of 40 000 recommended. See the Neo4j manual.
Using additional JVM arguments: -server -XX:+DisableExplicitGC -Dorg.neo4j.server.properties=conf/neo4j-server.properties -Djava.util.logging.config.file=conf/logging.properties -Dlog4j.configuration=file:conf/log4j.properties -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
Starting Neo4j Server...WARNING: not changing user
process ... waiting for server to be ready....
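To fix this, edit /etc/security/limits.conf and add the lines below. Note that they raise the limit only for the neo4j user, so they help only when the server runs as that user, and they apply to new login sessions for that user: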
neo4j soft nofile 40000
neo4j hard nofile 40000
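If you provision more than one server, it is handy to script this step. Below is a Python sketch (the helper names are my own) that appends the two entries only if they are missing, plus a check of the limits the current process actually has, using the standard `resource` module. Run the check from a fresh login as the neo4j user; once the new limits apply, it should report 40000 for both values.

```python
import resource
from typing import Tuple


def ensure_nofile_limits(path: str, user: str = "neo4j", limit: int = 40000) -> bool:
    """Append soft/hard 'nofile' entries for `user` to a limits.conf-style file.

    Returns True if the file was changed, False if both entries were already
    there. It only looks for these exact entries; it does not notice (or
    remove) a different nofile value already set for the user. Editing
    /etc/security/limits.conf itself requires root.
    """
    wanted = [f"{user} soft nofile {limit}", f"{user} hard nofile {limit}"]
    try:
        with open(path) as f:
            content = f.read()
    except FileNotFoundError:
        content = ""
    # Compare with whitespace collapsed, so "neo4j   soft  nofile 40000" counts as present.
    existing = {" ".join(line.split()) for line in content.splitlines()}
    missing = [entry for entry in wanted if entry not in existing]
    if missing:
        # Start on a fresh line if the file lacks a trailing newline.
        prefix = "\n" if content and not content.endswith("\n") else ""
        with open(path, "a") as f:
            f.write(prefix + "\n".join(missing) + "\n")
    return bool(missing)


def current_nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # As root: ensure_nofile_limits("/etc/security/limits.conf")
    print("soft/hard open-file limits:", current_nofile_limits())
```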
5) Configure Neo4j
The default configuration should be fine to start with, particularly if you are just running on your laptop. But if you are running on a server somewhere else, you will need to adjust the configuration so that Neo4j is accessible on an address other than localhost (127.0.0.1).
In the ~neo4j/conf directory there are several configuration files. Edit the neo4j-server.properties file and un-comment this line:
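If you would rather script this edit, here is a Python sketch that activates a commented-out `key=value` line in a .properties file. The key name is an assumption on my part: in the 2.x neo4j-server.properties I believe the relevant commented-out line is `#org.neo4j.server.webserver.address=0.0.0.0`, but check your own file before relying on it. `uncomment_property` is a hypothetical helper, not part of Neo4j.

```python
import re


def uncomment_property(text: str, key: str) -> str:
    """Activate a commented-out `key=value` line in Java .properties text.

    Only lines of the form '#key=...' (optionally with spaces after '#')
    are changed; descriptive comment lines and already-active settings
    are left alone, so running it twice is harmless.
    """
    pattern = re.compile(r"^#[ \t]*(" + re.escape(key) + r"[ \t]*=.*)$", re.MULTILINE)
    return pattern.sub(r"\1", text)


if __name__ == "__main__":
    conf = "conf/neo4j-server.properties"  # relative to the Neo4j install directory
    key = "org.neo4j.server.webserver.address"  # ASSUMPTION: check this against your file
    with open(conf) as f:
        original = f.read()
    with open(conf, "w") as f:
        f.write(uncomment_property(original, key))
```

Listening on 0.0.0.0 exposes the server to anyone who can reach port 7474, so it is worth locking that port down with a firewall or security group.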
6) Start the neo4j server.
In the neo4j bin directory you should see a file named simply neo4j. That’s a shell script that lets you start, stop, and get the status of the server.
[neo4j@yourserver neo4j]$ ./bin/neo4j start
Using additional JVM arguments: -server -XX:+DisableExplicitGC -Dorg.neo4j.server.properties=conf/neo4j-server.properties -Djava.util.logging.config.file=conf/logging.properties -Dlog4j.configuration=file:conf/log4j.properties -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
Starting Neo4j Server...WARNING: not changing user
process ... waiting for server to be ready...... OK.
http://localhost:7474/ is ready.
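If you drive the script from Python (say, from a provisioning tool), you can run it with `subprocess` and pull the "is ready" URL out of its output. A sketch; the helper names are my own, and it assumes the script prints a line ending in "is ready." as in the session above:

```python
import re
import subprocess
from typing import Optional


def ready_url(start_output: str) -> Optional[str]:
    """Pull the 'http://host:port/ is ready.' URL out of `neo4j start` output."""
    match = re.search(r"(https?://\S+) is ready\.", start_output)
    return match.group(1) if match else None


def neo4j_command(neo4j_home: str, action: str) -> subprocess.CompletedProcess:
    """Run bin/neo4j with start, stop, or status and capture its output."""
    if action not in ("start", "stop", "status"):
        raise ValueError(f"unsupported action: {action}")
    return subprocess.run(
        [f"{neo4j_home}/bin/neo4j", action], capture_output=True, text=True
    )


if __name__ == "__main__":
    result = neo4j_command("/opt/neo4j", "start")  # install path is an assumption
    print(ready_url(result.stdout + result.stderr) or "server did not report ready")
```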
At this point you can point your web browser at the appropriate URL for your server (port 7474 by default, as the startup message shows) and you are good to go with Neo4j. The server has a very user-friendly web interface that lets you start running queries and adding graph data, and there is a lot of helpful tutorial information there right off the bat.
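If you start the server from a script or over SSH rather than watching the console, it helps to wait until the web interface actually answers before doing anything else. A standard-library Python sketch; the function name and the polling defaults are my own choices:

```python
import time
import urllib.error
import urllib.request

# Talk to the server directly, ignoring any http_proxy settings in the environment.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until the server answers over HTTP, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=interval):
                return True  # got a 2xx response (redirects are followed)
        except urllib.error.HTTPError:
            return True  # the server answered, just with an error status
        except OSError:
            pass  # not up yet: connection refused, timed out, reset, ...
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))


if __name__ == "__main__":
    # localhost:7474 is the address the start script reported; adjust for your server.
    print("ready" if wait_for_http("http://localhost:7474/") else "gave up waiting")
```

Treating an HTTP error status as "up" is deliberate: the goal is only to know that something is listening and speaking HTTP.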
In the next post I’ll cover more about the graph data model and working with Cypher — Neo4j’s query language.