Cypher and Neo4j: Part I

A few months ago I started working with graph databases. This post is part of a series aimed at documenting how to work with a graph database, particularly for those coming from a relational database background.

At a practical level, when first working with a database you want to know how to get it installed and running (which was the subject of an earlier post) and then how to do the basic CRUD operations: creating data, retrieving it, making updates, and then deleting things. The purpose of this post is just to focus on using Neo4j and the query language particular to that database, Cypher.

One of the very nice features of Neo4j 2 is a friendly browser-based interface: you just point your web browser at the running database. This is not how you will work with Neo4j at scale, but it is invaluable when learning. In the browser you generally enter one statement at a time; if you enter two statements separated by a semicolon, Neo4j will get confused.

In Neo4j you work with a property graph data model, which is to say a set of nodes with connecting edges (Neo4j calls them relationships), both of which can have properties, or attributes if you prefer, attached to them. Nodes can also carry labels and every edge has a type, which together let you define kinds of nodes and edges. (This is in contrast to Titan, where edges can have labels, but not nodes.)

The first thing to note is that in Neo4j you do not need to set up a schema before you can start loading and using the database. There isn’t really a concept of a schema like you would have in a relational database, and Cypher has no real equivalent of the Data Definition Language (DDL) that we have in SQL: there are no equivalents of ‘create table’, ‘drop table’, or ‘alter table’. (Neo4j 2 does let you declare indexes and uniqueness constraints, but these are optional.) If you are coming primarily from a relational database background, this certainly feels a bit odd.

Creating a Node

The example below shows how we might create a single node with a couple of attributes. Note that the indentation here is purely to enhance readability — Neo4j will process this fine if it is all on the same line.

CREATE ( :Person { name: "Alice",
                   email: "alice@wherever.com"
                 }
       )

Cypher is deliberately designed to feel like SQL. Instead of SQL’s ‘INSERT’, though, we add new nodes to the database with a ‘CREATE’ command. Now we’ll deconstruct the rest of this statement. The ‘:Person’ part is a label that identifies the particular kind of node this is; the leading colon is what marks it as a label, and without it Neo4j would treat ‘Person’ as a variable name instead. Unless you have a graph in which all the nodes are the same type, you will likely want to add a descriptive label here. The label needs to start with a letter but can be composed of letters and numbers. All the properties for this node are in curly braces as a comma-separated list of key-value pairs, with a colon separating keys from values (instead of an equals sign).

Note that nothing in this statement identifies a primary key, nor do we have a concept of referential integrity between kinds of nodes. The MERGE command, which we’ll get to in a later post, helps avoid duplicates by creating a node only when no matching one exists; strictly enforcing uniqueness takes a uniqueness constraint. Internally Neo4j gives each node a unique id and you can make use of that, but you shouldn’t rely on it much, since ids can be reused after a node is deleted.
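As a small preview of MERGE, here is a minimal sketch of the "find it, or create it if it is missing" behaviour. It assumes we want to treat the email address as the identifying property, which is my choice for illustration and not something Neo4j decides for you.

```cypher
// Find the Person with this email, or create it if no such node exists.
MERGE (p:Person {email: "alice@wherever.com"})
// ON CREATE SET runs only when MERGE had to create the node.
ON CREATE SET p.name = "Alice"
RETURN p
```

Running this twice should leave you with one node rather than two. On its own, though, MERGE is not a hard guarantee when several clients write at the same time.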

Just for comparison, if this was a relational database the equivalent statement in SQL would be something like:

INSERT INTO Person(name, email) VALUES('Alice', 'alice@wherever.com');

Searching for a Node

Now that we have a node in here, how would we search for it? In general we can search using Cypher’s MATCH statement, which you can think of as the equivalent of SQL’s SELECT. Like SELECT, we use MATCH all the time when working with Neo4j.

MATCH ( k{name:"Alice"}) RETURN k

In this statement there is a ‘k’ before the attribute list. That is a variable that we can use elsewhere in the statement, and we use it at the end in the RETURN clause to actually return the value. A MATCH cannot stand on its own: it must be followed by another clause such as RETURN, SET, or DELETE, otherwise the statement is considered incomplete.
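The same search can also be written with a label and a WHERE clause, which will look more familiar if you are coming from SQL. This is a sketch that assumes the node was created with the Person label; a node without that label would not be found.

```cypher
// Restrict the search to nodes labelled Person, then filter with WHERE.
MATCH (k:Person)
WHERE k.name = "Alice"
RETURN k
```

Adding the label narrows the search to one kind of node, which matters more and more as the graph grows.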

If you run this command in the browser, Neo4j 2.x will give you a D3-based visualization of your result set. You can click on the node(s) to show all the attributes. This kind of feedback is very helpful when learning and developing your graph statements in Neo4j.

Our statement here returned the entire node. If you want just a particular attribute you can return that.

MATCH ( k{name:"Alice"}) RETURN k.email
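You can also return several attributes at once and name the resulting columns with AS, much like column aliases in SQL. A minimal sketch (the alias names are just my choice):

```cypher
// Return two properties as named columns rather than the whole node.
MATCH (k {name: "Alice"})
RETURN k.name AS name, k.email AS email
```

When you return properties instead of nodes, the browser shows the result as a table rather than a graph.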

Updating and Deleting a Node

What if we have a node in which we just want to update a value, or add another key-value pair? Again we use MATCH, but this time we end with a SET clause.

MATCH ( k{name:"Alice"}) SET k.email='alice@wherever.com'

In this case we updated the email property. If we wanted to add a new property we would set it in exactly the same way; there is no syntactic difference between updating an existing property and adding a new one. In SQL we would first have to alter the table to add the new column, and only then could we set a value.
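To make that concrete, here is a sketch that adds a property the node does not have yet while updating an existing one in the same SET. The ‘city’ property and its value are made up for this example.

```cypher
// 'city' is a hypothetical property that does not exist on the node yet;
// SET creates it. The second assignment updates the existing email.
MATCH (k {name: "Alice"})
SET k.city = "Springfield", k.email = "alice@wherever.com"
RETURN k
```

The counterpart for dropping a property again is the REMOVE clause, as in ‘REMOVE k.city’.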

Deletion is very similar to updating: we end the statement by deleting the node instead of returning it.

MATCH ( k{name:"Alice"}) DELETE k

A point on deleting nodes: Neo4j will not let you delete a node if it still has edges connected to it. You first have to delete the edges and then the node, but as we’ll see you can do that in one statement.
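One way to do that in a single statement is to pick up any attached edges with OPTIONAL MATCH and delete them together with the node. This is a sketch of one approach, not the only one.

```cypher
// Match the node and, if present, any edges attached to it in either direction.
MATCH (k {name: "Alice"})
OPTIONAL MATCH (k)-[r]-()
// Delete the edges first, then the node itself.
DELETE r, k
```

OPTIONAL MATCH is used so that the statement still deletes the node when it has no edges at all. From Neo4j 2.3 onward there is also a shorthand, ‘DETACH DELETE k’, that does the same thing.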

Creating Relationships

So, this is a graph database. A graph database with only nodes is kinda dull and uninteresting really. So how do we create connections?

First let’s add a few more nodes to our system for demonstration purposes. We’ll also re-create ‘Alice’ since we deleted that node above. And, while we’re at it, we’ll also do this in one statement to show how to add multiple nodes at a time.

CREATE ( p0:Person { name: "Alice",
                     email: "alice@wherever.com"
                   } ),
       ( p1:Person { name: "Ezekial",
                     email: "zeke@nowhere.com"
                   } ),
       ( p2:Person { name: "Daniel",
                     email: "dan@nowhere.com"
                   } ),
       ( p3:Person { name: "Bob",
                     email: "bob@nowhere.com"
                   } )

A couple of things to note here: we added a variable before each person’s label (p0, p1, p2, p3), followed by a colon. The variables give us a handle on each node within the statement, and as we create more involved queries the utility of that will become more evident. They are not strictly required in a CREATE, but the colon is: if you leave out both the variable and the colon, Neo4j reads ‘Person’ as a variable name and will complain that ‘Person’ was already declared.
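For what it’s worth, the variables can be dropped entirely as long as each pattern keeps the colon before the label. A minimal sketch, using two made-up people so it does not duplicate the nodes created above:

```cypher
// No variables needed: each pattern starts with a colon and the label.
// "Carol" and "Erin" are hypothetical extra people, not used later in the post.
CREATE (:Person {name: "Carol", email: "carol@nowhere.com"}),
       (:Person {name: "Erin",  email: "erin@nowhere.com"})
```

Variables only become necessary when you want to refer back to a node later in the same statement, for example to connect it to another node.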

Now let’s find ‘Alice’ and create a relationship to Bob.

MATCH ( p1 {name:"Alice"}), ( p2 {name:"Bob"}) CREATE (p1)-[r:IS_FRIENDS_WITH]->(p2)

So… this takes a little deconstruction. We started with our MATCH statement but instead of just retrieving one node we retrieved two. This is where those variables, p1 and p2, come into play. You can think of them as being kinda/sorta like aliases in SQL.

Once we find the two nodes we can create the link between them. Edges are always directed in Neo4j, and an edge is written as the ‘start’ node followed by the relationship type and then the second node. The usual way of describing this is to think of an arrow connecting the two, as you might write it in ASCII art: ‘(first node)-[r:RELATIONSHIP_LABEL]->(second node)’. That ‘r’ is an arbitrary variable and can even be left out, but the colon and the relationship type after it are required; without them Neo4j will give you the error ‘A single relationship type must be specified for CREATE’.
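Here is a sketch of a small variation on that statement. The relationship variable is left out, which Cypher accepts as long as the type is given, and the edge carries a property of its own. The ‘WORKS_WITH’ type and the ‘since’ property are made up for this example.

```cypher
// The relationship variable is omitted; the type after the colon is required.
// 'WORKS_WITH' and 'since' are hypothetical, shown to illustrate edge properties.
MATCH (a {name: "Alice"}), (d {name: "Daniel"})
CREATE (a)-[:WORKS_WITH {since: 2014}]->(d)
```

Edge properties use the same curly-brace syntax as node properties.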

Searching on Relationships

At this point we have something that is becoming a more meaningful graph, albeit a small one. We have a few nodes and a relationship between a couple of them.

MATCH (p1 {name:"Alice"})-[r:IS_FRIENDS_WITH]->(p2) RETURN p2

Again, we use MATCH like we would use SELECT in a relational database. In this case we specify the relationship with that ‘arrow’-like syntax. You’ll notice that we specified p1 to be ‘Alice’ by specifying the attribute, but we didn’t do so for p2 — p2 is what we want to find in this query. When you run this you should see just one node returned, ‘Bob’.
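Because the edge was created pointing from Alice to Bob, a directed query that starts at Bob and follows ‘->’ would find nothing. If you don’t care about direction when searching, you can leave the arrowhead off. A minimal sketch:

```cypher
// No arrowhead: match IS_FRIENDS_WITH edges in either direction.
MATCH (p {name: "Bob"})-[:IS_FRIENDS_WITH]-(friend)
RETURN friend.name
```

Assuming the only IS_FRIENDS_WITH edge is the one we created above, this should come back with Alice’s name.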

Part 1 Summary

At this point we have covered the very basic operations involved in creating, updating, and deleting nodes, and we started in on how to create and query edges. In the next post on this topic we’ll continue the discussion of setting up edges and more involved queries.