Getting started with JanusGraph

(Image source: siliconangle.com)

Companies are increasingly adopting graph databases in their respective domains. Graph databases such as Amazon Neptune, JanusGraph, Neo4j, IBM Graph, and AnzoGraph are well suited to applications built on highly connected data sets: recommendations based on a social graph, fraud detection, and knowledge-graph-based product recommendations. These are exactly the workloads where a relational database becomes inefficient, because following many relationships means chaining SQL joins over huge tables.

Graph concept

(Two images from docs.microsoft.com illustrating vertices, edges, and properties)

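I believe a quick look at the two images above is enough for you to identify what vertices, edges, and properties are: vertices are the entities, edges are the relationships between them, and properties are key-value pairs attached to either. So I won't go into more detail here.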

(Image: one of Benevolent AI’s knowledge graphs)

Since JanusGraph is an open-source, scalable 🙂 transactional database that supports the property graph model, we can use it to build social graphs or knowledge graphs that represent data in context, in a form that both machines and humans can readily understand. That gives us reason enough to get started with JanusGraph.

Let’s talk about JanusGraph :)

JanusGraph is an open-source, distributed graph database with pluggable storage and indexing backends.

Architecture for JanusGraph

(Image source: programmersought.com)

JanusGraph’s modular architecture supports third-party adapters.

For storage, we can plug in Apache Cassandra, Apache HBase, Google Cloud Bigtable, and others, depending on our needs. For example, Apache Cassandra is generally chosen for real-time workloads that need scalability and high availability without compromising performance, while HBase is more common for analytics workloads. In this tutorial, we will set up Cassandra.

As the index backend, we will use Lucene. In an upcoming tutorial, we will plug in Elasticsearch, a distributed search engine well suited to indexing on multiple properties, including full-text, geo, and string search.

The next question is how we are going to interact with the graph. The easiest way is the Gremlin Console. We could connect from Python too, but that is not our goal now. Connecting JanusGraph to Python usually comes later, once the basic setup works and we want to develop a Python application that executes queries against JanusGraph.

Think of the Gremlin Console as a tool that works with any TinkerPop-enabled server, and JanusGraph is a TinkerPop-enabled database engine. The Gremlin Console is an interactive shell that gives you access to the data managed by the JanusGraph server, which is a Gremlin Server configured to serve JanusGraph.

JanusGraph setup

Cassandra configuration

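From the /apache-cassandra-3.11.0/bin directory, start Cassandra in the foreground (the -f flag keeps it attached to the terminal, so leave this window open):

./cassandra -f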

Janusgraph configuration

To configure Cassandra as the storage backend and Lucene as the index backend, we need to change the gremlin-server.yaml file, which sits in the /janusgraph-0.5.3/conf/gremlin-server directory.

Inside the /janusgraph-0.5.3/conf/ directory, you will see files with the .properties extension, which configure the storage backend and the index backend. For example, janusgraph-cassandra-es.properties combines the cassandrathrift storage backend protocol with Elasticsearch (for indexing) for use with Cassandra. Thrift is an outdated protocol now, so we will use CQL, Cassandra's newer communication protocol (at the time of writing).
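To see which backends the shipped files actually configure, a small shell sketch can print the storage.backend and index.search.backend values of every .properties file in the conf directory. list_backends is a hypothetical helper, not part of JanusGraph, and it assumes the index is named search, as in the files discussed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of JanusGraph): print the storage backend and
# the "search" index backend configured in each .properties file of a directory.
list_backends() {
  local dir="${1:-conf}" f storage index
  for f in "$dir"/*.properties; do
    [ -e "$f" ] || continue   # the glob matched nothing
    # Strip "key =" (spaces optional) and keep only the value; if a key is
    # repeated, tail keeps the last occurrence.
    storage=$(sed -n 's/^[[:space:]]*storage\.backend[[:space:]]*=[[:space:]]*//p' "$f" | tail -n 1)
    index=$(sed -n 's/^[[:space:]]*index\.search\.backend[[:space:]]*=[[:space:]]*//p' "$f" | tail -n 1)
    printf '%s\tstorage=%s\tindex=%s\n' "$(basename "$f")" "${storage:-none}" "${index:-none}"
  done
}
```

Run it from /janusgraph-0.5.3 with list_backends conf; a file without an index entry is reported as index=none.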

We will work with a janusgraph-cql-lucene.properties configuration file. Unfortunately, this file is not shipped there, so let’s create it in the /janusgraph-0.5.3/conf/ directory with the following content.

janusgraph-cql-lucene.properties

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
# The hostname or comma-separated list of hostnames of storage
# backend servers. This is only applicable to some storage backends,
# such as cassandra and hbase.
storage.hostname=127.0.0.1
# This is the keyspace name where janusgraph will store the tables
# and if this keyspace does not exist janusgraph will create it
storage.cql.keyspace=janusgraph
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
index.search.backend=lucene
index.search.directory=../db/searchindex
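Every key in this file must sit on its own line, and a copy-paste that joins the lines breaks it without an obvious error. A quick check before starting the server can save some debugging. The sketch below uses a hypothetical helper, check_janusgraph_props; it only verifies that the keys used in this tutorial are defined, not that their values are valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with JanusGraph): check that a .properties
# file defines each key this tutorial relies on, one "key=value" per line.
check_janusgraph_props() {
  local file="$1" status=0 key
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  for key in gremlin.graph storage.backend storage.hostname \
             storage.cql.keyspace index.search.backend index.search.directory; do
    # Take the text before the first '=', trim surrounding blanks and compare
    # it with the key; '#' comment lines never match a key name.
    if ! awk -v k="$key" '
        index($0, "=") {
          name = substr($0, 1, index($0, "=") - 1)
          gsub(/^[ \t]+|[ \t]+$/, "", name)
          if (name == k) found = 1
        }
        END { exit !found }' "$file"; then
      echo "missing key: $key" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, from /janusgraph-0.5.3, check_janusgraph_props conf/janusgraph-cql-lucene.properties prints nothing and returns 0 when all keys are present, and names each missing key otherwise.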

The JanusGraph server does not read janusgraph-cql-lucene.properties directly; instead, its gremlin-server.yaml configuration file points to the properties file to load. So we have to edit gremlin-server.yaml to point it at the file we just added.

Edited section of gremlin-server.yaml

graphs: {
#graph: conf/janusgraph-inmemory.properties
graph: conf/janusgraph-cql-lucene.properties
}
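If you prefer to script this edit, the sketch below assumes the stock gremlin-server.yaml references conf/janusgraph-inmemory.properties (as the commented-out line above suggests) and swaps it for the new file with sed, keeping a .bak copy of the original. The function name point_server_at_cql_lucene is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: repoint gremlin-server.yaml from the in-memory graph to
# conf/janusgraph-cql-lucene.properties. Assumes the file still contains
# "conf/janusgraph-inmemory.properties"; the original is kept as <file>.bak.
point_server_at_cql_lucene() {
  local yaml="${1:-conf/gremlin-server/gremlin-server.yaml}"
  if ! grep -q 'conf/janusgraph-inmemory\.properties' "$yaml" 2>/dev/null; then
    echo "no in-memory graph entry found in $yaml; edit it by hand" >&2
    return 1
  fi
  # -i.bak (suffix attached, no space) is accepted by both GNU and BSD sed.
  sed -i.bak 's#conf/janusgraph-inmemory\.properties#conf/janusgraph-cql-lucene.properties#' "$yaml"
  # Print the graph entries with line numbers so the change can be checked.
  grep -n 'graph:' "$yaml"
}
```

Run it from /janusgraph-0.5.3; calling it a second time fails on purpose, because the in-memory entry is already gone.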

Now open another terminal window (Cassandra keeps running in the first one), change to the /janusgraph-0.5.3 directory, and execute the following command.

./bin/gremlin-server.sh

JanusGraph is now up and listening on port 8182, so we have both the Cassandra and JanusGraph servers running on the same machine.
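Gremlin Server usually needs a few seconds before port 8182 accepts connections, and JanusGraph needs Cassandra to be reachable before it can open the graph. A minimal sketch, assuming bash (it relies on bash's /dev/tcp redirection), polls a TCP port until something accepts connections; wait_for_port is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until host:port accepts TCP connections, giving up
# after a timeout in seconds. Uses bash's /dev/tcp, so run it with bash, not sh.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" waited=0
  # Each attempt opens a connection in a subshell; it is closed when the
  # subshell exits. Connection errors are silenced.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if (( waited >= timeout )); then
      echo "timed out after ${timeout}s waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  echo "${host}:${port} is accepting connections"
}
```

For example, wait_for_port 127.0.0.1 9042 120 before starting Gremlin Server (9042 is Cassandra's default CQL port), and wait_for_port 127.0.0.1 8182 before opening the Gremlin Console. The one-second polling interval is an arbitrary choice.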

Connect to the JanusGraph Server

./bin/gremlin.sh

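This opens the Gremlin Console. Now connect it to the Gremlin Server by executing the following commands in the same console.

:remote connect tinkerpop.server conf/remote.yaml session
:remote console

The first command connects to the server described in conf/remote.yaml and opens a session, so variables persist between requests; the second sends every following line to that server, so you do not need to prefix each command with :>.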

To check that everything is working so far, type “graph” in the Gremlin Console and press Enter. It should print the following:

==>standardjanusgraph[cql:[127.0.0.1]]

This means the graph uses CQL as its storage backend protocol and talks to the Cassandra server running on 127.0.0.1.

Explore JanusGraph a bit

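g.addV('User').property('name','subhendu')
g.tx().commit()

The first line adds a vertex labeled User with a name property. Because we connected with the session option, Gremlin Server does not commit the transaction automatically, so the second line commits it; only then is the vertex stored in Cassandra. To see how Cassandra stores the data, open another terminal window, go to the /apache-cassandra-3.11.0/bin directory, and execute the following command.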

./cqlsh

This opens the Cassandra client (cqlsh).

PS: If it shows an error, make sure Python is installed, because cqlsh is a Python tool.

Now run the following command in the cqlsh console to list the available keyspaces.

DESCRIBE KEYSPACES;

If everything has gone fine so far, you will find a keyspace named janusgraph (the name set by storage.cql.keyspace). Let’s switch to it and list its tables.

USE janusgraph;
DESCRIBE TABLES;

This lists all the tables created in the janusgraph keyspace. One of the most important is edgestore, where JanusGraph stores vertices, edges, and their properties. Running the query below gives you an idea of how Cassandra stores this data; the keys and values are binary blobs written by JanusGraph, so decoding them belongs more to the Cassandra and JanusGraph internals than to this tutorial.

SELECT * FROM edgestore;

PS: We configured Lucene as the index backend, but we have not created any indexes yet. Once data is loaded into the graph, we will do the indexing.

Wrapping Up

Oh yeah! You made it this far. Kudos to you 😎!
If you found this article helpful, please hit the clap button, and feel free to reach out if you need help with this topic.

Machine learning engineer | Analytics Lead | Data science
