Lesson 8: Graph Databases

Summary:  Graph databases are your go-to choice when the relationships among data items are key.

Until the late 1990s, web search engines largely evaluated each web page as a standalone entity, ranking pages on their content without regard to any other pages.  Google, founded in 1998, took a different route with PageRank, a graph-centered approach developed by co-founders Larry Page and Sergey Brin.  PageRank evaluates web pages in relationship to other pages.  Users quickly recognized that ranking pages based on their relationships to others produced much better results, and this may be the single factor that moved Google rapidly ahead of its competitors.
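To make the idea of ranking pages by their relationships concrete, here is a minimal sketch of the textbook form of PageRank, computed by repeated passes over a tiny link graph.  The page names, the damping value of 0.85 and the iteration count are assumptions chosen for illustration; this is a simplified teaching version, not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: `links` maps each page to the pages it links to.

    Assumes every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on, split evenly among its outlinks.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page site: three pages link to "home", none link to "orphan".
LINKS = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

The page that the most other pages point to ends up ranked highest, and the page nobody links to ends up lowest, even though the code never looks at page content.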

We are so used to thinking of databases as tables, or at least buckets of information, that it can be a little challenging to wrap your head around the concepts of graph databases.  That said, Graph DBs can do things that none of the other NoSQL types or RDBMSs do well.  Making the effort to understand and use this type can offer big returns.

Characteristics

There are no classical indexes for Graph DBs.  Rather, each object stored is mapped with “nodes” and “edges”.  A node is a single record that has at least one and potentially many named properties.  Edges define the relationships among nodes, and both nodes and edges can carry properties of their own.  Nodes can have multiple edges defining many different kinds of relationships with other nodes.  Both nodes and relationships (edges) can be addressed with key values.
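The node and edge model described above can be sketched in a few lines of Python.  This is only an in-memory illustration under stated assumptions: the class names (Node, Edge, PropertyGraph), the relationship labels and the sample data are all hypothetical, and no vendor's storage format or API is implied.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    key: str                                         # key value used to address the node
    properties: dict = field(default_factory=dict)   # named properties


@dataclass
class Edge:
    key: str                                         # edges are addressable by key too
    source: str                                      # key of the node the edge starts at
    target: str                                      # key of the node the edge points to
    kind: str                                        # the type of relationship
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: nodes and edges stored by key, plus per-node edge lists."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}
        self.outgoing = {}  # node key -> keys of the edges leaving that node

    def add_node(self, key, **properties):
        self.nodes[key] = Node(key, properties)
        self.outgoing.setdefault(key, [])

    def add_edge(self, key, source, target, kind, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[key] = Edge(key, source, target, kind, properties)
        self.outgoing[source].append(key)

    def neighbors(self, key, kind=None):
        """Nodes reachable over one outgoing edge, optionally of a single kind."""
        found = []
        for edge_key in self.outgoing[key]:
            edge = self.edges[edge_key]
            if kind is None or edge.kind == kind:
                found.append(edge.target)
        return found


graph = PropertyGraph()
graph.add_node("alice", name="Alice")
graph.add_node("bob", name="Bob")
graph.add_node("dune", title="Dune")
graph.add_edge("e1", "alice", "bob", "FRIEND", since=2012)
graph.add_edge("e2", "alice", "dune", "READ", rating=5)

print(graph.neighbors("alice"))               # every node Alice has an edge to
print(graph.neighbors("alice", kind="READ"))  # only the books she has read
```

Note that one node carries two different kinds of relationship at once, and that each edge holds its own properties (`since`, `rating`) separately from the nodes it connects.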

Search or query with Graph DBs is called “traversal”.  A traversal starts at a specific node and follows the requested types of relationships outward to other nodes.  A common example would be ‘what books are my friends reading that I haven’t yet read’.  Because of queries like this, Graph DBs are often associated with the ‘recommender’ engines widely used in social and ecommerce applications.
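The book example can be written as a two-hop traversal: start at one person, follow the friend relationships, then follow each friend's reading relationships.  The sketch below assumes a made-up list of edges and hypothetical relationship names (FRIEND, READ).  It scans a plain Python list for simplicity, whereas a real Graph DB would follow stored links directly from the starting node.

```python
# Edges as (source, relationship, target) triples; all names are invented.
EDGES = [
    ("me", "FRIEND", "ann"),
    ("me", "FRIEND", "raj"),
    ("me", "READ", "Dune"),
    ("ann", "READ", "Dune"),
    ("ann", "READ", "Emma"),
    ("raj", "READ", "Ulysses"),
    ("raj", "READ", "Emma"),
    ("zoe", "READ", "Walden"),  # zoe is not a friend of "me", so she is never reached
]


def follow(node, relationship):
    """Targets reached from `node` over edges of one relationship type."""
    return [t for s, r, t in EDGES if s == node and r == relationship]


def books_friends_read(person):
    """Books read by the person's friends that the person has not read."""
    already_read = set(follow(person, "READ"))
    suggestions = set()
    for friend in follow(person, "FRIEND"):      # hop 1: person -> friends
        for book in follow(friend, "READ"):      # hop 2: friend -> books
            if book not in already_read:
                suggestions.add(book)
    return sorted(suggestions)


print(books_friends_read("me"))  # ['Emma', 'Ulysses']
```

The query never touches zoe or her books, because the traversal only explores what is connected to the starting node.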

As graphs become denser, a traversal may arrive at the same node several times, which can slow the search.  As a result, Graph DBs index these commonly traversed relationships to speed up search.
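The revisiting problem is easy to see in a small, densely connected graph.  The sketch below does not reproduce the indexing a Graph DB performs internally.  It shows a simpler, standard remedy instead: a breadth-first traversal that remembers which nodes it has already reached, so each node is expanded only once.  The graph data and the function name are assumptions for illustration.

```python
from collections import deque

# Hypothetical dense friendship graph: most nodes are reachable by several paths.
FRIENDS = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}


def reachable_within(start, max_depth):
    """Breadth-first traversal returning {node: depth}, expanding each node once."""
    depth = {start: 0}          # doubles as the record of nodes already reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue            # do not explore beyond the requested depth
        for neighbor in FRIENDS.get(node, []):
            if neighbor not in depth:   # skip nodes reached by an earlier path
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


print(reachable_within("a", 2))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```

Node "d" can be reached through both "b" and "c", but it is recorded only once, at the depth where it was first found.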

Advantages

  • Extremely fast for connected data.  While an RDBMS can be made to emulate a graph database, the extensive use of joins makes the technique quite slow.
  • Easy to query.
  • Able to quickly handle complex queries involving multiple levels of related data.

Disadvantages

  • Traditionally Graph DBs have been scaled vertically but not horizontally, as searching nodes on different machines would dramatically slow the process.  Vendors did not support distribution or sharding.  This has made it difficult for Graph DBs to scale beyond a certain size.  However, some vendors are now taking on this challenge.
  • Requires a conceptual shift in thinking for developers, so expect a learning curve.

Particular Opportunities and Project Characteristics

  • Traditionally recommender engines (any ‘recommended for you’ rating) have been based on Graph DBs.  Note that some recommenders are now also being built using Column Oriented DBs.
  • Use where objects have both dynamic properties and dynamic relationships with one another.
  • Applications that require very deep and complex joins in an RDBMS can be moved to Graph DBs, typically with speed increases greater than 100X.

Some sample use cases:

  • Model and store 7 billion people objects and 3 billion non-people objects to provide an earth-view drill down from planet to sidewalk. (Neo4j)
  • Tracking food sources from seed to table. (Objectivity, Inc.)
  • Ad placement applications (Objectivity, Inc.)
  • Network management.
  • Genealogy.
  • Public transport links and road maps.

Representative Vendors (not a recommendation): Neo4j, InfiniteGraph, InfoGrid, HyperGraphDB, AllegroGraph, BrightstarDB, and many others.

 

July 23, 2014

Bill Vorhies, President & Chief Data Scientist – Data-Magnum - © 2014, all rights reserved.

 

About the author:  Bill Vorhies is President & Chief Data Scientist of Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001.  He can be reached at:

Bill@Data-Magnum.com

The original blog post can be viewed at:

http://data-magnum.com/lesson-8-graph-databases-including-object-dbs/

All nine lessons can be downloaded as a White Paper at:

http://data-magnum.com/resources/white-papers/

 

Originally posted on Data Science Central

