An overview of NoSQL, Big Data and graph databases

13 May 2013 Estimated reading time: 3 minutes

This post surveys NoSQL databases on the web and how they relate to Big Data and graph databases. I want to understand how these technologies work, because Big Data, NoSQL and graph databases are closely related.

NoSQL database

This is a very big and interesting topic: handling big data with databases, infrastructure or data warehouses. See BigTable, and look at Big Data, the concept of handling large amounts of data on big infrastructures and/or data warehouses.

MapReduce

Check out the MapReduce paradigm and some of its implementations and related tools:

Acunu
Azkaban
Amazon Elastic MapReduce
Cascading, data workflow
Cascalog, fully featured data processing and querying library for Clojure or Java
Flume, distributed system for log collection
Greenplum
Hadoop
Hive, data warehouse system for Hadoop
Oozie, workflow scheduler system
Pig, data analysis platform for Hadoop
MrJob
Caffeine
S4 (Yahoo!)
MapR
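
To make the MapReduce paradigm a little more concrete, here is a minimal single-process sketch of a word count in plain Python. It only illustrates the three stages (map, shuffle, reduce); it is not how Hadoop or any of the tools listed above are actually used, and the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are made up for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine every value seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three stages in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs big infrastructure", "graph data is connected data"]
    print(word_count(docs))
```

In a real cluster the map and reduce calls would run on many machines in parallel and the shuffle would move data across the network; here everything happens in memory so the data flow is easy to follow.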

Storage

Amazon S3 simple storage service
Hadoop Distributed File System (HDFS)

Servers

Processing

R project
Yahoo! Pipes
Amazon Mechanical Turk, artificial intelligence
Solr/Lucene
ElasticSearch
Datameer
IBM BigSheets
Tinkerpop

Graph database

Another kind of database closely related to NoSQL is the graph database. This technology lets you model data as a graph rather than as tables. That changes how relations are expressed and makes relationships easier to create. This post is only meant to be an overview; I'll dedicate more posts to practice. Now let's take a look at the most common graph databases on the web.

Allegro Graph
Arango Db
Core Data
DEX. From My Popescu: a high-performance graph database written in Java and C++. Its main characteristic is the performance of its storage and retrieval for large graphs, on the order of billions of nodes, edges and attributes, implemented with specialized structures.
Filament
Flex
FlockDB (on GitHub)
Graph Base
Graph Pack
HyperGraphDB
Infinite Graph
InfoGrid
Neo4j
OpenLink Virtuoso
Orient Db
Sail RDF Store
Sones
Titan
Oracle Berkeley Db
VertexDB
Trinity
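
The idea of thinking in graphs instead of tables can be sketched with a tiny in-memory property graph in Python. This is only an illustration under my own assumptions: the `TinyGraph` class and its methods are hypothetical and do not correspond to the API of any database listed above.

```python
from collections import defaultdict, deque


class TinyGraph:
    """A toy property graph: nodes carry properties, edges carry a label."""

    def __init__(self):
        self.nodes = {}                 # node id -> dict of properties
        self.edges = defaultdict(list)  # node id -> list of (label, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def neighbours(self, node_id, label):
        """Return the nodes directly connected to node_id by edges with this label."""
        return [target for (edge_label, target) in self.edges[node_id]
                if edge_label == label]

    def reachable(self, start, label):
        """Breadth-first walk along edges with this label, skipping visited nodes."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            for target in self.neighbours(current, label):
                if target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


if __name__ == "__main__":
    graph = TinyGraph()
    graph.add_node("alice", age=30)
    graph.add_node("bob", age=25)
    graph.add_node("carol", age=41)
    graph.add_edge("alice", "knows", "bob")
    graph.add_edge("bob", "knows", "carol")
    graph.add_edge("carol", "knows", "alice")  # a cycle, handled by the seen set
    print(graph.reachable("alice", "knows"))
```

A relationship here is a first-class piece of data that you follow directly, where a relational schema would need a join table and a query per hop. Real graph databases add persistence, indexing and a query or traversal language on top of this basic model.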

Graph database properties and features:

Blueprints, generic graph API. Check out BlueRedis feature
TinkerGraph, lightweight, POJO based, in-memory property graph
ACID, a set of properties that guarantee that database transactions are processed reliably (Wikipedia)
SPARQL and Gremlin, graph query and traversal languages
Frames, object-to-graph mapper
Pipes
Rexster, featuring the DogHouse browser-based interface
Tinkerpop technologies:

Natural Language Processing

Check out natural language processing on Wikipedia.

Natural Language Toolkit, leading platform for building Python programs to work with human language data.
OpenNLP
Boilerpipe
OpenCalais

Machine Learning

Handling big data often means applying a branch of artificial intelligence: machine learning!

WEKA, Java project for data mining and machine learning
Mahout
scikits.learn, machine learning in Python

Visualization

Gephi, open graph visualization and exploration platform
GraphViz
Processing

Key-Value Stores

Amazon SimpleDB
Azure Table Storage
Berkeley DB (Oracle)
Chordless
Dynomite
GenieDB
GT.M / M.DB
HamsterDB
Hibari
KAI
KaTree
Kumofs
LightCloud
Membase
Memcachedb
Mnesia
NorthScale
Orient Key/Value Server
Pincaster
PNUTS/Sherpa
Project Voldemort
Redis
Riak
Scalaris
ScalienDB / Scalien Keyspace: a distributed, consistent key-value store
Tokyo Cabinet

Resources

As you can see, all of these links contain a lot of material to read and study, and it's impossible to know all of it. I simply want to understand how NoSQL and Big Data generally work, the main technology behind them, and where these technologies are heading. Going deep into these concepts can be very challenging. So simply have fun, read, study and learn as much as we can. That's the only way to improve :)

People

Marko Rodriguez
Martin Fowler. Check out the great book NoSQL Distilled.
Eric Brewer