
An overview of NoSQL, Big Data and graph databases
This post surveys the NoSQL databases found on the web and how they relate to Big Data and graph databases. I want to understand how these technologies work and how they are connected to each other.
NoSQL database
This is a very big and interesting topic: handling big data with databases, infrastructure or data warehouses (see, for example, Big Table). Start with Big Data itself, a concept about handling large amounts of data on big infrastructures and/or data warehouses.
MapReduce
Check out the MapReduce paradigm and some of its implementations and related tools:
- Acunu
- Azkaban
- Amazon Elastic MapReduce
- Cascading, data workflow
- Cascalog, fully featured data processing and querying library for Clojure or Java
- Flume, distributed system for collecting logs
- Greenplum
- Hadoop
- Hive, data warehouse system for Hadoop
- Oozie, workflow scheduler system
- Pig, analysis platform for Hadoop
- MrJob
- Caffeine
- S4 (Yahoo!)
- MapR
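To give a rough feel for the paradigm behind the tools listed above, here is a minimal single-process sketch of a MapReduce word count in plain Python. It does not use Hadoop or any other listed framework; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real system would run the map and reduce steps on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all the values seen for one key into a total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big table", "graph data"]))
```

The point of splitting the work this way is that each map call only looks at one document and each reduce call only looks at one key, so both steps can in principle be spread across many workers.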
Storage
- Amazon S3, simple storage service
- Hadoop Distributed File System (HDFS)
Servers
- Amazon EC2
- Google App Engine
- Amazon Elastic Beanstalk
- Heroku, cloud application platform
Processing
- R project
- Yahoo! Pipes
- Amazon Mechanical Turk, crowdsourced human intelligence ("artificial artificial intelligence")
- Solr/Lucene
- ElasticSearch
- Datameer
- IBM BigSheets
- Tinkerpop
Graph database
Another kind of database closely related to NoSQL is the graph database. This technology lets you model your data as a graph instead of as tables: relationships become part of the data itself, which makes them easier to create and to reason about. This post is only meant as an overview; I'll dedicate more posts to practice. For now, let's take a look at the most common graph databases on the web.
- AllegroGraph
- ArangoDB
- Core Data
- DEX. From My Popescu: a high-performance graph database written in Java and C++. Its main characteristic is its performance storage and retrieval for large graphs, in the order of billions of nodes, edges and attributes, implemented with specialized structures.
- Filament
- Flex
- FlockDB (on GitHub)
- Graph Base
- Graph Pack
- HyperGraphDB
- Infinite Graph
- InfoGrid
- Neo4j
- OpenLink Virtuoso
- OrientDB
- Sail RDF Store
- Sones
- Titan
- Oracle Berkeley DB
- VertexDB
- Trinity
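To make the idea of "graphs and not tables" a bit more concrete, here is a small in-memory sketch in Python. It is not the API of any database listed above; `TinyGraph`, `add_node`, `add_edge`, `neighbors` and `friends_of_friends` are hypothetical names invented for this example, and real graph databases add persistence, indexing and a query language on top.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical toy property graph: nodes with properties, labeled edges."""

    def __init__(self):
        self.nodes = {}                 # node id -> dict of properties
        self.edges = defaultdict(list)  # source id -> list of (label, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.edges[source].append((label, target))

    def neighbors(self, node_id, label=None):
        """Follow outgoing edges, optionally keeping only one edge label."""
        return [target for (edge_label, target) in self.edges.get(node_id, [])
                if label is None or edge_label == label]


def friends_of_friends(graph, start):
    """Two-hop traversal over 'knows' edges, skipping the start node."""
    found = []
    for friend in graph.neighbors(start, "knows"):
        for candidate in graph.neighbors(friend, "knows"):
            if candidate != start and candidate not in found:
                found.append(candidate)
    return found


if __name__ == "__main__":
    g = TinyGraph()
    for name in ("alice", "bob", "carol"):
        g.add_node(name, kind="person")
    g.add_edge("alice", "knows", "bob")
    g.add_edge("bob", "knows", "carol")
    print(friends_of_friends(g, "alice"))
```

In a relational schema the same question usually needs a join table and a self-join; here the relationship is followed directly from node to node.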
Graph database properties and features:
- Blueprints, generic graph API. Check out the BlueRedis feature
- TinkerGraph, lightweight, POJO-based, in-memory property graph
- ACID, a set of properties that guarantee that database transactions are processed reliably (Wikipedia)
- SPARQL and Gremlin, graph query and traversal languages
- Frames, object-to-graph mapper
- Pipes
- Rexster, featuring the DogHouse browser-based interface
- Tinkerpop technologies
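The ACID item in the list above is easier to grasp with a tiny experiment. The sketch below uses Python's built-in `sqlite3` module only because it ships with Python; it is a relational store, not one of the graph databases listed here, and the `accounts` table and the helper names (`make_demo_db`, `transfer`, `balances`) are assumptions made up for the example. It shows atomicity: either both updates of a transfer are applied, or neither is.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    return conn


def transfer(conn, source, target, amount):
    """Move `amount` between accounts as one transaction; True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` on the demo data fails on the debit, and the balances stay as they were, because the already-applied credit is undone with the rest of the transaction.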
Natural Language Processing
Check out natural language processing on Wikipedia.
- Natural Language Toolkit, leading platform for building Python programs to work with human language data.
- OpenNLP
- Boilerpipe
- OpenCalais
Machine Learning
Handling big data often means applying a branch of artificial intelligence: machine learning!
- WEKA, Java project for data mining and machine learning
- Mahout
- scikits.learn, machine learning in Python
Visualization
Key-Value Stores
- Amazon SimpleDB
- Azure Table Storage
- Berkeley DB (Oracle)
- Chordless
- Dynomite
- GenieDB
- GT.M / M.DB
- HamsterDB
- Hibari
- KAI
- KaTree
- Kumofs
- LightCloud
- Membase
- Memcachedb
- Mnesia
- NorthScale
- Orient Key/Value Server
- Pincaster
- PNUTS/Sherpa
- Project Voldemort
- Redis
- Riak
- Scalaris
- ScalienDB / Scalien Keyspace: a distributed, consistent key-value store
- Tokyo Cabinet
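What the stores in this list have in common is a very small interface: put a value under a key, get it back, delete it. As a rough sketch of that idea, and of how keys can be spread over several nodes, here is a toy in-memory version in Python. `ShardedKeyValueStore` and its methods are hypothetical names for this post, the "shards" are just dictionaries in one process, and none of this reflects how any specific product above is implemented.

```python
import hashlib


class ShardedKeyValueStore:
    """Hypothetical toy key-value store that partitions keys by hash."""

    def __init__(self, shard_count=3):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        # Each dict stands in for a separate storage node.
        self._shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash sends the same key to the same shard on every run.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)

    def delete(self, key):
        """Remove a key; return True if it was present."""
        shard = self._shard_for(key)
        if key in shard:
            del shard[key]
            return True
        return False


if __name__ == "__main__":
    store = ShardedKeyValueStore()
    store.put("user:1", {"name": "alice"})
    print(store.get("user:1"))
```

Because a lookup only needs the key, the store never has to scan or join anything, which is a large part of why this model is easy to distribute.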
Resources
As you can see, all of these links contain a lot of material to read and study, and it's impossible to know all of it. I simply want to understand how NoSQL and Big Data generally work, the main technologies behind them, and where they are heading. Going deep into these concepts can be very challenging. Simply have fun, read, study and learn as much as you can. That's the only way to improve :)
People
- Marko Rodriguez
- Martin Fowler. Check out the great book NoSQL Distilled.
- Eric Brewer