Thursday, March 06, 2014

Evaluating NoSQL databases - MongoDB

Introduction

MongoDB is undoubtedly one of the most popular document-oriented databases. It combines the powerful queryability of relational databases with the distributed nature of NoSQL databases like HBase or Cassandra. We will see in this post that MongoDB provides a sophisticated set of features, and you can decide how well it meets your needs.

Key Features 
  • High Availability through replicated servers and automatic master failover
  • ACID compliance at single document level including nested documents
  • Scalability is achieved through automatic sharding 
  • Provides a distributed filesystem known as GridFS, which can be accessed from the command line
  • Built-in functions and UDFs are written using JavaScript
  • Provides built-in support for Map/Reduce/Finalize
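
To make the Map/Reduce/Finalize bullet above more concrete, here is a minimal in-memory sketch of the three phases. It is plain Python rather than MongoDB's JavaScript-based facility, and the helper `map_reduce`, the sample `orders` documents, and the three callback functions are all hypothetical names invented for illustration. It also simplifies by always calling the reduce step, even for keys with a single value.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn, finalize_fn=None):
    """Hypothetical in-memory imitation of a map/reduce/finalize pipeline."""
    groups = defaultdict(list)
    for doc in docs:
        # Map: each document emits zero or more (key, value) pairs.
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Reduce: collapse all values emitted for a key into a single value.
    reduced = {key: reduce_fn(key, values) for key, values in groups.items()}
    # Finalize: optional post-processing of each reduced value.
    if finalize_fn is not None:
        reduced = {key: finalize_fn(key, value) for key, value in reduced.items()}
    return reduced


# Sample documents, invented for this example.
orders = [
    {"customer": "alice", "total": 20.0},
    {"customer": "bob", "total": 5.0},
    {"customer": "alice", "total": 10.0},
]


def map_order(doc):
    # Emit a partial aggregate keyed by customer.
    yield doc["customer"], {"sum": doc["total"], "count": 1}


def reduce_orders(key, values):
    # Merge partial aggregates; the output has the same shape as the input.
    return {
        "sum": sum(v["sum"] for v in values),
        "count": sum(v["count"] for v in values),
    }


def finalize_order(key, value):
    # Turn the merged aggregate into an average order value.
    return value["sum"] / value["count"]


averages = map_reduce(orders, map_order, reduce_orders, finalize_order)
print(averages)
```

The reduce step returns a value of the same shape it consumes, so partial results could be merged again; the finalize step is where the average is computed, because an average of averages would be wrong.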
Administration
  • Provides an admin shell for administrative tasks
  • There is a UI for administration through the MongoDB Management Service (a hosted service)
Migration
  • Migrating a system (application) from an RDBMS to MongoDB requires a complete redesign and refactoring of the code, not just a switch of drivers. Please see the Drivers section for supported integration interfaces.
Time-to-market
  • With a straightforward installation and administration model, it has a very fast time-to-market. In this respect it compares favorably with HBase.
Drivers
  • Drivers are available for Java, C, C++, Erlang, C#, Perl, Scala, Ruby, Python, PHP
Community Support
  • MongoDB's huge installation base is expanding very rapidly.
  • Sources of support are listed here
Cost
  • Since MongoDB is open source, you pay practically nothing for the product itself, but there are talent and infrastructure costs
  • Runs on commodity hardware so the cost is reasonable
  • Vendors offer professional support which varies based on your needs
Prominent Users
  • SAP, MetLife, eBay, MTV,  SourceForge and many others use MongoDB in their production environments
Security
  • Control access to MongoDB instances using authentication and authorization
  • Control access to sharded clusters with key files
Supported Operating Systems
  • Windows, Linux, Mac OS X, Solaris
Resources
Conclusion

MongoDB is a widely adopted NoSQL database that suits medium to very large data volumes, unlike HBase and other databases that are best suited to very large volumes of data. It offers a familiar programming paradigm with JavaScript and provides drivers for popular languages, which makes it a convenient choice.

Saturday, March 01, 2014

Evaluating NoSQL Databases - Apache HBase

Introduction

There is no doubt that adoption of NoSQL databases is growing, from startups to large enterprises. There is also an array of database choices in the market, and the purpose of this series of posts is to assist you in evaluating various NoSQL databases. In this post we will look at Apache HBase, the column-oriented database, and explore various criteria with the goal of helping you make an informed decision. HBase has been an open-source Apache top-level project since May 2010 and is part of the Apache Hadoop ecosystem. It touts itself as a fault-tolerant and consistent database and is based upon Google's BigTable.

Key Features 
  • ACID compliant at the row level, so it can run transactional applications
  • A table may have billions of rows, and each row may have from one to millions of columns. It is recommended to use HBase for huge volumes of data.
  • HBase supports two types of compression algorithms: Gzip (GZ) and Lempel-Ziv-Oberhumer (LZO).  LZO is highly recommended over Gzip but due to licensing issues LZO doesn't come packaged with HBase
  • The Bloom filter is a really cool data structure supported by HBase which answers the question: "Might this data be present?" It can say "definitely not" with certainty, which lets HBase skip needless lookups.
  • Out of the box versioning support, a unique feature which makes HBase stand out
  • Supports High availability through automatic failover
  • Architecture facilitates scaling out quite nicely so hardware can just be added on an on-demand basis
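
For readers unfamiliar with the Bloom filter mentioned in the list above, the following is a minimal sketch of the idea in Python. It is not HBase's implementation; the class name `BloomFilter`, its parameters, and the sample row key are assumptions made for illustration. The key property is that an added item is always reported as possibly present, while an absent item is usually, but not always, reported as absent.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hash scheme are arbitrary choices."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (not memory efficient).
        self.bits = bytearray(size_bits)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("row-0001")
print(bf.might_contain("row-0001"))  # True: added items are never missed
```

Because a negative answer is certain, a store can use such a filter to avoid reading a file that cannot contain the requested key, at the cost of occasional false positives.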
Administration
  • Thanks to tools from vendors like Cloudera and Hortonworks, administration has become easier over the years and is still improving.
  • To achieve fault tolerance, data replication can be configured across racks within a data center or between data centers.
Migration
  • Migrating a system (application) from an RDBMS to HBase requires a complete redesign and refactoring of the code, not just a switch of JDBC drivers. Please see the APIs section for supported integration interfaces.
Time-to-market
  • Availability of tested HBase packages from commercial vendors has enabled faster time-to-market
  • Thanks to Hortonworks, HBase is now packaged for Windows, so it's easier for .NET shops to ship to market faster
APIs
  • APIs are available for Java, Thrift and REST protocols.  Support is also available for Avro.
  • Spring-Hadoop integration supports HBase
Community Support
  • HBase has a fast-growing community of companies. Hadoop vendors are also investing heavily in HBase development as they see the adoption rate growing in the enterprise.
  • Sources of support are IRC channel: irc://irc.freenode.net/#hbase and mailing lists
Learning Curve
  • As we have seen, there is support for a variety of APIs on popular platforms, which should shorten the learning curve.
  • Since HBase is part of the Hadoop ecosystem, the talent pool is increasing quite rapidly.
  • Simple syntax for developers to learn and remember
Cost
  • Since HBase is open source, you pay practically nothing for the product itself, but there are talent and infrastructure costs
  • Runs on commodity hardware so the cost is reasonable
  • Vendors offer professional support which varies based on your needs
Prominent Users
  • Facebook, Meetup, eBay, Ning, StumbleUpon and Yahoo! use HBase in their production environments
Product Roadmap
  • Ability to take snapshots/backups and restore them on demand at a later point in time
  • Monitoring and diagnostics tools
  • Improvement to reliability and high-availability
  • Cell-level security
Resources
Conclusion
As they say, one size doesn't fit all, and hopefully this post addresses the questions and concerns you have in mind. We have seen that HBase is a good fit where there is a huge volume of data with columnar requirements. Those are obviously not the only criteria, and you may need to consider the other factors listed above while making the selection.

Saturday, January 24, 2009

APIs in Action

A number of APIs are released by various services every day. I always "want" to play around with them and understand the possibilities they provide, but most of the time I do not get to.

Thursday, January 01, 2009

MyEclipse HQL

Though it has been a few versions since MyEclipse added support for Hibernate and HQL, I only had a chance to play with it recently. It has a nice feature set, and the HQL editor is tied to the Dynamic Query Translator, which gives on-the-fly translation of HQL queries. This provides an easy way to debug queries and validate them.


In order to open an HQL editor you need to create a file with the .hql extension and also have an active Hibernate configuration file. This took me a while to figure out. To create an active Hibernate configuration, use File -> New -> Hibernate Configuration File and set up the connection parameters for the database.

Sunday, November 30, 2008

Enterprise Social Networks

The growing use of Social Networks in the consumer space strongly suggests that they are here to stay. The benefits of these networks as an enabler of online social interactions can be carried over to add social intelligence in an enterprise context. Such networks will also facilitate the transformation of organizational structure from traditional hierarchies into networked hierarchies. In this post I will walk through some of the technology options that are available for building Enterprise Social Portals/Networks:

Apache Shindig:

Apache Shindig is a reference implementation of the OpenSocial specification. With it you can expose your existing Social Graph, and it acts as a container for OpenSocial widgets.

SocialSite:

SocialSite is an open source initiative from Sun Microsystems providing a complete end-to-end user interface and an API for social networking. It supports widgets by providing an OpenSocial container, leveraging Apache Shindig in its architecture.

RingSide Networks:

RingSide Networks is an open source offering that adds social networking capabilities to any website by implementing both the Facebook and OpenSocial specifications. RingSide acts as a powerful container hosting social information and as a bridge to the Facebook social information platform, hence providing interoperability. The only issue is that it's still in beta.

Liferay:

Liferay is also an open source offering, providing both enterprise portal and social collaboration platform software. The solutions provide components for Web 2.0 features like Blogs, Wiki, Calendar, etc. It also provides a way to host a social graph and promises OpenSocial integration through Apache Shindig.
