ODP brings forth substantial results

Over the past couple of years, the ODP’s efforts to produce a basic Hadoop distribution have been deemed controversial by many. Now those efforts finally seem to be bearing fruit, but the result is highly unlikely to meet the critics’ standards.

Two key players in the ODP’s SQL-on-Hadoop endeavors, Hortonworks and Pivotal, have joined forces to certify Pivotal’s Hawq, an SQL layer for operating on Hadoop data, on the Hortonworks Data Platform (HDP) distribution of Hadoop.

Hawq was, and still is, a key part of Pivotal’s Big Data Suite, a set of tools intended for Hadoop that Pivotal had previously made available only as a proprietary product. Pivotal has since relented and open sourced the components of the Big Data Suite (Hawq, the Greenplum analytical database, and the GemFire NoSQL database) under different licenses.

Now Pivotal is ensuring that each of these components works as intended with HDP. The company claims that this shifts the focus from a proprietary configuration to an open source native environment, providing a lower TCO and a better SQL-on-Hadoop solution. Pivotal is pitching Hawq to enterprises that are already investing in HDP and that show a strong interest in SQL engines in order to offload tasks from traditional enterprise data warehouses, build analytical use cases, and execute at Hadoop scale.

Satisfying those enterprises is only a small part of the big picture for this coalition. The certification is also meant to serve as an example of building toward a better SQL-on-Hadoop solution. Since all Hadoop distributions based on the ODP share some underlying parts, it is much easier for engineers to build on top of the Hadoop platform and extend it without being restricted to any single Hadoop distribution.

Since its inception, the Open Data Platform has inspired as much controversy and dissension as it has attracted contributors and adherents. Some prominent vendors have opted to stay out of the coalition, among them Cloudera and MapR (a less prominent but still significant distribution of Hadoop). In MapR’s opinion, the existing efforts of the Apache Software Foundation make the Open Data Platform redundant, if nothing else.

Hawq also integrates with Ambari, an Apache-sponsored configuration management system for the individual components of Hadoop. According to claims attributed to Ambari, MapR is used by less than 25% of the users in the market, and thus offers less than the other SQL-on-Hadoop solutions.

MapR said in a recent blog post that interoperability among Hadoop projects and subprojects is already very good, and that applications built on any one distribution can be migrated to the others with almost zero switching costs.

With the work by Pivotal and Hortonworks, the ODP (Open Data Platform) has yielded some of its first tangible results, but they are highly unlikely to dispel the controversy over the Open Data Platform’s mission.

Beginning With Big Data Analytics

It is undeniable that getting started with Big Data can unsettle even the most sophisticated organizations, leading to a great deal of confusion as well as a lot of money spent in the wrong places. It could lead your organization to hire a few quantitative big data analysts, which could complicate the situation further. However, if you know how to begin, you may not need such professional services at all, especially if the platform of your choice comes with predefined and pre-configured modules.

Interestingly enough, if the platform you choose is good enough and supports a massively parallel processing (MPP) architecture, you might even be able to take the complexity out of the analytics. The right platform will also handle advanced pattern matching and predictive analytics with ease. So you need to do the following:

Identify the right platform

When looking for a suitable Big Data Analytics platform, look for automatic parallelization along with fully embedded processing. It is just as important that the platform uses its resources to their full potential, which is only achievable if its data loading procedures are parallelized as well. The platform should also provide automated backups and easy recovery to a previous state. Last but not least, the platform of your choice must be easy to install, configure, and upgrade, and it should be able to parallelize work across the whole platform when required.
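The criteria above can be turned into a simple yes/no checklist. The following is only a minimal sketch in Python: the class, the field names, and the example platform are all hypothetical and chosen for illustration, and the answers would have to come from your own evaluation of each candidate platform.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PlatformProfile:
    """Hypothetical checklist; each flag mirrors one criterion from the text."""
    name: str
    automatic_parallelization: bool
    embedded_processing: bool
    parallel_data_loading: bool
    automated_backup_and_recovery: bool
    easy_install_and_upgrade: bool


def missing_criteria(profile: PlatformProfile) -> List[str]:
    """Return the names of the criteria that the platform does not meet."""
    missing = []
    for field in fields(profile):
        value = getattr(profile, field.name)
        # Only the boolean flags are criteria; the name field is skipped.
        if isinstance(value, bool) and not value:
            missing.append(field.name)
    return missing


# Made-up example answers for an imaginary platform.
candidate = PlatformProfile(
    name="ExamplePlatform",
    automatic_parallelization=True,
    embedded_processing=True,
    parallel_data_loading=False,
    automated_backup_and_recovery=True,
    easy_install_and_upgrade=False,
)

for criterion in missing_criteria(candidate):
    print(f"{candidate.name} does not meet: {criterion}")
```

A checklist like this does not rank platforms; it only makes the gaps explicit so that they can be discussed before a platform is chosen.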

Some important considerations

If you have conventional data warehouses or RDBMSs deployed in your environment, check whether the platform of your choice can be easily integrated with them. If it doesn’t connect with your existing tools, look for suitable connectors so that you can carry out effective and efficient Big Data Analytics. Once you have the right connector in place, you shouldn’t have to worry about what happens after deployment, but you still need to consider the analytical richness of the reports, which cannot be ignored under any circumstances. Of course, if the platform of your choice fails to meet your analytical requirements, then you may be choosing the wrong platform after all.

Be very cautious

Finding the right platform can be very helpful in getting you started with Big Data Analytics, but you’ll only be as good as the insights you derive from the data. For instance, you must make sure that the platform of your choice delivers the same results for you as it has for others; only then will it provide you with great analytical insights. That said, while some consider MapR’s presence in the picture essential, the abrupt rise in adoption of SQL-on-Hadoop solutions has made the Big Data playground very different and dynamic for many organizations. So be very careful when selecting a Big Data platform for your organization.

To begin with Big Data Analytics, an organization must tread very carefully and look for the best possible platform for its needs.