Hadoop: What It Is And Why It’s Such A Big Deal

Guest blog post by Francesca Krihely.

Here’s a prediction and a challenge, rolled into one. The prediction first: whatever the level of your present understanding of Hadoop, you’re going to hear a lot more about it in future.

And the challenge? Well, it’s this: whatever the level of your present understanding of Hadoop, you’re also likely to be missing critical pieces of the jigsaw. Which pieces? Read on.

Hadoop, let’s first of all remind ourselves, is an open source data platform which performs a very neat trick. Simply put, Hadoop is a tool for tying together multiple servers into a single, easily scalable cluster, ideal for distributed data storage and processing.
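To make that trick a little more concrete, here is a minimal sketch in plain Python of the general idea: split the data into blocks, let each "server" work on its own block, then combine the partial results. It is not Hadoop code and uses none of Hadoop's APIs; the function names (`split_into_blocks`, `count_words`, `run_job`) and the sample log lines are invented for illustration, and threads on one machine merely stand in for the servers of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, num_nodes):
    """Deal records round-robin into one block per simulated node."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def count_words(block):
    """Work done locally by one node: count the words in its own block."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def run_job(records, num_nodes=3):
    """Spread the records over the nodes, process in parallel, then combine."""
    blocks = split_into_blocks(records, num_nodes)
    # Threads stand in for separate servers (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(count_words, blocks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge each node's partial counts
    return total


if __name__ == "__main__":
    # Hypothetical sample data, standing in for a much larger data set.
    logs = [
        "machine tool started",
        "quality check passed",
        "machine tool stopped",
        "quality check failed",
    ]
    print(run_job(logs, num_nodes=2).most_common(3))
```

In this toy version, scaling out simply means raising `num_nodes`; the code that does the counting does not change, which is the property that makes adding inexpensive servers to a cluster so attractive.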

So it’s not too difficult to see just why Hadoop has been so phenomenally successful.

For one thing, by allowing organizations to piece together clusters from inexpensive commodity x86 servers, Hadoop sharply cuts the cost of cluster construction. And being open source, Hadoop not only works well with other open source technologies, but also offers an attractive—and surprisingly affordable—cost of initial acquisition and ongoing ownership.

All of which does a lot to transform the prospects of Big Data within even the most cash-constrained organizations. And so, by happy coincidence, even as Big Data has become all the rage, the price of entry to the party has come within reach of pretty much everyone.

In short, thanks to Hadoop and other allied open source technologies, organizations can readily store, extract and analyze data in volumes that would recently have been unthinkable. And, what’s more, do it at costs that would recently have been considered unbelievable.

Now, why does this matter? Why is the ability to inexpensively store and analyze large data sets so valuable?

For one thing, it’s relatively new. Until quite recently, the data sets within most organizations might have been large, but they certainly weren’t all-encompassing. Lots of data was simply thrown away, as the cost of storing it exceeded the likely value of keeping it or analyzing it.

The result? Now that such data can affordably be kept, there are likely to be all sorts of interesting (and profitable) linkages and relationships out there, just waiting to be discovered. Structured or unstructured, within an existing database schema or not: to Hadoop it makes very little difference.

Then there’s the silo effect. Think large data sets in the corporate world, and you’ll typically think of ERP systems, where transaction volumes are high. Maybe so, but—to pick just one example—the data captured second-by-second on the factory floor by machine tools and quality systems is every bit as extensive. Stored separately, managed separately, and analyzed separately, such data exists in a silo of its own. No longer, perhaps: with Hadoop, all of it can be captured, and stored as fast as it is generated.

Better still, to repeat the point, Hadoop lowers the cost of entry to Big Data—and impressively so. So Hadoop—and the open source tools typically deployed alongside it—can be thought of as having something of a leveling effect, bringing Big Data to all organizations, and not just those with the biggest budgets.

The economic benefits of this? It’s difficult to say. But the last time we saw a truly transformational step change in technology (that time in terms of connectivity and inexpensive processing power), business startups certainly benefited disproportionately.

For proof, look no further than three such startups: Amazon.com, eBay and Google. The last of these, as it happens, is today delivering some of the driving force in bringing the technology behind Hadoop to fruition.

Even so, the road ahead isn’t without a few bumps, chief amongst which is Hadoop’s lack of query and analysis tools. Truth be told, Hadoop is arguably more of a data warehouse than a database: a great way of inexpensively storing data, but not such a great tool for making sense of that data.

In the short term, this isn’t a problem: organizations and their IT functions are simply grateful to have Hadoop at all, opening up the possibility of running queries and analytics against data sets of such size.

But in the longer term, it’s clear that these organizations and their IT functions will have to engage with more than just Hadoop if they are to deliver on the promise of Big Data.

For profitable Big Data insights, in short, Hadoop is a necessary condition, but not a sufficient one.

Author Byline

Francesca Krihely is the Community Marketing Manager for MongoDB, the leading NoSQL database. In this role she supports MongoDB community leaders around the globe. She lives in New York City.

Originally posted on Data Science Central
