I fortunately beat Hurricane Sandy by seven days and spent most of last week at Strata + Hadoop World 2012, in what’s become the industry’s main event for Big Data.
Here are some quick thoughts from an action-packed week:
- Energy. A lot of people wonder if Big Data is just big hype. And while I think many in the industry are amused by the recent interest around data, I was pleasantly surprised by the genuine enthusiasm at the conference. Attendees were very excited to meet, collaborate and find ways to pull the industry forward. As a data point, consider the conference lunches. At most conferences, I find that attendees at mealtime stare at their mobile phones or laptops, munching on a stale turkey sandwich and avoiding eye contact. At Hadoop World, the people I sat with (randomly chosen) were eager to meet and exchange ideas.
- Real Customers. Another way to gauge the stage of an industry is to scan the badges of attendees. At many conferences, you’ll find consulting companies, vendors and job-seekers. At Hadoop World, I was shocked by the number of actual customers (e.g., data scientists, business analysts) who were present. And instead of carefully avoiding booth salespeople or stealthily stopping by to grab tchotchkes, customers were eager to engage with vendors and find solutions to their problems.
- New Stack. What I find most promising about the Hadoop ecosystem is that technologists are re-imagining the entire IT stack around data: from storage (whether HDFS itself or alternative storage layers like those from MapR, CleverSafe or others) to statistics and predictive analytics (like Alpine Data Labs) to app servers (with the annoyingly spelled Continuuity targeting big data-driven applications) to data visualization (like Platfora). Existing vendors are trying, with varying success, to adapt to this new stack, while emerging companies want to use this shift to rewrite the vendor pecking order.
- OLAP. In particular, the most active part of the big data stack these days is OLAP. While Hadoop and the MapReduce programming framework are designed to process and analyze data, they are not known for the speed and interactivity that business analysts have come to expect from existing Business Intelligence tools. As a result, many vendors announced or marketed offerings around interactive, OLAP-style big data queries. Cloudera announced the open source Impala project, while Metamarkets open-sourced its own related work on Druid. Separately, vendors like Platfora made in-memory OLAP a core part of their offerings.
- Hadoop As Platform. On that note, what’s interesting is that because Hadoop is becoming such a standard, and because customers will end up with so much data in HDFS, the industry is realizing that many capabilities (e.g., interactive OLAP queries) need to be rebuilt on top of Hadoop, even though they don’t necessarily use the MapReduce batch programming model that drove Hadoop adoption in the first place. This has happened to other platforms in the past. Browsers were originally designed to link academic and scientific documents together. Databases were originally created for transaction processing. Yet once a platform becomes the standard, it gains inertia, and it’s often easier to do new things (even things the platform wasn’t built for) on that platform rather than elsewhere.
- Storage. In any data-intensive platform, storage is often one of the key constraints. As such, there is a lot of energy around the storage layer in Hadoop. The community has dramatically improved HDFS over the past year, including adding the ability to fail over the central NameNode so that it is no longer a single point of failure. The HDFS roadmap sessions were completely packed at the conference. Similarly, vendors like MapR, CleverSafe and others are trying to innovate around and augment Hadoop’s storage capabilities. No matter how you slice it, though, as Hadoop-based apps become more mission critical, the community is being forced to bake in enterprise-class storage functionality like global replication, disaster recovery and space optimization (e.g., through erasure coding and compression).
- Integrated Solutions vs. Components. Another big debate in the Big Data community is whether customers will choose end-to-end solutions from individual vendors or mix and match best-of-breed components. As in all software categories, it will likely be a mix of both, with consolidation happening over time. Vendors like Datameer and Platfora offer capabilities across multiple layers of the stack (in their cases, ETL, OLAP cubes and visualization interfaces), while you can alternatively seek out and stitch together low-level components like ODBC connectors to Hadoop, dedicated OLAP cube technology and targeted ETL tools.
- BI Players. In this mix, the incumbents that seem to have the most power in the Big Data ecosystem are the traditional Business Intelligence tools such as Tableau and SAS. These products are widely used by business analysts and data scientists, so many Big Data platform vendors are scrambling to make sure they integrate with these existing interfaces. Some companies, like Cirro, are even trying to make Big Data accessible via office productivity tools such as Microsoft Excel.
- Hiring. With all of this energy and innovation, needless to say, every vendor seems to be hiring aggressively. Heck, there was even a booth for the CIA (you got that right). I asked the CIA representatives what they were doing at Hadoop World and they said they were looking for data scientists. Or perhaps they were just there to watch us. I thought of taking a picture of the CIA booth, but I figured it might inadvertently land me in some Langley file somewhere. I’m watching too much Homeland, I guess.
- Hilton. Finally, I still don’t understand why people hold conferences at the New York Hilton, where this event was staged. The place is old and stuffy. The WiFi can barely handle web browsing, let alone “big data.” And the rooms were constantly cold. Hopefully next year is Vegas!