What is Strata + Hadoop World?
Presented by O’Reilly and Cloudera, Strata + Hadoop World is where big data, cutting-edge data science, and new business fundamentals intersect and merge.
The conference now takes place four times a year in London, San Jose, New York and Singapore.
This is my guide to the biggest themes and presentations from the two days of Strata + Hadoop World London 2015.
- The first day was devoted to tutorials and deep dives, mostly on software in the Hadoop ecosystem, many of them given by the software’s authors or contributors. This is an excellent opportunity to take a closer look at a particular technology and to ask in-depth questions of the people in the know.
- On the second day, the conference proper starts. Even though there are now four Strata + Hadoop World conferences a year, it offers a packed schedule of speakers from many of the industry’s leading organisations. This year’s speakers included people from Barclays Bank, Google, CERN, Accenture, Pivotal, Databricks, Dato, MapR, comparethemarket.com and many more.
Two big themes ran through the conference: developments in Apache Software Foundation projects, and the architecture required to handle large quantities of batch and streaming data.
- Open-source software from the Apache Software Foundation has become the industry standard for big data processing and storage, and increasingly for querying and analysis.
- Some examples you may have heard of: Hadoop, Spark, Cassandra, HBase, Kafka.
- Spark is likely to supplant Hadoop MapReduce as the big data processing engine of choice.
- Data lakes, and how to handle large quantities of streaming data, are two hot topics in architecture.
- A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. Because the raw data is always available, data lakes allow greater agility and a wider range of applications.
- The Lambda architecture is currently the common solution for processing large quantities of streaming data.
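In a Lambda architecture, every incoming event is written to an immutable master dataset and also fed to a speed layer. A batch layer periodically recomputes its view from all of the raw data, the speed layer covers only the events that arrived since the last batch run, and a serving layer answers queries by merging the two views. The sketch below is a toy, in-memory illustration of that idea, assuming a hypothetical page-view counting workload; the class and method names (`LambdaPipeline`, `ingest`, `run_batch`, `query`) are made up for illustration and do not come from any particular framework. It also assumes a batch run completes instantly, which real systems cannot rely on.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LambdaPipeline:
    """Hypothetical, in-memory toy showing the three Lambda architecture layers."""

    # Batch layer input: immutable, append-only log of raw events.
    master_dataset: list[str] = field(default_factory=list)
    # Batch view: precomputed from the whole master dataset.
    batch_view: Counter[str] = field(default_factory=Counter)
    # Real-time view: covers only events since the last batch run.
    realtime_view: Counter[str] = field(default_factory=Counter)

    def ingest(self, event: str) -> None:
        # Each event is stored as raw data AND applied incrementally
        # by the speed layer, so it is queryable immediately.
        self.master_dataset.append(event)
        self.realtime_view[event] += 1

    def run_batch(self) -> None:
        # Batch layer: recompute the view from scratch over all raw data.
        self.batch_view = Counter(self.master_dataset)
        # Assumption: the batch run is instantaneous, so every event in the
        # real-time view is now covered by the batch view and can be dropped.
        # A real system must keep events that arrive during a long batch run.
        self.realtime_view.clear()

    def query(self, key: str) -> int:
        # Serving layer: merge the batch view with the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    pipeline = LambdaPipeline()
    for page in ["home", "about", "home"]:
        pipeline.ingest(page)
    pipeline.run_batch()     # "home" count now lives in the batch view
    pipeline.ingest("home")  # this event is only in the real-time view
    print(pipeline.query("home"))  # 3
```

Recomputing the batch view from raw data, rather than updating it in place, is what lets the batch layer correct mistakes: the speed layer's results are approximate and temporary, and they are replaced on the next batch run.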
More details can be found in my slide deck.