A blog about the people, culture, processes, and technology change needed to deliver value from data.

Data Science in Business reading list

by Harvinder

I’ve put together a Data Science in business reading list from books I’ve found both practical and inspirational.

Getting started (and keeping up) with the field of data science can be daunting. These books are a great place to start.

Data Science for Business

Data Science for Business: What you need to know about data mining and data-analytic thinking

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.

Predictive Analytics

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die

An entertaining read from former Columbia University professor and Predictive Analytics World founder Eric Siegel revealing the power, and perils, of prediction.

Thinking with data

Thinking with Data: How to Turn Information into Insights

Before you dive into the tools and techniques of Data Science, you should make sure you’re answering the right questions. Max Shron shows you how to put the why before the how and ensure you’re turning data into the right insights.

Analytics at work

Analytics at Work: Smarter Decisions, Better Results

This book doesn’t focus on statistics and machine learning. Instead, it details the factors that analytical projects need in order to succeed, using the ‘DELTA’ framework: Data, Enterprise, Leadership, Targets and Analysts.

Elements of data analytical style

The Elements of Data Analytic Style

This book focuses on the details of data analysis that sometimes fall through the cracks in traditional statistics classes and textbooks. The author is one of the co-developers of the Johns Hopkins Specialization in Data Science, the largest data science program in the world. Although perhaps a bit basic for some readers, it is still useful as a companion to introductory courses in data science or data analysis.

Mining of massive datasets

Mining of Massive Datasets

The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and can be applied successfully to even the largest datasets. The book is based on Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining). There is also a companion Coursera MOOC based on the book: Mining Massive Datasets.

Data mining

Data Mining: Practical Machine Learning Tools & Techniques, 3/E (PB)

Written by the developers of the Weka machine learning library, this book provides an excellent grounding in machine learning techniques, especially tree-based models. A large chunk of the book is devoted to a hands-on walkthrough of the Weka data mining software.

Machine Learning Peter Flach

Machine Learning: The Art and Science of Algorithms that Make Sense of Data

Peter Flach gives a gentle but thorough introduction to the diverse field of machine learning. It is packed full of examples that bring the subject to life but still covers the statistics and mathematics necessary to fully understand the topics.

An Introduction to Statistical Learning

An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)

This book provides an introduction to statistical learning methods. It also contains a number of R labs with detailed explanations of how to implement the various methods in real-life settings, and should be a valuable resource for a practicing data scientist.

The elements of statistical learning

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)

This book is a valuable resource for anyone interested in data mining in science or industry. The book’s coverage is broad, from supervised learning (prediction) to unsupervised learning. While the approach is statistical, the emphasis is on concepts and examples rather than mathematics.

Data Science from scratch

Data Science from Scratch: First Principles with Python

Learn data science the hard way: by coding algorithms from scratch. A great way to learn the subject from the ground up as long as you have a little maths/stats and programming knowledge.

Python for data analysis

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

Although titled Python for Data Analysis, this book is really about the Pandas library in Python, created by the book’s author Wes McKinney. Pandas offers data structures (similar to R’s data frames) and operations for manipulating numerical tables and time series. Along with the NumPy and scikit-learn libraries, it is one of the major reasons for the popularity of Python with data scientists. Although slightly dated now (the Pandas API has moved on), it is still a useful reference.

Python machine learning

Python Machine Learning

A very practical data science book covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras. It features guidance and tips on everything from deep learning and data wrangling to data visualization.

R programming for data science

R Programming for Data Science

Written by one of the instructors of the Coursera Data Science Specialisation, this book teaches you to program in R and use it for effective data analysis. Just remember, though: Data Science is not statistics + R.

Machine Learning with R

Machine Learning with R

As long as you have basic programming knowledge, this book will get you up and running with R and machine learning. It covers common tasks, including classification, prediction, forecasting, market analysis, and clustering.

hello, startup

Hello, Startup: A Programmer’s Guide to Building Products, Technologies, and Teams

Don’t let the title mislead you. This isn’t just a book for budding startup founders or programmers; it’s a book for anyone interested in creating the best products, choosing perfect technologies and building outstanding teams. Packed with incredible references and insight into how Google, Facebook, LinkedIn, Twitter, GitHub, Stripe, Instagram, AdMob, Pinterest, and many others have created success.

The flaw of averages

The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty

Written in plain English, this book explains why businesses and governments are so poor at decision making in the face of uncertainty and risk (hint: it happens when you pluck a single number from Excel and use it to represent a range of uncertain and unknown future outcomes). You will never present outcomes and forecasts the same way again after reading this book.
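To make the hint concrete, here is a minimal sketch of the effect in Python. It is not taken from the book: the toy profit model, the capacity of 100 units, the margin of 10 per unit and the normally distributed demand are all assumptions chosen for illustration. It compares the profit you would forecast by plugging in average demand with the average profit across many simulated demand scenarios.

```python
import random
import statistics


def profit(demand, capacity=100, margin=10.0):
    """Toy profit model (hypothetical): sales are capped by capacity."""
    return margin * min(demand, capacity)


def flaw_of_averages(n_scenarios=10_000, seed=42):
    """Compare profit at average demand with average profit over scenarios."""
    rng = random.Random(seed)
    # Assumed demand distribution: normal, mean 100, std dev 30, floored at 0.
    demands = [max(0.0, rng.gauss(100, 30)) for _ in range(n_scenarios)]

    avg_demand = statistics.fmean(demands)
    # The "single number" forecast: plug the average demand into the model.
    profit_at_avg_demand = profit(avg_demand)
    # The simulated forecast: evaluate every scenario, then average the results.
    avg_profit = statistics.fmean([profit(d) for d in demands])
    return avg_demand, profit_at_avg_demand, avg_profit


if __name__ == "__main__":
    avg_demand, profit_at_avg, avg_profit = flaw_of_averages()
    print(f"Average demand:           {avg_demand:.1f}")
    print(f"Profit at average demand: {profit_at_avg:.1f}")
    print(f"Average profit:           {avg_profit:.1f}")
```

Because sales are capped by capacity, high-demand scenarios cannot make up for low-demand ones, so the average profit comes out below the profit calculated at average demand. The single number overstates what you should expect.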

Thinking Statistically

Thinking Statistically

A short but entertaining book focusing on three key topics in statistics we should always try to be aware of: selection bias, endogeneity and Bayes’ theorem.

Agile project management

Agile Project Management in easy steps, 2nd edition

Data Science has a lot to learn from the development community, especially when it comes to delivering well-tested, production-ready data products. This book covers all the major methodologies under the Agile development banner (Scrum, Lean, XP, DSDM, etc.) and shows you in practical terms how they are implemented.

Ry's git tutorial

Ry’s Git Tutorial

Proper version control is a must for data science. Git is the most popular version control software around. It’s extremely powerful, but the learning curve is steep. This free book is the best I’ve found at explaining the concepts of Git and guiding the reader through tutorials from the basics to advanced usage.