I’ve put together a Data Science in business reading list from books I’ve found both practical and inspirational.
Getting started (and keeping up) with the field of data science can be daunting. These books are a great place to start.
Data Science for Business: What you need to know about data mining and data-analytic thinking
Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
An entertaining read from former Columbia University professor and Predictive Analytics World founder Eric Siegel revealing the power, and perils, of prediction.
Thinking with Data: How to Turn Information into Insights
Before you dive into the tools and techniques of Data Science, you should make sure you’re answering the right questions. Max Shron shows you how to put the why before the how and ensure you’re turning data into the right insights.
Analytics at Work: Smarter Decisions, Better Results
This book doesn’t focus on statistics and machine learning. Instead, it details the factors analytical projects need in order to succeed, using the ‘DELTA’ framework: Data, Enterprise, Leadership, Targets and Analysts.
The Elements of Data Analytic Style
This book is focused on the details of data analysis that sometimes fall through the cracks in traditional statistics classes and textbooks. The author is one of the co-developers of the Johns Hopkins Specialization in Data Science, the largest data science program in the world. Although perhaps a bit basic for some readers, it is still useful as a companion to introductory courses in data science or data analysis.
The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and can be applied successfully to even the largest datasets. The book is based on the Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining). There is also a companion Coursera MOOC based on the book, Mining Massive Datasets.
Data Mining: Practical Machine Learning Tools & Techniques, 3/E (PB)
Written by the developers of the WEKA machine learning library, this book provides an excellent grounding in machine learning techniques, especially tree-based models. A large chunk of the book is devoted to a hands-on walkthrough of the WEKA data mining software.
Machine Learning: The Art and Science of Algorithms that Make Sense of Data
Peter Flach gives a gentle but thorough introduction to the diverse field of machine learning. It is packed full of examples that bring the subject to life but still covers the statistics and mathematics necessary to fully understand the topics.
An Introduction to Statistical Learning: with Applications in R (Springer Texts in Statistics)
This book provides an introduction to statistical learning methods. It also contains a number of R labs with detailed explanations of how to implement the various methods in real-life settings, and should be a valuable resource for a practicing data scientist.
This book is a valuable resource for anyone interested in data mining in science or industry. The book’s coverage is broad, from supervised learning (prediction) to unsupervised learning. While the approach is statistical, the emphasis is on concepts and examples rather than mathematics.
Data Science from Scratch: First Principles with Python
Learn data science the hard way: by coding algorithms from scratch. A great way to learn the subject from the ground up as long as you have a little maths/stats and programming knowledge.
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Although titled Python for Data Analysis, this book is really about the Pandas library in Python, created by the book’s author, Wes McKinney. Pandas offers data structures (similar to R’s data frames) and operations for manipulating numerical tables and time series. Along with the NumPy and scikit-learn libraries, it is one of the major reasons for the popularity of Python with Data Scientists. Although slightly dated now (the Pandas API has moved on), it is still a useful reference.
A very practical data science book covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Keras. It features guidance and tips on everything from deep learning and data wrangling to data visualization.
R Programming for Data Science
Written by one of the instructors of the Coursera Data Science Specialisation, this book teaches you to program in R and use it for effective data analysis. Just remember though, Data Science is not statistics + R.
As long as you have basic programming knowledge, this book will get you up and running with R and machine learning. It covers common tasks, including classification, prediction, forecasting, market analysis, and clustering.
Hello, Startup: A Programmer’s Guide to Building Products, Technologies, and Teams
Don’t let the title mislead you. This isn’t just a book for budding startup founders or programmers; it’s a book for anyone interested in creating the best products, choosing perfect technologies and building outstanding teams. Packed with incredible references and insight into how Google, Facebook, LinkedIn, Twitter, GitHub, Stripe, Instagram, AdMob, Pinterest, and many others have created success.
The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty
Written in plain English, this book explains why businesses and governments are so poor at decision making in the face of uncertainty and risk (hint: it happens when you pluck a single number from Excel and use it to represent a range of uncertain and unknown future outcomes). You will never present outcomes and forecasts the same way again after reading this book.
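To make the hint concrete, here is a minimal sketch of the effect in Python. The scenario and every number in it (stock level, unit profit, waste cost, demand scenarios) are my own hypothetical illustration, not an example taken from the book. It compares the profit you would report by plugging average demand into a model with the average of the profits across the possible demands.

```python
from statistics import mean

# Hypothetical numbers, chosen only for illustration.
STOCK = 100          # units ordered in advance
UNIT_PROFIT = 10.0   # profit per unit sold
UNIT_WASTE = 4.0     # cost per unit left unsold


def profit(demand: int) -> float:
    """Profit for one demand outcome; sales are capped by the stock on hand."""
    sold = min(demand, STOCK)
    unsold = STOCK - sold
    return sold * UNIT_PROFIT - unsold * UNIT_WASTE


# Five equally likely demand scenarios whose average is exactly 100 units.
demand_scenarios = [60, 80, 100, 120, 140]

# The "single number" approach: plug average demand into the model.
profit_at_average_demand = profit(round(mean(demand_scenarios)))

# The honest approach: evaluate every scenario, then average the results.
average_profit = mean(profit(d) for d in demand_scenarios)

print(f"Profit at average demand: {profit_at_average_demand:.2f}")
print(f"Average profit:           {average_profit:.2f}")
```

Because sales are capped by stock, high-demand scenarios cannot make up for low-demand ones, so the average profit comes out lower than the profit at average demand.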
A short but entertaining book focusing on three key topics in statistics we should always try to be aware of: selection bias, endogeneity and Bayes’ theorem.
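As a quick reminder of why Bayes’ theorem earns a place on that list, here is a small Python sketch. The function name and the screening-test numbers are hypothetical and are not drawn from the book. It computes the probability of a condition given a positive test result.

```python
def posterior_given_positive(prior: float,
                             true_positive_rate: float,
                             false_positive_rate: float) -> float:
    """Bayes' theorem: P(condition | positive test)."""
    # P(positive) summed over both cases: condition present and condition absent.
    p_positive = (true_positive_rate * prior
                  + false_positive_rate * (1.0 - prior))
    return true_positive_rate * prior / p_positive


# Hypothetical screening test: 1% of people have the condition, the test
# catches 90% of real cases and wrongly flags 5% of healthy people.
posterior = posterior_given_positive(prior=0.01,
                                     true_positive_rate=0.90,
                                     false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")
```

With these assumed numbers, a positive result still leaves the condition fairly unlikely, because false positives from the large healthy group outnumber the true positives.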
Agile Project Management in easy steps, 2nd edition
Data Science has a lot to learn from the development community, especially when it comes to delivering well-tested, production-ready data products. This book covers all the major methodologies under the Agile development banner (Scrum, Lean, XP, DSDM, etc.) and shows you in practical terms how they are implemented.
Proper version control is a must for data science. Git is the most popular version control software around. It’s extremely powerful, but the learning curve is steep. This free book is the best I’ve found at explaining the concepts of Git and guiding the reader through tutorials from the basics to advanced usage.