15 Apr 2019
Statistics is difficult. Of course it is: it is most of the actual science part of data science. But that doesn’t mean you can’t learn it by yourself if you are smart and determined enough. In this article, I am going to list 6 books that I recommend as a starting point for learning statistics. The first three are lighter reads. These books are really good for training your mind to think more numerically, mathematically and statistically. They also show really well why statistics is exciting (it is!). The second three books are more scientific, with formulas and Python or R code. Don’t get intimidated though! Mathematics is like LEGO: if you build the small pieces up right, you won’t have trouble with the more complex parts either! Let’s see the list!
When I first saw the title, You Are Not So Smart, I loved it already! This is a very well-written book that comes with many stories, and everything in it is based on real experiments and real scientific research. David McRaney introduces one sad but true fact of life: our brain constantly tricks us and we are not even smart enough to realize it. For an aspiring data scientist this book is essential, because it lists many common statistical bias types. It points out classic mistakes like the self-serving bias, the availability heuristic and the confirmation bias, and it also shows why people tend to be tricked by fake news and scams, or why people do not help when they see someone having a heart attack on a busy street. Being aware of these biases should be basic, but I see even practicing data professionals fall for them from time to time… (I wrote a detailed article about statistical bias types.)
The previous book was about why we are not so smart. But this one is about how to be smart! Think Like a Freak shows us how critical and unconventional thinking can lead to huge success… and, hey, that’s something that you, as a data scientist, should practice every day. The book lists a bunch of case studies from everyday life, goes into detail and analyzes why a solution to a problem is good or bad. Reading it will definitely boost your analytical thinking.
If you hated mathematics in middle or high school, it was for one reason: you had a bad teacher. A good teacher turns mathematical equations into mystical puzzles, probability theory into detective stories, and linear algebra into the ultimate solution for all the big questions in life. Luckily, I had really good math teachers, so I was always excited by mathematics and statistics. Looking back, this really affected my life. If you didn’t have a good math teacher, John Allen Paulos is here to make up for that loss: he’s the awesome teacher you wish you’d had. Innumeracy focuses mostly on one specific segment of statistics: probability theory and calculations. It explains the math behind it, shows the formulas and puts everything into a very logical context. And it does so by showing how these calculations relate to real life, so you can immediately understand the advantage of being more math-minded.
I have already highlighted this book in my previous article, but I can’t resist adding it to this list too. It’s the perfect transition between the previous light-read statistics books and the next two more scientific ones. Reading it, you can easily understand basic concepts like mean, median, mode, standard deviation, variance and standard error, or more advanced things like the central limit theorem, normal distribution, correlation analysis and regression analysis. Needless to say, all of these are wrapped in metaphors to make them easier to understand.
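If you want a feel for the basic concepts mentioned above before opening any of the books, here is a minimal sketch using only Python's built-in `statistics` module. It is my own illustration, not taken from the book, and the `sample` numbers are made up for the example.

```python
import math
import statistics

# Hypothetical sample: made-up daily page views for one week.
sample = [12, 15, 15, 18, 20, 22, 31]

mean = statistics.mean(sample)          # arithmetic average
median = statistics.median(sample)      # middle value of the sorted data
mode = statistics.mode(sample)          # most frequent value
variance = statistics.variance(sample)  # sample variance (n - 1 in the denominator)
std_dev = statistics.stdev(sample)      # sample standard deviation (square root of the variance)

# Standard error of the mean: how much the sample mean itself is
# expected to vary from one sample to the next.
std_err = std_dev / math.sqrt(len(sample))

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, std_dev={std_dev:.2f}, std_err={std_err:.2f}")
```

Notice that the mean is pulled above the median by the single large value (31). That is exactly the kind of intuition these books try to build.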
This is a relatively new book and it contains everything that a Junior Data Scientist has to know about the practical part of statistics. In my opinion, the biggest advantage of the book is its structure. It really makes it clear how things are built on top of each other. But it also goes into detail on the most common prediction and classification models, and it covers a bit of Machine Learning and Unsupervised Learning too. The book comes with R code examples, but if you don’t know R, that’s not a problem: you can simply skip those parts.
Topic-wise, Think Stats is really similar to Practical Statistics for Data Scientists. I wanted to have it on the list though, because even if the topic is the same, different writers usually approach things differently. On a topic as complex as data science, I think it’s worth seeing different angles and having things explained by two different data professionals. Plus, this is a book from 2011. It’s good to see how much the interpretation of (even these standard) things has changed in as little as 6 years. Oh, and I almost forgot to mention that Think Stats is available for free in PDF format, here: http://greenteapress.com/thinkstats/
By reading these 6 books you can get a solid understanding of Statistics for Data Science! What’s the next step to becoming a data scientist? You can read even more books: here are my 7 favorite data books. Or you can start learning to code in SQL or in Python.