3 Apr 2018
I'm often asked how one should get started in Data Science.
A degree? An internship? An entry level job? Writing code? Kaggle? Prayer?
While all of these are certainly viable strategies, if you're really serious about embarking on what is (I think, inarguably) one of the most exciting, interesting - and flat-out fun (not to mention ridiculously remunerative) - professions there is, you need to start with a solid foundation.
And that foundation, IMHO, comes from deeply understanding the conceptual underpinnings of what Data Scientists actually do.
In short, it's about (truly!) understanding and appreciating the math.
Now, for those of you who might have a mild case of arithmophobia, don't despair. It's not your fault - and it's not as hopeless as you might think. Most mathematical misanthropes are actually just victims of bad teaching - nothing more. There is a cure. Read on.
And for those of you who insist that Data Science is more about writing code than mathematics... Sorry, but you're just wrong. You're thinking of Data Engineering, or Software Engineering, or simply engineering excuses for why you don't want to take the time to learn what analytics is really all about.
Data Science is not about writing code. It's about the application of concepts.
...and as some of you have heard me say before: With a few hours, a bag of treats, and a shock collar, I can teach a cat to code a DL algorithm in TensorFlow. Writing code is trivial.
If you think I'm kidding, watch the video below...
As relatively simple as it is to bang out code (or, truth be told, to fork GitHub), understanding the conceptual underpinnings of what we do - and knowing how to apply those insights to real-world challenges... Well, that's the stuff that legends are built on.
In that spirit, I am pasting below an email response I recently sent to an aspiring Data Scientist who was interested in starting out on the right foot.
I hope you find it helpful.
Given your (laudable!) interest in wanting to first focus on shoring up your mathematical abilities - and your preference for video learning - I think you will enjoy this brief course from Duke University: Data Science Math Skills. It is free to audit (and only a minimal cost if you opt for the certificate).
If you find yourself wanting a refresher on some of the basic concepts, I highly recommend visiting Khan Academy. Their videos are absolutely excellent - and if you wish, they will take you (literally) from 1+1 to calculus and beyond.
If I could ask you one favor in this regard, it would be to please not hesitate to take several steps back anytime you are feeling confused. I have long found that the greatest impediment to success in mathematics comes from an unwarranted embarrassment on the part of learners to admit they are not understanding something (and, of course, the remarkably poor quality of mathematical teaching generally).
In my experience, students often become too self-conscious to admit they aren't fully understanding a concept, plow ahead, and then rapidly find themselves adrift. If you find you don't understand a concept, please go back - and back, and back, and back - until you feel comfortable.
Also keep in mind that while novels are meant to be read quickly and unconsciously, mathematics is intended to be chewed and savored.
Continually remind yourself that the unprecedented power of mathematics comes from just three things: Abstraction, Precision, and Compactness.
A single mathematical statement is intended to communicate the equivalent of pages and pages of prose - so of course, it should take much longer to read and comprehend.
On a related note: Worry far less about the actual calculations and more about understanding what the concepts are about (you would be shocked to find out how few statisticians actually understand variance, let alone more complicated concepts like kurtosis or heteroscedasticity). If it gives you solace, you should also know that most mathematicians are terrible at arithmetic (we make stupid little addition mistakes all the time!) - but fortunately for us, that is what machines excel at, so no harm done.
To that point: be kind to yourself on the little things - but diligent about learning the fundamental concepts.
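To make the "concept over calculation" point concrete, here is a minimal Python sketch of what variance actually measures: the average squared distance of each value from the mean. The data sets and the helper name `population_variance` are made up purely for illustration; the standard library's `statistics.pvariance` is used only to show that the machine arrives at the same arithmetic.

```python
import statistics


def population_variance(values):
    """Variance as a concept: the average squared distance from the mean."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    return sum(squared_deviations) / len(values)


# Hypothetical data, chosen only for illustration.
# Both lists have the same mean (10) but very different spread.
tight = [9, 10, 10, 11]
spread = [1, 5, 15, 19]

tight_var = population_variance(tight)
spread_var = population_variance(spread)

# The library does the arithmetic for us; the concept is what we must supply.
library_tight_var = statistics.pvariance(tight)
library_spread_var = statistics.pvariance(spread)

print(tight_var, spread_var)
```

Two data sets can share a mean and still behave very differently. Variance is the number that captures that difference, and knowing this matters far more than being able to grind through the sums by hand.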
If hitting "rewind" isn't working for you, I suggest you try an alternative source (perhaps something like Better Explained) to get a more grounded conceptual understanding.
And as you tackle each concept, I nearly insist you avail yourself of what has come to be known as the Feynman Method. If you find you can teach what you've learned to your cat, you'll be several steps ahead of the competition - and will have a grounding that will last your entire life. I got through grad school teaching complex concepts to my children - and still teach every new concept I wrangle with to my grandchildren.
If, after all that, you are still having challenges, come to me. Mind you, I'm not trying to limit our interactions; I've simply found that the best (and most gratifying) way to truly learn mathematical concepts is to wrestle them to the ground yourself, rather than just giving up right away and asking for help.
Once you are done with that course from Duke, I would recommend moving on to an absolutely wonderful course offered as a self-paced MOOC out of Stanford University, titled Statistical Learning. It is taught by the authors of a book by the same title, a terrific read that has already become one of the go-to classics for starting in Data Science: An Introduction to Statistical Learning (affectionately referred to by the "cool kids" in DatSci as ISL).
ISL, along with the accompanying R code, data sets, etc., is also available as a PDF for free (!) here.
Once you are done with ISL, you will be ready to move on to the next volume in the series, ESL (The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition) - but I highly recommend you not rush through either one.
Again, this is material that should be chewed and savored. Get a firm grounding in the fundamentals, and you'll find that adding on additional techniques in the future will become nearly trivial - and a whole lot more fun.
There's a whole lot more math to come (Just wait till we get to proofs! That's where it starts getting really interesting!). But this should get you started for now.
Thanks, JT, for this insight and for letting us share it in our Learn section. I'm sure people will get a lot of value from this!