How did I get there?
Why this post?
I have been working in IT for 15 years now, including more than 10 years at the same company, where I was a Java Technical Leader and, in the last few years, managed various IT teams (starting with support and assistance and finally leading the technical teams in charge of the infrastructure of our company’s European e-commerce websites).
During these years I have accompanied the various technological changes: the arrival of web services and SOA, SPAs and mobile applications, micro-services architectures, the rise of the cloud and containerization, Infrastructure as Code and so on.
So for those who do not know me very well, and do not know that AI has always been a subject of interest in which I invest personal time, the question burning on their lips is inevitably: “but…why?”
It’s even more true in France, where reactions are sometimes something like: ‘why the hell, when you “have a good position”, would you leave it to do ….. wait, could you remind me what you are actually doing again?’
Well, first, let’s say that this new career is no less valuable, it’s just something new, something else, and I am proud to have had the courage to leave my comfort zone to embark on this new adventure.
I thank the company I work for, which gave me the opportunity to realize this dream, something that would probably not have been possible elsewhere in France.
No other French company would have hired me to do something other than what I had already done in the past.
This new adventure began 3 months ago and since then I have already been approached by several people from my more or less close circle who told me: “what luck you have, DataScience is a subject I am interested in but I do not know where to start….”
That’s right. The same thing happened to me.
So I shared with them the following information to help them enter this wonderful world of DataScience.
This gave me the idea to write down this path so that it is easier to share with the next person who asks me this question.
How did it happen?
Before moving forward, let me tell you how it all started and what a strange coincidence it was. I am passionate about running. It is my favourite sport and I run (actually I used to run) several times a week, sometimes up to 55 miles (90km). Working at DECATHLON, the world leader in sports, I was lucky enough to be able to exercise this passion at lunchtime because in the workplace there is everything you need to practice and shower.
Unfortunately, what was bound to happen happened and I hurt myself. And that was the big void: “what am I going to do with all this free time at lunchtime?” I could have binge-watched all the series I was behind on Netflix but I decided to dig deeper into this somewhat buzzy subject: Machine Learning.
Note: if I had not hurt myself maybe I would still be running at noon and I would never have really taken the plunge. Perhaps one of my other selves in an alternative reality is actually still running…Who knows…
Why am I telling all this? Because, it must be said, learning DataScience is a process that might be long depending on your knowledge and skills. You will need to be brave and patient. But if I have succeeded, there is no reason why you should not.
Long is the path, hard is the way…
Starting point
I found this article on the elitedatascience blog: “How to Become a Data Scientist”. It helped me a lot to structure my learning. I will not copy/paste what they say, so take the time to read it and come back here later.
DataScience is at the crossroads between business, statistics/maths and IT (programming in particular). So for me it was like:
- IT: let’s say it’s ok
- Maths:…ouch! It’s been a while…
- Stats: roughly…
OK, 1 out of 3, hummm, you sure you wanna do that?
1st thing: learn Python!
I think there is no debate anymore between R and Python [1][2]. If you know R it is great and it will probably help you, but if you don’t, wait a while before investing time in it. Indeed, Python offers many libraries to facilitate the development of DataScience projects. Sorry R defenders, no offense.
I used to be a developer in my previous jobs and so I know OOP (Object-Oriented Programming). Sure, knowing another language helped me to learn Python, but you do not need to learn OOP to do Machine Learning. For the basics just grab some basic tutorials on the Internet; they will be sufficient to learn the syntax.
Math refresher!
I read in several blogs that you do not need a PhD in math to do DataScience, I was a little skeptical but I now want to believe them.
I had to revise my math lessons in linear algebra (Khan Academy is fine for that, otherwise YouTube is your friend).
Let’s just say that today it’s not a barrier to understanding algorithms or to discussing with my colleagues who have done DataScience-oriented studies and who are probably better than me at math ;-).
I would say that what is important to understand is gradient descent, and there is a multitude of good explanations on the net [3][4][5]. More globally, the most important thing is to understand what is happening mathematically in order to understand why the algorithm works or why it simply does not work.
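To give a feeling of what gradient descent does, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the function name `gradient_descent`, the toy function f(x) = (x - 3)^2 and the values of `learning_rate` and `steps` are all made up for this example, and real libraries do this in a far more elaborate way.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`.

    `grad` is a function returning the derivative at a point.
    All names and default values are illustrative assumptions.
    """
    x = start
    for _ in range(steps):
        # Move a small step in the direction that decreases the function.
        x = x - learning_rate * grad(x)
    return x


# Toy example: f(x) = (x - 3)^2 has derivative f'(x) = 2 * (x - 3)
# and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)  # very close to 3
```

If you play with `learning_rate` you will see the two classic failure modes: too small and it barely moves, too large (above 1.0 for this toy function) and it diverges instead of converging.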
MOOC - Videos
There are many online courses available (Coursera, DataCamp, Udacity, Openclassrooms), all with different content and prices.
Nowadays the Internet is a fantastic tool that can offer videos of people throwing bricks into running washing machines to see what happens, as well as people who will take their personal time to explain advanced mathematical concepts to you.
While browsing the elitedatascience blog which I have already mentioned, I found this link to a YouTube playlist containing all the videos from the machine learning course given by Andrew Ng (a rock star in the DataScience field) and I viewed them all. All of them. And some of them even several times.
Of course there are no exercises, but I also found this blog by a person who had fun redoing the exercises in Python. And that is really cool!
3Blue1Brown also provides a wonderful series of videos about Linear Algebra.
Also have a look at the videos from Siraj Raval (Math of Intelligence).
Learn Python for DataScience
To learn the Python packages often used in DataScience, I would highly recommend the Kaggle tutorials.
You will just need to create an account on Kaggle (but it’s free).
Practice with caution!
You will read in many blogs that it is certainly important to read articles, to do research, to be informed. That is true.
But it is also said that you will only really progress by practicing. And that’s also true.
There are a lot of algorithms [6] and a lot of interesting domains, so do not try to master them all: pick one, read about it, try, experiment, then move on to another one or go deeper if you want to focus on this particular one.
My heart will go on!
My very first project was to follow a very popular and well-known tutorial: the Titanic challenge. It is a tutorial proposed by Kaggle, a platform that hosts DataScience competitions and also offers several very interesting tutorials; it is a privileged meeting place for DataScientists from all over the world.
The goal is quite simple: do you know the Titanic? You know it went pretty badly wrong, right? Well in this exercise you are given a list of passengers and you have to build and train an algorithm that will predict whether the person survived or not. It’s not a very joyful context but it is an excellent exercise to start with: not too simple or too complex. And of course if you like you can even submit your results on the platform and compare yourself with other DataScientists!
For my first project I was very inspired by these two links, here and there. If you’re interested, here is my project on github.
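To make the exercise a bit more concrete, here is a small sketch of the kind of first baseline you can write before training anything. It is not my project and it is not a trained model: it is a hand-written rule (“predict that women survived”) that a real model should then try to beat. The column names are the ones from Kaggle’s train.csv, but the rows below are hypothetical, invented for this illustration, and the helper names (`load_rows`, `predict_survived`, `accuracy`) are mine.

```python
import csv
import io

# Hypothetical rows for illustration only (NOT real passenger records).
# Only the column names follow Kaggle's train.csv.
TOY_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,0,1,male,54
5,1,2,male,8
6,0,3,female,30
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per passenger."""
    return list(csv.DictReader(io.StringIO(text)))


def predict_survived(row):
    """Rule-based baseline: predict 1 (survived) for women, 0 otherwise."""
    return 1 if row["Sex"] == "female" else 0


def accuracy(rows, predict):
    """Share of rows where the prediction matches the Survived column."""
    correct = sum(1 for r in rows if predict(r) == int(r["Survived"]))
    return correct / len(rows)


rows = load_rows(TOY_CSV)
baseline_accuracy = accuracy(rows, predict_survived)
print(f"Baseline accuracy on the toy rows: {baseline_accuracy:.2f}")
```

With the real train.csv you would read the file instead of `TOY_CSV`, and then replace `predict_survived` with a model trained on features such as `Pclass`, `Sex` and `Age`.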
Is it a dog or a cat?
Then I wanted to discover neural networks, and for that I threw myself into another very classic challenge: build and train a model that will classify images and recognize whether a new image is a picture of a cat or a dog.
There is also a multitude of tutorials on the Internet. I was personally inspired by an excellent blog dedicated to computer vision, pyimagesearch by Adrian Rosebrock and especially this post “How to get started with Keras, Deep Learning, and Python”.
Note that although this topic is now very common, it was still something very challenging a few years ago. As Jeremy Howard (founder of fastai) explained in his course, “in 2012, top researchers from Oxford University reached accuracy of 59% with a very specific model. In 2018, with basically about three lines of code, we got 94%”.
Some other interesting resources
There are plenty of blogs and websites talking about Machine Learning. I do not pretend to list them all; here are the ones I often use:
- Kaggle of course
- towardsdatascience
- machinelearningmastery: with a lot of posts titled “gentle introduction to …”.
- elitedatascience
- pyimagesearch: computer vision with opencv and neural networks.
- fastai: more advanced but interesting videos by Jeremy Howard.
To name but a few.
Of course Google, StackOverflow or Medium are still your best friends forever.
If you have read this until the end, thank you. Hope this helps!
Author: nidragedd
- Python vs. R - Choosing the Best Programming Language for Data Science
- Gradient Descent for Machine Learning on machinelearningmastery
- Gradient Descent or How Neural Network is learning? by 3Blue1Brown
- Gradient Descent derivation by Chris McCormick
- Modern Machine Learning Algorithms: Strengths and Weaknesses on elitedatascience