Advice for a Future Data Scientist

İlkem Erol

August 9, 2021

What is this about?

There is a lot of great advice out there for data scientists and for people working with data in general. But I noticed that some of the points that matter most to me were missing. So why not write a list of my own? This is for people who are planning to become data scientists. I think most of it applies to programmers and big data folks as well.

Today is a good day to be a data scientist (or ML engineer or researcher). And the resources at your disposal are enormous (all those courses, libraries, books, videos, tutorials, …).

But the landscape is changing rapidly. The bad news is that the cool state-of-the-art library or model you are using today might be outdated tomorrow. The good news is that you won’t ever get bored around here. Another thing on the bright side is that knowing your fundamentals and having the right mindset will go a long, long way.

And this article is mostly about the right mindset for a data scientist.

Here is the list of advice before we start:

  1. Be curious
  2. Don’t forget the “science” of data science
  3. Be aware of how other people perceive you
  4. Know what you are working with very well
  5. Be aware of the hypes
  6. Tell a story
  7. Gain a T-shaped proficiency
  8. Be pareto-efficient
  9. Learn to learn
  10. Have a growth mindset
  11. Oppose your own ideas
  12. Be mindful of the changing world

Those who want the short version can read just the bold and italic parts. But where’s the fun in that?

First technical stuff

Let’s get these out of the way. Chances are there are hundreds of good articles that can give you good advice on the technical aspects of data science and machine learning (probably better than what I might tell you now). I will attach some links at the end of the article. And you can surely google more.

But here is a quick summary anyway:

  • Which technologies should I learn?
    – The Python ecosystem is currently the best, IMO. Know how to use things like scikit-learn, pandas and numpy. Knowing several visualisation libraries such as matplotlib, plotly and seaborn will help too.
    – Know at least one framework that you can build deep learning models with. TensorFlow/Keras or PyTorch: pick one.
  • Should I know statistics?
    – First of all: no, machine learning is not just glorified statistics.
    – Well, yes, because you will be using statistical concepts on a daily basis.
    – No, because you don’t have to know everything that a statistician knows.
    – Yes again, because it makes your foundation a lot stronger.
    – What I can really recommend is to learn statistical learning.
    – Make sure you know your general statistics terminology, probability, distributions and hypothesis testing.
  • How much maths should I have to know?
    – There are some topics that every data scientist or ML engineer should be familiar with, like calculus, linear algebra, trigonometry, series, sums, …
    – It really depends on what you are going to do. There are those who need to understand and use existing methods, and there are those who need to research and develop new ones. The more you lean towards the research side, the more helpful extra maths will be.
  • Is programming important?
    – Yes, very much so. But again, its importance will vary with what you do and whether or not you deploy solutions to production.
    – Either way, learn the basics of your language, object-oriented programming, how to use version control, how to test your code and how to build a simple web server.

How should I start?

¯\_(ツ)_/¯

Again it really depends on where you are at and where you want to be. There is no one single right path to do it. I think you are good if you are pushing yourself forward.

But here are some tips anyway:

  • Learn programming
  • Strengthen your core
    It can be courses at school, online courses, books or even YouTube. There are plenty of good resources. Spend time learning the basics before jumping into the state-of-the-art and shiny stuff. For online courses I can recommend fast.ai, Coursera and Udacity nanodegrees.
  • Get your hands dirty
    Find and follow some tutorials. Examine some code. Try a Kaggle competition. Try a model based on your own idea. The process of learning will be much more valuable than the outcome, trust me.

Actual Advice for a Future Data Scientist

Now finally some non-technical but useful advice. I think these will help you regardless of the state of the art and popular technologies.

1. Be curious

If you do not regard yourself as a curious person, maybe this is not a job suited to you. The world of data is surely full of wonders, but not on most days.

I think data mining is a useful analogy here. You have to mine through all the dirt and the dust to get to the ore. And as a wise man once said:

All that is gold does not glitter,

It is also like taking an adventure in the sea. But do you know what the sea mostly is? It’s water, lots of water.

So what makes you endure the sea of boredom is your curiosity. So be curious!

  • How does the world work?
  • How do these new technologies work?
  • What do the data you have tell you?
  • Where is the world heading?

If you let curiosity be your compass, you will eventually meet the promised wonders and treasures. So as the wise man continued:

Not all those who wander are lost;

2. Don’t forget the “science” of data science

There are lots of shiny, new and wonderful technologies out there, and new ones arrive every day. All those attention models, GANs, feature pyramids, memory networks, proximal policy optimisations, … They can really make you want to use them on whatever problem is at hand.

But the science part of data science is sometimes the underappreciated little brother of the family. Chances are, adopting the principles of the scientific method will help you more than you know.

Treat your work as a scientific project as best you can, and keep two things in mind: hypothesis testing and scientific scepticism. Sometimes the most dangerous enemy of your model’s success is the biases in your own mind. And they can even enter your data or your model.

Perhaps it is a good time to tell the story of John Snow. I’m not talking about the one who knows nothing, no. I am talking about the real John Snow who is arguably the first data scientist in the world.

It was 1854. There was no sewer system. A cholera outbreak was on a killing rampage in London. This was the era before germ theory was a thing, and people thought sickness was caused by bad air. So they tried to use nice smells to fight the disease (which obviously didn’t work well). And people were dying. This man, John Snow, discarded the theories of his time and listened to what his data had to say. He marked the cholera cases on a map.

He noticed the cases were not random at all. He had a hypothesis that this was related to water. And once the water source closest to the cholera cases, the Broad Street pump, was taken out of use, the outbreak was contained.

The moral of the story is: if you are sceptical about what people have to say, and you are willing to look at the data, and you have a hypothesis you can test, you might possibly save a lot of lives.
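To make the hypothesis-testing idea a bit more concrete, here is a minimal sketch of a permutation test in plain Python. The numbers are made up for illustration and are not John Snow’s actual data; the function name `permutation_test` and its parameters are my own choices. It asks: if distance to the pump made no difference, how often would shuffling the group labels produce a gap in average case counts at least as large as the one observed?

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a one-sided p-value for mean(group_a) > mean(group_b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so we shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1
    # The add-one correction avoids reporting an impossible p-value of 0.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical cases per street (illustrative only, NOT historical data).
near_pump = [5, 7, 6, 9, 8, 7, 6, 10]
far_from_pump = [1, 0, 2, 1, 3, 0, 2, 1]

observed_diff, p_value = permutation_test(near_pump, far_from_pump)
print(f"observed difference in means: {observed_diff:.2f}")
print(f"estimated one-sided p-value: {p_value:.4f}")
```

A small p-value here only says the gap is unlikely to come from random labelling. It does not by itself prove the water is the cause, which is where the scepticism part comes in.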

3. Be aware of how other people perceive you

Knowing how other people think of you can be a powerful thing. It helps you decide how to approach them and what to expect from them.

For customers, you are a spoiled kid with a laser gun. They think you can do most of the things you do only because you have a powerful laser gun. And if they happened to have one, they could easily do those things too. But they are too busy to try anyway. You will hear things like “It’s just machine learning.” “It’s just statistics.” “I recently read an article, they can even do ….”

For co-workers (who are not data scientists), you are usually a wizard. You can do things that seem strange to them. And just because you can do those things, they might expect some other miracles from you. For most people, what you do and how you do it is a mystery.

For human resources, you are a unicorn. There are rumours that unicorns exist. They are certainly trying hard to find one, but sometimes they are disappointed when they see you. Because you are not actually a magical beast.

Of course these stereotypes are not always true or they might be mixed up a little bit. But the bottom line is, you should communicate with “the idea of how you are perceived” in your mind. You might want to avoid feeding some misconceptions and might possibly break some of those.

4. Know what you are working with very well

Know what you are working with. Know how the technologies you are using work. For example, take time to understand how TensorFlow does certain things under the hood, or why CNNs perform so much better on image data. Take time to understand the domain you are working in.

Be the kid that opens up his/her toys. And sometimes even breaks them.

It is necessary to have a conceptual understanding of what you are working with. But if you happen to have a mathematical understanding of things, it can be like a superpower.

5. Be aware of the hypes

I am not saying that all this deep learning stuff is hype. But when you look at the history of AI and neural network research, there are certainly ups and downs. There are unrealistic expectations that are impossible to meet. As fantastic as these models are, they might not be the solution to all of your problems.

If all you have is a hammer, everything looks like a nail.
– attributed to Bernard Baruch (the idea is also widely credited to Abraham Maslow)

So don’t let your favourite model become your hammer. Sometimes even logistic regression might be an excellent solution.
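To show how little machinery a logistic regression baseline needs, here is a minimal sketch written in plain Python with batch gradient descent. Everything in it is an assumption for illustration: the toy data is made up and already centred, and the helper names (`fit_logistic_regression`, `predict`) are my own. In a real project you would more likely reach for scikit-learn’s `LogisticRegression`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, learning_rate=0.1, n_epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(n_epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias)
            error = p - yi  # derivative of the log loss w.r.t. the logit
            for j, x in enumerate(xi):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(X, weights, bias):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return [
        1 if sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) >= 0.5 else 0
        for xi in X
    ]


# Made-up, already centred toy data with two features per sample.
X = [[1.0, 0.5], [1.5, 1.0], [0.5, 1.5], [2.0, 0.2],
     [-1.0, -0.5], [-1.5, -1.0], [-0.5, -1.5], [-2.0, -0.2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_logistic_regression(X, y)
predictions = predict(X, weights, bias)
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(f"weights={weights}, bias={bias:.3f}, training accuracy={accuracy:.2f}")
```

If a baseline like this already does well enough on a held-out set, the fancier model has to earn its extra complexity.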

6. Tell a story

We are proud of our technical achievements, for sure. There are few things that can beat solving a hard problem. But solving real-life problems unfortunately has non-technical parts.

When working with data, we have error functions to measure how well we are doing, and we try to optimise them. But one thing the error function might not measure is the usefulness of your model, or how successful it is perceived to be. Predicting the results with good accuracy might just not cut it.

Pouring in accuracy figures, error metrics and confusion matrices will convince some people, but not everyone you need to convince. You are simply not speaking their language.

Stories are everybody’s universal language. Put your results into frames. Provide scenarios and tell stories so that people can understand what you do. Use great visualisations for bonus points. Your job is not just with data. It is inevitably with other people too.

7. Gain a T-shaped proficiency

Know a wide variety of things, because you never know what kind of problem you might encounter. Most of the time the solution is a combination of different things.

Also be deeply proficient in at least one area. It gives you the ability to solve some problems that most others can’t. This can be powerful for your career. And it also has an overlooked side effect: knowing something well proves that you have the capacity to do so. It proves you can do it again if necessary.

8. Be pareto-efficient

Building a model is like shopping online. You can always find a better one by spending just a little more money. And a little more, and a little more, … and you are way over your budget. What I mean is, you can always put in more time trying to make your model better.

The Pareto principle says that, for most of the things you do, roughly 20% of the work provides 80% of the results, while the remaining 80% of the time you spend provides only the last 20%.

Even though that low-yield 80% often cannot be avoided entirely, being mindful of which part you are working on always helps. Always think about how you can get a better outcome from the time you spend.
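As a rough illustration of that arithmetic, here is a small sketch that ranks tasks by estimated gain per hour and reports how much of the total effort is needed to reach 80% of the total gain. The task list, the hours and the gain figures are all invented for the example, and the function name `effort_for_target` is hypothetical. Ranking by gain per hour is a simple greedy heuristic, not a guaranteed optimum.

```python
def effort_for_target(tasks, target_share=0.8):
    """Greedily pick tasks by gain per hour until the target share is reached.

    Each task is a (name, hours, gain) tuple. Returns the chosen task names,
    the share of total hours they use and the share of total gain they reach.
    """
    total_hours = sum(hours for _, hours, _ in tasks)
    total_gain = sum(gain for _, _, gain in tasks)
    ranked = sorted(tasks, key=lambda task: task[2] / task[1], reverse=True)

    chosen, hours_used, gain_reached = [], 0.0, 0.0
    for name, hours, gain in ranked:
        if gain_reached >= target_share * total_gain:
            break
        chosen.append(name)
        hours_used += hours
        gain_reached += gain
    return chosen, hours_used / total_hours, gain_reached / total_gain


# Invented numbers: (task, hours spent, estimated share of the final result).
tasks = [
    ("clean obvious data errors", 4, 32),
    ("simple baseline model", 4, 36),
    ("feature engineering", 8, 14),
    ("hyperparameter search", 24, 11),
    ("ensembling", 40, 7),
]

chosen, hours_share, gain_share = effort_for_target(tasks)
print(chosen)
print(f"{hours_share:.0%} of the hours give {gain_share:.0%} of the gain")
```

In real projects you rarely know these numbers in advance, so treat the estimates as guesses and revisit them as you go.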

9. Learn to learn

Give me six hours to chop down a tree and I will spend the first four sharpening the axe.
– Abraham Lincoln

Every minute you spend solving a specific problem is used up on that problem. Likewise, the time you spend learning the details of the latest technology mostly earns you just that. And trying to make sense of new technologies without the proper background is like trying to cut down a tree with a dull axe.

The time you spend improving your basic and theoretical knowledge will not directly give you the solution to a particular problem. But the stronger your foundation is, the faster you will be able to learn those new things. It is just like sharpening the axe.

If you can learn how to learn, however, you improve the speed at which you acquire any knowledge or skill. It is like making your whetstone better. It shortens the time it takes to sharpen every axe. So what should you do? Expose yourself to a variety of topics (not only technical stuff), read a lot and be open-minded about new ideas.

10. Have a growth mindset

In this field you will be working (at least virtually) with some of the most brilliant people of your time. There will be some geniuses among them. Things you sweat over might be a piece of cake for them. And at some point you will inevitably compare yourself with them. Sometimes you will feel stupid. But you know what? Most of us do. This even has a name: imposter syndrome.

Good news is, you don’t have to be the best. You just need to be good. You need to be useful and solve problems. Everyone tries to do so in their own world.

So you will face problems and feel stupid, as mentioned before. But one distinguishing quality that is a predictor of academic and work success is having a growth mindset.

When they fail, some people think they have reached the limits of their mental capacity and move on to other things. They decide they were just not born with it or built for it. Other people, those who have a growth mindset, know it’s not just about capacity. They know their mind is flexible. Given more time and systematic effort, they will be able to learn what previously seemed too confusing. You will most probably have this feeling when reading academic papers. At first you will fail to understand. But the bottom line is that if you put more time into it, it will get easier. You only need to know that your mind is flexible and can grow.

11. Oppose your own ideas

This is about critical thinking and scientific scepticism. Before your ideas and models are challenged by the data or the real world, it is a lot easier and faster to challenge them yourself.

Basically I am giving the same advice that Lord Petyr Baelish gave to Sansa:

Don’t fight in the North or the South. Fight every battle everywhere, always in your mind. Everyone is your enemy, everyone is your friend. Every possible series of events is happening all at once. Live that way and nothing will surprise you. Everything that happens will be something that you’ve seen before.

12. Be mindful of the changing world

With the invention of the steam engine and the industrial revolution, human muscle power was no longer a defining quality. Machines were much stronger and they never got tired. The only valuable quality left to our kind was our mind: our ability to learn, think, calculate and solve problems. And now more than ever, this aspect of our existence is being challenged.

More interestingly, as machine learning people, we have a part in this. I sometimes think of us as the labourers who built the steam engine. They were building something that would make them less useful, but they were also building something that all of humanity has gained from.

Having an interest in cognitive science and having learned a bit about psychology and neuroscience, I think we are not that near to machines being able to do everything humans can. They are good at things we find hard, but they have a hard time doing things we find easy. (Think about how AI beats people at many of the games humans invented, while people are still better at plenty of things we consider easy.)

So have a good understanding of what computers are good at and what humans are good at. Do not put too much time into things computers can do easily (like becoming an expert at multiplying numbers). Do not be the labour force of the new era.

Some useful links about more technical aspects to get started in data science. You can find much more by searching.

