Advice for the modern data scientist going into 2020!

Updated: Dec 21, 2019




As an AI consultant, I have worked on dozens of data science projects over the past few years. I have made many mistakes and learned from them, shifting my focus toward value-based, investigative data science and away from data engineering.


As a young data scientist, I was very excited to use the new toolbox and get on with the building and engineering. These projects looked impressive and had some cool, shiny features, but they ultimately failed to be incorporated into the client's business processes. I felt that my role was to build new things as fast as possible. I was wrong. Even if that was, or seemed to be, the reason I was hired, it falls on me, the expert, to push projects in more value-based directions.


My updated value-added data science process


1. Investigate

2. Find a signal

3. Get stakeholders involved

4. Preach


This is how I see the role of a data scientist. Data engineering is something else; I will go into that at some other time. The data scientist, by definition, has to investigate with data, raise business hypotheses, and set benchmarks. Using the full stack of modern tools and methodologies, she will get to the signal, the business value, faster than with more traditional tools.


You can do this part of the process more or less on your own. Once you see something and have tested some of the hypotheses you raised, you will need to bring key stakeholders into the process. They may be other data scientists and engineers who can help you; you also want to bring someone from the business side into the loop. Only do this once you have seen something, though. If you pull people in too soon and nothing comes of it, they may not come around the next time you need them.


Now you have a team, and everyone has a role and a stake in the project. You cannot do this alone, and you should not do this alone. As a team you will work towards strengthening the signal, clarifying the business value, and putting together a proposal for the implementation.


Here comes the preaching. You and your team have worked hard and would like to see the fruits of your labor implemented. You will go around to other departments, investors, and higher-ups to show off your work. It is not enough to do this once or twice. Everyone else is busy with their own projects and is reviewing a handful of projects like yours daily. It is your responsibility as a team leader to keep banging on doors, modifying the pitch, and looking at it from different angles until it finally gains some traction. This may be one of the most important steps. No one else will be able to present and sell your findings. It is your vision and your work, so go sell it.


Getting started or improving your data science skills - Part I


OK, I digress. Now that you know what it means to be a data scientist, let's move on to becoming one. If you are just starting out, or wondering whether this is the right career path for you, what skills do you need?


I will cover just the minimal tech skills you will need to get your foot in the door. To expedite the signal-finding part, you will need to master one of two programming languages, R or Python. Choose either one; if you are not a software engineer and come from an academic background, R is easier to get started with. Here is a book I use for teaching data science classes to get you started: https://r4ds.had.co.nz/. It is written by Hadley Wickham, Chief Scientist at RStudio, the IDE we use. If you master this book, you are about 70-80% of the way there. You basically want to be a ninja when it comes to data wrangling, data visualization, and dynamic reporting with R Markdown. Do not move into machine learning until you have mastered all of this. Moving on too early is one of the biggest mistakes that data scientists in training make, and new data scientists make it too.


To gain confidence, to make yourself more marketable, and to make the learning more interesting, you will want to get your hands on some real data sets. There are plenty of readily available data sets out there. Find something that interests you enough to spend a few weeks working with it, raising a series of hypotheses and testing them. Based on the findings, write a few long-form Medium posts and find a community where you can give presentations. The presentations, writings, and publications should be about problem-solving and the process that leads to signal generation.
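

To make "raising a hypothesis and testing it" concrete, here is a minimal sketch in Python, using only the standard library. It is one possible approach, not the method from the book above: a permutation test that asks whether two groups differ in their average. The data set, the variable names, and the hypothesis ("weekend orders are higher than weekday orders") are all made up for illustration.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns the observed difference (mean of a minus mean of b) and the
    share of random relabelings that produce a difference at least as large.
    """
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values, then split them into two fake groups
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical daily order counts, invented for this example
weekday_orders = [12, 15, 14, 10, 13, 16, 11, 14]
weekend_orders = [18, 21, 17, 22, 19, 20, 23, 18]

observed, p_value = permutation_test(weekday_orders, weekend_orders)
print(f"difference in means: {observed:.3f}, p-value: {p_value:.4f}")
```

A small p-value here only says that the gap is unlikely to come from random labeling. Whether the gap matters to the business is the part you still have to argue in your write-up.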


In follow-up posts I will delve more into business advisory and machine learning tips. If you are building an AI product and need an advisor to help you move in the right direction, feel free to connect on LinkedIn (https://www.linkedin.com/in/andi-shehu/) or send an email to ashehu@byteflows.com.