Elements of a Good Data Science Project

I watched this video some time ago, and it was really informative because the speaker was talking from the perspective of an employer or hiring manager. He covered what a complete data science project should and shouldn't have, and what you should and shouldn't do when carrying one out. These are the points he listed:

  1. Don't work with ready-made datasets like the ones found on "kaggle.com". It's better to find, clean, and wrangle data you pulled yourself through web scraping or working with APIs.

    Personally, I've done some web scraping using beautifulsoup4 and selenium. It's better because you get real-time data instead of data that has already been cleaned by someone else, and it shows your employer that you can work with real-world data.

  2. He also says you should be able to work with databases in the cloud, such as GCP (Google Cloud Platform), AWS (Amazon Web Services), Microsoft Azure, etc. Carrying out projects in the cloud gives you an edge because it shows employers that you know the cloud and can work with databases there, which is an important requirement when working with "big data" like you would in the real world.

  3. He talked about building models. Employers don't want to know whether your model is 99.99% accurate; they want to know your reasons for using that model. Questions such as "Why did you pick your model?", "How did you clean your data?", "What assumptions does your model make?", "How did you test your model?", and "Can you explain the math behind your model?" will come up, and it'll be good if you can answer them.

  4. Deploying your project. He talks about deploying your project, and this is very important because most people aren't going to take you seriously with just a notebook to show for your work 😅. Learn how to deploy your project so it can be used and viewed by others. You can deploy it as a web app using technologies like Streamlit, Django, Flask, or FastAPI, and host it on platforms like Heroku or Streamlit. This will go a long way and show employers that you can both create and deploy models.
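To make point 1 a bit more concrete, here's a minimal sketch of the "pull it yourself, then clean it" workflow. It isn't from the video. To keep it self-contained I've swapped beautifulsoup4 for Python's built-in `html.parser`, and the HTML snippet, the `TableCellParser` class, and the `clean_price` and `scrape_products` helpers are all made up for illustration. In a real project the HTML would come from a live page or an API response.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for HTML fetched from a real site.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td> Laptop </td><td>$1,200.50</td></tr>
  <tr><td>Mouse</td><td>N/A</td></tr>
  <tr><td>Keyboard</td><td> $45 </td></tr>
</table>
"""


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # cells of the row being read, or None
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows only hold <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def clean_price(raw):
    """Turn a scraped price such as ' $1,200.50 ' into a float, or None."""
    text = raw.strip().replace("$", "").replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None  # keep missing or malformed prices visible as None


def scrape_products(html):
    """Parse the table and return one cleaned record per data row."""
    parser = TableCellParser()
    parser.feed(html)
    records = []
    for name, price in parser.rows:  # assumes exactly two cells per row
        records.append({"product": name.strip(), "price": clean_price(price)})
    return records


if __name__ == "__main__":
    for record in scrape_products(SAMPLE_HTML):
        print(record)
```

The part worth talking about in an interview is the cleaning choices: here a price that can't be parsed becomes `None` instead of being silently dropped, so you can explain how missing values were handled later on.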

One more thing to note: whatever project you carry out, it should always be for the purpose of gaining insight or meaningful information, because that's what data science is about.
