Ask any experienced data scientist or machine learning engineer what takes up the most time in their job, and I’d guess many of them will say data preprocessing: the step that cleans up the data and prepares it for subsequent analysis. The reason is simple: garbage in, garbage out. That is, if you don’t prepare the data correctly, your “insights” from the data can hardly be meaningful.
Although data preprocessing can be rather tedious, pandas provides all the essential functions we need to complete our data clean-up job relatively easily. However, because the library is so versatile, not every user knows all the functionality it has to offer. In this article, I’d like to share 3 lesser-known, yet super useful, functions that you can try in your data science projects.