“Over half of the time, analysts are importing or cleaning data.”
— Numerous John and Jane Does among data analysts
Data these days can flow in from various sources: the web, databases, local files, user input, and so on. Analysts often have to work with data in a variety of formats and make those formats compatible with each other before any analysis can happen. Though sometimes considered a data engineer’s work, data preparation is still an essential skill for all data analysts, especially those who work in small to medium-sized firms (as I do now).
I am going to introduce data reading and manipulation with the pandas library in Python 3. I have recently worked extensively with pandas and have started to realize how powerful its components are. In this post, I will cover the one I use most frequently: groupby().
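As a quick preview, here is a minimal sketch of what groupby() does. The table, its column names (`region`, `amount`), and its values are made up for illustration and are not data from this post.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "amount": [100, 250, 150, 50, 200],
})

# Split the rows by region, then sum the amounts within each group.
totals = sales.groupby("region")["amount"].sum()
print(totals)
```

The result is a Series indexed by region, with one total per group. The same split-then-aggregate pattern works with other aggregations such as mean() or count().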