Recently I completed the Leetcode 30-day Pandas Challenge. Pandas is a popular Python library for data analysis, and Leetcode has put together a set of problems to help you learn it. Here I share my thoughts on the problems and on whether you should try the challenge too.
Some background: before this challenge I felt I had a good understanding of Pandas. I understood what dataframes and Series are and how pandas builds upon NumPy datatypes. I had also used pandas professionally to build ETL pipelines.
After the challenge I felt I had a deeper understanding of pandas. I am now much more comfortable renaming columns in a dataframe, changing column datatypes, merging frames SQL-style, and solving the kinds of tasks I may run into daily.
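As a rough illustration of two of those everyday operations, here is a small sketch that changes a column's datatype and then does an SQL-style LEFT JOIN with `merge`. The `employees` and `salaries` tables and their columns are hypothetical, made up only for this example.

```python
import pandas as pd

# Hypothetical tables, invented for illustration only.
employees = pd.DataFrame({"emp_id": ["1", "2", "3"], "name": ["Ann", "Bo", "Cy"]})
salaries = pd.DataFrame({"emp_id": [1, 3], "salary": [5000.0, 6200.0]})

# Change a column's datatype (strings -> int64) so the join keys have matching types.
employees = employees.astype({"emp_id": "int64"})

# SQL-style LEFT JOIN: keep every employee; salary is NaN where there is no match.
joined = employees.merge(salaries, on="emp_id", how="left")
```

The `how` argument plays the role of the SQL join type (`"inner"`, `"left"`, `"right"`, `"outer"`).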
The set consists of 30 problems. Behind the scenes the problems are the same as existing SQL problems on the website, except that you have to solve them with Python and pandas. As a result, many of the comments on the problems are about SQL solutions, which can be annoying for someone trying to learn only pandas.
(SQL comments mixed with Pandas comments)
Also, because the problem statements were written for SQL, you end up getting very comfortable with the pandas APIs. For example, in many problems you have to construct the final dataframe with specific column names.
As a Python data analyst, you would rarely have to name columns like this.
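A typical pattern is to compute the result and then rename columns so they match the required output exactly. The sketch below assumes a hypothetical `sales` table and a hypothetical required column name `total_amount`; neither comes from an actual challenge problem.

```python
import pandas as pd

# Hypothetical input table; column names are made up for illustration.
sales = pd.DataFrame({"product_id": [1, 1, 2], "amount": [10, 5, 7]})

# Aggregate per product, then rename the result column to match an output spec.
result = (
    sales.groupby("product_id", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "total_amount"})
)
```

Passing `as_index=False` keeps `product_id` as a regular column instead of the index, which is usually what a table-shaped expected output needs.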
One problem in the Data Filtering section is called “Customers who never order”. You first get a schema describing the data.
Then you get the task; here it asks you to find all customers who never order anything.
Then you get some examples showing the output your solution is expected to produce for given input data.
Then you write your solution in the editor.
The editor is in pandas mode; it is the same editor used throughout the website for Java, JavaScript, Ruby and other questions. You write your solution and can try it on the tests by clicking Run. When you are happy with the solution, you can submit it and Leetcode will run it server-side against a larger test suite.
My solution to this problem works as follows.
First I merge the customers and orders tables on the customer ids.
Then I get the ids of all customers that made no orders using .isnull().
Next I use .query() to select the customers in the original dataframe that we know have no orders (those in the no_order_ids series).
Finally I narrow the original dataframe down to the customers' names and rename the column to fit the problem requirements.
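The steps above might look roughly like the following sketch. This is a reconstruction, not the original code: the table and column names (`customers` with `id` and `name`, `orders` with `id` and `customerId`, and an output column called `Customers`) and the function name `find_customers` are assumptions based on Leetcode's usual version of this problem.

```python
import pandas as pd

def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # Left merge so customers without orders still appear, with NaN order columns.
    # Both tables have an "id" column, so the orders one becomes "id_order".
    merged = customers.merge(
        orders, how="left", left_on="id", right_on="customerId", suffixes=("", "_order")
    )
    # A null customerId after the merge means this customer matched no order.
    no_order_ids = merged.loc[merged["customerId"].isnull(), "id"]
    # Select customers in the original dataframe whose id is in no_order_ids.
    no_orders = customers.query("id in @no_order_ids")
    # Keep only the name column and rename it to the required output header.
    return no_orders[["name"]].rename(columns={"name": "Customers"})
```

With customers Joe, Henry, Sam and Max (ids 1 to 4) and orders placed only by customers 3 and 1, this returns a single `Customers` column containing Henry and Max.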
By the way, there is more than one way to solve each problem. You are allowed to use the full expressivity of Python and the pandas APIs.
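For example, the same problem can be solved without a merge by using a boolean mask built with `Series.isin`. This is an alternative sketch under the same assumed table and column names as before; `find_customers_isin` is a hypothetical name.

```python
import pandas as pd

def find_customers_isin(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # True for customers whose id never appears as an order's customerId.
    never_ordered = ~customers["id"].isin(orders["customerId"])
    # Select the name column for those rows and rename it to the required header.
    return customers.loc[never_ordered, ["name"]].rename(columns={"name": "Customers"})
```

This version skips the intermediate merged frame, which some readers may find easier to follow than the merge-then-filter approach.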
The 30 pandas questions are split into 6 sections.
In the following sections we list the pandas APIs we used to solve the problems.
Overall I would recommend doing this challenge. The questions closely resemble the tasks a data analyst may encounter in everyday work, and they force you to use a large part of the pandas API. The problems are rated at “beginner” pandas difficulty. To apply your pandas knowledge further, you could complement this challenge with some Kaggle competitions.
If you need help solving your business problems with software, read how to hire me.