How to drop one or multiple columns in a Pandas DataFrame

Dropping columns from a DataFrame in Pandas is a common task in data manipulation and analysis. Whether you need to drop a single column or multiple columns, Pandas provides simple and intuitive methods to achieve this. In this blog post, we’ll walk through how to drop columns from a DataFrame using various techniques.

Introduction to Pandas

Pandas is a powerful and popular Python library used for data manipulation and analysis. It provides data structures like Series and DataFrame, which make it easy to handle structured data. The DataFrame is essentially a table with rows and columns, similar to an Excel spreadsheet.

Setting Up

Before we dive into dropping columns, let’s set up our environment and create a sample DataFrame. If you haven’t installed Pandas yet, you can do so using pip:

Bash

pip install pandas

Now, let’s import Pandas and create a sample DataFrame:

Python

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 120000, 90000]
}

df = pd.DataFrame(data)
print(df)

The above code will produce the following DataFrame:

Markdown

      Name  Age         City  Salary
0    Alice   25     New York   70000
1      Bob   30  Los Angeles   80000
2  Charlie   35      Chicago  120000
3    David   40      Houston   90000

Dropping a Single Column

To drop a single column, use the drop method and pass the column name along with axis=1. The axis=1 parameter (equivalently axis='columns') tells drop to look for the label among the columns; axis=0, the default, drops rows instead.

Python

# Dropping a single column
df_dropped_single = df.drop('City', axis=1)
print(df_dropped_single)

This will result in the following DataFrame:

Markdown

      Name  Age  Salary
0    Alice   25   70000
1      Bob   30   80000
2  Charlie   35  120000
3    David   40   90000
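The same operation can also be written with the columns= keyword, which some readers find clearer than remembering what axis=1 means. This sketch assumes pandas 0.21 or later, where drop accepts columns= directly, and it checks that the three spellings give the same result while leaving df itself untouched.

```python
import pandas as pd

# Same sample data as above
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 120000, 90000]
}
df = pd.DataFrame(data)

# Three equivalent ways to drop the 'City' column
by_axis_int = df.drop('City', axis=1)
by_axis_name = df.drop('City', axis='columns')
by_keyword = df.drop(columns='City')  # columns= keyword (pandas 0.21+)

print(by_keyword.columns.tolist())  # ['Name', 'Age', 'Salary']
print(by_axis_int.equals(by_keyword) and by_axis_name.equals(by_keyword))  # True
```

Because none of these calls uses inplace=True, df still contains the 'City' column afterwards.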

Dropping Multiple Columns

To drop multiple columns, we can pass a list of column names to the drop method.

Python

# Dropping multiple columns
columns_to_drop = ['Age', 'Salary']
df_dropped_multiple = df.drop(columns_to_drop, axis=1)
print(df_dropped_multiple)

This will produce the following DataFrame:

Markdown

      Name         City
0    Alice     New York
1      Bob  Los Angeles
2  Charlie      Chicago
3    David      Houston
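When the list may contain names that are not in the DataFrame, drop raises a KeyError by default. A minimal sketch of the errors='ignore' parameter, assuming a hypothetical 'Country' column name that is not present in the sample data:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 120000, 90000]
}
df = pd.DataFrame(data)

# 'Country' is a hypothetical column name that does not exist in df
raised = False
try:
    df.drop(['Age', 'Country'], axis=1)
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# errors='ignore' silently skips labels that are not present
df_safe = df.drop(['Age', 'Country'], axis=1, errors='ignore')
print(df_safe.columns.tolist())  # ['Name', 'City', 'Salary']
```

The default (errors='raise') is usually the safer choice, since a typo in a column name then fails loudly instead of being skipped.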

Dropping Columns In-Place

The examples above return a new DataFrame and leave df unchanged. If you want to modify the original DataFrame directly, pass inplace=True.

Python

# Dropping columns in-place
df.drop('City', axis=1, inplace=True)
print(df)

This will modify df itself and remove the ‘City’ column.

Markdown

      Name  Age  Salary
0    Alice   25   70000
1      Bob   30   80000
2  Charlie   35  120000
3    David   40   90000
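One pitfall worth knowing: with inplace=True, drop returns None rather than a DataFrame, so writing df = df.drop(..., inplace=True) would leave df set to None. The sketch below shows that return value and the common alternative of reassigning the returned copy (df2 is just an illustrative name).

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 120000, 90000]
}
df = pd.DataFrame(data)

# With inplace=True the DataFrame is modified and the call returns None
result = df.drop('City', axis=1, inplace=True)
print(result)               # None
print(df.columns.tolist())  # ['Name', 'Age', 'Salary']

# Alternative: reassign the returned copy instead of using inplace=True
df2 = pd.DataFrame(data)
df2 = df2.drop('City', axis=1)
print(df2.columns.tolist())  # ['Name', 'Age', 'Salary']
```

Both approaches end with the same columns; the reassignment style also lets you chain further method calls on the result.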

Dropping Columns by Index

Sometimes you might want to drop columns by their position rather than by name. Since drop works with labels, use the df.columns attribute to look up the column name at a given position and pass that name to drop.

Python

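# Recreate the original DataFrame, since 'City' was dropped in place above
df = pd.DataFrame(data)

# Dropping a column by position: df.columns[1] is the label 'Age'
df_dropped_by_index = df.drop(df.columns[1], axis=1)
print(df_dropped_by_index)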

This will drop the second column (index 1, which is ‘Age’):

Markdown

      Name         City  Salary
0    Alice     New York   70000
1      Bob  Los Angeles   80000
2  Charlie      Chicago  120000
3    David      Houston   90000
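To drop several columns by position, you can index df.columns with a list of positions. The sketch below drops positions 1 and 3 ('Age' and 'Salary') and, for comparison, builds the same result with iloc by keeping every position that is not listed. One caveat: because drop works on labels, if two columns happen to share a name, both are removed; selecting with iloc avoids that.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 120000, 90000]
}
df = pd.DataFrame(data)

# Look up the labels at positions 1 and 3 ('Age', 'Salary') and drop them
positions = [1, 3]
df_dropped_positions = df.drop(df.columns[positions], axis=1)
print(df_dropped_positions.columns.tolist())  # ['Name', 'City']

# Same result with positional selection: keep every position not listed
keep = [i for i in range(df.shape[1]) if i not in positions]
df_kept = df.iloc[:, keep]
print(df_kept.columns.tolist())  # ['Name', 'City']
```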

Conclusion

Dropping columns in a Pandas DataFrame is straightforward and can be done using various methods depending on your needs. Whether you are dropping a single column, multiple columns, modifying the original DataFrame, or dropping by index, Pandas provides a flexible and powerful way to handle your data.
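Besides drop, plain Python deletion syntax and the pop method also remove a column in place. A short sketch: del discards the column, while pop removes it and hands it back as a Series, which is handy when you want to keep the values (for example, separating a target column).

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Salary': [70000, 80000, 120000, 90000]
}
df = pd.DataFrame(data)

# del removes the column from df in place and returns nothing
del df['City']

# pop removes the column in place and returns it as a Series
salary = df.pop('Salary')

print(df.columns.tolist())  # ['Name', 'Age']
print(salary.tolist())      # [70000, 80000, 120000, 90000]
```

Unlike drop, both of these handle only one column per statement and always modify the DataFrame in place.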

I hope this blog post has provided you with a clear understanding of how to drop columns in a Pandas DataFrame. Happy data manipulation!
