Adding a New Column to an Existing DataFrame in Pandas

Pandas is a powerful and flexible open-source data analysis and manipulation library for Python. A common task when working with data is adding new columns to an existing DataFrame. This post walks through five ways to do that, each with an example and a short explanation.

1. Adding a Column with a Scalar Value in Pandas

The simplest way to add a new column is to assign a scalar value to a new column name. This method is useful when you want every row in the new column to have the same value.

Python
import pandas as pd

# Creating a sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# Adding a new column 'Country' with the same value for all rows
df['Country'] = 'USA'

print(df)

Output:

Markdown
      Name  Age Country
0    Alice   25     USA
1      Bob   30     USA
2  Charlie   35     USA
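
As a small illustrative sketch (the second value, 'Canada', is made up for this example), the snippet below shows two details of scalar assignment: the scalar is repeated for every row, and assigning to a column name that already exists overwrites that column rather than adding a second one.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# The scalar is broadcast to every row, whatever the length of the DataFrame
df['Country'] = 'USA'

# Assigning to an existing column name replaces its values
df['Country'] = 'Canada'

print(df['Country'].tolist())
print(list(df.columns))
```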


2. Adding a Column with Different Values

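If you want each row to have a different value, you can assign a list of values to the new column. The list must contain exactly one value per row; if its length differs from the number of rows, pandas raises a ValueError.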

Python
# Adding a new column 'Score' with different values
df['Score'] = [85, 90, 95]

print(df)

Output:

Markdown
      Name  Age Country  Score
0    Alice   25     USA     85
1      Bob   30     USA     90
2  Charlie   35     USA     95
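
The sketch below contrasts a list with a Series, which may help when the new values come from another pandas object. A list is matched to rows by position, while a Series is matched by index label, so rows without a matching label end up as NaN. The `scores` Series and its values are assumptions made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# A list is matched by position, so a list of the wrong length is rejected
try:
    df['Score'] = [85, 90]
except ValueError as exc:
    print('Length mismatch:', exc)

# A Series is matched by index label, not by position:
# label 2 goes to Charlie, label 0 goes to Alice, Bob has no match
scores = pd.Series([95, 85], index=[2, 0])
df['Score'] = scores

print(df)
```

Because one row has no matching label, the resulting column holds a missing value and is stored as floating point.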


3. Adding a Column Based on Existing Columns

You can also create a new column based on the values of existing columns. This is useful for calculations and transformations.

Python
# Adding a new column 'Age_Score' by multiplying 'Age' and 'Score'
df['Age_Score'] = df['Age'] * df['Score']

print(df)

Output:

Markdown
      Name  Age Country  Score  Age_Score
0    Alice   25     USA     85       2125
1      Bob   30     USA     90       2700
2  Charlie   35     USA     95       3325
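
Derived columns are not limited to multiplying two columns. As a hedged sketch, the example below builds a Boolean column from a comparison and a numeric column from a column and an aggregate. The column names 'High_Score' and 'Score_Diff' and the threshold of 90 are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Score': [85, 90, 95]
})

# A comparison is evaluated row by row and yields a Boolean column
df['High_Score'] = df['Score'] >= 90

# Subtracting an aggregate gives each row's distance from the mean score
df['Score_Diff'] = df['Score'] - df['Score'].mean()

print(df)
```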


4. Adding a Column with the ‘assign’ Method

The assign method returns a new DataFrame that contains the added columns and leaves the original DataFrame unchanged, which makes it convenient inside a method chain. In the example below the result is assigned back to df, so the name df now refers to the new DataFrame.

Python
# Adding a new column 'Category' using assign method
df = df.assign(Category=['A', 'B', 'A'])

print(df)

Output:

Markdown
      Name  Age Country  Score  Age_Score Category
0    Alice   25     USA     85       2125        A
1      Bob   30     USA     90       2700        B
2  Charlie   35     USA     95       3325        A
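
To make the "returns a new DataFrame" behavior visible, here is a minimal sketch that keeps the original under a separate name and chains two assign calls. The variable names `original` and `result` and the 'Bonus' column are hypothetical, introduced only for this example. Passing a callable (here a lambda) lets a later step refer to the DataFrame as it exists at that point in the chain.

```python
import pandas as pd

original = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 90, 95]
})

result = (
    original
    .assign(Category=['A', 'B', 'A'])
    # The lambda receives the DataFrame produced by the previous step
    .assign(Bonus=lambda d: d['Score'] + 5)
)

print(list(original.columns))  # the original has no new columns
print(result)
```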


5. Adding a Column with ‘loc’

loc is a label-based indexer rather than a method. Selecting all rows with a full slice (:) and naming a column that does not exist yet adds that column with the assigned values.

Python
# Adding a new column 'Passed' with Boolean values
df.loc[:, 'Passed'] = [True, True, False]

print(df)

Output:

Markdown
      Name  Age Country  Score  Age_Score Category  Passed
0    Alice   25     USA     85       2125        A    True
1      Bob   30     USA     90       2700        B    True
2  Charlie   35     USA     95       3325        A   False
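
Where loc tends to be most useful is assigning to only some rows. The sketch below, with a made-up 'Grade' column and a threshold of 90 chosen for illustration, creates the column through a Boolean mask. Rows not selected by the mask start out as NaN until they are filled in.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Score': [85, 90, 95]
})

mask = df['Score'] >= 90

# Creates 'Grade' and fills only the rows where the mask is True
df.loc[mask, 'Grade'] = 'High'

# Rows outside the mask are still missing at this point
missing_before = df['Grade'].isna().tolist()

# Fill the remaining rows using the inverted mask
df.loc[~mask, 'Grade'] = 'Standard'

print(df)
```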


Conclusion

Adding new columns to a DataFrame in Pandas is a fundamental operation that you will often need to perform. Which method fits depends on the task: direct assignment covers scalars, lists, and calculations on existing columns; assign suits method chains where the original DataFrame should stay unchanged; and loc is the label-based way to set values.

Happy coding!
