Pandas dataframe insert() method – Explained with examples

Pandas is one of the most popular Python libraries for data manipulation and analysis. Its central data structure is the DataFrame, a two-dimensional labeled data structure whose columns can hold different types. In this post, we look at the DataFrame.insert() method, which inserts a new column into a DataFrame at a specific position.

What is the insert() method in pandas?

The insert() method adds a new column to a DataFrame at a specific integer position. Plain column assignment, such as df['new'] = values, always appends the new column at the end, so insert() gives you control over where the column goes.
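
To see the difference, here is a minimal sketch comparing plain assignment with insert() on a small, made-up DataFrame (the column names 'ID', 'A', 'B' and 'C' are just for illustration):

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Plain assignment always appends the new column at the end
df['C'] = [7, 8, 9]
print(list(df.columns))  # ['A', 'B', 'C']

# insert() places the new column at the position you choose (here, first)
df.insert(0, 'ID', [101, 102, 103])
print(list(df.columns))  # ['ID', 'A', 'B', 'C']
```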

Syntax
Python
DataFrame.insert(loc, column, value, allow_duplicates=False)

Parameters
  • loc: int. The position at which to insert the column. It must satisfy 0 <= loc <= len(df.columns): 0 puts the column first, and len(df.columns) appends it at the end.
  • column: str, number, or other hashable label. The label of the new column.
  • value: scalar, array-like, or Series. The values to insert. A scalar is repeated for every row, an array-like must contain one value per row, and a Series is aligned on the DataFrame’s index.
  • allow_duplicates: bool, default False. Whether to allow inserting a column whose label already exists.
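
A quick sketch of the two ends of the valid loc range, using a hypothetical two-column DataFrame. Positions in between work the same way.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# loc=0 is the smallest valid position: the column goes first
df.insert(0, 'first', 0)

# loc=len(df.columns) is the largest valid position: the column goes last,
# just where plain assignment would place it
df.insert(len(df.columns), 'last', 1)

print(list(df.columns))  # ['first', 'A', 'B', 'last']
```
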

Returns

None. The method does not return a new DataFrame; it modifies the original DataFrame in place.
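
Because insert() returns None, reassigning its result is a common mistake. This sketch, on a made-up DataFrame, shows the in-place behavior and one way to keep the original untouched by inserting into a copy:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# insert() changes df itself and returns None
result = df.insert(1, 'C', [7, 8, 9])
print(result)             # None
print(list(df.columns))   # ['A', 'C', 'B']

# Mistake to avoid: df = df.insert(...) would replace df with None

# To leave the original unchanged, copy first and insert into the copy
df2 = df.copy()
df2.insert(0, 'D', 0)
print(list(df.columns))   # ['A', 'C', 'B']
print(list(df2.columns))  # ['D', 'A', 'C', 'B']
```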

Example Usage

Let’s go through some practical examples to understand how DataFrame.insert() works.

1. Basic Example

Suppose we have the following DataFrame:

Python
import pandas as pd

data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}
df = pd.DataFrame(data)
print(df)

Output:

Markdown
   A  B
0  1  4
1  2  5
2  3  6

Now, let’s insert a new column ‘C’ with values [7, 8, 9] at position 1:

Python
df.insert(1, 'C', [7, 8, 9])
print(df)

Output:

Markdown
   A  C  B
0  1  7  4
1  2  8  5
2  3  9  6

As you can see, the new column ‘C’ has been inserted at position 1, between ‘A’ and ‘B’.
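
Often you know the neighboring column’s name rather than its index. One approach, sketched below with made-up columns, is to look up the position with Index.get_loc() and pass it to insert(). This assumes the neighboring label is unique; with duplicate labels, get_loc() returns a slice or boolean mask instead of an integer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Look up the integer position of 'B', then insert directly before it
pos = df.columns.get_loc('B')  # 1
df.insert(pos, 'C', [7, 8, 9])

# Insert directly after 'A' by adding 1 to its position
df.insert(df.columns.get_loc('A') + 1, 'A2', [10, 20, 30])

print(list(df.columns))  # ['A', 'A2', 'C', 'B']
```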


2. Handling Duplicates

By default, DataFrame.insert() does not allow duplicate column names. If you try to insert a column whose name already exists, it raises a ValueError.

Python
try:
    df.insert(1, 'A', [7, 8, 9])  # 'A' already exists
except ValueError as err:
    print(err)  # cannot insert A, already exists

To allow duplicate column names, set the allow_duplicates parameter to True:

Python
df.insert(1, 'A', [7, 8, 9], allow_duplicates=True)
print(df)

Output:

Markdown
   A  A  C  B
0  1  7  7  4
1  2  8  8  5
2  3  9  9  6
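
Duplicate labels have a side effect worth knowing about: selecting by the shared label returns every matching column. A small sketch, rebuilding the DataFrame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.insert(1, 'A', [7, 8, 9], allow_duplicates=True)

# With duplicate labels, df['A'] returns a DataFrame holding both 'A' columns,
# not a single Series
selected = df['A']
print(type(selected).__name__)  # DataFrame
print(selected.shape)           # (3, 2)

# Positional indexing reaches one specific column
print(df.iloc[:, 1].tolist())   # [7, 8, 9]
```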

3. Inserting a Column with Scalar Value

You can also insert a column with a single scalar value. The scalar is broadcast to every row.

Python
df.insert(2, 'D', 10)
print(df)

Output:

Markdown
   A  A   D  C  B
0  1  7  10  7  4
1  2  8  10  8  5
2  3  9  10  9  6
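
The sketch below, on a made-up DataFrame with a hypothetical 'source' column, shows that a string scalar is broadcast the same way, while a list must contain exactly one value per row or insert() raises a ValueError:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# A scalar (number or string) is repeated for every row
df.insert(0, 'source', 'survey')
print(df['source'].tolist())  # ['survey', 'survey', 'survey']

# A list, by contrast, needs exactly one value per row
try:
    df.insert(1, 'short', [1, 2])  # 2 values for 3 rows
except ValueError as err:
    print(err)
```

Because the error is raised before the column is added, the failed call leaves df unchanged.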

4. Inserting a Column from a Series

You can also pass a pandas Series as the value. Here the Series has the default index 0, 1, 2, which matches the DataFrame’s index, so each value lands in the corresponding row.

Python
series = pd.Series([11, 12, 13])
df.insert(3, 'E', series)
print(df)

Output:

Markdown
   A  A   D   E  C  B
0  1  7  10  11  7  4
1  2  8  10  12  8  5
2  3  9  10  13  9  6
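
A Series does not have to be built by hand. Any Series derived from the DataFrame’s own columns shares its index, so it can be inserted directly. A minimal sketch with a hypothetical 'A_plus_B' column:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# df['A'] + df['B'] is a Series that already carries df's index,
# so each sum lines up with its own row when inserted
df.insert(1, 'A_plus_B', df['A'] + df['B'])
print(df)
```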

5. Inserting a Column Based on a Condition

In this example, let’s say you want to add a new column to your DataFrame that categorizes the values in column 'B'. Specifically, you want to create a column 'E' that contains the string 'High' if the value in column 'B' is greater than 5, and 'Low' otherwise. This kind of operation is useful for categorizing or binning your data based on specific criteria.

You can do this with a list comprehension that checks each value in column 'B' and returns 'High' or 'Low' accordingly, and then insert the resulting list into the DataFrame at the desired position with insert().

For example, if your DataFrame initially looks like this:

Markdown
   A  B
0  1  4
1  2  5
2  3  6

Now, we create a condition-based column ‘E’:

Python
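# Start again from the two-column DataFrame shown above
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Creating a condition-based column 'E'
df.insert(2, 'E', ['High' if x > 5 else 'Low' for x in df['B']])
print("\nDataFrame with condition-based column 'E':")
print(df)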

After inserting the new column based on the condition, it becomes:

Markdown
   A  B     E
0  1  4   Low
1  2  5   Low
2  3  6  High

In this updated DataFrame, column 'E' shows at a glance which values in column 'B' are above the threshold of 5 ('High') and which are not ('Low'), without modifying the original values.
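
The list comprehension is easy to read, but on large DataFrames a vectorized condition is usually faster. One common alternative (not the approach used above) is numpy.where, sketched here on the same two-column data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# np.where evaluates the condition on the whole column at once and
# picks 'High' where it is True and 'Low' where it is False
df.insert(2, 'E', np.where(df['B'] > 5, 'High', 'Low'))
print(df['E'].tolist())  # ['Low', 'Low', 'High']
```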


6. Inserting a Column from a Series with Matching Index

The earlier Series example relied on the default index. The alignment is easier to see when the Series has an explicit index that matches the DataFrame’s index:

Python
import pandas as pd

# Initial DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Create a Series with matching index
scores = pd.Series([88, 92, 85], index=[0, 1, 2])

# Insert the Series as a new column 'Score' at position 1
df.insert(1, 'Score', scores)
print("\nDataFrame after inserting 'Score' column from Series:")
print(df)

Output:

Markdown
Original DataFrame:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

DataFrame after inserting 'Score' column from Series:
      Name  Score  Age
0    Alice     88   25
1      Bob     92   30
2  Charlie     85   35

In this example, we start with a DataFrame that has two columns: ‘Name’ and ‘Age’. We then create a Series named scores containing the values [88, 92, 85], with an index that matches the DataFrame’s index. Because insert() aligns a Series on its index labels rather than on position, each score is placed in the row with the same label.
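
The alignment matters when the indexes do not match. In this sketch the Series is labelled 1, 2, 3 (an assumed mismatch) while the DataFrame is labelled 0, 1, 2. Label alignment leaves NaN where there is no match, whereas converting the Series with .to_numpy() inserts the values by position:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']})

# This Series is labelled 1, 2, 3, while df is labelled 0, 1, 2
scores = pd.Series([88, 92, 85], index=[1, 2, 3])

# Aligned by label: row 0 has no match (NaN) and the value at label 3 is dropped
df.insert(1, 'Score_aligned', scores)

# .to_numpy() strips the index, so the values are placed by position instead
df.insert(2, 'Score_positional', scores.to_numpy())
print(df)
```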

When to use DataFrame.insert()
  • Specific Positioning: When you need to add a column at a specific location in your DataFrame.
  • Control Over Duplicates: When you want to control whether duplicate column names are allowed.
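
Specific positioning also covers reordering. A common idiom, sketched here with made-up columns, is to remove a column with pop() and put it back with insert() at the new position:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# pop() removes 'C' and returns it as a Series; insert() puts it back first
df.insert(0, 'C', df.pop('C'))
print(list(df.columns))  # ['C', 'A', 'B']
```

Because pop() runs before insert(), the label no longer exists when it is re-inserted, so no duplicate error is raised.
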

Conclusion

DataFrame.insert() adds a column exactly where you need it in a DataFrame. Unlike plain column assignment, which always appends at the end, it lets you choose the position and control whether duplicate column labels are allowed. Keep in mind that it modifies the DataFrame in place and returns None. Whether you’re adding new data, arranging columns for better readability, or making a DataFrame meet a specific format, DataFrame.insert() is a method worth mastering.
