Pandas is one of the most popular Python libraries for data manipulation and analysis. Its central data structure is the DataFrame, a 2-dimensional labeled structure whose columns can hold different types. In this post, we look at the DataFrame.insert() method, which inserts a new column into a DataFrame at a specific position.
What is the insert() method in pandas?
The insert() method inserts a new column into a DataFrame at a specific position. It gives you more control over where the new column lands than plain assignment (df['new'] = ...), which always appends the column at the end.
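A minimal sketch of that difference, using a small made-up DataFrame (the names assigned and inserted are chosen only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Plain assignment always appends the new column at the end
assigned = df.copy()
assigned['C'] = [7, 8, 9]
print(list(assigned.columns))  # ['A', 'B', 'C']

# insert() places the new column at the position you choose
inserted = df.copy()
inserted.insert(0, 'C', [7, 8, 9])
print(list(inserted.columns))  # ['C', 'A', 'B']
```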
Syntax
DataFrame.insert(loc, column, value, allow_duplicates=False)
Parameters
loc: int. The position at which to insert the column. Must satisfy 0 <= loc <= len(df.columns).
column: str, number, or other hashable object. The label of the new column.
value: scalar, array-like, or Series. The values to insert. A scalar is repeated for every row; an array-like must have the same length as the DataFrame.
allow_duplicates: bool, default False. Whether to allow inserting a column whose label already exists.
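As a quick illustration of the two ends of the valid loc range, here is a small sketch. The column names 'first' and 'last' are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# loc=0 puts the new column at the front
df.insert(0, 'first', 0)

# loc=len(df.columns) is the largest valid value and appends at the end
df.insert(len(df.columns), 'last', 99)

print(list(df.columns))  # ['first', 'A', 'B', 'last']
```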
Returns
The method returns None. It does not create a new DataFrame; it modifies the original DataFrame in place.
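Because of this, writing df = df.insert(...) would replace your DataFrame with None. The sketch below shows the return value and one way to keep an untouched original by inserting into a copy (the names original and expanded are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# insert() changes df itself and hands back None
result = df.insert(1, 'C', [7, 8, 9])
print(result)            # None
print(list(df.columns))  # ['A', 'C', 'B']

# To leave a DataFrame untouched, insert into a copy instead
original = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
expanded = original.copy()
expanded.insert(1, 'C', [7, 8, 9])
print(list(original.columns))  # ['A', 'B']
print(list(expanded.columns))  # ['A', 'C', 'B']
```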
Example Usage
Let's go through some practical examples to understand how DataFrame.insert() works.
1. Basic Example
Suppose we have the following DataFrame:
import pandas as pd
data = {
'A': [1, 2, 3],
'B': [4, 5, 6]
}
df = pd.DataFrame(data)
print(df)
Output:
   A  B
0  1  4
1  2  5
2  3  6
Now, let's insert a new column 'C' with values [7, 8, 9] at position 1:
df.insert(1, 'C', [7, 8, 9])
print(df)
Output:
   A  C  B
0  1  7  4
1  2  8  5
2  3  9  6
As you can see, the new column 'C' now sits at position 1, between 'A' and 'B'.
2. Handling Duplicates
By default, DataFrame.insert() does not allow duplicate column names. If you try to insert a column with a name that already exists, it raises a ValueError.
# This will raise an error
df.insert(1, 'A', [7, 8, 9])
To allow duplicate column names, set the allow_duplicates parameter to True:
df.insert(1, 'A', [7, 8, 9], allow_duplicates=True)
print(df)
Output:
   A  A  C  B
0  1  7  7  4
1  2  8  8  5
2  3  9  9  6
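Duplicate labels have a side effect worth knowing about: selecting the label returns every column that carries it. The following sketch, on a fresh two-column DataFrame, catches the default ValueError and then shows what df['A'] returns once the duplicate is allowed (the variable both is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Default behaviour: inserting an existing label is rejected
try:
    df.insert(1, 'A', [7, 8, 9])
except ValueError as err:
    print(f"Insert rejected: {err}")

# Explicitly allow the duplicate label
df.insert(1, 'A', [7, 8, 9], allow_duplicates=True)

# With duplicate labels, df['A'] is a DataFrame holding both 'A' columns
both = df['A']
print(type(both).__name__)  # DataFrame
print(both.shape)           # (3, 2)
```

If later code expects df['A'] to be a single Series, a unique column name is usually the safer choice.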
3. Inserting a Column with Scalar Value
You can also insert a column with a single scalar value. The scalar is broadcast to all rows.
df.insert(2, 'D', 10)
print(df)
Output:
   A  A   D  C  B
0  1  7  10  7  4
1  2  8  10  8  5
2  3  9  10  9  6
4. Inserting a Column from a Series
You can also insert a Series as a column. pandas aligns the Series with the DataFrame by index label, so here the default index of the Series (0, 1, 2) matches the DataFrame's index.
series = pd.Series([11, 12, 13])
df.insert(3, 'E', series)
print(df)
Output:
   A  A   D   E  C  B
0  1  7  10  11  7  4
1  2  8  10  12  8  5
2  3  9  10  13  9  6
5. Inserting a Column Based on a Condition
In this example, suppose you want to add a new column that categorizes the values in column 'B'. Specifically, you want a column 'E' that contains the string 'High' if the value in column 'B' is greater than 5, and 'Low' otherwise. This kind of operation is useful for categorizing or binning your data based on specific criteria.
You can do this with a list comprehension that checks each value in column 'B' and assigns 'High' or 'Low' accordingly. You then insert the resulting list into your DataFrame at the desired position using the insert() method.
This example starts again from the original two-column DataFrame:
   A  B
0  1  4
1  2  5
2  3  6
Now we create the condition-based column 'E':
# Start again from the original two-column DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Insert 'E' at position 2: 'High' where B > 5, otherwise 'Low'
df.insert(2, 'E', ['High' if x > 5 else 'Low' for x in df['B']])
print(df)
After inserting the new column based on the condition, it becomes:
   A  B     E
0  1  4   Low
1  2  5   Low
2  3  6  High
In the updated DataFrame, the new column 'E' shows which values in column 'B' count as 'High' or 'Low' relative to the threshold of 5. The original values are left unchanged.
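The list comprehension works well for small DataFrames. As an alternative that is not part of the example above, the same labels can be built with NumPy's np.where, which evaluates the condition on the whole column at once. A sketch, assuming NumPy is installed alongside pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# np.where(condition, value_if_true, value_if_false) works element-wise
labels = np.where(df['B'] > 5, 'High', 'Low')

# Insert the labels at position 2, just as in the list-comprehension version
df.insert(2, 'E', labels)
print(df['E'].tolist())  # ['Low', 'Low', 'High']
```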
6. Inserting a Column from a Series with Matching Index
The Series can also carry an explicit index. As long as its labels match the DataFrame's index, each value lands in the row with the same label.
import pandas as pd
# Initial DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Create a Series with matching index
scores = pd.Series([88, 92, 85], index=[0, 1, 2])
# Insert the Series as a new column 'Score' at position 1
df.insert(1, 'Score', scores)
print("\nDataFrame after inserting 'Score' column from Series:")
print(df)
Output:
Original DataFrame:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

DataFrame after inserting 'Score' column from Series:
      Name  Score  Age
0    Alice     88   25
1      Bob     92   30
2  Charlie     85   35
In this example, we start with a DataFrame that has two columns, 'Name' and 'Age'. We then create a Series named scores containing the values [88, 92, 85], with an index that matches the index of the DataFrame. When the Series is inserted, pandas aligns each score with the row that has the same index label.
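Alignment is by label, not by position, which matters when the indexes differ. The sketch below uses two made-up Series (shuffled and partial) to show both cases: labels in a different order are matched to the right rows, and a row with no matching label gets NaN.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Same labels as df.index, but listed in a different order
shuffled = pd.Series([85, 88, 92], index=[2, 0, 1])
df.insert(1, 'Score', shuffled)
print(df['Score'].tolist())  # [88, 92, 85]

# Label 2 is missing from this Series, and label 5 does not exist in df
partial = pd.Series([1, 2, 3], index=[0, 1, 5])
df.insert(2, 'Rank', partial)
print(df['Rank'].isna().tolist())  # [False, False, True]
```

If you want values placed strictly by position regardless of labels, passing a plain list or the Series' underlying array (for example partial.to_numpy()) avoids the alignment step, provided the length matches the DataFrame.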
When to use DataFrame.insert()
- Specific Positioning: When you need to add a column at a specific location in your DataFrame.
- Control Over Duplicates: When you want to control whether duplicate column names are allowed.
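For the positioning case, a common need is to place a column right after an existing one by name rather than by number. One possible sketch combines insert() with df.columns.get_loc(). The helper insert_after is hypothetical (it is not part of pandas) and assumes the existing column label is unique.

```python
import pandas as pd

def insert_after(df, existing, new_name, values):
    """Hypothetical helper: insert new_name directly after the column
    named existing, modifying df in place. Assumes existing is unique."""
    # get_loc returns an integer position when the label is unique
    loc = df.columns.get_loc(existing) + 1
    df.insert(loc, new_name, values)

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['Oslo', 'Lima'],
})

insert_after(df, 'Age', 'Score', [88, 92])
print(list(df.columns))  # ['Name', 'Age', 'Score', 'City']
```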
Conclusion
The DataFrame.insert() method adds a column to your DataFrame exactly where you need it, and lets you decide whether duplicate column names are allowed. Whether you're adding new data, ordering columns for better readability, or making your DataFrame match a required layout, DataFrame.insert() is a method worth mastering.