Pandas DataFrame loc[] Method – Explained with Examples

Pandas is a powerful data manipulation library in Python, widely used for data analysis tasks. One of its key features is the DataFrame, a 2-dimensional labeled data structure that can hold data of different types (including integers, floats, and strings) in columns. The .loc[] method is an essential tool for accessing and modifying data within a DataFrame, allowing you to select rows and columns by labels or a boolean array. This blog will dive into the usage of the .loc[] method, covering its syntax, examples, and common use cases.

What is .loc[]?

The .loc[] method in Pandas is primarily label-based: it accesses a group of rows and columns by their labels or by a boolean array. Although it is commonly called a method, .loc is a property that you index with square brackets rather than call with parentheses. It is part of the indexing and selection functionality provided by Pandas, which allows for precise and flexible data selection.
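To make "label-based" concrete, here is a minimal sketch. It assumes a small, made-up DataFrame whose index holds string labels instead of the default integers; the names scores, by_label, and by_position are illustrative only, and .iloc[] appears purely for contrast.

```python
import pandas as pd

# Hypothetical data: the index holds string labels instead of 0, 1, 2
scores = pd.DataFrame(
    {'Math': [90, 75, 82], 'Art': [68, 88, 95]},
    index=['anna', 'ben', 'cara'],
)

# .loc looks the row up by its label
by_label = scores.loc['ben']

# .iloc is the position-based counterpart; 'ben' sits at position 1
by_position = scores.iloc[1]

print(by_label.equals(by_position))  # True
```

Both lookups return the same row here, but only .loc[] keeps working if the rows are reordered, because it follows the label rather than the position.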

Basic Syntax

The basic syntax of the .loc[] method is:

Python
DataFrame.loc[row_indexer, column_indexer]

Here:

  • row_indexer specifies the labels of the rows to be accessed.
  • column_indexer specifies the labels of the columns to be accessed.

Both row_indexer and column_indexer can be a single label, a list of labels, a slice of labels, or a boolean array. If column_indexer is omitted, all columns are returned.
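As a quick sketch of how the indexer type shapes the result, the snippet below uses a small, made-up DataFrame (the variable names are illustrative) and passes a single label, a list, and a slice:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
})

# Single label for both row and column -> a scalar value
single_value = df.loc[0, 'Name']

# List of row labels with one column label -> a Series
one_column = df.loc[[0, 2], 'Age']

# Slice of row labels with a list of column labels -> a DataFrame
block = df.loc[0:1, ['Name', 'Age']]

print(single_value)               # Alice
print(type(one_column).__name__)  # Series
print(block.shape)                # (2, 2)
```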

Examples

Let’s explore the .loc[] method with some practical examples.

1. Creating a Sample DataFrame

First, we’ll create a sample DataFrame to work with:

Python
import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)

The DataFrame df looks like this:

Bash
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
4      Eva   28      Phoenix

2. Selecting Rows by Label

To select rows by their labels (index values), use:

Python
# Selecting a single row by label
print(df.loc[2])

This selects the row with the label (index) 2, which contains the data for “Charlie”. The result is a Series containing the Name, Age, and City of “Charlie”.

Output:

Bash
Name    Charlie
Age          22
City    Chicago
Name: 2, dtype: object

Next, we select multiple rows by passing a list of labels:

Python
# Selecting multiple rows by label
print(df.loc[[0, 3]])

This selects the rows with labels 0 and 3, which correspond to “Alice” and “David”. The result is a DataFrame with two rows, each showing the Name, Age, and City of these individuals.

Output:

Bash
    Name  Age      City
0  Alice   24  New York
3  David   32   Houston

3. Selecting Rows and Columns

You can select specific rows and columns by providing labels for both:

Python
# Selecting specific rows and columns
print(df.loc[1:3, ['Name', 'City']])

Output:

Bash
      Name         City
1      Bob  Los Angeles
2  Charlie      Chicago
3    David      Houston

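In this example, we select the rows with labels 1 through 3 and only the Name and City columns. Unlike ordinary Python slices, a label slice in .loc[] includes the end label, so row 3 is part of the result. The result is a DataFrame showing “Bob”, “Charlie”, and “David” with their respective cities.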
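The slicing behaviour is easy to verify for yourself. The sketch below brings in .iloc[], the position-based indexer that this post does not otherwise cover, purely as a contrast; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 28],
})

# Label slice: labels 1, 2 and 3 (the end label is included)
label_slice = df.loc[1:3]

# Position slice: positions 1 and 2 (the end position is excluded)
position_slice = df.iloc[1:3]

print(len(label_slice))     # 3
print(len(position_slice))  # 2
```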

4. Using Boolean Indexing

Boolean indexing with .loc[] allows for filtering data based on conditions:

Python
# Selecting rows where Age is greater than 25
print(df.loc[df['Age'] > 25])

Output:

Bash
    Name  Age         City
1    Bob   27  Los Angeles
3  David   32      Houston
4    Eva   28      Phoenix

This example selects all rows where the Age is greater than 25. The result is a DataFrame containing the data for “Bob”, “David”, and “Eva”, who are all older than 25.
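Conditions can also be combined. The following is a minimal sketch, with an arbitrarily chosen second condition on City, showing two conditions joined with & and a column list passed alongside the mask:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

# Each condition needs its own parentheses; & means "and", | means "or"
mask = (df['Age'] > 25) & (df['City'] != 'Houston')

# Filter rows with the mask and keep only two columns
result = df.loc[mask, ['Name', 'Age']]
print(result)
```

Here only “Bob” and “Eva” remain, because “David” is excluded by the City condition.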

5. Modifying Data with .loc[]

The .loc[] method can also be used to modify data in the DataFrame:

Python
# Changing the Age of Bob to 30
df.loc[df['Name'] == 'Bob', 'Age'] = 30
print(df)

Here, the boolean condition picks the row where Name is “Bob”, and the column label Age picks the cell to overwrite, changing Bob’s age from 27 to 30.

Output:

Bash
      Name  Age         City
0    Alice   24     New York
1      Bob   30  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
4      Eva   28      Phoenix

Now, the DataFrame reflects this change, showing Bob’s updated age.
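The same pattern can update many rows at once. As a rough sketch, the snippet below fills a hypothetical Group column (not part of the original example) based on an age threshold chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 28],
})

# 'Group' is a hypothetical column created just for this example
df['Group'] = 'under 25'

# Overwrite the value in every row that matches the condition
df.loc[df['Age'] >= 25, 'Group'] = '25 and over'

print(df)
```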


Advanced Usage

1. Using Slices for Labels

Slices can be used with labels to select a range of rows and columns:

Python
# Selecting a range of rows and columns
print(df.loc[1:3, 'Name':'City'])

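This example selects the rows with labels 1 through 3 and all columns from Name to City; both ends of each label slice are included. The result is a DataFrame with “Bob”, “Charlie”, and “David” along with their ages and cities. Bob’s age appears as 30 because of the modification made earlier.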

Output:

Bash
      Name  Age         City
1      Bob   30  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
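Either end of a label slice can be left open. A small sketch, assuming the same sample data and using illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [24, 27, 22, 32, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

# Rows from the start up to label 2, columns from Age to the last column
first_rows = df.loc[:2, 'Age':]

# A bare colon selects every row; here only the City column is kept
all_cities = df.loc[:, 'City']

print(first_rows)
print(all_cities)
```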

2. Handling Missing Labels

If you try to access a label that does not exist, Pandas will raise a KeyError. You can use .reindex() to avoid this issue:

Python
# Reindexing to safely access missing labels
print(df.reindex([0, 5]))

Here, we reindex the DataFrame to the labels 0 and 5; label 5 does not exist in the original DataFrame.

Output:

Bash
    Name   Age      City
0  Alice  24.0  New York
5    NaN   NaN       NaN

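As a result, it returns a DataFrame with NaN values for the non-existent row. The Age column is also displayed as floats (24.0), because NaN cannot be stored in a regular integer column.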

The reindex method in Pandas is used to change the row and/or column labels of a DataFrame to a new specified set of labels. If the new labels do not match the existing ones, it can introduce new rows or columns filled with NaN (missing) values, or remove those that are no longer needed. This method is useful for aligning data to a new index or for reordering the rows and columns.
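If you would rather skip missing labels than create NaN rows, one possible alternative is to keep only the labels that are actually present. The sketch below shows the KeyError and then filters the request with Index.intersection; the wanted list and variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
})

wanted = [0, 5]  # label 5 does not exist in df

# Asking .loc[] for a missing label raises a KeyError
try:
    df.loc[wanted]
except KeyError as err:
    print('KeyError:', err)

# Keep only the requested labels that are present in the index
present = df.index.intersection(wanted)
subset = df.loc[present]
print(subset)
```

Unlike reindex(), this approach leaves the column types unchanged, since no NaN rows are introduced.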

Conclusion

The .loc[] method is a powerful tool for data selection and manipulation in Pandas. It allows for precise control over the rows and columns you want to access or modify, making it an essential part of any data analyst or data scientist’s toolkit. By understanding and utilizing the .loc[] method, you can streamline your data analysis tasks and make your code more efficient and readable.
