Pandas DataFrame dtypes Property | Find Data Type of Columns

Pandas is a powerful library for data manipulation and analysis in Python. One of the key features of Pandas is its ability to handle diverse types of data within a DataFrame. Understanding the data types of your columns is crucial for data analysis and preprocessing. This is where the dtypes property of a Pandas DataFrame comes in handy.

In this blog, we will explore the dtypes property of a Pandas DataFrame with several examples to illustrate its usage and importance.

What is the dtypes Property?

The dtypes property returns a Series whose index holds the column labels and whose values are the data types of those columns. Knowing each column’s type is essential for performing type-specific operations and avoiding errors, such as attempting arithmetic on a column that actually holds strings.
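You can confirm that the result really is a Series by inspecting it directly. A minimal sketch, assuming a small two-column DataFrame (`sample_df`, a hypothetical name) built only for illustration:

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data)
sample_df = pd.DataFrame({'Age': [25, 30], 'Height': [5.5, 6.0]})

dtypes = sample_df.dtypes

# dtypes is a pandas Series...
print(type(dtypes))        # <class 'pandas.core.series.Series'>
# ...whose index holds the column labels...
print(list(dtypes.index))  # ['Age', 'Height']
# ...and whose values are the dtype objects themselves
print(dtypes.tolist())     # [dtype('int64'), dtype('float64')]
```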

Syntax

Python

DataFrame.dtypes
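Because dtypes is a property, it is accessed without parentheses; writing `df.dtypes()` would try to call the returned Series and raise a TypeError. Since the result is a Series, you can also look up a single column’s type by label, which matches the singular `dtype` attribute of that column. A short sketch with illustrative data (`sample_df` is a hypothetical name):

```python
import pandas as pd

# Illustrative data (hypothetical)
sample_df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Property access: no parentheses
all_types = sample_df.dtypes

# Look up one column's dtype by label...
age_type = sample_df.dtypes['Age']
# ...which matches the singular .dtype attribute of that column's Series
same_type = sample_df['Age'].dtype

print(age_type, same_type)  # int64 int64
```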

Example 1: Basic Usage

Let’s start with a simple DataFrame and explore the dtypes property.

Python

import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Height': [5.5, 6.0, 5.8],
    'Member': [True, False, True]
}

df = pd.DataFrame(data)

# Display the DataFrame
print(df)

# Display the data types of each column
print(df.dtypes)

Output:

      Name  Age  Height  Member
0    Alice   25     5.5    True
1      Bob   30     6.0   False
2  Charlie   35     5.8    True

Name       object
Age         int64
Height    float64
Member       bool
dtype: object

In this example, dtypes reports one type per column: Name is object (the generic dtype pandas uses here to hold the Python strings), Age is int64 (64-bit integer), Height is float64 (64-bit floating point), and Member is bool (boolean).
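Comparing a dtype against a name such as 'int64' works for simple cases, but pandas also provides predicate helpers in `pandas.api.types` that test for a whole family of types (any integer width, any float width, and so on). The sketch below rebuilds the example data as `sample_df` (a hypothetical name, so the `df` used in the next example is untouched) and classifies each column; which helpers you actually need depends on your data.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_float_dtype, is_integer_dtype

sample_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Height': [5.5, 6.0, 5.8],
    'Member': [True, False, True],
})

# Classify each column by dtype family rather than by exact dtype name
kinds = {}
for col, dtype in sample_df.dtypes.items():
    # Booleans are checked first; is_integer_dtype already returns False for bool
    if is_bool_dtype(dtype):
        kinds[col] = 'boolean'
    elif is_integer_dtype(dtype):
        kinds[col] = 'integer'
    elif is_float_dtype(dtype):
        kinds[col] = 'float'
    else:
        kinds[col] = 'other'

print(kinds)
# {'Name': 'other', 'Age': 'integer', 'Height': 'float', 'Member': 'boolean'}
```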

Example 2: Checking Data Types Before and After Conversion

Often, you may need to convert the data type of a column to perform specific operations. The dtypes property can help verify the data type before and after conversion.

Python

# Display the original data types
print("Original Data Types:")
print(df.dtypes)

# Convert 'Age' to float64 and 'Member' to int64
# (explicit dtype names avoid a platform-dependent integer width)
df['Age'] = df['Age'].astype('float64')
df['Member'] = df['Member'].astype('int64')

# Display the data types after conversion
print("\nData Types After Conversion:")
print(df.dtypes)

Output:

Original Data Types:
Name       object
Age         int64
Height    float64
Member       bool
dtype: object

Data Types After Conversion:
Name       object
Age       float64
Height    float64
Member      int64
dtype: object

Here, we converted the Age column from int64 to float64 and the Member column from bool to int64. The dtypes property helps us confirm the changes.
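astype() also accepts a dictionary that maps column names to target types, so several columns can be converted in one call. In the sketch below (`sample_df` and `converted` are hypothetical names), the dtype names are spelled out as strings: with Python’s built-in int, the resulting width can depend on the platform, for example int32 on Windows with NumPy releases before 2.0.

```python
import pandas as pd

sample_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Height': [5.5, 6.0, 5.8],
    'Member': [True, False, True],
})

# Convert several columns at once; columns not in the dict keep their dtype
converted = sample_df.astype({'Age': 'float64', 'Member': 'int64'})

# astype returns a new DataFrame, so the original is unchanged
print(sample_df.dtypes['Age'], '->', converted.dtypes['Age'])        # int64 -> float64
print(sample_df.dtypes['Member'], '->', converted.dtypes['Member'])  # bool -> int64
```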

Example 3: Filtering Columns by Data Type

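Closely related to the dtypes property is the select_dtypes() method, which filters columns by data type. This is useful when you want to perform operations only on specific types of data. The code below continues with the converted df from Example 2, so Age is now float64 and Member is int64.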

Python

# Select only the columns with integer data type
int_columns = df.select_dtypes(include=['int64']).columns
print("Integer Columns:")
print(int_columns)

# Select only the columns with float data type
float_columns = df.select_dtypes(include=['float64']).columns
print("\nFloat Columns:")
print(float_columns)

Output:

Integer Columns:
Index(['Member'], dtype='object')

Float Columns:
Index(['Age', 'Height'], dtype='object')

In this example, we used the select_dtypes method to filter columns based on their data types. This can be particularly useful when dealing with large DataFrames containing various data types.
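The same filtering can be done with the dtypes Series itself through boolean indexing, and select_dtypes() also accepts broader aliases such as 'number' and an `exclude` argument. A small sketch, assuming a hypothetical `sample_df` rebuilt with the post-conversion types from Example 2:

```python
import pandas as pd

# Same dtypes as df after the Example 2 conversion
sample_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25.0, 30.0, 35.0],
    'Height': [5.5, 6.0, 5.8],
    'Member': [1, 0, 1],
})

# Boolean-index the dtypes Series: keep only labels whose dtype is float64
float_cols = sample_df.dtypes[sample_df.dtypes == 'float64'].index.tolist()
print(float_cols)    # ['Age', 'Height']

# 'number' matches every numeric column, integer or float
numeric_cols = sample_df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)  # ['Age', 'Height', 'Member']

# exclude= works the other way round
non_numeric = sample_df.select_dtypes(exclude='number').columns.tolist()
print(non_numeric)   # ['Name']
```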

Example 4: Handling Missing Data and ‘dtypes’

When working with real-world data, it is common to encounter missing values. Understanding how these missing values affect the data types of your DataFrame columns is crucial for accurate data analysis.

Because Example 2 converted some of the columns, let’s start again from the initial DataFrame:

Python

import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Height': [5.5, 6.0, 5.8],
    'Member': [True, False, True]
}

df = pd.DataFrame(data)

# Display the DataFrame
print(df)

# Display the data types of each column
print(df.dtypes)

Output:

      Name  Age  Height  Member
0    Alice   25     5.5    True
1      Bob   30     6.0   False
2  Charlie   35     5.8    True

Name       object
Age         int64
Height    float64
Member       bool
dtype: object

Here, we have a DataFrame with four columns: Name, Age, Height, and Member. Their data types are object, int64, float64, and bool respectively.

Now, let’s introduce some missing values (NaN) into the Age and Height columns.

Python

# Adding missing values to the DataFrame
df.loc[1, 'Age'] = None
df.loc[2, 'Height'] = None

# Display the DataFrame with missing values
print(df)

# Display the data types after adding missing values
print(df.dtypes)

Output:

      Name   Age  Height  Member
0    Alice  25.0     5.5    True
1      Bob   NaN     6.0   False
2  Charlie  35.0     NaN    True

Name       object
Age       float64
Height    float64
Member       bool
dtype: object

Explanation

In the original DataFrame, Age was of type int64. After a missing value (None, stored as NaN) was assigned to Age at index 1, the whole column changed from int64 to float64. The reason is that NaN (Not a Number) is a special floating-point value, and a NumPy-backed int64 column has no way to store it, so pandas upcasts the column to float64.
Similarly, we introduced a missing value into the Height column at index 2. Since Height was already of type float64, the data type remains float64.
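The upcasting rule can be checked without building a DataFrame. This sketch shows that NaN is an ordinary Python float, and that a Series built from integers plus one missing value comes out as float64:

```python
import math

import numpy as np
import pandas as pd

# NaN is a floating-point value, not an integer
print(type(np.nan))                     # <class 'float'>
print(math.isnan(np.nan))               # True

# Integers alone give int64...
print(pd.Series([25, 30, 35]).dtype)    # int64
# ...but a single missing value forces float64
print(pd.Series([25, None, 35]).dtype)  # float64
```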

Key Takeaway
When you introduce missing values into a DataFrame, pandas may change the data types of the affected columns. In particular, an int64 column that gains a missing value is converted to float64, because NaN is a floating-point value and cannot be stored in a NumPy-backed integer column.

Understanding this behavior is important when working with data that may have missing values, as it ensures you correctly handle data types and avoid unexpected issues during data analysis.
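If you need to keep integer values despite missing data, pandas also offers a nullable integer dtype, spelled 'Int64' with a capital I, which marks gaps as pd.NA instead of switching to float. A sketch, assuming the Age values are whole numbers (converting a float with a fractional part to 'Int64' raises an error); `age` and `ages` are illustrative names:

```python
import numpy as np
import pandas as pd

# Age after a missing value was introduced: float64 with NaN
age = pd.Series([25.0, np.nan, 35.0], name='Age')
print(age.dtype)                     # float64

# Convert to the nullable integer dtype; NaN becomes pd.NA
age_nullable = age.astype('Int64')
print(age_nullable.dtype)            # Int64
print(age_nullable.isna().tolist())  # [False, True, False]

# The dtype can also be requested when the Series is created
ages = pd.Series([25, None, 35], dtype='Int64')
print(ages.dtype)                    # Int64
```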

Conclusion

The dtypes property of a Pandas DataFrame is a powerful tool for understanding and managing the data types of your columns. Together with methods such as select_dtypes() and astype(), it helps you verify data types, filter columns, and handle type conversions effectively. By leveraging the dtypes property, you can ensure that your data analysis and preprocessing tasks are accurate and efficient.

By following the examples provided in this blog, you should have a solid understanding of how to use the dtypes property in Pandas to manage and analyze your data effectively. Happy coding!
