How to Check the Data Type of a Column in Pandas

When working with data in Pandas, it’s crucial to understand the data types of the columns in your DataFrame. A column’s type determines which operations are valid on it, so checking it early helps you avoid mistakes such as doing arithmetic on numbers that were loaded as strings. In this blog post, we’ll explore various methods to check the data type of a column in a Pandas DataFrame.

What are Data Types in Pandas?

Pandas supports several data types that can be used in DataFrames, including:

  • int64: Integer values
  • float64: Floating-point values
  • object: General-purpose type (usually strings)
  • bool: Boolean values
  • datetime64[ns]: Date and time values
  • category: Categorical values

Understanding these data types is essential for effective data manipulation and analysis.
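To make the list above concrete, here is a minimal sketch that builds one column of each kind and checks it with the helper functions in pandas.api.types. The frame name examples and its column names are made up for illustration. The helpers are used rather than comparing dtype names because the exact name reported for text and datetime columns may differ between Pandas versions.

```python
import pandas as pd
from pandas.api import types as pdt

# Hypothetical frame with one column per data type listed above
examples = pd.DataFrame({
    'count': [1, 2, 3],                        # integer values
    'ratio': [0.5, 1.5, 2.5],                  # floating-point values
    'label': ['a', 'b', 'c'],                  # text values
    'flag': [True, False, True],               # boolean values
    'when': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
    'group': pd.Categorical(['x', 'y', 'x']),  # categorical values
})

print(examples.dtypes)

# Checks that do not depend on the exact dtype name
print(pdt.is_integer_dtype(examples['count']))
print(pdt.is_float_dtype(examples['ratio']))
print(pdt.is_bool_dtype(examples['flag']))
print(pdt.is_datetime64_any_dtype(examples['when']))
print(isinstance(examples['group'].dtype, pd.CategoricalDtype))
```

Each of the five checks should report True for the column it is applied to.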

1. Using dtypes Attribute

The dtypes attribute provides a quick and straightforward way to check the data types of all columns in a DataFrame.

Python
import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Height': [5.5, 6.0, 5.8],
    'Member': [True, False, True]
}

df = pd.DataFrame(data)

# Check data types of all columns
print(df.dtypes)

Output Explanation:

Name       object
Age         int64
Height    float64
Member       bool
dtype: object

The output shows the data types of each column in the DataFrame:

  • Name is of type object, which usually represents string data.
  • Age is of type int64, representing integer values.
  • Height is of type float64, representing floating-point values.
  • Member is of type bool, representing boolean values.
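Because dtypes returns a Series indexed by column name, it can also be used to look up or filter individual columns. The sketch below rebuilds the sample DataFrame so it runs on its own; the variable names age_dtype and float_columns are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Height': [5.5, 6.0, 5.8],
    'Member': [True, False, True],
})

# dtypes is a Series indexed by column name, so one entry can be looked up directly
age_dtype = df.dtypes['Age']
print(age_dtype)

# The same value is available from the column itself
print(df['Age'].dtype)

# Names of the columns whose dtype is float64
float_columns = df.dtypes[df.dtypes == 'float64'].index.tolist()
print(float_columns)
```

Looking up a single entry this way is handy when you only care about one column and do not want to print the whole listing.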

2. Using info() Method

The info() method provides a concise summary of the DataFrame, including the data types of the columns.

Python
# Check data types using info()
df.info()

Output Explanation:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    3 non-null      object 
 1   Age     3 non-null      int64  
 2   Height  3 non-null      float64
 3   Member  3 non-null      bool   
dtypes: bool(1), float64(1), int64(1), object(1)
memory usage: 283.0+ bytes

The output includes:

  • The class type of the DataFrame (pandas.core.frame.DataFrame).
  • The index range (0 to 2).
  • The column names and their non-null count.
  • The data type (Dtype) of each column.
  • A summary of the data types (dtypes) and memory usage.
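Note that info() prints its report and returns None, so the text cannot be assigned to a variable directly. If you need the report as a string, for example to write it to a log, one option is to pass a buffer through the buf parameter. This is a small sketch; the names buffer and report are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Height': [5.5, 6.0, 5.8],
    'Member': [True, False, True],
})

# Redirect the report into an in-memory text buffer instead of the console
buffer = io.StringIO()
result = df.info(buf=buffer)
report = buffer.getvalue()

print(result is None)  # info() itself returns nothing
print(report)          # the same summary that info() would normally print
```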

3. Using the dtype Attribute and astype() Method

To check the data type of a single column, use the dtype attribute of that column. The astype() method is a conversion tool: it returns a copy with the type you request, so its result always reports the requested type and cannot confirm what the column held originally.

Python
# Check the data type of a specific column
print(df['Age'].dtype)

# astype() converts the column; the result reports the requested type
print(df['Age'].astype('float64').dtype)

Output Explanation:

int64
float64

The first line confirms that the Age column is of type int64. The second line shows float64 because astype('float64') returned a converted copy; the Age column in df is still int64.
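Since astype() is mainly a conversion tool, a short sketch of that use may help. Here a dictionary maps column names to target types, and dtypes is checked before and after. The variable name converted is illustrative, and the choice of target types is only an example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Height': [5.5, 6.0, 5.8],
    'Member': [True, False, True],
})

# astype() returns a new DataFrame; the original is left unchanged
converted = df.astype({'Age': 'float64', 'Member': 'int64'})

# Compare the types before and after the conversion
print(df.dtypes[['Age', 'Member']])
print(converted.dtypes[['Age', 'Member']])
```

Checking the types on both frames is a quick way to confirm that the conversion did what you intended and that df was not modified.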

4. Using select_dtypes() Method

The select_dtypes() method allows you to select columns based on their data type. This is particularly useful for filtering columns by type.

Python
# Select columns with data type 'int64'
int_columns = df.select_dtypes(include=['int64'])
print(int_columns)

Output Explanation:

   Age
0   25
1   30
2   35

The output shows a new DataFrame containing only the columns with data type int64. In this case, it’s just the Age column.

You can also exclude certain data types:

Python
# Exclude columns with data type 'object'
non_object_columns = df.select_dtypes(exclude=['object'])
print(non_object_columns)

Output Explanation:

   Age  Height  Member
0   25     5.5    True
1   30     6.0   False
2   35     5.8    True

The output shows a new DataFrame that excludes columns with data type object. The resulting DataFrame includes the Age, Height, and Member columns, which have data types int64, float64, and bool, respectively.
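select_dtypes() also accepts broader selectors such as 'number', which matches every numeric type at once. The sketch below uses it to collect column names rather than data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Height': [5.5, 6.0, 5.8],
    'Member': [True, False, True],
})

# 'number' matches integer and floating-point columns, but not bool
numeric_columns = df.select_dtypes(include='number').columns.tolist()
print(numeric_columns)

# Several selectors can be combined in one call
number_or_bool = df.select_dtypes(include=['number', 'bool']).columns.tolist()
print(number_or_bool)
```

Working with the column names instead of the filtered DataFrame is useful when you want to apply the same operation to every column of a given kind.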

Conclusion

Understanding the data types of your DataFrame columns is a fundamental step in data analysis and manipulation with Pandas. The methods discussed above—dtypes, info(), astype(), and select_dtypes()—provide you with the tools to effectively check and manage column data types. By leveraging these methods, you can ensure your data is properly handled and analyzed.

Feel free to explore these methods with your own datasets and see how they can enhance your data analysis workflow. If you have any questions or suggestions, please leave a comment below. Happy coding!
