Pandas Data Types – A Simple Explanation with Examples

When working with data in Pandas, understanding data types is essential for data analysis and manipulation. This post covers the basics of Pandas data types: what they are, how to check and change them, and which types are available.

What is a Pandas Data Type?

A data type in Pandas, often referred to as a dtype, is an attribute that tells Pandas how to interpret and handle the data stored in a DataFrame or Series. It determines the kind of operations you can perform on the data, such as arithmetic operations, string manipulations, and logical comparisons.
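
As a rough illustration of how the dtype shapes the available operations, the sketch below runs arithmetic on a numeric Series and a string method on a text Series. The variable names and sample values are made up for this example.

```python
import pandas as pd

# Illustrative data, not taken from any real dataset
numbers = pd.Series([1, 2, 3], dtype="int64")
labels = pd.Series(["a", "b", "c"])

# Arithmetic works element-wise on a numeric dtype
doubled = numbers * 2
print(doubled.tolist())

# String methods are reached through the .str accessor on text data
upper = labels.str.upper()
print(upper.tolist())

# The .str accessor is not available on a numeric Series
try:
    numbers.str.upper()
except AttributeError as err:
    print("AttributeError:", err)
```

The same accessor call that succeeds on the text Series fails on the integer one, which is the dtype doing its job.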

How Many Data Types are in Pandas?

Pandas supports several data types, which can be broadly categorized as follows:

1. Numeric Types: Integers and floating-point numbers.

    • int64: Integer values
    • float64: Floating-point values

2. Boolean Type: Represents True or False values.

    • bool: Boolean values

3. Object Type: Used for text or mixed data types.

    • object: Typically used for strings or mixed data types

4. Categorical Type: For categorical data with a limited set of values.

    • category: Categorical data

5. Datetime Types: For date and time data.

    • datetime64[ns]: Dates and times
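
To see these categories side by side, here is a small sketch that builds one column of each kind and prints the resulting dtypes. The column names and values are invented for illustration, and the dtypes are requested explicitly so the result does not depend on inference. The exact unit shown for the datetime column may differ between Pandas versions.

```python
import pandas as pd

# One small column per category; names and values are illustrative only
df_types = pd.DataFrame({
    "count": pd.Series([1, 2, 3], dtype="int64"),
    "price": pd.Series([1.5, 2.5, 3.5], dtype="float64"),
    "in_stock": pd.Series([True, False, True], dtype="bool"),
    "note": pd.Series(["ok", 7, None], dtype="object"),  # mixed values
    "size": pd.Series(["S", "M", "S"], dtype="category"),
    "added": pd.Series(pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"])),
})

# Prints one line per column with its dtype
print(df_types.dtypes)
```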

How to Check Data Type in Pandas?

To check the data type of each column in a Pandas DataFrame, use the dtypes attribute. Here’s an example:

Python
import pandas as pd

# Each key becomes a column; Pandas infers one dtype per column
data = {
    'A': [1, 2, 3],
    'B': [1.1, 2.2, 3.3],
    'C': [True, False, True],
    'D': ['apple', 'banana', 'cherry']
}

df = pd.DataFrame(data)
print(df.dtypes)  # one line per column, showing its dtype

Output:

A      int64
B    float64
C       bool
D     object
dtype: object
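In this example, we create a DataFrame with four columns containing different types of data: integers, floats, booleans, and strings. The dtypes attribute displays the data type of each column: int64 for integers, float64 for floating-point numbers, bool for boolean values, and object for strings. The final line, dtype: object, describes the Series that dtypes itself returns, not a column of the DataFrame. The output can vary by version: Pandas 3.0 and later infer a dedicated str dtype for string columns instead of object.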
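
Beyond dtypes, a few other standard tools answer the same question in different ways. The sketch below rebuilds the example DataFrame so it runs on its own, then checks a single column, selects columns by type, and tests a dtype with a helper from pandas.api.types.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [1.1, 2.2, 3.3],
    "C": [True, False, True],
    "D": ["apple", "banana", "cherry"],
})

# dtype (singular) on a single column returns one dtype object
print(df["B"].dtype)

# select_dtypes keeps only the columns whose dtype matches
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)

# The helpers in pandas.api.types test a column's dtype
print(pd.api.types.is_bool_dtype(df["C"]))
```

Note that include="number" selects the integer and float columns but leaves out the boolean one.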

How to Change Data Types in Pandas?

You can change the data type of a column using the astype method. Here’s how you can do it:

Python
# Continues with the DataFrame df created above

# Change column 'A' to float
df['A'] = df['A'].astype(float)

# Change column 'D' to category
df['D'] = df['D'].astype('category')

print(df.dtypes)

Output:

A     float64
B     float64
C        bool
D    category
dtype: object

This example demonstrates how to change the data type of a column using the astype method. We convert column ‘A’ from an integer to a float and column ‘D’ from an object (string) to a categorical type. Because astype returns a new Series rather than modifying the column in place, the result is assigned back to the column. The dtypes attribute is then used to confirm the changes.
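
Conversions do not always succeed, so it can help to see what happens with messy input. The following sketch uses a made-up Series of text values with one bad entry to contrast astype, which raises an error, with pd.to_numeric and errors="coerce", which replaces unconvertible values with NaN. It also shows astype with a dict to convert several columns in one call.

```python
import pandas as pd

# Hypothetical raw input: numbers stored as text, with one bad entry
raw = pd.Series(["1", "2", "x"], dtype="object")

# astype raises when a value cannot be converted
try:
    raw.astype("int64")
except ValueError as err:
    print("astype failed:", err)

# pd.to_numeric with errors="coerce" turns bad entries into NaN instead
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced.tolist())
print(coerced.dtype)

# astype also accepts a dict to convert several columns at once
df = pd.DataFrame({"A": [1, 2, 3], "D": ["apple", "banana", "cherry"]})
converted = df.astype({"A": "float64", "D": "category"})
print(converted.dtypes)
```

The coerced result is a float column because NaN is a floating-point value.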

What is the Boolean Type in Pandas?

The boolean type in Pandas represents True and False values. It is useful for logical operations and for filtering data. In Pandas, the boolean data type is denoted as bool.

Example:

Python
# Boolean column from the DataFrame df created above
print(df['C'])

Output:

0     True
1    False
2     True
Name: C, dtype: bool

In this example, we access the boolean column ‘C’ from the DataFrame. The output shows the True and False values stored in this column along with its data type, bool. Boolean columns are particularly useful for logical operations and for filtering rows based on conditions.
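
Filtering with booleans can be sketched as follows. The DataFrame is a trimmed-down copy of the earlier example so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "C": [True, False, True],
})

# A bool column can be used directly as a row filter
kept = df[df["C"]]
print(kept["A"].tolist())

# Comparisons produce a bool Series
mask = df["A"] > 1
print(mask.dtype)

# Combine masks with & (and), | (or), ~ (not); wrap comparisons in parentheses
both = df[(df["A"] > 1) & df["C"]]
print(both["A"].tolist())

# Summing a bool column counts its True values
true_count = int(df["C"].sum())
print(true_count)
```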

What Other Data Types Does Pandas Support?

Apart from the common types mentioned above, Pandas also supports:

• Timedelta Types: For time differences.
  • timedelta64[ns]: Time differences
• Period Types: For time periods.
  • period[D]: Periods with daily frequency
• Sparse Types: For efficiently storing data that consists mostly of one repeated value, such as 0 or missing values.
  • Sparse: Sparse data structures, for example Sparse[float64, nan]
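
Here is a minimal sketch that creates one Series of each of these types. The sample values are invented for illustration, and the exact dtype strings printed (for example the time unit or the integer width) may vary with your Pandas version and platform.

```python
import pandas as pd

# Timedelta: differences between points in time
deltas = pd.Series(pd.to_timedelta(["1 days", "2 days 12:00:00"]))
print(deltas.dtype)

# Period: spans of time at a fixed frequency (daily here)
periods = pd.Series(pd.period_range("2024-01-01", periods=3, freq="D"))
print(periods.dtype)

# Sparse: only values that differ from the fill value (0 here) are stored
sparse = pd.Series(pd.arrays.SparseArray([0, 0, 1, 0, 0], fill_value=0))
print(sparse.dtype)

# Fraction of values that are actually stored
density = sparse.sparse.density
print(density)
```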

Conclusion

In summary, Pandas data types are essential for data manipulation and analysis. Knowing the different types, how to check them, and how to change them will help you work more effectively with your data. Whether you’re dealing with numbers, text, dates, or categories, Pandas has a data type that suits your needs.
