Pandas DataFrame var() Method – Explained with Examples

The var() method in Pandas calculates the variance of the values in a DataFrame, one result per column by default or one per row if requested. Variance measures how far the values in a data set are spread out from their mean: it is the average of the squared deviations from the mean. By default, var() returns the sample variance, which divides by N - 1 rather than N.
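As a rough illustration of that definition, the sketch below computes a sample variance by hand in plain Python and compares it with pandas. The values are made up for this illustration and do not come from a real data set.

```python
import math
import pandas as pd

# Hypothetical sample values, chosen only for illustration
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n

# Squared distance of each value from the mean
squared_devs = [(x - mean) ** 2 for x in values]

# Sample variance: summed squared deviations divided by N - 1
manual_variance = sum(squared_devs) / (n - 1)

# pandas uses the same N - 1 divisor by default
pandas_variance = pd.Series(values).var()

print(manual_variance)
print(math.isclose(manual_variance, pandas_variance))
```

The two results should agree up to floating-point rounding, which is why the comparison uses math.isclose instead of ==.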

Syntax

Python

DataFrame.var(axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)

Parameters

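axis: {0 or 'index', 1 or 'columns'}, default 0
Axis along which the variance is computed. With the default of 0, the variance is computed down the rows, giving one value per column. With 1, it is computed across the columns, giving one value per row.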
skipna: bool, default True
Exclude NA/null values. If True, it excludes the missing values during calculation.
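level: int or level name, default None
If the axis is a MultiIndex (hierarchical), compute the variance along a particular level, collapsing into a DataFrame. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; in newer versions, group by the level first, for example df.groupby(level=0).var().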
ddof: int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. The default of 1 gives the sample variance; ddof=0 gives the population variance.
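numeric_only: bool, default None
Include only float, int, boolean columns. If None, pandas attempts to use every column and then falls back to numeric data only. Since pandas 2.0 the default is False, and calling var() on a DataFrame that contains non-numeric columns such as strings raises a TypeError unless numeric_only=True is passed.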
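The numeric_only parameter matters most when a DataFrame mixes numeric and non-numeric columns. Here is a minimal sketch, using a made-up DataFrame named mixed (not part of the examples in this article), that passes numeric_only=True explicitly so the call behaves the same way across pandas versions.

```python
import pandas as pd

# Hypothetical mixed-type data: 'name' holds strings, the other columns are numeric
mixed = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'height': [1.0, 2.0, 3.0, 4.0],
    'weight': [10, 20, 30, 40],
})

# Restrict the calculation to the numeric columns; 'name' is left out of the result
numeric_variance = mixed.var(numeric_only=True)
print(numeric_variance)
```

The result is a Series indexed by the numeric column names only.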

Returns

Series or DataFrame: If level is specified, returns a DataFrame; otherwise, returns a Series.

Examples

Let’s dive into some examples to understand how to use the var() method.

Example 1: Basic Usage

Consider the following DataFrame:

Python

import pandas as pd

data = {
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 10, 11, 12, 13]
}

df = pd.DataFrame(data)
print(df)

Output:

   A  B   C
0  1  5   9
1  2  6  10
2  3  7  11
3  4  8  12
4  5  9  13

To calculate the variance of each column:

Python

variance = df.var()
print(variance)

Output:

A    2.5
B    2.5
C    2.5
dtype: float64

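In this example, the variance of each column (A, B, and C) is 2.5. The three columns share the same variance because each is the same sequence shifted by a constant: the squared deviations from the mean sum to 10, and dividing by N - 1 = 4 gives 2.5. Note that variance is expressed in squared units of the original data.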
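Because variance is in squared units, it is often easier to interpret its square root, the standard deviation. The sketch below rebuilds the same DataFrame and checks that df.std() matches the square root of df.var(); both methods use ddof=1 by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 10, 11, 12, 13]
})

variance = df.var()
std_dev = df.std()  # standard deviation, same default ddof=1 as var()

# The standard deviation is the square root of the variance
print(std_dev)
print(np.allclose(std_dev, np.sqrt(variance)))
```

The standard deviation here is roughly 1.58 for every column, in the same units as the data.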

Example 2: Calculating Variance Along Rows

You can also calculate the variance along rows by setting the axis parameter to 1.

Python

row_variance = df.var(axis=1)
print(row_variance)

Output:

0    16.0
1    16.0
2    16.0
3    16.0
4    16.0
dtype: float64

In this example, the variance of each row is 16.0. Each row holds three values spaced 4 apart (for example 1, 5, 9 in row 0), so the squared deviations from the row mean sum to 32, and dividing by N - 1 = 2 gives 16.0.
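The axis can also be given by name, and the row result is easy to confirm by hand. The sketch below rebuilds the same DataFrame, compares axis=1 with axis='columns', and recomputes the variance of row 0 manually; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 10, 11, 12, 13]
})

# axis=1 and axis='columns' are two spellings of the same thing
by_number = df.var(axis=1)
by_label = df.var(axis='columns')
print(by_number.equals(by_label))

# Row 0 checked by hand: sum of squared deviations divided by N - 1
row0 = df.iloc[0]
manual_row0 = ((row0 - row0.mean()) ** 2).sum() / (len(row0) - 1)
print(manual_row0)
```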

Example 3: Handling Missing Values

The skipna parameter can be used to exclude NA/null values. Consider the following DataFrame with missing values:

Python

data_with_nan = {
    'A': [1, 2, None, 4, 5],
    'B': [5, None, 7, 8, 9],
    'C': [9, 10, 11, None, 13]
}

df_nan = pd.DataFrame(data_with_nan)
print(df_nan)

Output:

     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN
4  5.0  9.0  13.0

By default, skipna=True, so NA/null values are excluded:

Python

variance_nan = df_nan.var()
print(variance_nan)

Output:

A    3.333333
B    2.916667
C    2.916667
dtype: float64

If you set skipna=False, the method returns NaN for every column that contains at least one missing value:

Python

variance_nan_include = df_nan.var(skipna=False)
print(variance_nan_include)

Output:

A   NaN
B   NaN
C   NaN
dtype: float64

In the above examples, when missing values are excluded (skipna=True), the variance is approximately 3.33 for column A and approximately 2.92 for columns B and C, each computed from the four remaining values. If missing values are included (skipna=False), the variance cannot be calculated for any column that contains missing data, so all three columns return NaN.
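One way to see what skipna=True does is to drop the missing values yourself and compare. The sketch below rebuilds df_nan, counts the non-missing values per column (the N used in the calculation), and checks that dropping NaN from column A by hand gives the same variance as the default call.

```python
import pandas as pd

df_nan = pd.DataFrame({
    'A': [1, 2, None, 4, 5],
    'B': [5, None, 7, 8, 9],
    'C': [9, 10, 11, None, 13]
})

# Number of non-missing values per column; this is the N used when skipna=True
non_missing = df_nan.count()
print(non_missing)

# Dropping the missing values first gives the same variance as the default call
manual_a = df_nan['A'].dropna().var()
default_a = df_nan.var()['A']
print(manual_a, default_a)
```

Each column has four non-missing values, so the divisor is 4 - 1 = 3 rather than 5 - 1 = 4.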

Example 4: Using the ddof Parameter

The ddof parameter sets the delta degrees of freedom, the value subtracted from N in the divisor. By default, ddof=1, which gives the sample variance. Setting ddof=0 gives the population variance:

Python

variance_ddof0 = df.var(ddof=0)
print(variance_ddof0)

Output:

A    2.0
B    2.0
C    2.0
dtype: float64

In this example, using ddof=0 changes the divisor from N - 1 to N (from 4 to 5), so the variance of each column drops from 2.5 to 2.0. This is the population variance, which is appropriate when the data covers the whole population rather than a sample of it.
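The choice of ddof is a common source of confusion when moving between libraries, because NumPy's np.var defaults to ddof=0 while pandas defaults to ddof=1. A short sketch of the difference, using column A from the same DataFrame converted to a NumPy array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 6, 7, 8, 9],
    'C': [9, 10, 11, 12, 13]
})

values = df['A'].to_numpy()

numpy_default = np.var(values)         # NumPy default: ddof=0 (divide by N)
numpy_sample = np.var(values, ddof=1)  # divide by N - 1, like the pandas default
pandas_default = df['A'].var()         # pandas default: ddof=1

print(numpy_default, numpy_sample, pandas_default)
```

Passing ddof explicitly in both libraries avoids relying on either default.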

Conclusion

The var() method in Pandas is a powerful tool for statistical analysis, allowing you to compute the variance of your data along a specified axis while handling missing values and providing flexibility with degrees of freedom. Understanding and utilizing this method effectively can help you gain insights into the variability of your datasets.
