Pandas map() function – Explained with examples

Pandas is a powerful and versatile library for data manipulation and analysis in Python. One of its most useful tools is map(), a Series method that transforms the values of a Series by applying a function or by looking each value up in a dictionary or another Series. In this blog post, we’ll look at the map() function in detail, covering its syntax, usage, and practical examples.

Understanding the map() Function

The map() function is used to map values of a Series according to an input correspondence, such as a function, dictionary, or Series. It returns a new Series where each value is the result of the mapping operation.

Syntax
Python
Series.map(arg, na_action=None)
  • arg: This can be a function, dictionary, or Series to map values.
  • na_action: Either None (the default) or 'ignore'. With 'ignore', NA values are passed through unchanged instead of being handed to the mapping.

Mapping with a Function

You can use a custom function or a lambda function to transform each value in the Series.

Example
Python
import pandas as pd

# Create a sample Series
data = pd.Series([1, 2, 3, 4, 5])

# Define a function to square each value
def square(x):
    return x ** 2

# Apply the function using map()
squared_data = data.map(square)
print(squared_data)

Output:

0     1
1     4
2     9
3    16
4    25
dtype: int64

In this example, data is a Pandas Series holding the numbers [1, 2, 3, 4, 5]. The square function takes a single input x and returns x squared (x ** 2). Calling data.map(square) applies square to each element and returns a new Series with the values [1, 4, 9, 16, 25]. The original data Series is left unchanged.
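
The same transformation can also be written with a lambda, which is handy when the function is short enough to define inline. The following is a minimal sketch that rebuilds the same data Series; the name squared_with_lambda is only illustrative.

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5])

# Same result as data.map(square), with the function written inline
squared_with_lambda = data.map(lambda x: x ** 2)
print(squared_with_lambda.tolist())
# [1, 4, 9, 16, 25]
```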


Mapping with a Dictionary

You can use a dictionary to replace each value in the Series with another value specified in the dictionary.

Example
Python
# Create a sample Series
data = pd.Series(['cat', 'dog', 'bird', 'fish'])

# Define a dictionary to map values
animal_map = {
    'cat': 'feline',
    'dog': 'canine',
    'bird': 'avian'
}

# Apply the dictionary using map()
mapped_data = data.map(animal_map)
print(mapped_data)

Output:

0    feline
1    canine
2     avian
3       NaN
dtype: object

Here, our Series contains the animal names ['cat', 'dog', 'bird', 'fish'], and the dictionary animal_map pairs three of them ('cat', 'dog', 'bird') with a corresponding category ('feline', 'canine', 'avian'). Calling map() with this dictionary replaces each animal name with the value stored under that key. Since 'fish' is not a key in the dictionary, it is replaced with NaN, representing a missing value.
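
If you would rather keep the original value when a key is missing, one common approach (not the only one) is to fill the gaps from the original Series after mapping. The sketch below assumes the same data and animal_map as in the example; mapped_with_fallback is an illustrative name.

```python
import pandas as pd

data = pd.Series(['cat', 'dog', 'bird', 'fish'])
animal_map = {'cat': 'feline', 'dog': 'canine', 'bird': 'avian'}

# Map first, then fill any NaN left by missing keys with the original value
mapped_with_fallback = data.map(animal_map).fillna(data)
print(mapped_with_fallback.tolist())
# ['feline', 'canine', 'avian', 'fish']
```

Note that this also fills positions that were already missing in the original Series with themselves, so they stay missing.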


Mapping with Another Series

You can also map values using another Series.

Example
Python
# Create two sample Series
data = pd.Series([1, 2, 3, 4, 5])
map_series = pd.Series({1: 'one', 2: 'two', 3: 'three'})

# Apply the Series using map()
mapped_data = data.map(map_series)
print(mapped_data)

Output:

0      one
1      two
2    three
3      NaN
4      NaN
dtype: object

In this scenario, data holds the numbers [1, 2, 3, 4, 5], and map_series is built from a dictionary, so its index is (1, 2, 3) and its values are ('one', 'two', 'three'). map() looks up each value of data in the index of map_series and returns the matching value. The numbers 4 and 5 have no entry in that index, so they are replaced with NaN in the resulting Series.
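
To make the role of the index more visible, here is a small sketch with a mapping Series whose index is set explicitly. The prices and orders data are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical price list: the index holds the keys, the values hold the prices
prices = pd.Series([0.5, 0.25], index=['apple', 'banana'])
orders = pd.Series(['banana', 'apple', 'cherry'])

# Each order is looked up in the index of prices; 'cherry' has no entry
order_prices = orders.map(prices)
print(order_prices)
```

The first two orders pick up 0.25 and 0.5, while 'cherry' becomes NaN because it does not appear in the index of prices.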


Handling Missing Values with na_action

If you want to ignore NaN values while mapping, you can use the na_action='ignore' parameter.

Example
Python
# Create a sample Series with NaN values
data = pd.Series([1, 2, None, 4, 5])

# Define a function to double each value
def double(x):
    return x * 2

# Apply the function using map() with na_action='ignore'
mapped_data = data.map(double, na_action='ignore')
print(mapped_data)

Output:

0     2.0
1     4.0
2     NaN
3     8.0
4    10.0
dtype: float64

Here, we start with a Series data built from [1, 2, None, 4, 5]. Because the other values are numeric, Pandas stores None as NaN and the Series has dtype float64. We define a function double that multiplies its input by 2. With na_action='ignore', map() calls double only on the non-missing values and passes the NaN through untouched, so it is preserved in the output. In this particular case the result would be the same without na_action, because NaN * 2 is still NaN. The parameter matters when the function would raise an error or return a non-missing value for NaN.
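
The effect of na_action is easier to see with a function that turns a missing value into something else. The sketch below uses a hypothetical describe function that formats its input into a string; the Series and variable names are illustrative.

```python
import pandas as pd

animals = pd.Series(['cat', 'dog', float('nan')])

def describe(x):
    # Formats any input, including NaN, into a string
    return f'I am a {x}'

# Without na_action, the function also receives the missing value
described_all = animals.map(describe)

# With na_action='ignore', the missing value is passed through as missing
described_ignoring_na = animals.map(describe, na_action='ignore')

print(described_all.tolist())
print(described_ignoring_na.tolist())
```

In the first result the missing entry has been turned into an ordinary string, which is usually not what you want. In the second it is still missing.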


Practical Applications

The map() function is particularly useful in data preprocessing and feature engineering. Here are some common use cases:

  1. Replacing Categorical Values: You can map categorical values to numerical values or other categories.
  2. Feature Transformation: Apply mathematical functions to transform features.
  3. Data Cleaning: Replace incorrect or missing values with appropriate mappings.
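
As a sketch of the first use case, the snippet below encodes a categorical column as integers. Selecting a single DataFrame column returns a Series, so map() applies directly. The DataFrame, the column names, and the size_order dictionary are all hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'size': ['small', 'large', 'medium', 'small']})
size_order = {'small': 0, 'medium': 1, 'large': 2}

# Encode each category as an integer that reflects its order
df['size_code'] = df['size'].map(size_order)
print(df['size_code'].tolist())
# [0, 2, 1, 0]
```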

Conclusion

The map() function in Pandas is a versatile tool for transforming data in a Series. Whether you’re using a function, dictionary, or another Series, map() provides a simple and efficient way to perform value mapping. Understanding and utilizing this function can greatly enhance your data manipulation and analysis tasks.

Happy coding!
