Pandas is an open-source Python library for data manipulation and analysis. Built on top of NumPy, it provides data structures and functions for working with structured data, particularly numerical tables and time series. It is free software released under the three-clause BSD license. This post introduces the basic concepts of pandas, its primary data structures, and some fundamental operations you can perform with it.
Table of Contents
- What is Pandas?
- Installation
- Pandas Data Structures
  - Series
  - DataFrame
- Basic Operations
  - Creating a DataFrame
  - Viewing Data
  - Selecting Data
  - Data Manipulation
- Summary
1. What is Pandas?
Pandas is designed for practical, real-world data analysis in Python. It allows you to manipulate and analyze data in an efficient and easy-to-understand manner. Pandas is especially useful for working with data that is in tabular form, like data from spreadsheets or SQL tables.
2. Installation
You can install pandas using pip, which is the standard package manager for Python. Run the following command in your terminal or command prompt:
pip install pandas
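To check that the installation worked, one quick option is to import the library and print its version. This is only a sanity check; the version string you see depends on what pip installed.

```python
import pandas as pd

# Print the installed version to confirm that the import works
print(pd.__version__)
```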
3. Pandas Data Structures
Pandas has two primary data structures: Series and DataFrame. These structures are highly flexible and allow you to handle a wide variety of data formats.
i) Series
A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, etc.). Think of it as a column in a spreadsheet or a SQL table.
import pandas as pd
# Creating a Series
s = pd.Series([1, 3, 5, 7, 9])
print(s)
Output:
0    1
1    3
2    5
3    7
4    9
dtype: int64
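The left-hand column in that output is the index, which is what makes a Series "labeled". By default the labels are 0, 1, 2, and so on, but you can supply your own. The sketch below uses made-up day names and temperatures purely for illustration:

```python
import pandas as pd

# Hypothetical data: the labels and values are for illustration only
temps = pd.Series([21.5, 19.0, 23.2], index=["mon", "tue", "wed"])

# Look up a value by its label rather than by its position
tue_temp = temps["tue"]
print(tue_temp)

# The labels are available separately through the index attribute
print(list(temps.index))
```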
ii) DataFrame
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it as a table in a database, a data frame in R, or a sheet in Excel.
# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
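One way to see how the two structures relate is to pull a single column out of a DataFrame: the result is a Series that shares the DataFrame's row index. The sketch below rebuilds the same table and also prints its shape and per-column types (the exact type names shown for text columns can vary between pandas versions):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [24, 27, 22, 32],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

# Each column is a Series that shares the DataFrame's row index
ages = df["Age"]
print(type(ages).__name__)

# shape is (rows, columns); dtypes lists one type per column
print(df.shape)
print(df.dtypes)
```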
4. Basic Operations
Once you have your data in a DataFrame, you can perform various operations to manipulate and analyze it.
i) Creating a DataFrame
You can create a DataFrame from different data sources like lists, dictionaries, or even external files (CSV, Excel, SQL, etc.).
# Creating DataFrame from a dictionary
data = {
    'Product': ['Tablet', 'Laptop', 'Smartphone'],
    'Price': [250, 1200, 800]
}
df = pd.DataFrame(data)
print(df)
Output:
      Product  Price
0      Tablet    250
1      Laptop   1200
2  Smartphone    800
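The other sources mentioned above work in a similar way. As a rough sketch, here is the same kind of table built from a list of records (one dictionary per row) and from CSV text. The `io.StringIO` wrapper is only a stand-in so the example runs without a file on disk; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# From a list of records: one dictionary per row
records = [
    {"Product": "Tablet", "Price": 250},
    {"Product": "Laptop", "Price": 1200},
]
df_records = pd.DataFrame(records)

# From CSV text; io.StringIO stands in for a real file path here
csv_text = "Product,Price\nTablet,250\nLaptop,1200\nSmartphone,800\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_records)
print(df_csv)
```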
ii) Viewing Data
Pandas provides various methods to quickly inspect the data in a DataFrame.
# Viewing the first few rows
print(df.head())
# Viewing the last few rows
print(df.tail())
# Getting a quick statistical summary
print(df.describe())
Output:
# Viewing the first few rows
      Product  Price
0      Tablet    250
1      Laptop   1200
2  Smartphone    800
# Viewing the last few rows
      Product  Price
0      Tablet    250
1      Laptop   1200
2  Smartphone    800
# Getting a quick statistical summary
             Price
count     3.000000
mean    750.000000
std     476.969601
min     250.000000
25%     525.000000
50%     800.000000
75%    1000.000000
max    1200.000000
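The head and tail outputs above are identical because both methods return five rows by default and this table has only three. Note also that describe() summarizes only the numeric column here. A small sketch of how you might limit the row count and check the table's dimensions:

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Tablet", "Laptop", "Smartphone"],
    "Price": [250, 1200, 800],
})

# head() and tail() default to 5 rows; pass n to limit the output
first_two = df.head(2)
last_one = df.tail(1)
print(first_two)
print(last_one)

# Column labels and (rows, columns) dimensions
print(list(df.columns))
print(df.shape)
```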
iii) Selecting Data
You can select data from a DataFrame in several ways, such as by column name or by conditions.
# Selecting a single column
print(df['Product'])
# Selecting multiple columns
print(df[['Product', 'Price']])
# Selecting rows based on a condition
print(df[df['Price'] > 500])
Output:
# Selecting a single column
0        Tablet
1        Laptop
2    Smartphone
Name: Product, dtype: object
# Selecting multiple columns
      Product  Price
0      Tablet    250
1      Laptop   1200
2  Smartphone    800
# Selecting rows based on a condition
      Product  Price
1      Laptop   1200
2  Smartphone    800
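Two other selection tools worth knowing are loc, which selects by label, and iloc, which selects by integer position. Conditions can also be combined. The following is a minimal sketch on the same product table; the price range is arbitrary and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Tablet", "Laptop", "Smartphone"],
    "Price": [250, 1200, 800],
})

# loc selects by label: row label 1, column "Product"
by_label = df.loc[1, "Product"]

# iloc selects by integer position: first row, second column
by_position = df.iloc[0, 1]

# Combine conditions with & (and) or | (or); wrap each one in parentheses
mid_range = df[(df["Price"] > 500) & (df["Price"] < 1000)]

print(by_label, by_position)
print(mid_range)
```

The parentheses around each condition matter because & and | bind more tightly than comparison operators in Python.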
iv) Data Manipulation
Pandas allows you to perform various data manipulation tasks such as adding, updating, or deleting columns and rows.
# Adding a new column
df['Discount'] = [10, 20, 15]
print(df)
# Updating a column (reducing every price by 10%)
df['Price'] = df['Price'] * 0.9
print(df)
# Deleting a column
df = df.drop('Discount', axis=1)
print(df)
Output:
# Adding a new column
      Product  Price  Discount
0      Tablet    250        10
1      Laptop   1200        20
2  Smartphone    800        15
# Updating a column
      Product   Price  Discount
0      Tablet   225.0        10
1      Laptop  1080.0        20
2  Smartphone   720.0        15
# Deleting a column
      Product   Price
0      Tablet   225.0
1      Laptop  1080.0
2  Smartphone   720.0
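The examples above all work on columns. Rows can be handled in a similar spirit. The sketch below shows one possible approach, starting again from the original product table: pd.concat to append a row, loc to update a single value, and drop to remove a row. The "Monitor" product and the new prices are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": ["Tablet", "Laptop", "Smartphone"],
    "Price": [250, 1200, 800],
})

# Adding a row: build a one-row DataFrame and concatenate it
# ("Monitor" is a made-up product for illustration)
new_row = pd.DataFrame({"Product": ["Monitor"], "Price": [300]})
df = pd.concat([df, new_row], ignore_index=True)

# Updating a single value: set Price where Product is "Tablet"
df.loc[df["Product"] == "Tablet", "Price"] = 275

# Deleting a row by its index label (label 1 is the Laptop row)
df = df.drop(index=1)

print(df)
```

After the drop, the remaining rows keep their original labels (0, 2, 3). Calling df.reset_index(drop=True) would renumber them if you need a contiguous index.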
5. Summary
This blog covered the basics of pandas, including its installation, primary data structures, and some fundamental operations you can perform on data using pandas. As you become more familiar with pandas, you’ll discover its extensive capabilities for more advanced data analysis tasks.
You can explore the official Pandas documentation for more detailed information and advanced usage. Happy data analyzing!