Basic Concept of Pandas in Python | A Simple Overview

Pandas is an open-source Python library for data manipulation and analysis. Built on top of NumPy, it provides data structures and functions for working with structured data such as numerical tables and time series. It is free software, released under the three-clause BSD license. This blog introduces the basic concepts of pandas, its primary data structures, and some fundamental operations you can perform with it.

Table of Contents

  1. What is Pandas?
  2. Installation
  3. Pandas Data Structures
    • Series
    • DataFrame
  4. Basic Operations
    • Creating a DataFrame
    • Viewing Data
    • Selecting Data
    • Data Manipulation
  5. Summary

1. What is Pandas?

Pandas is designed for practical, real-world data analysis in Python. It allows you to manipulate and analyze data in an efficient and easy-to-understand manner. Pandas is especially useful for working with data that is in tabular form, like data from spreadsheets or SQL tables.

2. Installation

You can install pandas using pip, which is the standard package manager for Python. Run the following command in your terminal or command prompt:

Bash
pip install pandas
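
To confirm that the installation worked, one quick check is to import the library and print its version. This is only a sanity check; the version string you see depends on what pip installed.

```python
import pandas as pd

# Holds the installed pandas version as a string
version = pd.__version__
print(version)
```

If the import fails with a ModuleNotFoundError, pandas was most likely installed into a different Python environment than the one running the script.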

3. Pandas Data Structures

Pandas has two primary data structures: Series and DataFrame. These structures are highly flexible and allow you to handle a wide variety of data formats.

i) Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, etc.). Think of it as a column in a spreadsheet or a SQL table.

Python
import pandas as pd

# Creating a Series
s = pd.Series([1, 3, 5, 7, 9])
print(s)

Output:

0    1
1    3
2    5
3    7
4    9
dtype: int64
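
The integers on the left of the output are the default index labels. As a small sketch of what "labeled" means, the example below attaches custom labels to a Series. The label names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical labels and values, chosen only for illustration
temps = pd.Series([21.5, 19.0, 23.2], index=['mon', 'tue', 'wed'])

# Look up a value by its label rather than by its position
tuesday = temps['tue']
print(tuesday)

# Element-wise arithmetic keeps the labels attached
fahrenheit = temps * 9 / 5 + 32
print(fahrenheit)
```

Because the labels travel with the values, the converted Series can still be queried with 'mon', 'tue', or 'wed'.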

ii) DataFrame

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it as a table in a database, a data frame in R, or a sheet in Excel.

Python
# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
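
To see the "columns of potentially different types" part in practice, the following sketch rebuilds the DataFrame above and inspects its shape and per-column types. The exact dtype names printed for the text columns can vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# shape is a (rows, columns) tuple
shape = df.shape
print(shape)

# dtypes lists one data type per column
print(df.dtypes)

# Each column of a DataFrame is itself a Series
age = df['Age']
print(type(age))
```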

4. Basic Operations

Once you have your data in a DataFrame, you can perform various operations to manipulate and analyze it.

i) Creating a DataFrame

You can create a DataFrame from different data sources like lists, dictionaries, or even external files (CSV, Excel, SQL, etc.).

Python
# Creating DataFrame from a dictionary
data = {
    'Product': ['Tablet', 'Laptop', 'Smartphone'],
    'Price': [250, 1200, 800]
}

df = pd.DataFrame(data)
print(df)

Output:

      Product  Price
0      Tablet    250
1      Laptop   1200
2  Smartphone    800
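
The same kind of table can be built from the other sources mentioned above. The sketch below uses a list of dictionaries and a small piece of CSV text. The CSV content is inline sample data, and io.StringIO stands in for a real file so the example runs on its own.

```python
import io
import pandas as pd

# From a list of dictionaries: one dictionary per row
rows = [
    {'Product': 'Tablet', 'Price': 250},
    {'Product': 'Laptop', 'Price': 1200},
]
df_from_rows = pd.DataFrame(rows)
print(df_from_rows)

# From CSV text. io.StringIO stands in for a file on disk here;
# with a real file you would pass its path to pd.read_csv instead.
csv_text = "Product,Price\nTablet,250\nLaptop,1200\nSmartphone,800\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(df_from_csv)
```

Pandas also offers pd.read_excel and pd.read_sql for the other file types, though those need extra dependencies or a database connection.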

ii) Viewing Data

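Pandas provides various methods to quickly inspect the data in a DataFrame. By default, head() and tail() each return five rows, so on the three-row DataFrame used here both return every row.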

Python
# Viewing the first few rows
print(df.head())

# Viewing the last few rows
print(df.tail())

# Getting a quick statistical summary
print(df.describe())

Output:

# Viewing the first few rows
      Product  Price
0      Tablet    250
1      Laptop   1200
2  Smartphone    800

# Viewing the last few rows
      Product  Price
0      Tablet    250
1      Laptop   1200
2  Smartphone    800

# Getting a quick statistical summary
             Price
count     3.000000
mean    750.000000
std     476.969601
min     250.000000
25%     525.000000
50%     800.000000
75%    1000.000000
max    1200.000000
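
On larger tables you will usually want to control how many rows are shown. The following sketch passes an explicit row count to head() and tail(), and calls info() for a structural overview. It reuses the product data from this section.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Tablet', 'Laptop', 'Smartphone'],
    'Price': [250, 1200, 800],
})

# Pass n to limit the number of rows returned
first_two = df.head(2)
print(first_two)

last_one = df.tail(1)
print(last_one)

# info() prints column names, non-null counts, and dtypes
df.info()
```

Note that describe() summarizes only numeric columns by default, which is why Product does not appear in the summary above.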

iii) Selecting Data

You can select data from a DataFrame in several ways, such as by column name or by conditions.

Python
# Selecting a single column
print(df['Product'])

# Selecting multiple columns
print(df[['Product', 'Price']])

# Selecting rows based on a condition
print(df[df['Price'] > 500])

Output:

# Selecting a single column
0        Tablet
1        Laptop
2    Smartphone
Name: Product, dtype: object

# Selecting multiple columns
      Product  Price
0      Tablet    250
1      Laptop   1200
2  Smartphone    800

# Selecting rows based on a condition
      Product  Price
1      Laptop   1200
2  Smartphone    800
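
Beyond bracket selection, pandas offers loc for label-based access and iloc for position-based access. The sketch below shows both, along with a filter that combines two conditions. The price thresholds are arbitrary values picked for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Tablet', 'Laptop', 'Smartphone'],
    'Price': [250, 1200, 800],
})

# Label-based selection: row with index label 1, column 'Price'
laptop_price = df.loc[1, 'Price']
print(laptop_price)

# Position-based selection: the first row, returned as a Series
first_row = df.iloc[0]
print(first_row)

# Combine conditions with & (and) or | (or); wrap each in parentheses
mid_range = df[(df['Price'] > 500) & (df['Price'] < 1000)]
print(mid_range)
```

The parentheses around each condition matter, because & and | bind more tightly than comparison operators in Python.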

iv) Data Manipulation

Pandas allows you to perform various data manipulation tasks such as adding, updating, or deleting columns and rows.

Python
# Adding a new column
df['Discount'] = [10, 20, 15]
print(df)

# Updating a column
df['Price'] = df['Price'] * 0.9
print(df)

# Deleting a column
df = df.drop('Discount', axis=1)
print(df)

Output:

# Adding a new column
      Product  Price  Discount
0      Tablet    250        10
1      Laptop   1200        20
2  Smartphone    800        15

# Updating a column
      Product   Price  Discount
0      Tablet   225.0        10
1      Laptop  1080.0        20
2  Smartphone   720.0        15

# Deleting a column
      Product   Price
0      Tablet   225.0
1      Laptop  1080.0
2  Smartphone   720.0
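
The examples above work on columns. Rows can be changed in a similar way. The following is one possible approach, starting again from the original product data. The 'Monitor' row and the new price are made-up sample values.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Tablet', 'Laptop', 'Smartphone'],
    'Price': [250, 1200, 800],
})

# Adding a row: build a one-row DataFrame and concatenate it.
# The 'Monitor' row is made-up sample data.
new_row = pd.DataFrame({'Product': ['Monitor'], 'Price': [300]})
df_added = pd.concat([df, new_row], ignore_index=True)
print(df_added)

# Updating a single cell by row label and column name
df_added.loc[0, 'Price'] = 275
print(df_added)

# Deleting a row by its index label (rows are the default axis)
df_dropped = df_added.drop(1)
print(df_dropped)
```

Like the column example, drop() returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a variable to keep it.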

5. Summary

This blog covered the basics of pandas, including its installation, primary data structures, and some fundamental operations you can perform on data using pandas. As you become more familiar with pandas, you’ll discover its extensive capabilities for more advanced data analysis tasks.

For more detail and advanced usage, explore the official pandas documentation. Happy data analyzing!
