Generating Random Integers in a Pandas DataFrame

When working with data in Pandas, there are many occasions when you might need to generate random integers. This can be useful for creating sample data, testing algorithms, or simulating real-world data. In this blog post, we’ll explore various ways to generate random integers in a Pandas DataFrame.

Introduction to Random Integer Generation

Random integer generation involves creating a sequence of random numbers within a specified range. In Pandas, we can leverage the power of NumPy to generate these random integers efficiently.

1. Using numpy.random.randint()

The numpy.random.randint() function is a straightforward way to generate random integers. You pass it the lower bound (inclusive), the upper bound (exclusive), and the shape of the array you want to generate.

Example:
import pandas as pd
import numpy as np

# Creating a DataFrame with random integers
df = pd.DataFrame(np.random.randint(0, 100, size=(10, 4)), columns=list('ABCD'))

print(df)

In this example, we create a DataFrame with 10 rows and 4 columns filled with random integers between 0 and 99. The columns parameter is used to name the columns ‘A’, ‘B’, ‘C’, and ‘D’.

Output:
    A   B   C   D
0  35  75  99  12
1  41  56  11  67
2  18  24  32  44
3  93  86  21  73
4  57  92  20  89
5  49  13  88  95
6  84  45  51  63
7  60  72  17  14
8  27  68  55  85
9  90  78  23  37
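
Because the upper bound of numpy.random.randint() is exclusive, it is easy to be off by one when the maximum value itself should be allowed. The following is a minimal sketch of both cases; the seed value 42 and the variable names are arbitrary choices made for this illustration.

```python
import numpy as np
import pandas as pd

# Seed the legacy global generator so repeated runs give the same values
# (42 is an arbitrary seed chosen for this sketch)
np.random.seed(42)

# high=100 is exclusive, so values fall in 0..99
df_exclusive = pd.DataFrame(
    np.random.randint(0, 100, size=(10, 4)), columns=list("ABCD")
)

# To allow 100 itself, pass high=101
df_inclusive = pd.DataFrame(
    np.random.randint(0, 101, size=(10, 4)), columns=list("ABCD")
)

print(df_exclusive.to_numpy().min(), df_exclusive.to_numpy().max())
print(df_inclusive.to_numpy().min(), df_inclusive.to_numpy().max())
```

Seeding is optional; without it, every run produces different values.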

2. Using numpy.random.choice()

The numpy.random.choice() function draws random elements from a list of values that you supply. By default it samples with replacement and gives every value the same probability. This is useful when you need random integers from a non-contiguous range or from a specific set of values.

Example:
import pandas as pd
import numpy as np

# Creating a DataFrame with random integers from a specified list
choices = [10, 20, 30, 40, 50]
df = pd.DataFrame(np.random.choice(choices, size=(10, 4)), columns=list('ABCD'))

print(df)

In this example, we specify a list of choices [10, 20, 30, 40, 50] and create a DataFrame with 10 rows and 4 columns filled with random integers from this list.

Output:
    A   B   C   D
0  50  30  20  40
1  30  50  40  50
2  20  40  10  30
3  10  20  50  10
4  50  20  40  50
5  20  10  30  20
6  30  10  30  40
7  10  50  30  20
8  20  30  20  40
9  40  10  50  50
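
numpy.random.choice() also accepts a p argument for weighted sampling and a replace argument for drawing without repetition. The sketch below assumes some illustrative weights (they must sum to 1) and an arbitrary seed; neither comes from the example above.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only to make the sketch repeatable

choices = [10, 20, 30, 40, 50]
# Hypothetical weights: 10 and 20 are drawn more often than the rest
weights = [0.4, 0.3, 0.1, 0.1, 0.1]

# Weighted sampling with replacement
df = pd.DataFrame(
    np.random.choice(choices, size=(10, 4), p=weights), columns=list("ABCD")
)

# Sampling without replacement: each value appears at most once,
# so size cannot exceed len(choices)
unique_draw = np.random.choice(choices, size=5, replace=False)

print(df)
print(unique_draw)
```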

3. Creating a Time-Based DataFrame with Random Data

Example:
import pandas as pd
import numpy as np

# Creating a time-based index
date_range = pd.date_range(start='2020-01-01', periods=10, freq='D')

# Creating a DataFrame with random floats
df = pd.DataFrame(np.random.randn(10, 4), index=date_range, columns=list('ABCD'))

# Scaling by 100 and truncating toward zero to get integers
# (DataFrame.applymap is deprecated as of pandas 2.1)
df = (df * 100).astype(int)

print(df)

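In this example, we create a daily date range to use as the index and fill the DataFrame with floats drawn from a standard normal distribution using numpy.random.randn(). The values are then converted to integers by multiplying by 100 and truncating to int. Unlike the uniform methods above, the resulting integers cluster around 0, can be negative, and are not confined to a fixed range.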

Output (values will vary from run to run):
             A   B   C   D
2020-01-01  53  38 -48  61
2020-01-02 -38  94  43  37
2020-01-03 -64 -70 -80 -45
2020-01-04 -65  31  21 -32
2020-01-05 -73 -58  30 -24
2020-01-06  92 -45 -27  41
2020-01-07  14 -11  91  27
2020-01-08  56  47 -97  87
2020-01-09 -12  60 -68  73
2020-01-10  66 -94  91  14
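
Casting a float to int drops the fractional part, which rounds toward zero rather than to the nearest integer. The following sketch contrasts truncation with rounding on a few hand-picked values; the input numbers and variable names are made up for illustration.

```python
import pandas as pd

# Hand-picked floats so the difference is easy to see
floats = pd.DataFrame({"A": [0.537, -0.389, 0.999, -0.999]})

# astype(int) truncates toward zero
truncated = (floats * 100).astype(int)

# round() first, then cast, to get the nearest integer instead
rounded = (floats * 100).round().astype(int)

print(truncated["A"].tolist())  # [53, -38, 99, -99]
print(rounded["A"].tolist())    # [54, -39, 100, -100]
```

Which behavior is preferable depends on the data being simulated; rounding avoids the slight bias toward zero that truncation introduces.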

The following examples show a few more ways to generate random integers in a Pandas DataFrame.

4. Using numpy.random.default_rng().integers()

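numpy.random.default_rng() returns a Generator object, the random number interface introduced in NumPy 1.17 and recommended over the legacy numpy.random functions for new code. Its integers() method draws from low (inclusive) to high (exclusive) by default, and accepts endpoint=True if high should be included.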

Example:
import pandas as pd
import numpy as np

# Creating a random number generator
rng = np.random.default_rng()

# Creating a DataFrame with random integers
df = pd.DataFrame(rng.integers(low=0, high=100, size=(10, 4)), columns=list('ABCD'))

print(df)

In this example, we create a DataFrame with 10 rows and 4 columns filled with random integers between 0 and 99 using NumPy’s new random number generator.

Output:
    A   B   C   D
0  27  73  38  11
1  88  14  65  44
2  56  92  34  72
3  95  13  76  49
4  77  22  19  33
5  90  57  62  88
6  40  68  21  87
7  11  80  15  74
8  25  42  53  66
9  19  60  18  51
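
A Generator can be seeded for reproducible results, and endpoint=True makes the upper bound inclusive. Here is a minimal sketch that simulates dice rolls from 1 to 6; the seed value and the variable names are arbitrary choices for this illustration.

```python
import numpy as np
import pandas as pd

# Two generators created with the same seed produce the same stream
rng = np.random.default_rng(seed=42)
rng_again = np.random.default_rng(seed=42)

# endpoint=True makes high inclusive, so values fall in 1..6
dice = pd.DataFrame(
    rng.integers(low=1, high=6, size=(10, 4), endpoint=True),
    columns=list("ABCD"),
)
dice_again = pd.DataFrame(
    rng_again.integers(low=1, high=6, size=(10, 4), endpoint=True),
    columns=list("ABCD"),
)

print(dice)
print(dice.equals(dice_again))
```

Passing the generator object around explicitly, instead of relying on a global seed, keeps the randomness in one part of a program from affecting another.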

5. Using numpy.random.randint() with pandas.Series

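You can also build the DataFrame column by column: generate each column with numpy.random.randint(), wrap it in a Pandas Series, and pass a dictionary of these Series to the DataFrame constructor. This makes it easy to give each column its own range.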

Example:
import pandas as pd
import numpy as np

# Creating a DataFrame with random integers using Series
data = {
    'A': pd.Series(np.random.randint(0, 100, size=10)),
    'B': pd.Series(np.random.randint(0, 100, size=10)),
    'C': pd.Series(np.random.randint(0, 100, size=10)),
    'D': pd.Series(np.random.randint(0, 100, size=10))
}

df = pd.DataFrame(data)

print(df)

In this example, we create a dictionary of Pandas Series, each filled with random integers, and then construct a DataFrame from this dictionary.

Output:
    A   B   C   D
0  18  83  92  19
1  70  55  11  32
2  37  78  45  21
3  49  34  54  14
4  26  63  70  29
5  95  40  88  65
6  80  19  53  46
7  33  22  66  91
8  57  77  38  27
9  61  95  29  84
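
The column-by-column approach is most useful when each column needs a different range. The sketch below uses hypothetical column names (age, score, quantity) and ranges that are not part of the example above.

```python
import numpy as np
import pandas as pd

np.random.seed(7)  # arbitrary seed, only to make the sketch repeatable
n_rows = 10

# Each column gets its own range; the upper bound is exclusive
data = {
    "age": pd.Series(np.random.randint(18, 66, size=n_rows)),      # 18..65
    "score": pd.Series(np.random.randint(0, 101, size=n_rows)),    # 0..100
    "quantity": pd.Series(np.random.randint(1, 11, size=n_rows)),  # 1..10
}

df = pd.DataFrame(data)

print(df)
```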

6. Using numpy.random.rand() and Scaling

Another approach is to use numpy.random.rand() to generate random floats between 0 and 1, and then scale them to the desired integer range.

Example:
import pandas as pd
import numpy as np

# Creating a DataFrame with scaled random integers
df = pd.DataFrame(np.random.rand(10, 4) * 100, columns=list('ABCD')).astype(int)

print(df)

In this example, numpy.random.rand() returns floats in the half-open interval [0, 1). Multiplying by 100 gives floats in [0, 100), and astype(int) truncates them to integers from 0 to 99.

Output:
    A   B   C   D
0  79  92  54  15
1  44  22  85  30
2  94  19  48  77
3  55  67  32  10
4  36  20  41  94
5  71  63  16  27
6  83  11  97  40
7  78  23  66  58
8  64  58  76  12
9  91  70  31  88
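
The same scaling idea extends to a range that does not start at 0: multiply by the width of the range, take the floor, and add the lower bound. The following is a rough sketch; the bounds 50 and 150 are arbitrary values chosen for illustration.

```python
import numpy as np
import pandas as pd

# Target range: low is inclusive, high is exclusive (50..149)
low, high = 50, 150

# rand() is in [0, 1), so rand() * (high - low) is in [0, high - low);
# floor gives whole numbers 0..(high - low - 1), and adding low shifts the range
scaled = np.floor(np.random.rand(10, 4) * (high - low)) + low

df = pd.DataFrame(scaled, columns=list("ABCD")).astype(int)

print(df)
```

For uniform integers, numpy.random.randint() does this in one call; scaling is mainly handy when the floats are already on hand.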

7. Using pandas.DataFrame.sample()

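pandas.DataFrame.sample() does not generate new numbers; it picks rows at random from an existing DataFrame of integers. By default, rows are drawn without replacement, so each row appears at most once, and the values within a row stay together.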

Example:
import pandas as pd
import numpy as np

# Creating a base DataFrame with a range of integers
base_df = pd.DataFrame({
    'A': range(0, 100),
    'B': range(100, 200),
    'C': range(200, 300),
    'D': range(300, 400)
})

# Randomly sampling rows from the base DataFrame
df = base_df.sample(n=10, random_state=1).reset_index(drop=True)

print(df)

In this example, we create a base DataFrame with a range of integers and then randomly sample 10 rows from it using sample(). The random_state parameter ensures reproducibility.

Output:
     A    B    C    D
0   37  137  237  337
1   12  112  212  312
2   72  172  272  372
3    9  109  209  309
4   75  175  275  375
5    5  105  205  305
6   36  136  236  336
7   79  179  279  379
8   68  168  268  368
9   96  196  296  396
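
To draw more rows than the base DataFrame contains, or to allow the same row to appear more than once, sample() takes replace=True. Here is a minimal sketch with a deliberately small base DataFrame; the column values and variable names are assumptions made for this illustration.

```python
import pandas as pd

# A small base DataFrame with only 5 rows
base_df = pd.DataFrame({"A": range(0, 5), "B": range(100, 105)})

# replace=True allows repeated rows, so n may exceed len(base_df)
df = base_df.sample(n=10, replace=True, random_state=1).reset_index(drop=True)

# The same random_state gives the same sample
same = base_df.sample(n=10, replace=True, random_state=1).reset_index(drop=True)

print(df)
print(df.equals(same))
```

Because whole rows are sampled, the relationship between columns is preserved: here B is always A plus 100.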

Conclusion

There are several ways to generate random integers in a Pandas DataFrame, and the right one depends on your needs: numpy.random.randint() or the newer Generator.integers() for uniform values in a range, numpy.random.choice() for a fixed set of values, scaled floats from numpy.random.rand() or numpy.random.randn() when you start from floating-point data, and DataFrame.sample() when you want to draw rows from existing data. With these techniques, you can build sample DataFrames for testing, analysis, and simulating real-world data.

Happy coding!
