How to generate random data with numpy - Python

The Python NumPy library has many sub-modules within it with extra functions and features to assist you. One of these many modules is the “random” module, which we can use to generate random data with Python NumPy.

You may have previously heard about other libraries for generating random data. The NumPy random module, however, comes with some extra functions that you won’t find elsewhere. We will be discussing some of these in today’s tutorial.

Basic Random Data Generation with NumPy

There are various functions for generating random data in NumPy, each with a slightly different twist. Let’s discuss them one by one. (All of these functions are found in the numpy.random module.)

Generating Random Integers

The first and probably the most commonly used one is randint(). There are several ways to use this function, depending on the number of parameters passed.

The simplest case involves passing a single parameter.

import numpy as np

print(np.random.randint(100))

This will print out a random integer from 0 to 99. The upper bound (100 here) is excluded from the range.

In the above example, 0 is assumed as the starting point for the range from which the random number is picked. By passing in two parameters, we can change the starting point.

This code will print out a random integer from 50 to 99. As before, the upper bound of 100 is excluded.

print(np.random.randint(50, 100))

By passing in a third parameter, we can have this function return more than one number in the form of an array. The third parameter defines how many values will be returned. If you leave it out, a single integer is returned rather than an array.

print(np.random.randint(0, 10, 3))

[2 3 9]

As you can see, the above code returned 3 values.

The third parameter is called size. By passing in a tuple containing two integers instead of a single integer, you can have the function return a 2D array of random numbers.

print(np.random.randint(0, 10, size = (3, 4)))

[[2 8 3 5]
 [6 2 8 9]
 [2 7 5 7]]
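The snippet below is a small sketch for checking the behavior of randint() described above. It draws a large number of values so that both ends of the range are very likely to appear, then prints the smallest and largest values found. The variable names samples and grid are only illustrative.

```python
import numpy as np

# Draw many integers so both ends of the range are very likely to show up.
samples = np.random.randint(50, 100, size=10_000)

print(samples.min())  # never below 50
print(samples.max())  # never above 99, because 100 is excluded

# A tuple passed as size gives an array with that shape.
grid = np.random.randint(0, 10, size=(3, 4))
print(grid.shape)  # (3, 4)
```

If you need the upper bound to be a possible result, one simple option is to pass a value that is one higher, for example randint(50, 101).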

Generating Random Floats

To generate random floats, we have a different function called rand(). This function can be called without passing any parameters, and will generate a random float from 0 up to, but not including, 1.

print(np.random.rand())

0.33109829661441437

By passing an integer n, you can have an array of size n returned with random float values.

print(np.random.rand(5))

[0.50132969 0.94523932 0.12451228 0.77109302 0.53581386]

You can also generate 2D random float arrays by passing in a second parameter. The first parameter is the number of rows and the second is the number of columns.

print(np.random.rand(5, 4))

[[0.60346791 0.40194638 0.20951285 0.33231324]
 [0.85746236 0.83145176 0.22490456 0.38462784]
 [0.61896829 0.45537222 0.47277243 0.54272509]
 [0.06944676 0.69805889 0.6685201  0.70558914]
 [0.50977764 0.72910723 0.42678105 0.98245785]]
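As a quick sketch, the code below checks the shape and value range of an array returned by rand(). It also shows one common way to stretch those values into a different range, since rand() itself takes no range parameters. The names low, high and scaled are illustrative and not part of NumPy.

```python
import numpy as np

floats = np.random.rand(5, 4)

print(floats.shape)  # (5, 4): 5 rows and 4 columns
print(floats.dtype)  # float64
print((floats >= 0).all() and (floats < 1).all())  # True

# Assumed example range: stretch the values so they fall between 5 and 10.
low, high = 5.0, 10.0
scaled = low + (high - low) * np.random.rand(5)
print(scaled)  # 5 floats spread between low and high
```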

Making Random Choices

We can use the choice() function to make random choices by feeding it a sequence of values. For example, if we give it a list of 5 values, it will randomly pick one or more values from that list.

print(np.random.choice(["a", "b", "c", "d", "e"]))

You can add a second parameter to choose how many times you want a random choice to be made.

print(np.random.choice(["a", "b", "c", "d", "e"], 5))

['c' 'e' 'b' 'd' 'e']

You can also use choice() to generate random integers by simply giving it a list of numbers, or even a single number as shown below. (With a single number, it acts like randint() with one parameter.)

print(np.random.choice(10))

You can also pass in a second parameter to return more values.
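Here is a minimal sketch of that last case. It passes a single number to choice() together with a second parameter, and then does the same with a list of numbers. The variable names picks and from_list are only illustrative.

```python
import numpy as np

# With a single integer, choice() picks from 0 up to that number minus one.
picks = np.random.choice(10, 3)
print(picks)  # an array of 3 integers, each from 0 to 9

# With a list of numbers, the picks come from the list itself.
from_list = np.random.choice([10, 20, 30, 40], 3)
print(from_list)  # an array of 3 values taken from the list
```

Note that the same value can be picked more than once, just like 'e' appears twice in the earlier output.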

Using a Seed to generate Random Numbers

NumPy generates “random” data by starting from what we call a “seed”. A “seed” is a base value that is used to initialize a random number generator. When you don’t set a seed yourself, NumPy picks one from an unpredictable source: randomness provided by the operating system where available, or the system clock otherwise. This means that each run of your program produces a different sequence of numbers.

In certain applications, however, you may wish to manually define a seed. One example of where this is useful is when multiple people are working on the same project. If they use a common seed, they will all get the same sequence of numbers, which makes debugging and collaboration easier.

You can observe the behavior of a seed here.

import numpy as np

np.random.seed(0)
print(np.random.rand(5))

np.random.seed(0)
print(np.random.rand(5))

[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548 ]
[0.5488135  0.71518937 0.60276338 0.54488318 0.4236548 ]

The two outputs are exactly the same, because we reset the seed to the same value after the first output. This shows that the numbers generated from a given seed follow a predictable pattern.
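The seed affects every function in numpy.random, not just rand(). The sketch below illustrates this with randint() and choice(). The helper function draw() and the seed value 42 are made up for this example.

```python
import numpy as np

def draw():
    # Hypothetical helper: make one randint() call and one choice() call.
    numbers = np.random.randint(0, 10, 3)
    letters = np.random.choice(["a", "b", "c"], 2)
    return numbers, letters

np.random.seed(42)
first = draw()

np.random.seed(42)  # reset to the same seed before drawing again
second = draw()

print(np.array_equal(first[0], second[0]))  # True
print(np.array_equal(first[1], second[1]))  # True
```

Both comparisons print True because the two calls to draw() start from the same seed and make the same calls in the same order.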

This marks the end of the “How to generate random data with NumPy” tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.