Bucket Sort Algorithm in Python | Sorting Algorithms

In this article, we will be discussing the Python Bucket Sort Algorithm in complete detail. We will start with it’s explanation, followed by a complete solution which is then explained by breaking it down into steps and explaining each of them separately.

At the end we have also included a small comparison of the Python Bucket Sort with others similar algorithms. As a bonus, we also have a video explanation of the Python Bucket Sort Algorithm, which is included at the very end of the article.

Methodology

Bucket Sort is a rather unique sorting algorithm as the actual Bucket sorting has no comparisons between the values in the array. And also because it is not a stand alone sorting algorithm, and requires another to actually completely sort a list. This is because Bucket Sort is meant only to speed up the sorting progress, not actually sort the whole array.

It uses the concept of Buckets, into which values are placed after some calculations. The supporting algorithm is then used on each one of these Buckets, which are singe 1D lists.

Bucket Sort has a sister Algorithm called Radix Sort that is a standalone sorting algorithm, similar in nature and very fast at sorting large number of integers.

Best and Worst Case: Bubble sort works best when the values are uniformly spread out over a range of values. The more balanced out all the Buckets are, the better. If we end up with a skewed distribution where most values are concentrated in one Bucket, the efficiency of Bucket Sorting will drop.

Bucket Sort Algorithm

The code itself is fairly well commented, so it shouldn’t be too hard to understand. There is only one function you have to make, which can be divided into 4 sub sections. First you create a 2D list of Buckets. The number of Buckets created are equal to the number of elements in the array.

Next up is the actual Bucket sorting logic that we’ll explain later. After that is the part where we individually sort each Bucket, which is a single list of values. Finally we “flatten” the 2D list of buckets into a single list. We then return the sorted list back to the place it was called.

import random

def bucketSort(array):
    largest = max(array)
    length = len(array)
    size = largest/length

    # Create Buckets
    buckets = [[] for i in range(length)]

    # Bucket Sorting   
    for i in range(length):
        index = int(array[i]/size)
        if index != length:
            buckets[index].append(array[i])
        else:
            buckets[length - 1].append(array[i])

    # Sorting Individual Buckets  
    for i in range(len(array)):
        buckets[i] = sorted(buckets[i])


    # Flattening the Array
    result = []
    for i in range(length):
        result = result + buckets[i]
            
    return result


arr = [5, 4, 2, 7, 8, 55, 2, 1, 4, 5, 1, 2]
output = bucketSort(arr)
print(output)

Creating the Buckets

    # Create Buckets
    buckets = [[] for i in range(length)]

Using List Comprehension, we were able to quickly generate x number of buckets, where x is the number of elements in the array. This is to ensure that we have enough buckets to hold each value in the array, if they all turn out to be unique. (unique enough to require a different value bucket)

You can use other methods to create this 2D list, but List comprehension is one of the most simplest and fastest.

Placing Values in the Bucket

This is where the main Bucket Sort logic resides. Using the length, largest and size values that we calculated earlier, we determine which index the next value should go into as we iterate over the array, value by value.

index = int(array[i]/size)

The above line of code is responsible for determining the index by dividing the currently selected value in the array with the size. This then returns the index value of the bucket into which the value needs to be placed.

The way this algorithm determines the bucket index seems a bit odd, but it’s actually based of a system of ratios using the largest value in the array and the number of elements. For example, if the largest value is 100, and there are 10 values in total, there will be 10 buckets. Bucket1 will store values from 0 – 9, Bucket2 will store values from 10 – 19 and so on.

Sorting Individual Buckets

The entire goal of the initial pass of the Bucket Sort Algorithm, where we place the values in the buckets, is to create a bunch of smaller lists that we can sort easily. Typically, the time taken to sort a list grows exponentially according to the size of the list. Hence, sorting is always going to be faster on a bunch of small lists, rather than a large list. (This can vary with some sorting algorithms as they perform better on larger lists)

    # Sorting Individual Buckets  
    for i in range(len(array)):
        buckets[i] = sorted(buckets[i])

By this point in our code, we have a X number of buckets or “lists”. We will now use an appropriate sorting algorithm on each one of these buckets. For this tutorial, I used the built-in sorted() function in Python which is pretty fast. Another very popular choice, which you see being used in both Python and other languages, is insertion sort. It performs well on small lists, hence why we use it.

Flattening the Buckets

    # Flattening the Array
    result = []
    for i in range(length):
        result = result + buckets[i]

By this point, we have a fully sorted 2-D list. But now we need to convert this into a single 1D list. This process is known as flattening. There are many different ways of accomplishing this; the technique we used in the above code is just one of them. You can learn more about the various flattening techniques here if you are interested.

Bucket Sort Explanation

One slight thing to keep in mind is that Python, (and other languages) will “truncate” the decimal part of the value when converting float values to int. For example, in the Bucket Sort code, int(array[i]/size) will return only int values. If it returns 4.1 for instance, the int() will convert it to 4.

Pictured below is the list of numbers that we will be for our Bucket Sort Demonstration.

Now let’s imagine we just ran the Bucket Sort function and are at the very beginning. We’ll first find the largest value which is 97 and store it in largest. Using len() we will find the length which is 10, and store it in length. Then we calculate size by dividing largest and length.

    for i in range(length):
        index = int(array[i]/size)
        if index != length:
            buckets[index].append(array[i])
        else:
            buckets[length - 1].append(array[i])

Now we’ll begin the actual bucket sorting. I’ve included the code in question above for reference.

Let’s take the first value 30, and divide it by the size, which is 9.7 (97 divided by 10). 30 / 9.7 returns 3.09 which then gets converted to 3. This means that the value 30 will be stored in Bucket Index 3, also known as Bucket 4.

The next value is 17. 17 / 9.7 returns 1.75, which then gets converted to an integer value of 1. This goes into Bucket number 2. Remember that indexing starts from 0, hence the Bucket at index 1, is actually Bucket 2.

Next is the value 97, which also happens to be the largest value in the array. 97 / 9.7 gives us 10, which causes a slight problem as Bucket 11 does not exist. For this we have an extra case setup, where if index == length we store it in the last bucket by giving it the index of length - 1.

In this manner we sort all the values into their appropriate Buckets. We’ve already discussed any possible special cases, so we’ll skip the rest of the values. We are now left with a 2D list, that has been visualized in the below image.

As you can see, it is mostly sorted except for the Buckets which have more than one value in them. This is why we have to run another sorting algorithm on each separate list in the 2D list.

There isn’t much point in discussing this any further as what happens in the next sorting step depends on the sorting algorithm used to sort the individual lists. The Insertion Sort Algorithm is a pretty popular choice that you can learn here on our site.

Video Tutorial

Here is a Video Version of the Python Bucket Sort Algorithm. It’s able to go a little more in depth and break down the steps a little better due to the Video format. So if you had trouble understanding the Python Bucket Sort Algorithm here, you might want to check this out.

This marks the end of the Python Bucket Sort Algorithm Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.