Statistical functions are used in many popular applications such as Microsoft Excel. Pandas are used to represent and manipulate data in the form of tables too, so learning how to use these functions is a must. Since Python Pandas does not have an explicit COUNTIF() function, we will explore the alternate ways by which we can achieve the same results.
We will be using “data.txt” as a sample text file throughout this tutorial. We have already covered how to read data from a text file in a previous article.
The following image shows the contents of data.txt:
Name Age Section
1 Ada 15 C
2 Sara 23 D
3 Dante 18 D
4 Oz 15 B
5 Oscar 23 A
6 Johan 19 C
7 Alice 15 A
8 Jack 25 B
9 Gilbert 15 B
10 Xai 20 A
How to use COUNT() in Python Pandas:
Before showing how to use COUNTIF() in Python Pandas, we will first cover how to unconditionally count records. This is similar to the COUNT() function in MS Excel.
The count()
method can be used to count the number of values in a column. By default, it returns a Pandas Series containing the number of non-NA values each column or row contains. NA values are also known as None, NaN, NaT. Another NA value is numpy.inf
which can be found in the numpy
library.
However, we usually don’t need separate counts of all the columns, it will be much simpler to calculate and display the count of only one column. We can use indexing and pass the name of the primary field/column so that only one counter is returned. We passed the primary field because that is the field with the least chance of containing NA values. For this tutorial, we will use the “Name” field as the primary field.
import pandas as pd
df = pd.read_csv("data.txt", sep=" ")
counter = df.count()["Name"]
print(counter)
Output:
10
count()
also has several optional parameters, one of them being axis
. Passing axis="columns"
will allow you to count the number of fields in the Pandas DataFrame. Similar to the previous example, this will return a Pandas Series, so we will use the index number to return the counter of a specific row. In the following code, we return the number of fields of the first row.
import pandas as pd
df = pd.read_csv("data.txt", sep=" ")
counter = df.count(axis="columns")[1]
print(counter)
Output:
3
How to use COUNTIF() in Python Pandas:
Now, we will demonstrate how to conditionally count fields in a Pandas DataFrame. Everything we learned from unconditional counting applies to conditional counting, with the only difference being how we use indexing to select certain records that meet a certain criteria.
count() method:
If we pass a condition into the square brackets in DataFrame[]
, it will return a DataFrame where the condition is true in all records. As an example, let’s pass df["Age"] == 15
and see the resultant output. All records in the original DataFrame will be scanned to see if the value of “Age” is 15, and if true, the record will be stored in a new DataFrame.
df = pd.read_csv("data.txt", sep=" ")
new_df = df[ df["Age"] == 15 ]
print(new_df)
Output:
Name Age Section
1 Ada 15 C
4 Oz 15 B
7 Alice 15 A
9 Gilbert 15 B
As shown, the result is a DataFrame where all records have a value of 15 in the “Age” field.
The only final change we need to make is to use the count()
method and view the primary field’s counter using indexing.
df = pd.read_csv("data.txt", sep=" ")
counter = df[ df["Age"] == 15 ].count()["Name"]
print(counter)
Output:
4
groupby() method:
This is used to group multiple chunks of data under the same column label. Passing the column label as a parameter and using the count()
method will return a DataFrame that counts the number of values in the other columns for each record. To simplify the output, we will pass the name of the primary field as an index so that only one Series is returned.
Finding the number of people in each section can be accomplished by grouping data under “Section”.
df = pd.read_csv("data.txt", sep=" ")
df_group = df.groupby("Section")["Name"].count()
print(df_group)
Output:
Section
A 3
B 3
C 2
D 2
Name: Name, dtype: int64
Finding the number of people of each age can be accomplished by grouping data under “Age”.
df = pd.read_csv("data.txt", sep=" ")
df_group = df.groupby("Age")["Name"].count()
print(df_group)
Output:
Age
15 4
18 1
19 1
20 1
23 2
25 1
Name: Name, dtype: int64
Multiple Conditions in COUNTIF()
To use multiple conditions in Pandas, you can simply add extra conditions and use Logic Operators (such as AND, OR) to define the relationship between conditions. Additionally, enclose each statement in brackets so that Python can differentiate between separate conditions.
Returning a DataFrame that contains records with “Age” equal to 15 AND “Section” equal to “B” can be achieved by the following code.
df = pd.read_csv("data.txt", sep=" ")
new_df = df[(df["Age"] == 15) & (df["Section"] == "B")]
print(new_df)
Output:
Name Age Section
4 Oz 15 B
9 Gilbert 15 B
Returning a DataFrame that contains records with “Age” greater than 20 OR “Section” equal to “C” can be achieved by the following code:
df = pd.read_csv("data.txt", sep=" ")
new_df = df[(df["Age"] > 20) | (df["Section"] == "C")]
print(new_df)
Output:
Name Age Section
1 Ada 15 C
2 Sara 23 D
5 Oscar 23 A
6 Johan 19 C
8 Jack 25 B
This marks the end of the “How to use COUNTIF() in Python Pandas” Tutorial. Any suggestions or contributions for CodersLegacy are more than welcome. Questions regarding the tutorial content can be asked in the comments section below.