AnalystPath

Top Percentile Return Rates by Region

PandasMediumMid level~10 min

Problem

The `returns` DataFrame holds the return rate of each product line within a sales region: `line_id`, `region`, and `return_rate` (a decimal between 0 and 1, higher is worse).

The quality team wants the worst offenders within each region. For every region, find the product lines whose return rate sits in the top 5 percentile of that region, where 'top 5 percentile' means a percent-rank of at most 0.05 when the lines are ranked from highest return rate to lowest within the region. Percent-rank here is the SQL `PERCENT_RANK` definition: (rank - 1) / (count - 1), with ties sharing the lowest rank, and a single-row region having percent-rank 0.

Return `line_id`, `region`, and `return_rate`, ordered by `region` ascending, then `return_rate` descending, then `line_id` ascending.

Input data

Example rows — the live problem includes the full dataset.

returns
line_idregionreturn_rate
1Andes0.92
2Andes0.68
3Andes0.17
4Baltic0.94
5Baltic0.81

Expected output

Your answer should return 4 rows with the columns line_id, region, return_rate.

Starter code (Pandas (Python))

import pandas as pd

def top_percentile_returns(returns) -> pd.DataFrame:
    # Your code here
    return returns

Solve this Pandas question free

Write Pandas (Python) and run it instantly in your browser — even on your phone. No signup needed to try.

Solution & explanation

Create a free account to unlock the optimal solution, a step-by-step explanation, and the hidden test cases that grade your answer.

Sign up free to unlock

Related Pandas questions