AnalystPath

Render Node Utilisation

PandasHardSenior level~10 min

Problem

A render farm logs every job in a DataFrame `render_jobs` with columns `job_id`, `node_id`, `begin_at`, and `finish_at` (the last two are datetime strings). Jobs on the same node can overlap in time.

For each node, report two figures:
- `busy_hours`: the total number of whole hours the node was busy with at least one job. Overlapping jobs are merged so shared time is counted only once, and the merged total is then floored to whole hours.
- `peak_parallel_jobs`: the largest number of jobs that ran simultaneously at any instant.

Return columns `node_id`, `busy_hours`, `peak_parallel_jobs`, ordered by `node_id`.

Input data

Example rows — the live problem includes the full dataset.

render_jobs
job_idnode_idbegin_atfinish_at
15012024-03-10 08:00:002024-03-10 09:00:00
25012024-03-10 08:30:002024-03-10 10:30:00
35012024-03-10 11:00:002024-03-10 12:00:00
75012024-03-10 13:00:002024-03-10 15:30:00
45022024-03-10 09:00:002024-03-10 10:00:00

Expected output

Your answer should return 3 rows with the columns node_id, busy_hours, peak_parallel_jobs.

Starter code (Pandas (Python))

import pandas as pd

def render_node_utilisation(render_jobs) -> pd.DataFrame:
    # Your code here
    return render_jobs

Solve this Pandas question free

Write Pandas (Python) and run it instantly in your browser — even on your phone. No signup needed to try.

Solution & explanation

Create a free account to unlock the optimal solution, a step-by-step explanation, and the hidden test cases that grade your answer.

Sign up free to unlock

Related Pandas questions