Inputs & Outputs

Before using the tool, it would be helpful to define all the input arguments available to the user to enrich their estimations and the final outputs (columns of the returned datasets).

Input Arguments

Argument	Description
`StartDate`	The first date of the range to process jobs for, in `YYYY-MM-DD`. Default: January 1st of the current year e.g. `2025-01-01`
`EndDate`	The final date of the range to process jobs for, in `YYYY-MM-DD`. Default: The current date.
`JobIDs`	Comma-separated list of all the HPC job IDs to filter on (e.g `"id1245,id6789"`) Default is “all_jobs”, which processes all the jobs ran in the specified date range.
`Region`	UK region of the HPC cluster you are using, needed for carbon intensity data. This is used to retrieve realtime carbon intensity data from the NESO Carbon Intensity API corresponding to job start times. Options: `'North Scotland'`, `'South Scotland'`, `'North West England'`, `'North East England'`, `'Yorkshire'`, `'North Wales'`, `'South Wales'`, `'West Midlands'`, `'East Midlands'`, `'East England'`, `'South West England'`, `'South England'`, `'London'`, `'South East England'`. Default: `'UK_average'` which was 124 gCO2e/kWh in 2024.
`Scope3`	Option to include scope 3 (embodied) emissions estimates as well as scope 2 in the output. This feature is only available to a few HPC systems which have undergone lifecycle assessments to obtain a per node-hour scope 3 emissions factor. Options: `Isambard3`, `IsambardAI`, and `Archer2` (see here). You may also specify a custom numeric value in gCO2e/node-hour for other HPC systems if these values are available (e.g. `51`). Default: `no_scope3` which means only scope 2 (operational) emissions will be calculated and included in the output.
`CSV`	Save the final datasets to CSV file for further analysis elsewhere. Options: `full`: Entire dataset (all jobs) with all columns (see below.) `full_summary`: entire dataset with summary columns only. `daily`: dataset aggregated by day with all columns. `daily_summary`: dataset aggregated by day with summary columns only. `total`: dataset aggregated over all total jobs with all columns. `total_summary` : dataset aggregated over all total jobs with summary columns only. `all`: all of the above datasets saved to CSV files.

Output Data

Below are the columns returned in the outputted datasets once the tool has been run. These are not stored if you are using the command-line interface, unless the user has specified the data to be saved to CSV. The following pages describe how to use the tool and store outputs for each user mode:

SLURM-Extracted Job Data columns

Column Name	Description
`Job_ID`	Unique identifier for your job ran on the HPC system.
`UserID`	Unique numerical ID of the job submitter.
`UserName`	Username of the person who submitted the job.
`PartitionName`	Name of the partition the job was submitted to.
`PartitionCategory`	Processor type of the partition (`CPU` or `GPU`). This is provided by the user in `hpc_config.yaml`.
`NameofJob`	Job name given by the user when submitted.
`SubmissionTime`	Date and time the job was submitted in `%Y-%m-%d %H:%M:%S` format (Datetime).
`StateCode`	Numeric code representing the job status (1 = completed/successful, 0 = failed).
`TotalNodes`	Total number of nodes allocated to the job(s).
`CPUsAllocated`	Total number of CPU cores allocated to the job(s).
`GPUsAllocated`	Total number of GPUs allocated to the job(s).
`ElapsedRuntime`	Total wallclock runtime of the job(s).
`CPUusagetime`	The actual CPU time consumed by the job(s), summed across all CPUs (measured time CPUs were actively processing). As a timedelta object `D days HH:MM:SS`.
`CPUwalltime`	Estimated CPU time (NCPUs * `ElapsedRuntime`). The max CPU time if all cores were 100% utilised. As a timedelta object `D days HH:MM:SS`.
`GPUusagetime`	Estimated GPU usage time (`GPUsAllocated` * `ElapsedRuntime`). Assumes 100% GPU utilisation due to a lack of measured GPU usage data available from SLURM. As a timedelta object `D days HH:MM:SS`.
`RequestedMemoryGB`	Amount of memory requested by the user at job submissions (in GB).
`UsedMemoryGB`	Amount of memory actually used by the job(s) (in GB).
`RequiredMemoryGB`	The estimated minimum amount of memory required for the job(s) to run (in GB).
`NodeHours`	Calculated total node-hour usage (in hours).
`WorkingDirectory`	File system path where the job was run from. e.g. `/lus/lfs1aip1/home/d5c/eayliffe.d5c/job`

Energy Data

Column Name	Description
`EnergyIPMI_kwh`	Total energy consumed (kWh) by the job(s) measured by hardware energy/power counters (e.g. IPMI or RAPL). This is only logged if energy counters are available on the HPC system
`energy_estimated_kwh`	Total energy consumed (kWh) by the job(s) including the datacenter overhead (PUE factor). This is estimated from usage data and TDP values supplied by the user in `hpc_config.yaml`
`energy_estimated_noPUE_kwh`	Total energy consumed by the job(s) without the datacenter overhead (PUE) applied (usage-based estimate) This is for valid comparison with energy counters.
`CPU_energy_estimated_kwh`	Energy consumed by CPUs (usage-based estimate).
`GPU_energy_estimated_kwh`	Energy consumed by GPUs (usage-based estimate).
`memory_energy_estimated_kwh`	Energy consumed by memory (usage-based estimate).
`required_memory_energy_estimated_kwh`	Energy consumed by memory if only the required memory was allocated (usage-based estimate).
`energy_requiredMem_estimated_kwh`	Total energy consumed (kWh) by the job(s) if only the required memory was allocated.
`failed_energy_kwh`	Energy consumed by failed jobs only.

Carbon Emissions Data

Column Name	Description
`CarbonIntensity_gCO2e_kwh`	Carbon Intensity at the time of job submission for the selected `Region` (in gCO2e/kWh). Retrieved from the carbon intensity API at the time of job submission. This is averaged over all jobs.
`Scope2Emissions_gCO2e`	Scope 2 (operational) emissions calculated using estimated energy (in gCO2e).
`Scope2Emissions_IPMI_gCO2e`	Scope 2 (operational) emissions calculated using measured energy (energy counters).
`Scope3Emissions_gCO2e`	Scope 3 (embodied) emissions estimated for the job(s). Only shows if the `Scope3` argument is set.
`Scope2Emissions_requiredMem_gCO2e`	Scope 2 emissions produced if only the required memory had been allocated.
`Scope2Emissions_failed_gCO2e`	Scope 2 emissions associated with the failed jobs only.
`TotalEmissions_gCO2e`	Total carbon emissions in gCO2e (scope 2 + scope 3). This includes counter-based scope 2 emissions if energy counters are available, and usage-based estimates if they aren’t.

Equivalents for User Interest

These data are provided as approximate values intended to help contextualise the impact of the user’s computational carbon footprint. See the Methodology for sources and assumptions for these calculations.

Column Name	Description
`Cost_GBP`	The approximate electricity cost (in British pounds) of running the job(s) based on the value supplied in `hpc_config.yaml`.
`driving_miles`	The equivalent number of miles driven by an average UK car (miles).
`tree_absorption_months`	The months for one tree to absorb the total amount of CO2e produced (months).
`uk_houses_daily_emissions`	Equivalent number of UK household’s daily emissions from electricity use.
`bris_paris_flights`	Equivalent number of flights from Bristol to Paris.