The Freja cluster is the replacement for Bi. This page outlines the key differences between Freja and Bi. It also documents some of the experiences from the pilot testing phase. If you have been using Bi before, the information here might help you in migrating your jobs to Freja.
Freja has 64 cores per compute node, four times as many as Bi. There are also only 78 compute nodes in Freja, compared to the 641 nodes Bi had during the latter half of its lifetime. If you have a working job configuration for Bi, you should take this into account by scheduling your smaller jobs on an appropriate number of cores instead of whole nodes.
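As a rough illustration of the arithmetic, the sketch below converts a Bi job that used whole nodes into a Freja core count and node count. The helper name bi_to_freja is hypothetical, and the figure of 16 cores per Bi node is only inferred from the "four times" statement above.

```shell
#!/bin/bash
# Cores per node: 64 on Freja (from the text above).
# 16 on Bi is an assumption derived from "four times as many".
FREJA_CORES_PER_NODE=64
BI_CORES_PER_NODE=16

# Hypothetical helper: given the number of whole Bi nodes a job used,
# print the equivalent core count and the Freja nodes needed (rounded up).
bi_to_freja() {
    local bi_nodes=$1
    local cores=$(( bi_nodes * BI_CORES_PER_NODE ))
    local nodes=$(( (cores + FREJA_CORES_PER_NODE - 1) / FREJA_CORES_PER_NODE ))
    echo "cores=${cores} freja_nodes=${nodes}"
}

bi_to_freja 1   # a single-node Bi job fills only a quarter of a Freja node
bi_to_freja 8   # eight Bi nodes fit on two Freja nodes
```

A job that previously asked for one whole Bi node would, under these assumptions, be better served by requesting 16 cores on Freja than a full 64-core node.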
Freja does not have hyper-threading, so there are no virtual cores. Bi did have hyper-threading, which was enabled with --ntasks-per-core=2. That option should NOT be used on Freja.
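When migrating a collection of old Bi job scripts, it may help to search them for this option before submitting anything. The following is a minimal sketch using grep; the function name find_ntasks_per_core is hypothetical, and the directory you pass to it is whatever location holds your own scripts.

```shell
#!/bin/bash
# Hypothetical helper: list every line under a directory that still sets
# --ntasks-per-core, printed as file:line:content.
# Prints nothing (and still succeeds) if no match is found.
find_ntasks_per_core() {
    local dir=${1:-.}
    grep -rn -e '--ntasks-per-core' "$dir" || true
}
```

Calling find_ntasks_per_core with the path to your job script directory prints each offending line so that the option can be removed by hand.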
Freja is using Rocky Linux 9 (equivalent of RHEL9), vs Bi with CentOS 7 (equivalent of RHEL7). Those were released eight years apart, and there are more changes than can easily be enumerated, but it should feel very similar on the surface.
Node sharing is available, so you can run more than one job on a node. Considering the number of cores available, you should do that more often than not. See Scheduling policy on Freja.
You cannot use a normal ssh NODENAME to log in to a node where you are running a job. Use jobsh -j JOBID NODENAME instead.
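To find the JOBID and NODENAME values, you can ask Slurm for your running jobs. The sketch below wraps a standard squeue call in a function; the name my_running_jobs is hypothetical, and the output depends on what you have running.

```shell
#!/bin/bash
# Hypothetical helper: print "JOBID NODELIST" for each of your running jobs.
# squeue options: -h drops the header, -u selects the user,
# -t RUNNING filters on job state, -o sets the output format
# (%i = job ID, %N = allocated nodes).
my_running_jobs() {
    squeue -h -u "$USER" -t RUNNING -o "%i %N"
}
```

If my_running_jobs printed a line such as "12345 n101" (illustrative values only), the corresponding login command would be jobsh -j 12345 n101.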
For you as a user, the compiler environment should be very similar to Bi. But Freja is running a much newer operating system, so the available software will differ somewhat and you should recompile your own software.
Freja uses the Slurm job scheduling system, like earlier clusters at NSC. Below, we present some examples of how to launch parallel jobs with different kinds of parallelization.
This is the simplest way of running. The job script below will launch the job on 4 compute nodes and you will get 64 MPI ranks per node (1 per core).
#!/bin/bash
#SBATCH -J jobname
#SBATCH -t HH:MM:SS
#SBATCH -N 4
...
mpprun binary.x
Use of the fat nodes counts towards fairshare usage at double the cost of normal nodes. Jobs not requesting fat nodes can be scheduled on fat nodes if no other nodes are available, but will then not be charged the extra cost. This is in contrast to Bi, where requesting a fat node had no extra cost.
The “high” and “risk” QOS classes no longer exist. Users of “risk” should use “low”, which now has a 4-hour time limit. Users of “high” are encouraged to test the boost-tools described below.
There is a new tool available to all users to change the priority of jobs themselves:
buildenv-intel/2023a-eb.
If you rely on the /esgf filesystem, you will have to keep using it from Bi for a little while yet.
It is not yet possible to use Publisher from Freja. You can do the work on Freja and then publish the result from Bi for now.