Using AlphaFold 3 on Berzelius

Introduction

AlphaFold is a deep-learning-based protein structure prediction program developed by Google DeepMind. The software uses a neural network to predict the 3D structure of a protein from its amino acid sequence. Building on the success of AlphaFold 2, which revolutionized the field by predicting protein structures with near-experimental accuracy, AlphaFold 3 introduces several new capabilities and enhancements aimed at expanding its applicability to complex biological problems.

Preparations

Setting the Paths

A copy of the AlphaFold 3 database is available on Berzelius at /proj/common-datasets for public use.

We specify the paths for the AlphaFold database, the AlphaFold model parameters, and the results. Due to Terms of Use limitations, you will need to obtain the model parameters yourself.

export ALPHAFOLD_DB=/proj/common-datasets/AlphaFold3
export ALPHAFOLD_MODEL=${ALPHAFOLD_DB}/model_parameters
export ALPHAFOLD_RESULTS=/proj/nsc_testing/xuan/alphafold_results_3.0.0
mkdir -p ${ALPHAFOLD_DB} ${ALPHAFOLD_MODEL} ${ALPHAFOLD_RESULTS}
mkdir -p ${ALPHAFOLD_RESULTS}/output ${ALPHAFOLD_RESULTS}/input

Downloading Test Data

The test input alphafold_input.json can be found on this page. Download and save it to ${ALPHAFOLD_RESULTS}/input.
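
If you prefer to write an input file by hand, the snippet below sketches a minimal AlphaFold 3 input JSON. The name, chain id and sequence are placeholders, not the contents of the test input from this page; the field layout (name, modelSeeds, sequences, dialect, version) follows the AlphaFold 3 input format.

```shell
# ${ALPHAFOLD_RESULTS} may not be set yet in a fresh shell; default to a
# scratch location for this sketch.
export ALPHAFOLD_RESULTS=${ALPHAFOLD_RESULTS:-/tmp/alphafold_results}
mkdir -p ${ALPHAFOLD_RESULTS}/input

# Hypothetical minimal input: "test_protein", chain "A" and the short
# sequence are placeholders only.
cat > ${ALPHAFOLD_RESULTS}/input/alphafold_input.json << 'EOF'
{
  "name": "test_protein",
  "modelSeeds": [1],
  "sequences": [
    { "protein": { "id": "A", "sequence": "MVKVGVNG" } }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF

# Sanity-check that the file is valid JSON before submitting a job.
python3 -m json.tool ${ALPHAFOLD_RESULTS}/input/alphafold_input.json > /dev/null \
  && echo "input JSON OK"
```

Validating the JSON up front avoids burning queue time on a job that fails immediately at parse.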

Running AlphaFold Using the Module

On Berzelius, AlphaFold 3 is available as a module.

Loading the Module

On a compute node we load the AlphaFold module, which sets ${ALPHAFOLD_PREFIX} to the installation directory:

module load AlphaFold/3.0.0-hpc1

Running an Example

We run an example:

python ${ALPHAFOLD_PREFIX}/run_alphafold.py \
    --db_dir=${ALPHAFOLD_DB} \
    --json_path=${ALPHAFOLD_RESULTS}/input/alphafold_input.json \
    --model_dir=${ALPHAFOLD_MODEL} \
    --output_dir=${ALPHAFOLD_RESULTS}/output \
    --run_inference=True

Please run python ${ALPHAFOLD_PREFIX}/run_alphafold.py --help to check the usage.

Best Practice for Running AlphaFold on Tetralith

On Tetralith, the AlphaFold database can be found at /proj/common_datasets/AlphaFold3. The GPU node's local disk at /scratch/local provides 2 TB of NVMe SSD storage; copying the database (0.6 TB) to /scratch/local at the beginning of a job improves I/O performance.

export ALPHAFOLD_DB=/proj/common_datasets/AlphaFold3
export ALPHAFOLD_DB_LOCAL=/scratch/local
cp -a ${ALPHAFOLD_DB}/* ${ALPHAFOLD_DB_LOCAL}
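
The stage-to-local-disk pattern can be tried in isolation. The sketch below uses throwaway directories standing in for the real database and /scratch/local (the file name pdb_seqres.txt is just an illustrative stand-in); only the copy-then-point pattern itself is demonstrated.

```shell
# Throwaway directories stand in for ${ALPHAFOLD_DB} and /scratch/local.
SRC=$(mktemp -d)
DST=/tmp/af3_copy_demo
mkdir -p ${DST}
echo "dummy database file" > ${SRC}/pdb_seqres.txt

# cp -a preserves permissions and timestamps of the database files.
cp -a ${SRC}/* ${DST}

# The job then points --db_dir at the local copy instead of the
# project directory.
diff ${SRC}/pdb_seqres.txt ${DST}/pdb_seqres.txt && echo "copy verified"
```

Note that /scratch/local is wiped when the job ends, so the copy must be repeated in every job script.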

We run an example:

export ALPHAFOLD_MODEL=${ALPHAFOLD_DB}/model_parameters
export ALPHAFOLD_RESULTS=/proj/nsc/users/xuan/alphafold_results_3.0.0
module load AlphaFold/3.0.0-hpc1

python ${ALPHAFOLD_PREFIX}/run_alphafold.py \
    --db_dir=${ALPHAFOLD_DB_LOCAL} \
    --json_path=${ALPHAFOLD_RESULTS}/input/alphafold_input.json \
    --model_dir=${ALPHAFOLD_MODEL} \
    --output_dir=${ALPHAFOLD_RESULTS}/output \
    --flash_attention_implementation=xla \
    --run_inference=True

Best Practice for Running AlphaFold on Berzelius

To make the best use of the GPU resources on Berzelius, we strongly suggest separating the CPU and GPU parts when running AlphaFold jobs. You should run the CPU part on Tetralith or your local computer, and then run the GPU part on Berzelius.

  1. Run the CPU part of the job on Tetralith.

You need to set --norun_inference in the command to run MSA and template searches only.

On Tetralith, the AlphaFold database can be found at /proj/common_datasets/AlphaFold3. The GPU node's local disk at /scratch/local provides 2 TB of NVMe SSD storage; copying the database (0.6 TB) to /scratch/local at the beginning of a job improves I/O performance.

export ALPHAFOLD_DB=/proj/common_datasets/AlphaFold3
export ALPHAFOLD_DB_LOCAL=/scratch/local
cp -a ${ALPHAFOLD_DB}/* ${ALPHAFOLD_DB_LOCAL} 

We run an example:

export ALPHAFOLD_MODEL=${ALPHAFOLD_DB}/model_parameters
export ALPHAFOLD_RESULTS=/proj/nsc/users/xuan/alphafold_results_3.0.0
module load AlphaFold/3.0.0-hpc1

python ${ALPHAFOLD_PREFIX}/run_alphafold.py \
    --db_dir=${ALPHAFOLD_DB_LOCAL} \
    --json_path=${ALPHAFOLD_RESULTS}/input/alphafold_input.json \
    --model_dir=${ALPHAFOLD_MODEL} \
    --output_dir=${ALPHAFOLD_RESULTS}/output \
    --flash_attention_implementation=xla \
    --norun_inference

  2. Transfer the CPU part results from Tetralith to Berzelius via your local computer.

  3. Run the GPU part of the job on Berzelius.

You need to set --norun_data_pipeline in the command. This will skip the MSA and template searches and proceed directly to the predictions. This stage requires the input JSON file to contain pre-computed MSAs and templates.

export ALPHAFOLD_DB=/proj/common-datasets/AlphaFold3
export ALPHAFOLD_MODEL=${ALPHAFOLD_DB}/model_parameters
export ALPHAFOLD_RESULTS=/proj/nsc_testing/xuan/alphafold_results_3.0.0
module load AlphaFold/3.0.0-hpc1

python ${ALPHAFOLD_PREFIX}/run_alphafold.py \
    --db_dir=${ALPHAFOLD_DB} \
    --json_path=${ALPHAFOLD_RESULTS}/output/2pv7/2pv7_data.json \
    --model_dir=${ALPHAFOLD_MODEL} \
    --output_dir=${ALPHAFOLD_RESULTS}/output \
    --norun_data_pipeline
  4. To achieve better GPU utilization, you can run several AlphaFold GPU part jobs concurrently; see the example sbatch script, which executes 5 such jobs at once.
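
The example sbatch script is not reproduced here, but its core pattern is to launch each per-input command in the background and then wait for all of them. The sketch below shows that pattern with hypothetical input names and a placeholder command standing in for the real run_alphafold.py invocation.

```shell
#!/bin/bash
# Hypothetical input names; in a real job each would have its own
# *_data.json produced by the CPU stage.
INPUTS="2pv7 1abc 2def 3ghi 4jkl"

for NAME in ${INPUTS}; do
    # Placeholder for the real command, e.g.:
    # python ${ALPHAFOLD_PREFIX}/run_alphafold.py \
    #     --json_path=${ALPHAFOLD_RESULTS}/output/${NAME}/${NAME}_data.json \
    #     --model_dir=${ALPHAFOLD_MODEL} \
    #     --output_dir=${ALPHAFOLD_RESULTS}/output \
    #     --norun_data_pipeline &
    ( sleep 1; touch /tmp/af3_demo_${NAME}.done ) &
done

wait    # block until every background job has finished
echo "all jobs finished"
```

Because the inference stages of several inputs rarely saturate one GPU on their own, overlapping them this way raises overall utilization; `wait` keeps the allocation alive until the slowest job completes.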

AlphaFold 3 Alternatives

HelixFold3

HelixFold3 was developed to replicate the advanced capabilities of AlphaFold3. Its accuracy in predicting the structures of small-molecule ligands, nucleic acids (including DNA and RNA), and proteins is comparable to that of AlphaFold3.

Loading the Module

On a compute node we load the HelixFold3 module.

module load HelixFold3/73cd80b-hpc1

Running an Example

You can use the flag --run_feature_only to separate the CPU and GPU parts of the job.

export INPUT_JSON_PATH=/proj/nsc_testing/xuan/helixfold3_results/input/demo_protein_ligand.json
export OUTPUT_DIR=/proj/nsc_testing/xuan/helixfold3_results/output

run_infer.sh --input_json ${INPUT_JSON_PATH} \
--output_dir ${OUTPUT_DIR} \
--run_feature_only False \
--infer_times 5 \
--diff_batch_size 1 \
--precision "fp32"
