AlphaFold is a deep learning-based protein structure prediction program developed by DeepMind. The software uses a neural network to predict the 3D structure of a protein from its amino acid sequence. Building on the successes of AlphaFold 2, which revolutionized the field by predicting protein structures with near-experimental accuracy, AlphaFold 3 introduces several new capabilities and enhancements aimed at expanding its applicability to complex biological problems.
A copy of the AlphaFold 3 database is available on Berzelius at /proj/common-datasets for public use.
We specify the paths for the AlphaFold database, the AlphaFold model parameters, and the results. Due to Terms of Use limitations, you will need to obtain the model parameters yourself.
export ALPHAFOLD_DB=/proj/common-datasets/AlphaFold3
export ALPHAFOLD_MODEL=${ALPHAFOLD_DB}/model_parameters
export ALPHAFOLD_RESULTS=/proj/nsc_testing/xuan/alphafold_results_3.0.0
mkdir -p ${ALPHAFOLD_DB} ${ALPHAFOLD_MODEL} ${ALPHAFOLD_RESULTS}
mkdir -p ${ALPHAFOLD_RESULTS}/output ${ALPHAFOLD_RESULTS}/input
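Before launching a job, it can help to confirm that the input and output directories exist and are writable. The following is a minimal sketch; check_dirs is a hypothetical helper name, not something provided by the AlphaFold module, and the demonstration runs on a throwaway directory rather than the real project paths.

```shell
#!/usr/bin/env bash
# check_dirs is a hypothetical helper (not part of the AlphaFold module).
# It reports every argument that is not an existing, writable directory
# and returns non-zero if at least one such path was found.
check_dirs() {
    local status=0 dir
    for dir in "$@"; do
        if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            echo "missing or not writable: $dir" >&2
            status=1
        fi
    done
    return "$status"
}

# Demonstration on a temporary directory tree.
demo_root=$(mktemp -d)
mkdir -p "${demo_root}/input" "${demo_root}/output"
check_dirs "${demo_root}/input" "${demo_root}/output" && echo "all directories ready"
```

On the cluster you would pass "${ALPHAFOLD_RESULTS}/input" and "${ALPHAFOLD_RESULTS}/output" instead of the temporary paths.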
The test input alphafold_input.json can be found on this page. Download and save it to ${ALPHAFOLD_RESULTS}/input.
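For orientation, an AlphaFold 3 input file is a JSON document containing a job name, a list of model seeds, and a list of sequences. The sketch below writes a minimal single-chain example. The job name, chain ID, file name, and short sequence are hypothetical placeholders and are not the contents of the alphafold_input.json test file; the field names follow the upstream AlphaFold 3 input format, so check them against the version you run.

```shell
#!/usr/bin/env bash
# Write a minimal AlphaFold 3 style input file.
# ASSUMPTIONS: job name, chain id, file name and sequence are made-up
# placeholders; this is not the alphafold_input.json test file.
# Falls back to a temporary directory when ALPHAFOLD_RESULTS is unset.
input_dir=${ALPHAFOLD_RESULTS:-$(mktemp -d)}/input
mkdir -p "${input_dir}"

cat > "${input_dir}/example_input.json" <<'EOF'
{
  "name": "example_job",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MKTAYIAKQRQISFVKSHFSRQ"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF

echo "wrote ${input_dir}/example_input.json"
```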
On Berzelius, we have AlphaFold 3 as a module.
On a compute node we load the AlphaFold module:
module load AlphaFold/3.0.0-hpc1
We run an example:
python ${ALPHAFOLD_PREFIX}/run_alphafold.py \
--db_dir=${ALPHAFOLD_DB} \
--json_path=${ALPHAFOLD_RESULTS}/input/alphafold_input.json \
--model_dir=${ALPHAFOLD_MODEL} \
--output_dir=${ALPHAFOLD_RESULTS}/output \
--run_inference=True
Run python ${ALPHAFOLD_PREFIX}/run_alphafold.py --help to see the available options.
On Tetralith, each GPU node has 2 TB of local NVMe SSD storage at /scratch/local. You can copy the database (0.6 TB) to /scratch/local at the beginning of a job to improve I/O performance. The AlphaFold database on Tetralith can be found at /proj/common_datasets/AlphaFold3.
export ALPHAFOLD_DB=/proj/common_datasets/AlphaFold3
export ALPHAFOLD_DB_LOCAL=/scratch/local
cp -a ${ALPHAFOLD_DB}/* ${ALPHAFOLD_DB_LOCAL}
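Because the copy is large relative to the local disk, you may want to check the free space first. The following is a minimal sketch; has_room is a hypothetical helper built on the standard du and df tools, and the demonstration uses small temporary directories rather than the real database.

```shell
#!/usr/bin/env bash
# has_room SRC DST: succeed if the filesystem holding DST has at least as
# much free space (in KiB) as SRC currently occupies.
# has_room is a hypothetical helper, not part of any NSC tooling.
has_room() {
    local src=$1 dst=$2 need_kb free_kb
    need_kb=$(du -sk "$src" | awk '{print $1}')
    # -P forces one line per filesystem; column 4 is the available space.
    free_kb=$(df -Pk "$dst" | awk 'NR==2 {print $4}')
    [ "$free_kb" -ge "$need_kb" ]
}

# Demonstration with throwaway directories.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "sample data" > "${src}/sample.txt"

if has_room "$src" "$dst"; then
    cp -a "${src}/." "$dst"
    echo "copied to ${dst}"
else
    echo "not enough free space in ${dst}" >&2
fi
```

In a job you would call has_room "${ALPHAFOLD_DB}" "${ALPHAFOLD_DB_LOCAL}" and fall back to reading the database from /proj if the check fails.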
We run an example:
export ALPHAFOLD_MODEL=${ALPHAFOLD_DB}/model_parameters
export ALPHAFOLD_RESULTS=/proj/nsc/users/xuan/alphafold_results_3.0.0
module load AlphaFold/3.0.0-hpc1
python ${ALPHAFOLD_PREFIX}/run_alphafold.py \
--db_dir=${ALPHAFOLD_DB_LOCAL} \
--json_path=${ALPHAFOLD_RESULTS}/input/alphafold_input.json \
--model_dir=${ALPHAFOLD_MODEL} \
--output_dir=${ALPHAFOLD_RESULTS}/output \
--flash_attention_implementation=xla \
--run_inference=True
To make the best use of the GPU resources on Berzelius, we strongly suggest separating the CPU and GPU parts when running AlphaFold jobs. You should run the CPU part on Tetralith or your local computer, and then run the GPU part on Berzelius.
You need to set --norun_inference in the command to run the MSA and template searches only.
On Tetralith, each GPU node has 2 TB of local NVMe SSD storage at /scratch/local. You can copy the database (0.6 TB) to /scratch/local at the beginning of a job to improve I/O performance. The AlphaFold database on Tetralith can be found at /proj/common_datasets/AlphaFold3.
export ALPHAFOLD_DB=/proj/common_datasets/AlphaFold3
export ALPHAFOLD_DB_LOCAL=/scratch/local
cp -a ${ALPHAFOLD_DB}/* ${ALPHAFOLD_DB_LOCAL}
We run an example:
export ALPHAFOLD_MODEL=${ALPHAFOLD_DB}/model_parameters
export ALPHAFOLD_RESULTS=/proj/nsc/users/xuan/alphafold_results_3.0.0
module load AlphaFold/3.0.0-hpc1
python ${ALPHAFOLD_PREFIX}/run_alphafold.py \
--db_dir=${ALPHAFOLD_DB_LOCAL} \
--json_path=${ALPHAFOLD_RESULTS}/input/alphafold_input.json \
--model_dir=${ALPHAFOLD_MODEL} \
--output_dir=${ALPHAFOLD_RESULTS}/output \
--flash_attention_implementation=xla \
--norun_inference
Transfer the CPU part results from Tetralith to Berzelius via your local computer.
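The two-hop transfer can be sketched with rsync as follows. The script only prints the commands so you can review them before running them on your local computer. The user name, host names, and local staging directory are assumptions and placeholders; replace them with your own account and the login nodes you normally use. The source and destination paths reuse the example result directories from this page.

```shell
#!/usr/bin/env bash
# Print (not run) the two rsync commands for a transfer via a local computer.
# ASSUMPTIONS: user name, host names and staging directory are placeholders.
NSC_USER=${NSC_USER:-x_abcde}
TETRALITH_HOST=${TETRALITH_HOST:-tetralith.nsc.liu.se}
BERZELIUS_HOST=${BERZELIUS_HOST:-berzelius.nsc.liu.se}
STAGING_DIR=${STAGING_DIR:-./alphafold_cpu_results}
SRC_DIR=/proj/nsc/users/xuan/alphafold_results_3.0.0/output/2pv7
DST_DIR=/proj/nsc_testing/xuan/alphafold_results_3.0.0/output

print_transfer_cmds() {
    # Hop 1: Tetralith -> local computer (creates ${STAGING_DIR}/2pv7).
    echo "rsync -av ${NSC_USER}@${TETRALITH_HOST}:${SRC_DIR} ${STAGING_DIR}/"
    # Hop 2: local computer -> Berzelius (creates ${DST_DIR}/2pv7).
    echo "rsync -av ${STAGING_DIR}/2pv7 ${NSC_USER}@${BERZELIUS_HOST}:${DST_DIR}/"
}

print_transfer_cmds
```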
Run the GPU part of the job on Berzelius.
You need to set --norun_data_pipeline in the command. This skips the MSA and template searches and proceeds directly to the predictions. This stage requires the input JSON file to contain pre-computed MSAs and templates.
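A quick way to catch a wrong input file before using GPU time is to check that the JSON mentions the keys the data pipeline adds for protein chains. This is a rough sketch under stated assumptions: has_precomputed_features is a hypothetical helper, the key names unpairedMsa, pairedMsa, and templates follow the upstream AlphaFold 3 input format, and the check is a plain text search rather than real JSON validation. The demonstration uses a tiny mock file.

```shell
#!/usr/bin/env bash
# has_precomputed_features FILE: succeed if FILE mentions all three keys.
# ASSUMPTIONS: hypothetical helper; key names taken from the upstream
# AlphaFold 3 input format; text search only, no JSON parsing.
has_precomputed_features() {
    local file=$1 key
    for key in unpairedMsa pairedMsa templates; do
        grep -q "\"${key}\"" "$file" || {
            echo "missing key: ${key}" >&2
            return 1
        }
    done
}

# Demonstration on a small mock file (not real pipeline output).
demo_json=$(mktemp)
cat > "$demo_json" <<'EOF'
{"name": "demo", "sequences": [{"protein": {"id": "A", "sequence": "MKT",
  "unpairedMsa": ">query\nMKT\n", "pairedMsa": "", "templates": []}}],
 "modelSeeds": [1], "dialect": "alphafold3", "version": 1}
EOF
has_precomputed_features "$demo_json" && echo "features present"
```

On Berzelius you would point the helper at the data JSON produced by the CPU stage, for example ${ALPHAFOLD_RESULTS}/output/2pv7/2pv7_data.json.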
export ALPHAFOLD_DB=/proj/common-datasets/AlphaFold3
export ALPHAFOLD_MODEL=${ALPHAFOLD_DB}/model_parameters
export ALPHAFOLD_RESULTS=/proj/nsc_testing/xuan/alphafold_results_3.0.0
module load AlphaFold/3.0.0-hpc1
python ${ALPHAFOLD_PREFIX}/run_alphafold.py \
--db_dir=${ALPHAFOLD_DB} \
--json_path=${ALPHAFOLD_RESULTS}/output/2pv7/2pv7_data.json \
--model_dir=${ALPHAFOLD_MODEL} \
--output_dir=${ALPHAFOLD_RESULTS}/output \
--norun_data_pipeline
HelixFold3 is developed to replicate the advanced capabilities of AlphaFold3. HelixFold3’s accuracy in predicting the structures of small molecule ligands, nucleic acids (including DNA and RNA), and proteins is comparable to that of AlphaFold3.
On a compute node we load the HelixFold3 module.
module load HelixFold3/73cd80b-hpc1
You can use the flag --run_feature_only to separate the CPU and GPU parts of the job.
export INPUT_JSON_PATH=/proj/nsc_testing/xuan/helixfold3_results/input/demo_protein_ligand.json
export OUTPUT_DIR=/proj/nsc_testing/xuan/helixfold3_results/output
run_infer.sh --input_json ${INPUT_JSON_PATH} \
--output_dir ${OUTPUT_DIR} \
--run_feature_only False \
--infer_times 5 \
--diff_batch_size 1 \
--precision "fp32"