# Tutorials
This tutorial walks through ALmoMD using CuI as an example.
Running ALmoMD with DFT is too demanding for a tutorial, because it requires a large number of DFT supercell calculations. This tutorial therefore uses a pretrained MLIP model as the ground truth in place of DFT. To do that, you need a REFER directory instead of DFT_INPUTS.
REFER: A directory containing the pretrained, deployed MLIP model (deployed-model_0_0.pth).
Accordingly, modify your ALmoMD inputs as shown below.
output_format : nequip # Use the pretrained NequIP model as the ground truth
E_gs : -0.142613646819514 # Corresponding new reference potential energy (energy of geometry.in.supercell)
1) Split the aiMD trajectories into training and testing data with the command almomd utils split (number of testing data) (E_gs). In this tutorial, we use 100 testing data points.
almomd utils split 100 -0.142613646819514
This creates two files, trajectory_train.son and trajectory_test.son, and a directory MODEL containing data_test.npz.
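To illustrate what a held-out split does, here is a minimal Python sketch. It is not ALmoMD's implementation: the helper name `split_indices`, the uniformly random selection, the fixed seed, and the trajectory length of 1000 frames are all assumptions made for illustration, and `almomd utils split` may choose its test snapshots differently.

```python
import random


def split_indices(n_frames, n_test, seed=0):
    """Return (train, test) index lists. Hypothetical helper, not ALmoMD code."""
    if not 0 <= n_test <= n_frames:
        raise ValueError("n_test must be between 0 and n_frames")
    rng = random.Random(seed)
    # Assumption: test frames are drawn uniformly at random without replacement.
    test = sorted(rng.sample(range(n_frames), n_test))
    test_set = set(test)
    train = [i for i in range(n_frames) if i not in test_set]
    return train, test


# Assumed trajectory length of 1000 frames; 100 test frames as in this tutorial.
train, test = split_indices(n_frames=1000, n_test=100)
print(len(train), len(test))  # prints: 900 100
```

The point of the sketch is only that the test frames are removed from the training pool, so the test error reported later is measured on data the models never saw.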
2) Create the training inputs for the initial MLIP models.
almomd init
This creates the directory 300K-0bar_0 inside MODEL.
cd MODEL/300K-0bar_0
You will find 3 training data files (data-train_*.npz), 6 NequIP inputs (input_*_*.yaml), and the corresponding job scripts (job-nequip-gpu_*_*.slurm). This is because input.in specifies 3 subsamplings and 2 random initializations, giving a total of 6 (= 3 × 2) different MLIP models.
3) Submit your job scripts to train MLIP models.
sbatch job-nequip-gpu_0.slurm; sbatch job-nequip-gpu_1.slurm; sbatch job-nequip-gpu_2.slurm; sbatch job-nequip-gpu_3.slurm; sbatch job-nequip-gpu_4.slurm; sbatch job-nequip-gpu_5.slurm
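Typing one sbatch per script gets tedious as the number of jobs grows. The following Python sketch collects every script matching a glob pattern and submits them in numeric order. The helper names `natural_key` and `submit_all` are hypothetical and not part of ALmoMD; the `job_command` parameter is just a stand-in for your scheduler's submit command. By default the function only returns the commands (`dry_run=True`) so you can inspect them first.

```python
import glob
import re
import subprocess


def natural_key(name):
    """Sort key that orders embedded numbers numerically (job_10 after job_9)."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def submit_all(pattern, job_command="sbatch", dry_run=True):
    """Build one submit command per matching script; run them unless dry_run."""
    scripts = sorted(glob.glob(pattern), key=natural_key)
    commands = [[job_command, script] for script in scripts]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failed submission.
            subprocess.run(cmd, check=True)
    return commands


# Dry run: print the commands that would be executed in the current directory.
for cmd in submit_all("job-nequip-gpu_*.slurm"):
    print(" ".join(cmd))
```

Calling `submit_all("job-nequip-gpu_*.slurm", dry_run=False)` would actually submit the jobs. The same helper works for other job scripts by changing the pattern.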
4) When training is done, you will get the deployed MLIP models (deployed-model_*_*.pth).
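Before moving on, it can help to confirm that every expected model file is actually present. The sketch below is a hypothetical check, not an ALmoMD command. It assumes the file names follow the deployed-model_0_0.pth spelling used in REFER, and that the first index runs over the 3 subsamplings and the second over the 2 initializations; that index order is a guess.

```python
from pathlib import Path


def missing_models(directory, n_subsampling=3, n_initialization=2):
    """List expected deployed-model files that are absent from directory."""
    # Assumption: first index = subsampling, second index = initialization.
    expected = [
        f"deployed-model_{i}_{j}.pth"
        for i in range(n_subsampling)
        for j in range(n_initialization)
    ]
    return [name for name in expected if not (Path(directory) / name).is_file()]


# An empty list means all 6 models were found in the current directory.
print(missing_models("."))
```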
The iterative active-learning loop in ALmoMD consists of three major steps: MLIP exploration, DFT calculation, and MLIP training.
Once you have MLIP models, ALmoMD explores the configurational space via MLIP-MD. To start this, submit job-cont.slurm.
sbatch job-cont.slurm
This generates many files and directories. The most important ones for users are almomd.out, result.txt, and UNCERT/uncertainty-300K-0bar_*.txt.
1) almomd.out: Shows the overall progress of ALmoMD.
2) result.txt: Contains the testing results and the MLIP uncertainty at each active learning step.
3) UNCERT/uncertainty-300K-0bar_*.txt: Records the results of the MLIP-MD steps and shows which MD snapshots were sampled.
Once all data have been sampled, ALmoMD creates a directory CALC/300K-0bar_*, where all DFT inputs for the sampled snapshots are prepared.
In each iteration, go into the most recent CALC/300K-0bar_* directory, for example:
cd CALC/300K-0bar_1
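If you are unsure which directory is the most recent one, the following Python sketch picks the directory with the largest numeric suffix. It assumes that "most recent" means "highest index", which matches the 300K-0bar_0, 300K-0bar_1, ... naming used here but is not confirmed by ALmoMD itself. The helper name `latest_iteration_dir` is hypothetical.

```python
import re
from pathlib import Path


def latest_iteration_dir(parent, prefix="300K-0bar"):
    """Return the sub-directory '<prefix>_<N>' with the largest N, or None."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)$")
    best = None
    for path in Path(parent).iterdir():
        match = pattern.match(path.name)
        if path.is_dir() and match:
            index = int(match.group(1))
            # Compare numerically so that _10 ranks above _9.
            if best is None or index > best[0]:
                best = (index, path)
    return best[1] if best else None


# Only query CALC if it exists in the current working directory.
if Path("CALC").is_dir():
    print(latest_iteration_dir("CALC"))
```

The same function works for MODEL by passing `"MODEL"` as the parent directory.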
You need to submit all job scripts.
sbatch job-vibes_0.slurm; sbatch job-vibes_1.slurm; sbatch job-vibes_2.slurm; sbatch job-vibes_3.slurm; sbatch job-vibes_4.slurm; sbatch job-vibes_5.slurm; sbatch job-vibes_6.slurm; sbatch job-vibes_7.slurm; sbatch job-vibes_8.slurm; sbatch job-vibes_9.slurm; sbatch job-vibes_10.slurm; sbatch job-vibes_11.slurm; sbatch job-vibes_12.slurm; sbatch job-vibes_13.slurm; sbatch job-vibes_14.slurm; sbatch job-vibes_15.slurm
Once all DFT calculations are finished, go back to the main directory, where result.txt is located, and run the following command.
almomd gen
This adds all new DFT results to the training data. The new training data, inputs, and corresponding job scripts are generated in the most recent MODEL/300K-0bar_* directory. Then submit all job scripts.
sbatch job-nequip-gpu_0.slurm; sbatch job-nequip-gpu_1.slurm; sbatch job-nequip-gpu_2.slurm; sbatch job-nequip-gpu_3.slurm; sbatch job-nequip-gpu_4.slurm; sbatch job-nequip-gpu_5.slurm
When training is done, you will get the deployed MLIP models (deployed-model_*_*.pth). Then go back to the MLIP exploration step to complete the loop.
By default ALmoMD only writes the DFT and training job scripts to disk — it does not submit them. You run every sbatch yourself. That is the safe default for a first-time user or a shared cluster.
If you instead want ALmoMD to chain the whole cont → DFT → train → cont → … loop on its own, uncomment the lines listed below, which are spread across four files. The chain has two parts: each step auto-submits the jobs it just wrote, and each step uses SLURM job dependencies to start the next step once those jobs finish.
1) libs/lib_dft.py — submit the DFT jobs written by almomd cont:
# subprocess.run([inputs.job_command, job_script])
2) scripts/lib_run_dft_cont.py — after submitting the DFT jobs, submit job-gen.slurm with a SLURM dependency on them:
# if inputs.rank == 0:
# job_dependency('gen', inputs.num_calc)
3) libs/lib_train.py — submit the training jobs written by almomd gen:
# subprocess.run([inputs.job_command, job_script]);
4) scripts/lib_run_dft_gen.py — after submitting the training jobs, submit the next job-cont.slurm with a SLURM dependency on them:
# job_dependency('cont', inputs.num_mdl_calc)
All four need to be uncommented together. Enabling only some of them leaves the chain broken: e.g. if #4 fires but #3 didn’t submit anything, job_dependency has no job IDs to depend on and will pass an empty --dependency=afterany: to sbatch.
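The empty-dependency failure mode can be illustrated with a short Python sketch. This is not ALmoMD's `job_dependency` function, whose internals are not shown here. The helper names `dependency_flag` and `chained_submit_command` and the job IDs 101 to 103 are made up for illustration. The sketch only shows how an `afterany` dependency argument for sbatch is composed and how an empty ID list can be caught before submission.

```python
def dependency_flag(job_ids, kind="afterany"):
    """Return a '--dependency=...' argument, or None when there are no job IDs."""
    ids = [s for s in (str(j).strip() for j in job_ids) if s]
    if not ids:
        return None
    return f"--dependency={kind}:" + ":".join(ids)


def chained_submit_command(script, job_ids, job_command="sbatch"):
    """Build the submit command for a job that waits on the given job IDs."""
    flag = dependency_flag(job_ids)
    if flag is None:
        # Without upstream IDs the flag would be the bare '--dependency=afterany:'.
        raise ValueError("no upstream job IDs; refusing to build a chained submit")
    return [job_command, flag, script]


# Made-up job IDs standing in for the DFT jobs submitted in step #1.
print(chained_submit_command("job-gen.slurm", [101, 102, 103]))
# prints: ['sbatch', '--dependency=afterany:101:102:103', 'job-gen.slurm']
```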
With all four enabled, you start the loop yourself exactly once:
sbatch job-cont.slurm
From there each iteration flows like this:
job-cont.slurm
↓ runs `almomd cont` (MLMD exploration)
↓ #1 sbatch job-vibes_*.slurm (DFT jobs for sampled snapshots)
↓ #2 sbatch job-gen.slurm --dep=DFT (chained)
job-gen.slurm
↓ runs `almomd gen` (ingest DFT, write NequIP inputs)
↓ #3 sbatch job-nequip-gpu_*.slurm (training jobs)
↓ #4 sbatch job-cont.slurm --dep=train (chained — next iteration)
next iteration's job-cont.slurm starts when training finishes
↓ ...
almomd cont may still need manual resubmissions

On some clusters a single SLURM walltime is shorter than the MLMD sampling window (e.g. ~2 ns for our runs), so job-cont.slurm will be killed before the iteration's sampling quota is reached. almomd cont is restart-safe (state lives in UNCERT/, TEMPORARY/, and result.txt), so you just resubmit it and MLMD resumes. The auto-chain (#1/#2) only fires once the quota is actually reached.
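One way to decide whether a resubmission is needed is to check whether the CALC directory for the current iteration has appeared, since that directory is created once sampling is complete. The Python sketch below encodes this check. Treating the directory's existence as the completion signal is an assumption, as is the mapping between the iteration and the directory index. The helper names `sampling_finished` and `next_action` are hypothetical.

```python
from pathlib import Path


def sampling_finished(root, index, prefix="300K-0bar"):
    """True if CALC/<prefix>_<index> exists under root (assumed completion signal)."""
    return (Path(root) / "CALC" / f"{prefix}_{index}").is_dir()


def next_action(root, index):
    """Suggest what to do after a job-cont.slurm run has ended."""
    if sampling_finished(root, index):
        return "sampling finished"
    return "resubmit job-cont.slurm"


# Check iteration index 1 relative to the current working directory.
print(next_action(".", index=1))
```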