FastMDXplora usage examples
Worked examples for both the command-line interface (fastmdx) and the
Python API. Everything here runs the same four-phase pipeline — setup →
simulation → analysis → report — whether you drive it from one flag or a
hundred-run sweep.
A note on input: a system can be a PDB/CIF file path, a 4-character
PDB ID (fetched from RCSB, e.g. 1L2Y), or a sequence written in
one-letter codes. The form is auto-detected, so there is no separate
--pdb-id flag.
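To make the auto-detection concrete, here is a minimal sketch of how the three input forms could be told apart. The helper classify_system and its rules are hypothetical, not FastMDXplora's actual logic; it also assumes the sequence form means the 20 standard amino acids. Because classic PDB IDs begin with a digit and such a sequence never contains one, the two forms cannot collide under these rules.

```python
import re
from pathlib import Path

# Hypothetical rules for illustration; the real detection may differ.
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")          # classic 4-character IDs begin with a digit
_ONE_LETTER = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")  # assumption: 20 standard amino acids
_STRUCTURE_SUFFIXES = {".pdb", ".cif"}


def classify_system(value: str) -> str:
    """Return 'file', 'pdb_id', or 'sequence' for a --system value."""
    text = value.strip()
    # A known structure suffix or an existing path is treated as a file.
    if Path(text).suffix.lower() in _STRUCTURE_SUFFIXES or Path(text).exists():
        return "file"
    if _PDB_ID.fullmatch(text):
        return "pdb_id"
    if _ONE_LETTER.fullmatch(text.upper()):
        return "sequence"
    raise ValueError(f"cannot interpret system input: {value!r}")
```

For example, classify_system("protein.pdb") gives "file" and classify_system("1L2Y") gives "pdb_id".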
Command-line interface
The simplest run
Run the whole pipeline on a structure file:
fastmdx explore --system protein.pdb
Or fetch a structure from the PDB by ID:
fastmdx explore --system 1L2Y
-s, -system, and --system are all equivalent — the single-dash
-system form matches the GROMACS/AMBER/NAMD convention. The xplore
alias works anywhere explore does, for the X-branding:
fastmdx xplore -s 1L2Y
Output lands in ./fastmdxplora_output_<timestamp>/ unless you set
--output.
Tuning the run with flags
Per-phase options are namespaced by phase (--setup-…, --simulate-…,
--analyze-…, --report-…):
fastmdx explore --system protein.pdb \
    --output ./trpcage_study \
    --setup-ph 7.4 \
    --setup-ion-concentration-M 0.15 \
    --simulate-duration-ns 100.0 \
    --simulate-temperature-K 310.0 \
    --simulate-platform CUDA \
    --analyze-analyses rmsd rmsf rg cluster \
    --report-title "Trp-cage at 310 K"
--simulate-duration-ns is production length; equilibration (NVT/NPT)
is independent and has its own defaults.
Choosing phases
Run only part of the pipeline with --include (allowlist) or
--exclude (denylist) — they are mutually exclusive:
# Only prepare and simulate; analyze later
fastmdx explore -s protein.pdb --include setup simulation
# Everything except the report
fastmdx explore -s protein.pdb --exclude report
# Convenience flag for the common case
fastmdx explore -s protein.pdb --no-report
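The allowlist/denylist rule can be sketched in a few lines of Python. The function resolve_phases is a hypothetical illustration, not part of the FastMDXplora API; it only uses the four phase names shown above and rejects the case where both lists are given.

```python
PHASES = ("setup", "simulation", "analysis", "report")


def resolve_phases(include=None, exclude=None):
    """Return the phases to run, always in pipeline order."""
    if include and exclude:
        raise ValueError("include and exclude are mutually exclusive")
    unknown = (set(include or ()) | set(exclude or ())) - set(PHASES)
    if unknown:
        raise ValueError(f"unknown phase(s): {sorted(unknown)}")
    # No include list means "everything"; exclude then removes from that set.
    requested = set(include or PHASES) - set(exclude or ())
    return [phase for phase in PHASES if phase in requested]
```

Under this sketch, --no-report would be equivalent to resolve_phases(exclude=["report"]).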
Running a single phase
Each phase is also its own subcommand. Here the per-phase flags are bare
(no --simulate- prefix), since the phase is already chosen:
fastmdx setup --system protein.pdb --ph 6.5 --box-shape octahedron
fastmdx simulate --output ./trpcage_study --duration-ns 50.0 --platform CUDA
fastmdx analyze --output ./trpcage_study --analyses rmsd rg --selection "name CA"
fastmdx report --output ./trpcage_study --no-slides
Pointing later phases at the same --output lets them pick up the
artifacts the earlier phases wrote.
MD engine controls
Integrator, pressure (in bar or atm), GPU device, and checkpointing:
fastmdx explore -s protein.pdb \
    --simulate-integrator langevin_middle \
    --simulate-timestep-fs 2.0 \
    --simulate-pressure-atm 1.0 \
    --simulate-device-index 0 \
    --simulate-checkpoint-interval-steps 5000
Supported integrators: langevin_middle (default), langevin,
brownian, verlet, variable_langevin, variable_verlet. Pressure can
be given as --simulate-pressure-bar or --simulate-pressure-atm; atm is
converted to OpenMM’s native bar (1 atm = 1.01325 bar).
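The unit conversion is a single multiplication. The sketch below is illustrative only: pressure_in_bar is a hypothetical helper, and rejecting the case where both units are given is this sketch's choice rather than documented behaviour.

```python
ATM_TO_BAR = 1.01325  # 1 atm = 101325 Pa = 1.01325 bar


def pressure_in_bar(pressure_bar=None, pressure_atm=None):
    """Normalise a pressure given in bar or atm to bar."""
    if (pressure_bar is None) == (pressure_atm is None):
        raise ValueError("give exactly one of pressure_bar or pressure_atm")
    if pressure_bar is not None:
        return float(pressure_bar)
    return float(pressure_atm) * ATM_TO_BAR
```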
Skipping PDBFixer
If you already have a prepared structure, skip the fixer:
fastmdx explore -s raw.pdb --setup-fixed-pdb prepared.pdb
Config files
For anything beyond a quick run, put it all in a YAML file. Generate a fully-commented template to edit:
fastmdx init-config # writes fastmdxplora.yml
fastmdx init-config --minimal -o study.yml # short starter
fastmdx init-config -o study.yml --force # overwrite an existing file
Then run it:
fastmdx explore --config study.yml # -c and -config also work
Flags still override the file, so you can reuse one config and tweak a value per invocation:
fastmdx explore --config study.yml --simulate-duration-ns 50
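Conceptually, an override like this is a recursive merge of the flag values over the file values. The following is a minimal sketch under the assumption that --simulate-duration-ns maps to the simulation.duration_ns key; deep_merge is a hypothetical helper, not the library's own merge routine.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from override win over base."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested sections
        else:
            merged[key] = deepcopy(value)                 # scalar or list: replace
    return merged


# Values from study.yml, then the one value given on the command line.
file_config = {"simulation": {"duration_ns": 100.0, "temperature_K": 310.0}}
cli_overrides = {"simulation": {"duration_ns": 50}}
resolved = deep_merge(file_config, cli_overrides)
```

Only duration_ns changes; temperature_K keeps the value from the file, and file_config itself is left untouched.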
Preview without running — print the plan (runs, systems, swept values, output directories, phases) and exit:
fastmdx explore --config campaign.yml --dry-run
Every run writes resolved_config.yml — the exact merged configuration
that ran — so you can reproduce it later with
fastmdx explore --config resolved_config.yml.
Other commands
fastmdx info # versions, detected backends (OpenMM/PDBFixer), citation
fastmdx --cite # just the citation
fastmdx --version
Config file format
Input is always a systems: list — even for one system — so the file
looks the same shape whether you study one protein or a dozen.
A single study
# study.yml
systems:
  - id: trpcage
    system: trpcage.pdb  # path, PDB ID, or sequence
output: ./trpcage_study
include: [setup, simulation, analysis, report]
setup:
  ph: 7.4
  ion_concentration_M: 0.15
simulation:
  duration_ns: 100.0
  temperature_K: 310.0
  platform: CUDA
analysis:
  include: [rmsd, rmsf, rg, cluster]
  selection: "name CA"
  options:
    cluster:
      methods: [kmeans, hierarchical]
      n_clusters: 5
report:
  title: "Trp-cage at 310 K"
With one system and no sweep, output uses the familiar flat layout
(trpcage_study/setup/, trpcage_study/simulation/, …).
Several systems
Add entries to the list. Each can carry its own per-phase overrides:
# compare.yml
systems:
  - id: wildtype
    system: wt.pdb
  - id: mutant
    system: mutant.pdb
    setup: { ph: 6.5 }  # this system only
output: ./comparison
simulation:
  duration_ns: 50.0  # shared by all systems
A parameter sweep
A sweep: block varies parameters across runs. Each axis is a dotted
phase.option key mapped to a list of values; multiple axes form the
full cross-product:
# campaign.yml
systems:
  - id: trpcage1
    system: trpcage.pdb
  - id: trpcage2
    system: trpcage.pdb
    setup: { ph: 6.5 }
output: ./trpcage_campaign
sweep:
  simulation.temperature_K: [300, 310, 320]
  simulation.pressure_bar: [1.0, 1.2]
This is 2 systems × 3 temperatures × 2 pressures = 12 runs. With more
than one run, each goes in runs/<id>/ and a top-level
batch_manifest.json indexes them all:
trpcage_campaign/
  batch_manifest.json
  runs/
    trpcage1__temperature_K-300__pressure_bar-1.0/
    trpcage1__temperature_K-300__pressure_bar-1.2/
    ...
Within each run, option precedence is: base config < per-system override < swept value.
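The cross-product and the precedence order can be sketched as follows. This is an illustration, not FastMDXplora's implementation: expand_runs is hypothetical, the run-ID format is inferred from the directory names above, and a base setup.ph of 7.4 is added to the campaign config purely to show a per-system override winning over it.

```python
from itertools import product

PHASES = ("setup", "simulation", "analysis", "report")


def expand_runs(config):
    """Enumerate (run_id, options) for every system and sweep combination."""
    axes = list(config.get("sweep", {}).items())
    runs = []
    for system in config["systems"]:
        # With no sweep, product() yields one empty combination per system.
        for combo in product(*(values for _, values in axes)):
            # Lowest precedence: base options shared by every run.
            options = {phase: dict(config.get(phase, {})) for phase in PHASES}
            # Next: this system's own per-phase overrides.
            for phase in PHASES:
                options[phase].update(system.get(phase, {}))
            # Highest: the swept value on each axis.
            tags = []
            for (key, _), value in zip(axes, combo):
                phase, option = key.split(".", 1)
                options[phase][option] = value
                tags.append(f"{option}-{value}")
            runs.append(("__".join([system["id"], *tags]), options))
    return runs


config = {
    "setup": {"ph": 7.4},  # base value, added here for illustration
    "systems": [
        {"id": "trpcage1", "system": "trpcage.pdb"},
        {"id": "trpcage2", "system": "trpcage.pdb", "setup": {"ph": 6.5}},
    ],
    "sweep": {
        "simulation.temperature_K": [300, 310, 320],
        "simulation.pressure_bar": [1.0, 1.2],
    },
}
runs = expand_runs(config)
```

Here len(runs) is 12, the trpcage1 runs keep the base pH of 7.4, and the trpcage2 runs use 6.5.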
Parallel execution
By default runs go one at a time. An execution: block runs several at
once:
execution:
  mode: parallel  # sequential (default) | parallel
  workers: 2  # how many runs at once
  devices: [0, 1]  # GPU indices; one run pinned per device
  continue_on_error: true
On GPU the safe pattern is one run per GPU: list your devices and each
worker is pinned to a distinct index. Don’t set workers higher than the
number of devices on GPU — oversubscribing one GPU is slower than running
sequentially. When workers is unset it defaults to one per device (GPU)
or the CPU count capped at the run count (CPU).
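The default described above can be written as a short function. This is a sketch of the stated rule only; default_workers and is_oversubscribed are hypothetical names, not part of the FastMDXplora API.

```python
import os


def default_workers(n_runs, devices=None):
    """Worker count to use when `workers` is unset."""
    if devices:
        return len(devices)              # GPU: one worker per listed device
    cpus = os.cpu_count() or 1           # os.cpu_count() may return None
    return max(1, min(cpus, n_runs))     # CPU: core count, capped at the run count


def is_oversubscribed(workers, devices=None):
    """True when more workers are requested than GPUs are listed."""
    return bool(devices) and workers > len(devices)
```

For the 12-run campaign with devices [0, 1], default_workers(12, [0, 1]) gives 2, and is_oversubscribed(4, [0, 1]) flags the configuration the text warns against.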
Python API
A single study
from fastmdxplora import FastMDXplora
fmdx = FastMDXplora(system="protein.pdb")
results = fmdx.explore()  # a list of RunResult, one per run
for run in results:
    for phase in run.phases:
        print(phase.name, phase.status)  # e.g. "setup ok", "simulation ok"
The recommended import alias mirrors the CLI name:
import fastmdxplora as fastmdx
fastmdx.FastMDXplora(system="1L2Y").explore()
With options and phase selection
options is keyed by phase; explore() takes include/exclude and a
report convenience flag:
from fastmdxplora import FastMDXplora
fmdx = FastMDXplora(
    system="1L2Y",  # fetched from RCSB
    output_dir="./trpcage_study",
    options={
        "setup": {"ph": 7.4, "ion_concentration_M": 0.15},
        "simulation": {"duration_ns": 100.0, "temperature_K": 310.0,
                       "platform": "CUDA", "integrator": "langevin_middle"},
        "analysis": {"include": ["rmsd", "rmsf", "rg", "cluster"]},
    },
)
results = fmdx.explore(include=["setup", "simulation", "analysis"])
run = results[0]  # one study -> a list of one
print("run status:", run.status)
for phase in run.phases:
    print("  ", phase.name, phase.status)
include/exclude/options can be set on the constructor or passed to
explore(); arguments to explore() take precedence.
explore() always returns a list of RunResult — a single study is a
list of one, a sweep is a list of many. Each RunResult carries
run_id, system, status ("ok"/"error"), output_dir,
sweep_values, and phases (the list of PhaseResult for that run).
The iteration idiom is the same no matter how many runs there are:
for run in results:
    print(run.run_id, run.status)
    for phase in run.phases:
        print("  ", phase.name, phase.status)
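Building on that idiom, a small helper can condense a long result list into status counts plus the phases that failed. The sketch below relies only on the attributes named in this section (run_id, status, phases, and each phase's name and status); summarize is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the example runs without the library installed.

```python
from collections import Counter
from types import SimpleNamespace


def summarize(results):
    """Count run statuses and list (run_id, phase name) for each failed phase."""
    counts = Counter(run.status for run in results)
    failed = [
        (run.run_id, phase.name)
        for run in results
        for phase in run.phases
        if phase.status == "error"
    ]
    return counts, failed


# Stand-ins for RunResult / PhaseResult, for demonstration only.
demo = [
    SimpleNamespace(run_id="a", status="ok",
                    phases=[SimpleNamespace(name="setup", status="ok")]),
    SimpleNamespace(run_id="b", status="error",
                    phases=[SimpleNamespace(name="setup", status="ok"),
                            SimpleNamespace(name="simulation", status="error")]),
]
counts, failed = summarize(demo)
```

With real results you would call summarize(fmdx.explore()) instead of passing the stand-ins.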
Running a single phase
Each phase has a method that returns a PhaseResult:
from fastmdxplora import FastMDXplora
fmdx = FastMDXplora(system="protein.pdb", output_dir="./study")
setup_result = fmdx.setup(ph=6.5, box_shape="octahedron")
print(setup_result.status, setup_result.artifacts)
sim_result = fmdx.simulate(duration_ns=50.0, platform="CUDA")
fmdx.analyze(include=["rmsd", "rg"], selection="name CA")
fmdx.report(slides=False)
A PhaseResult carries name, status ("ok", "skipped", or
"error"), output_dir, artifacts, and a message.
Driving from a config file (one system or many)
A config file — single system, several systems, or a sweep — runs through
the same FastMDXplora(config=...).explore() interface. A single-system
config writes the flat layout; many runs go in runs/<id>/.
from fastmdxplora import FastMDXplora
# One study from a file
FastMDXplora(config="study.yml").explore()
# A whole campaign (systems × sweep) from a file — same interface
results = FastMDXplora(config="campaign.yml").explore()
for run in results:
    print(run.run_id, run.status, run.sweep_values)
explore() returns the same list[RunResult] here as for a single
study — one element per run. Each carries run_id, system, status,
output_dir, sweep_values, and its phases.
Building a config in code
You don’t need a file on disk — pass a config dict directly with
config_data:
from fastmdxplora import FastMDXplora
config = {
    "output": "./scan",
    "include": ["setup", "simulation", "analysis"],
    "systems": [
        {"id": "trpcage", "system": "trpcage.pdb"},
    ],
    "sweep": {
        "simulation.temperature_K": [290, 300, 310, 320],
    },
    "execution": {"mode": "parallel", "workers": 2, "devices": [0, 1]},
}
results = FastMDXplora(config_data=config).explore()
n_ok = sum(r.status == "ok" for r in results)
print(f"{n_ok}/{len(results)} runs succeeded")
Previewing a run with --dry-run
To see exactly what a config will do — every run, its system, swept values, output directory, and the phases that will execute — without running anything, use a dry run. On the CLI:
fastmdx explore --config campaign.yml --dry-run
In Python, pass dry_run=True:
from fastmdxplora import FastMDXplora
planned = FastMDXplora(config="campaign.yml").explore(dry_run=True)
for run in planned:
    print(run.run_id, run.sweep_values, "->", run.output_dir)
    # run.status == "planned"; nothing was executed
A dry run prints the plan and returns a list[RunResult] with status
"planned" and no populated phases. Nothing is written to disk.
Cross-run comparison report
When a study has more than one run, FastMDXplora automatically builds a
comparison/ report at the batch root that aggregates the runs:
my_campaign/
  batch_manifest.json
  comparison/
    overlay_rmsd.png  # all runs' RMSD traces on one set of axes
    overlay_rg.png
    trend_rmsd.png  # mean RMSD vs the swept parameter
    trend_rg.png
    comparison_summary.csv  # one row per run, summary scalars
    comparison_report.md  # the written report
  runs/
    ...
Nothing extra is required — running a sweep produces it:
fastmdx explore --config campaign.yml
For per-frame analyses (RMSD, Rg, Q-value, total SASA) it draws an
overlay of every run’s trace, and — when the sweep axis is numeric — a
trend of each run’s summary scalar against that axis. The
comparison_summary.csv is convenient for your own plotting:
import pandas as pd
df = pd.read_csv("my_campaign/comparison/comparison_summary.csv")
print(df[["temperature_K", "rmsd_mean", "rg_mean"]])
To turn the report off, set it in the config’s report block:
report:
  comparison: false
You can also (re)build it — for instance after re-running some of the
runs, or for a batch that finished earlier — with compare():
from fastmdxplora import FastMDXplora
# Right after a run, compare() operates on the study just produced:
fmdx = FastMDXplora(config="campaign.yml")
fmdx.explore()
fmdx.compare()
# Or rebuild for an existing batch directory:
FastMDXplora(config="campaign.yml").compare(output_dir="my_campaign")
Reproducibility
Every run writes a resolved_config.yml capturing the fully-merged
configuration that actually executed (defaults + file + overrides). It is
itself a valid config, so feeding it back reproduces the run exactly:
fastmdx explore --config some_run/resolved_config.yml
For a batch, batch_manifest.json at the output root records every run,
its swept values, status, and output directory — the index for the whole
campaign.