A spectrum is not a structure. NMR signals, mass-spec peak lists, and IR absorbance traces are signal traces or peak lists, usually associated with a structure but not derivable from it. JCAMP-DX (1988), mzML (2008), and NMReDATA (2017) are the load-bearing carriers in this ecosystem; the rest either ride on them or substitute for them in a specific instrument vendor's world. This is the third of three articles: the first covered line notations, the second covered structural file formats.
Domains, not eras
Spectroscopy splits cleanly by physical method. Each domain has its own metadata vocabulary, its own peak-detection conventions, its own software ecosystem. The format families track those splits.
- NMR — JCAMP-DX, nmrML, NMReDATA. Time-domain FIDs, frequency-domain spectra, peak lists with chemical shifts and couplings.
- Mass spectrometry — mzML, mzData (legacy), imzML (imaging MS). Centroided peak lists, profile spectra, MS/MS hierarchies.
- Optical (IR / Raman / UV-Vis) — JCAMP-DX is the dominant interchange; vendor-specific binary formats (Bruker OPUS, Thermo SPA) are what the instruments actually emit.
- Generic / multi-method — AnIML aims to cover everything; adoption varies by instrument vendor.
A chronological tour
Each format gets a numbered block with the domain it primarily targets, what it adds, and where it overlaps with the structural-formats article.
1. JCAMP-DX — 1988
Joint Committee on Atomic and Molecular Physical Data, Data eXchange. IUPAC-standardised. ASCII; built from labelled-data records (LDRs), lines of the form ##LABEL=value such as ##TITLE= and ##XYDATA=, with the numeric block (optionally compressed) following the ##XYDATA= record. Originally for IR; extended to NMR, MS, Raman, UV-Vis. Still the most portable spectroscopy format.
What it adds: a single ASCII container that nearly every spectroscopy package, on every major platform, can read, with files going back to 1988.
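The LDR layout described above can be read with very little code. The following is a minimal sketch, not a conforming JCAMP-DX reader: the sample file is hand-written for illustration, `parse_ldrs` and `affn_y_values` are hypothetical helper names, and only the uncompressed numeric form is handled (the compressed SQZ/DIF/DUP encodings are out of scope here).

```python
import re

# Hand-written illustrative sample, not output from a real instrument.
SAMPLE = """\
##TITLE=Example IR spectrum
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##XYDATA=(X++(Y..Y))
4000 0.98 0.97 0.95
3997 0.93 0.90 0.88
##END=
"""


def parse_ldrs(text):
    """Split JCAMP-DX text into {label: value}; continuation lines are joined with newlines."""
    records = {}
    label = None
    for line in text.splitlines():
        line = line.split("$$", 1)[0].rstrip()  # "$$" starts an inline comment
        if line.startswith("##"):
            raw_label, _, value = line[2:].partition("=")
            # Labels are matched ignoring case and the separators space, -, /, _
            label = re.sub(r"[\s\-/_]", "", raw_label).upper()
            records[label] = value.strip()
        elif label is not None and line:
            records[label] += "\n" + line
    return records


def affn_y_values(xydata):
    """Return the Y values of an uncompressed (X++(Y..Y)) block as one flat list."""
    ys = []
    for row in xydata.splitlines()[1:]:  # first line is the "(X++(Y..Y))" form marker
        fields = row.split()
        ys.extend(float(v) for v in fields[1:])  # fields[0] is the X checkpoint for the row
    return ys


records = parse_ldrs(SAMPLE)
ys = affn_y_values(records["XYDATA"])
print(records["TITLE"], len(ys))
```

Normalising labels at parse time means `##DATA TYPE=`, `##DATATYPE=`, and `##Data-Type=` all land under the same key, which is how the format expects readers to compare them.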
2. ANDI / netCDF MS — 1995
Analytical Data Interchange. Built on netCDF (binary, self-describing). Targeted GC/LC-MS interchange; widely supported in the late 1990s, displaced by mzData and then mzML in the 2000s. Still in vendor software for legacy reasons.
What it adds: first widely-deployed binary format for MS interchange. netCDF gave it good random-access and large-file performance.
3. mzData — 2003
Proteomics Standards Initiative (PSI), 2003. XML; first PSI-MS attempt at a unified MS interchange format. Superseded by mzML in 2008 — the PSI explicitly deprecated it.
What it adds (historically): first vendor-neutral PSI format. Worth knowing because legacy tooling still emits mzData; everything new should target mzML.
4. imzML — 2007
Imaging mass spectrometry. Two-file format: an XML metadata file plus a binary .ibd with the per-pixel spectra. Designed for MALDI imaging where each pixel is its own MS spectrum and a single experiment is gigabytes.
What it adds: imaging-shaped MS data — a 2D grid of spectra, indexed efficiently. mzML doesn't scale to MALDI imaging; imzML does.
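The two-file split is easier to see in code. This sketch builds a toy binary blob and reads one pixel back by offset, the way a reader would use the offsets and array lengths recorded in the XML file. It assumes 64-bit little-endian floats and a 16-byte UUID header; real files declare their data types in the XML, and `build_ibd`, `read_array`, and the `index` structure are hypothetical stand-ins rather than any library's API.

```python
import io
import struct
import uuid


def build_ibd(spectra):
    """Write a toy .ibd-style blob; return (bytes, index of (offset, length) per array)."""
    buf = io.BytesIO()
    buf.write(uuid.uuid4().bytes)  # 16-byte UUID that ties the binary file to its XML
    index = []
    for mz, intensity in spectra:
        entry = {}
        for name, values in (("mz", mz), ("intensity", intensity)):
            entry[name] = (buf.tell(), len(values))  # what the XML side would record
            buf.write(struct.pack(f"<{len(values)}d", *values))
        index.append(entry)
    return buf.getvalue(), index


def read_array(ibd, offset, length):
    """Read `length` little-endian doubles starting at byte `offset`."""
    return list(struct.unpack_from(f"<{length}d", ibd, offset))


# Two "pixels", each with its own m/z and intensity arrays (illustrative values).
PIXELS = [
    ([100.5, 200.25], [10.0, 20.0]),
    ([150.0, 300.5, 450.75], [1.0, 2.0, 3.0]),
]
ibd, index = build_ibd(PIXELS)

# Random access to the second pixel without touching the first.
mz = read_array(ibd, *index[1]["mz"])
intensity = read_array(ibd, *index[1]["intensity"])
print(mz, intensity)
```

Because each pixel is reached by a seek rather than by parsing everything before it, a viewer can render one ion image or one region without loading the whole experiment.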
5. mzML — 2008
PSI-MS standard for mass spectrometry. XML; replaces mzData. Self-describing via the PSI-MS controlled vocabulary (every detector, ionisation method, and scan polarity gets a CV term). Data from every major MS vendor can be brought into it, natively or by conversion, and the main open-source toolkits (OpenMS, MSnbase, ProteoWizard) read and write it.
What it adds: a single CV-driven schema for every MS data type — proteomics, metabolomics, top-down, bottom-up. The format the field actually agreed on.
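To make "CV-driven" concrete, here is a small sketch that reads the `cvParam` elements of one spectrum using only the standard library. The fragment is hand-written and is not a complete, schema-valid mzML document; `cv_params` is a hypothetical helper name. The point is that a reader keys on the accession, not on the human-readable name.

```python
import xml.etree.ElementTree as ET

NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Hand-written fragment for illustration; a real file wraps this in
# <mzML>/<run>/<spectrumList> and also carries the binary data arrays.
SNIPPET = """\
<spectrum xmlns="http://psi.hupo.org/ms/mzml" index="0" id="scan=1" defaultArrayLength="3">
  <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
  <cvParam cvRef="MS" accession="MS:1000130" name="positive scan" value=""/>
  <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum" value=""/>
</spectrum>
"""


def cv_params(spectrum):
    """Map each direct-child cvParam accession to its (name, value) pair."""
    return {
        p.get("accession"): (p.get("name"), p.get("value"))
        for p in spectrum.findall("mz:cvParam", NS)
    }


spectrum = ET.fromstring(SNIPPET)
params = cv_params(spectrum)

ms_level = int(params["MS:1000511"][1])  # valued term: the value attribute carries the data
is_centroid = "MS:1000127" in params     # flag term: presence alone is the information
print(ms_level, is_centroid)
```

For production use a dedicated reader is the safer choice; this only shows why the vocabulary, rather than the XML element names, carries the meaning.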
6. AnIML — 2011
Analytical Information Markup Language. ASTM-standardised. XML; broader scope than nmrML or mzML — aims to cover every analytical instrument, with technique-specific extensions. Adoption is uneven; most popular in process-analytics labs.
What it adds: a single schema spanning multiple techniques in one file (e.g. an HPLC run with both UV and MS detectors).
7. nmrML — 2014
MetaboLights / COSMOS-NMR standard. XML. Carries raw FID, processed spectrum, peak list, and structure references. Built on the same controlled-vocabulary pattern as mzML — the nmrCV defines every experiment type, pulse program, and acquisition parameter.
What it adds: a CV-driven, validatable, structure-aware NMR format. The replacement for JCAMP-DX in metabolomics archives.
8. NMReDATA — 2017
NMR Extended Data — Pupier et al., 2017. SDF-compatible: an SDF tag stack (NMREDATA_VERSION, NMREDATA_ASSIGNMENT, NMREDATA_1D_1H, etc.) carrying assigned NMR peak lists alongside the structure. Designed for journal supplementary information — a single SDF file ships a structure plus its full NMR characterisation.
What it adds: the NMR / structure pair in one file, in a format every cheminformatics toolkit already reads. Used by Wiley, Elsevier, and ACS for SI submission.
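Because the NMR data rides in ordinary SDF data items, a reader needs nothing beyond SDF tag parsing. The sketch below pulls the tags out of one record and splits the assignment lines into label, shift, and atom indices. The record is hand-written with made-up shifts, the molblock is omitted, and the line syntax shown (comma-separated fields, trailing backslash) is a simplified reading of the format; `sdf_data_items` and `assignments` are hypothetical helper names.

```python
# Illustrative record tail only: the molblock that precedes "M  END" is omitted,
# and the labels and shifts are invented for the example.
RECORD = r"""M  END
> <NMREDATA_VERSION>
1.1\

> <NMREDATA_ASSIGNMENT>
a, 1.25, 1\
b, 3.70, 2\

$$$$
"""


def sdf_data_items(record):
    """Collect SDF data items as {tag: [value lines]}, dropping trailing backslashes."""
    items = {}
    tag = None
    for line in record.splitlines():
        if line.startswith(">"):
            tag = line[line.find("<") + 1:line.rfind(">")]
            items[tag] = []
        elif line.strip() == "" or line.startswith("$$$$"):
            tag = None  # a blank line ends the item; $$$$ ends the record
        elif tag is not None:
            items[tag].append(line.rstrip("\\").strip())
    return items


def assignments(items):
    """Parse assignment lines into (label, shift_ppm, [atom indices]) tuples."""
    parsed = []
    for row in items.get("NMREDATA_ASSIGNMENT", []):
        label, shift, *atoms = [f.strip() for f in row.split(",")]
        parsed.append((label, float(shift), [int(a) for a in atoms]))
    return parsed


items = sdf_data_items(RECORD)
print(items["NMREDATA_VERSION"], assignments(items))
```

The atom indices point back into the molblock of the same record, which is what makes the file a carrier rather than a reference.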
Side-by-side
Per-domain coverage at a glance. In the "Structure link" column, "Carrier" means the file holds the structure too; "External" means the structure lives in a separate file that the spectrum is associated with.
| Format | Year | Domain | Structure link | What it adds |
|---|---|---|---|---|
| JCAMP-DX | 1988 | NMR / IR / MS / Raman / UV-Vis | External (or embedded ASCII) | Universal ASCII spectroscopy interchange |
| ANDI | 1995 | MS | External | Binary netCDF; first widely-supported MS format |
| mzData | 2003 | MS | External | PSI's first XML attempt; deprecated |
| imzML | 2007 | MS imaging | External | Per-pixel spectra; gigabyte-scale MALDI |
| mzML | 2008 | MS | External | PSI-MS standard; CV-driven schema |
| AnIML | 2011 | Multi | External | ASTM; spans multiple techniques in one file |
| nmrML | 2014 | NMR | External / inline Mol | CV-driven NMR; replacement for JCAMP-DX in metabolomics |
| NMReDATA | 2017 | NMR | Carrier (SDF) | Structure + assigned NMR in one SDF file |
Bringing them together
Spectroscopy data is the third axis of fragmentation. The same compound, characterised in three labs, ends up as a JCAMP-DX in one archive, an nmrML in a second, and an NMReDATA SDF in a third — and a search by structure on any of those three pulls a different subset of the literature.
chempirical's plan: spectra are first-class records in the same graph-shaped engine. A spectrum points at the molecule it characterises; the molecule indexes its known spectra. Imports normalise JCAMP-DX, nmrML, NMReDATA, and mzML into a common peak-list representation; exports re-emit to whichever format the caller needs. The chemistry library that runs in the browser will ship a small JCAMP-DX parser too — paste a JCAMP file into the search box, find the molecule it describes, view neighbours.
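One way to picture the "common peak-list representation" and the two-way link between spectra and molecules is the sketch below. It is purely illustrative and is not chempirical's actual data model: every class, field, and identifier here (`Peak`, `Spectrum`, `SpectrumIndex`, `mol-001`) is a hypothetical name chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Peak:
    position: float   # ppm, m/z, or wavenumber, depending on Spectrum.unit
    intensity: float


@dataclass
class Spectrum:
    method: str         # e.g. "NMR", "MS", "IR"
    unit: str           # unit of Peak.position, e.g. "ppm", "m/z", "1/cm"
    source_format: str  # format the record was imported from
    molecule_id: str    # the spectrum points at the molecule it characterises
    peaks: list = field(default_factory=list)


class SpectrumIndex:
    """The reverse direction: a molecule indexes its known spectra."""

    def __init__(self):
        self._by_molecule = defaultdict(list)

    def add(self, spectrum):
        self._by_molecule[spectrum.molecule_id].append(spectrum)

    def spectra_for(self, molecule_id):
        return list(self._by_molecule.get(molecule_id, []))


index = SpectrumIndex()
index.add(Spectrum("NMR", "ppm", "NMReDATA", "mol-001", [Peak(1.25, 3.0), Peak(3.70, 2.0)]))
index.add(Spectrum("MS", "m/z", "mzML", "mol-001", [Peak(46.04, 100.0)]))

formats = [s.source_format for s in index.spectra_for("mol-001")]
print(formats)
```

Keeping `source_format` on the record is one possible way to let an exporter re-emit in the original format by default, while the shared `Peak` shape is what makes cross-format search possible.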