Registering a new item in the EMD¶
Registration for the EMD is available on the GitHub repository. Alternatively, click on the relevant stage or cell in the diagram below to go directly to the form you need. Full instructions are provided in the submission guide. The What to expect page describes what happens after submission and what each action means. For information on what is or is not allowed at each stage, please consult the latest version of the specification.
EMD Structure Diagram¶
*This is a representation of the back-end file structure. You will only be required to fill in one form per stage.
Each stage will require a brief (human) review process before you can continue. You may click on any box to be taken to the GitHub submission inputs, or use the links provided in the submission guide.*
What is the Essential Model Documentation?¶
The EMD is a structured, machine-readable record of how climate models are built. For every model that contributes data to CMIP7, it captures the grids it runs on, the software components it uses, and how those components are coupled together.
It answers the questions that model output alone cannot: What resolution did this model actually run at? Is this the same ocean grid as that other model? Which components are interactive and which are prescribed?
What does it contain?¶
The EMD is organised into nine linked record types, assembled from the bottom up. Items marked with an asterisk (*) are the only ones you will be required to fill in for your EMD submission.
Grid cells *¶
Grid cells describe the horizontal geometry a model computes on — the shape, resolution, and number of cells in a 2D tile. A regular 1° latitude-longitude atmosphere grid and a tripolar ocean grid are each their own record, reused by any model that runs on them.
Vertical grids *¶
Vertical grids capture the layering in the third dimension — how many levels, what coordinate system (pressure, height, depth), and how thick each layer is.
Subgrids¶
Subgrids record where different physical quantities sit within a horizontal grid: mass variables, east-west velocities, and north-south velocities can each occupy a different stagger point on the same underlying cells. Subgrids are generated automatically as part of the horizontal computational grids, and multiple projects can point to the same grid-variable pair.
Computational grids *¶
Computational grids come in two kinds, horizontal and vertical, which together describe the grid a model component runs on. Horizontal computational grids assemble subgrids into the complete horizontal domain a model component actually uses, including the Arakawa grid arrangement that governs how variables are interpolated and exchanged. Vertical computational grids describe the arrangement of vertical columns on top of this.
Component families¶
Component families document the scientific lineage of a single-domain code base: who built it, what scientific domain it covers, and how it has evolved. Examples include NEMO (ocean), ARPEGE-Climat (atmosphere), and SURFEX (land surface). A component family is referenced by model components at Stage 3.
ESM families¶
ESM families document the lineage of a coupled Earth System Model — the broader multi-component system that institutions develop and maintain across generations. Examples include HadGEM3, CNRM-CM, and CESM. An ESM family is referenced by the final model record at Stage 4.
Both component families and ESM families are stored in the model_family folder. They are distinguished by a family_type field: "component" or "model".
Model components *¶
Model components are specific versioned instances of a piece of model software: an atmosphere, ocean, sea-ice, or land-surface code at a specific version, with citable references. Each component references its component family.
Component configurations¶
Component configurations bind a component version to a specific horizontal and vertical grid, producing a fully specified computational setup. The configuration ID encodes this directly: atmosphere_arpege-climat-version-6-3_h100_v100 is unambiguous and self-describing. As with subgrids, most users will not work with component configurations directly, but through the records (e.g. a model) that reference them.
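As a rough illustration of what "self-describing" means here, the sketch below splits the example ID into its parts. The layout (domain, component, horizontal grid, vertical grid, separated by underscores) is inferred from this single example and is an assumption, as are the names ConfigId and parse_config_id; the specification remains the authoritative source for the ID format.

```python
from typing import NamedTuple


class ConfigId(NamedTuple):
    """Hypothetical container for the parts of a configuration ID."""
    domain: str
    component: str
    horizontal_grid: str
    vertical_grid: str


def parse_config_id(config_id: str) -> ConfigId:
    # Assumed layout: <domain>_<component>_<horizontal grid>_<vertical grid>.
    # The domain is taken from the left and the two grid IDs from the right,
    # so the hyphenated component name in the middle stays whole.
    domain, rest = config_id.split("_", 1)
    component, horizontal_grid, vertical_grid = rest.rsplit("_", 2)
    return ConfigId(domain, component, horizontal_grid, vertical_grid)


parsed = parse_config_id("atmosphere_arpege-climat-version-6-3_h100_v100")
print(parsed)
```

An ID with fewer than four underscore-separated parts raises a ValueError in this sketch, which is a simple way to reject strings that do not follow the assumed layout.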
Models (Source ID) *¶
Models are the complete assembled system — listing every component configuration, declaring which earth system realms are active or prescribed, recording how components are coupled or embedded within one another, and referencing the ESM family the model belongs to.
Repository Structure¶
The core principle¶
The backend structure of the EMD is organised around the idea that each thing that can vary independently gets its own record type. Information has a single definition that is referenced everywhere it is needed. This is what determines the folder and file structure. Most people will not interact with this side of the code. For those who do want to understand it better, a dev module will be added [here] once all work on it is complete.
Grids are split into four record types¶
Grid cells capture geometry alone — shape, resolution, extent. This is separated out first because grid geometry is genuinely independent of everything else. The same 1° tripolar ocean grid can be used by dozens of models across different institutions. Making it a shared record means a correction propagates automatically to every model that references it.
Subgrids capture where different physical variables sit within that geometry. On an Arakawa-C grid, mass variables, east-west velocities, and north-south velocities do not all sit at the same point on the cell — they are staggered. This is a fundamental property of how the discretisation works, and it determines how variables are interpolated and exchanged at component boundaries. Separating subgrid from grid cell means the staggering convention is recorded explicitly rather than implied.
Computational grids assemble a set of subgrids into the complete domain that a component actually runs on. This is separated from the cell and subgrid records because the same physical cells can be assembled differently — with different arrangements or different subsets of variable types — by different components.
Vertical grids are kept entirely separate from the horizontal records because vertical structure varies completely independently. A model can increase horizontal resolution without touching its 85-level atmosphere, or adopt a new vertical coordinate in the ocean while keeping the same horizontal grid. Combining vertical and horizontal into one record would force a new record every time either changed.
Together, the four grid types answer a question that is otherwise unanswerable from model output alone: exactly what computational space was this variable defined on?
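One way to picture how the horizontal records chain together is the sketch below. It is illustrative only: the class names, field names, and IDs (g1, s1, h1, and so on) are hypothetical and do not come from the EMD schema. It shows how a variable type can be traced from a horizontal computational grid, through its subgrid, down to the grid cells it sits on.

```python
from dataclasses import dataclass


# All class names, field names, and IDs below are hypothetical.
@dataclass(frozen=True)
class GridCells:
    id: str
    description: str


@dataclass(frozen=True)
class Subgrid:
    id: str
    grid_cells_id: str   # reference to a GridCells record
    variable_type: str   # e.g. "mass", "x-velocity", "y-velocity"


@dataclass(frozen=True)
class HorizontalComputationalGrid:
    id: str
    arrangement: str     # e.g. an Arakawa arrangement label
    subgrid_ids: tuple   # references to Subgrid records


cells = {"g1": GridCells("g1", "1 degree tripolar ocean grid")}
subgrids = {
    "s1": Subgrid("s1", "g1", "mass"),
    "s2": Subgrid("s2", "g1", "x-velocity"),
    "s3": Subgrid("s3", "g1", "y-velocity"),
}
hgrid = HorizontalComputationalGrid("h1", "arakawa_c", ("s1", "s2", "s3"))


def locate(grid, variable_type):
    """Return (grid cells description, subgrid id) for a variable type."""
    for subgrid_id in grid.subgrid_ids:
        subgrid = subgrids[subgrid_id]
        if subgrid.variable_type == variable_type:
            return cells[subgrid.grid_cells_id].description, subgrid.id
    raise KeyError(variable_type)


location = locate(hgrid, "x-velocity")
print(location)
```

Because each record only holds the IDs of the records beneath it, the three subgrids here share one grid cells record rather than each carrying a copy of the geometry.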
The different kinds of family¶
The model_family folder holds records for both component families and ESM families, distinguished by a family_type field. They are stored together because they share the same structure — institution, lineage, scientific domains, references — but they serve different purposes in the hierarchy.
Component families (family_type: "component") group versions of a single-domain code base. NEMO as a family encompasses NEMO v3.6, v4.0, and so on. This lineage matters for understanding which model generations share numerical methods and parameterisations, and for tracking how a component evolved independently of whatever coupled model it was embedded in.
ESM families (family_type: "model") group configurations of a coupled Earth System Model across generations. HadGEM3 as a family encompasses HadGEM3-GC31-LL, HadGEM3-GC31-MM, and future successors. This lineage matters for genealogy analysis: understanding which models are structurally related and how the community's ensemble of models is distributed across independent development lines.
Keeping them in one folder with a type field, rather than two separate folders, reflects the fact that the distinction is about how the family is used, not about what a family record contains.
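A minimal sketch of what this means for anyone reading the folder: both kinds of family can be loaded as one collection and separated with a single filter. The record contents and the id field are hypothetical; only the family_type field, its two values, and the example family names are taken from the text above.

```python
# Hypothetical, heavily simplified family records. Real records also carry
# institution, lineage, scientific domains, and references.
families = [
    {"id": "nemo", "family_type": "component"},
    {"id": "arpege-climat", "family_type": "component"},
    {"id": "surfex", "family_type": "component"},
    {"id": "hadgem3", "family_type": "model"},
    {"id": "cnrm-cm", "family_type": "model"},
    {"id": "cesm", "family_type": "model"},
]


def families_of_type(records, family_type):
    """Return the IDs of all family records with the given family_type."""
    return [r["id"] for r in records if r["family_type"] == family_type]


component_families = families_of_type(families, "component")
esm_families = families_of_type(families, "model")
print(component_families)
print(esm_families)
```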
Components are split into two¶
Model components define specific versioned instances of a piece of software: ARPEGE-Climat version 6.3, NEMO v3.6. They are separated from the family because different versions of the same component have different scientific properties and different citable references, and from the grid configuration because the same version can be deployed at different resolutions.
Component configurations are the binding: this component version, on this horizontal grid, on this vertical grid. This is the key reuse point. If two models both run NEMO v3.6 on the same 1° tripolar grid at 75 levels, they share the same component configuration record. That shared record is what makes the comparison between the two models computable rather than inferential.
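To make "computable rather than inferential" concrete, the sketch below finds the component configurations two models have in common with a plain set intersection. The model names and configuration IDs are invented for illustration and do not refer to registered EMD records.

```python
# Hypothetical models, each listed with the IDs of its component configurations.
models = {
    "model-a": {"ocean_nemo-v3-6_h101_v101", "atmosphere_atm-a-v1_h102_v102"},
    "model-b": {"ocean_nemo-v3-6_h101_v101", "atmosphere_atm-b-v2_h103_v103"},
}


def shared_configurations(first, second):
    """Return the configuration IDs that both models reference, sorted."""
    return sorted(models[first] & models[second])


shared = shared_configurations("model-a", "model-b")
print(shared)
```

No parsing of prose or comparison of grid metadata is needed: if the IDs match, the two models use the same component version on the same grids.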
Source / Model¶
The model (source_id) record does not describe any individual component or grid. It describes a system — which component configurations are present, which realms are active or prescribed, and how the components exchange information with one another. The coupling and embedding topology lives here because it is a property of the assembled system, not of any individual part.
The model record is deliberately thin. It holds IDs of component configurations and an ESM family reference, not copies of their content. If an ocean grid specification needs to be corrected, the correction happens in one grid cell record and is immediately reflected in every model that references it. The idea is that, once registered, the model ID can be used directly within a project. If another project wants to use a similar (but not identical) configuration, a new source ID is required.
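The effect of holding IDs rather than copies can be sketched as follows. All record names, fields, and values are hypothetical, and the chain from model to grid cells is shortened for brevity (in the EMD it passes through computational grids and subgrids as well).

```python
# Hypothetical, simplified records keyed by ID.
grid_cells = {"g101": {"resolution_deg": 1.0}}
component_configs = {"ocean-config": {"grid_cells": "g101"}}
models = {
    "model-a": {"component_configs": ["ocean-config"]},
    "model-b": {"component_configs": ["ocean-config"]},
}


def resolutions(model_id):
    """Follow the IDs held by a model record down to the grid cell records."""
    return [
        grid_cells[component_configs[config_id]["grid_cells"]]["resolution_deg"]
        for config_id in models[model_id]["component_configs"]
    ]


before = {model_id: resolutions(model_id) for model_id in models}

# Correct the single shared grid cell record; no model record is edited.
grid_cells["g101"]["resolution_deg"] = 0.9

after = {model_id: resolutions(model_id) for model_id in models}
print(before)
print(after)
```

Both models see the corrected value because neither stored its own copy of the grid description.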
The design in one sentence¶
Each folder corresponds to a distinct scientific concept that varies independently of the others, so that any one aspect of a model configuration can be linked, shared, or modified (corrected) without causing breaking errors or duplication.
Who does this affect?¶
- Data users
  The EMD makes a model's configuration machine-readable and linkable, rather than buried in prose documentation or spread across incompatible metadata fields. It enables programmatic filtering: find all models with a native ocean resolution finer than 0.5°, or all models where aerosols are interactive rather than prescribed.
- Scientists looking at model intercomparison
  It makes structural similarity computable. Two models with identical coupling topologies can be identified automatically, not by manually reading papers.
- Software and infrastructure
  The grid records are shared reference objects. Regridding tools, data pipelines, and visualisation systems can look up the exact grid specification rather than inferring it from the data itself.
- Reproducibility
  A model record pinned to specific component versions and grid IDs is an unambiguous description of a configuration, enough to reconstruct what was run.
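The kind of programmatic filtering described for data users might look like the sketch below. The flattened records, field names, and values are hypothetical; in practice these properties would be resolved by following references from the model record rather than read from a single flat dictionary.

```python
# Hypothetical, flattened model summaries for illustration only.
models = [
    {"id": "model-a", "ocean_resolution_deg": 1.0, "aerosol": "prescribed"},
    {"id": "model-b", "ocean_resolution_deg": 0.25, "aerosol": "interactive"},
    {"id": "model-c", "ocean_resolution_deg": 0.4, "aerosol": "prescribed"},
]

# Models with a native ocean resolution finer than 0.5 degrees.
fine_ocean = [m["id"] for m in models if m["ocean_resolution_deg"] < 0.5]

# Models where aerosols are interactive rather than prescribed.
interactive_aerosol = [m["id"] for m in models if m["aerosol"] == "interactive"]

print(fine_ocean)
print(interactive_aerosol)
```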
Become a reviewer¶
If you know even a little about any part of the EMD and are keen to help keep CMIP clear, accessible, and usable, we need your help. Each review should take about 5 minutes, and we do not expect more than 30 minutes a week, with the workload hopefully easing after the first batch of publications. To apply, use this link or the embedded form below.
References¶
- EMD Specification v1.1 — the complete specification with property definitions, controlled vocabularies, and worked examples
- CMIP7 Grid Guidance v1.0 — companion guidance on grid description
- GitHub repository — source data, issue tracker, and contribution workflow