Skip to content

CMIP7 Guidance for Data Users

This page is designed to inform users of climate model outputs on key CMIP7 concepts and tools. It is a landing page to redirect them to the proper resources to learn more.

1. Accessing CMIP7 data

CMIP7 model output is available through a distributed data archive developed and operated by the Earth System Grid Federation (ESGF). The data are hosted on a collection of nodes located at centres across the world.

Understanding ESGF Nodes

ESGF is a collaboration of groups, agencies and institutions around the world, that are dedicated to the development and operation of a long-term system for the management, access and analysis of climate data. The ESGF architecture is based on a system of autonomous and distributed Nodes. Data is hosted on a collection of nodes located at modelling centres or data centres across the world. Nodes exchange information about their data holdings and services, trust each other for registering users and establishing access control decisions. The net result is that a user can use a web browser or rich desktop client, connect to any Node, and seamlessly find and access data throughout the federation.

More documentation on CMIP nodes is available here.

ESGF data usage and publication metrics can be found on the CMCC dashboard.

There are 3 options to access the data:

  1. MetaGrid (LLNL, DKRZ, ORNL, CEDA)

    Web interface to search and download ESGF data. It provides access through http downloads, wget scripts, OPeNDAP URLs and Globus transfers. It is most useful for browsing and downloading a small number of files. The data can be accessed through any of the CMIP7 web interfaces linked above, which enable users to search across the entire distributed archive as if it were all centrally located.

  2. Using a python package

    For larger queries, it might be more appropriate to automate the search and downloads. A few packages are available to do this:

  3. Alternative Access Platforms

    While all published CMIP7 data is available from ESGF, some of it is additionally hosted in non-ESGF storage facilities. Below are links to some of these replicas. If you know of another place CMIP data is currently being stored, please submit this form to let us and the community know!

    • COMING SOON

    For all non-ESGF data access routes, we encourage users to verify that the data used is the latest version.

2. Terms of use and citations requirements

To enable modelling groups and others who support CMIP7 to demonstrate its impact (and secure ongoing funding), you are required to cite and acknowledge those who have made CMIP7 possible. Some best practices are also recommended.

Mandatory

  1. Cite the specific dataset(s) used.

    For each model whose data is used, please include a citation in the form of:

    Authors/Data Creators (publication year): Title. Version YYYYMMDD. Earth System Grid Federation. DOI.
    

    e.g.

    Swart et al. (2019): CCCma CanESM5 model output prepared for CMIP6 ScenarioMIP. Version 20190429. Earth System Grid Federation. https://doi.org/10.22033/ESGF/CMIP6.1317.
    

    Please include a table with at minimum the models ("sources"), institutions, and data citations as above, as well as a data availability statement pointing to the table and acknowledging ESGF. If the journal has a citation limit, putting the table in the Supporting Information is acceptable.

    How to find the DOI and the version?

    🤖 You can use the python library CMIPcite. Input tracking_id(s), dataset PID(s) or file paths(s) to retrieve the citation (textually or in the bibtex format).

    🖱️ You can also do the work on CMIPcite by hand: Take the tracking_id global attribute of a file and append it to http://hdl.handle.net/ (e.g., http://hdl.handle.net/hdl:21.14100/be06a059-363d-47a4-97a2-d5253190fd15). From there, you can follow "The file is part of the following aggregation(s)" and find the DOI and version of the dataset.

    🔍 You can also search databases. The Citation Search can help you find the DOI (but not the version). MetaGrid results indicate the version and have a citation tab.

    Note that there are two citation granularities (MIP-model-experiment data and MIP-model data) and that the version has to be added separately as it is not included in the DOI.

    Further information on the data citation concept is described in Stockhause and Lautenschlager (2017).

    Do I cite the model based on activity or experiment?

    For each participating model, data citations exist at two high-level aggregations: activity and experiment. The activity-level citation refers to all experiments defined by that activity (i.e., MIP) for a given model. While data citation is mandatory, data users can choose which of these aggregation levels should be cited. It is recommended that:

    • Wherever possible cite the experiments used (e.g., historical), using the experiment-level data citations.
    • If this isn't feasible (e.g. limitations of the number of data citations imposed by a journal), use the activity-level data citation.

    Experiment-specific data citations are useful for tracking the impact of specific CMIP7 experiments, which is valuable information for modelling centres and planning of future CMIP phases. However it is recognized that their finer granularity could result in an unmanageable number of references for data users, journals may limit the number of references, and a balance between the number of data and literature citations is desirable in scholarly publications.

  2. Acknowledge CMIP7.

    In the Acknowledgment section, insert the following text:

    We acknowledge the World Climate Research Programme's Coupled Model Intercomparison Project contributors who coordinated and promoted CMIP7. We thank the climate modelling groups for producing and making available their model output, the Earth System Grid Federation (ESGF) for archiving the data and providing access, and the multiple funding agencies who support CMIP7 and ESGF.

  3. Adhere to the license

    Adhere to the license conditions listed in the global attribute of each dataset.

  1. Cite the CMIP literature

    To provide more context, we recommend citing relevant articles. For example:

    • Overview of the CMIP7 experiment design: Dunne et al. (2025)
    • CMIP standard and infrastructure: Durack et al. (2025)
    • Papers describing MIPs associated with experiment(s) used
    • Model documentation papers associated the model(s) used

In general, the CMIP7 GMD special issue is a good place to look for relevant literature.

  1. Register your work.

    Register your work on the CMIP7 Publication Hub (coming soon).

  2. Use the standard vocabularies

    Where possible, we recommend using the CMIP7 standardized names as defined by the controlled vocabularies (CVs) (see Section 3) for terms such as the source (model) or experiment, in order to make references as clear and unambiguous as possible. However, if your audience requires different terms, then you should use those but we recommend keeping a mapping from the term your audience uses to the standardized name, again to ensure that references can be unambiguously resolved where needed. Refer to the collection of CMIP7 models as the “CMIP7 multi-model ensemble”.

Disclaimer

The data disclaimer for CMIP is available here.

3. CMIP7 facets and their documentation

CMIP7 datasets can be identified through a series of facets that represents key attributes of the data. The main facets are

  • activity
  • institution
  • source
  • experiment
  • variant
  • realm
  • frequency
  • variable
  • grid
  • version

Tip

Current advice from the CVs task team is to only access the CVs via ESGVOC. This will be subject to change in the future.

More information about the meaning of these facets is provided in the global attributes documentation, with further guidance provided on the Global Attributes page. The values associated with each facet are standardized through the CVs. They are used to search the ESGF database and can be found in the global attributes of the data. This section provides helpful links and gives a bit more information on a few key facets.

3.1. Source and Variant

The Essential Model Documentation (EMD) contains a high-level description intended to contain information on model formulation that can be easily compared between different models. EMD pages contain links to more in-depth model documentation for each source.

Basic Concepts to Understand Variants

The source facet gives the name of the model and the variant facet represents each member of an ensemble for a given source. It can also be called the “ripf” identifier (“r” for realization, “i” for initialization, “p” for physics, and “f” for forcing).

A useful tool to evaluate the models is the Rapid Evaluation Framework (REF). It is an evaluation of the models participating in CMIP6 and the CMIP7 Assessment Fast Track (AFT).

3.2. Experiment and Activity

The CMIP7 protocol and experiments are described in a special issue of Geoscientific Model Development with an overview of the design and scientific strategy provided in the lead article of that issue by Dunne et al. (2025).

Basic Concepts to Understand Experiments

Each model participating in CMIP7 will contribute results from the eight DECK experiments. These experiments are the only ones directly overseen by the CMIP Panel, and together these constitute the ongoing (slowly evolving) “CMIP” activity. In addition to the DECK, each modeling group may choose to contribute to any of the CMIP7 Community MIPs. The CMIP panel identifies key experiments to be prioritized on different timelines through fast tracks. The first one is the AFT, which includes a set of Community MIP experiments chosen by the CMIP panel to address specific needs.

MORE COMING SOON

3.3. Variable

The variables produced in CMIP7 were recommended by the CMIP7 Data Request task team. In CMIP7, the concept of branded variable identifies the variables. It follows the template:

<variableRootDD>_<temporalLabelDD>-<verticalLabelDD>-<horizontalLabelDD>-<areaLabelDD>

3.4 Frequency

Models report data on a variety of time steps. The MIP table defines the frequency with which requested variables in an experiment should be reported.

Calendars and Time Handling

Climate models often use simplified or idealized calendars for numerical and computational reasons.
CMIP7 data include a calendar attribute associated with the time coordinate, which determines how dates are represented.
Before working with any CMIP dataset, users should check the calendar type and handle it appropriately.

Common calendars found in CMIP data include:

  • gregorian (or standard) — follows the real-world Gregorian calendar including leap years.
  • noleap — identical to Gregorian but without leap days (365 days every year).
  • 360_day — each year has 12 months of 30 days (total 360 days).
  • proleptic_gregorian — a continuous Gregorian calendar extended backward in time.
  • all_leap — every year has 366 days (all years include a leap day).

These calendars are stored in the calendar attribute of the time variable, for example:

ncdump -h tas_day_ACCESS-ESM1-5_ssp585.nc | grep calendar

Further reading in CF conventions for time coordinate: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.12/cf-conventions.html#time-coordinate

It is recommended that users use the cftime library to handle time.

3.5 Grid

Different climate models use a variety of different horizontal grids that are documented in the grid registry (coming soon).

Different MIPs also have different requirements for vertical grid reporting. Output can be defined either on the native model levels, or it can be remapped to pressure levels.

Masked Averaging

Many variables in CMIP7 are defined as masked means, defined as the mean of a quantity over a portion of the grid cell defined by an area type. For more information on this, see this webpage (coming soon).

4. CMIP7 data format

As in previous phases, all CMIP7 output has been written to netCDF files. Before being published, these files must pass the ESGF Quality Control (ESGF-QC). Many modelling centres use the CMOR software to standardize their files. They are then said to have been “CMORized”.

Essential features of CMORized data are :

5. Reporting suspected errors

Warning

The CMIP7 archive contains the output of scientific simulations of the past and potential future that are subject to multiple sources of error, ranging from errors in data handling, to errors in the representation of the real world in either the model, or the experimental setup for which the model was used. Different parts of the CMIP7 archive may be subject to differing levels of such errors, and users should be alert to these issues, and their potential consequences.

Information about discovered issues of CMIP7 data is captured by the Errata Service.

Any CMIP data user can report an error by submitting an issue through the Propose button on the Errata Service website. Proposing erratum through the webform requires a contact email address. Once the webform is validated and created, a special link is created and can be shared but the issue won’t appear on the index page. A moderator (from the relevant modelling centre providing data) will validate, update or reject the entry. When data errors are discovered, data providers are expected to retract the affected datasets from ESGF and, if possible, republish corrected data using updated dataset version identifiers. If no moderation action is taken after the 14-day validation period, the issue will be publicly indexed, albeit with a special flag.

6. New to CMIP?

First time using CMIP? Need a bit more help ? Check out the Entry-Level Documentation (coming soon), put together by the Fresh Eyes on CMIP group.

You have a more specific question ? Ask it on the Fresh Eyes Platform. (You need to register here first.)

Document version: 2025-10-08