CMIP7 Participation for Data Managers
1. Installation and configuration
All information on the Earth System Grid Federation (ESGF) can be found here. [Note: this needs updating?]
[Note: somewhere in the documentation below we need (a) information about QC procedures for Data Managers and (b) a graphic of the ESGF NG setup (Forrest/Phil have sent examples to IPO).]
1.1 ESGF Software
Description
The ESGF Data Node software stack enables sites hosting Earth system data to make it available to the community over several transfer protocols, including HTTP(S). ~~Index nodes enable search for hosted data via data publishing to the index, and these nodes include a search API and web frontend.~~ [Note: not sure this is relevant if indexing is deprecated?] Identity nodes manage user accounts. Nodes run as Docker containers and can be deployed via Ansible playbooks or via Helm charts in a Kubernetes environment.
New and existing installations
For new or existing ESGF node installations, first read the following document on ESGF policies, as this will influence the type of installation you need to deploy. [Note: needs proper link and updating.]
1.2 How to install
Requirements, setup and usage documentation
Software Stack: The ESGF software stack requires a Red Hat Enterprise Linux, Rocky Linux or AlmaLinux distribution. Administrators must have full sudo (root) privileges on the host, or access to a Kubernetes cluster. The services are meant to run on webserver-grade hardware. [Note: need a practical example here with cost estimate.] For data-sharing nodes, the storage holding your data must be mounted on the node.
ESGF Docker: Instructions and links to any issues can be found here.
Ansible: Legacy documentation is available here. [Note: is this still valid?]
Metagrid user interface: To install the Metagrid UI, which end users use to search for and download data, read the documentation here and see the GitHub repo here. [Note: don't know if these links need updating?]
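The operating system and storage requirements listed under Software Stack can be checked before starting an installation. The following is a minimal sketch, not part of the ESGF tooling: the function names are hypothetical, and the set of `ID` values is an assumption about what `/etc/os-release` reports on the distributions named above.

```python
import os

# Assumption: ID values used in /etc/os-release by the distributions named
# above (Red Hat Enterprise Linux, Rocky Linux, AlmaLinux).
SUPPORTED_IDS = {"rhel", "rocky", "almalinux"}


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key.strip()] = value.strip().strip("\"'")
    return info


def is_supported_distribution(text):
    """Return True if the os-release text names a supported distribution."""
    return parse_os_release(text).get("ID", "").lower() in SUPPORTED_IDS


def data_mount_readable(path):
    """Return True if the data directory exists and can be listed."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)
```

On a candidate host, pass the contents of `/etc/os-release` to `is_supported_distribution` and the path of your data mount to `data_mount_readable`; both should return `True` before you proceed.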
2. Dataset publication
Requirements
Publishers to ESGF must have an existing Data Node installed at their site. Although the publisher software (from v5.x onwards) does not need to run on the Data Node itself, it does require a data mount so that the software can access the data files.
2.1 Dataset preparation
The ESGF publication process requires robust and effective data management, which can be a burden for data managers. The ESGF esgprep toolbox is standalone software that helps data providers and data node managers prepare their data for publication to an ESGF node according to ESGF best practices. It can be used to fetch the required configuration files, apply the Data Reference Syntax (DRS) on local filesystems, and/or generate mapfiles for ESGF publication.
Full details of esgprep and instructions for use, provided by the team at Institut Pierre-Simon Laplace (IPSL), can be found here.
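To make the idea of applying the Data Reference Syntax concrete, the sketch below builds a directory path from a set of dataset facets. It does not use esgprep and does not reproduce the CMIP7 DRS: the facet names, their order and the function name are placeholders chosen for illustration, and the authoritative templates are the ones in the configuration files that esgprep fetches.

```python
from pathlib import PurePosixPath

# Hypothetical facet order, for illustration only. The real order and names
# are defined by the project configuration, not by this sketch.
DRS_FACETS = ("project", "institution", "source", "experiment", "variable", "version")


def drs_directory(root, facets):
    """Build a directory path under root, with one level per facet."""
    missing = [name for name in DRS_FACETS if not facets.get(name)]
    if missing:
        raise ValueError("missing DRS facets: " + ", ".join(missing))
    return PurePosixPath(root).joinpath(*(facets[name] for name in DRS_FACETS))
```

Raising an error on a missing facet, rather than silently skipping it, reflects the purpose of a reference syntax: every dataset must resolve to exactly one location.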
2.2 Publisher introduction
The esg-publisher (or esgcet) Python package contains a collection of command-line utilities to scan, manipulate and push dataset metadata to an ESGF index node. The basic publication process takes several steps, some of which are optional. Publisher functionality is available via several submodules/classes in the package. Please refer to the user documentation and the GitHub issues page.
2.3 ESG-Publisher software installation
Requirements
- A Python environment, created using venv, conda, miniforge/mamba, etc.
- A mountpoint mapping to the data on the same host as the publisher software installation, so that the publisher scan utility (e.g. autocurator) has access.
- Basic dataset information provided via the esg mapfile format, for example using the esgf-prepare/esgmapfile utility.
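The second and third requirements can be checked together by reading a mapfile and confirming that every file it lists is readable from the publisher host. The sketch below is not part of esgcet or esgf-prepare. It assumes the pipe-separated layout `dataset_id#version | path | size | key=value ...`; verify this against the mapfiles your tools actually produce. The function names are hypothetical.

```python
import os


def parse_mapfile_line(line):
    """Split one mapfile line into a dict (assumed pipe-separated layout)."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 '|'-separated fields: {line!r}")
    dataset, path, size = fields[:3]
    dataset_id, _, version = dataset.partition("#")
    entry = {
        "dataset_id": dataset_id,
        "version": version or None,
        "path": path,
        "size": int(size),
    }
    # Any remaining fields are treated as optional key=value attributes.
    for field in fields[3:]:
        key, sep, value = field.partition("=")
        if sep:
            entry[key.strip()] = value.strip()
    return entry


def unreadable_paths(lines):
    """Return the mapfile paths that cannot be read from this host."""
    entries = (parse_mapfile_line(line) for line in lines if line.strip())
    return [e["path"] for e in entries if not os.access(e["path"], os.R_OK)]
```

An empty list from `unreadable_paths` suggests that the data mount is visible to the publisher host; any path it returns points at a mount or permission problem to resolve before scanning.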
2.4 Dataset publication
Full details of the dataset publication process, including installing esgcet with pip install, can be found here.
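After installation, it can be useful to confirm which version of the publisher is present in the active Python environment, since the requirements in this guide refer to v5.x onwards. The sketch below uses only the Python standard library. The helper name is hypothetical, and it assumes the distribution is registered under the name `esgcet`, as the pip install step implies.

```python
from importlib import metadata


def installed_version(distribution="esgcet"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None
```

A return value of `None` means the package is not installed in the environment that is currently active, which is a common cause of confusion when several venv or conda environments exist on the same host.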
3. Dataset retraction
3.1 Retraction process
The esgunpublish command retracts or, if specified, deletes the specified dataset(s). The output of this command is either a success or a failure message, accompanied by the id of the dataset that was retracted. Exercise caution when deleting datasets: if replicas have been made, or if you will be republishing, you should retract rather than delete outright. Follow the instructions here and, for an example, check out the Jupyter notebook.
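The rule of thumb above can be written down as a small guard to run in a site's own scripts before anything is deleted. This sketch does not call esgunpublish and makes no assumptions about its options; the function name and arguments are hypothetical.

```python
def deletion_is_safe(has_replicas, will_republish):
    """Return True only when neither condition that calls for retraction holds."""
    return not (has_replicas or will_republish)
```

When the function returns `False`, retract the dataset instead of deleting it.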