
Table 1 Assumptions and fundamental principles in building, maintaining, and sharing integrated macrosystems ecology databases

From: Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse

• The database should include a ‘census’ population, in which all possible ‘ecosystems’ or ‘sites’ are geographically represented, in addition to the sites with in-situ data.

• The database will be fully documented, including descriptions of: the original data providers or sources, database design, all data processing steps and code for all data, possible errors or limitations of the data for the integrated dataset and individual datasets, and methods and code for geospatial data processing.

• To the greatest degree possible, existing community data standards will be used to facilitate integration with other efforts.

• To the greatest degree possible, the provenance of the original data will be preserved through to the final data product.

• The database will include a versioning system to track different versions of the database for future users and to facilitate reproducibility.

• The database will be made publicly accessible in an online data repository with a permanent identifier using non-proprietary data formats at the end of the project or after a suitable embargo if necessary.

• A data paper will be written with the original data providers as co-authors to ensure recognition of data providers.

• A data-methods paper will be written with the data-integration team as co-authors to ensure recognition of data integrators.

• Once the database is openly accessible in a data repository, whether it is static (no further data are added) or ongoing (data continue to be added), a set of community policies governs how other scientists use and cite the database, the original data providers, and the database integrators.
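
The provenance and versioning principles above can be illustrated with a minimal sketch. It is not the authors' implementation: the class names (`SourceRecord`, `DatabaseRelease`), the field names, and the example values are all hypothetical, chosen only to show how each release of an integrated database might carry its sources, processing steps, and a reproducible checksum.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class SourceRecord:
    """Provenance for one contributed dataset (hypothetical structure)."""
    provider: str                      # original data provider to credit
    source_file: str                   # file as received from the provider
    processing_steps: Tuple[str, ...]  # ordered steps applied during integration


@dataclass
class DatabaseRelease:
    """One tagged version of the integrated database (hypothetical structure)."""
    version: str
    sources: List[SourceRecord] = field(default_factory=list)

    def manifest(self) -> str:
        # Serialize with sorted keys so the same content always
        # produces the same text, and therefore the same checksum.
        payload = {
            "version": self.version,
            "sources": [asdict(s) for s in self.sources],
        }
        return json.dumps(payload, sort_keys=True, indent=2)

    def checksum(self) -> str:
        # A SHA-256 digest of the manifest lets future users confirm
        # which release they are working with.
        return hashlib.sha256(self.manifest().encode("utf-8")).hexdigest()


# Illustrative values only; no real provider or file is implied.
release = DatabaseRelease(
    version="1.0.0",
    sources=[
        SourceRecord(
            provider="Example Agency",
            source_file="lakes_2010.csv",
            processing_steps=("convert units", "standardize variable names"),
        )
    ],
)
print(release.manifest())
print(release.checksum())
```

Storing the manifest alongside each public release would let a user trace any value back to its provider and processing steps, and citing the version string together with the checksum identifies exactly which release an analysis used.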