Every organization tends to collect data in a unique way.
This document defines the protocol that GISIN uses to transfer data between organizations in a standard way.
For information on why the specification includes the features it does, please see the Requirements Specification.
Note: The documentation on web services has been moved to the new Web Service page.
A large number of individuals and organizations helped with the initial idea, content, and reviews of this protocol. They include: Jim Graham and Annie Simpson for coordinating and documenting the effort; Jerry Cooper, Bob Morris, and Michael Browne for original work on the IASPS documentation, which was a starting point for this document; Greg Ruiz and Jim Carlton for their Framework for Vector Science; Pam Fuller, Greg Ruiz, Brian Steves, and Shawn Dalton for their development of NISBase as the groundbreaking work on implementing invasive species data exchange; Michael Browne for facilitating the development of the status categories; Robert Hilliard, Kevin Thiele, and Aaron Wilton for assistance in integrating vocabularies; Roger Hyam, Renato De Giovanni, and Markus Doring for help with implementing TAPIR; Donald Hobern and Hannu Saarenmaa for overall guidance; and Liz Sellers, Rob Emery, Jacob Asiedeau, Greg Newman, Catherine Jarnevich, Silvia Ziller, Andrea Grosse, Olivier de Munck, and the staff of the Invasive Species Specialist Group and the Global Invasive Species Database.
If we have forgotten anyone please let us know!
The protocol has been designed with potential data providers in mind, including a recognition that these
organizations will tend to have simple, flat databases and minimal technical resources to modify their
databases to make them available as a web service. At the same time, the protocol must perform at high
speeds to allow for both a large number of providers and for providers with very large data sets. The protocol is used by both the file upload and web service capabilities of GISIN.
2.1 Data Models
Below are the "Data Models" or types of data that are supported by the protocol.
SpeciesStatus - The status of a species in a particular location at a particular date. This includes data on origin, presence, distribution, abundance, rate of spread, and whether the species is harmful.
SpeciesResourceURLs - URLs to web pages with profile information (also known as descriptions or fact-sheets)
Occurrences - Spatial information on species locations at specified dates. The location can be coordinates or location names.
Profile data - Data to profile specific species (e.g. life history, management, identification)
ImpactStatus - The types of impacts the organism is causing at a specific location
DispersalStatus - The methods of dispersal used by the organism.
ManagementStatus - The management activities engaged for an organism
Pictures - Digital images of species, impacts, and key characteristics (not defined yet)
Note: We broke out ManagementStatus and ImpactStatus from SpeciesStatus when we realized there could be multiple records for ManagementStatus and/or ImpactStatus for a single SpeciesStatus record (i.e. a single species and location). This allows us to keep the Model records flat.
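The flat layout described in this note can be sketched in Python as follows. This is only an illustration: the class names mirror the Models, but the field names (scientific_name, location_code, action) and the action values are assumptions made for the example, not Concept names or vocabulary defined by the protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesStatus:
    # One flat record per species and location (field names are hypothetical).
    scientific_name: str
    location_code: str


@dataclass(frozen=True)
class ManagementStatus:
    # Kept as its own flat record so that several of them can refer to the
    # same species and location without nesting inside SpeciesStatus.
    scientific_name: str
    location_code: str
    action: str


def management_for(status, management_records):
    """Return the management records sharing the status record's species and location."""
    key = (status.scientific_name, status.location_code)
    return [
        m for m in management_records
        if (m.scientific_name, m.location_code) == key
    ]


status = SpeciesStatus("Symphyotrichum lanceolatum", "US")
records = [
    ManagementStatus("Symphyotrichum lanceolatum", "US", "Containment"),
    ManagementStatus("Symphyotrichum lanceolatum", "US", "Eradication"),
    ManagementStatus("Moerckia hibernica", "US", "Containment"),
]
matching = management_for(status, records)
```

Each record stays a single row of scalar values, which is what allows a provider with a simple table or spreadsheet to supply it directly.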
The GISIN protocol was created as an extension to the DarwinCore standard. This standard refers to each of the types of data in a data model as a "concept". Concepts can also be thought of as the labels over the columns in a table or spreadsheet file, or as the fields in a database.
3.1 Common Concepts
The following Concepts are used as fields to identify the Date, Taxonomy, Location, and the Language for text results within the data for all Models.
Should be of the form:
[Authority]/guid/[InstitutionCode]/[CollectionCode]/[CatalogNumber]. The authority should be a recognized GUID authority. The authority will typically provide the InstitutionCode and the CollectionCode. The provider can then determine a CatalogNumber that uniquely identifies each record. If you use GISIN as the authority, please use the InstitutionCode and CollectionCode provided. GISIN GUIDs have the form: gisin.org/guid/[InstitutionCode]/[CollectionCode]/[CatalogNumber].
A globally unique identifier (GUID) for the record that needs to be resolvable and persistent. In other words, only GUIDs provided by a GUID authority should be used, and they must be unique to a particular record. If a record is removed from a dataset, the GUID must not be reused with any other record, ever. Also, if a provider receives data from another provider, they should just use and pass on the original GUID (if provided). However, if the data is modified in some way, the provider should provide a new GUID and cite the original provider in the Citation concept.
The date on which the record last changed, in the format YYYY-MM-DD. Harvesters use this to query only data that has changed. If the provider does not have this information, it should return the current date.
Symphyotrichum lanceolatum ssp. hesperium var. hesperium
ScientificName is the primary means of filtering by taxa. At least a Genus is required. Species, subspecies, and variety may be provided in standard taxonomic notation. Do not include author and date, so that all records for the same taxon can be identified in a single query. Including Kingdom is recommended in all requests with ScientificNames, to resolve the few conflicts where the same genus appears in more than one Kingdom.
Note: If a provider does not support a particular Concept it should simply not return that Concept as an element of the record.
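As a minimal sketch of this rule, a provider building an XML record might skip unsupported Concepts rather than emit empty elements for them. The element name "record" and the use of None to mean "not supported" are assumptions made for this example; the actual response layout is described in the web service documentation.

```python
import xml.etree.ElementTree as ET


def build_record(concepts):
    """Build a <record> element, leaving out Concepts the provider does not support."""
    record = ET.Element("record")  # hypothetical element name
    for name, value in concepts.items():
        if value is None:
            continue  # unsupported Concept: omit the element entirely
        ET.SubElement(record, name).text = value
    return record


record = build_record({
    "ScientificName": "Symphyotrichum lanceolatum",
    "CountryCode": "US",
    "County": None,  # this hypothetical provider holds no county data
})
xml_text = ET.tostring(record, encoding="unicode")
```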
Dates are represented as documented in International Standard ISO 8601. This format is YYYY-MM-DD where YYYY is the decimal year in the Gregorian calendar. See Markus's web page for a quick summary. At least a year is required; month and day are preferred as well.
Modified is the last date that the record was changed. Data consumers use it if they cache data. If the provider does not maintain a DateLastModified in their database, they should always return the current date.
The StartValidDate and EndValidDate represent the range of when a "status" data Model is valid. Data providers should return an empty element for the EndValidDate if the status is still current.
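The date rules above can be sketched in Python. The function names are hypothetical, and two choices here are assumptions rather than protocol requirements: partial dates are written as YYYY or YYYY-MM, and the validity range is treated as inclusive at both ends. The validity check also assumes full YYYY-MM-DD values.

```python
from datetime import date


def parse_partial_date(text):
    """Parse 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD' into (year, month, day).

    Parts that are not supplied come back as None.
    """
    parts = text.split("-")
    if not 1 <= len(parts) <= 3 or len(parts[0]) != 4:
        raise ValueError(f"not a YYYY[-MM[-DD]] date: {text!r}")
    numbers = [int(p) for p in parts]
    year = numbers[0]
    month = numbers[1] if len(numbers) > 1 else None
    day = numbers[2] if len(numbers) > 2 else None
    # Check the calendar date, filling absent parts with 1 for the check only.
    date(year, month or 1, day or 1)
    return year, month, day


def is_status_valid_on(start_valid, end_valid, on_day):
    """True if on_day lies in the valid range; an empty end date means still current."""
    if on_day < date.fromisoformat(start_valid):
        return False
    if not end_valid:  # empty EndValidDate element
        return True
    return on_day <= date.fromisoformat(end_valid)
```

For example, `is_status_valid_on("2005-01-01", "", date(2010, 6, 1))` is true because the empty EndValidDate marks the status as still current.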
ScientificName is the primary means of identifying a taxon. At least a Genus is required. Species, subspecies, variety, author, and date may be provided in standard taxonomic notation. Including Kingdom is recommended in all requests with ScientificNames, to resolve the few conflicts where the same genus appears in more than one Kingdom.
Examples of scientific names include:
Moerckia hibernica var. wilsoniana
Symphyotrichum lanceolatum ssp. hesperium var. hesperium
Note: We need to define how Taxonomic Concepts will be included.
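A rough sketch of how a consumer might split a ScientificName into its parts is shown below. It assumes whitespace-separated names with the rank markers "ssp.", "subsp.", and "var.", and it makes no attempt to interpret author and date; the function and key names are illustrative only.

```python
import re

# Rank markers recognised by this sketch (an assumption, not a protocol list).
RANK_MARKERS = {"ssp.": "subspecies", "subsp.": "subspecies", "var.": "variety"}


def parse_scientific_name(name):
    """Split a name such as 'Genus species ssp. x var. y' into labelled parts."""
    tokens = name.split()
    if not tokens or not re.fullmatch(r"[A-Z][a-z-]+", tokens[0]):
        raise ValueError("at least a capitalized Genus is required")
    parsed = {"genus": tokens[0]}
    rest = tokens[1:]
    if rest and rest[0] not in RANK_MARKERS:
        parsed["species"] = rest[0]
        rest = rest[1:]
    # Remaining tokens are read as marker/epithet pairs, e.g. "var. hesperium";
    # pairs whose first token is not a known marker are ignored.
    for marker, epithet in zip(rest[0::2], rest[1::2]):
        if marker in RANK_MARKERS:
            parsed[RANK_MARKERS[marker]] = epithet
    return parsed


parts = parse_scientific_name(
    "Symphyotrichum lanceolatum ssp. hesperium var. hesperium"
)
```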
Invasive species location data can be in one of three forms: 1) country codes with additional concepts, 2) standard location codes, and/or 3) geographic coordinates.
Readers should recognize that certain providers will have only local names while others will have only geographic coordinates
(of a variety of types). Consumers may request just LocationNames, just Geographic coordinates, or both.
The data provider should provide all the information it has available that meets the requested content.
Location from Country Codes with Additional Concepts
International country codes are defined by ISO 3166
and can be specified with the CountryCode concept.
Once a country code is specified, a state/province should be specified with the StateProvince concept.
If appropriate, a county (which includes cantons) should then be specified with the County concept.
If available, the name of the local area can be specified with the LocalityName concept. A LocalityType should accompany a LocalityName
to ensure it is unique.
If names are not associated with a LanguageCode then they are assumed to be in the default language of the provider.
Lake, stream, river, ocean, sea
City, town, burg, municipality
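The nesting of these location Concepts can be checked with a small helper like the one below. It is a sketch under one interpretation of the text, namely that each level is only meaningful when the level above it is present; the function name and the dictionary representation of a record are assumptions.

```python
def check_location_hierarchy(record):
    """Return a list of problems in the CountryCode > StateProvince > County chain."""
    problems = []
    if record.get("StateProvince") and not record.get("CountryCode"):
        problems.append("StateProvince given without CountryCode")
    if record.get("County") and not record.get("StateProvince"):
        problems.append("County given without StateProvince")
    # A LocalityType should accompany a LocalityName to ensure it is unique.
    if record.get("LocalityName") and not record.get("LocalityType"):
        problems.append("LocalityName given without LocalityType")
    return problems


complete = {
    "CountryCode": "US",
    "StateProvince": "Colorado",
    "County": "Larimer",
}
incomplete = {"County": "Larimer", "LocalityName": "Horsetooth"}
```

Here `check_location_hierarchy(complete)` returns an empty list, while the second record is reported as missing both its StateProvince and its LocalityType.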
Standard Location Codes
Another approach is to identify the location of the record with a known defined standard (e.g. US_HUC, US_FIPS, AR_PostalCode, etc.).
This is done by providing a LocationStandard and LocationCode. Supported LocationStandards are listed below.
There can only be one location code per record.
Maintained for backward compatibility, replaced by ANSI (need more here). See Geographic Names Information System (GNIS) and download the National data set. The State FIPS codes are labeled 'STATE_NUMERIC'. County FIPS codes are a combination of the 'STATE_NUMERIC' and 'COUNTY_NUMERIC' fields.
Please contact a member of the GISIN steering committee to have a new standard added.
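For the US_FIPS standard, combining the two GNIS fields into a county code might look like the sketch below. It assumes the usual layout of a two-digit state code followed by a three-digit county code, both zero-padded; the helper name and the example values are illustrative.

```python
def county_fips(state_numeric, county_numeric):
    """Join STATE_NUMERIC and COUNTY_NUMERIC into a five-digit county FIPS code."""
    return f"{int(state_numeric):02d}{int(county_numeric):03d}"


# A record carries one LocationStandard and one LocationCode.
location = {
    "LocationStandard": "US_FIPS",
    "LocationCode": county_fips(8, 69),
}
```

Keeping the code as a string preserves the leading zero, which would be lost if the value were stored as a number.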
Geographic Coordinates
A precise coordinate can be specified with the DecimalLatitude and DecimalLongitude concepts. These values should be in decimal degrees.
The accuracy of the coordinates should be provided with the HorizontalAccuracy concept and how the coordinates were created should be
provided with the GeographicProtocol concept if this information is available.
Filtering for geographic coordinates is only supported for geographic bounding boxes with the DecimalLatitudeMin, DecimalLatitudeMax,
DecimalLongitudeMin, and DecimalLongitudeMax parameters.
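The bounding-box filter can be illustrated with a short sketch. It assumes records are dictionaries keyed by Concept name, that the bounds are inclusive, and that the box does not cross the 180-degree meridian; none of these details are fixed by the text above.

```python
def in_bounding_box(record, lat_min, lat_max, lon_min, lon_max):
    """True if the record's coordinates fall inside the box (bounds inclusive)."""
    lat = record.get("DecimalLatitude")
    lon = record.get("DecimalLongitude")
    if lat is None or lon is None:
        return False  # records without coordinates cannot match a box filter
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


occurrences = [
    {"DecimalLatitude": 40.5, "DecimalLongitude": -105.1},
    {"DecimalLatitude": 35.0, "DecimalLongitude": -105.1},
    {"CountryCode": "US"},  # a name-only record with no coordinates
]
# The keyword arguments stand in for DecimalLatitudeMin, DecimalLatitudeMax,
# DecimalLongitudeMin, and DecimalLongitudeMax.
selected = [
    r for r in occurrences
    if in_bounding_box(r, lat_min=40.0, lat_max=41.0, lon_min=-106.0, lon_max=-105.0)
]
```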
Languages are specified with ISO 639-2 codes. These are 3-letter codes. In some cases both a bibliographic code and a terminology code are provided; in these cases GISIN uses the terminology code. Some examples are below.
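To illustrate the preference for terminology codes, the sketch below normalises a few language codes. The mapping lists only a handful of the languages for which ISO 639-2 defines both code types, and the function name is hypothetical.

```python
# ISO 639-2 bibliographic codes mapped to their terminology codes.
# Only a few of the languages that have both code types are listed here.
BIBLIOGRAPHIC_TO_TERMINOLOGY = {
    "ger": "deu",  # German
    "fre": "fra",  # French
    "dut": "nld",  # Dutch
    "chi": "zho",  # Chinese
}


def to_terminology_code(code):
    """Return the terminology form of a 3-letter ISO 639-2 code."""
    code = code.strip().lower()
    if len(code) != 3 or not code.isalpha():
        raise ValueError(f"not a 3-letter language code: {code!r}")
    # Codes without a separate bibliographic form (e.g. 'eng', 'spa') pass through.
    return BIBLIOGRAPHIC_TO_TERMINOLOGY.get(code, code)
```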
3.1.5 Globally Unique Identifiers
Globally Unique Identifiers, or GUIDs, are a mechanism to uniquely identify data that is moved around the Internet. GUIDs are extremely important both to make sure we are not duplicating records and to make sure that the originators of data are identifiable. GUIDs are also a means to allow corrections to data to be updated and to have old data removed from a system.
GISIN recommends GUIDs be attached to each record in the original source of the data. When this is not possible, providers should add a GUID and cite the original source in the Citation concept. If a provider changes the contents of a record that they do not own, they should create a new GUID and cite the original source in the Citation concept. Since GUIDs are used to identify a unique record, they should never be reused with a different record.
GUIDs must also be traceable or resolvable. Any GUID must be able to be used to determine the original source of a record, typically by entering it into a web browser as a URL (in the case of GISIN GUIDs, with the addition of http://www.). GUIDs must also last for a very long time (indefinitely). This means that a "GUID Authority" must be used to provide the GUID. An authority is defined as an organization that has made a long-term commitment to providing a resolving functionality for GUIDs.
There are a variety of formats for GUIDs. GISIN uses a GUID standard that is easy to read and resolve. The format appears as:
[Authority]/guid/[InstitutionCode]/[CollectionCode]/[CatalogNumber]
Where the values in brackets would be replaced with:
Authority - The authority that provides the resolving service for the GUID
InstitutionCode - Name of the institution providing the data (e.g. a specific university, county, national park, etc.)
CollectionCode - Uniquely identifies the collection within the institution
CatalogNumber - A unique string (can be letters and numbers) within the collection
GISIN has volunteered to be a GUID authority, so if you as a data provider do not have another authority to use, you can contact GISIN to obtain an InstitutionCode and CollectionCode(s) and then determine your own unique CatalogNumbers. The format for GUIDs from GISIN is:
gisin.org/guid/[InstitutionCode]/[CollectionCode]/[CatalogNumber]
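Building and taking apart GUIDs of the form [Authority]/guid/[InstitutionCode]/[CollectionCode]/[CatalogNumber] can be sketched as follows. The function names, the codes "EXAMPLE" and "PLANTS", and the rule that no part may contain a slash are assumptions made for the example; real codes are issued by the authority.

```python
GISIN_AUTHORITY = "gisin.org"


def make_guid(institution_code, collection_code, catalog_number,
              authority=GISIN_AUTHORITY):
    """Assemble a GUID string from its four variable parts."""
    parts = [authority, "guid", institution_code, collection_code,
             str(catalog_number)]
    # Assumption: parts are non-empty and contain no '/', so parsing stays unambiguous.
    if any(not p or "/" in p for p in parts):
        raise ValueError("GUID parts must be non-empty and must not contain '/'")
    return "/".join(parts)


def parse_guid(guid):
    """Split a GUID back into its labelled parts."""
    parts = guid.split("/")
    if len(parts) != 5 or parts[1] != "guid":
        raise ValueError(f"unexpected GUID layout: {guid!r}")
    authority, _, institution, collection, catalog = parts
    return {
        "Authority": authority,
        "InstitutionCode": institution,
        "CollectionCode": collection,
        "CatalogNumber": catalog,
    }


def resolver_url(guid):
    """URL form of a GISIN GUID, made by adding 'http://www.' in front."""
    return "http://www." + guid


guid = make_guid("EXAMPLE", "PLANTS", 42)  # hypothetical codes
```

Because the CatalogNumber is chosen by the provider, a stable database key is a natural candidate, provided it is never reassigned to a different record.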
Whether the range of the organism is increasing or decreasing
Common in the referenced location. Moderate abundance
Numerically dominant in the referenced location. Depending on the nature of the referenced location, this information could be at the individual, population, community, ecosystem or landscape scale.
Exists at a high level of abundance which has resulted in virtually no other species being present in the referenced location.
Numerically rare in the referenced location.
Zero abundance means absent. It's ok to have parameters overlap (but not the values within a parameter).
Occurs in only a few parts of the referenced location.
Occurs in some but not all parts of the referenced location.
Occurs in most of the referenced location.
Benign. Not harmful
Has been known to be harmful elsewhere or displays tendencies which could become harmful.
Any kind of harm has been identified. Could be environmental, social/economic or harmful to human or animal health.
(Alien, foreign, exotic, introduced, non-native) Means a species, subspecies, or lower taxon occurring outside of its natural range (past or present) and dispersal potential (i.e. outside the range it occupies naturally or could not occupy without direct or indirect introduction or care by humans) and includes any part, gametes or propagule of such species that might survive and subsequently reproduce (IUCN 2000). Includes aboriginal introductions and archeozoa/archeophytes (early introductions of organisms into Europe by humans that are strongly integrated in European ecosystems)
Naturally distributed within the region of interest, with a long-term presence extending into the pre-historic record
Includes 'Cryptogenic' i.e. a species which is neither demonstrably native nor introduced in a region because its origin/native range is unknown, and 'Uncertain' i.e. the native range is known but this occurrence lies somewhere between its known native and nonindigenous ranges.
Not surviving. Presumed extinct at the referenced location.
Surviving and reproducing in perpetuity. Synonymous with 'naturalized'.
Surviving and reproducing for a limited period
Surviving but not reproducing (e.g., remnant species from old gardens).
Known to be absent. This may be due to Eradication, Interception at border, Presumed extinct (see Persistence) or Recorded in error. These 'Reasons for absence' may become part of an expanded schema
Known or reported to be present at the date of publication or last update of records. Please note that for aggregated data (e.g. a collection of historic reports), the date of publication or last update of records provides no indication of when the organism was present.
Species has been reported, such as in the literature, to be present in an area
Vagrant, migratory or an otherwise 'casual' presence
Organism that has not yet been through a process to determine if it should be restricted or prohibited.
Organism with some regulatory restriction or control
Biomass is decreasing
Biomass is increasing
Zero, or close to zero, trend
3.3.1 SpeciesResourceURL Concepts
SpeciesResourceURLs are required to support the following general Concepts:
Taxon: At least a scientific name
SpeciesResourceURLs support the following additional Concepts as fields:
A decimal representation of the precision of the coordinates given in the decimalLatitude and decimalLongitude. Examples: '0.00001' (normal GPS limit for decimal degrees), '0.000278' (nearest second), '0.01667' (nearest minute), '1.0' (nearest degree). For discussion see http://code.google.com/p/darwincore/wiki/Location
First date when the occurrence data for the record was being collected in the format YYYY-MM-DD
High Accuracy Reference Network
World Geodetic System from 1984
Coordinate derived from searching for a place in a database that matches place names with coordinates such as the Geographic Names Information System.
Examples include centroid calculation for a state/province or county
Coordinate from a Global Positioning System unit
Derived from a physical map
The preferred datums are World Geodetic System 1984 (WGS84) or High Accuracy Reference Network (HARN) but providers may have data in datums that are not global and may not have the facilities to convert them to a global datum. Ignoring the datum can cause errors of thousands of meters! We highly encourage providers to provide data in WGS84 or HARN and consumers should always check the datum or else filter to choose just the datums they accept.
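A consumer-side datum check might be sketched as below. The Concept name "GeodeticDatum" is an assumption borrowed from Darwin Core usage, and the datum labels are illustrative; substitute whatever name and values the provider actually returns.

```python
ACCEPTED_DATUMS = {"WGS84", "HARN"}


def split_by_datum(records, accepted=ACCEPTED_DATUMS):
    """Separate records with an accepted datum from those that need review."""
    usable, rejected = [], []
    for record in records:
        # "GeodeticDatum" is a hypothetical Concept name for this sketch.
        datum = (record.get("GeodeticDatum") or "").upper()
        (usable if datum in accepted else rejected).append(record)
    return usable, rejected


occurrences = [
    {"DecimalLatitude": 40.5, "DecimalLongitude": -105.1, "GeodeticDatum": "WGS84"},
    {"DecimalLatitude": 40.6, "DecimalLongitude": -105.2, "GeodeticDatum": "NAD27"},
    {"DecimalLatitude": 40.7, "DecimalLongitude": -105.3},  # datum not stated
]
usable, rejected = split_by_datum(occurrences)
```

Records with a missing datum are set aside rather than assumed to be WGS84, since the datum cannot be inferred from the coordinates alone.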
3.5.1 ImpactStatus Concepts
ImpactStatus represents the type of impact a species is having on a habitat. Multiple ImpactStatuses should be provided for species that impact multiple habitats (e.g. marine and terrestrial).
ImpactStatuses are required to support the following general Concepts:
Taxon: At least a scientific name
Location: At least one code or name; coordinates do not apply to ImpactStatus
ImpactStatuses support the following additional Concepts:
A habitat with a mix of fresh and salt water, typically estuaries but also includes salt marshes and salt lakes
A primarily freshwater habitat including streams, rivers, and freshwater lakes
A primarily marine habitat including oceans, seas, and bays
An impact exhibited primarily on land.
The impact the organism is having on the target specified is negative.
The impact the organism is having on the target specified is neutral.
The impact the organism is having on the target specified is positive.
Includes urban environments as well as forestry, agriculture, horticulture, second growth
Few environments are pristine. Most conservation efforts are focused on natural or semi-natural environments.
Includes livelihood, cultural, medicinal, amenity and social activities
Natural or semi-natural environments and/or the species they contain. Includes changes to ecosystem functioning and composition, habitat availability, species interactions, hybridization, predation, competition, etc.
In the future, expansion may be needed to distinguish between e.g. diseases and allergens.
Note: we need more quantifiable terms here
DispersalStatus represents the type of dispersal a species is undergoing. Multiple DispersalStatuses should be provided for species that disperse across multiple habitats (e.g. marine and terrestrial).
3.6.1 DispersalStatus Concepts
DispersalStatuses are required to support the following general Concepts:
Taxon: At least a scientific name
Location: At least one code or name, coordinates do not apply to DispersalStatus. This location identifies the area for this dispersal event. A 'FromCountryCode' can be specified for the country of origin.
DispersalStatuses support the following additional Concepts:
Defines the standard used by the provider to identify the location of the record (i.e. US_HUC, US_FIPS, AR_PostalCode, etc.) in the fromLocationValue concept field. See locationStandard concept for more information.
A textual description of the route the organism took from the FromCountryCode. If used, a LanguageCode must be specified. For example, this would be 'This species was brought over on a coal ship.'
A government is believed to be responsible for the introduction. This could be a specific agency within the government, most militaries, and state-run universities.
A single individual is believed to have caused the introduction (e.g., someone released their pet into the wild).
This would include any organization that exists across international boundaries - such as the GISIN.
This would include non-governmental organizations that are not for profit (e.g., The Nature Conservancy), pirates, and churches among other groups.
A for-profit group is responsible for the introduction. Private academia would be included in this category.
Unintentional dispersal in association with human activity
Dispersal resulting from intentional human activity
Dispersal by an organism's natural mechanisms and strategies without direct or indirect human intervention
The organism is dispersing between the regions specified as the 'from' and 'to' localities.
The organism is dispersing within the region specified in the locality field.
3.7.1 ManagementStatus Concepts
ManagementStatus represents the type of management activities involved with a species in a specified area. Multiple ManagementStatuses should be provided if multiple activities are engaged in during the same time period.
ManagementStatuses are required to support the following general Concepts:
Taxon: At least a scientific name
Location: At least one code or name; coordinates do not apply to ManagementStatus
ManagementStatuses support the following additional Concepts:
Measures taken to keep a species within a defined area
Measures taken to reduce a species' biomass
Actions taken that eliminate all occurrences of a species
Detection of a species at a border and prevention of its entering an area.
Actions taken to reduce the harmful effects of a species
A decision to take no action has been made.
Measures taken to stop a species from entering an area.
The Action Failed; this should only be selected if the Status of the Action is 'Completed'.
The Action was successful; this should only be selected if the Status of the Action is 'Completed'.
The action is believed to have failed, but requires time to confirm the outcome.
The action is believed to be successful, but requires time for confirmation.
The Outcome of the management Action is unknown.
Prevention projects are usually ongoing; completed implies no further control effort.
Includes e.g. prevention systems in place and/or public education
Includes even the suggestion that the activity would be a good idea, because this indicates concern about the organism
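The dependency between the outcome and the status of a management action can be checked with a sketch like the one below. Only the status value 'Completed' comes from the definitions above; the outcome labels "Failed" and "Successful", the other example values, and the function name are placeholders for whatever the vocabulary actually uses.

```python
# Placeholder labels for the two definite outcomes (assumed, not from the vocabulary).
FINAL_OUTCOMES = {"Failed", "Successful"}


def outcome_is_consistent(status, outcome):
    """A definite outcome should only be selected when the action status is 'Completed'."""
    if outcome in FINAL_OUTCOMES:
        return status == "Completed"
    return True  # tentative or unknown outcomes are allowed with any status


checks = [
    outcome_is_consistent("Completed", "Successful"),
    outcome_is_consistent("Ongoing", "Failed"),
    outcome_is_consistent("Ongoing", "Unknown"),
]
```

With these placeholder values the three checks evaluate to True, False, and True respectively.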
Carlton, J.T. and G.M. Ruiz. 2002. Principles of Vector Science and Integrated Vector Management. In Mooney, H. et al. (eds.) Best Practices for the Prevention and Management of Alien Invasive Species. Island Press
IUCN 2000. Guidelines for the prevention of biodiversity loss due to biological invasion. IUCN – The World Conservation Union, Gland, Switzerland
Appendix A - Issues
See the protocol issues in the technical documentation
Appendix B - Changes
Added values for DispersalStatus Cause and Vector
Added ImpactStatus, DispersalStatus, and ManagementStatus
Added LanguageCode to SpeciesResourceURL
Added additional definitions to values
Added issue 6
Updated concepts for ImpactStatus, DispersalStatus, and ManagementStatus based on GISIN3 results
Added GloballyUniqueIdentifier to all data models based on GISIN3 results
Updated other areas of data models based on resolved issues from GISIN3
Separated out the GISIN protocol from the web service documentation