Every organization tends to collect data in a unique way.
This document defines the protocol that GISIN uses to transfer data between organizations in a standard way.
For information on why the specification includes the features it does, please see the Requirements Specification.
Note: The documentation on web services has been moved to the new Web Service page.
1.1 Acknowledgements
A large number of individuals and organizations helped with the initial idea, content, and reviews of this protocol. They include: Jim Graham and Annie Simpson for coordinating and documenting the effort; Jerry Cooper, Bob Morris, and Michael Browne for original work on the IASPS documentation, which was a starting point for this document; Greg Ruiz and Jim Carlton for their Framework for Vector Science; Pam Fuller, Greg Ruiz, Brian Steves, and Shawn Dalton for their development of NISBase as the groundbreaking work on implementing invasive species data exchange; Michael Browne for facilitating the development of the status categories; Robert Hilliard, Kevin Thiele, and Aaron Wilton for assistance in integrating vocabularies; Roger Hyam, Renato De Giovanni, and Markus Doring for help with implementing TAPIR; Donald Hobern and Hannu Saarenmaa for overall guidance; and Liz Sellers, Rob Emery, Jacob Asiedeau, Greg Newman, Catherine Jarnevich, Silvia Ziller, Andrea Grosse, Olivier de Munck, the staff of the Invasive Species Specialist Group, and the Global Invasive Species Database.
If we have forgotten anyone please let us know!
2. Overview
The protocol has been designed with potential data providers in mind. This includes a recognition that these organizations will tend to have simple, flat databases and minimal technical resources for modifying those databases to make them available as a web service. At the same time, the protocol must perform at high speeds so that it can support both a large number of providers and providers with very large data sets. The protocol is used by both the file upload and web service capabilities of GISIN.
2.1 Data Models
Below are the "Data Models" or types of data that are supported by the protocol.
Models:
SpeciesStatus - The status of a species in a particular location at a particular date. This includes data on origin, presence, distribution, abundance, rate of spread, and whether the species is harmful.
SpeciesResourceURLs - URLs to web pages with profile information (also known as descriptions or fact-sheets)
Occurrences - Spatial information on species locations at specified dates. The location can be coordinates or location names.
Profile data - Data to profile specific species (e.g. life history, management, and identification)
ImpactStatus - The types of impacts the organism is causing at a specific location
DispersalStatus - The methods of dispersal used by the organism.
ManagementStatus - The management activities engaged in for an organism
Pictures - Digital images of species, impacts, and key characteristics (not defined yet)
Note: We broke out ManagementStatus and ImpactStatus from SpeciesStatus when we realized there could be multiple records for ManagementStatus and/or ImpactStatus for a single SpeciesStatus record (i.e. a single species and location). This allows us to keep the Model records flat.
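The following minimal Python sketch shows how the split keeps records flat. It assumes that a SpeciesStatus record and its ImpactStatus records are related only through the taxon and location they share. The class and field names (`SpeciesStatus`, `ImpactStatus`, `scientific_name`, `country_code`, `target`, `direction`) and the sample values are hypothetical stand-ins for the protocol's Concepts. They are not part of the protocol.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpeciesStatus:
    # One flat record per species and location.
    scientific_name: str
    country_code: str
    origin: str
    harmful: str


@dataclass(frozen=True)
class ImpactStatus:
    # Several flat ImpactStatus records may describe the same species and location.
    scientific_name: str
    country_code: str
    target: str
    direction: str


def impacts_for(status: SpeciesStatus, impacts: List[ImpactStatus]) -> List[ImpactStatus]:
    """Return the ImpactStatus records that share the status record's taxon and location."""
    key = (status.scientific_name, status.country_code)
    return [i for i in impacts if (i.scientific_name, i.country_code) == key]


# Hypothetical sample records.
status = SpeciesStatus("Tamarix ramosissima", "US", "Exotic", "Yes")
impacts = [
    ImpactStatus("Tamarix ramosissima", "US", "Environment", "Negative"),
    ImpactStatus("Tamarix ramosissima", "US", "Economy", "Negative"),
    ImpactStatus("Tamarix ramosissima", "MX", "Environment", "Negative"),
]
print([i.target for i in impacts_for(status, impacts)])  # ['Environment', 'Economy']
```

Each record stays a single flat row. The one-to-many relationship is recovered by matching key fields rather than by nesting records.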
2.2 Concepts
The GISIN protocol was created as an extension to the DarwinCore standard. This standard refers to each of the types of data in a data model as a "concept". Concepts can also be thought of as the labels over the columns in a table or spreadsheet file, or as the fields in a database.
3. Details
3.1 Common Concepts
The following Concepts are used as fields to identify the Date, Taxonomy, Location, and the Language for text results within the data for all Models.
GloballyUniqueIdentifier - A globally unique identifier (GUID) for the record. It must be resolvable and persistent. In other words, only GUIDs provided by a GUID authority should be used, and each must be unique to a particular record. If a record is removed from a dataset, its GUID must never be reused for any other record. If a provider receives data from another provider, it should use and pass on the original GUID (if one is provided). However, if the data is modified in some way, the provider should assign a new GUID and cite the original provider in the Citation concept.
GUIDs should be of the form [Authority]/guid/[InstitutionCode]/[CollectionCode]/[CatalogNumber]. The authority should be a recognized GUID authority, and it will typically provide the InstitutionCode and the CollectionCode. The provider can then choose a CatalogNumber that uniquely identifies each record. If you use GISIN as the authority, please use the InstitutionCode and CollectionCode it provides. GISIN GUIDs have the form gisin.org/guid/[InstitutionCode]/[CollectionCode]/[CatalogNumber].
Modified - The date on which the record was last changed, in the format YYYY-MM-DD. Harvesters use this to query only data that has changed. If the provider does not have this information, it should return the current date.
ScientificName - The primary means of filtering by taxon. At least a genus is required. Species, subspecies, and variety may be provided in standard taxonomic notation. Do not include the author and date, so that all records for the same taxon can be retrieved with a single query. Kingdom is recommended in all requests that use ScientificName, to resolve the few conflicts where the same genus appears in more than one kingdom. Example: Symphyotrichum lanceolatum ssp. hesperium var. hesperium
A URL pointing to a human-readable page that contains the information provided to GISIN for the record that the URL belongs to.
Note: If a provider does not support a particular Concept it should simply not return that Concept as an element of the record.
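A provider assembling records might apply this rule, together with the rule that Modified defaults to the current date, roughly as follows. This sketch assumes the provider holds each record as a Python mapping in which `None` marks an unsupported Concept. `build_record` is a hypothetical helper name.

```python
from datetime import date
from typing import Dict, Mapping, Optional


def build_record(values: Mapping[str, Optional[str]], today: Optional[date] = None) -> Dict[str, str]:
    """Assemble an output record from a provider's concept values.

    None marks a concept the provider does not support, so it is left out of the
    record entirely. An empty string is kept, because an empty element can carry
    meaning (for example, an empty EndValidDate means the status is still current).
    """
    record = {name: value for name, value in values.items() if value is not None}
    # Providers without last-modified information return the current date.
    if not record.get("Modified"):
        record["Modified"] = (today or date.today()).isoformat()
    return record


record = build_record(
    {"ScientificName": "Tamarix", "Kingdom": "Plantae", "County": None, "EndValidDate": ""},
    today=date(2011, 12, 15),
)
print(record)
# {'ScientificName': 'Tamarix', 'Kingdom': 'Plantae', 'EndValidDate': '', 'Modified': '2011-12-15'}
```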
3.1.1 Dates
Dates are represented as documented in International Standard ISO 8601, using the format YYYY-MM-DD, where YYYY is the year in the Gregorian calendar, MM the month, and DD the day. See Markus's web page for a quick summary. At least a year is required; month and day are preferred as well.
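As a sketch of the date rule (a year is required, while month and day are optional), the following function accepts YYYY, YYYY-MM, or YYYY-MM-DD and reports how precise the value is. The returned names match the Day, Month, and Year values of the publicationDatePrecision and collectionDatePrecision concepts. The function name and the decision to reject impossible dates are assumptions.

```python
import re
from datetime import date

# YYYY, optionally followed by -MM, optionally followed by -DD (ASCII digits only).
_PARTIAL_DATE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def date_precision(value: str) -> str:
    """Return 'Year', 'Month' or 'Day' for a YYYY, YYYY-MM or YYYY-MM-DD string."""
    match = _PARTIAL_DATE.fullmatch(value)
    if match is None:
        raise ValueError(f"expected YYYY, YYYY-MM or YYYY-MM-DD, got {value!r}")
    year, month, day = match.groups()
    # Missing parts default to 1 only to check that the supplied parts form a real date;
    # date() raises ValueError for out-of-range months or days (e.g. 2007-02-30).
    date(int(year), int(month or 1), int(day or 1))
    if day is not None:
        return "Day"
    if month is not None:
        return "Month"
    return "Year"


print(date_precision("2007"), date_precision("2007-05"), date_precision("2009-08-12"))
# Year Month Day
```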
Modified is the last date on which the record was changed. Data consumers that cache data use it to determine which records need to be refreshed. If the provider does not maintain a DateLastModified field in its database, it should always return the current date.
StartValidDate and EndValidDate represent the range of dates over which a record in a "status" data Model is valid. Data providers should return an empty element for EndValidDate if the status is still current.
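A consumer deciding whether a status record applies on a given day could test it as shown below. The sketch assumes full YYYY-MM-DD dates and a range that is inclusive at both ends. The protocol does not state either assumption explicitly. `is_valid_on` is a hypothetical name.

```python
from datetime import date


def is_valid_on(day: date, start_valid: str, end_valid: str) -> bool:
    """Check whether a status record applies on `day`.

    Assumes StartValidDate and EndValidDate are full YYYY-MM-DD dates and treats the
    range as inclusive at both ends; an empty EndValidDate means the status is still current.
    """
    if day < date.fromisoformat(start_valid):
        return False
    if end_valid == "":
        return True  # Still current: no upper bound.
    return day <= date.fromisoformat(end_valid)
```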
3.1.2 Taxa
ScientificName is the primary means of identifying a taxon. At least a genus is required. Species, subspecies, variety, author, and date may be provided in standard taxonomic notation. Kingdom is recommended in all requests that use ScientificName, to resolve the few conflicts where the same genus appears in more than one kingdom.
Examples of scientific names include:
Tamarix
Tamarix ramosissima
Moerckia hibernica var. wilsoniana
Epipenaeon ingens latifrons
Symphyotrichum lanceolatum ssp. hesperium var. hesperium
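The notation in these examples can be split into its parts with a small parser. The sketch below works under stated assumptions: the name carries no author or date, `ssp.` or `subsp.` marks a subspecies and `var.` a variety, and an unmarked third epithet is read as a zoological subspecies. Hybrid formulas, cultivars, and other ranks are not handled. `parse_scientific_name` is a hypothetical helper.

```python
from typing import Dict

# Rank markers recognized by this sketch; "subsp." is an added assumption.
RANK_MARKERS = {"ssp.": "subspecies", "subsp.": "subspecies", "var.": "variety"}


def parse_scientific_name(name: str) -> Dict[str, str]:
    """Split an author-free scientific name into genus, species and infraspecific epithets."""
    tokens = name.split()
    if not tokens:
        raise ValueError("at least a genus is required")
    parts = {"genus": tokens[0]}
    if len(tokens) > 1:
        parts["species"] = tokens[1]
    i = 2
    while i < len(tokens):
        token = tokens[i]
        if token in RANK_MARKERS:
            if i + 1 >= len(tokens):
                raise ValueError(f"rank marker {token!r} has no epithet in {name!r}")
            parts[RANK_MARKERS[token]] = tokens[i + 1]
            i += 2
        elif "subspecies" not in parts:
            # Zoological trinomials usually omit the rank marker: Genus species subspecies.
            parts["subspecies"] = token
            i += 1
        else:
            raise ValueError(f"cannot interpret {token!r} in {name!r}")
    return parts


print(parse_scientific_name("Symphyotrichum lanceolatum ssp. hesperium var. hesperium"))
```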
Note: We need to define how Taxonomic Concepts will be included.
3.1.3 Locations
Invasive species location data can take one or more of three forms: 1) country codes with additional concepts, 2) standard location codes, and 3) geographic coordinates.
Readers should recognize that some providers will have only local names, while others will have only geographic coordinates (of a variety of types). Consumers may request just LocationNames, just geographic coordinates, or both.
The data provider should return all the information it has available that matches the requested content.
3.1.3.1 Location from Country Codes with Additional Concepts
International country codes are defined by ISO 3166 and can be specified with the CountryCode concept.
Once a country code is specified, a state or province should be specified with the StateProvince concept.
If appropriate, a county (which includes cantons) should then be specified with the County concept.
If available, the name of the local area can be specified with the LocalityName concept. A LocalityType should accompany a LocalityName to ensure that it is unique.
If names are not associated with a LanguageCode then they are assumed to be in the default language of the provider.
Locality Type - Synonyms
State - Province, commonwealth
County - Shire, canton
WaterBody - Lake, stream, river, ocean, sea
Locality - City, town, burg, municipality
Island - Isle
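A provider mapping local labels onto the LocalityType values could use a case-insensitive lookup built from the synonym table. This sketch covers only the synonyms listed in this document. `normalize_locality_type` is a hypothetical helper, and returning `None` for unrecognized labels is a design assumption.

```python
from typing import Dict, List, Optional

# Canonical LocalityType values mapped to the synonyms listed for them.
LOCALITY_TYPE_SYNONYMS: Dict[str, List[str]] = {
    "State": ["Province", "Commonwealth"],
    "County": ["Shire", "Canton"],
    "WaterBody": ["Lake", "Stream", "River", "Ocean", "Sea"],
    "Locality": ["City", "Town", "Burg", "Municipality"],
    "Island": ["Isle"],
}

# Case-insensitive lookup from any canonical name or synonym to its canonical value.
_LOOKUP = {
    label.lower(): canonical
    for canonical, synonyms in LOCALITY_TYPE_SYNONYMS.items()
    for label in [canonical, *synonyms]
}


def normalize_locality_type(label: str) -> Optional[str]:
    """Return the canonical LocalityType for a label, or None if it is not recognized."""
    return _LOOKUP.get(label.strip().lower())


print(normalize_locality_type("Canton"))  # County
```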
3.1.3.2 Standard Location Codes
Another approach is to identify the location of the record using a known, defined standard (e.g. US_HUC, US_FIPS, or AR_PostalCode).
This is done by providing a LocationStandard and a LocationCode. Supported LocationStandards are listed below.
There can be only one location code per record.
Location Standard - Definition
US_FIPSCode - Maintained for backward compatibility, replaced by ANSI (need more here). See the Geographic Names Information System (GNIS) and download the National data set. The state FIPS codes are labeled 'STATE_NUMERIC'. County FIPS codes are a combination of the 'STATE_NUMERIC' and 'COUNTY_NUMERIC' fields.
Please contact a member of the GISIN steering committee to have a new standard added.
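The county FIPS combination described for US_FIPSCode can be sketched as follows. It uses the standard FIPS layout of a 2-digit state code followed by a 3-digit county code. Zero-padding is an assumption, added in case leading zeros were dropped during export. `county_fips` is a hypothetical helper, and the input values are illustrative.

```python
import re


def county_fips(state_numeric: str, county_numeric: str) -> str:
    """Combine GNIS STATE_NUMERIC and COUNTY_NUMERIC values into a 5-digit county FIPS code.

    Values are left-padded with zeros in case leading zeros were lost (e.g. in a spreadsheet).
    """
    state = state_numeric.strip()
    county = county_numeric.strip()
    if not re.fullmatch(r"[0-9]{1,2}", state) or not re.fullmatch(r"[0-9]{1,3}", county):
        raise ValueError(f"invalid FIPS parts: {state_numeric!r}, {county_numeric!r}")
    return state.zfill(2) + county.zfill(3)


# A record carries exactly one LocationStandard/LocationCode pair.
location = {"LocationStandard": "US_FIPSCode", "LocationCode": county_fips("6", "37")}
print(location)  # {'LocationStandard': 'US_FIPSCode', 'LocationCode': '06037'}
```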
3.1.3.3 Geographic Coordinates
A precise coordinate can be specified with the DecimalLatitude and DecimalLongitude concepts. These values should be in decimal degrees.
If this information is available, the accuracy of the coordinates should be provided with the HorizontalAccuracy concept, and the method used to create the coordinates should be provided with the GeographicProtocol concept.
Filtering by geographic coordinates is supported only for geographic bounding boxes, using the DecimalLatitudeMin, DecimalLatitudeMax, DecimalLongitudeMin, and DecimalLongitudeMax parameters.
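Server-side bounding-box filtering might look like the sketch below. It assumes inclusive bounds, a box that does not cross the 180th meridian, and records held as string-valued mappings. `BoundingBox` and `filter_by_bbox` are hypothetical names; the concept names come from this section.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BoundingBox:
    # Mirrors the DecimalLatitudeMin/Max and DecimalLongitudeMin/Max request parameters.
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat: float, lon: float) -> bool:
        # Inclusive bounds; assumes the box does not cross the 180th meridian.
        return self.lat_min <= lat <= self.lat_max and self.lon_min <= lon <= self.lon_max


def filter_by_bbox(records: List[Dict[str, str]], box: BoundingBox) -> List[Dict[str, str]]:
    """Keep records whose DecimalLatitude/DecimalLongitude fall inside the box."""
    kept = []
    for record in records:
        lat, lon = record.get("DecimalLatitude"), record.get("DecimalLongitude")
        if not lat or not lon:
            continue  # Records without coordinates cannot match a coordinate filter.
        if box.contains(float(lat), float(lon)):
            kept.append(record)
    return kept
```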
3.1.4 Languages
Languages are specified with ISO 639-2 codes. These are 3-letter codes. In some cases both a bibliographic code and a terminology code are provided; in those cases GISIN uses the terminology code. Some examples are below.
English: eng
French: fra
Spanish: spa
German: deu
Chinese: zho
Italian: ita
Dutch: nld
Portuguese: por
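Because GISIN uses the terminology code where the two differ, a provider holding bibliographic codes needs to translate them. The sketch below covers only the four languages in the list whose ISO 639-2 bibliographic and terminology codes differ (French, German, Chinese, and Dutch). Other such pairs exist and would need to be added. `to_gisin_language_code` is a hypothetical helper.

```python
# ISO 639-2 bibliographic (B) codes that differ from the terminology (T) code,
# limited to the languages in the example list; other B/T pairs exist.
BIBLIOGRAPHIC_TO_TERMINOLOGY = {"fre": "fra", "ger": "deu", "chi": "zho", "dut": "nld"}


def to_gisin_language_code(code: str) -> str:
    """Normalize a 3-letter ISO 639-2 code, preferring the terminology form."""
    normalized = code.strip().lower()
    if len(normalized) != 3 or not (normalized.isascii() and normalized.isalpha()):
        raise ValueError(f"expected a 3-letter ISO 639-2 code, got {code!r}")
    return BIBLIOGRAPHIC_TO_TERMINOLOGY.get(normalized, normalized)


print(to_gisin_language_code("FRE"), to_gisin_language_code("eng"))  # fra eng
```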
3.1.5 Globally Unique Identifiers
Globally Unique Identifiers, or GUIDs, are a mechanism to uniquely identify data that is moved around the Internet. GUIDs are extremely important both for making sure we are not duplicating records and for making sure that the originators of data are identifiable. GUIDs also allow corrections to be applied to existing records and old data to be removed from a system.
GISIN recommends GUIDs be attached to each record in the original source of the data. When this is not possible, providers should add a GUID and cite the original source in the Citation concept. If a provider changes the contents of a record that they do not own, they should create a new GUID and cite the original source in the Citation concept. Since GUIDs are used to identify a unique record, they should never be reused with a different record.
GUIDs must also be traceable, or resolvable: it must be possible to use any GUID to determine the original source of a record, typically by entering it into a web browser as a URL (in the case of GISIN GUIDs, with the addition of http://www.). GUIDs must also last for a very long time (indefinitely). This means that a "GUID Authority" must be used to provide the GUID. An authority is an organization that has made a long-term commitment to providing a resolving service for GUIDs.
There are a variety of formats for GUIDs. GISIN uses a GUID standard that is easy to read and resolve. The format is:
[Authority]/guid/[InstitutionCode]/[CollectionCode]/[CatalogNumber]
where the values in brackets are replaced with:
Authority - The authority that provides the resolving service for the GUID
InstitutionCode - Name of the institution providing the data (e.g. a specific university, county, or national park)
CollectionCode - Uniquely identifies the collection within the institution
CatalogNumber - A unique string (can be letters and numbers) within the collection
GISIN has volunteered to be a GUID authority. If you as a data provider do not have another authority to use, you can contact GISIN to obtain an InstitutionCode and CollectionCode(s) and then determine your own unique CatalogNumbers. The format for GUIDs from GISIN is:
gisin.org/guid/[InstitutionCode]/[CollectionCode]/[CatalogNumber]
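The GUID layout [Authority]/guid/[InstitutionCode]/[CollectionCode]/[CatalogNumber] can be built and parsed in code. For GISIN-issued GUIDs, it can also be turned into a resolvable URL by prefixing http://www., as described above. This is a minimal sketch. The `Guid` class, the helper names, and the institution and collection codes in the usage line are hypothetical, and the sketch assumes that none of the three codes contains a slash.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guid:
    authority: str
    institution_code: str
    collection_code: str
    catalog_number: str

    def __str__(self) -> str:
        # [Authority]/guid/[InstitutionCode]/[CollectionCode]/[CatalogNumber]
        return "/".join(
            [self.authority, "guid", self.institution_code, self.collection_code, self.catalog_number]
        )


def parse_guid(text: str) -> Guid:
    """Split a GUID into its parts; assumes the three codes contain no '/'."""
    authority, separator, rest = text.partition("/guid/")
    parts = rest.split("/")
    if not separator or not authority or len(parts) != 3 or not all(parts):
        raise ValueError(f"not an [Authority]/guid/[Inst]/[Coll]/[Cat] GUID: {text!r}")
    return Guid(authority, parts[0], parts[1], parts[2])


def gisin_resolution_url(guid: Guid) -> str:
    """GISIN GUIDs resolve when entered in a browser with 'http://www.' prepended."""
    if guid.authority != "gisin.org":
        raise ValueError("only GISIN-issued GUIDs resolve this way")
    return "http://www." + str(guid)


guid = parse_guid("gisin.org/guid/EXAMPLEINST/PLANTS/000123")
print(gisin_resolution_url(guid))  # http://www.gisin.org/guid/EXAMPLEINST/PLANTS/000123
```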
3.2 SpeciesStatus
Harmful - Whether the organism is considered harmful. Invasive species should return 'Harmful=Yes' and 'Origin=Nonindigenous'. More detailed harm information is dealt with in the ImpactStatus Data Model.
Whether the range of the organism is increasing or decreasing.
abundance Values
Common - Common in the referenced location. Moderate abundance.
Dominant - Numerically dominant in the referenced location. Depending on the nature of the referenced location, this information could be at the individual, population, community, ecosystem, or landscape scale.
Monoculture - Exists at a level of abundance that has resulted in virtually no other species being present in the referenced location.
Rare - Numerically rare in the referenced location.
Unknown
Zero - Zero abundance means absent. Different parameters may overlap, but the values within a single parameter may not.
distribution Values
Localized - Occurs in only a few parts of the referenced location.
Moderate - Occurs in some but not all parts of the referenced location.
Unknown
Widespread - Occurs in most of the referenced location.
harmful Values
No - Benign. Not harmful.
Potentially - Has been known to be harmful elsewhere or displays tendencies that could become harmful.
Unknown
Yes - Some kind of harm has been identified. This could be environmental, social/economic, or harmful to human or animal health.
origin Values
Exotic - (Alien, foreign, exotic, introduced, non-native) A species, subspecies, or lower taxon occurring outside of its natural range (past or present) and dispersal potential (i.e. outside the range it occupies naturally or could not occupy without direct or indirect introduction or care by humans), including any part, gametes, or propagule of such species that might survive and subsequently reproduce (IUCN 2000). Includes aboriginal introductions and archeozoa/archeophytes (early introductions of organisms into Europe by humans that are strongly integrated in European ecosystems).
Indigenous - Naturally distributed within the region of interest, with a long-term presence extending into the prehistoric record.
Unknown - Includes 'Cryptogenic', i.e. a species that is neither demonstrably native nor introduced in a region because its origin/native range is unknown, and 'Uncertain', i.e. the native range is known but this occurrence lies somewhere between its known native and nonindigenous ranges.
persistence Values
DiedOut - Not surviving. Presumed extinct at the referenced location.
Persistent - Surviving and reproducing in perpetuity. Synonymous with 'naturalized'.
Temporary - Surviving and reproducing for a limited period.
Transient - Surviving but not reproducing (e.g., remnant species from old gardens).
Unknown
presence Values
Absent - Known to be absent. This may be due to eradication, interception at the border, presumed extinction (see persistence), or a record made in error. These 'reasons for absence' may become part of an expanded schema.
Present - Known or reported to be present at the date of publication or last update of records. Please note that for aggregated data (e.g. a collection of historic reports), the date of publication or last update of records provides no indication of when the organism was present.
Reported - The species has been reported, such as in the literature, to be present in an area.
SometimesPresent - Vagrant, migratory, or an otherwise 'casual' presence.
Unknown
publicationDatePrecision Values
Day
Month
Unknown
Year
rateOfSpread Values
Moderate
Rapid
Slow
Unknown
regulatoryListing Values
NotConsidered - An organism that has not yet been through a process to determine whether it should be restricted or prohibited.
Prohibited - A banned organism.
Restricted - An organism with some regulatory restriction or control.
Unknown
trend Values
Declining - Biomass is decreasing.
Expanding - Biomass is increasing.
Stable - Zero, or close to zero, trend.
Unknown
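Each SpeciesStatus value list is a closed vocabulary, so a provider or harvester might check records against it before exchange. This is a minimal sketch. The dictionary keys follow the lowercase value-table headings (abundance, distribution, and so on), which is an assumption, because the exact element capitalization is not fixed here. Concepts that a record does not carry are skipped, consistent with the rule that unsupported Concepts are simply omitted.

```python
from typing import Dict, FrozenSet, List, Mapping

# Allowed values for each SpeciesStatus value list (keys follow the table headings).
SPECIES_STATUS_VOCABULARIES: Dict[str, FrozenSet[str]] = {
    "abundance": frozenset({"Common", "Dominant", "Monoculture", "Rare", "Unknown", "Zero"}),
    "distribution": frozenset({"Localized", "Moderate", "Unknown", "Widespread"}),
    "harmful": frozenset({"No", "Potentially", "Unknown", "Yes"}),
    "origin": frozenset({"Exotic", "Indigenous", "Unknown"}),
    "persistence": frozenset({"DiedOut", "Persistent", "Temporary", "Transient", "Unknown"}),
    "presence": frozenset({"Absent", "Present", "Reported", "SometimesPresent", "Unknown"}),
    "publicationDatePrecision": frozenset({"Day", "Month", "Unknown", "Year"}),
    "rateOfSpread": frozenset({"Moderate", "Rapid", "Slow", "Unknown"}),
    "regulatoryListing": frozenset({"NotConsidered", "Prohibited", "Restricted", "Unknown"}),
    "trend": frozenset({"Declining", "Expanding", "Stable", "Unknown"}),
}


def vocabulary_errors(record: Mapping[str, str]) -> List[str]:
    """Describe each SpeciesStatus value in `record` that is outside its vocabulary."""
    errors = []
    for concept, allowed in SPECIES_STATUS_VOCABULARIES.items():
        if concept in record and record[concept] not in allowed:
            errors.append(f"{concept}: {record[concept]!r} is not one of {sorted(allowed)}")
    return errors
```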
3.3 SpeciesResourceURLs
3.3.1 SpeciesResourceURL Concepts
SpeciesResourceURLs are required to support the following general Concepts:
Taxon: At least a scientific name
Modified
LanguageCode
SpeciesResourceURLs support the following additional Concepts as fields:
List of experts to contact for information regarding a specific species
Identification key - Information to aid in identifying an organism.
Image - Image of a species.
Profile - Information about a species, such as its life history.
Reference list - List of references with information about a specific species.
RiskAssessment - The resource is an assessment of risk for the species.
Unknown
Video - Video of a species.
3.4 Occurrences
Occurrence data are especially important for modeling present and future distributions of species. The GISIN system uses other TDWG standards in the management of occurrence information.
3.4.1 Occurrence Concepts
Occurrences are required to support the following general Concepts:
Taxon: At least a scientific name
Location: At least one code or name, or coordinates
Modified
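A harvester could check the required Occurrence concepts before accepting a record. In this sketch, the choice of concepts that count as a location "code or name" (CountryCode, StateProvince, County, LocalityName, LocationCode) is an assumption drawn from section 3.1.3. An empty string is treated the same as a missing value. `missing_occurrence_concepts` is a hypothetical helper.

```python
from typing import List, Mapping

# Concepts assumed to identify a location by code or name (section 3.1.3).
LOCATION_CODE_OR_NAME = ("CountryCode", "StateProvince", "County", "LocalityName", "LocationCode")


def _present(record: Mapping[str, str], concept: str) -> bool:
    # Missing and empty values are both treated as absent.
    return record.get(concept) not in (None, "")


def missing_occurrence_concepts(record: Mapping[str, str]) -> List[str]:
    """List the required Occurrence concepts that a record lacks."""
    missing = []
    if not _present(record, "ScientificName"):
        missing.append("ScientificName")
    has_code_or_name = any(_present(record, c) for c in LOCATION_CODE_OR_NAME)
    has_coordinates = _present(record, "DecimalLatitude") and _present(record, "DecimalLongitude")
    if not (has_code_or_name or has_coordinates):
        missing.append("Location")
    if not _present(record, "Modified"):
        missing.append("Modified")
    return missing
```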
Occurrences support the following additional Concepts:
A decimal representation of the precision of the coordinates given in the decimalLatitude and decimalLongitude. Examples: '0.00001' (normal GPS limit for decimal degrees), '0.000278' (nearest second), '0.01667' (nearest minute), '1.0' (nearest degree). For discussion see http://code.google.com/p/darwincore/wiki/Location
The first date on which occurrence data for the record were collected, in the format YYYY-MM-DD.
collectionDatePrecision Values
Day
Month
Unknown
Year
geodeticDatum Values
HARN - High Accuracy Reference Network
Unknown
WGS84 - World Geodetic System 1984
georeferenceProtocol Values
ConvertedFromOtherUnit
Gazetteer - Coordinate derived by searching a database that matches place names with coordinates, such as the Geographic Names Information System.
GIS Derived - Examples include centroid calculation for a state/province or county.
GPS - Coordinate from a Global Positioning System unit.
Map - Derived from a physical map.
RemotelySensed
Unknown
Datums
The preferred datums are World Geodetic System 1984 (WGS84) or High Accuracy Reference Network (HARN) but providers may have data in datums that are not global and may not have the facilities to convert them to a global datum. Ignoring the datum can cause errors of thousands of meters! We highly encourage providers to provide data in WGS84 or HARN and consumers should always check the datum or else filter to choose just the datums they accept.
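Following the advice to filter on datum, a consumer might separate records as sketched below. The `geodeticDatum` key follows the value-table heading, and the accepted set defaults to the two preferred datums. A consumer would adjust both assumptions as needed. Records with no datum, or with Unknown, are set aside rather than guessed at. `split_by_datum` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# The preferred datums; a consumer may choose a different set.
ACCEPTED_DATUMS: FrozenSet[str] = frozenset({"WGS84", "HARN"})


def split_by_datum(
    records: Iterable[Dict[str, str]], accepted: FrozenSet[str] = ACCEPTED_DATUMS
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Separate records in an accepted datum from those that lack one or use another."""
    kept: List[Dict[str, str]] = []
    rejected: List[Dict[str, str]] = []
    for record in records:
        if record.get("geodeticDatum") in accepted:
            kept.append(record)
        else:
            rejected.append(record)
    return kept, rejected
```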
3.5 ImpactStatus
3.5.1 ImpactStatus Concepts
ImpactStatus represents the type of impact a species is having on a habitat. Multiple ImpactStatuses should be provided for species that impact multiple habitats (e.g. marine and terrestrial).
ImpactStatuses are required to support the following general Concepts:
Taxon: At least a scientific name
Location: At least one code or name; coordinates do not apply to ImpactStatus
Modified
ValidDate
ImpactStatuses support the following additional Concepts:
A habitat with a mix of fresh and salt water, typically an estuary, but also including salt marshes and salt lakes
Freshwater - A primarily freshwater habitat, including streams, rivers, and freshwater lakes.
Marine - A primarily marine habitat, including oceans, seas, and bays.
Terrestrial - An impact exhibited primarily on land.
Unknown
direction Values
Negative - The impact the organism is having on the specified target is negative.
Neutral - The impact the organism is having on the specified target is neutral.
Positive - The impact the organism is having on the specified target is positive.
Unknown
environment Values
Artificial
HumanModified - Includes urban environments as well as forestry, agriculture, horticulture, and second growth.
Natural - Few environments are pristine. Most conservation efforts are focused on natural or semi-natural environments.
Unknown
target Values
Economy - Includes livelihood, cultural, medicinal, amenity, and social activities.
Environment - Natural or semi-natural environments and/or the species they contain. Includes changes to ecosystem functioning and composition, habitat availability, species interactions, hybridization, predation, competition, etc.
HumanHealth - In the future, expansion may be needed to distinguish between, e.g., diseases and allergens.
Unknown
Note: we need more quantifiable terms here
3.6 DispersalStatus
DispersalStatus represents the method by which a species disperses into a habitat. Multiple DispersalStatuses should be provided for species that disperse into multiple habitats (e.g. marine and terrestrial).
3.6.1 DispersalStatus Concepts
DispersalStatuses are required to support the following general Concepts:
Taxon: At least a scientific name
Location: At least one code or name, coordinates do not apply to DispersalStatus. This location identifies the area for this dispersal event. A 'FromCountryCode' can be specified for the country of origin.
Modified
ValidDate
DispersalStatuses support the following additional Concepts:
Defines the standard used by the provider to identify the location of the record (e.g. US_HUC, US_FIPS, or AR_PostalCode) in the fromLocationValue concept field. See the locationStandard concept for more information.
A textual description of the route the organism took from the FromCountryCode. If used, a LanguageCode must be specified. For example: 'This species was brought over on a coal ship.'
dateOfIntroductionPrecision Values
Day
Month
Unknown
Year
fromLocalityType Values
Name
Description
fromLocationStandard Values
Name
Description
introducedBy Values
Government - A government is believed to be responsible for the introduction. This could be a specific agency within the government, most military organizations, or state-run universities.
Individual - A single individual is believed to have caused the introduction (e.g., someone released their pet into the wild).
International organization - Includes any organization that exists across international boundaries, such as GISIN.
Other - Includes non-governmental, not-for-profit organizations (e.g., The Nature Conservancy), pirates, and churches, among other groups.
Private sector - A for-profit group is responsible for the introduction. Private academia is included in this category.
Unknown
mode Values
Accidental - Unintentional dispersal in association with human activity.
Deliberate - Dispersal resulting from intentional human activity.
Natural - Dispersal by an organism's natural mechanisms and strategies, without direct or indirect human intervention.
Unknown
movement Values
InterRegionalMovement - The organism is dispersing between the regions specified as the 'from' and 'to' localities.
IntraRegionalMovement - The organism is dispersing within the region specified in the locality field.
Unknown
3.7 ManagementStatus
3.7.1 ManagementStatus Concepts
ManagementStatus represents the type of management activities involving a species in a specified area. Multiple ManagementStatuses should be provided if multiple activities take place in the same time period.
ManagementStatuses are required to support the following general Concepts:
Taxon: At least a scientific name
Location: At least one code or name; coordinates do not apply to ManagementStatus
Modified
ValidDate
ManagementStatuses support the following additional Concepts:
Measures taken to keep a species within a defined area
Control - Measures taken to reduce a species’ biomass.
Eradication - Actions taken that eliminate all occurrences of a species.
Interception - Detection of a species at a border and prevention of its entry into an area.
Mitigation - Actions taken to reduce the harmful effects of a species.
None - A decision to take no action has been made.
Prevention - Measures taken to stop a species from entering an area.
outcome Values
Failed - The action failed; this should only be selected if the status of the action is 'Completed'.
Successful - The action was successful; this should only be selected if the status of the action is 'Completed'.
Unconfirmed failure - The action is believed to have failed, but time is required to confirm the outcome.
Unconfirmed success - The action is believed to be successful, but time is required for confirmation.
Unknown - The outcome of the management action is unknown.
status Values
Completed - Prevention projects are usually ongoing; completed implies no further control effort.
Executing - Includes, e.g., prevention systems in place and/or public education.
Proposed - Includes even the suggestion that the activity would be a good idea, because this indicates concern about the organism.
Unknown
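The rule that Failed and Successful outcomes require a Completed status can be checked mechanically. This sketch uses the status and outcome value lists from this section. `check_management_outcome` is a hypothetical helper, and raising `ValueError` is one design choice among several; a validator could instead collect all problems.

```python
# Vocabularies from the ManagementStatus outcome and status value lists.
OUTCOMES = frozenset({"Failed", "Successful", "Unconfirmed failure", "Unconfirmed success", "Unknown"})
STATUSES = frozenset({"Completed", "Executing", "Proposed", "Unknown"})
# Outcomes that may only be reported once the action's status is Completed.
FINAL_OUTCOMES = frozenset({"Failed", "Successful"})


def check_management_outcome(status: str, outcome: str) -> None:
    """Raise ValueError if a ManagementStatus status/outcome pair breaks the value rules."""
    if status not in STATUSES:
        raise ValueError(f"unknown status {status!r}")
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome {outcome!r}")
    if outcome in FINAL_OUTCOMES and status != "Completed":
        raise ValueError(f"outcome {outcome!r} requires status 'Completed', not {status!r}")


check_management_outcome("Completed", "Successful")        # accepted
check_management_outcome("Executing", "Unconfirmed success")  # accepted
```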
References
Carlton, J.T. and G.M. Ruiz. 2002. Principles of Vector Science and Integrated Vector Management. In Mooney, H. et al. (eds.), Best Practices for the Prevention and Management of Alien Invasive Species. Island Press.
IUCN. 2000. Guidelines for the prevention of biodiversity loss due to biological invasion. IUCN – The World Conservation Union, Gland, Switzerland.
Appendix A - Issues
See the protocol issues in the technical documentation
Appendix B - Changes
18-May-2007
Added values for DispersalStatus Cause and Vector
20-May-2007
Added ImpactStatus, DispersalStatus, and ManagementStatus
Added LanguageCode to SpeciesResourceURL
29-May-2007
Added acknowledgements
Added additional definitions to values
Added issue 6
12-August-2009
Updated concepts for ImpactStatus, DispersalStatus, and ManagementStatus based on GISIN3 results
Added GloballyUniqueIdentifier to all data models based on GISIN3 results
Updated other areas of data models based on resolved issues from GISIN3
15-December-2011
Separated out the GISIN protocol from the web service documentation