Disambiguation of plant binomial names and essential oil composition profiles
Abstract
EssOilDB, the ESSential OIL DataBase (http://www.nipgr.ac.in/Essoildb/), is a continually updated knowledge resource that contains experimental records of essential oil composition data from published reports. It also contains information on related geo-morphological factors at the time of collection and extraction, in order to contextualize volatile profile patterns from a biological perspective. EssOilDB provides an opportunity for context-based scientific research through a multitude of queries on volatile profiles of native, invasive, normal or stressed plants, across taxonomic clades, geographical locations and several other biotic and abiotic influences. It contains records of emitted essential oils spanning a century of published reports on volatile profiles. Database normalization, or disambiguation, is the organization of the data in a database. It is a systematic, multi-step process that puts data into tabular form and removes duplicated records from the relation tables. Normalization serves two main purposes: eliminating redundant data and ensuring that data dependencies make sense, i.e., that data is logically stored. This project involves the normalization of the plant names and profiles already present in EssOilDB version 1.0. Currently, the data contains 1838 plant names and 7157 compound emission records. The inconsistencies in the data include typographical errors, duplications and the introduction of special characters. The tools used during this project include the R packages taxize, wikitaxa, WikidataR and Taxonstand. The public databases used are uBio NameBank, the National Biodiversity Network (NBN), National Center for Biotechnology Information (NCBI), Catalogue of Life (COL), The Plant List (TPL), Encyclopedia of Life (EOL), Global Biodiversity Information Facility (GBIF) and Integrated Taxonomic Information System (ITIS).
Keywords: EssOilDB, volatile emissions, essential oils, geo-morphological factors
Abbreviations
EssOilDB | ESSential OIL DataBase |
TPL | The Plant List |
GBIF | Global Biodiversity Information Facility |
COL | Catalogue of Life |
NCBI | National Center for Biotechnology Information |
NBN | National Biodiversity Network |
ITIS | Integrated Taxonomic Information System |
CRAN | Comprehensive R Archive Network |
UTF | Unicode Transformation Format |
INTRODUCTION
Background
EssOilDB (the ESSential OIL DataBase) is a continually updated knowledge resource for plant volatile emissions, containing experimental records of essential oil composition data from published reports. EssOilDB also contains information on related geo-morphological factors at the time of collection and extraction, in order to appreciate volatile profile patterns from a global perspective. It provides an opportunity for context-based scientific research through a multitude of queries on volatile profiles of native, invasive, normal or stressed plants, across taxonomic clades, geographical locations and several other biotic and abiotic influences. It contains 123,041 essential oil records spanning a century of published reports on volatile profiles, with data from 92 plant taxonomic families spread across diverse geographical locations all over the globe [Kumari S, et al., 2014].
R Programming
The name “R” refers to the computational environment initially created by Ross Ihaka and Robert Gentleman, similar in nature to the “S” statistical environment developed at Bell Laboratories (http://www.r-project.org/about.html). It has since been developed and maintained by a strong team of core developers (R-core), who are renowned researchers in computational disciplines. R has gained wide acceptance as a reliable and powerful modern computational environment for statistical computing and visualisation, and is now used in many areas of scientific computation. R is free software, released under the GNU General Public License; this means anyone can see all its source code, and there are no restrictive, costly licensing arrangements. [Eglen, 2009]
The R language is widely used by biologists, and now has over 5,000 packages on the Comprehensive R Archive Network (CRAN) that extend it. R is well suited to manipulating and visualizing data and to fitting statistical models to it.
Disambiguation of a database
Database normalization, or disambiguation, is the organization of the data in a database. It is a systematic, multi-step process that puts data into tabular form and removes duplicated records from the relation tables. Normalization serves two main purposes: eliminating redundant data and ensuring that data dependencies make sense, i.e., that data is logically stored. [https://searchsqlserver.techtarget.com/definition/normalization]
The use of taxonomic names is, unfortunately, not straightforward. Taxonomic names often vary due to name revisions at the generic or specific levels, lumping or splitting lower taxa (genera, species) among higher taxa (families), and name spelling changes.
Statement of Problems
- This project involves the normalization of plant names and chemical profiles in EssOilDB version 1.0. Currently, the data contains 1838 plant names and 7157 profiles.
- The inconsistencies in the data include typographical errors, duplications, erroneous scientific names, the introduction of special characters, the lack of synonyms, and the lack of a suitable database structure.
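The kinds of inconsistency listed above can be screened for with a few lines of base R before any taxonomic service is consulted. The following is a minimal sketch, not the procedure used in this project: the sample names, the helper `clean_name()` and the allow-list of characters are all assumptions made for illustration.

```r
# Hypothetical sample of raw names; the real input is the EssOilDB v1.0 plant table.
raw_names <- c("Abies alba", "abies  alba ", "Acacia nuperrima",
               "Acacia nuperrima", "Ajuga austro-iranica?")

# clean_name() is an illustrative helper, not part of any package.
clean_name <- function(x) {
  x <- enc2utf8(x)                        # mark strings as UTF-8
  x <- gsub("[^[:alpha:] .-]", "", x)     # drop characters outside an assumed allow-list
  x <- gsub("\\s+", " ", trimws(x))       # trim and collapse repeated whitespace
  # Capitalise the first letter (genus) and lower-case the rest;
  # only appropriate for plain binomials without author names.
  paste0(toupper(substr(x, 1, 1)), tolower(substr(x, 2, nchar(x))))
}

cleaned   <- clean_name(raw_names)
dup_names <- unique(cleaned[duplicated(cleaned)])   # names occurring more than once
changed   <- raw_names[raw_names != cleaned]        # entries altered by cleaning

print(dup_names)
print(changed)
```

Entries reported in `changed` are candidates for typographical or special-character errors, and `dup_names` lists the names whose records would need to be merged.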
Scope
Essential oils have considerable potential in pharmacology, both as preventive and as treatment agents for a range of health disorders. They are also used in aromatherapy, facilitate skin penetration, and are used for the transdermal delivery of medicines. Beyond therapeutics, their commercial value in the food and cosmetic industries has increased tremendously. Apart from scientists, laypeople, entrepreneurs and farmers can also benefit from this database.
LITERATURE REVIEW
EssOilDB 1.0
Each EssOilDB record corresponds to the amount of emission of a particular compound in a specific oil profile. Further, in case a single journal article lists three different sets of volatile profiles, say for three different plant parts, or under three independent stresses, we treat the datasets as three independent records. Currently, the database contains a total of 123,041 such records spanning a century of published reports of essential oil profiles, starting from early 1900s to date. These records have been sourced from over 1520 citations and the data includes 1618 plant species, subspecies or varieties representing 92 distinct taxonomic families encompassing the entire range from ancient and lower plants like chlorophytes and mosses, to the gymnosperms and angiosperms. [Kumari S, et al, 2014] Fig 1 shows the various plant-specific and chemical-specific keys.
R Studio
RStudio provides popular open source and enterprise-ready professional software for the R statistical computing environment. It is an Integrated Development Environment (IDE) which aids in the development of R programs. [Allaire, 2012]
R Package 'taxize'
taxize is a taxonomic toolbelt for R. It provides a suite of R functions that wrap the web APIs of a large number of taxonomic databases (Table 1).
Function name | What it does | Source |
eol_search | Search EOL taxon information | Encyclopedia of Life http://eol.org/ |
get_tsn | Get ITIS TSN | Integrated Taxonomic Information System http://www.itis.gov/ |
get_uid | Get NCBI UID | National Center for Biotechnology Information |
gnr_resolve | Resolve names using EOL's global names index | Global Names Resolver http://resolver.globalnames.org/ |
iucn_status | IUCN status | IUCN Red List http://www.iucnredlist.org |
searchbycommonname | Search ITIS by common name | Integrated Taxonomic Information System http://www.itis.gov/ |
searchbyscientificname | Search ITIS by scientific name | Integrated Taxonomic Information System http://www.itis.gov/ |
tax_rank | Get rank of a taxonomic name | Various |
R Package 'Taxonstand'
The Taxonstand package automates the standardization of taxonomic names and the removal of orthographic errors in plant species names using 'The Plant List' website (www.theplantlist.org). [Luis Cayuela & Anke Stein, 2017]
The Plant List
The Plant List (http://www.theplantlist.org/) is an on‐line database of plant names that aims to be comprehensive for all described plant species. Version 1 of The Plant List includes 1 040 426 plant name records, of which 298 900 are accepted names. The Plant List is the product of a consortium of the Royal Botanic Gardens, Kew, and the Missouri Botanical Garden. [Kalwij, 2012]
R Package 'wikitaxa' - Taxonomy data from Wikipedia
The goal of wikitaxa is to allow search and taxonomic data retrieval from across many Wikimedia sites, including: Wikipedia, Wikicommons, and Wikispecies. There are lower level and higher level parts to the package API:
1. Low level API: The low level API is meant for power users and gives you more control, but requires more knowledge.
- wt_wiki_page()
- wt_wiki_page_parse()
- wt_wiki_url_build()
- wt_wiki_url_parse()
- wt_wikispecies_parse()
- wt_wikicommons_parse()
- wt_wikipedia_parse()
2. High level API: The high level API is meant to be easier and faster to use.
- wt_data()
- wt_data_id()
- wt_wikispecies()
- wt_wikicommons()
- wt_wikipedia()
Search functions:
- wt_wikicommons_search()
- wt_wikispecies_search()
- wt_wikipedia_search()
[Scott Chamberlain, 2018]
Wikidata
Wikipedia has been collecting increasing amounts of structured data: numbers, dates, coordinates, and many types of relationships from family trees to the taxonomy of species. This data has become a resource of enormous value, with potential applications across all areas of science, technology, and culture. Actual uses of the data are rare and often restricted to very specific pieces of information, such as the geo-tags of Wikipedia articles used in Google Maps. The reason for this striking gap between vision and reality is that Wikipedia’s data is buried within 30 million Wikipedia articles in 287 languages, from where it is very difficult to extract. The same information often appears in articles in many languages and on many articles within a single language. [Vrandečić, et al, 2014]
The goal of Wikidata is to overcome these problems by creating new ways for Wikipedia to manage its data on a global scale. It has the following features:
- Open Editing: Like Wikipedia, Wikidata allows every user of the site to extend and edit the stored information, even without creating an account. A form-based interface makes editing very easy.
- Community Control: Not only the actual data but also the schema of the data is controlled by the contributor community. Contributors edit the population number of Rome, but they also decide that there is such a number in the first place.
- Plurality: It would be naive to expect global agreement on the ‘true’ data, since many facts are disputed or simply uncertain. Wikidata allows conflicting data to coexist and provides mechanisms to organize this plurality.
- Secondary Data: Wikidata gathers facts published in primary sources, together with references to these sources. There is no ‘true population of Rome’, but a ‘population of Rome as published by the city of Rome in 2011’.
- Multilingual Data: Most data is not tied to one language: numbers, dates, and coordinates have universal meaning; labels like Rome and population are translated into many languages. Wikidata is multi-lingual by design. While Wikipedia has independent editions for each language, there is only one Wikidata site.
- Easy Access: Wikidata’s goal is to allow data to be used both in Wikipedia and in external applications. Data is exported through Web services in several formats, including JSON and RDF. Data is published under legal terms that allow the widest possible reuse.
- Continuous Evolution: In the best tradition of Wikipedia, Wikidata grows with its community and tasks. Instead of developing a perfect system that is presented to the world in a couple of years, new features are deployed incrementally and as early as possible.
[Vrandečić, et al, 2014]
R Package 'WikidataR'
WikidataR is an API client for the Wikidata store of semantic data. [Oliver Keyes, 2017]
Global Biodiversity Information Facility (GBIF)
GBIF—the Global Biodiversity Information Facility—is an international network and research infrastructure funded by the world’s governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth. [https://www.gbif.org/en/what-is-gbif]
WORK FLOW
Concepts
Resolution of binomial names using taxize
The resolution of names is performed using the function gnr_resolve(), the syntax for which is given in Fig 7:
Arguments:
- names - character; taxonomic names to be resolved. Doesn’t work for vernacular/common names.
- data_source_ids - character; IDs to specify what data source is searched.
- resolve_once - logical; Find the first available match instead of matches across all data sources with all possible renderings of a name. When TRUE, response is rapid but incomplete.
- with_context - logical; Reduce the likelihood of matches to taxonomic homonyms. When TRUE a common taxonomic context is calculated for all supplied names from matches in data sources that have classification tree paths. Names out of determined context are penalized during score calculation.
- canonical - logical; If FALSE (default), gives back names with taxonomic authorities. If TRUE, returns canonical names (without taxonomic authorities and abbreviations).
- highestscore - logical; Return those names with the highest score for each searched name? Defunct
- best_match_only - (logical) If TRUE, best match only returned. Default: FALSE
- preferred_data_sources - (character) A vector of one or more data source IDs.
- with_canonical_ranks - (logical) Returns names with infraspecific ranks, if present. If TRUE, we force canonical=TRUE, otherwise this parameter would have no effect. Default: FALSE
- http - The HTTP method to use, one of "get" or "post". Default: "get". Use http="post" with large queries. Queries with > 300 records use "post" automatically because "get" would fail
- cap_first - (logical) For each name, fix so that the first name part is capitalized, while others are not. This web service is sensitive to capitalization, so you’ll get different results depending on capitalization. First name capitalized is likely what you’ll want and is the default. If FALSE, names are not modified. Default: TRUE
- fields - (character) One of minimal (default) or all. Minimal gives back just four fields, whereas all gives all fields back.
- ... Curl options passed on to crul::HttpClient
[Scott Chamberlain, 2017]
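A call using the arguments described above might look like the following sketch. It is illustrative rather than the exact call used in this project: the wrapper `resolve_names()` is a hypothetical helper, the code needs the taxize package and network access, and it assumes a taxize release in which gnr_resolve() is still exported.

```r
# Sketch only: requires the taxize package and network access.
# resolve_names() is a hypothetical wrapper, not a taxize function.
resolve_names <- function(plant_names) {
  taxize::gnr_resolve(
    plant_names,               # character vector of scientific names
    best_match_only = TRUE,    # keep only the best match per submitted name
    canonical       = TRUE     # return names without taxonomic authorities
  )
}

# Example call (not run here):
# resolved <- resolve_names(c("Abies alba", "Acacia caven"))
# head(resolved)
```

The names are passed positionally so that the sketch does not depend on what the first argument is called in a given taxize release.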
Resolution of binomial names using Taxonstand
The resolution of names is performed using the function TPL(). Fig 8 shows the usage of TPL.
Arguments:
- splist - A character vector specifying the input taxa, each element including genus and specific epithet and, potentially, author name and infraspecific abbreviation and epithet
- genus - A character vector containing the genera of plant taxon names. [Optional if taxa is submitted as input]
- species - A character vector containing the specific epithets of plant taxon names. [Optional if taxa is submitted as input]
- infrasp - A character vector containing the infraspecific epithets of plant taxon names. [Optional; required for specific queries only]
- infra - Logical. If TRUE (default), infraspecific epithets are used to match taxon names in TPL. [Optional; required for specific queries only]
- corr - Logical. If TRUE (default), spelling errors are corrected (only) in the specific and infraspecific epithets prior to taxonomic standardization. [Optional; required for specific queries only]
[Luis Cayuela & Anke Stein, 2017]
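The arguments above can be combined as in the following sketch. It assumes that the Taxonstand package is installed and that The Plant List service it queries is reachable; `standardize_names()` is a hypothetical wrapper introduced only for illustration.

```r
# Sketch only: requires the Taxonstand package and network access.
# standardize_names() is a hypothetical wrapper, not a Taxonstand function.
standardize_names <- function(plant_names) {
  Taxonstand::TPL(
    splist = plant_names,   # full names: genus plus specific epithet
    infra  = TRUE,          # use infraspecific epithets when matching
    corr   = TRUE           # correct spelling errors in the epithets
  )
}

# Example call (not run here):
# result <- standardize_names(c("Abies alba", "Achillea coarctata"))
# result[, c("Taxon", "Taxonomic.status", "New.Genus", "New.Species", "Typo")]
```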
Retrieval of id from Wikidata using WikidataR
The ids of all the binomial names were retrieved from Wikidata. The function find_item() retrieves a set of Wikidata items whose aliases or descriptions match a particular search term. Its usage is shown below:
Arguments:
- search_term - a term to search for.
- language - the language to return the labels and descriptions in; this should consist of an ISO language code. Set to "en" by default.
- limit - the number of results to return; set to 10 by default.
- ... further arguments to pass to httr's GET.
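A minimal sketch of such a lookup is given here. It assumes that the WikidataR package is installed, that network access is available, and that each element of the returned list carries an `id` field; `get_wikidata_id()` is a hypothetical helper, not part of WikidataR.

```r
# Sketch only: requires the WikidataR package and network access.
# get_wikidata_id() is a hypothetical helper, not a WikidataR function.
get_wikidata_id <- function(name) {
  hits <- WikidataR::find_item(name, language = "en", limit = 1)
  # Assumption: each hit is a list with an `id` element (for example "Q...").
  if (length(hits) == 0) NA_character_ else hits[[1]]$id
}

# Example call over several names (not run here):
# ids <- vapply(c("Abies alba", "Acacia caven"), get_wikidata_id, character(1))
```

Returning `NA` for names without a match keeps the result the same length as the input, so it can be bound back to the table of plant names.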
Methods
Resolution of binomial names using the taxize package
This was the first step taken in the project. The names were obtained from EssOilDB v1.0 in csv format [See Appendix A]. This was the input file ("a") for this step.
Resolution of binomial names using the Taxonstand package
Taxonstand relies on The Plant List database, which taxize does not query, so this step was carried out to resolve the names against that database as well. The input file was the same as in the previous step.
Normalization of binomial names using GBIF web interface
1. A separate Microsoft Excel document containing a single column with only the plant names was created.
2. The web page https://www.gbif.org/en/tools/species-lookup was used during this step.
3. A screenshot of the web page is shown below:
4. The newly created document was uploaded into the space provided (on the web page) after renaming the header as “scientificName”.
5. After submission, the following is displayed on screen:
6. The kingdom “Plantae” was selected and the “MATCH TO GBIF BACKBONE” button was clicked.
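The upload file for steps 1 and 4 can also be produced from R. The following sketch assumes that the lookup tool accepts a CSV file with a single `scientificName` column; the sample names and the output file name are hypothetical.

```r
# Hypothetical sample; in practice this would be the plant name column of the input file.
plant_names <- c("Abies alba", "Acacia caven", "Ajuga austro-iranica")

# One column, with the header expected by the species lookup tool.
upload <- data.frame(scientificName = unique(plant_names),
                     stringsAsFactors = FALSE)

# Write to a temporary location; the file name is an assumption.
out_file <- file.path(tempdir(), "gbif_species_lookup.csv")
write.csv(upload, out_file, row.names = FALSE, fileEncoding = "UTF-8")
```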
Retrieval of the synonyms of the plant names
The following procedure was followed in order to retrieve the synonyms:
1. The normalized plant names were used during this step.
2. The taxize package has functions which can obtain the id and the synonyms for the respective taxon.
3. The Catalogue of Life database was used in this step as it contained most of the taxa.
4. The taxize functions which deal with this database are:
a) get_colid()
b) synonyms()
Fig 14 shows the code run for obtaining the synonyms of one of the plants, Abies alba.
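The two functions named above can be chained as in the following sketch. It is not the code of Fig 14: it assumes network access and an older taxize release that still provides get_colid() and Catalogue of Life support in synonyms(), and `get_col_synonyms()` is a hypothetical wrapper.

```r
# Sketch only: requires network access and an older taxize release that
# still provides get_colid() and Catalogue of Life support in synonyms().
# get_col_synonyms() is a hypothetical wrapper, not a taxize function.
get_col_synonyms <- function(name) {
  col_id <- taxize::get_colid(name)   # look up the Catalogue of Life id
  taxize::synonyms(col_id)            # retrieve synonyms for that id
}

# Example call (not run here):
# get_col_synonyms("Abies alba")
```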
Retrieval of common names of plants
The following steps were followed in order to obtain the common names:
1. The normalized names were retrieved from the file and were encoded with UTF-8 (to ensure that the special characters are retained in the data during processing). The code used was as follows:
2. The wikitaxa package in R was used during this step. The code used is shown below:
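The two steps above can be sketched as follows. This is an illustration rather than the original code: `to_utf8()` and `get_common_names()` are hypothetical helpers, the second one needs the wikitaxa package and network access, and it assumes that the result of wt_wikipedia() contains a `common_names` element.

```r
# Step 1: make sure the names are marked as UTF-8 so that special
# characters survive later processing. to_utf8() is a hypothetical helper.
to_utf8 <- function(x) enc2utf8(as.character(x))

# Step 2 (sketch only): look up common names on Wikipedia via wikitaxa.
# get_common_names() is a hypothetical wrapper, not a wikitaxa function.
get_common_names <- function(name) {
  res <- wikitaxa::wt_wikipedia(name)
  res$common_names   # assumed element of the returned list
}

# Example call (not run here):
# get_common_names(to_utf8("Abies alba"))
```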
Extraction of Wiki id for the plants
The Wiki id corresponding to each taxon was retrieved using the R platform. The column containing scientific names was extracted from the GBIF output file and encoded in UTF-8 format. The code used is as follows:
Classification of binomial names based on their status
A) The following code was run on R platform to extract the information from the GBIF output file - a:
B) The following code was used to determine whether the taxonomic name is a synonym or not:
C) The following code was used to determine whether the taxonomic name is accepted or not:
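Steps B and C amount to comparing the GBIF `status` column against fixed values. The following is a minimal sketch on a small data frame whose rows are copied from the GBIF sample table in this report; in practice the data frame would be read from the GBIF output file, and the object names are assumptions.

```r
# Three rows taken from the GBIF sample output; the real data frame would
# come from read.csv() on the GBIF output file.
gbif <- data.frame(
  verbatimScientificName = c("Abies alba", "Acacia caven", "Achillea beibersteinii"),
  status                 = c("ACCEPTED", "SYNONYM", "DOUBTFUL"),
  stringsAsFactors       = FALSE
)

is_synonym  <- gbif$status == "SYNONYM"    # TRUE where the name is a synonym
is_accepted <- gbif$status == "ACCEPTED"   # TRUE where the name is accepted

print(data.frame(name = gbif$verbatimScientificName, is_synonym, is_accepted))
```

Names that are neither accepted nor synonyms, such as DOUBTFUL entries, are FALSE in both vectors and can be reviewed separately.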
RESULTS AND DISCUSSION
Resolution of Binomial Names Using the Taxize Package
A table containing the results was obtained after running the function gnr_resolve(); a sample of which is given below:
Row | user_supplied_name | submitted_name | matched_name | data_source_title | score |
1 | Abies alba | Abies alba | Abies alba | NCBI | 0.988 |
2 | Abies borisii-regis | Abies borisii-regis | Abies borisii-regis | NCBI | 0.988 |
3 | Abies cephalonica | Abies cephalonica | Abies cephalonica | NCBI | 0.988 |
4 | Abies sachalinensis | Abies sachalinensis | Abies sachalinensis | NCBI | 0.988 |
5 | Acacia caven | Acacia caven | Acacia caven | Freebase | 0.988 |
6 | Acacia nuperrima | Acacia nuperrima | Acacia nuperrima | NCBI | 0.988 |
7 | Acacia nuperrima | Acacia nuperrima | Acacia nuperrima | NCBI | 0.988 |
8 | Acalypha segetalis | Acalypha segetalis | Acalypha segetalis | EOL | 0.988 |
9 | Achillea abrotanoides | Achillea abrotanoides | Achillea abrotanoides | NCBI | 0.988 |
10 | Achillea ageratum | Achillea ageratum | Achillea ageratum | NCBI | 0.988 |
Resolution of Binomial Names Using the Taxonstand Package
A table containing the results was obtained after running the function TPL(). The table contained the following columns:
- Taxon
- Genus
- Hybrid.marker
- Species
- Abbrev
- Infraspecific.rank
- Infraspecific
- Authority
- ID
- Plant.Name.Index
- TPL.version
- Taxonomic.status
- Family
- New.Genus
- New.Hybrid.marker
- New.Species
- New.Infraspecific.rank
- New.Infraspecific
- New.Authority
- New.ID
- New.Taxonomic.status
- Typo
- WFormat
- Higher.level
- Date
Some of the important columns are shown below:
Taxon | Taxonomic.status | New.Genus | New.Species | New.Authority | New.Taxonomic.status | Typo |
Abies alba | Accepted | Abies | alba | Mill. | Accepted | FALSE |
Abies borisii-regis | Accepted | Abies | borisii-regis | Mattf. | Accepted | FALSE |
Abies cephalonica | Accepted | Abies | cephalonica | Loudon | Accepted | FALSE |
Abies sachalinensis | Accepted | Abies | sachalinensis | (F.Schmidt) Mast. | Accepted | FALSE |
Acacia caven | Accepted | Acacia | caven | (Molina) Molina | Accepted | FALSE |
Acacia nuperrima | Accepted | Acacia | nuperrima | Baker f. | Accepted | FALSE |
Acacia nuperrima | Accepted | Acacia | nuperrima | Baker f. | Accepted | FALSE |
Acalypha segetalis | Accepted | Acalypha | segetalis | Müll.Arg. | Accepted | FALSE |
Achillea abrotanoides | Accepted | Achillea | abrotanoides | (Vis.) Vis. | Accepted | FALSE |
Achillea coarctata | Accepted | Achillea | coarctata | Poir. | Accepted | FALSE |
Normalization of Binomial Names Using GBIF Web Interface
The following image (Fig. 21) shows a sample of the data obtained after step (6) mentioned in Section 5.2.3:
1. A csv file containing results was obtained by selecting the option - “Generate CSV” which is displayed at the end of the results page.
2. The resulting file contains the following columns:
- occurrenceId
- verbatimScientificName (user-submitted name)
- scientificName (name existing in the database)
- key (unique number assigned to the particular species on GBIF)
- matchType (3 levels of result - EXACT, FUZZY, HIGHERRANK)
  - EXACT means the name exactly matches the entry in the database
  - FUZZY indicates entries that may be misspelt
  - HIGHERRANK implies that the specific epithet of the entry is not recognized (in other words, only the genus is recognized)
- confidence (expressed as a percentage)
- status (3 levels of result - ACCEPTED, SYNONYM or DOUBTFUL)
  - ACCEPTED: treated as accepted
  - DOUBTFUL: treated as accepted, but doubtful whether this is correct
  - SYNONYM: a general synonym, the exact type is unknown
- rank (the highest rank recognized)
- kingdom
- phylum
- class
- order
- family
- genus
- species
Some of the important columns are shown below:
verbatimScientificName | scientificName | key | matchType | status |
Abies alba | Abies alba Mill. | 2685484 | EXACT | ACCEPTED |
Abies borisii-regis | Abies borisii-regis Mattf. | 2685519 | EXACT | ACCEPTED |
Abies cephalonica | Abies cephalonica Loudon | 2685326 | EXACT | ACCEPTED |
Abies sachalinensis | Abies sachalinensis Mast. | 2685437 | EXACT | ACCEPTED |
Acacia caven | Acacia caven (Molina) Molina | 2979244 | EXACT | SYNONYM |
Acacia nuperrima | Acacia nuperrima Baker f. | 2980107 | EXACT | ACCEPTED |
Acalypha segetalis | Acalypha segetalis Müll.Arg. | 3056915 | EXACT | ACCEPTED |
Achillea ageratum | Achillea ageratum L. | 3120391 | EXACT | ACCEPTED |
Achillea beibersteinii | Achillea beibersteinii Afan. | 7400456 | EXACT | DOUBTFUL |
Achillea biebersteinii | Achillea biebersteinii C.Afan. | 3120276 | EXACT | SYNONYM |
Ajuga austro-iranica | Ajuga austroiranica Rech.f. | 3888049 | FUZZY | ACCEPTED |
Some duplications and binomials which were misspelt could be identified using Table 4. The different classes of the 'status' were analyzed.
The names that were shown to have FUZZY matchType were rectified according to the database entry. For example, the last entry is shown to be FUZZY; in this case, there was an additional "-" in the middle of the specific epithet.
[Refer to Appendix B for the GBIF output file and Appendix C for the file containing normalized names]
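The rectification of FUZZY matches described above can be sketched in base R. The two rows are copied from the GBIF sample table; taking the first two words of `scientificName` as the corrected binomial is an assumption that holds for plain species names but not for hybrids or infraspecific taxa.

```r
# Two rows taken from the GBIF sample output.
gbif <- data.frame(
  verbatimScientificName = c("Abies alba", "Ajuga austro-iranica"),
  scientificName         = c("Abies alba Mill.", "Ajuga austroiranica Rech.f."),
  matchType              = c("EXACT", "FUZZY"),
  stringsAsFactors       = FALSE
)

# Assumption: the first two words of scientificName form the binomial;
# everything after them is the authority.
binomial <- sub("^(\\S+\\s+\\S+).*$", "\\1", gbif$scientificName)

# Keep the submitted name unless GBIF reported a fuzzy match.
normalized <- ifelse(gbif$matchType == "FUZZY",
                     binomial,
                     gbif$verbatimScientificName)
print(normalized)
```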
Retrieval of the Synonyms of the Plant Names
[Refer to Appendix D for the complete file]
Retrieval of Common Names of Plants
After performing the steps mentioned in Section 5.2.5., common names of 378 plants were obtained out of a total of 1838 plants. Fig. 23 shows the common names of ten plants.
[Refer to Appendix D for the complete file]
Extraction of Wiki id for the Plants
After performing the steps mentioned in Section 5.2.6., Wiki ids were obtained for 1710 plants out of a total of 1838 plants. Fig. 24 shows the first 50 plant names with their respective Wiki ids:
[Refer to Appendix D for the complete file]
Classification of Binomial Names Based on Their Status
A) A vector, representing whether the plant name is a synonym or not, was obtained. A sample of this is shown below:
B) A vector, representing whether the plant name is accepted or not, was obtained. A sample of this is shown below:
CONCLUSIONS
Through this project, I was able to gain a complete understanding of the use of RStudio and R packages. This also helped me in understanding the management of a database. The errors identified include duplications, the introduction of special characters in binomial names, the presence of hybrids in the data and typographical errors. After the removal of inadvertent special characters, 99 typographical errors were identified and rectified. In total, about 51 duplications were identified. The duplicate entries have to be merged, and the profile keys related to the duplicated entries of a plant should be mapped to the same plant. Some of the issues that are yet to be resolved are:
- There were 7 hybrids with incomplete information.
- Some of the plants have aff. in their names, which implies that the species has affinity towards a particular species (for example, Mentha aff. Rotundifolia has affinity towards Mentha rotundifolia).
- Some entries such as Cinnamomum fragrans and Serotinocarpum insignis, are not found in any database.
- The suffix spp. is added to some genus names, which leaves the species unspecified (for example, Xanthostemon spp.).
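Entries affected by the unresolved issues above can at least be flagged automatically for manual review. The following is a minimal sketch: the first two entries are the examples quoted in the list, the hybrid entry is a hypothetical illustration, and the patterns (including the use of " x " as the hybrid marker) are assumptions.

```r
# The first two entries are quoted from the list above; the hybrid entry
# "Mentha x piperita" is a hypothetical illustration.
entries <- c("Mentha aff. Rotundifolia", "Xanthostemon spp.",
             "Mentha x piperita", "Abies alba")

flags <- data.frame(
  name       = entries,
  has_aff    = grepl("\\baff\\.", entries, perl = TRUE),   # "aff." qualifier
  has_spp    = grepl("\\bspp?\\.", entries, perl = TRUE),  # "sp." or "spp." suffix
  has_hybrid = grepl(" x ", entries, fixed = TRUE),        # assumed hybrid marker
  stringsAsFactors = FALSE
)

# Entries that need manual review before they can be normalized.
needs_review <- flags$name[flags$has_aff | flags$has_spp | flags$has_hybrid]
print(needs_review)
```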
ACKNOWLEDGEMENTS
The success and final outcome of this project required a lot of guidance and assistance from many people. I am grateful to Indian Academy of Sciences, Indian National Science Academy and The National Academy of Sciences, India for providing me this opportunity to carry out this project.
I would like to extend my gratitude to Mrs. Vineeta Lamba and Mr. Manish Kumar for introducing me to the project and for providing a strong foundation to work on. I am grateful to all the faculty members, research scholars and other employees at NIPGR for their assistance during the course of this project.
I am also very grateful to Department of Biotechnology, Ramaiah Institute of Technology for the guidance and encouragement provided by them during the application process of The Academies' Summer Research Fellowship Programme 2019.
APPENDICES
Appendix A:
Link to initial data: https://github.com/gilienv/EssOilDB/blob/master/v1.0/essoildb.plantdata.csv
Appendix B:
Link to the file containing normalized names and observed errors: https://github.com/gilienv/EssOilDB/blob/master/tables/plant/normalized_names.csv
Appendix C:
Link to the file containing normalized names and observed errors: https://github.com/gilienv/EssOilDB/blob/master/tables/plant/normalized_names.csv
Appendix D:
Link to the file containing the details such as Wiki ID, common names and synonyms: https://github.com/gilienv/EssOilDB/blob/master/tables/plant/details.txt
References
- https://searchsqlserver.techtarget.com/definition/normalization
- Kumari S, Pundhir S, Priya P, Jeena G, Punetha A, Chawla K, Jafaree Z, Mondal S and Yadav G (2014). EssOilDB: A database of essential oils reflecting terpene composition and variability in the plant kingdom. Database (DOI: 10.1093/database/bau120)
- Allaire, J. (2012). RStudio: integrated development environment for R. Boston, MA, 770.
- Luis Cayuela & Anke Stein (2017). https://CRAN.R-project.org/package=Taxonstand
- Kalwij, J. M. (2012). Review of ‘The Plant List, a working list of all plant species’. Journal of Vegetation Science, 23(5), 998-1002.
- Scott Chamberlain and Ethan Welty (2018). wikitaxa: Taxonomic Information from 'Wikipedia'. R package version 0.3.0. https://CRAN.R-project.org/package=wikitaxa
- Vrandečić, D., & Krötzsch, M. (2014). Wikidata: a free collaborative knowledge base.
- Oliver Keyes, Serena Signorelli, Christian Graul and Mikhail Popov (2017). WikidataR: API Client Library for 'Wikidata'. R package version 1.4.0. https://CRAN.R-project.org/package=WikidataR
- https://www.gbif.org/en/what-is-gbif
- Scott Chamberlain (2017). https://ropenscilabs.github.io/taxize-book/
Source
- Fig 1: nipgr.ac.in/Essoildb/
- Table 1: Chamberlain, S. A., & Szöcs, E. (2013). taxize: taxonomic search and retrieval in R. F1000Research, 2.
- Fig 3: http://www.theplantlist.org/
- Fig 5: https://www.wikidata.org/wiki/Wikidata:Main_Page
- Fig 6: https://www.gbif.org/
- Fig 7: Scott Chamberlain (2017). https://ropenscilabs.github.io/taxize-book/
- Fig 8: Luis Cayuela, Anke Stein and Jari Oksanen (2017). Taxonstand: Taxonomic Standardization of Plant Species Names. R package version
- Fig 9: Oliver Keyes, Serena Signorelli, Christian Graul and Mikhail Popov (2017). WikidataR: API Client Library for 'Wikidata'. R package version 1.4.0. https://CRAN.R-project.org/package=WikidataR