<?xml version="1.0" encoding="UTF-8"?>
<resource xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4">
 <titles>
  <title>Danube fish occurrence database</title>
 </titles>
 <descriptions>
  <description descriptionType="Other"><![CDATA[<p>This database compiles and standardizes fish occurrence datasets from federal agencies, research institutes, and conservation organizations, integrating data from sources such as the Global Biodiversity Information Facility (GBIF), the Joint Danube Surveys, and the European Fish Index. It contains 133,103 occurrence records across 116 fish species, representing 30 families and 17 orders, with a temporal range from 1856 to 2024, organized into 38 columns. In total, 506,290 entries were initially collected and then subjected to quality checks and cleaning procedures. An R package, <a href="https://github.com/ytorres-cambas/danubeoccurR">danubeoccurR</a>, was developed to streamline data collation, formatting, and quality control. Two additional R packages, <a href="https://github.com/glowabio/hydrographr">hydrographr</a> and <a href="https://github.com/AnthonyBasooma/specleanr">specleanr</a>, were incorporated into the workflow to aid in data manipulation and geospatial analysis.</p>
<p>A visualization of the spatial distribution of records is available at <a href="https://geo.igb-berlin.de/layers/geonode:danube4all_fish_occurrence_records">https://geo.igb-berlin.de/layers/geonode:danube4all_fish_occurrence_records</a>.</p>
<p><strong>GBIF data citation</strong>: GBIF.org (19 October 2024) GBIF Occurrence Download <a href="https://doi.org/10.15468/dl.yug2ad">https://doi.org/10.15468/dl.yug2ad</a>.</p>
<p><strong>Taxonomic Validation</strong>: The taxonomic names of species were verified against FishBase to ensure compliance with the most up-to-date fish taxonomy.<br /><br /><strong>Spatial Distribution Validation</strong>: The spatial distribution of species occurrences was assessed to ensure that the recorded locations were geographically plausible and consistent with known habitats for each species. To achieve this, the dataset was first compared against environmental maps to identify and flag potential environmental outliers, and subsequently cross-checked by the data provider.<br /><br /><strong>Temporal Validation</strong>: The temporal distribution of occurrences was examined to check for inconsistencies or improbable records.<br /><br /><strong>Data Completeness and Consistency</strong>: The dataset was examined for missing values, duplicates, and inconsistencies in key fields such as coordinates, dates, and species names. Gaps in data were identified, and missing or inconsistent records were flagged for review and potential correction.</p>
<div><strong>Data Access and Format: </strong>The dataset is available in a standard tabular format (CSV) using Darwin Core-compliant terminology to ensure compatibility with biodiversity databases. Users should refer to the metadata file for a detailed description of the column names. For convenience, a custom function named <em>split_and_save_csv()</em> is provided in <a href="https://github.com/ytorres-cambas/danubeoccurR">danubeoccurR</a> to split the occurrence dataset into independent datasets.<br /><br /><strong>Geospatial Considerations: </strong>Species occurrences are georeferenced based on available locality information. Spatial precision varies among records and may be lower for historical occurrences, so users should apply spatial filtering suited to the intended analysis. For example, the coordinate uncertainty provided for records sourced from GBIF can help determine whether a record is suitable for a given analysis. Additionally, the function <em>snap_points_on_map()</em> in <a href="https://github.com/ytorres-cambas/danubeoccurR">danubeoccurR</a> allows users to manually adjust occurrence points for greater precision.<br /><br /><strong>Taxonomic Standardization: </strong>Users are advised to cross-check species names with updated taxonomic databases if taxonomic revisions occur after the dataset's publication.<br /><br /><strong>Data Quality and Potential Limitations: </strong>While efforts were made to standardize and clean the data, users should consider potential sources of bias, including variation in sampling effort, taxonomic misidentifications, and incomplete historical records. Some records have been flagged as environmental outliers based on inconsistencies between species occurrence and expected environmental conditions. These flagged records should be reviewed carefully and may require further investigation or validation before inclusion in analyses.</div>
<div>&nbsp;</div>
<p>Authors: Yusdiel Torres-Cambas, Andr&aacute;s Ambrus, Mikl&oacute;s B&aacute;n, B&aacute;lint B&aacute;n&oacute;, Anthony Basooma, Vanessa Bremerich, Florian Borgwardt, Ma&scaron;a Čarf, Irina Cernisencu, Gorčin Cvijanović, Istv&aacute;n Czegl&eacute;di, Sami Domisch, Tibor Erős, Zolt&aacute;n Feh&eacute;r, Vivien F&uuml;st&ouml;s, Juergen Geist, Thomas Hein, Milica Jaćimović, Sonja C. J&auml;hnig, B&eacute;la Kiss, Maro&scaron; Kubala, Klaudija Lebar, Borislava Kostadinova Margaritova, Matej Marusic, Paul Meulenbroek, Stoyan Dobrev Mihov, Attila Mozs&aacute;r, Zolt&aacute;n M&uuml;ller, Christoffer Nagel, Iulian Nichersu, Du&scaron;an Nikolić, Sandi Orlic, Joachim Pander, Polona Pengal, Marina Piria, L&aacute;szl&oacute; Poly&aacute;k, B&aacute;lint Preiszner, Simon Rusjan, M&aacute;rton Sallai, Zolt&aacute;n Sallai, P&eacute;ter S&aacute;ly, Andrea Samu, Brigitte Sasano, Astrid Schmidt-Kloiber, Andr&aacute;s Sevcsik, Marija Smederevac-Lalić, Andr&aacute;s Speczi&aacute;r, Twan Stoffers, Zolt&aacute;n Szal&oacute;ky, Ren&aacute;ta Szita, G&aacute;bor Tak&aacute;cs, P&eacute;ter Tak&aacute;cs, Maxim Teichert, Milcho Todorov, Bal&aacute;zs T&oacute;th, Theodora Trichkova, Damir Valić, Zolt&aacute;n Vit&aacute;l, Martin Tschikof</p>]]></description>
 </descriptions>
 <resourceType resourceTypeGeneral="Dataset"></resourceType>
 <contributors>
  <contributor contributorType="ContactPerson">
   <contributorName nameType="Personal">Yusdiel Torres-Cambas</contributorName>
  </contributor>
 </contributors>
</resource>
