Commit c847d826 authored by Willem ter Berg

Split several sections into separate chapters to increase readability

parent 9a4fbb16
<component name="ProjectDictionaryState">
<dictionary name="w.terberg">
<words>
<w>dataset</w>
</words>
</dictionary>
</component>
\ No newline at end of file
DCAT to CKAN mapping
===================================================
In the tables below the exact mapping of DCAT-AP-DONL properties to their CKAN schema counterparts is shown.
.. table::
:widths: 50 50
============================== ======================================
DCAT Property CKAN Property
============================== ======================================
Dataset.identifier Dataset.identifier
Dataset.description Dataset.notes
Dataset.title Dataset.name
Dataset.language Dataset.language
Dataset.modified Dataset.modified
Dataset.contactPoint Dataset.contact_point_name
\ Dataset.contact_point_email
\ Dataset.contact_point_website
\ Dataset.contact_point_phone
\ Dataset.contact_point_address
\ Dataset.contact_point_title
Dataset.distribution Dataset.resources
Dataset.keyword Dataset.tags
Dataset.publisher Dataset.publisher
Dataset.theme Dataset.theme
Dataset.landingPage Dataset.url
Dataset.spatial Dataset.spatial_scheme
\ Dataset.spatial_value
Dataset.temporal Dataset.temporal_label
\ Dataset.temporal_start
\ Dataset.temporal_end
Dataset.authority Dataset.authority
Dataset.accessRights Dataset.access_rights
Dataset.conformsTo Dataset.conforms_to
Dataset.documentation Dataset.documentation
Dataset.frequency Dataset.frequency
Dataset.hasVersion Dataset.has_version
Dataset.isVersionOf Dataset.is_version_of
Dataset.otherIdentifier Dataset.alternative_identifier
Dataset.provenance Dataset.provenance
Dataset.relatedResource Dataset.related_resource
Dataset.releaseDate Dataset.issued
Dataset.sample Dataset.sample
Dataset.source Dataset.source
Dataset.version Dataset.version
Dataset.versionNotes Dataset.version_notes
Dataset.grondslag Dataset.legal_foundation_ref
\ Dataset.legal_foundation_uri
\ Dataset.legal_foundation_label
Dataset.datasetStatus Dataset.dataset_status
Dataset.datePlanned Dataset.date_planned
Distribution.accessURL Resource.url
Distribution.description Resource.description
Distribution.format Resource.format
Distribution.license Resource.license
Distribution.byteSize Resource.size
Distribution.checksum Resource.hash
\ Resource.hash_algorithm
Distribution.documentation Resource.documentation
Distribution.downloadURL Resource.download_url
Distribution.language Resource.language
Distribution.linkedSchemas Resource.linked_schemas
Distribution.mediaType Resource.mimetype
Distribution.releaseDate Resource.release_date
Distribution.rights Resource.rights
Distribution.status Resource.status
Distribution.title Resource.name
Distribution.modified Resource.modification_date
CatalogRecord.modified Dataset.metadata_modified
CatalogRecord.conformsTo Dataset.conforms_to
CatalogRecord.changeType Dataset.changetype
CatalogRecord.listingDate Dataset.metadata_created
CatalogRecord.description Dataset.notes
CatalogRecord.language Dataset.metadata_language
CatalogRecord.sourceMetadata Dataset.source_catalog
CatalogRecord.title Dataset.title
============================== ======================================
......@@ -4,12 +4,11 @@ CKANEXT-DCATDONL
The CKAN extension that implements the DCAT-AP-DONL metadata standard into CKAN.
.. toctree::
:maxdepth: 3
:maxdepth: 2
:caption: Table of Contents
summary
installation
usage
ckan_schema
dcat_ckan_mapping
schema
plugin_structure
Setting up the background process
===================================================================================================
In order for the ckanext-dcatdonl plugin to function properly, a background process must run at
least once a day. This background process retrieves the latest versions of the valuelists and saves
them locally. Run it by executing the following command once a day, for example via a cron job.
.. code-block:: bash
python /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/ValuelistUpdater.py
Ensure that the Python script has read and write access to the following directory and its contents:
.. code-block:: bash
/usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources
The extension can function without the background process; however, the valuelists that are used
as part of the DCAT-AP-DONL metadata standard will then never be updated.
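
The access requirement above can be checked before the job is scheduled. The following is a minimal
sketch and not part of the plugin: the function name :code:`has_read_write_access` is hypothetical,
and the path is the one named in this chapter. It assumes the check runs as the same user as the
cron job.

.. code-block:: python

   import os

   # Directory named in this chapter; adjust it if CKAN is installed elsewhere.
   RESOURCES_DIR = "/usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources"


   def has_read_write_access(directory):
       """Return True if the directory and each entry directly in it is readable and writable."""
       mode = os.R_OK | os.W_OK
       if not os.path.isdir(directory) or not os.access(directory, mode):
           return False
       return all(
           os.access(os.path.join(directory, name), mode)
           for name in os.listdir(directory)
       )


   if __name__ == "__main__":
       print(has_read_write_access(RESOURCES_DIR))
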
Installing the ckanext-dcatdonl plugin
===================================================================================================
Follow the steps listed below to install and activate the ckanext-dcatdonl extension in CKAN.
1. Activate your CKAN virtual environment and install the plugin:
.. code-block:: bash
. /usr/lib/ckan/default/bin/activate
pip install -e git+https://gitlab.textinfo.nl/opensource/ckanext-dcatdonl.git#egg=ckanext-dcatdonl
2. Edit your CKAN .ini configuration file and add `dcatdonl` to the `ckan.plugins` setting:
.. code-block:: ini
ckan.plugins = ... dcatdonl
3. In the same file, add (or change) the `licenses_group_url` property in the `[app:main]` section
so that it reads:
.. code-block:: ini
licenses_group_url = file:///usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/overheid_license.json
4. Restart apache2
You have now successfully installed the `ckanext-dcatdonl` plugin.
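
Step 2 can be verified with a short script. This is a sketch under assumptions, not a tool shipped
with the plugin: the function name :code:`plugin_enabled` is hypothetical, and the code uses the
Python 3 standard library (:code:`configparser`) for convenience, although CKAN itself runs on
Python 2.7.

.. code-block:: python

   import configparser


   def plugin_enabled(ini_text, plugin="dcatdonl"):
       """Return True if `plugin` is listed in ckan.plugins of the [app:main] section."""
       # RawConfigParser leaves %(here)s style placeholders in CKAN .ini files untouched.
       parser = configparser.RawConfigParser()
       parser.read_string(ini_text)
       plugins = parser.get("app:main", "ckan.plugins", fallback="")
       return plugin in plugins.split()


   if __name__ == "__main__":
       sample = "[app:main]\nckan.plugins = stats text_view dcatdonl\n"
       print(plugin_enabled(sample))

To check a real installation, read the contents of your own .ini file and pass them as
:code:`ini_text`.
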
Requirements
===========================================
The plugin was developed with the following versions in mind.
CKAN
-------------------------------------------
The plugin functions correctly with these CKAN versions:
.. list-table::
:widths: 25 75
:header-rows: 1
* - Version
- Reference
* - `2.7.3`
- http://docs.ckan.org/en/ckan-2.7.3/
* - `2.7.4`
- http://docs.ckan.org/en/2.7/
* - `2.8.0`
- http://docs.ckan.org/en/2.8/
The plugin likely functions correctly with earlier and later versions as well; however, only the
CKAN versions listed above have been tested and confirmed to work.
PostgreSQL
-------------------------------------------
CKAN requires PostgreSQL version :code:`9.2` or higher.
Python
-------------------------------------------
Both CKAN itself and the ckanext-dcatdonl plugin are written in :code:`Python 2.7.x`. The CKAN
host must therefore have this version of Python installed.
Installation
===========================================
===================================================================================================
Follow the steps listed below to install and activate the ckanext-dcatdonl extension into CKAN.
This chapter covers all the information required to install the ckanext-dcatdonl plugin into a CKAN
installation.
1. With your CKAN virtual environment activated:
.. toctree::
:maxdepth: 3
:caption: Contents
.. code-block:: bash
. /usr/lib/ckan/default/bin/activate
pip install -e git+https://gitlab.textinfo.nl/opensource/ckanext-dcatdonl.git#egg=ckanext-dcatdonl
2. Edit your CKAN .ini configuration file and add the following
.. code-block:: ini
ckan.plugins = ... dcatdonl
3. In the same file, add (or change) the `licenses_group_url` property in the `[app:main]` section to
.. code-block:: ini
licenses_group_url = file:///usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/overheid_license.json
4. Restart apache2
You have now successfully installed the `ckanext-dcatdonl` plugin
Background process
--------------------------------------------
In order for the ckanext-dcatdonl plugin to function properly, a background process must run at least once a day. This
background process retrieves the latest versions of the valuelists and saves these locally. This process is run by
executing the following command once a day via a CRON job for example.
.. code-block:: bash
python /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/ValuelistUpdater.py
Ensure that the python script has READ and WRITE access to the following directory and its contents
.. code-block:: bash
/usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources
The extension can function without the background process running, however this means that the valuelists that are used
as part of the DCAT-AP-DONL metadata standard will never be updated.
Requirements
----------------------------------------------
The plugin was developed with the following versions in mind. While it is likely that the plugin will function at
earlier versions, no guarantees can be given for such use cases.
.. table::
:widths: 50 50
+------------+---------+
| Software | Version |
+============+=========+
| CKAN | 2.7.3 |
+------------+---------+
| PostgreSQL | 9.2 |
+------------+---------+
| Python | 2.7 |
+------------+---------+
installation-requirements
installation-plugin
installation-backgroundprocess
Plugin structure
===================================================
The structure of the ckanext-dcatdonl plugin can best be described by the model below. This model identifies all the
components and the relationships these components have. The entrypoint components have been marked grey.
The structure of the ckanext-dcatdonl plugin can best be described by the model below. This model
identifies all the components and the relationships these components have. The entrypoint
components have been marked grey.
.. image:: pluginstructure.png
.. image:: _static/pluginstructure.png
Dataset
=====================================================
.. list-table::
:widths: 22 45 33
:header-rows: 1
* - Property
- Description
- Validation
* - identifier
- A global identifier that identifies the dataset
- Required, String, Is URI
* - alternate_identifier
- Alternate identifiers that identify the dataset
- Optional, List, Are URIs
* - language
- The languages used for the data found in the dataset
- Required, From :code:`donl:language`
* - authority
- Entity that is responsible for the contents of the dataset
- Required, String, From :code:`donl:organizations`
* - publisher
- Entity responsible for maintenance and publication of the dataset
- Required, String, From :code:`donl:organizations`
* - contact_point_email
- Email of the contact point
- Optional, String
* - contact_point_address
- Address of the contact point
- Optional, String
* - contact_point_name
- Name of the contact point
- Required, String
* - contact_point_phone
- Phone number of the contact point
- Optional, String
* - contact_point_website
- Web address of the contact point
- Optional, String
* - contact_point_title
- Title of the contact point, if it describes a person
- Optional, String
* - access_rights
- The level of openness of the dataset
- Optional, String, From :code:`overheid:openbaarheidsniveau`
* - url
- Webpage that provides additional information about the dataset, its metadata or its authority
- Optional, String, Is URL
* - conforms_to
- Standards the dataset conforms to
- Optional, List, Are URIs
* - related_resource
- Resources related to the dataset
- Optional, List, Are URIs
* - source
- Dataset on which this dataset is based
- Optional, List, Are URIs
* - version
- The version of the dataset
- Optional, String
* - version_notes
- Version notes of the dataset
- Optional, List, Strings
* - issued
- Date and time on which the dataset was published
- Optional, String, :code:`yyyy-mm-ddThh:mm:ss`
* - has_version
- References to datasets which are based on this dataset
- Optional, List
* - is_version_of
- References to datasets on which this dataset is based
- Optional, List, Are URIs
* - legal_foundation_ref
- Specific reference of the legal foundation
- Optional, String
* - legal_foundation_uri
- URI of the legal foundation
- Optional, String, Is URI
* - legal_foundation_label
- Label of the legal foundation
- Optional, String
* - frequency
- How often the dataset is updated
- Optional, String, From :code:`overheid:frequency`
* - provenance
- Webpages that describe how this dataset came to be
- Optional, List, Are URLs
* - documentation
- Webpages about the dataset
- Optional, List, Are URLs
* - sample
- Sample data of the dataset
- Optional, List, Are URLs
* - license
- The license that applies to the dataset
- Required, From :code:`overheid:license`
* - title
- The title of the dataset
- Required, String
* - notes
- The description of the dataset
- Required, String
* - tags
- Keywords to describe the dataset
- Optional, List
* - metadata_language
- The language used in the metadata of the dataset
- Required, List, From :code:`donl:language`
* - theme
- One or more themes that describe the dataset
- Required, List, From :code:`overheid:taxonomiebeleidsagenda`
* - source_catalog
- The original catalog of the dataset
- Optional, From :code:`donl:catalogs`
* - changetype
- The latest action taken on the dataset
- From :code:`adms:changetype`, ckanext-dcatdonl will set the correct value for this property
* - modified
- The date and time this dataset was last modified
- Required, String, :code:`yyyy-mm-ddThh:mm:ss`
* - spatial_scheme
- The schemes of the spatial value
- Optional, List, From :code:`overheid:spatial_scheme`
* - spatial_value
- Geographical locations based on the spatial_schemes provided
- Optional, List, Validates against schemes defined in spatial_scheme
* - temporal_label
- A name for the time period
- Optional, String
* - temporal_start
- A point in time, together with temporal_end it describes a period in time
- Optional, String, :code:`yyyy-mm-ddThh:mm:ss`, must be earlier than temporal_end
* - temporal_end
- A point in time, together with temporal_start it describes a period in time
- Optional, String, :code:`yyyy-mm-ddThh:mm:ss`, must be later than temporal_start
* - dataset_status
- State of the dataset, it describes the availability of the dataset
- Optional, String, From :code:`overheid:datasetStatus`
* - date_planned
- The date and time upon which it is planned that the dataset becomes available
- Optional, String, :code:`yyyy-mm-ddThh:mm:ss`
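
The required properties in the table above can be collected into a simple pre-flight check. The
sketch below is illustrative only: the helper :code:`missing_required_fields` is hypothetical, all
values in :code:`EXAMPLE_DATASET` are placeholders rather than real valuelist entries, and the
plugin's own validators remain authoritative.

.. code-block:: python

   # Properties marked "Required" in the Dataset table above.
   REQUIRED_DATASET_FIELDS = (
       "identifier", "language", "authority", "publisher", "contact_point_name",
       "license", "title", "notes", "metadata_language", "theme", "modified",
   )

   # Placeholder values; real values must come from the referenced valuelists.
   EXAMPLE_DATASET = {
       "identifier": "http://example.org/dataset/1",
       "language": ["http://example.org/language/nl"],
       "authority": "http://example.org/organization/a",
       "publisher": "http://example.org/organization/a",
       "contact_point_name": "Example contact",
       "license": "http://example.org/license/open",
       "title": "Example dataset",
       "notes": "A description of the example dataset.",
       "metadata_language": "http://example.org/language/nl",
       "theme": ["http://example.org/theme/example"],
       "modified": "2017-12-31T13:15:00",
   }


   def missing_required_fields(dataset):
       """Return the required properties that are absent or empty, in table order."""
       return [field for field in REQUIRED_DATASET_FIELDS if not dataset.get(field)]


   if __name__ == "__main__":
       print(missing_required_fields(EXAMPLE_DATASET))
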
DCAT to CKAN mapping
===================================================
The tables below show the exact mapping of DCAT-AP-DONL properties to their CKAN schema
counterparts.
DCAT Dataset
---------------------------------------------------
.. list-table::
:widths: 50 50
:header-rows: 1
* - DCAT Property
- CKAN Property
* - Dataset.identifier
- Dataset.identifier
* - Dataset.description
- Dataset.notes
* - Dataset.title
- Dataset.name
* - Dataset.language
- Dataset.language
* - Dataset.modified
- Dataset.modified
* - Dataset.contactPoint
- Dataset.contact_point_name
* -
- Dataset.contact_point_email
* -
- Dataset.contact_point_website
* -
- Dataset.contact_point_phone
* -
- Dataset.contact_point_address
* -
- Dataset.contact_point_title
* - Dataset.distribution
- Dataset.resources
* - Dataset.keyword
- Dataset.tags
* - Dataset.publisher
- Dataset.publisher
* - Dataset.theme
- Dataset.theme
* - Dataset.landingPage
- Dataset.url
* - Dataset.spatial
- Dataset.spatial_scheme
* -
- Dataset.spatial_value
* - Dataset.temporal
- Dataset.temporal_label
* -
- Dataset.temporal_start
* -
- Dataset.temporal_end
* - Dataset.authority
- Dataset.authority
* - Dataset.accessRights
- Dataset.access_rights
* - Dataset.conformsTo
- Dataset.conforms_to
* - Dataset.documentation
- Dataset.documentation
* - Dataset.frequency
- Dataset.frequency
* - Dataset.hasVersion
- Dataset.has_version
* - Dataset.isVersionOf
- Dataset.is_version_of
* - Dataset.otherIdentifier
- Dataset.alternate_identifier
* - Dataset.provenance
- Dataset.provenance
* - Dataset.relatedResource
- Dataset.related_resource
* - Dataset.releaseDate
- Dataset.issued
* - Dataset.sample
- Dataset.sample
* - Dataset.source
- Dataset.source
* - Dataset.version
- Dataset.version
* - Dataset.versionNotes
- Dataset.version_notes
* - Dataset.grondslag
- Dataset.legal_foundation_ref
* -
- Dataset.legal_foundation_uri
* -
- Dataset.legal_foundation_label
* - Dataset.datasetStatus
- Dataset.dataset_status
* - Dataset.datePlanned
- Dataset.date_planned
DCAT Distribution
---------------------------------------------------
.. list-table::
:widths: 50 50
:header-rows: 1
* - DCAT Property
- CKAN Property
* - Distribution.accessURL
- Resource.url
* - Distribution.description
- Resource.description
* - Distribution.format
- Resource.format
* - Distribution.license
- Resource.license
* - Distribution.byteSize
- Resource.size
* - Distribution.checksum
- Resource.hash
* -
- Resource.hash_algorithm
* - Distribution.documentation
- Resource.documentation
* - Distribution.downloadURL
- Resource.download_url
* - Distribution.language
- Resource.language
* - Distribution.linkedSchemas
- Resource.linked_schemas
* - Distribution.mediaType
- Resource.mimetype
* - Distribution.releaseDate
- Resource.release_date
* - Distribution.rights
- Resource.rights
* - Distribution.status
- Resource.status
* - Distribution.title
- Resource.name
* - Distribution.modified
- Resource.modification_date
DCAT CatalogRecord
---------------------------------------------------
.. list-table::
:widths: 50 50
:header-rows: 1
* - DCAT Property
- CKAN Property
* - CatalogRecord.modified
- Dataset.metadata_modified
* - CatalogRecord.conformsTo
- Dataset.conforms_to
* - CatalogRecord.changeType
- Dataset.changetype
* - CatalogRecord.listingDate
- Dataset.metadata_created
* - CatalogRecord.description
- Dataset.notes
* - CatalogRecord.language
- Dataset.metadata_language
* - CatalogRecord.sourceMetadata
- Dataset.source_catalog
* - CatalogRecord.title
- Dataset.title
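
A mapping table such as the CatalogRecord table above can be applied as a plain lookup. The sketch
below assumes, purely for illustration, that the DCAT record is available as a flat dictionary keyed
by property name; the function :code:`catalog_record_to_ckan` is hypothetical and does not describe
how the plugin itself performs the conversion.

.. code-block:: python

   # One entry per row of the CatalogRecord table: DCAT property -> CKAN dataset property.
   CATALOG_RECORD_TO_CKAN = {
       "modified": "metadata_modified",
       "conformsTo": "conforms_to",
       "changeType": "changetype",
       "listingDate": "metadata_created",
       "description": "notes",
       "language": "metadata_language",
       "sourceMetadata": "source_catalog",
       "title": "title",
   }


   def catalog_record_to_ckan(record):
       """Rename the known CatalogRecord properties; unknown properties are dropped."""
       return {
           CATALOG_RECORD_TO_CKAN[key]: value
           for key, value in record.items()
           if key in CATALOG_RECORD_TO_CKAN
       }


   if __name__ == "__main__":
       print(catalog_record_to_ckan({"title": "Example", "listingDate": "2017-12-31T13:15:00"}))
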
Resource
=====================================================
.. list-table::
:widths: 22 45 33
:header-rows: 1
* - Property
- Description
- Validation
* - url
- The URL used to access the resource
- Required, String, Is URI
* - name
- The name of the resource
- Required, String
* - description
- A description of the resource
- Required, String
* - metadata_language
- The language used in the metadata of the resource
- Required, String, From :code:`donl:language`
* - language
- The languages used for the data found in the resource
- Required, List, From :code:`donl:language`
* - license
- The license that applies to the resource
- Required, From :code:`overheid:license`
* - format
- The format of the resource
- Required, String, From :code:`mdr:filetype_nal`
* - size
- The size of the contents of the resource in kilobytes
- Optional, Positive integer
* - download_url
- List of URLs referring to downloadable variants of the resource
- Optional, List, Are URLs
* - mimetype
- Mimetype of the resource
- Optional, String, From :code:`iana:mediatypes`
* - release_date
- The date the resource was released
- Optional, String, :code:`yyyy-mm-ddThh:mm:ss`
* - rights
- Rights that apply to the resource
- Optional, String
* - status
- Distribution status of the resource
- Optional, String, From :code:`adms:distributiestatus`
* - modification_date
- Date on which this resource was last modified
- Optional, String, :code:`yyyy-mm-ddThh:mm:ss`
* - linked_schemas
- Standards the resource applies to
- Optional, List, Are URIs
* - hash
- A hash calculated based on the contents of the resource
- Optional, String
* - hash_algorithm
- The hash algorithm used to determine the hash
- Optional, String
* - documentation
- A list of URLs that refer to documentation of the resource
- Optional, List, Are URLs
Schema validation
==================================================================================================
Outlined below are the possible validation messages that the ckanext-dcatdonl plugin can generate
based on the input it is given. The standard CKAN validation messages are not included in this
documentation.
Validation messages
--------------------------------------------------------------------------------------------------
website, email or phone is required for the contact_point
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This error occurs when the given dataset contains no value for any of
:code:`contact_point_website`, :code:`contact_point_email` or :code:`contact_point_phone`. At least
one of these three properties must be provided in order for the dataset to be considered valid.
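
The rule can be expressed as a single check. This is a minimal sketch with a hypothetical helper
name, not the plugin's own validator:

.. code-block:: python

   CONTACT_CHANNELS = (
       "contact_point_website",
       "contact_point_email",
       "contact_point_phone",
   )


   def has_contact_channel(dataset):
       """Return True if at least one of the three contact properties has a value."""
       return any(dataset.get(field) for field in CONTACT_CHANNELS)


   if __name__ == "__main__":
       print(has_contact_channel({"contact_point_email": "info@example.org"}))
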
when hash is provided, hash_algorithm must too be provided
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
when hash_algorithm is provided, hash must too be provided
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This error occurs when either the property :code:`hash` or :code:`hash_algorithm` is present, but
its counterpart is not. When either is provided, both are required.
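
A sketch of this pairing rule, using a hypothetical helper that reports which counterpart is
missing from a resource dictionary:

.. code-block:: python

   def missing_hash_counterpart(resource):
       """Return the name of the missing property, or None if the pair is consistent."""
       has_hash = bool(resource.get("hash"))
       has_algorithm = bool(resource.get("hash_algorithm"))
       if has_hash and not has_algorithm:
           return "hash_algorithm"
       if has_algorithm and not has_hash:
           return "hash"
       return None  # both present or both absent


   if __name__ == "__main__":
       print(missing_hash_counterpart({"hash": "abc123"}))
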
legal_foundation_ref must be provided when providing any of the legal_foundation_* properties
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
legal_foundation_uri must be provided when providing any of the legal_foundation_* properties
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
legal_foundation_label must be provided when providing any of the legal_foundation_* properties
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
These errors occur when some, but not all, of the legal_foundation_* properties are provided. When
any of them is given, all three are required.
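
The all-or-nothing rule can be sketched as follows; the helper name is hypothetical and the code is
an illustration rather than the plugin's implementation:

.. code-block:: python

   LEGAL_FOUNDATION_FIELDS = (
       "legal_foundation_ref",
       "legal_foundation_uri",
       "legal_foundation_label",
   )


   def missing_legal_foundation_fields(dataset):
       """Return the legal_foundation_* properties still required, or [] if none are."""
       present = [field for field in LEGAL_FOUNDATION_FIELDS if dataset.get(field)]
       if not present:
           return []  # none provided, so none are required
       return [field for field in LEGAL_FOUNDATION_FIELDS if field not in present]


   if __name__ == "__main__":
       print(missing_legal_foundation_fields({"legal_foundation_uri": "http://example.org/law"}))
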
spatial_value cannot be validated without a corresponding spatial_scheme
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
spatial_scheme must be accompanied by a spatial_value
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
These errors may occur when the request body contains a :code:`spatial_scheme` but not a
:code:`spatial_value` or vice versa. Both properties are required to provide spatial metadata. To
resolve this, provide both properties in the request body.
Spatial validation
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
This validator spans the fields :code:`spatial_scheme` and :code:`spatial_value`. Both fields are
optional; however, when one is provided, the other must be provided as well. Furthermore, the
value of :code:`spatial_scheme` determines which validator is applied to :code:`spatial_value`.
The table below shows the :code:`spatial_value` validator for each possible value of
:code:`spatial_scheme`.
.. list-table::
:widths: 65 35
:header-rows: 1
* - spatial_scheme (base=http://standaarden.overheid.nl)
- spatial_value validation
* - /owms/4.0/doc/waardelijsten/overheid.gemeente
- Required, String, From overheid:spatial_gemeente
* - /owms/4.0/doc/waardelijsten/overheid.koninkrijksdeel
- Required, String, From overheid:spatial_koninkrijksdeel
* - /owms/4.0/doc/waardelijsten/overheid.provincie
- Required, String, From overheid:spatial_provincie
* - /owms/4.0/doc/waardelijsten/overheid.waterschap
- Required, String, From overheid:spatial_waterschap
* - /owms/4.0/doc/syntax-codeerschemas/overheid.epsg28992
- Required, String, Regex match :code:`^\d{6}(\.\d{3})? \d{6}(\.\d{3})?$`
* - /owms/4.0/doc/syntax-codeerschemas/overheid.postcodehuisnummer
- Required, String, Regex match :code:`^[1-9]\d{3}([A-Z]{2}(\d+(\S+)?)?)?$`
For example, when the following list of spatial schemes is provided:
.. code-block:: json
[
"http://standaarden.overheid.nl/owms/4.0/doc/waardelijsten/overheid.gemeente",
"http://standaarden.overheid.nl/owms/4.0/doc/waardelijsten/overheid.waterschap"
]
Then every value in the :code:`spatial_value` list must validate against at least one of the
corresponding validators in the table above. In this example each value must be an entry of either
the valuelist :code:`overheid:spatial_gemeente` or the valuelist :code:`overheid:spatial_waterschap`.
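
The scheme-dependent validation can be sketched as below. The two regular expressions are taken
from the table above; everything else is an assumption for illustration: the helper names are
hypothetical, the valuelist content is a placeholder for the locally stored valuelists, and a value
is treated as valid when it satisfies at least one of the provided schemes.

.. code-block:: python

   import re

   BASE = "http://standaarden.overheid.nl/owms/4.0/doc/"
   GEMEENTE = BASE + "waardelijsten/overheid.gemeente"
   EPSG28992 = BASE + "syntax-codeerschemas/overheid.epsg28992"
   POSTCODE = BASE + "syntax-codeerschemas/overheid.postcodehuisnummer"

   # Placeholder entries; the real ones come from the downloaded valuelists.
   VALUELISTS = {GEMEENTE: {"http://example.org/gemeente/1"}}

   # Regular expressions copied from the spatial_value validation table.
   PATTERNS = {
       EPSG28992: re.compile(r"^\d{6}(\.\d{3})? \d{6}(\.\d{3})?$"),
       POSTCODE: re.compile(r"^[1-9]\d{3}([A-Z]{2}(\d+(\S+)?)?)?$"),
   }


   def matches_scheme(value, scheme):
       """Validate one value against one scheme, by regex or by valuelist membership."""
       if scheme in PATTERNS:
           return PATTERNS[scheme].match(value) is not None
       return value in VALUELISTS.get(scheme, set())


   def invalid_spatial_values(schemes, values):
       """Return the values that satisfy none of the provided schemes."""
       return [v for v in values if not any(matches_scheme(v, s) for s in schemes)]


   if __name__ == "__main__":
       print(invalid_spatial_values([EPSG28992], ["155000 463000", "not a coordinate"]))
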
value [{{ value }}] is not a valid spatial according to the schemes provided
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This error occurs when one of the values of the :code:`spatial_value` property does not validate
against the schemes provided in the :code:`spatial_scheme` property. To correct this, update either
the :code:`spatial_value` or the :code:`spatial_scheme` values so that they match.
value must be a valid date (yyyy-mm-ddThh:mm:ss)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This error occurs when the temporal metadata is provided in the wrong datetime format. Ensure that
all temporal metadata is provided in the :code:`yyyy-mm-ddThh:mm:ss` format, e.g.
:code:`2017-12-31T13:15:00`.
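
A value can be checked against this format before it is submitted. The sketch below uses only the
Python standard library; the helper name :code:`is_valid_date` is hypothetical and the plugin's own
validator may differ in detail.

.. code-block:: python

   from datetime import datetime

   DATE_FORMAT = "%Y-%m-%dT%H:%M:%S"  # yyyy-mm-ddThh:mm:ss


   def is_valid_date(value):
       """Return True if value is a real date written exactly as yyyy-mm-ddThh:mm:ss."""
       try:
           parsed = datetime.strptime(value, DATE_FORMAT)
       except (TypeError, ValueError):
           return False
       # strptime tolerates missing zero padding, so compare against the canonical form.
       return parsed.strftime(DATE_FORMAT) == value


   if __name__ == "__main__":
       print(is_valid_date("2017-12-31T13:15:00"))
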