Commit dff7b2a8 authored by Willem ter Berg

update of entire documentation regarding new version of the ckanext-dcatdonl plugin

parent c847d826
<component name="ProjectDictionaryState">
<dictionary name="w.terberg">
<words>
<w>dataset</w>
</words>
</dictionary>
</component>
\ No newline at end of file
Announcements
============================================
Below you'll find important announcements regarding this CKAN extension.
Scheduled update of public test application on https://dcat-ap-donl.nl
----------------------------------------------------------------------
On :code:`05/11/2018` the dcat-ap-donl.nl test environment will be unavailable while we roll out the latest changes of
the CKAN extension.
Changes applied on 29/10/2018
============================================
A new version has been released. The changes are listed below.
- Updated the installation instructions

  - Now includes the installation of additional requirements
  - Added instructions for targeting specific Apache Solr versions

- Introduced instructions to set up Apache Solr in various versions

  - Currently contains support for Apache Solr 5.4 and 7.4

- Introduced several backwards-compatible fixes for CKAN versions below 2.6
- Included Solr optimizations: searches against Solr now include several facets by default, namely

  - facet_referentie_data
  - facet_access_rights
  - facet_publisher
  - facet_authority
  - facet_high_value
  - facet_basis_register
  - facet_dataset_status
  - facet_metadata_language
  - facet_frequency
  - facet_license_id
  - facet_source_catalog
  - facet_theme

- Changes to the schemas:

  - To declare a license for a package and/or resource you must now provide it in the :code:`license_id` key rather than the :code:`license` key.
  - The fields :code:`highvalue`, :code:`basisregister` and :code:`referentiedata` are no longer considered data.overheid system properties and are now part of the base DCAT-AP-DONL schema
  - The field :code:`highvalue` has been renamed to :code:`high_value`
  - The field :code:`referentiedata` has been renamed to :code:`referentie_data`
  - The field :code:`basisregister` has been renamed to :code:`basis_register`
  - The field :code:`dataset_status` will now default to the URI for :code:`beschikbaar`
  - The field :code:`high_value` will now default to :code:`false`
  - The field :code:`referentie_data` will now default to :code:`false`
  - The field :code:`basis_register` will now default to :code:`false`
  - List validation is now less strict: when a single value is provided, it is silently converted to a list of size 1 instead of producing a validation error message

- Updated the Usage chapter to incorporate the changes to the schemas
- The documented error messages have been updated
- Updated the logging format of the :code:`controlled_vocabulary_updater.py` script
- Small fixes to inaccuracies in various chapters of this documentation
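The schema renames listed above can be applied to an existing request payload mechanically. The following is a minimal sketch, assuming the old payload used the nested :code:`license` object with an :code:`id` key; the helper name :code:`migrate_payload` is hypothetical and is not part of the extension.

```python
# Hypothetical helper for illustration only; not part of ckanext-dcatdonl.
RENAMED_FIELDS = {
    "highvalue": "high_value",
    "referentiedata": "referentie_data",
    "basisregister": "basis_register",
}


def migrate_payload(old):
    """Return a copy of an old-style payload that uses the new field names."""
    new = {}
    for key, value in old.items():
        if key == "license":
            # Old style: {"license": {"id": "<uri>"}}; new style: {"license_id": "<uri>"}.
            new["license_id"] = value["id"] if isinstance(value, dict) else value
        else:
            # Rename the three former system properties, keep everything else as-is.
            new[RENAMED_FIELDS.get(key, key)] = value
    return new
```

Fields that are not mentioned in the changelog are copied unchanged, so the function can be run over both dataset and resource payloads.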
Changelog
============================================
Contains the functional changelog of this CKAN extension.
.. toctree::
:maxdepth: 3
:caption: Changes
changelog-20181029
\ No newline at end of file
......@@ -8,7 +8,8 @@ The CKAN extension that implements the DCAT-AP-DONL metadata standard into CKAN.
:caption: Table of Contents
summary
announcements
changelog
installation
usage
schema
plugin_structure
......@@ -8,13 +8,16 @@ example.
.. code-block:: bash
python /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/ValuelistUpdater.py
python /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/task/controlled_vocabulary_updater.py
Ensure that the python script has READ and WRITE access to the following directory and its contents
The extension provides a `.sh` file which executes the above command; this file can easily be added to your
server's crontab. This file is located in `shell/valuelist_updater.sh`.
.. code-block:: bash
/usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources
/usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/controlled_vocabularies
The extension can function without the background process running; however, this means that the
valuelists that are used as part of the DCAT-AP-DONL metadata standard will never be updated.
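Before scheduling the background process, it may help to confirm that the account running it can read and write the directory mentioned above. The sketch below is one way to do that with the standard library; the function name :code:`inaccessible_entries` is hypothetical, and the directory is passed in as a parameter so that it can be pointed at whichever path applies to your installation.

```python
import os


def inaccessible_entries(directory):
    """Return paths in `directory` (itself included) lacking read or write access."""
    if not os.path.isdir(directory):
        # A missing directory is reported as a problem in its own right.
        return [directory]
    problems = []
    for root, _dirs, files in os.walk(directory):
        candidates = [root] + [os.path.join(root, name) for name in files]
        for path in candidates:
            if not os.access(path, os.R_OK | os.W_OK):
                problems.append(path)
    return problems
```

Run it as the same user that executes the updater script; an empty list means no access problems were detected.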
......@@ -9,6 +9,8 @@ Follow the steps listed below to install and activate the ckanext-dcatdonl exten
. /usr/lib/ckan/default/bin/activate
pip install -e git+https://gitlab.textinfo.nl/opensource/ckanext-dcatdonl.git#egg=ckanext-dcatdonl
cd ckanext-dcatdonl
pip install -r requirements.txt
2. Edit your CKAN .ini configuration file and add the following
......@@ -16,12 +18,13 @@ Follow the steps listed below to install and activate the ckanext-dcatdonl exten
ckan.plugins = ... dcatdonl
3. In the same file, add (or change) the `licenses_group_url` property in the `[app:main]` section
to
3. In the same file, add (or change) the following properties to:
.. code-block:: ini
licenses_group_url = file:///usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/overheid_license.json
solr_url = http://{{host}}:8983/solr/ckan
ckan.mimetype_guess = None
4. Restart apache2
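The configuration steps above can be checked with a small script before restarting the web server. This is a minimal sketch, assuming all properties live in the :code:`[app:main]` section of the CKAN .ini file; the function name :code:`missing_settings` is hypothetical.

```python
import configparser

# Properties named in the installation steps above.
REQUIRED_KEYS = ("licenses_group_url", "solr_url", "ckan.mimetype_guess")


def missing_settings(ini_text):
    """Return the names of required settings absent from the [app:main] section."""
    # Interpolation is disabled because CKAN .ini files may contain '%' sequences.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["app:main"] if parser.has_section("app:main") else {}
    missing = [key for key in REQUIRED_KEYS if key not in section]
    # ckan.plugins is a whitespace-separated list that must include dcatdonl.
    if "dcatdonl" not in section.get("ckan.plugins", "").split():
        missing.append("ckan.plugins (dcatdonl)")
    return missing
```

The function only checks that the properties are present; it does not validate their values.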
......
Installation of Solr
===================================================================================================
To install Solr version 7.5.0 (assuming no previous Solr installation):
.. code-block:: bash
sudo apt-get install openjdk-9-jre-headless
cd /opt
sudo wget http://www-eu.apache.org/dist/lucene/solr/7.5.0/solr-7.5.0.tgz
sudo tar xzf solr-7.5.0.tgz solr-7.5.0/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-7.5.0.tgz
To create the CKAN core in Solr:
.. code-block:: bash
sudo -u solr /opt/solr/bin/solr create -c ckan
sudo rm /var/solr/data/ckan/conf/protwords.txt
sudo rm /var/solr/data/ckan/conf/solrconfig.xml
sudo rm /var/solr/data/ckan/conf/managed-schema
sudo rm /var/solr/data/ckan/conf/stopwords.txt
sudo rm /var/solr/data/ckan/conf/synonyms.txt
sudo mkdir /var/lib/solr
sudo chown solr /var/lib/solr -R
cd ~
sudo ln -s /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/solr/7.4/currency.xml /var/solr/data/ckan/conf
sudo ln -s /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/solr/7.4/elevate.xml /var/solr/data/ckan/conf
sudo ln -s /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/solr/7.4/protwords.txt /var/solr/data/ckan/conf
sudo ln -s /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/solr/7.4/schema.xml /var/solr/data/ckan/conf
sudo ln -s /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/solr/7.4/solrconfig.xml /var/solr/data/ckan/conf
sudo ln -s /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/solr/7.4/spellings.txt /var/solr/data/ckan/conf
sudo ln -s /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/solr/7.4/stopwords.txt /var/solr/data/ckan/conf
sudo ln -s /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/solr/7.4/synonyms.txt /var/solr/data/ckan/conf
sudo ln -s /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/solr/7.4/synonyms_themes.txt /var/solr/data/ckan/conf
sudo ln -s /usr/lib/ckan/default/src/ckanext-dcatdonl/ckanext/dcatdonl/resources/solr/7.4/synonyms_themes_hierarchy.txt /var/solr/data/ckan/conf
sudo service solr restart
If you want to use the ckanext-dcatdonl Solr optimizations for earlier CKAN versions, it is advised to use the files present in the `ckanext/dcatdonl/resources/solr/5.5` directory instead.
\ No newline at end of file
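After creating the symbolic links, a quick check can confirm that every linked file resolves inside the Solr core's :code:`conf` directory. The sketch below takes the file names from the :code:`ln -s` commands above; the function name :code:`missing_solr_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the `ln -s` commands in the Solr installation steps.
EXPECTED_FILES = (
    "currency.xml",
    "elevate.xml",
    "protwords.txt",
    "schema.xml",
    "solrconfig.xml",
    "spellings.txt",
    "stopwords.txt",
    "synonyms.txt",
    "synonyms_themes.txt",
    "synonyms_themes_hierarchy.txt",
)


def missing_solr_files(conf_dir):
    """Return the expected file names that do not resolve inside `conf_dir`."""
    conf = Path(conf_dir)
    # Path.exists() follows symlinks, so a dangling link is reported as missing.
    return [name for name in EXPECTED_FILES if not (conf / name).exists()]
```

For the installation described above, the directory to check would be :code:`/var/solr/data/ckan/conf`.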
......@@ -10,4 +10,5 @@ installation.
installation-requirements
installation-plugin
installation-solr
installation-backgroundprocess
Plugin structure
===================================================
The structure of the ckanext-dcatdonl plugin can best be described by the model below. This model
identifies all the components and the relationships these components have. The entrypoint
components have been marked grey.
.. image:: _static/pluginstructure.png
......@@ -92,7 +92,7 @@ Dataset
* - sample
- Sample data of the dataset
- Optional, List, Are URLs
* - license
* - license_id
- The license that applies to the dataset
- Required, From :code:`overheid:license`
* - title
......@@ -136,7 +136,16 @@ Dataset
- Optional, String, :code:`yyyy-mm-ddThh:mm:ss` Must be greater than temporal_start
* - dataset_status
- State of the dataset, it describes the availability of the dataset
- Optional, String, From :code:`overheid:datasetStatus`
- Optional, String, From :code:`overheid:datasetStatus`, defaults to the URI for :code:`beschikbaar`
* - date_planned
- The date and time upon which it is planned that the dataset becomes available
- Optional, String, :code:`yyyy-mm-ddThh:mm:ss`
* - high_value
- Indicates this dataset is considered of 'high value' by the Dutch government
- Optional, Boolean, defaults to :code:`False`
* - basis_register
- Indicates this dataset is part of the Dutch 'basisregister'
- Optional, Boolean, defaults to :code:`False`
* - referentie_data
- Indicates this dataset contains `highly` reusable data
- Optional, Boolean, defaults to :code:`False`
......@@ -21,6 +21,8 @@ DCAT Dataset
- Dataset.name
* - Dataset.language
- Dataset.language
* - Dataset.license
- Dataset.license_id
* - Dataset.modified
- Dataset.modified
* - Dataset.contactPoint
......@@ -95,6 +97,12 @@ DCAT Dataset
- Dataset.dataset_status
* - Dataset.datePlanned
- Dataset.date_planned
* - Dataset.highValue
- Dataset.high_value
* - Dataset.basisregister
- Dataset.basis_register
* - Dataset.referentieData
- Dataset.referentie_data
DCAT Distribution
---------------------------------------------------
......@@ -112,7 +120,7 @@ DCAT Distribution
* - Distribution.format
- Resource.format
* - Distribution.license
- Resource.license
- Resource.license_id
* - Distribution.byteSize
- Resource.size
* - Distribution.checksum
......
......@@ -23,7 +23,7 @@ Resource
* - language
- The languages used for the data found in the resource
- Required, List, From :code:`donl:language`
* - license
* - license_id
- The license that applies to the resource
- Required, From :code:`overheid:license`
* - format
......
Valuelists
=====================================================
The following valuelists are used to validate parts of the CKAN schemas.
The following valuelists (AKA controlled vocabularies) are used to validate parts of the CKAN schemas.
.. list-table::
:widths: 32 68
......@@ -10,39 +10,39 @@ The following valuelists are used to validate parts of the CKAN schemas.
* - Name
- Location
* - adms:changetype
- http://waardelijsten.dcat-ap-donl.nl/adms_changetype.json
- https://waardelijsten.dcat-ap-donl.nl/adms_changetype.json
* - adms:distributiestatus
- http://waardelijsten.dcat-ap-donl.nl/adms_distributiestatus.json
- https://waardelijsten.dcat-ap-donl.nl/adms_distributiestatus.json
* - donl:catalogs
- http://waardelijsten.dcat-ap-donl.nl/donl_catalogs.json
- https://waardelijsten.dcat-ap-donl.nl/donl_catalogs.json
* - donl:language
- http://waardelijsten.dcat-ap-donl.nl/donl_language.json
- https://waardelijsten.dcat-ap-donl.nl/donl_language.json
* - donl:organization
- http://waardelijsten.dcat-ap-donl.nl/donl_organization.json
- https://waardelijsten.dcat-ap-donl.nl/donl_organization.json
* - iana:mediatypes
- http://waardelijsten.dcat-ap-donl.nl/iana_mediatypes.json
- https://waardelijsten.dcat-ap-donl.nl/iana_mediatypes.json
* - mdr:filetype_nal
- http://waardelijsten.dcat-ap-donl.nl/mdr_filetype_nal.json
- https://waardelijsten.dcat-ap-donl.nl/mdr_filetype_nal.json
* - overheid:datasetStatus
- http://waardelijsten.dcat-ap-donl.nl/overheid_dataset_status.json
- https://waardelijsten.dcat-ap-donl.nl/overheid_dataset_status.json
* - overheid:frequency
- http://waardelijsten.dcat-ap-donl.nl/overheid_frequency.json
- https://waardelijsten.dcat-ap-donl.nl/overheid_frequency.json
* - overheid:license
- http://waardelijsten.dcat-ap-donl.nl/overheid_license.json
- https://waardelijsten.dcat-ap-donl.nl/overheid_license.json
* - overheid:openbaarheidsniveau
- http://waardelijsten.dcat-ap-donl.nl/overheid_openbaarheidsniveau.json
- https://waardelijsten.dcat-ap-donl.nl/overheid_openbaarheidsniveau.json
* - overheid:spatial_gemeente
- http://waardelijsten.dcat-ap-donl.nl/overheid_spatial_gemeente.json
- https://waardelijsten.dcat-ap-donl.nl/overheid_spatial_gemeente.json
* - overheid:spatial_koninkrijksdeel
- http://waardelijsten.dcat-ap-donl.nl/overheid_spatial_koninkrijksdeel.json
- https://waardelijsten.dcat-ap-donl.nl/overheid_spatial_koninkrijksdeel.json
* - overheid:spatial_provincie
- http://waardelijsten.dcat-ap-donl.nl/overheid_spatial_provincie.json
- https://waardelijsten.dcat-ap-donl.nl/overheid_spatial_provincie.json
* - overheid:spatial_scheme
- http://waardelijsten.dcat-ap-donl.nl/overheid_spatial_scheme.json
- https://waardelijsten.dcat-ap-donl.nl/overheid_spatial_scheme.json
* - overheid:spatial_waterschap
- http://waardelijsten.dcat-ap-donl.nl/overheid_spatial_waterschap.json
- https://waardelijsten.dcat-ap-donl.nl/overheid_spatial_waterschap.json
* - overheid:taxonomiebeleidsagenda
- http://waardelijsten.dcat-ap-donl.nl/overheid_taxonomiebeleidsagenda.json
- https://waardelijsten.dcat-ap-donl.nl/overheid_taxonomiebeleidsagenda.json
Structure
-----------------------------------------------------
......@@ -55,14 +55,12 @@ valuelist:
{
"http://standaarden.overheid.nl/owms/terms/Afval_(thema)": {
"code": "overheid:Afval_(thema)",
"labels": {
"nl-NL": "Afval",
"en-US": "Rubbish"
}
},
"http://standaarden.overheid.nl/owms/terms/Arbeidsomstandigheden_(thema)": {
"code": "overheid:Arbeidsomstandigheden_(thema)",
"labels": {
"nl-NL": "Arbeidsomstandigheden",
"en-US": "Labour conditions"
......@@ -71,14 +69,10 @@ valuelist:
...
}
The :code:`labels` segment will not be present in every valuelist, because not every value in every
valuelist can be translated beyond a code that is identical in every language.
When supplying a value that must be part of a valuelist, provide the key of the value. In the
example above this would be
:code:`http://standaarden.overheid.nl/owms/terms/Arbeidsomstandigheden_(thema)`. The code and
labels properties are provided so that front-end applications can provide a proper translation of
the value.
:code:`http://standaarden.overheid.nl/owms/terms/Arbeidsomstandigheden_(thema)`. The labels are
provided so that front-end applications can provide a proper translation of the value.
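As an illustration of how a front-end application might use this structure, the sketch below validates a value against a valuelist and looks up its translated label. It assumes the valuelist has already been loaded into a dictionary; the function names :code:`is_valid_value` and :code:`label_for` are hypothetical.

```python
def is_valid_value(valuelist, value):
    """A value is valid when it is one of the keys of the valuelist."""
    return value in valuelist


def label_for(valuelist, value, language="nl-NL"):
    """Return the label for `value` in `language`, falling back to the key itself."""
    entry = valuelist[value]  # raises KeyError for values outside the valuelist
    # Not every valuelist carries labels, so both lookups need a fallback.
    return entry.get("labels", {}).get(language, value)


# Excerpt of the example valuelist shown above.
themes = {
    "http://standaarden.overheid.nl/owms/terms/Afval_(thema)": {
        "labels": {"nl-NL": "Afval", "en-US": "Rubbish"}
    }
}
```

With the excerpt above, :code:`label_for(themes, "http://standaarden.overheid.nl/owms/terms/Afval_(thema)", "en-US")` returns the English label, while an entry without labels falls back to its key.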
The one exception to the format displayed above is for the valuelist of :code:`overheid:license`.
The format deviates here to accommodate CKAN, since CKAN requires its license source file to fit a
......@@ -104,14 +98,6 @@ specific format. This format is displayed below.
...
]
When providing a valid license value, the following must be provided:
.. code-block:: json
"license": {
"id": "value"
}
Caching
-----------------------------------------------------
......
......@@ -12,9 +12,8 @@ values wherever possible.
provide front-end templates for these extended schemas. Users of this extension with the desire
for such templates need to implement those templates themselves in a separate plugin.
This extension is currently used in the CKAN environment of the `Data.Overheid.nl`_ web application.
The most recent version can be found on
`gitlab.textinfo.nl/opensource/ckanext-dcatdonl/tree/public.version`_.
A modified version of this extension is currently used in the CKAN environment of the `Data.Overheid.nl`_ web application.
The most recent version of the public extension can be found on https://gitlab.textinfo.nl/opensource/ckanext-dcatdonl.
The following subjects are described
......
......@@ -24,9 +24,7 @@ Minimum dataset creation request
"authority": "http://standaarden.overheid.nl/owms/terms/'s-Hertogenbosch",
"publisher": "http://standaarden.overheid.nl/owms/terms/Centraal_Bureau_voor_de_Statistiek",
"contact_point_name": "John Doe",
"license": {
"id": "http://creativecommons.org/licenses/by/4.0/deed.nl"
},
"license_id": "http://creativecommons.org/licenses/by/4.0/deed.nl",
"language": [
"http://publications.europa.eu/resource/authority/language/NLD"
],
......@@ -90,9 +88,7 @@ Full dataset creation request
"sample": [
"https://www.mijn.organisatie.nl/datasets/mijndataset1/samples"
],
"license": {
"id": "http://creativecommons.org/licenses/by/4.0/deed.nl"
},
"license_id": "http://creativecommons.org/licenses/by/4.0/deed.nl",
"name": "mijndataset1",
"title": "mijndataset1",
"notes": "De omschrijving van mijndataset1!",
......@@ -118,7 +114,10 @@ Full dataset creation request
"temporal_start": "2017-01-01T00:00:00",
"temporal_end": "2017-12-31T23:59:00",
"dataset_status": "http://data.overheid.nl/status/beschikbaar",
"date_planned": "2018-01-11T13:29:00"
"date_planned": "2018-01-11T13:29:00",
"high_value": "True",
"basis_register": "False",
"referentie_data": "True"
}
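A request body like the one above is typically sent to CKAN's :code:`package_create` action. The following sketch builds such a request with the standard library only; the function name, the example host and the API key are placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_package_create_request(ckan_url, api_key, dataset):
    """Build (but do not send) a POST request for CKAN's package_create action."""
    return urllib.request.Request(
        url=ckan_url.rstrip("/") + "/api/3/action/package_create",
        data=json.dumps(dataset).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # CKAN reads the API key from the Authorization header.
            "Authorization": api_key,
        },
        method="POST",
    )


# Placeholder values; replace them with your own CKAN host, key and payload.
request = build_package_create_request(
    "https://ckan.example.org",
    "my-api-key",
    {"name": "mijndataset1", "license_id": "http://creativecommons.org/licenses/by/4.0/deed.nl"},
)
```

Passing the resulting object to :code:`urllib.request.urlopen` would submit it; any validation errors returned by the extension arrive in the JSON response body.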
Minimum resource creation request
......@@ -136,9 +135,7 @@ Minimum resource creation request
"metadata_language": "http://publications.europa.eu/resource/authority/language/NLD",
"format": "http://publications.europa.eu/resource/authority/file-type/ZIP",
"language": "http://publications.europa.eu/resource/authority/language/NLD",
"license": {
"id": "http://creativecommons.org/publicdomain/mark/1.0/deed.nl"
}
"license_id": "http://creativecommons.org/publicdomain/mark/1.0/deed.nl"
}
Full resource creation request
......@@ -156,9 +153,7 @@ Full resource creation request
"metadata_language": "http://publications.europa.eu/resource/authority/language/NLD",
"format": "http://publications.europa.eu/resource/authority/file-type/ZIP",
"language": "http://publications.europa.eu/resource/authority/language/NLD",
"license": {
"id": "http://creativecommons.org/publicdomain/mark/1.0/deed.nl"
},
"license_id": "http://creativecommons.org/publicdomain/mark/1.0/deed.nl",
"linked_schemas": "http://some.standard.nl/reference",
"size": 1234567890,
"download_url": "http://my.organization.com/mydataset/myresource1.zip",
......@@ -172,4 +167,4 @@ Full resource creation request
"documentation": "http://my.organization.com/mydataset/documentation"
}
.. _Postman collection: https://gitlab.textinfo.nl/opensource/ckanext-dcatdonl/raw/public.version/ckanext/dcatdonl/tests/postman/ckanext-dcatdonl.postman_collection.json
\ No newline at end of file
.. _Postman collection: https://www.getpostman.com/collections/c54a66d658d1dec274bb
\ No newline at end of file
......@@ -11,12 +11,6 @@ data.overheid.nl
Data.overheid.nl maintains several additional properties for datasets that may be encountered when
viewing datasets and resources. These properties are detailed below:
Dataset.referentiedata
States if a given dataset is considered referentiedata
Dataset.high_value_dataset
States if a given dataset is considered a high value dataset
Dataset.basisregister
States if a given dataset is part of the basisregister
Dataset.duplicate_resources
States which resources have duplicates on data.overheid.nl
Resource.link_status
......