Data model/ontology-enabled usage control

Problem statement

The usage policy currently implemented in FarmStack restricts usage of the data to a specific containerised application. This requires the hash value of the containerised application to be logged at the time of verification. There is no easy way to create a containerised app from the code, and verifying the code requires the consumer to share it with the provider. This process needs a trusted intermediary to oversee it and can run into problems as the number of participating nodes increases.

There could be cases where the datasets are defined by a data model or ontology.

  • Can the data labels be used to create services that restrict what can be done with the data?

  • Can we create a verification module to determine whether a piece of code handles the data as required?

    • Does it avoid calling any external API?

    • Does it perform only a defined set of actions, such as join, aggregate and filter?

    • There could also be a restriction that the code is written in a specific language.
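The three checks above can be illustrated with a static inspection of the submitted code. The following is a minimal sketch, assuming the code is required to be Python and is available as source text. The names FORBIDDEN_MODULES, ALLOWED_ACTIONS and verify_source are hypothetical and are not part of FarmStack.

```python
import ast
from typing import List

# Hypothetical policy values, chosen only for illustration.
FORBIDDEN_MODULES = {"socket", "requests", "urllib", "http", "subprocess"}
ALLOWED_ACTIONS = {"join", "aggregate", "filter"}


def verify_source(source: str) -> List[str]:
    """Return the policy violations found in the submitted source text."""
    try:
        # Language restriction: the code must parse as Python.
        tree = ast.parse(source)
    except SyntaxError:
        return ["source is not valid Python"]

    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # External API check: reject imports of network-capable modules.
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    violations.append(f"forbidden import: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in FORBIDDEN_MODULES:
                violations.append(f"forbidden import: {node.module}")
        elif isinstance(node, ast.Call):
            # Action check: every call must be one of the allowed actions.
            func = node.func
            if isinstance(func, ast.Attribute):
                name = func.attr
            elif isinstance(func, ast.Name):
                name = func.id
            else:
                name = None
            if name is not None and name not in ALLOWED_ACTIONS:
                violations.append(f"action not allowed: {name}")
    return violations
```

A static check of this kind is only a first filter. It says nothing about what the allowed actions do internally, so it would most likely need to be combined with a controlled runtime.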

Example case

Most data sharing for extension work involves PII such as location, name and identifier. Many applications need to join two datasets on some identifier, remove or filter out the identifier, and then aggregate or transform the data.

In this example, there are two different data tables:

  1. Farmer produce details

    1. Information about farmers (PII)

    2. Information about produce (quality, quantity)

    3. Date and other meta information

  2. Farmer extension details

    1. Information about farmers (PII)

    2. Information about the activity on farm

    3. Date and other meta information

There is an application that wants to join the two datasets on the common identifier, remove the identifier, and aggregate the number of farmers by date, region, crop type etc. to create an analytics dashboard.
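The intended flow (join, remove the identifier, aggregate) can be sketched as follows. This is only an illustration using the Python standard library. All field names and rows are invented dummy data, and the sketch assumes one row per farmer in each table, so that counting rows is the same as counting farmers.

```python
from collections import Counter

# Dummy rows; every field name and value here is hypothetical.
produce = [
    {"farmer_id": "F1", "name": "A", "crop_type": "wheat", "quantity_kg": 120},
    {"farmer_id": "F2", "name": "B", "crop_type": "rice", "quantity_kg": 80},
    {"farmer_id": "F3", "name": "C", "crop_type": "wheat", "quantity_kg": 60},
]
extension = [
    {"farmer_id": "F1", "region": "North", "activity": "soil test"},
    {"farmer_id": "F2", "region": "North", "activity": "training"},
    {"farmer_id": "F3", "region": "South", "activity": "soil test"},
]
PII_FIELDS = {"farmer_id", "name"}


def join_on(left, right, key):
    """Inner join of two lists of dicts on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def drop_fields(rows, fields):
    """Remove the given fields (here: PII) from every row."""
    return [{k: v for k, v in row.items() if k not in fields} for row in rows]


def count_by(rows, keys):
    """Count rows per combination of the given grouping keys."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


joined = join_on(produce, extension, "farmer_id")
anonymised = drop_fields(joined, PII_FIELDS)
farmers_per_group = count_by(anonymised, ("region", "crop_type"))
```

Only farmers_per_group would leave the provider's environment. The joined and anonymised row-level tables stay internal to the application.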

For both the datasets we have:

  1. Ontologies/data models

  2. A JSON or JSON-LD response describing the data

  3. Sample data (dummy)

Questions

  1. Can we create a policy that checks whether a piece of code joins the two datasets on an identifier?

    1. Example: a SPARQL query that performs only a join

  2. If step 1 is possible, can we also impose label-based usage control that anonymises the data, as given in the example here?

  3. If both of the above are possible, can we check code that aggregates the data and outputs the tables?

    1. This could be a SPARQL query
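One possible reading of these questions is a label-based check on the query text itself: a variable bound to a PII-labelled property may be used for the join and inside an aggregate, but must not be projected directly. The sketch below is a rough illustration only. The ex: vocabulary, the PREDICATE_LABELS mapping and both queries are hypothetical, and the regular-expression matching stands in for a proper SPARQL parser.

```python
import re

# Hypothetical labels taken from the ontology: property -> label.
PREDICATE_LABELS = {
    "ex:farmer": "pii",
    "ex:name": "pii",
    "ex:region": "public",
    "ex:cropType": "public",
}

AGGREGATE_QUERY = """
PREFIX ex: <http://example.org/farm#>
SELECT ?region ?cropType (COUNT(DISTINCT ?farmer) AS ?farmers)
WHERE {
  ?produce ex:farmer ?farmer ; ex:cropType ?cropType .
  ?visit ex:farmer ?farmer ; ex:region ?region .
}
GROUP BY ?region ?cropType
"""

LEAKY_QUERY = """
PREFIX ex: <http://example.org/farm#>
SELECT ?farmer ?region
WHERE {
  ?visit ex:farmer ?farmer ; ex:region ?region .
}
"""


def top_level_text(clause):
    """Drop everything inside parentheses, e.g. (COUNT(?x) AS ?n)."""
    depth, kept = 0, []
    for ch in clause:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    return "".join(kept)


def pii_variables(query):
    """Variables that appear as the object of a PII-labelled property."""
    pairs = re.findall(r"([A-Za-z]\w*:\w+)\s+(\?\w+)", query)
    return {var for pred, var in pairs if PREDICATE_LABELS.get(pred) == "pii"}


def projected_pii(query):
    """Return the PII variables that the SELECT clause projects directly."""
    match = re.search(r"SELECT(.*?)WHERE", query, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no SELECT ... WHERE clause found")
    head = top_level_text(match.group(1))
    if "*" in head:
        # SELECT * projects every variable, including the PII ones.
        return pii_variables(query)
    return set(re.findall(r"\?\w+", head)) & pii_variables(query)
```

With these definitions, projected_pii(AGGREGATE_QUERY) is empty and projected_pii(LEAKY_QUERY) contains ?farmer. The check ignores parenthesised expressions entirely, so an expression that merely renames a PII variable would pass. A real verifier would also need to inspect those expressions.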