
This page describes the principles of creating core vocabularies and application profiles. It is a brief introduction to the modeling paradigm used on the FI-Platform, as well as to the recommended modeling principles. It is important to understand these principles so that ambiguities are avoided, as the modeling languages and paradigm used here differ from the traditional conceptual modeling approach used by many data architects and modeling tools (UML in particular). The end goal of mastering the principles is that the formal content of your models is semantically equivalent to their visual and human-readable content, i.e. that your models speak of the same things in both machine and human readable formats. This is particularly important, as the primary aim of this modeling platform is to create and maintain semantic interoperability between actors: data and models should retain their meaning when they are shared or exchanged between parties. Understanding the paradigm is crucial for data modelers as well as for actors intending to use the models and data based on the models, whereas the recommended modeling principles give additional guidance on how to do modeling aligned with the paradigm and avoid common pitfalls.

The FI-Platform modeling tool is able to produce core vocabularies and application profiles. While they serve different purposes, they share a common foundation. We will first go through this foundation and then dive into the specifics of these two model types and their use cases.  When discussing the model paradigm, we sometimes contrast it with traditional technologies such as XML Schema and UML, as those are more familiar to many modelers and data architects.

In this guide we have highlighted the key facts with spiral notepad (note) symbols.

The Linked Data Modeling Paradigm

The models published on the FI-Platform are in essence Linked Data and thus naturally compatible with the linked data ecosystem. The FI-Platform nevertheless does not expect modelers or users of the models to migrate their information systems to Linked Data based ones, or to make a complete overhaul of their information management processes. There are multiple pathways for utilising the models, ranging from lightweight semantic layering to fully integrated solutions. In most cases you can think of Linked Data solutions as a semantic layer covering your information architecture, or as a translational layer between your domain and external actors or between individual information systems. We will go through these options later, but first let us go through the key features of linked data.

How we organize and connect data greatly impacts our ability to use it effectively. Linked data offers a structured approach to data management that extends beyond its traditional uses, both within closed organizational environments and publicly on the Internet. Typically, information systems tend to be siloed, requiring tailor-made and often fragile ETL (Extract, Transform, Load) processes for data interoperability. These ETL solutions, while necessary, do not inherently carry machine-readable semantics, leaving the interpretation and contextual understanding of the data largely to human operators. In contrast, linked data enables embedding rich, machine-readable semantics within the data itself, facilitating automated processing and integration. Additionally - as inherent in its name - linked data makes connecting datasets from heterogeneous sources and referencing other data (down to individual entities) tremendously easy. These capabilities not only enhance data usability but also provide the basis for more interoperable, intelligent and dynamic replacements for traditional data catalog, warehousing and similar management solutions.

Linked data based knowledge representation, validation and query languages are frameworks used to create, structure, and link data so that both humans and machines can understand its meaning and use it efficiently. They are part of the broader ecosystem known as the Semantic Web, which aims to make data on the entire World Wide Web readable by machines as well as by humans.

Linked data models are instrumental in several ways:

  • Data Integration: They facilitate the combination of data from diverse sources in a coherent manner. This can range from integrating data across different libraries to creating a unified view of information that spans multiple organizations.
  • Interoperability: A fundamental benefit of using linked data models is their ability to ensure interoperability among disparate systems. Data structured with linked data knowledge representation languages such as OWL, SKOS or RDFS can be shared, understood, and processed in a consistent way, regardless of the source. This capability is crucial for industries like healthcare, where data from various providers must be combined and made universally accessible and useful, or in supply chain management, where different stakeholders (manufacturers, suppliers, distributors) need to exchange information seamlessly.

  • Knowledge Management: These models help organizations manage complex information about products, services, and internal processes in a way that is easily accessible and modifiable. This structured approach supports more efficient retrieval and use of information.

  • Artificial Intelligence and Machine Learning: The knowledge representation languages provide a structured context for data, which is essential for training machine learning models. This structure allows AI systems to interpret data more accurately and apply learning to similar but previously unseen data. By using linked data models, organizations can ensure that their data is not only accessible and usable within their own systems but can also be easily linked to and from external systems. This creates a data ecosystem that supports richer, more connected, and more automatically processable information networks.

  • Enhancing Search Capabilities: By providing detailed metadata and defining relationships between data entities, these models significantly improve the precision and breadth of search engine results. This enriched search capability allows for more detailed queries and more relevant results.

When discussing linked data models, particularly in the context of the Semantic Web, there are two primary categories to consider: core vocabularies (ontologies) and application profiles (schemas). Each serves a unique role in structuring and validating data. We will explore these two categories after introducing linked data which functions as their foundation.


Linked Data is a method of publishing structured data so that it can be interlinked and become more useful through semantic queries. It extends the Web to use URIs to name not just documents but also real-world objects and abstract concepts (in general, anything). The key issues Linked Data aims to solve are:

  • Data Silos: Information is typically stored in isolated databases with limited ability to interoperate. Linked Data was designed to break down these barriers by linking data across different sources, allowing for more comprehensive insights and analyses.
  • Semantic Disconnect: There is a lack of a common framework that could universally describe data relationships and semantics, and a disconnect between model types and associated specifications (conceptual, logical, physical, documentation, code lists, vocabularies). Linked Data describes data with the RDF (Resource Description Framework) language, which is able to encode meaning alongside data, enhancing its machine-readability and semantic interoperability.
  • Integration Complexity: Integrating data from various sources is typically complicated and costly due to heterogeneous data formats and access mechanisms, requiring peer-to-peer ETL solutions, API integrations or commonly agreed and restrictive schemas. Linked Data promotes a standard way to access data (HTTP), a common data format (RDF), and a standardized query language (SPARQL), simplifying data integration.
  • Reusability and Extension: The reusability and combining of data from various sources for new needs is typically limited or burdensome. Linked Data encourages reusing existing data by making finding and combining data, as well as inferring new data, straightforward.

In essence, Linked Data offers an alternative to the typical data management solutions that rely on tailored and siloed data warehouses, catalogues, data pools, lakes etc.

The Core Idea of Linked Data

As you can imagine based on the name, Linked Data is all about references (links or "pointers") between entities (pieces of data). In principle:

spiral notepad All entities (resources in the Linked Data jargon), be they actual instance data or entities in a data model, are named with IRIs (Internationalised Resource Identifiers).

spiral notepad The names should (in most cases) be HTTP based IRIs, as this allows a standardized way to resolve the names (i.e. access the resource content named by the IRI).

spiral notepad When a client resolves the name, relevant and appropriate information about the resource should be provided (to the extent depending on the access rights the resolving party has). This means, for example, that a human user receives an easily human-readable representation of the resource, while a machine receives a machine-readable representation of it.

spiral notepad The resources should refer (be linked) to other resources when it aids in discoverability, contextualising, validating, or otherwise improving the usability of the data.

How to Name Things

[Diagram: hierarchy of identifier types - IRIs, URIs, URLs and URNs, with the HTTP based types highlighted]

As mentioned, all resources are named (minted in the jargon) with identifiers, which we can then use to refer to them when needed.

spiral notepad The FI-Platform gives every resource an HTTP IRI name in the form of <https://iri.suomi.fi/model/modelName/versionNumber/localName>

spiral notepad On the Web you will typically come across mentions of URIs (Uniform Resource Identifier) and HTTP URIs more often than IRIs or HTTP IRIs. On the FI-Platform we encourage internationalisation and the usage of Unicode, and mint all resources with IRIs instead of URIs, which are restricted to just a subset of the ASCII characters.

spiral notepad IRIs are an extension of URIs, i.e. each URI is already a valid IRI.

spiral notepad IRIs can be mapped to URIs (by percent encoding the path and punycoding the domain part) for backwards compatibility, but this is not recommended due to potential collisions. Instead, tools supporting IRIs and Unicode should always be preferred. You should generally never mint identifiers which are ambiguous (i.e. name one thing with an IRI and another thing with a URI encoded version of it). The IRI will get encoded anyway when you are doing an HTTP request, so you should always consider the set of URIs that correspond to URI encoded versions of your IRIs as reserved and not use them for naming resources.

In the diagram above you can see the hierarchy of different identifier types. The identifier types based on the HTTP scheme are highlighted as they are primary in Linked Data, but other scheme types (deprecated, current and potential upcoming ones) are of course possible (the list of IANA schemes is available here: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml). Note that the diagram is conceptually incorrect in the sense that IRIs and URIs (including their HTTP subsets) are broader than URLs, which are merely Uniform Resource Locators. In other words, URLs tell where something on the Web is located for fetching, whereas URIs and IRIs give things identifiers (name them), and with a proper scheme (like an HTTP IRI) make the name resolvable. But if we simply look at the structure of these identifiers from a pragmatic perspective, the diagram is correct.

URNs are a special case: they are defined per RFC as a subset of URIs. They start with the scheme urn: followed by a sub-namespace and a local name. Well-known URN sub-namespaces are ISBNs, EANs, DOIs and UUIDs; an example of an ISBN URN is urn:isbn:0-123-456-789-123. Since URLs and URNs both form subsets of URIs (which in turn are a subset of IRIs), any URL - be it for an image, web site, REST endpoint address or whatever - is already ready to be incorporated into the linked data ecosystem, and URNs can be used as well. Unlike URLs, however, URNs cannot be directly resolved: they are not global and always need a specific resolver. There is no central authority for querying ISBN or ISSN numbers, so such a URN must be resolved against a particular service, and UUIDs mean nothing outside their specific context, which makes them tightly coupled with the information system(s) they inhabit. URNs with their own sub-namespaces have specific use cases (ISBN, EAN, DOI, UUID etc.) where they are preferred, but in general you should aim towards creating HTTP URIs (or, to be more precise, HTTP IRIs).


There is a deep philosophical difference between how and what things are named in linked data compared to traditional data modeling (e.g. with UML), but covering this requires going through some elementary principles first. We won't go deeply into the conceptual basis of URIs in this text, but the most important thing to keep in mind is: URLs only point to resources on the Web, whereas URIs are meant to identify any resources - also abstract or real-world ones. As an example, Finland as a country is not a resource created specifically for and existing only on the Web, but it can still be named with an HTTP URI/IRI so that it can be described in machine-readable terms and used in information systems.
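As a minimal sketch (using a hypothetical example.org namespace rather than a real FI-Platform IRI), naming and describing such a real-world entity could look like this in Turtle:

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    # Finland as a real-world entity, named with a hypothetical HTTP IRI
    # and given human-readable labels in machine-readable form:
    <https://example.org/id/Finland> rdfs:label "Finland"@en , "Suomi"@fi .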

...

spiral notepad You should always publish your models and data with versioned URI identifiers to avoid unintended side-effects.

Linked Data is Atomic

It is crucial to understand that by nature, linked data is atomic and always in the form of a graph. The lingua franca of linked data is RDF (Resource Description Framework), which allows for a very intuitive and natural way of representing information. In RDF everything is expressed as triples (3-tuples): statements consisting of three components (resources). You can think of triples as simply rows of data in a three column data structure: the first column represents the subject resource (the thing the statement is about), the second column represents the predicate resource (what is being stated about the subject), and the third column represents the object or value resource of the statement. Simplified to the extreme, "Finland is a country" is a statement in this form:

...
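Written out as a triple in Turtle (with hypothetical namespaces), the statement could look like this:

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    # subject                          predicate   object
    <https://example.org/id/Finland>   rdf:type    <https://datamodel/Country> .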

spiral notepad It is always possible to expand graphs by adding triples pointing to new resources or to state facts about the external world.

Linked Data Can Refer to Anything

You might have already guessed that this kind of graph data structure becomes cumbersome when it is used, for example, to store lists or arrays. Both are possible in RDF, but the flexibility of linked data allows us to leverage the fact that the same URI can serve a large dataset, e.g. in JSON format, when requested with the proper Accept content type, while simultaneously acting as the identifier for the dataset and serving a relevant RDF description of it. Additionally, e.g. REST endpoint URLs that point to individual records or fragments of a dataset can also be used, allowing us to talk about individual dataset records in RDF while keeping the original data structure and access methods intact. The URL based ecosystem does not have to be aware of the semantic layer that is added on top of or parallel to it, so implementing linked data based semantics in your information management systems and flows is not by default disruptive.

Let us take an example: you might have a REST API that serves photos as JPEGs with a path such as https://archive.myserver/photos/localName. When doing an HTTP request with an Accept header for image/jpeg, the URI will resolve to a JPEG, but when using an Accept header for application/ld+json, the same URI will resolve to a semantic RDF representation of the photo in JSON-LD (for example its location, time and other EXIF data as well as provenance etc.).
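A hedged sketch of the kind of triples such a request might return is shown below in Turtle for readability (a JSON-LD response would carry the same content); the chosen vocabularies and all values are illustrative assumptions:

    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix geo:     <http://www.w3.org/2003/01/geo/wgs84_pos#> .
    @prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

    # Metadata that the photo URI could resolve to when RDF is requested:
    <https://archive.myserver/photos/localName>
        dcterms:created "2024-06-15T12:30:00Z"^^xsd:dateTime ;
        dcterms:creator <https://example.org/id/somePhotographer> ;
        geo:lat  "60.1699"^^xsd:decimal ;
        geo:long "24.9384"^^xsd:decimal .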

What about models then? 

Everything Has an Identity

Because the above mentioned resources are all HTTP URIs (with the exception of the literal integer value for the number of lakes), we are free to refer to them to either expand the domain of this dataset or combine it with another dataset to cover new use cases. The killer feature of linked data is the ability to enrich data by combining it easily and infer new facts from it. As an example, we could expand the model to add the perspective of an individual ("Matti resides in Finland") and of a class ("A country has borders"):

...
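A rough Turtle rendering of the expanded graph could look like the following (the exact IRIs are assumptions made for illustration):

    @prefix dm:  <https://datamodel/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # The original statements about Finland:
    <https://example.org/id/Finland> rdf:type dm:Country ;
        dm:hasCapital <https://example.org/id/Helsinki> ;
        dm:numberOfLakes "187888"^^xsd:integer .

    # Added perspective of an individual ...
    <https://example.org/id/Matti> dm:residesIn <https://example.org/id/Finland> .

    # ... and of a class:
    dm:Country dm:hasBorders dm:Border .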

A small exception to the naming rule is that the literal integer value of 187888 does not have an identity (nor do any other literal values).

All Resources Are First-Class Citizens

In traditional UML modeling, classes are the primary entities, with other elements being subservient to them. In RDF this is not the case. Referring to the diagram above, e.g. the resource <https://datamodel/hasCapital> is not bound to be only an association, even though here it is used as such. As it has been named with a URI, we can use it as the subject of additional triples that describe it. We could, for example, add triples that describe its meaning or metadata about its specification. So, when reading the diagrams you should always keep in mind that the sole reason some resources are represented as edges is that they appear in the predicate position of the stated triples.

...

Above, you can see another triple where the creator of the <https://datamodel/hasCapital> resource is stated as being some organization under the <https://finland.fi/> namespace.
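In Turtle, that statement about the association resource itself could be sketched as follows (dcterms:creator and the organisation's local name are illustrative assumptions):

    @prefix dcterms: <http://purl.org/dc/terms/> .

    # A triple whose subject is the association resource:
    <https://datamodel/hasCapital> dcterms:creator <https://finland.fi/someOrganisation> .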

Data and Model Can Coexist

From the diagrams above it is evident that both individuals ("instances") and classes are named the same way and can coexist in the same graph. From the plain linked data perspective such a distinction doesn't even exist: there are just resources pointing to other resources with RDF. The distinction becomes important when the data is interpreted through the lens of a knowledge description language, where some of the resources have a special meaning for the processing software. The meaning of these resources becomes evident when we discuss core vocabularies and application profiles.

Relationships Are Binary

The third key takeaway is that all relationships are binary. This means that these kinds of structures (n-ary relationships) are not possible:

...

Stating this would always require two triples, each using the same "has child" association as its predicate, meaning the association is used twice. Treating them as one group therefore requires an additional grouping structure, for example when all of the child associations share some feature that we do not want to replicate for each individual association (see the sketch below).
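A hypothetical Turtle sketch of the two separate triples and one possible grouping structure:

    @prefix dm: <https://datamodel/> .

    # Two binary "has child" statements:
    <https://example.org/id/Matti> dm:hasChild <https://example.org/id/Aino> ,
                                               <https://example.org/id/Eino> .

    # A separate grouping resource, used when the child associations share
    # some feature that we do not want to repeat on each association:
    <https://example.org/id/Matti> dm:hasChildGroup <https://example.org/id/group1> .
    <https://example.org/id/group1> dm:member <https://example.org/id/Aino> ,
                                              <https://example.org/id/Eino> ;
        dm:sharedFeature "some shared value" .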

You're Not Modelling Records but the Entities Themselves

What all the resources in a linked data dataset are depends of course entirely on the use case at hand. In general though, there is a deep philosophical distinction between how linked data and traditional data modelling approach the task. Traditionally data modelling is done in a siloed way with an emphasis on modelling records, in other words data structures that describe a set of data for a specific use case. As an example, different information systems might hold data about an individual's income taxes, medical history etc. These data sets relate to the individual indirectly via some kind of permanent identifier, such as the Finnish Personal Identity Code. Neither the identifier nor the records are meant to represent the entities themselves, just the relevant data about them. The data is typically somewhat denormalized and serves a process or system view of the domain at hand.

On the other hand, in linked data the modelling of specific entities is often approached by assuming that the minted URIs (or IRIs) are actually names for the entities themselves.

Core Vocabularies (Ontologies)

Core Vocabularies provide a structured framework to represent knowledge as a set of concepts within a domain and the relationships between those concepts. They are used extensively to formalize a domain's knowledge in a way that can be processed by computers. Ontologies allow for sophisticated inferences and queries because they can model complex relationships between entities and can include rules for how entities are connected.

  • RDFS (RDF Schema) is a basic ontology language providing elementary constructs for describing ontologies. It introduces concepts such as classes and properties, enabling rudimentary hierarchical classifications and relationships.

  • OWL (Web Ontology Language) offers more advanced features than RDFS and is capable of representing rich and complex knowledge about things, groups of things, and relations between things. OWL is highly expressive and designed for applications that need to process the content of information instead of just presenting information.

Schemas

Schemas, on the other hand, are used for data validation. They define the shape of the data, ensuring it adheres to certain rules before it is processed or integrated into systems. Schemas help maintain consistency and reliability in data across different systems.

...

When transitioning from traditional modeling techniques like UML (Unified Modeling Language) or Entity-Relationship Diagrams (ERD) to linked data based modeling with tools like OWL, RDFS, and SHACL, practitioners encounter both conceptual and practical shifts. This chapter aims to elucidate these differences, providing a clear pathway for those accustomed to conventional data modeling paradigms to adapt to linked data methodologies effectively.

Conceptual Shifts

Graph-based vs. Class-based Structures

  • Traditional Models: Traditional data modeling, such as UML and ERD, uses a class-based approach where data is structured according to classes which define the attributes and relationships of their instances. These models create a somewhat rigid hierarchy where entities must fit into predefined classes, and interactions are limited to those defined within and between these classes.
  • Linked Data Models: Contrastingly, linked data models, utilizing primarily RDF-based technologies, adopt a graph-based approach. In these models, data is represented as a network of nodes (entities) and edges (relationships) that connect them. Each element, whether data or a conceptual entity, can be directly linked to any other, allowing for dynamic and flexible relationships without the confines of a strict class hierarchy. This structure facilitates more complex interconnections and seamless integration of diverse data sources, making it ideal for expansive and evolving data ecosystems.

From Static to Dynamic Schema Definitions

  • Traditional Models: UML and ERD typically define rigid schemas intended to structure database systems where the schema must be defined before data entry and is difficult to change.
  • Linked Data Models: OWL, RDFS etc. allow for more flexible, dynamic schema definitions that can evolve over time without disrupting existing data. They support inferencing, meaning new relationships and data types can be derived logically from existing definitions.

From Closed to Open World Assumption

  • Traditional Models: Operate under the closed world assumption where what is not explicitly stated is considered false. For example, if an ERD does not specify a relationship, it does not exist.
  • Linked Data Models: Typically adhere to the open world assumption, common in semantic web technologies, where the absence of information does not imply its negation, i.e. we cannot deduce falsity based on missing data. This approach is conducive to integrating data from multiple, evolving sources.

Entity Identification

  • Traditional Models: Entities are identified within the confines of a single system or database, often using internal identifiers (e.g., a primary key in a database). Not all entities (such as attributes of a class) are identifiable without their context (i.e. one can't define an attribute with an identity and use it in two classes).
  • Linked Data Models: Linked data models treat all elements as atomic resources that can be uniquely identified and accessed. Each resource, whether it's a piece of data or a conceptual entity, is assigned a Uniform Resource Identifier (URI). This ensures that every element in the dataset can be individually addressed and referenced, enhancing the accessibility and linkage of data across different sources.

Practical Shifts

Modeling Languages and Tools

  • Traditional Models: Use diagrammatic tools to visually represent entities, relationships, and hierarchies, often tailored for relational databases.
  • Linked Data Models: Employ declarative languages that describe data models in terms of classes, properties, and relationships that are more aligned with graph databases. These tools often focus on semantics and relationships rather than just data containment.

Data Integrity and Validation

  • Traditional Models: Data integrity is managed through constraints like foreign keys, unique constraints, and checks within the database system.
  • Linked Data Models: SHACL is used for validating RDF data against a set of conditions (data shapes), which can include cardinality, datatype constraints, and more complex logical conditions.

Interoperability and Integration

  • Traditional Models: Often siloed, requiring significant effort (e.g. ETL solutions, middleware) to ensure interoperability between disparate systems.
  • Linked Data Models: Designed for interoperability, using RDF (Resource Description Framework) as a standard model for data interchange on the Web, facilitating easier data merging and linking.

Transition Strategies

Understanding Semantic Relationships

  • Invest time in understanding how OWL and RDFS manage ontologies, focusing on how entities and relationships are semantically connected rather than just structurally mapped.

Learning New Validation Techniques

  • Learn SHACL to understand how data validation can be applied in linked data environments, which is different from constraint definition in relational databases.

Adopting a Global Identifier Mindset

  • Embrace the concept of using URIs for identifying entities globally, which involves understanding how to manage and resolve these identifiers over the web.
  • It is also worth learning how URIs differ from URNs and URLs, how they enable interoperability with other identifier schemes (such as UUIDs), what resolving identifiers means, and how URIs and their namespacing can be used to scope identifiers locally.

Linked Data Modeling in Practice

You might already have a clear-cut goal for modeling, or alternatively be tasked with a still loosely defined goal of improving the management of knowledge in your organization. As a data architect, you're accustomed to dealing with REST APIs, PostgreSQL databases, and UML diagrams. But the adoption of technologies like RDF, RDFS, OWL, and SHACL can elevate your data architecture strategies. Here is a short list explaining some of the most common use-cases as they would be implemented here:

Conceptual and logical models and database schemas

For conceptual and logical models, you should create a Core Vocabulary (i.e. an OWL ontology). You can do lightweight conceptual modeling by merely specifying classes and their interrelationships with associations, or a more complex and comprehensive model by including attributes, their hierarchies and attribute/relationship-based constraints. In either case, the same OWL ontology acts both as a formally defined conceptual model and as a logical model; there is no need for an artificial separation of the two. This also helps to avoid inconsistencies cropping up between the two. The primary advantage of basing these models on OWL is the ability to use inferencing engines to logically validate the internal consistency of the created models, which is not possible with traditional conceptual and logical models.
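A rough sketch of what such a core vocabulary could look like in OWL (Turtle syntax); the model namespace and all names below are hypothetical:

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <https://iri.suomi.fi/model/examplemodel/1.0.0/> .

    # Lightweight conceptual level: classes and an association between them.
    ex:Person  a owl:Class .
    ex:Address a owl:Class .

    ex:hasAddress a owl:ObjectProperty ;
        rdfs:domain ex:Person ;
        rdfs:range  ex:Address .

    # Logical-level detail: an attribute kept deliberately general.
    ex:surname a owl:DatatypeProperty ;
        rdfs:domain ex:Person ;
        rdfs:range  rdfs:Literal .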

...

  • Defining API specifications:
  • Defining message schemas:

Which one to create: a Core Vocabulary or an Application Profile?

Which model type should you start with? This naturally depends on your use-case. You might be defining a database schema, building a service that distributes information products adhering to a specific schema, trying to integrate two datasets... In general, all these and other use-cases start with the following workflow:

...

  • If you want to annotate data, check its logical soundness or infer new facts from it, you need a core vocabulary. With a core vocabulary you are essentially making a specification stating "individuals that fit these criteria belong to these classes".
  • If you want to validate the data structure or do anything you'd traditionally do with a schema, you need an application profile. With an application profile you are essentially making a specification stating "graph structures matching these patterns are valid".

Core Vocabularies in a Nutshell

As mentioned, the idea of a Core Vocabulary is to semantically describe the resources (entities) you will be using to describe your data. In other words, what typically ends up as conceptual model documentation or a diagram is now described by a formal model.

...

The OWL language has multiple profiles for different kinds of inferencing. The one currently selected for the FI-Platform (OWL 2 EL) is computationally simple, but still logically expressive enough to fulfill most modeling needs. An important reminder when doing core vocabulary modeling is to constantly ask: is the feature I am after part of a specific use case (and thus an application profile), or is it essential to the definition of these concepts?

Application Profiles in a Nutshell

Application profiles fill the need not only to validate the meaning and semantic consistency of data and specifications, but also to enforce a specific syntactic structure and content for data.

...

Following the key Semantic Web principles, SHACL validation is not based on whitelisting (deny all, permit some) like traditional closed schema definitions. Instead, SHACL works by validating the patterns we are interested in and ignoring everything else. Due to the nature of RDF data, this doesn't cause problems, as we can simply drop the triples in the dataset that are not part of the validated patterns. It is also possible to extend SHACL validation with SHACL-SPARQL or SHACL JavaScript extensions to perform a vast amount of pre/postprocessing and validation of the data, though this is not currently supported by the FI-Platform nor within the scope of this document.
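For illustration, a minimal SHACL sketch of this kind of open validation (all names hypothetical): the shape below checks only the surname pattern of persons and ignores any other triples about them:

    @prefix sh:  <http://www.w3.org/ns/shacl#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:  <https://iri.suomi.fi/model/exampleprofile/1.0.0/> .
    @prefix voc: <https://iri.suomi.fi/model/examplemodel/1.0.0/> .

    ex:PersonShape a sh:NodeShape ;
        sh:targetClass voc:Person ;
        sh:property [
            sh:path voc:surname ;
            sh:datatype xsd:string ;
            sh:minCount 1 ;
            sh:maxCount 1
        ] .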

...

Core Vocabulary modeling

When modeling a core vocabulary, you are essentially creating three types of resources:

Attributes

Attributes are in principle very similar to attribute declarations in other data modeling languages. There are some differences nevertheless that you need to take into account:

  1. Attributes can be used without classes. For an attribute definition, one can specify rdfs:domain and/or rdfs:range. The domain refers to the subject in the <subject, attribute, literal value> triple, and range refers to the literal value. Basically what this means is that when such a triple is found in the data, its subject is assumed to be of the type specified by rdfs:domain, and the datatype is assumed to be of the type specified by rdfs:range.
  2. The attribute can be declared as functional, meaning that when used it will only have at most one value. As an example, one could define a functional attribute called age with a domain of Person. This would then indicate that each instance of Person can have at most one literal value for their age attribute. On the other hand, if the functional declaration is not used, the same attribute (e.g. nickname) can be used to point to multiple literal values (see the sketch after this list).
  3. Attribute datatypes are by default XSD datatypes, which come with their own datatype hierarchy (see here).
  4. In core vocabularies it is sometimes preferable to define attribute datatype on a very general level, for example as rdfs:Literal. This allows using the same attribute in a multitude of application profiles with the same intended semantic meaning but enforcing a context-specific precise datatype in each application profile.
  5. Attributes can have hierarchies. This is an often overlooked but useful feature for inferencing. As an example, you could create a generic attribute called Identifier that represents the group of all attributes that act as identifiers. You could then create sub-attributes, for example TIN (Tax Identification Number), HeTu (the Finnish personal identity code) and so on.
  6. Attributes can have explicit equivalence declarations (i.e. an attribute in this model is declared to be equivalent to some other attribute).
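A hedged sketch of how these attribute features could be expressed in OWL (Turtle syntax, hypothetical names):

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:   <https://iri.suomi.fi/model/examplemodel/1.0.0/> .

    # A functional attribute: each Person has at most one age value.
    ex:age a owl:DatatypeProperty , owl:FunctionalProperty ;
        rdfs:domain ex:Person ;
        rdfs:range  xsd:integer .

    # A deliberately general attribute ...
    ex:identifier a owl:DatatypeProperty ;
        rdfs:range rdfs:Literal .

    # ... with a more specific sub-attribute forming a hierarchy.
    ex:taxIdentificationNumber a owl:DatatypeProperty ;
        rdfs:subPropertyOf ex:identifier .

    # An explicit equivalence to an attribute in some other (hypothetical) model.
    ex:surname owl:equivalentProperty <https://example.org/othermodel/familyName> .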

Associations

Associations are similarly not drastically different compared to other languages. There are some noteworthy things to consider nevertheless:

  1. Associations can be used without classes as well. The rdfs:domain and rdfs:range options can here be used to define the source and target classes for the uses of a specific association. As an example, the association hasParent might have Person as both its domain and range, meaning that all triples using this association are assumed to describe connections between instances of Person.
  2. Associations in RDF are binary, meaning that the triple <..., association, ...> will always connect two resources with the association acting as the predicate.
  3. Associations can have hierarchies similarly to attributes.
  4. Associations have flags for determining whether they are reflexive (meaning that every resource is implicitly connected to itself by the association) and whether they are transitive (meaning that if resources A and B as well as B and C are connected by association X, then A is also inferred to be connected to C by association X). See the sketch after this list.
  5. Associations can have explicit equivalence declarations (i.e. an association in this model is declared to be equivalent to some other association).
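A corresponding sketch for associations in OWL (Turtle syntax, hypothetical names):

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <https://iri.suomi.fi/model/examplemodel/1.0.0/> .

    # An association whose domain and range are both Person.
    ex:hasParent a owl:ObjectProperty ;
        rdfs:domain ex:Person ;
        rdfs:range  ex:Person .

    # A sub-association forming a hierarchy.
    ex:hasMother a owl:ObjectProperty ;
        rdfs:subPropertyOf ex:hasParent .

    # A transitive association: if A isPartOf B and B isPartOf C,
    # then A isPartOf C is inferred.
    ex:isPartOf a owl:ObjectProperty , owl:TransitiveProperty .

    # An explicit equivalence to an association in some other (hypothetical) model.
    ex:hasParent owl:equivalentProperty <https://example.org/othermodel/hasParent> .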

Classes

Classes form the most expressive backbone of OWL. Classes can simply utilize the rdfs:subClassOf association to create hierarchies, but typically classes contain property restrictions - in the current FI-Platform case really simple ones. A class can state existential restrictions requiring that the members of the class must have specific attributes and/or associations. Further cardinality restrictions are not declared here, as the chosen OWL profile does not support them, and cardinality can instead be explicitly defined in an application profile. In order to require specific associations or attributes to be present in an instance of a class, those associations and attributes must already exist as definitions of their own, since associations and attributes are never owned by a class, unlike in e.g. UML. They are individual definitions that are simply referred to by the class definition. This allows for situations where an extremely common definition (for example a date of birth or surname) can be defined only once in one model and then reused endlessly in all other models without ever having to be redefined.

...

Similarly to associations and attributes, classes have equivalence declarations. Additionally, classes can be declared as non-intersecting. It is important to understand that classes being sets does not by default force them to be strictly separated. From the perspective of the inference reasoner, classes for inanimate objects and for people could well be overlapping, unless this is explicitly declared logically inconsistent. With a well laid out class hierarchy, simply declaring a couple of superclasses as non-intersecting will automatically make all their subclasses non-intersecting as well.
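A sketch of these class features in OWL (Turtle syntax, hypothetical names):

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <https://iri.suomi.fi/model/examplemodel/1.0.0/> .

    # A hierarchy plus existential restrictions: every Employee is a Person
    # and must have some surname and some employer.
    ex:Employee a owl:Class ;
        rdfs:subClassOf ex:Person ;
        rdfs:subClassOf [ a owl:Restriction ;
                          owl:onProperty ex:surname ;
                          owl:someValuesFrom rdfs:Literal ] ;
        rdfs:subClassOf [ a owl:Restriction ;
                          owl:onProperty ex:hasEmployer ;
                          owl:someValuesFrom ex:Organisation ] .

    # Declaring two superclasses as non-intersecting also makes all of
    # their subclasses non-intersecting.
    ex:Person owl:disjointWith ex:InanimateObject .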

...

Application profile modeling 

With application profiles we use a strictly separate set of terms to avoid mixing up the core vocabulary structures we are validating and the validating structures themselves. The application profile entities are called restrictions:

Attribute and association restrictions

These restrictions are tied to specific attribute and association types that are used in the data being validated. Creating a restriction for a specific core vocabulary association allows it to be reused in one or more class restrictions. In the future the functionality of the FI-Platform might be extended to cover using attribute and association restrictions individually without class restrictions, but currently this is not possible.

...

For association restrictions, the currently supported extra restriction is the class type requirement for the association restriction target (i.e. what type of an instance must be at the object end of the association).
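As a hypothetical SHACL sketch, a reusable attribute restriction and association restriction could look like this (Turtle syntax):

    @prefix sh:  <http://www.w3.org/ns/shacl#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:  <https://iri.suomi.fi/model/exampleprofile/1.0.0/> .
    @prefix voc: <https://iri.suomi.fi/model/examplemodel/1.0.0/> .

    # An attribute restriction for the core vocabulary attribute voc:surname.
    ex:SurnameRestriction a sh:PropertyShape ;
        sh:path voc:surname ;
        sh:datatype xsd:string ;
        sh:maxCount 1 .

    # An association restriction: the target of voc:hasAddress must be an
    # instance of voc:Address.
    ex:HasAddressRestriction a sh:PropertyShape ;
        sh:path voc:hasAddress ;
        sh:class voc:Address .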

Class restrictions

Similarly to core vocabulary classes, class restrictions also utilize a group of predefined attribute and association restrictions. Again, this allows for example the specification of highly reusable association and attribute restrictions which can then be reused many times across various application profiles.

...

Class restrictions don't operate in a set-theoretical manner like core vocabulary definitions, but there is a way to implement "inheritance" in validated classes. If a class restriction utilizes another class restriction, the contents of its target class are checked against both of these class restrictions.
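Continuing the hypothetical sketch above, a class restriction reusing those attribute and association restrictions, and another class restriction "inheriting" it, could look like this:

    @prefix sh:  <http://www.w3.org/ns/shacl#> .
    @prefix ex:  <https://iri.suomi.fi/model/exampleprofile/1.0.0/> .
    @prefix voc: <https://iri.suomi.fi/model/examplemodel/1.0.0/> .

    # A class restriction referring to reusable property shapes.
    ex:PersonRestriction a sh:NodeShape ;
        sh:targetClass voc:Person ;
        sh:property ex:SurnameRestriction , ex:HasAddressRestriction .

    # "Inheritance": data targeted by EmployeeRestriction is also checked
    # against PersonRestriction.
    ex:EmployeeRestriction a sh:NodeShape ;
        sh:targetClass voc:Employee ;
        sh:node ex:PersonRestriction ;
        sh:property [ sh:path voc:hasEmployer ; sh:minCount 1 ] .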

General word of caution on modeling

SHACL is a very flexible language, and due to this it allows the creation of validation patterns that might seem legitimate but are actually unsatisfiable by any instance data. As an example, the utilization of other class restrictions might lead to a situation where an attribute can never be validated, as it is required to conform to two conflicting datatypes at the same time.
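A hypothetical sketch of such an unsatisfiable combination in SHACL: together, the two shapes below require every voc:age value to be both an xsd:integer and an xsd:string, so no instance data targeted by the stricter shape can ever conform:

    @prefix sh:  <http://www.w3.org/ns/shacl#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix ex:  <https://iri.suomi.fi/model/exampleprofile/1.0.0/> .
    @prefix voc: <https://iri.suomi.fi/model/examplemodel/1.0.0/> .

    ex:BasePersonRestriction a sh:NodeShape ;
        sh:property [ sh:path voc:age ; sh:datatype xsd:integer ; sh:minCount 1 ] .

    ex:StrictPersonRestriction a sh:NodeShape ;
        sh:targetClass voc:Person ;
        sh:node ex:BasePersonRestriction ;
        sh:property [ sh:path voc:age ; sh:datatype xsd:string ; sh:minCount 1 ] .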

...