This page describes the principles of creating core vocabularies and application profiles. It is important to understand these principles in order to avoid ambiguities, because the modeling languages and paradigm used here differ from the traditional conceptual models used by many data architects and modeling tools. For example, the UML specification grants a certain degree of conceptual freedom both in some diagram types and in their interpretation, whereas the knowledge representation languages used on this platform are formally defined, i.e. all of the modeling components have a precise meaning defined in mathematical logic.

...

In the digital world, how we organize and connect data significantly influences how effectively we can use that information. Linked data knowledge representation languages (in other words, "modeling languages"), such as OWL (Web Ontology Language) and RDFS (RDF Schema), are tools that help us define and interlink data in a meaningful way across the Internet. Let's break down what these models are and what they are used for, using concrete examples.

...

  • Data Integration: They facilitate the combination of data from diverse sources in a coherent manner. This can range from integrating data across different libraries to creating a unified view of information that spans multiple organizations.

  • Interoperability: A fundamental benefit of using linked data models is their ability to ensure interoperability among disparate systems. This means that data structured with OWL, RDFS, etc. can be shared, understood, and processed in a consistent way, regardless of the source. This capability is crucial for industries like healthcare, where data from various healthcare providers must be combined and made universally accessible and useful, or in supply chain management, where different stakeholders (manufacturers, suppliers, distributors) need to exchange information seamlessly.

  • Knowledge Management: These models help organizations manage complex information about products, services, and internal processes in a way that is easily accessible and modifiable. This structured approach supports more efficient retrieval and use of information.

  • Artificial Intelligence and Machine Learning: OWL, RDFS etc. provide a structured context for data, which is essential for training machine learning models. This structure allows AI systems to interpret data more accurately and apply learning to similar but previously unseen data. By using linked data models, organizations can ensure that their data is not only accessible and usable within their own systems but can also be easily linked to and from external systems. This creates a data ecosystem that supports richer, more connected, and more automatically processable information networks.

  • Enhancing Search Capabilities: By providing detailed metadata and defining relationships between data entities, these models significantly improve the precision and breadth of search engine results. This enriched search capability allows for more detailed queries and more relevant results.

When discussing linked data models, particularly in the context of the Semantic Web, there are two primary categories to consider: ontologies and schemas. Each serves a unique role in structuring and validating data. Let's explore these categories, specifically highlighting ontologies (like RDFS and OWL) and schemas (such as SHACL).

Ontologies

Ontologies provide a structured framework to represent knowledge as a set of concepts within a domain and the relationships between those concepts. They are used extensively to formalize a domain's knowledge in a way that can be processed by computers. Ontologies allow for sophisticated inferences and queries because they can model complex relationships between entities and can include rules for how entities are connected.
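As a minimal sketch of what this looks like in practice (the `ex:` namespace and all class and property names below are hypothetical), an ontology fragment in Turtle might declare a class hierarchy and a relationship between classes, from which a reasoner can draw inferences:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/vocab#> .

ex:Person       a owl:Class .
ex:Organization a owl:Class .
ex:Employee     a owl:Class ;
    rdfs:subClassOf ex:Person .   # every Employee is also a Person

ex:worksFor a owl:ObjectProperty ;
    rdfs:domain ex:Employee ;     # a reasoner infers the subject of worksFor is an Employee
    rdfs:range  ex:Organization . # ...and that its object is an Organization

ex:alice a ex:Employee .          # a reasoner can infer: ex:alice is a Person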

...

  • SPARQL (SPARQL Protocol and RDF Query Language), which is used to query RDF data. It allows users to extract and manipulate data stored in RDF format across various sources. SPARQL queries can also be embedded in SHACL constraints to further increase their expressiveness.

  • SKOS (Simple Knowledge Organization System), which is used for representing knowledge organization systems such as thesauri and classification schemes within RDF.
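To make the SKOS entry concrete, a small concept scheme might look like the following sketch (the `ex:` namespace and concept names are hypothetical):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/concepts#> .

ex:MaterialScheme a skos:ConceptScheme ;
    skos:prefLabel "Materials"@en .

ex:Metal a skos:Concept ;
    skos:inScheme  ex:MaterialScheme ;
    skos:prefLabel "Metal"@en ;
    skos:altLabel  "Metallic material"@en ;
    skos:narrower  ex:Steel .

ex:Steel a skos:Concept ;
    skos:inScheme  ex:MaterialScheme ;
    skos:prefLabel "Steel"@en ;
    skos:broader   ex:Metal .
```

Note that `skos:broader`/`skos:narrower` express thesaurus-style hierarchy without the logical commitment of `rdfs:subClassOf`: asserting that Steel is narrower than Metal does not make every steel item an instance of a Metal class.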

...

When transitioning from traditional modeling techniques like UML (Unified Modeling Language) or Entity-Relationship Diagrams (ERD) to linked data based modeling with tools like OWL, RDFS, and SHACL, practitioners encounter both conceptual and practical shifts. This chapter aims to elucidate these differences, providing a clear pathway for those accustomed to conventional data modeling paradigms to adapt to linked data methodologies effectively.

Conceptual Shifts

...

Graph-based vs. Class-based Structures

  • Traditional Models: Traditional data modeling, such as UML and ERD, uses a class-based approach where data is structured according to classes which define the attributes and relationships of their instances. These models create a somewhat rigid hierarchy where entities must fit into predefined classes, and interactions are limited to those defined within and between these classes.
  • Linked Data Models: In contrast, linked data models, utilizing primarily RDF-based technologies, adopt a graph-based approach. In these models, data is represented as a network of nodes (entities) and edges (relationships) that connect them. Each element, whether data or a conceptual entity, can be directly linked to any other, allowing for dynamic and flexible relationships without the confines of a strict class hierarchy. This structure facilitates more complex interconnections and seamless integration of diverse data sources, making it ideal for expansive and evolving data ecosystems.

From Static to Dynamic Schema Definitions

  • Traditional Models: UML and ERD typically define rigid schemas intended to structure database systems where the schema must be defined before data entry and is difficult to change.
  • Linked Data Models: OWL, RDFS etc. allow for more flexible, dynamic schema definitions that can evolve over time without disrupting existing data. They support inferencing, meaning new relationships and data types can be derived logically from existing definitions.

From Closed to Open World Assumption

  • Traditional Models: Operate under the closed world assumption where what is not explicitly stated is considered false. For example, if an ERD does not specify a relationship, it does not exist.
  • Linked Data Models: Typically adhere to the open world assumption, common in semantic web technologies, where the absence of information does not imply its negation, i.e. we cannot deduce falsity based on missing data. This approach is conducive to integrating data from multiple, evolving sources.
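The difference is easiest to see on a tiny example (hypothetical `ex:alice` resource; `foaf:` is the widely used FOAF vocabulary):

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/people#> .

ex:alice a foaf:Person ;
    foaf:name "Alice" .

# No foaf:mbox triple is present. Under the open world assumption this does
# NOT mean Alice has no mailbox -- only that none has (yet) been stated,
# perhaps in a dataset we have not seen. A closed world system (e.g. a SQL
# table with a NOT NULL email column) would instead treat the absence as a
# definite fact and reject the record.
```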

Entity Identification

  • Traditional Models: Entities are identified within the confines of a single system or database, often using internal identifiers (e.g., a primary key in a database). Not all entities (such as attributes of a class) are identifiable without their context (i.e. one can't define an attribute with an identity and use it in two classes).
  • Linked Data Models: Linked data models treat all elements as atomic resources that can be uniquely identified and accessed. Each resource, whether it's a piece of data or a conceptual entity, is assigned a Uniform Resource Identifier (URI). This ensures that every element in the dataset can be individually addressed and referenced, enhancing the accessibility and linkage of data across different sources.
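For illustration, the following sketch (all URIs below are hypothetical) shows how URI-based identification lets independently managed datasets refer to, and link, the same entity:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/people#> .

# Every resource has a globally unique URI, so two datasets can refer to
# the same entity without sharing a database or coordinating primary keys.
ex:employee-1042 a <http://example.org/vocab#Employee> ;
    owl:sameAs <http://other.example.com/staff/jdoe> .  # link to an external record
```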

Practical Shifts

Modeling Languages and Tools

...

  • Traditional Models: Use diagrammatic tools to visually represent entities, relationships, and hierarchies, often tailored for relational databases.
  • Linked Data Models: Employ declarative languages that describe data models in terms of classes, properties, and relationships that are more aligned with graph databases. These tools often focus on semantics and relationships rather than just data containment.

Data Integrity and Validation

...

  • Traditional Models: Data integrity is managed through constraints like foreign keys, unique constraints, and checks within the database system.
  • Linked Data Models: SHACL is used for validating RDF data against a set of conditions (data shapes), which can include cardinality, datatype constraints, and more complex logical conditions.
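As a sketch of SHACL validation (the `ex:` namespace, shape names, and properties are hypothetical), a shape can combine cardinality and datatype constraints, and even embed a SPARQL query for conditions the core constraint vocabulary cannot express:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/vocab#> .

ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;          # validate all instances of ex:Person
    sh:property [
        sh:path     ex:name ;
        sh:datatype xsd:string ;        # datatype constraint
        sh:minCount 1 ;                 # cardinality: exactly one name
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path     ex:birthDate ;
        sh:datatype xsd:date ;
        sh:maxCount 1 ;
    ] .

# A SPARQL-based constraint for a condition that relates two properties:
ex:BirthBeforeHireShape a sh:NodeShape ;
    sh:targetClass ex:Employee ;
    sh:sparql [
        sh:message "Birth date must precede hire date." ;
        sh:select """
            SELECT $this WHERE {
                $this <http://example.org/vocab#birthDate> ?b ;
                      <http://example.org/vocab#hireDate>  ?h .
                FILTER (?b >= ?h)
            }""" ;
    ] .
```

Unlike database constraints, these shapes do not prevent data from being stored; a SHACL engine produces a validation report listing each node that violates a shape.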

Interoperability and Integration

...

  • Traditional Models: Often siloed, requiring significant effort (e.g. ETL solutions, middleware, data federation) to ensure interoperability between disparate systems.
  • Linked Data Models: Designed for interoperability, using RDF (Resource Description Framework) as a standard model for data interchange on the Web, facilitating easier data merging and linking.

Transition Strategies

Understanding Semantic Relationships

...

  • Invest time in understanding how OWL and RDFS manage ontologies, focusing on how entities and relationships are semantically connected rather than just structurally mapped.

Learning New Validation Techniques

...

  • Learn SHACL to understand how data validation can be applied in linked data environments, which is different from constraint definition in relational databases.

Adopting a Global Identifier Mindset

...

  • Embrace the concept of using URIs for identifying entities globally, which involves understanding how to manage and resolve these identifiers over the web.
  • It is also worth learning how URIs differ from URNs and URLs, how they enable interoperability with other identifier schemes (such as UUIDs), what resolving an identifier means, and how URIs and their namespaces can be used to scope identifiers locally.
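A brief sketch of how namespacing works in Turtle (the `ex:` namespace is hypothetical): a prefix declaration is purely a local shorthand for a full, globally unique namespace URI, so the two statements below are identical.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/vocab#> .

<http://example.org/vocab#Person> a owl:Class .
ex:Person a owl:Class .   # same triple, written with the prefix
```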

Linked Data Modeling in Practice

...

Conclusion

...

You might already have a clear-cut goal for modeling, or alternatively be tasked with a still loosely defined goal of improving the management of knowledge in your organization. As a data architect, you're accustomed to dealing with REST APIs, PostgreSQL databases, and UML diagrams. But the adoption of technologies like RDF, RDFS, OWL, and SHACL can elevate your data architecture strategies. Here is a short list explaining some of the most common use-cases as they would be implemented here:

Conceptual and logical models and database schemas

For conceptual and logical models, you should create a Core Vocabulary (i.e. an OWL ontology). You can do lightweight conceptual modeling by merely specifying classes and their interrelationships with associations, or build a more complex and comprehensive model by including attributes, their hierarchies, and attribute/relationship-based constraints. In either case, the same OWL ontology acts as both a formally defined conceptual model and a logical model; there is no need for an artificial separation of the two. This also helps to avoid inconsistencies cropping up between the two. The primary advantage of basing these models on OWL is the ability to use inferencing engines to logically validate the internal consistency of the created models, which is not possible with traditional conceptual and logical models.
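A minimal Core Vocabulary along these lines might look as follows (the `ex:` namespace and the class, association, and attribute names are hypothetical): the same ontology carries the lightweight conceptual level (classes and associations) and the logical level (attributes and constraints).

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/core#> .

ex: a owl:Ontology .

# Conceptual level: classes and an association between them
ex:Order    a owl:Class .
ex:Customer a owl:Class .
ex:placedBy a owl:ObjectProperty ;
    rdfs:domain ex:Order ;
    rdfs:range  ex:Customer .

# Logical level, in the same ontology: an attribute and a constraint
ex:orderDate a owl:DatatypeProperty ;
    rdfs:domain ex:Order ;
    rdfs:range  xsd:date .

ex:Order rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty  ex:placedBy ;
    owl:cardinality "1"^^xsd:nonNegativeInteger  # every order is placed by exactly one customer
] .
```

An OWL reasoner can then check that such definitions are mutually consistent, e.g. that no class is accidentally defined so that it can have no instances.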

For relational-RDF mappings, please check the following resources:

Regarding database schemas, even for relational databases, conceptual and logical models are necessary to define the structure and interrelationships of data.

RDFS and OWL can also be used to document the schema of an existing relational database in RDF format. This can serve as an interoperability layer to help integrate the relational database with other systems, facilitating easier data sharing and querying across different platforms.
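For example, a legacy table might be documented in RDF roughly as follows (the `db:` namespace and the table/column names are hypothetical): the table becomes a class and its columns become datatype properties.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix db:   <http://example.org/db/crm#> .

# Table CUSTOMER -> class
db:Customer a owl:Class ;
    rdfs:comment "Documents the CUSTOMER table of the legacy CRM database."@en .

# Columns -> datatype properties
db:customerId a owl:DatatypeProperty ;   # column CUSTOMER_ID (primary key)
    rdfs:domain db:Customer ;
    rdfs:range  xsd:integer .

db:email a owl:DatatypeProperty ;        # column EMAIL
    rdfs:domain db:Customer ;
    rdfs:range  xsd:string .
```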

You should create 

  • Defining API specifications:
  • Defining message schemas:

Which one to create: a Core Vocabulary or an Application Profile?

...