SWDWG Amsterdam F2F October 2007

Topic: SKOS Label Relations
Discussion of Option 2 (Guus' "Simple Extension" Proposal)

This document gives some discussion of Guus' "Simple Extension" proposal for stating label relations in SKOS, as input to the "Label Relations" topic at the October 2007 SWDWG F2F meeting in Amsterdam.

Latest Version: http://purl.org/net/skos/2007/10/f2f/label-relations.html

$Revision: 1.3 $ on $Date: 2007/10/03 11:04:48 $


Introduction

The basic requirement is to be able to assert relationships between "lexical labels", to indicate e.g. a relationship between an acronym and its expansion, or an abbreviation and its expansion, or a literal translation between labels in different languages.

Note that the notion of a "lexical label" was invented for SKOS, and has never been precisely defined. However, originally, there was intended to be a close correspondence between "lexical labels" and RDF plain literals. I.e. a "lexical label" has at least two components: its lexical form (a Unicode string) and its language (denoted by e.g. a valid [RFC-4646] tag). See also [LABEL-SEMANTICS].

RDF does not allow plain literals to occur in the subject position of a triple, therefore there is no direct way to make statements about a plain literal in RDF.

One work-around for this restriction is to use the n-ary relations design pattern. This is the basis for the [MINIMAL-LABEL-RELATION] proposal.

Another work-around is to define a new class of resources to represent "lexical labels". This is the basis for the [SIMPLE-EXTENSION] proposal.

This document discusses the [SIMPLE-EXTENSION] proposal, and the practical implications of introducing a new class of resource in SKOS to denote a class of "lexical labels" or similar.


Alternative Semantics of skos:Label

The basic idea of [SIMPLE-EXTENSION] is that we introduce a new class ... skos:Label.

We can then state that some resource is an instance of skos:Label, for example:

ex:cow rdf:type skos:Label.

We then need some property to state a corresponding plain literal value of this instance. In [SIMPLE-EXTENSION], rdfs:label is used. However, this makes it harder to illustrate the different possible semantics for the skos:Label class. Therefore, for discussion here, I will invent skos:plainLiteralValue, and assume that the domain of skos:plainLiteralValue is the class skos:Label, and the range is the class of RDF plain literals. For example:

ex:cow rdf:type skos:Label; skos:plainLiteralValue "cow"@en.
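
The domain and range assumption just stated could be written down roughly as follows. This is only a sketch: skos:Label and skos:plainLiteralValue are the hypothetical terms under discussion here, not published SKOS vocabulary, and rdfs:Literal is used as an approximation for the range because, as far as I am aware, RDF Schema offers no class containing only plain literals.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Hypothetical declarations, for discussion only.
skos:Label rdf:type rdfs:Class .

skos:plainLiteralValue rdf:type rdf:Property ;
  rdfs:domain skos:Label ;
  # Approximation: rdfs:Literal also includes typed literals.
  rdfs:range rdfs:Literal .
```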

We then need some property to state relationships between two instances of skos:Label. Following [SIMPLE-EXTENSION], let's use skos:labelRelation. For example:

ex:cow rdf:type skos:Label; skos:plainLiteralValue "cow"@en.
ex:vache rdf:type skos:Label; skos:plainLiteralValue "vache"@fr.
ex:cow skos:labelRelation ex:vache.

Of course, skos:labelRelation would not be used directly, but as an extension point for other properties representing more specific types of label relationship, such as acronym or literal translation relationships.
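
A more specific relationship might be hung off this extension point along the following lines. The property ex:acronymOf, the ex: namespace and the two example labels are invented purely for illustration; nothing here is taken from [SIMPLE-EXTENSION] beyond skos:Label and skos:labelRelation.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .

# Hypothetical specialisation of the extension point.
ex:acronymOf rdfs:subPropertyOf skos:labelRelation .

ex:fao rdf:type skos:Label ; skos:plainLiteralValue "FAO"@en .
ex:faoFull rdf:type skos:Label ;
  skos:plainLiteralValue "Food and Agriculture Organization"@en .

ex:fao ex:acronymOf ex:faoFull .

# Under RDFS entailment, the sub-property axiom licenses:
#   ex:fao skos:labelRelation ex:faoFull .
```

An application that only understands skos:labelRelation could therefore still discover that the two labels are related, without knowing anything about the more specific property.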

Now consider two basic questions.

Firstly, can an instance of skos:Label have more than one plain literal value?

For example, is the following graph allowed (i.e. consistent):

ex:foo rdf:type skos:Label; 
  skos:plainLiteralValue "foetus"@en; 
  skos:plainLiteralValue "fetus"@en.

...?

Secondly, can two different instances of skos:Label have the same plain literal value?

For example, is the following graph allowed (i.e. consistent):

ex1:cow rdf:type skos:Label; skos:plainLiteralValue "cow"@en.
ex2:cow rdf:type skos:Label; skos:plainLiteralValue "cow"@en.
ex1:cow owl:differentFrom ex2:cow.

...?

Given that each question can have a yes-or-no answer, this gives us four possible options.

Each of these four options gives a fundamentally different semantics, and each of these options leads to a very different "story" about what an instance of skos:Label actually "is".

[SIMPLE-EXTENSION] actually rules out two of these four options (it says "no" to the first question), but it does not specify which of the remaining two it chooses. That is why there are actually two variants of the [SIMPLE-EXTENSION] proposal, which I will attempt to illustrate below. For completeness, I will attempt to illustrate all four possible options.

A Note on Binary Relations, Functions, Inverse Functions and Bijections

What we are looking at here are the possible options for the relationship between the set of instances of skos:Label and the set of RDF plain literals.

The essential difference between the four possible options for this relationship is captured in the difference between four mathematical concepts: a binary relation between two sets (many-to-many), a functional relation (many-to-one), an inverse functional relation (one-to-many), and a bijection (one-to-one).

N.B. the set of instances of skos:Label is also called the Class Extension of skos:Label, which is written in [RDF-SEMANTICS] as ICEXT(I(skos:Label)).

Binary Relation (Many-to-Many)

If we answer "yes" to both of the questions above, then there is a many-to-many relationship between the class extension of skos:Label and the set of RDF plain literals.

In the diagram below, the ellipse on the left depicts the class extension of skos:Label, and the ellipse on the right depicts the set of all RDF plain literals. The diagram as a whole depicts a binary relation between these two sets, in this case a many-to-many relationship.

Image of a binary relation

This diagram as a whole is also a depiction of the Property Extension of skos:plainLiteralValue.

Functional Relation (Many-to-One)

If we answer "no" to the first question, but "yes" to the second question, then there is a many-to-one relationship between the class extension of skos:Label and the set of RDF plain literals.

A many-to-one relationship between two sets is also called a functional relation (or simply function). The diagram below illustrates a functional relation.

Image of a functional relation

This semantics is captured in the following triple:

skos:plainLiteralValue rdf:type owl:FunctionalProperty.
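
To see what this axiom rules out, consider it together with the "foetus"/"fetus" example from the first question. The sketch below assumes OWL semantics for owl:FunctionalProperty and uses a hypothetical ex: namespace.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .

skos:plainLiteralValue rdf:type owl:FunctionalProperty .

# With the axiom above, this description is no longer consistent:
# "foetus"@en and "fetus"@en are distinct literals, yet a functional
# property allows at most one value per subject.
ex:foo rdf:type skos:Label ;
  skos:plainLiteralValue "foetus"@en ;
  skos:plainLiteralValue "fetus"@en .
```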

Inverse Functional Relation (One-to-Many)

If we answer "yes" to the first question, but "no" to the second, then there is a one-to-many relationship between the class extension of skos:Label and the set of RDF plain literals.

A one-to-many relationship between the two sets is also called an inverse functional relation. The diagram below illustrates an inverse functional relation.

Image of an inverse functional relation

This semantics is captured in the following triple:

skos:plainLiteralValue rdf:type owl:InverseFunctionalProperty.
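
To see what this axiom rules out, consider it together with the two "cow" labels from the second question. The sketch below assumes OWL Full semantics, since OWL DL does not permit inverse-functional properties whose values are literals; the ex1: and ex2: namespaces are hypothetical.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex1:  <http://example.org/one/> .
@prefix ex2:  <http://example.org/two/> .

skos:plainLiteralValue rdf:type owl:InverseFunctionalProperty .

ex1:cow rdf:type skos:Label ; skos:plainLiteralValue "cow"@en .
ex2:cow rdf:type skos:Label ; skos:plainLiteralValue "cow"@en .

# The shared value "cow"@en forces ex1:cow and ex2:cow to denote the
# same resource, so the following statement makes the graph inconsistent.
ex1:cow owl:differentFrom ex2:cow .
```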

Bijection (One-to-One)

If we answer "no" to both questions, then there is a one-to-one relationship between the class extension of skos:Label and the set of RDF plain literals.

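A one-to-one relationship between two sets is also called a bijection, which is illustrated below. (Strictly, a bijection also requires that every instance of skos:Label has a plain literal value, and that every plain literal is the value of some instance.)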

Image of a bijection

This semantics is captured in the following triples:

skos:plainLiteralValue rdf:type owl:FunctionalProperty, owl:InverseFunctionalProperty.

Consequences

As I mentioned above, [SIMPLE-EXTENSION] rules out two of these four options. It rules out a many-to-many relation, and it rules out a one-to-many relation.

However, for completeness, let us consider these two options briefly.

Many-to-Many / One-to-Many

If we allow an instance of skos:Label to have more than one plain literal value, then an example like the one given above (repeated below) is allowed.

ex:foo rdf:type skos:Label; 
  skos:plainLiteralValue "foetus"@en; 
  skos:plainLiteralValue "fetus"@en.

The difficulty with these two options is then defining the circumstances under which it makes sense to allow more than one plain literal value for an instance of skos:Label. The same problem is encountered when we try to tell a "story" about what an instance of skos:Label actually "is". E.g. could we say that an instance of this class is a sequence of phonemes? We immediately get into the modeling of natural language, which is not something I am an expert on, and is not necessarily relevant to our use cases (which are mostly focused on information retrieval). If there is an existing model of language we could adopt, then these options might become feasible, but even then I'm not sure it's worth the effort.

One-to-One

If there is one and only one instance of skos:Label for every RDF plain literal, then neither of the examples given above is allowed.

Under these circumstances, the class skos:Label is effectively equivalent to the class of RDF plain literals.

There are some important consequences here.

Firstly, a number of inferences are licensed. For example, the graph below:

ex:foo rdf:type skos:Label;
  skos:plainLiteralValue "cow"@en;
  dcterms:modified "2007-09-09".

ex:bar rdf:type skos:Label;
  skos:plainLiteralValue "cow"@en.

... entails ...

ex:bar dcterms:modified "2007-09-09".

I.e. if two instances of skos:Label have the same plain literal value, then they are the same thing.
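
The entailment can be broken down into two steps, sketched below with the intermediate conclusions shown as comments. This assumes OWL Full semantics (OWL DL does not permit inverse-functional properties whose values are literals) and a hypothetical ex: namespace.

```turtle
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .

skos:plainLiteralValue rdf:type owl:FunctionalProperty ,
                                owl:InverseFunctionalProperty .

ex:foo rdf:type skos:Label ;
  skos:plainLiteralValue "cow"@en ;
  dcterms:modified "2007-09-09" .

ex:bar rdf:type skos:Label ;
  skos:plainLiteralValue "cow"@en .

# Step 1 (inverse functionality): both subjects share the value
# "cow"@en, so
#   ex:foo owl:sameAs ex:bar .
# Step 2 (substitution under owl:sameAs): every statement about ex:foo
# also holds of ex:bar, giving
#   ex:bar dcterms:modified "2007-09-09" .
```

Only the inverse-functional half of the axiom is needed for this particular inference; the functional half is what rules out a second plain literal value.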

The danger here is that people coming from the thesaurus community might see skos:Label as being like a "term" in some thesaurus, and attach all sorts of extra information to an instance of skos:Label, such as status, source, creation date, modification date etc. They might not realise that this would be very inappropriate: when the data is merged with other thesaurus data, if two "terms" happen to share the same plain literal value, all sorts of inappropriate inferences would be drawn.

I.e. we would have to be very careful to tell a "story" about skos:Label which clearly explains that instances of this class are not like thesaurus terms. This would be quite a hard sell I think.

On the other hand, from a logical point of view, this option is good. We have a semantics for skos:Label which essentially makes it equivalent to the class of RDF plain literals. We know exactly the conditions under which two instances of this class are identical -- when they share the same lexical form and the same language -- and the conditions under which they are not identical -- when they have a different lexical form and/or different language. I.e. we can piggyback on the formal definition of literal equality given in [RDF-CONCEPTS].

In fact, the correspondence between skos:Label and the class of RDF plain literals is so close that, for all practical purposes, we might as well state that skos:Label is the class of RDF plain literals.

Many-to-One

If we allow two different instances of skos:Label to share the same plain literal value, then an example like the following is allowed:

ex1:cow rdf:type skos:Label; skos:plainLiteralValue "cow"@en.
ex2:cow rdf:type skos:Label; skos:plainLiteralValue "cow"@en.
ex1:cow owl:differentFrom ex2:cow.

What we have here is a semantics for skos:Label which is at least compatible with its use to represent a "term" in a thesaurus. I.e. extra information could be attached to instances of skos:Label, without danger of inappropriate inferences, e.g.:

ex1:cow rdf:type skos:Label; 
  skos:plainLiteralValue "cow"@en;
  dcterms:created "2007-09-09".

ex2:cow rdf:type skos:Label; 
  skos:plainLiteralValue "cow"@en;
  dcterms:created "1903-05-05".

... is OK.

The difficulty here comes when we want to tell a "story" about what an instance of skos:Label actually "is". This difficulty is closely related to the problem of defining under what circumstances two instances of skos:Label are identical. What determines the identity of an instance of this class? Can instances have owners, for example? Can they be "in" a concept scheme? Must they be "in" a concept scheme? If two instances have the same plain literal value, and are both "in" the same concept scheme, are they then the same thing?

Remember that we are not only catering for thesauri, but for taxonomies, subject heading systems and classification schemes. Therefore we can't tell a story based entirely on the idea of a "thesaurus term". We have to have a story that is agnostic, and that bridges across these paradigms.


Summary

There are two possible variants of the [SIMPLE-EXTENSION] proposal. In one variant, there is a one-to-one relationship (bijection) between the class extension of skos:Label and the set of RDF plain literals. In the second variant, there is a many-to-one relationship (function) between the class extension of skos:Label and the set of RDF plain literals.

Each of these two variants has significantly different logical consequences, and requires a significantly different "story" to be told. Therefore, each of these two variants should be considered as a distinct proposal in its own right.

To be considered as a proposal, each of these two variants needs further work, especially on the "story" that goes with it.


References

[SKOS-GUIDE]
http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102
[SKOS-SPEC]
http://www.w3.org/TR/2005/WD-swbp-skos-core-spec-20051102
[ISSUE-31]
http://www.w3.org/2006/07/SWD/track/issues/31
[RDF-CONCEPTS]
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
[RDF-SEMANTICS]
http://www.w3.org/TR/2004/REC-rdf-mt-20040210/
[RFC-4646]
http://www.ietf.org/rfc/rfc4646.txt
[LABEL-SEMANTICS]
http://purl.org/net/skos/2007/10/f2f/labelling-properties.html
[OWL-REFERENCE]
http://www.w3.org/TR/2004/REC-owl-ref-20040210/
[MINIMAL-LABEL-RELATION]
http://www.w3.org/2006/07/SWD/wiki/SkosDesign/RelationshipsBetweenLabels/ProposalFour
[SIMPLE-EXTENSION]
http://www.w3.org/2006/07/SWD/wiki/SkosDesign/RelationshipsBetweenLabels/ProposalThree
[REMOVE-RANGE]
http://lists.w3.org/Archives/Public/public-swd-wg/2007Sep/0013.html

--- Change Log ---
$Log: label-relations.html,v $
Revision 1.3  2007/10/03 11:04:48  ajm65
First completed draft.

Revision 1.2  2007/10/02 17:14:13  ajm65
Minor edits.

Revision 1.1  2007/10/02 17:09:19  ajm65
Initial check-in, part baked.