SWDWG Amsterdam F2F October 2007

Topic: SKOS Labelling Properties

This document gives background information and suggested resolutions for the "Labelling Properties" topic at the October 2007 SWDWG F2F meeting in Amsterdam.

Latest Version: http://purl.org/net/skos/2007/10/f2f/labelling-properties.html

$Revision: 1.2 $ on $Date: 2007/10/02 12:09:21 $


Introduction

Scope

This topic concerns three URIs from the SKOS vocabulary: skos:prefLabel, skos:altLabel and skos:hiddenLabel.

These URIs denote SKOS's "lexical labelling properties". This topic does not include or concern or have any dependencies on any other URIs from the SKOS vocabulary.

Goals

I suggest our goals for this topic be to agree:

  1. a semantics for these three URIs, and
  2. how to specify those semantics.

Background

The most recent specifications concerning these properties are the section on "Labelling Properties" in [SKOS-GUIDE] (ignore the sub-section on symbolic labelling) and the relevant 3 sections of the properties table in [SKOS-SPEC].

A while ago I raised [ISSUE-31], which asks some questions about the intended semantics of these three properties.

Approach

You would have thought three simple properties like these would be easy to deal with, but this topic is surprisingly complicated, and takes us deep into the "nitty-gritty" of both RDF and OWL!

The intended semantics are -- I think -- fairly obvious, and agreeing informally on the semantics should (hopefully) be easy.

However, the issue is complicated because there are:

To try and make things easier, I've broken this topic down into the sub-topics below, each of which should be the focus of a decision by the WG. Some sub-topics discuss what the semantics should be; others discuss alternatives for formally specifying and implementing those semantics.


Sub-Topic A: Range Semantics?

The three SKOS lexical labelling properties were, originally, intended to be used with RDF plain literals in the object position of a triple. (RDF plain literals are formally defined in [RDF-CONCEPTS].)

So, for example, [SKOS-GUIDE] gives the following example:

ex:shrubs 
  skos:prefLabel "shrubs"@en;
  skos:altLabel "bushes"@en;
  skos:prefLabel "arbuste"@fr;
  skos:altLabel "buisson"@fr.

I.e. the notion of a "lexical label" in SKOS was, originally, tightly coupled to the notion of a plain literal in RDF. The phrase "lexical label" was chosen, rather than e.g. "plain literal label", because it was thought that most people would be coming to SKOS with little prior knowledge of RDF and without a computer science background, and would recognise and understand "lexical" (of or relating to words) better than "literal" (a computer science concept).

However, it has been suggested that the range of these properties might be allowed to include not only plain literals, but also other types of resource.

Resolution

There are two options here:

  1. State that the range of these properties *is* the class of RDF plain literals (i.e. the range is closed).
  2. State that the range of these properties *contains* the class of RDF plain literals, but may also contain other classes of resource (i.e. the range is left open).

I prefer the first option. My main reason for this preference is that, as will become obvious from the discussion below, the semantics already get quite complicated with the range closed as the class of RDF plain literals. Allowing for an open-ended range will introduce additional complexity, which I'm not sure I know how to handle.


Sub-Topic B: Disjoint Properties?

Originally, these three properties were intended to be pairwise disjoint, although this was never stated explicitly. That is, a preferred label cannot also be an alternative label or a hidden label; and an alternative label cannot also be a hidden label.

I.e. there is something intuitively wrong with each of the following graphs:

ex:foo skos:prefLabel "foo"@en; skos:altLabel "foo"@en.

...

ex:foo skos:prefLabel "foo"@en; skos:hiddenLabel "foo"@en.

...

ex:foo skos:altLabel "foo"@en; skos:hiddenLabel "foo"@en.

Resolution

There is really only one option here:

  1. State that these three properties are pairwise disjoint.

Sub-Topic C: Cardinality?

Intuitively, it's obvious that a resource can have no more than one preferred lexical label per language. In other words, two different labels in the same language cannot both be "preferred". This is already stated informally in [SKOS-GUIDE].

I.e. there is something intuitively wrong with the following graph:

ex:shrubs skos:prefLabel "shrubs"@en; skos:prefLabel "bushes"@en.

The matter of cardinality is complicated by languages spoken differently in different regions (e.g. the English Language, British English, US English), by the usage of different scripts in written languages (e.g. the Japanese Language, Japanese Hiragana, Japanese Katakana), and by other variants of a natural language.

Because there is a natural relationship between a language spoken in a specific region, such as British English, and its "parent" language, it could be argued that for example, the graph below...

ex:shrubs skos:prefLabel "shrubs"@en-GB.

... should entail ...

ex:shrubs skos:prefLabel "shrubs"@en.

If this entailment were allowed, this would lead to subtle issues with the cardinality of skos:prefLabel, because for example the following graph would lead to a violation of the intuitive cardinality constraint given above...

ex:shrubs skos:prefLabel "shrubs"@en-GB; skos:prefLabel "bushes"@en.

However, if we treat the English Language, British English and US English as 3 distinct "languages", then there is no problem. Similarly, if we treat the Japanese Language, Japanese Hiragana and Japanese Katakana as 3 distinct "languages", then there is no problem.

This is equivalent to assuming that each distinct tag allowed by [RFC-4646] denotes a distinct "language".
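
The difference between the two readings of "language" can be shown with a rough sketch. Everything named here (the Literal tuple, the pref_label_clashes helper, the two tag-folding functions) is hypothetical and only for illustration; triples are modelled as plain Python tuples, tags are compared case-insensitively as [RFC-4646] permits, and literals without a language tag are simply skipped.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF plain literal."""
    lexical: str
    lang: Optional[str] = None


PREF = "skos:prefLabel"


def exact_tag(tag):
    # Reading 1: each distinct tag denotes its own "language".
    return tag.lower()


def primary_subtag(tag):
    # Reading 2: fold a regional tag such as en-GB into its "parent" en.
    return tag.lower().split("-")[0]


def pref_label_clashes(graph, language_of):
    """Return the (subject, language) pairs with more than one preferred label."""
    labels = defaultdict(set)
    for s, p, o in graph:
        if p == PREF and isinstance(o, Literal) and o.lang:
            labels[(s, language_of(o.lang))].add(o.lexical)
    return {key for key, values in labels.items() if len(values) > 1}


graph = [
    ("ex:shrubs", PREF, Literal("shrubs", "en-GB")),
    ("ex:shrubs", PREF, Literal("bushes", "en")),
]

print(pref_label_clashes(graph, exact_tag))
print(pref_label_clashes(graph, primary_subtag))
```

With exact_tag the graph passes the check; with primary_subtag the same graph is reported as a clash for ex:shrubs in "en", which is the problem described above.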

Resolution

I think there is only one option here:

  1. Assume that each distinct tag allowed by [RFC-4646] denotes a distinct "language", and use this special meaning for the word "language" when stating the cardinality constraint for skos:prefLabel (i.e. that there cannot be more than one preferred lexical label per "language").

Sub-Topic D: Super-Property?

In the most recent specifications, these three properties are stated as sub-properties of rdfs:label.

Is this a useful statement? Should it be retained?

Resolution

This still seems valid, and I cannot see any reason to drop it, so I suggest we:

  1. Retain the statement that SKOS lexical labelling properties are sub-properties of rdfs:label.

Sub-Topic E: Formally Stating the Range Semantics?

If we adopt my suggested resolution to the range semantics (that the range of these three properties be closed as the class of RDF plain literals), how should this then be formally stated?

We could use RDF triples (e.g. skos:prefLabel rdfs:range ex:PlainLiteral.) but unfortunately there is no official URI denoting the class of RDF plain literals.

Note that the class of RDF plain literals is a sub-class of rdfs:Literal, which also contains typed literals (including XML literals). Therefore, the following graph...

skos:prefLabel rdfs:range rdfs:Literal.
skos:altLabel rdfs:range rdfs:Literal.
skos:hiddenLabel rdfs:range rdfs:Literal.

...whilst not incorrect, does not fully express our intended semantics.

Resolution

There are two options:

  1. Coin a URI for the class of RDF plain literals, and use it to formally declare the range of the SKOS lexical labelling properties in axiomatic RDF triples.
  2. State in normative prose that the range of the SKOS lexical labelling properties is actually the class of RDF plain literals; retain the current declaration in RDF triples that the range of these properties is rdfs:Literal.

I don't think there would be any value in option 1, so I suggest option 2.


Sub-Topic F: Literal Object Syntax Constraint?

There is a subtle but important complication here, due to the meaning of rdfs:range under both the RDF and OWL Full semantics.

Even if the range of these three properties is the class of RDF plain literals, this would not make the following graph inconsistent:

ex:foo skos:prefLabel ex:bar.

The only thing to be gained by the range semantics is the inference that the URI ex:bar denotes an RDF plain literal, which is probably not very useful.
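
To make this concrete, here is a minimal sketch of the RDFS range entailment rule applied to that graph. It is an illustration only: triples are modelled as Python tuples, ex:PlainLiteral is the made-up class URI used earlier in this document, and apply_range_rule is a hypothetical helper rather than part of any RDF toolkit. Literal objects are skipped, since a literal cannot be the subject of a triple.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


RANGE = "rdfs:range"
TYPE = "rdf:type"


def apply_range_rule(graph):
    """Return the graph plus the triples inferred by the RDFS range rule:
    (P rdfs:range C) and (s P o) together entail (o rdf:type C)."""
    ranges = {(s, o) for (s, p, o) in graph if p == RANGE}
    inferred = set()
    for s, p, o in graph:
        for prop, cls in ranges:
            if p == prop and not isinstance(o, Literal):
                inferred.add((o, TYPE, cls))
    return set(graph) | inferred


graph = {
    ("skos:prefLabel", RANGE, "ex:PlainLiteral"),  # assumed range declaration
    ("ex:foo", "skos:prefLabel", "ex:bar"),
}

closure = apply_range_rule(graph)
print(("ex:bar", TYPE, "ex:PlainLiteral") in closure)
```

The rule only ever adds a triple; nothing in it can reject the graph, which is why a range declaration on its own does not give us any kind of "validation".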

If, however, we believe there is something wrong with the graph above, because we believe only RDF plain literals should ever appear in the object position of such a triple in an RDF graph, then we would need to do something completely different.

We would need to state constraints on the RDF abstract syntax. However, as far as I know, there is no standard way to formally express constraints over the RDF abstract syntax.

So there are two problems here: (1) how do you state syntax constraints for RDF? and (2) what should those constraints be for SKOS lexical labelling properties (if any)?

Before diving into the detail, I should say that the possible benefit of having syntax constraints is that it allows applications to do some simple "validation" at the graph syntax level. This makes for a stricter specification, which promotes interoperability (at the cost of flexibility) and makes implementation simpler (because there are fewer options to deal with). More on this below.

The only precedent I know of for declaring constraints over the RDF abstract syntax is the set of constraints imposed by the OWL DL language. Those constraints are, however, implicit in the mapping between the OWL abstract syntax and RDF triples. I would not recommend this approach for SKOS (i.e. to define a separate abstract syntax for SKOS) because it would seriously complicate the specifications.

One way to express an RDF syntax constraint is to use normative prose, with the keywords MUST, SHOULD, MAY etc.

For example, we could state that:

Where skos:prefLabel, skos:altLabel or skos:hiddenLabel appears in the predicate position of a triple, the object MUST be an RDF plain literal.

A syntax constraint like this gives a stricter specification. This might be good, because it would lead to tighter interoperability and simpler implementations. This might be bad, because it gives less room for manoeuvre where people might want to experiment, for example by using URIs to denote plain literals.

Personally, I favour a stricter specification. If people want to experiment, they can always invent new properties.

Another way to express an RDF syntax constraint is to use a SPARQL graph pattern, where the binding of any variables in the pattern indicates a violation of the constraint in the matching graph.

For example:

{
  ?x skos:prefLabel ?y.
  FILTER (!isLiteral(?y))
}

(...although note that the SPARQL operator isLiteral() returns true for any RDF literal, including typed literals, so the syntax constraint expressed by this pattern is not equivalent to the normative prose stated above.)
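
The gap between the two checks can be seen in a small sketch. This is not SPARQL and not any library's API: the Literal tuple, the two predicates and the violations helper are hypothetical names, with is_literal standing in for the behaviour of isLiteral() and is_plain_literal for the normative prose.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal (plain or typed)."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


LABEL_PROPERTIES = {"skos:prefLabel", "skos:altLabel", "skos:hiddenLabel"}


def is_literal(term):
    # Mirrors isLiteral(): true for plain and typed literals alike.
    return isinstance(term, Literal)


def is_plain_literal(term):
    # A plain literal has no datatype; a language tag is optional.
    return isinstance(term, Literal) and term.datatype is None


def violations(graph, accept):
    """List the labelling triples whose object is rejected by `accept`."""
    return [t for t in graph if t[1] in LABEL_PROPERTIES and not accept(t[2])]


graph = [
    ("ex:foo", "skos:prefLabel", Literal("foo", lang="en")),
    ("ex:foo", "skos:altLabel", Literal("42", datatype="xsd:integer")),
    ("ex:foo", "skos:hiddenLabel", "ex:bar"),
]

print(len(violations(graph, is_literal)))
print(len(violations(graph, is_plain_literal)))
```

The looser check reports only the triple with the URI object; the stricter one also reports the typed literal.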

There are further complications.

Suppose we adopt the syntax constraint stated in normative prose above. How should applications implement this constraint? Should an application consuming SKOS data generate a fatal error if it encounters a graph which violates the constraint? Or should it quietly recover according to some standard error-handling strategy?

Also, given that RDF data can be merged from multiple graphs, each individual graph might be valid, but the merge might be invalid. In practice, this means that 2 different applications might each produce perfect SKOS data, but an application consuming the merged data might find problems.

Initially, I thought it would be better to define ways in which applications could quietly handle any syntax problems, rather than forcing them to generate fatal errors. This was the motivation for the proposal I made at [LABEL-SEMANTICS], in which the syntax constraint stated above is instead given by the following statement:

An application MAY ignore any triple in an RDF graph where the predicate is either skos:prefLabel, skos:altLabel or skos:hiddenLabel and the object is NOT a plain literal.
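
The two kinds of application behaviour under discussion (fatal error versus quiet recovery) might look something like the sketch below. The names conforms, load_strict and load_lenient are hypothetical, triples are modelled as Python tuples, and this is only one possible reading of each strategy.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal (plain or typed)."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


LABEL_PROPERTIES = {"skos:prefLabel", "skos:altLabel", "skos:hiddenLabel"}


def conforms(triple):
    """True unless this is a labelling triple with a non-plain-literal object."""
    _, p, o = triple
    if p not in LABEL_PROPERTIES:
        return True
    return isinstance(o, Literal) and o.datatype is None


def load_strict(graph):
    """Fatal-error strategy: refuse the whole graph on the first violation."""
    for triple in graph:
        if not conforms(triple):
            raise ValueError(f"non-plain-literal label object: {triple!r}")
    return list(graph)


def load_lenient(graph):
    """Quiet-recovery strategy: ignore violating triples, keep the rest."""
    return [triple for triple in graph if conforms(triple)]


graph = [
    ("ex:foo", "skos:prefLabel", Literal("foo", "en")),
    ("ex:foo", "skos:altLabel", "ex:bar"),
]

print(load_lenient(graph))
```

Note that load_lenient silently changes what the consuming application sees, so two applications using different strategies would end up with different views of the same data.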

However, this proposal is complicated, and has a number of potential problems as highlighted in [VATANT].

Resolution

I don't have any clear options for this at present.

I only know that, if we do want syntax constraints, then we should also state how applications MUST/SHOULD/MAY handle violations of those constraints.

We should also separate syntax constraints from application behaviour. I.e. we should state syntax constraints without talking about application behaviour, then state how applications MUST/SHOULD/MAY implement those constraints. I think I made the mistake of confusing these two considerations in the proposal I made at [LABEL-SEMANTICS].


Sub-Topic G: Formally Stating Disjointness?

Unfortunately, neither RDF nor OWL has a concept of disjoint properties.

We could use normative prose to state a semantic condition on the interpretation of the three properties, for example:

The property extensions of skos:prefLabel, skos:altLabel and skos:hiddenLabel are pairwise disjoint.

We could also go a bit further, and follow the conventions and definitions set out in [RDF-SEMANTICS] for stating semantic conditions on the interpretation of some RDF vocabulary, for example:

For any resource x, the sets { y | <x,y> is in IEXT(I(skos:prefLabel)) }, { y | <x,y> is in IEXT(I(skos:altLabel)) } and { y | <x,y> is in IEXT(I(skos:hiddenLabel)) } are pairwise disjoint.

Note that expressing the disjointness of the SKOS lexical labelling properties in semantic conditions requires a notion of SKOS-interpretation and SKOS-inconsistency.

We could also handle disjointness entirely at the syntax level (see next sub-topic).

Resolution

I'm not sure what to recommend here. We could state a semantic condition for completeness, but I would not expect anyone to implement it in a reasoner -- it's possible, and easier, to test for "inconsistencies" by looking for graph patterns (i.e. by checking the syntax).


Sub-Topic H: Disjointness Syntax Constraint?

As mentioned above, it would be possible to state syntax constraints which effectively capture our intention for the three properties to be disjoint.

For example, as normative prose:

A graph MUST NOT contain a triple with skos:prefLabel as predicate and a triple with skos:altLabel as predicate, where both triples share the same subject and object.

The same constraint, expressed as a SPARQL graph pattern matching an "invalid" graph:

{
  ?x skos:prefLabel ?y.
  ?x skos:altLabel ?y.
}

The advantage of syntax constraints is that they are (arguably) easier to implement than a reasoner, e.g. by using SPARQL.
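
As a rough illustration of how little is needed, the sketch below checks all three pairs of properties over a set of triples modelled as Python tuples. The Literal tuple and the disjointness_violations helper are hypothetical names. The example data also shows the merge concern raised under Sub-Topic F: each graph is fine on its own, but their merge is not.

```python
from itertools import combinations
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF plain literal."""
    lexical: str
    lang: Optional[str] = None


LABEL_PROPERTIES = ("skos:prefLabel", "skos:altLabel", "skos:hiddenLabel")


def disjointness_violations(graph):
    """Return (subject, object, p1, p2) for every subject/object pair
    that is linked by two different labelling properties."""
    triples = set(graph)
    found = set()
    for p1, p2 in combinations(LABEL_PROPERTIES, 2):
        for s, p, o in triples:
            if p == p1 and (s, p2, o) in triples:
                found.add((s, o, p1, p2))
    return found


graph_a = {("ex:foo", "skos:prefLabel", Literal("foo", "en"))}
graph_b = {("ex:foo", "skos:altLabel", Literal("foo", "en"))}
merged = graph_a | graph_b

print(disjointness_violations(graph_a))
print(disjointness_violations(merged))
```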

Note that we could have both semantic conditions and syntax constraints -- these options are not mutually exclusive.

However, we still have the problem of stating how applications should handle a violation of these constraints. Should they generate a fatal error? Should they quietly recover, and if so, how?

Resolution

Again, I'm not sure what to recommend here. Syntax constraints are undoubtedly useful, but should they be part of the normative specification, or should they be informative? How should applications handle violations?


Sub-Topic I: Formally Stating Cardinality?

Our intuitive cardinality constraint -- that there cannot be more than one preferred lexical label per "language" for any given resource -- cannot be expressed as axiomatic triples using either RDF or OWL vocabularies. This is because the cardinality has to take into account the value of the language tag on an RDF plain literal.

This can be stated as a semantic condition, following the conventions and definitions set out in [RDF-SEMANTICS], for example:

For any resource x, all members of the set { y | <x,y> is in IEXT(I(skos:prefLabel)) } are RDF plain literals and no two members of this set share the same language tag.

However, again, this semantics can also be effectively captured in a syntax constraint (see sub-topic below).

Resolution

Again, I'm not sure what to recommend here. As with disjointness, we could state a semantic condition for completeness, but I would not expect anyone to implement it in a reasoner -- it's possible, and easier, to test for "inconsistencies" by looking for graph patterns (i.e. by checking the syntax).


Sub-Topic J: Cardinality Syntax Constraint?

As mentioned above, it would be possible to state a syntax constraint which effectively captures our intention that there cannot be more than one preferred lexical label per "language" for any given resource.

For example, as normative prose:

A graph MUST NOT contain two or more triples with the same subject, where the predicate is skos:prefLabel, and where the objects are plain literals with the same language tag.

For example, as a SPARQL graph pattern matching an "invalid" graph:

{
  ?x skos:prefLabel ?y, ?z.
  FILTER ( (str(?y) != str(?z)) && (lang(?y) = lang(?z)) )
}

Resolution

Again, I'm not sure what to recommend. Syntax constraints are undoubtedly useful, but should they be part of the normative specification, or should they be informative? How should applications handle violations?


Sub-Topic K: OWL Property Type?

OWL has the notions of Datatype Properties and Object Properties. See section 4 of [OWL-REFERENCE] for some important details on these two categories of property.

OWL also has a notion of Annotation Properties, although note that these are only needed for reasoning in OWL DL. If we are building a semantics for SKOS on OWL Full, we can ignore Annotation Properties.

Note also that, in OWL Full, object properties and datatype properties are not disjoint. Because data values can be treated as individuals, owl:DatatypeProperty is effectively a subclass of owl:ObjectProperty. In OWL Full owl:ObjectProperty is equivalent to rdf:Property.

Should the SKOS lexical labelling properties be instances of owl:ObjectProperty or owl:DatatypeProperty?

Resolution

The resolution depends on our decision for sub-topic A (Range Semantics). If we choose to set the range as the class of RDF plain literals, then these properties should be instances of owl:DatatypeProperty. If we choose an open-ended range, then we can only say they are instances of owl:ObjectProperty.


Summary

Wow! As I said at the beginning, who would have thought three little properties could cause so many problems!

Hopefully, the first step will be to agree on the actual semantics of these three properties, stated informally. I.e. to agree a resolution on sub-topics A, B, C and D.

Then we can discuss how best to state the semantics formally (if at all), and whether syntax constraints might be useful, as either normative or informative parts of the specification.


References

[SKOS-GUIDE]
http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102
[SKOS-SPEC]
http://www.w3.org/TR/2005/WD-swbp-skos-core-spec-20051102
[ISSUE-31]
http://www.w3.org/2006/07/SWD/track/issues/31
[RDF-CONCEPTS]
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
[RDF-SEMANTICS]
http://www.w3.org/TR/2004/REC-rdf-mt-20040210/
[RFC-4646]
http://www.ietf.org/rfc/rfc4646.txt
[LABEL-SEMANTICS]
http://lists.w3.org/Archives/Public/public-swd-wg/2007Jun/0170.html
[VATANT]
http://lists.w3.org/Archives/Public/public-esw-thes/2007Jun/0039.html
[OWL-REFERENCE]
http://www.w3.org/TR/2004/REC-owl-ref-20040210/

--- Change Log ---
$Log: labelling-properties.html,v $
Revision 1.2  2007/10/02 12:09:21  ajm65
Added reference links. Minor edit to discussion of languages and language tags.

Revision 1.1  2007/10/02 11:43:39  ajm65
First completed draft. TODO reference links.