Copyright ©2001 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
XML Schema: Structures specifies the XML Schema definition language, which offers facilities for describing the structure and constraining the contents of XML 1.0 documents, including those which exploit the XML Namespace facility. The schema language, which is itself represented in XML 1.0 and uses namespaces, substantially reconstructs and considerably extends the capabilities found in XML 1.0 document type definitions (DTDs). This specification depends on XML Schema Part 2: Datatypes.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.
This document has been produced by the W3C XML Schema Working Group as part of the W3C XML Activity. The goals of the XML Schema language are discussed in the XML Schema Requirements document. The authors of this document are the XML Schema WG members. Different parts of this specification have different editors.
This version of this document incorporates some editorial changes from earlier versions.
Please report errors in this document to www-xml-schema-comments@w3.org (archive). The list of known errors in this specification is available at http://www.w3.org/2001/05/xmlschema-errata.
The English version of this specification is the only normative version. Information about translations of this document is available at http://www.w3.org/2001/05/xmlschema-translations.
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
This document sets out the structural part (XML Schema: Structures) of the XML Schema definition language.
Chapter 2 presents a Conceptual Framework (§2) for XML Schemas, including an introduction to the nature of XML Schemas and an introduction to the XML Schema abstract data model, along with other terminology used throughout this document.
Chapter 3, Schema Component Details (§3), specifies the precise semantics of each component of the abstract model, the representation of each component in XML, with reference to a DTD and XML Schema for an XML Schema document type, along with a detailed mapping between the elements and attribute vocabulary of this representation and the components and properties of the abstract model.
Chapter 4 presents Schemas and Namespaces: Access and Composition (§4), including the connection between documents and schemas, the import, inclusion and redefinition of declarations and definitions and the foundations of schema-validity assessment.
Chapter 5 discusses Schemas and Schema-validity Assessment (§5), including the overall approach to schema-validity assessment of documents, and responsibilities of schema-aware processors.
The normative appendices include a Schema for Schemas (normative) (§A) for the XML representation of schemas and References (normative) (§B).
The non-normative appendices include the DTD for Schemas (non-normative) (§G) and a Glossary (non-normative) (§F).
This document is primarily intended as a language definition reference. As such, although it contains a few examples, it is not primarily designed to serve as a motivating introduction to the design and its features, or as a tutorial for new users. Rather it presents a careful and fully explicit definition of that design, suitable for guiding implementations. For those in search of a step-by-step introduction to the design, the non-normative [XML Schema: Primer] is a much better starting point than this document.
The purpose of XML Schema: Structures is to define the nature of XML schemas and their component parts, provide an inventory of XML markup constructs with which to represent schemas, and define the application of schemas to XML documents.
The purpose of an XML Schema: Structures schema is to define and describe a class of XML documents by using schema components to constrain and document the meaning, usage and relationships of their constituent parts: datatypes, elements and their content and attributes and their values. Schemas may also provide for the specification of additional document information, such as normalization and defaulting of attribute and element values. Schemas have facilities for self-documentation. Thus, XML Schema: Structures can be used to define, describe and catalogue XML vocabularies for classes of XML documents.
Any application that consumes well-formed XML can use the XML Schema: Structures formalism to express syntactic, structural and value constraints applicable to its document instances. The XML Schema: Structures formalism allows a useful level of constraint checking to be described and implemented for a wide spectrum of XML applications. However, the language defined by this specification does not attempt to provide all the facilities that might be needed by any application. Some applications may require constraint capabilities not expressible in this language, and so may need to perform their own additional validations.
The definition of XML Schema: Structures depends on the following specifications: [XML-Infoset], [XML-Namespaces], [XPath], and [XML Schemas: Datatypes].
See Required Information Set Items and Properties (normative) (§D) for a tabulation of the information items and properties specified in [XML-Infoset] which this specification requires as a precondition to schema-aware processing.
This section introduces the highlighting and typography used in this document to present technical material.
Special terms are defined at their point of introduction in the text. For example [Definition:] a term is something used with a special meaning. The definition is labeled as such and the term it defines is displayed in boldface. The end of the definition is not specially marked in the displayed or printed text. Uses of defined terms are links to their definitions, set off with middle dots, for instance ·term·.
Non-normative examples are set off in boxes and accompanied by a brief explanation:
<schema targetNamespace="http://www.example.com/XMLSchema/1.0/mySchema">
The definition of each kind of schema component consists of a list of its properties and their contents, followed by descriptions of the semantics of the properties:
References to properties of schema components are links to the relevant definition as exemplified above, set off with curly braces, for instance {example property}.
The correspondence between an element information item which is part of the XML representation of a schema and one or more schema components is presented in a tableau which illustrates the element information item(s) involved. This is followed by a tabulation of the correspondence between properties of the component and properties of the information item. Where context may determine which of several different components may arise, several tabulations, one per context, are given. The property correspondences are normative, as are the illustrations of the XML representation element information items.
In the XML representation, bold-face attribute names (e.g. count below) indicate a required attribute information item, and the rest are optional. Where an attribute information item has an enumerated type definition, the values are shown separated by vertical bars, as for size below; if there is a default value, it is shown following a colon. Where an attribute information item has a built-in simple type definition defined in [XML Schemas: Datatypes], a hyperlink to its definition therein is given.
The allowed content of the information item is shown as a grammar fragment, using the Kleene operators ?, * and +. Each element name therein is a hyperlink to its own illustration.
NOTE: The illustrations are derived automatically from the Schema for Schemas (normative) (§A). In the case of apparent conflict, the Schema for Schemas (normative) (§A) takes precedence, as it, together with the ·Schema Representation Constraints·, provides the normative statement of the form of XML representations.
example Element Information Item
<example
count = integer
size = (large | medium | small) : medium>
Content: (all | any*)
</example>
| Example Schema Component |
|---|
References to elements in the text are links to the relevant illustration as exemplified above, set off with angle brackets, for instance <example>.
References to properties of information items as defined in [XML-Infoset] are notated as links to the relevant section thereof, set off with square brackets, for example [children].
Properties which this specification defines for information items are introduced as follows:
References to properties of information items defined in this specification are notated as links to their introduction as exemplified above, set off with square brackets, for example [new property].
The following highlighting is used for non-normative commentary in this document:
NOTE: General comments directed to all readers.
Following [XML 1.0 (Second Edition)], within normative prose in this specification, the words may and must are defined as follows:
Note however that this specification provides a definition of error and of conformant processors' responsibilities with respect to errors (see Schemas and Schema-validity Assessment (§5)) which is considerably more complex than that of [XML 1.0 (Second Edition)].
This chapter gives an overview of XML Schema: Structures at the level of its abstract data model. Schema Component Details (§3) provides details on this model, including a normative representation in XML for the components of the model. Readers interested primarily in learning to write schema documents may wish to first read [XML Schema: Primer] for a tutorial introduction, and only then consult the sub-sections of Schema Component Details (§3) named XML Representation of ... for the details.
An XML Schema consists of components such as type definitions and element declarations. These can be used to assess the validity of well-formed element and attribute information items (as defined in [XML-Infoset]), and furthermore may specify augmentations to those items and their descendants. This augmentation makes explicit information which may have been implicit in the original document, such as normalized and/or default values for attributes and elements and the types of element and attribute information items.
Schema-validity assessment has two aspects:
Throughout this specification, [Definition:] the word valid and its derivatives are used to refer to clause 1 above, the determination of local schema-validity.
Throughout this specification, [Definition:] the word assessment is used to refer to the overall process of local validation, schema-validity assessment and infoset augmentation.
This specification builds on [XML 1.0 (Second Edition)] and [XML-Namespaces]. The concepts and definitions used herein regarding XML are framed at the abstract level of information items as defined in [XML-Infoset]. By definition, this use of the infoset provides a priori guarantees of well-formedness (as defined in [XML 1.0 (Second Edition)]) and namespace conformance (as defined in [XML-Namespaces]) for all candidates for ·assessment· and for all ·schema documents·.
Just as [XML 1.0 (Second Edition)] and [XML-Namespaces] can be described in terms of information items, XML Schemas can be described in terms of an abstract data model. In defining XML Schemas in terms of an abstract data model, this specification rigorously specifies the information which must be available to a conforming XML Schema processor. The abstract model for schemas is conceptual only, and does not mandate any particular implementation or representation of this information. To facilitate interoperation and sharing of schema information, a normative XML interchange format for schemas is provided.
[Definition:] Schema component is the generic term for the building blocks that comprise the abstract data model of the schema. [Definition:] An XML Schema is a set of ·schema components·. There are 13 kinds of component in all, falling into three groups. The primary components, which may (type definitions) or must (element and attribute declarations) have names are as follows:
The secondary components, which must have names, are as follows:
Finally, the "helper" components provide small parts of other components; they are not independent of their context:
During ·validation·, [Definition:] declaration components are associated by (qualified) name to information items being ·validated·.
On the other hand, [Definition:] definition components define internal schema components that can be used in other schema components.
[Definition:] Declarations and definitions may have and be identified by names, which are NCNames as defined by [XML-Namespaces].
[Definition:] Several kinds of component have a target namespace, which is either ·absent· or a namespace name, also as defined by [XML-Namespaces]. The ·target namespace· serves to identify the namespace within which the association between the component and its name exists. In the case of declarations, this in turn determines the namespace name of, for example, the element information items it may ·validate·.
NOTE: At the abstract level, there is no requirement that the components of a schema share a ·target namespace·. Any schema for use in ·assessment· of documents containing names from more than one namespace will of necessity include components with different ·target namespaces·. This contrasts with the situation at the level of the XML representation of components, in which each schema document contributes definitions and declarations to a single target namespace.
·Validation·, defined in detail in Schema Component Details (§3), is a relation between information items and schema components. For example, an attribute information item may ·validate· with respect to an attribute declaration, a list of element information items may ·validate· with respect to a content model, and so on. The following sections briefly introduce the kinds of components in the schema abstract data model, other major features of the abstract model, and how they contribute to ·validation·.
The abstract model provides two kinds of type definition component: simple and complex.
[Definition:] This specification uses the phrase type definition in cases where no distinction need be made between simple and complex types.
Type definitions form a hierarchy with a single root. The subsections below first describe characteristics of that hierarchy, then provide an introduction to simple and complex type definitions themselves.
[Definition:] Except for a distinguished ·ur-type definition·, every ·type definition· is, by construction, either a ·restriction· or an ·extension· of some other type definition. The graph of these relationships forms a tree known as the Type Definition Hierarchy.
[Definition:] A type definition whose declarations or facets are in a one-to-one relation with those of another specified type definition, with each in turn restricting the possibilities of the one it corresponds to, is said to be a restriction. The specific restrictions might include narrowed ranges or reduced alternatives. Members of a type, A, whose definition is a ·restriction· of the definition of another type, B, are always members of type B as well.
[Definition:] A complex type definition which allows element or attribute content in addition to that allowed by another specified type definition is said to be an extension.
[Definition:] A distinguished ur-type definition is present in each ·XML Schema·, serving as the root of the type definition hierarchy for that schema. The ur-type definition, whose name is anyType, has the unique characteristic that it can function as a complex or a simple type definition, according to context. Specifically, ·restrictions· of the ur-type definition can themselves be either simple or complex type definitions.
[Definition:] A type definition used as the basis for an ·extension· or ·restriction· is known as the base type definition of that definition.
A simple type definition is a set of constraints on strings and information about the values they encode, applicable to the ·normalized value· of an attribute information item or of an element information item with no element children. Informally, it applies to the values of attributes and the text-only content of elements.
Each simple type definition, whether built-in (that is, defined in [XML Schemas: Datatypes]) or user-defined, is a ·restriction· of some particular simple ·base type definition·. For the built-in primitive types, this is the simple version of the ·ur-type definition·, whose name is anySimpleType. This is in turn understood to be a restriction of the ·ur-type definition·. Simple types may also be defined whose members are lists of items themselves constrained by some other simple type definition, or whose membership is the union of the memberships of some other simple type definitions. List and union simple type definitions are also understood as restrictions of the simple ·ur-type definition·.
For detailed information on simple type definitions, see Simple Type Definitions (§3.14) and [XML Schemas: Datatypes]. The latter also defines an extensive inventory of pre-defined simple types.
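As a non-normative illustration, the three ways of constructing a simple type described above (restriction, list and union) might be written as follows in the XML representation. The type names rating, ratingList and ratingOrUnrated are hypothetical and chosen only for this sketch; the schema has no target namespace so that unprefixed references resolve to the types defined here.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Restriction: narrows the built-in xs:integer to the range 1 to 5 -->
  <xs:simpleType name="rating">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="5"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- List: a whitespace-separated list whose items are each a rating -->
  <xs:simpleType name="ratingList">
    <xs:list itemType="rating"/>
  </xs:simpleType>

  <!-- Union: either a rating or the literal token "unrated" -->
  <xs:simpleType name="ratingOrUnrated">
    <xs:union memberTypes="rating">
      <xs:simpleType>
        <xs:restriction base="xs:token">
          <xs:enumeration value="unrated"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

</xs:schema>
```

Under these definitions a value such as "3" would belong to rating, and "1 4 5" to ratingList.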
A complex type definition is a set of attribute declarations and a content type, applicable to the [attributes] and [children] of an element information item respectively. The content type may require the [children] to contain neither element nor character information items (that is, to be empty), to be a string which belongs to a particular simple type or to contain a sequence of element information items which conforms to a particular model group, with or without character information items as well.
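Each complex type definition is either a restriction of a complex ·base type definition·, or an extension of a simple or complex ·base type definition·, or a restriction of the ·ur-type definition·.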
A complex type which extends another does so by having additional content model particles at the end of the other definition's content model, or by having additional attribute declarations, or both.
NOTE: This specification allows only appending, and not other kinds of extensions. This decision simplifies application processing required to cast instances from derived to base type. Future versions may allow more kinds of extension, requiring more complex transformations to effect casting.
For detailed information on complex type definitions, see Complex Type Definitions (§3.4).
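A minimal, non-normative sketch of extension by appending is given below. The type names Address and USAddress and their element and attribute names are assumptions made for the example, not part of this specification.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Base type: a sequence of two element declarations -->
  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Extension: one more particle is appended after the base content
       model, and one attribute declaration is added -->
  <xs:complexType name="USAddress">
    <xs:complexContent>
      <xs:extension base="Address">
        <xs:sequence>
          <xs:element name="zip" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="country" type="xs:NMTOKEN" fixed="US"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```

The effective content of USAddress in this sketch is street, city, zip, in that order, which is why an instance of the derived type can be read as an instance of the base type by ignoring the trailing material.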
There are three kinds of declaration component: element, attribute, and notation. Each is described in a section below. Also included is a discussion of element substitution groups, which is a feature provided in conjunction with element declarations.
An element declaration is an association of a name with a type definition, either simple or complex, an (optional) default value and a (possibly empty) set of identity-constraint definitions. The association is either global or scoped to a containing complex type definition. A top-level element declaration with name 'A' is broadly comparable to a pair of DTD declarations as follows, where the associated type definition fills in the ellipses:
<!ELEMENT A . . .>
<!ATTLIST A . . .>
Element declarations contribute to ·validation· as part of model group ·validation·, when their defaults and type components are checked against an element information item with a matching name and namespace, and by triggering identity-constraint definition ·validation·.
For detailed information on element declarations, see Element Declarations (§3.3).
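The distinction between global and scoped element declarations can be sketched, non-normatively, as follows. The element names status, report and title are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global (top-level) declaration: name, simple type and default value -->
  <xs:element name="status" type="xs:string" default="draft"/>

  <!-- Global declaration whose type is an anonymous complex type -->
  <xs:element name="report">
    <xs:complexType>
      <xs:sequence>
        <!-- Local declaration, scoped to the containing complex type -->
        <xs:element name="title" type="xs:string"/>
        <!-- Reference to the global declaration above -->
        <xs:element ref="status"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```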
In XML 1.0, the name and content of an element must correspond exactly to the element type referenced in the corresponding content model.
[Definition:] Through the new mechanism of element substitution groups, XML Schemas provides a more powerful model supporting substitution of one named element for another. Any top-level element declaration can serve as the defining element, or head, for an element substitution group. Other top-level element declarations, regardless of target namespace, can be designated as members of the substitution group headed by this element. In a suitably enabled content model, a reference to the head ·validates· not just the head itself, but elements corresponding to any member of the substitution group as well.
All such members must have type definitions which are either the same as the head's type definition or restrictions or extensions of it. Therefore, although the names of elements can vary widely as new namespaces and members of the substitution group are defined, the content of member elements is strictly limited according to the type definition of the substitution group head.
Note that element substitution groups are not represented as separate components. They are specified in the property values for element declarations (see Element Declarations (§3.3)).
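By way of a non-normative sketch, a substitution group might be set up as shown below. The element names comment, shipComment, customerComment and order are assumptions for the example; membership is expressed through the substitutionGroup attribute on the member declarations rather than through a separate component.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Head of the substitution group -->
  <xs:element name="comment" type="xs:string"/>

  <!-- Members: top-level declarations naming the head; their types are
       the same as the head's type -->
  <xs:element name="shipComment" type="xs:string"
              substitutionGroup="comment"/>
  <xs:element name="customerComment" type="xs:string"
              substitutionGroup="comment"/>

  <!-- A content model that refers to the head; shipComment and
       customerComment elements are accepted in the same position -->
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="comment" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```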
An attribute declaration is an association between a name and a simple type definition, together with occurrence information and (optionally) a default value. The association is either global, or local to its containing complex type definition. Attribute declarations contribute to ·validation· as part of complex type definition ·validation·, when their occurrence, defaults and type components are checked against an attribute information item with a matching name and namespace.
For detailed information on attribute declarations, see Attribute Declarations (§3.2).
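The following non-normative sketch shows one global and one local attribute declaration, together with occurrence information and a default value. The names lang, priority and Note are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global attribute declaration -->
  <xs:attribute name="lang" type="xs:language"/>

  <xs:complexType name="Note">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <!-- Reference to the global declaration; the attribute is required -->
        <xs:attribute ref="lang" use="required"/>
        <!-- Local declaration; optional, with a default value -->
        <xs:attribute name="priority" type="xs:positiveInteger" default="3"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

</xs:schema>
```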
A notation declaration is an association between a name and an identifier for a notation. For an attribute information item to be ·valid· with respect to a NOTATION simple type definition, its value must have been declared with a notation declaration.
For detailed information on notation declarations, see Notation Declarations (§3.12).
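A non-normative sketch of notation declarations and an attribute whose type is derived from NOTATION is given below. The notation names, the public identifiers and the element and attribute names are assumptions made for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Notation declarations: a name associated with an identifier -->
  <xs:notation name="jpeg" public="image/jpeg"/>
  <xs:notation name="png" public="image/png"/>

  <xs:element name="picture">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:hexBinary">
          <!-- The attribute value must name one of the declared notations -->
          <xs:attribute name="pictype">
            <xs:simpleType>
              <xs:restriction base="xs:NOTATION">
                <xs:enumeration value="jpeg"/>
                <xs:enumeration value="png"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

</xs:schema>
```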
The model group, particle, and wildcard components contribute to the portion of a complex type definition that controls an element information item's content.
A model group is a constraint in the form of a grammar fragment that applies to lists of element information items. It consists of a list of particles, i.e. element declarations, wildcards and model groups. There are three varieties of model group:
For detailed information on model groups, see Model Groups (§3.8).
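In the XML representation, model groups are written with the elements xs:sequence, xs:choice and xs:all. The non-normative sketch below shows one complex type for each; the type and element names are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- sequence: the children must appear in the order given -->
  <xs:complexType name="Name">
    <xs:sequence>
      <xs:element name="given" type="xs:string"/>
      <xs:element name="family" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- choice: exactly one of the alternatives must appear -->
  <xs:complexType name="Contact">
    <xs:choice>
      <xs:element name="email" type="xs:string"/>
      <xs:element name="phone" type="xs:string"/>
    </xs:choice>
  </xs:complexType>

  <!-- all: the children may appear in any order, each at most once -->
  <xs:complexType name="Settings">
    <xs:all>
      <xs:element name="theme" type="xs:string"/>
      <xs:element name="locale" type="xs:language" minOccurs="0"/>
    </xs:all>
  </xs:complexType>

</xs:schema>
```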
A particle is a term in the grammar for element content, consisting of either an element declaration, a wildcard or a model group, together with occurrence constraints. Particles contribute to ·validation· as part of complex type definition ·validation·, when they allow anywhere from zero to many element information items or sequences thereof, depending on their contents and occurrence constraints.
[Definition:] A particle can be used in a complex type definition to constrain the ·validation· of the [children] of an element information item; such a particle is called a content model.
NOTE: XML Schema: Structures ·content models· are similar to but more expressive than [XML 1.0 (Second Edition)] content models; unlike [XML 1.0 (Second Edition)], XML Schema: Structures applies ·content models· to the ·validation· of both mixed and element-only content.
For detailed information on particles, see Particles (§3.9).
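Occurrence constraints are expressed with the minOccurs and maxOccurs attributes. The following non-normative sketch, using hypothetical names, shows particles that allow exactly one, zero or one, one or more, and a bounded number of occurrences.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Playlist">
    <xs:sequence>
      <!-- minOccurs and maxOccurs both default to 1: exactly one title -->
      <xs:element name="title" type="xs:string"/>
      <!-- zero or one -->
      <xs:element name="description" type="xs:string" minOccurs="0"/>
      <!-- one or more -->
      <xs:element name="track" type="xs:string" maxOccurs="unbounded"/>
      <!-- a nested model group is itself a particle:
           between zero and three occurrences of either alternative -->
      <xs:choice minOccurs="0" maxOccurs="3">
        <xs:element name="tag" type="xs:string"/>
        <xs:element name="genre" type="xs:string"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```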
An attribute use plays a role similar to that of a particle, but for attribute declarations: an attribute declaration within a complex type definition is embedded within an attribute use, which specifies whether the declaration requires or merely allows its attribute, and whether it has a default or fixed value.
A wildcard is a special kind of particle which matches element and attribute information items dependent on their namespace name, independently of their local names.
For detailed information on wildcards, see Wildcards (§3.10).
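A non-normative sketch of element and attribute wildcards follows. The type name Extensible and the element name id are hypothetical; the value ##other selects names from namespaces other than the schema's own target namespace.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Extensible">
    <xs:sequence>
      <xs:element name="id" type="xs:string"/>
      <!-- Element wildcard: any number of elements from other namespaces,
           validated only where a declaration can be found -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <!-- Attribute wildcard: attributes from other namespaces are allowed -->
    <xs:anyAttribute namespace="##other" processContents="lax"/>
  </xs:complexType>
</xs:schema>
```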
An identity-constraint definition is an association between a name and one of several varieties of identity-constraint related to uniqueness and reference. All the varieties use [XPath] expressions to pick out sets of information items relative to particular target element information items which are unique, or a key, or a ·valid· reference, within a specified scope. An element information item is only ·valid· with respect to an element declaration with identity-constraint definitions if those definitions are all satisfied for all the descendants of that element information item which they pick out.
For detailed information on identity-constraint definitions, see Identity-constraint Definitions (§3.11).
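The following non-normative sketch shows a key and a reference to it, each using a selector and a field expressed in the [XPath] subset. All element, attribute and constraint names are assumptions made for the example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="product" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="sku" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
        <xs:element name="orderLine" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="skuRef" type="xs:string" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>

    <!-- Key: every product within a catalog has a distinct sku -->
    <xs:key name="productKey">
      <xs:selector xpath="product"/>
      <xs:field xpath="@sku"/>
    </xs:key>

    <!-- Reference: every orderLine skuRef matches some product sku -->
    <xs:keyref name="orderLineProduct" refer="productKey">
      <xs:selector xpath="orderLine"/>
      <xs:field xpath="@skuRef"/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```

In this sketch the scope of both constraints is a single catalog element, since that is the declaration on which they are defined.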
There are two kinds of convenience definitions provided to enable the re-use of pieces of complex type definitions: model group definitions and attribute group definitions.
A model group definition is an association between a name and a model group, enabling re-use of the same model group in several complex type definitions.
For detailed information on model group definitions, see Model Group Definitions (§3.7).
An attribute group definition is an association between a name and a set of attribute declarations, enabling re-use of the same set in several complex type definitions.
For detailed information on attribute group definitions, see Attribute Group Definitions (§3.6).
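A non-normative sketch of both kinds of convenience definition, each re-used in two complex type definitions, is shown below. The names nameParts, audit, Person and Author are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Model group definition: a named, re-usable model group -->
  <xs:group name="nameParts">
    <xs:sequence>
      <xs:element name="given" type="xs:string"/>
      <xs:element name="family" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <!-- Attribute group definition: a named set of attribute declarations -->
  <xs:attributeGroup name="audit">
    <xs:attribute name="createdBy" type="xs:string"/>
    <xs:attribute name="createdOn" type="xs:date"/>
  </xs:attributeGroup>

  <xs:complexType name="Person">
    <xs:group ref="nameParts"/>
    <xs:attributeGroup ref="audit"/>
  </xs:complexType>

  <xs:complexType name="Author">
    <xs:sequence>
      <xs:group ref="nameParts"/>
      <xs:element name="pseudonym" type="xs:string" minOccurs="0"/>
    </xs:sequence>
    <xs:attributeGroup ref="audit"/>
  </xs:complexType>

</xs:schema>
```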
An annotation is information for human and/or mechanical consumers. The interpretation of such information is not defined in this specification.
For detailed information on annotations, see Annotations (§3.13).
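As a non-normative sketch, an annotation carrying information for both human and mechanical consumers might look like this. The element name price, the hint element and its namespace are assumptions made for the example; their interpretation is outside the scope of this specification.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="price" type="xs:decimal">
    <xs:annotation>
      <!-- For human readers -->
      <xs:documentation xml:lang="en">Unit price, excluding tax.</xs:documentation>
      <!-- For applications -->
      <xs:appinfo source="http://www.example.com/hints">
        <hint xmlns="http://www.example.com/hints" widget="currency-input"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```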
The [XML 1.0 (Second Edition)] specification describes two kinds of constraints on XML documents: well-formedness and validity constraints. Informally, the well-formedness constraints are those imposed by the definition of XML itself (such as the rules for the use of the < and > characters and the rules for proper nesting of elements), while validity constraints are the further constraints on document structure provided by a particular DTD.
The preceding section focused on ·validation·, that is the constraints on information items which schema components supply. In fact however this specification provides four different kinds of normative statements about schema components, their representations in XML and their contribution to the ·validation· of information items:
The last of these, schema information set contributions, are not as new as they might at first seem. XML 1.0 validation augments the XML 1.0 information set in similar ways, for example by providing values for attributes not present in instances, and by implicitly exploiting type information for normalization or access. (As an example of the latter case, consider the effect of NMTOKENS on attribute white space, and the semantics of ID and IDREF.) By including schema information set contributions, this specification makes explicit some features that XML 1.0 left implicit.
This specification describes three levels of conformance for schema aware processors. The first is required of all processors. Support for the other two will depend on the application environments for which the processor is intended.
[Definition:] Minimally conforming processors must completely and correctly implement the ·Schema Component Constraints·, ·Validation Rules·, and ·Schema Information Set Contributions· contained in this specification.
[Definition:] ·Minimally conforming· processors which accept schemas represented in the form of XML documents as described in Layer 2: Schema Documents, Namespaces and Composition (§4.2) are additionally said to provide conformance to the XML Representation of Schemas. Such processors must, when processing schema documents, completely and correctly implement all ·Schema Representation Constraints· in this specification, and must adhere exactly to the specifications in Schema Component Details (§3) for mapping the contents of such documents to ·schema components· for use in ·validation· and ·assessment·.
NOTE: By separating the conformance requirements relating to the concrete syntax of XML schema documents, this specification admits processors which use schemas stored in optimized binary representations, dynamically created schemas represented as programming language data structures, or implementations in which particular schemas are compiled into executable code such as C or Java. Such processors can be said to be ·minimally conforming· but not necessarily in ·conformance to the XML Representation of Schemas·.
[Definition:] Fully conforming processors are network-enabled processors which are not only both ·minimally conforming· and ·in conformance to the XML Representation of Schemas·, but which additionally must be capable of accessing schema documents from the World Wide Web according to Representation of Schemas on the World Wide Web (§2.7) and How schema definitions are located on the Web (§4.3.2).
NOTE: Although this specification provides just these three standard levels of conformance, it is anticipated that other conventions can be established in the future. For example, the World Wide Web Consortium is considering conventions for packaging on the Web a variety of resources relating to individual documents and namespaces. Should such developments lead to new conventions for representing schemas, or for accessing them on the Web, new levels of conformance can be established and named at that time. There is no need to modify or republish this specification to define such additional levels of conformance.
See Schemas and Namespaces: Access and Composition (§4) for a more detailed explanation of the mechanisms supporting these levels of conformance.
As discussed in XML Schema Abstract Data Model (§2.2), most schema components (may) have ·names·. If all such names were assigned from the same "pool", then it would be impossible to have, for example, a simple type definition and an element declaration both with the name "title" in a given ·target namespace·.
Therefore [Definition:] this specification introduces the term symbol space to denote a collection of names, each of which is unique with respect to the others. A symbol space is similar to the non-normative concept of namespace partition introduced in [XML-Namespaces]. There is a single distinct symbol space within a given ·target namespace· for each kind of definition and declaration component identified in XML Schema Abstract Data Model (§2.2), except that within a target namespace, simple type definitions and complex type definitions share a symbol space. Within a given symbol space, names are unique, but the same name may appear in more than one symbol space without conflict. For example, the same name can appear in both a type definition and an element declaration, without conflict or necessary relation between the two.
Locally scoped attribute and element declarations are special with regard to symbol spaces. Every complex type definition defines its own local attribute and element declaration symbol spaces, where these symbol spaces are distinct from each other and from any of the other symbol spaces. So, for example, two complex type definitions having the same target namespace can contain a local attribute declaration for the unqualified name "priority", or contain a local element declaration for the name "address", without conflict or necessary relation between the two.
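The two points above can be sketched, non-normatively, in one schema document. The target namespace and all component names are hypothetical. The name "title" is used for both a type definition and an element declaration, and two complex types each declare a local attribute named "priority".

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ex="http://www.example.com/books"
           targetNamespace="http://www.example.com/books">

  <!-- Type definition symbol space: a type named "title" -->
  <xs:simpleType name="title">
    <xs:restriction base="xs:string">
      <xs:maxLength value="200"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Element declaration symbol space: an element also named "title" -->
  <xs:element name="title" type="ex:title"/>

  <!-- Each complex type has its own local symbol spaces, so both may
       declare an attribute "priority", here with different types -->
  <xs:complexType name="Task">
    <xs:attribute name="priority" type="xs:positiveInteger"/>
  </xs:complexType>

  <xs:complexType name="Message">
    <xs:attribute name="priority" type="xs:string"/>
  </xs:complexType>

</xs:schema>
```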
The XML representation of schema components uses a vocabulary identified by the namespace name http://www.w3.org/2001/XMLSchema. For brevity, the text and examples in this specification use the prefix xs: to stand for this namespace; in practice, any prefix can be used.
XML Schema: Structures also defines several attributes for direct use in any XML documents. These attributes are in a different namespace,
which has the namespace name http://www.w3.org/2001/XMLSchema-instance.
For brevity, the text and examples in this specification use the prefix
xsi: to stand for this latter namespace; in practice,
any prefix can be used. All schema processors have appropriate attribute
declarations for these attributes built in; see Attribute Declaration for the 'type' attribute (§3.2.7),
Attribute Declaration for the 'nil' attribute (§3.2.7), Attribute Declaration for the 'schemaLocation' attribute (§3.2.7) and Attribute Declaration for the 'noNamespaceSchemaLocation' attribute (§3.2.7).
The Simple Type Definition (§2.2.1.2) or Complex Type Definition (§2.2.1.3) used in ·validation· of an element is usually
determined by reference to the appropriate schema components.
An element information item in an instance may, however,
explicitly assert its type using the attribute xsi:type.
The value of this attribute is a ·QName·; see QName Interpretation (§3.15.3) for
the means by which the ·QName· is
associated with a type definition.
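A minimal instance sketch of an explicit type assertion follows. The namespace http://example.com/po, the element names and the type name USAddress are assumptions made for illustration; the example presumes a schema in which USAddress is a type definition usable where shipTo is declared.

```xml
<!-- The element asserts its own type; the prefix "po" in the QName value is
     resolved against the namespace declarations in scope on this element. -->
<po:shipTo xmlns:po="http://example.com/po"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:type="po:USAddress">
  <po:street>123 Maple Street</po:street>
  <po:zip>90952</po:zip>
</po:shipTo>
```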
XML Schema: Structures introduces a mechanism for signaling that an element should
be accepted as ·valid· when it has no
content despite a content type which does not require or even necessarily allow empty content. An
element may be
·valid· without content if it has the attribute xsi:nil with
the value true. An element so labeled must be empty, but can
carry attributes if permitted by the corresponding complex type.
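The two fragments sketched here show the mechanism in use. The element name shipDate is hypothetical. Note that the element declaration has to opt in with nillable="true", a detail specified in the element declaration sections of this specification rather than in the paragraph above.

```xml
<!-- Schema fragment: the declaration permits nil -->
<xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema"
            name="shipDate" type="xs:date" nillable="true"/>

<!-- Instance fragment: valid with no content, although xs:date does not allow
     an empty value -->
<shipDate xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:nil="true"/>
```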
The xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes can be used in a document to provide
hints as to the physical location of schema documents which may be used for ·assessment·.
See How schema definitions are located on the Web (§4.3.2) for details on the use of these attributes.
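For illustration only (all URIs and file names are hypothetical), the two hint attributes might be used as in the following fragments. The value of xsi:schemaLocation is a white-space-separated list of pairs, each pairing a namespace name with a location hint; xsi:noNamespaceSchemaLocation carries a single hint for components with no target namespace.

```xml
<!-- Hint for a schema document describing the http://example.com/po namespace -->
<po:purchaseOrder
    xmlns:po="http://example.com/po"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.com/po http://example.com/schemas/po.xsd"/>

<!-- Hint for a schema document with no target namespace -->
<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="note.xsd"/>
```

Because these are hints, a processor is not obliged to dereference them.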
On the World Wide Web, schemas are conventionally represented as XML
documents (preferably of MIME type
application/xml or text/xml, but see clause 1.1 of Inclusion Constraints and Semantics (§4.2.1)), conforming to the specifications in Layer 2: Schema Documents, Namespaces and Composition (§4.2). For more information on
the representation and use of schema documents on the World Wide Web see Standards for representation of schemas and retrieval of schema documents on the Web (§4.3.1) and
How schema definitions are located on the Web (§4.3.2).
The following sections provide full details on the composition of all schema components, together with their XML representations and their contributions to ·assessment·. Each section is devoted to a single component, with separate subsections for
The sub-sections immediately below introduce conventions and terminology used throughout the component sections.
Components are defined in terms of their properties, and each property in turn is defined by giving its range, that is the values it may have. This can be understood as defining a schema as a labeled directed graph, where the root is a schema, every other vertex is a schema component or a literal (string, boolean, number) and every labeled edge is a property. The graph is not acyclic: multiple copies of components with the same name in the same ·symbol space· may not exist, so in some cases re-entrant chains of properties must exist. Equality of components for the purposes of this specification is always defined as equality of names (including target namespaces) within symbol spaces.
NOTE: A schema and its components as defined in this chapter are an idealization of the information a schema-aware processor requires: implementations are not constrained in how they provide it. In particular, no implications about literal embedding versus indirection follow from the use below of language such as "properties . . . having . . . components as values".
[Definition:] Throughout this specification, the term absent is used as a distinguished property value denoting absence.
Any property not identified as optional is required to be present; optional properties which are not present are taken to have ·absent· as their value. Any property identified as having a set, subset or list value may have an empty value unless this is explicitly ruled out: this is not the same as ·absent·. Any property value identified as a superset or subset of some set may be equal to that set, unless a proper superset or subset is explicitly called for. By 'string' in Part 1 of this specification is meant a sequence of ISO 10646 characters identified as legal XML characters in [XML 1.0 (Second Edition)].
The principal purpose of XML Schema: Structures is to define a set of
schema components that constrain the contents of instances and augment the
information sets thereof. Although no external representation
of schemas is required for this purpose, such representations will
obviously be widely used. To provide for this in an appropriate and
interoperable way, this specification provides a normative XML representation for schemas which
makes provision for every kind of schema
component. [Definition:] A document in
this form (i.e. a <schema> element information item) is a schema document. For the schema document as a whole, and
its constituents, the sections below define correspondences between element
information items (with declarations in
Schema for Schemas (normative) (§A) and DTD for Schemas (non-normative) (§G)) and
schema components. All the element information items in the XML representation
of a schema must be in the XML Schema namespace, that is their [namespace name] must be http://www.w3.org/2001/XMLSchema. Although a common way of creating the XML Infosets which are or contain ·schema documents· will be using an XML parser, this is not required: any mechanism which constructs conformant infosets as defined in [XML-Infoset] is a possible starting point.
Two aspects of the XML representations of components presented in the following sections are constant across them all:
For each kind of schema component there is a corresponding normative XML representation. The sections below describe the correspondences between the properties of each kind of schema component on the one hand and the properties of information items in that XML representation on the other, together with constraints on that representation above and beyond those implicit in the Schema for Schemas (normative) (§A).
The language used is as if the correspondences were mappings from XML representation to schema component, but the mapping in the other direction, and therefore the correspondence in the abstract, can always be constructed therefrom.
In discussing the mapping from XML representations to schema components below, the value of a component property is often determined by the value of an attribute information item, one of the [attributes] of an element information item. Since schema documents are constrained by the Schema for Schemas (normative) (§A), there is always a simple type definition associated with any such attribute information item. [Definition:] The phrase actual value is used to refer to the member of the value space of the simple type definition associated with an attribute information item which corresponds to its ·normalized value·. This will often be a string, but may also be an integer, a boolean, a URI reference, etc. This term is also occasionally used with respect to element or attribute information items in a document being ·validated·.
Many properties are identified below as having other schema components or sets of components as values. For the purposes of exposition, the definitions in this section assume that (unless the property is explicitly identified as optional) all such values are in fact present. When schema components are constructed from XML representations involving reference by name to other components, this assumption may be violated if one or more references cannot be resolved. This specification addresses the matter of missing components in a uniform manner, described in Missing Sub-components (§5.3): no mention of handling missing components will be found in the individual component descriptions below.
Forward reference to named definitions and declarations is allowed, both within and between ·schema documents·. By the time the component corresponding to an XML representation which contains a forward reference is actually needed for ·validation· an appropriately-named component may have become available to discharge the reference: see Schemas and Namespaces: Access and Composition (§4) for details.
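A small sketch of a forward reference within a single schema document follows; the names address and AddressType are hypothetical. The element declaration refers to a type definition that appears later in the document.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Refers to "AddressType" before its definition appears -->
  <xs:element name="address" type="AddressType"/>

  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```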
Throughout this specification, [Definition:] the initial value of some attribute information item is the value of the [normalized value] property of that item. Similarly, the initial value of an element information item is the string composed of, in order, the [character code] of each character information item in the [children] of that element information item.
The above definition means that comments and processing instructions, even in the midst of text, are ignored for all ·validation· purposes.
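As a one-line illustration (the element name price is hypothetical), the element below has the initial value "12.50", because the embedded comment contributes no character information items to the [children].

```xml
<price>12<!-- dollars -->.50</price>
```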
[Definition:] The normalized value of an element or attribute information item is an ·initial value· whose white space, if any, has been normalized according to the value of the whiteSpace facet of the simple type definition used in its ·validation·:
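preserve: No normalization is done.
replace: All occurrences of #x9 (tab), #xA (line feed) and #xD (carriage return) are replaced with #x20 (space).
collapse: After the replacements specified under replace, contiguous sequences of #x20s are collapsed to a single #x20, and initial and/or final #x20s are deleted.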
There are three alternative validation rules which may supply the necessary background for the above: Attribute Locally Valid (§3.2.4) (clause 3), Element Locally Valid (Type) (§3.3.4) (clause 3.1.3) or Element Locally Valid (Complex Type) (§3.4.4) (clause 2.2).
These three levels of normalization correspond to the processing mandated in XML 1.0 for element content, CDATA attribute content and tokenized attribute content, respectively. See Attribute Value Normalization in [XML 1.0 (Second Edition)] for the precedent for replace and collapse for attributes. Extending this processing to element content is necessary to ensure a consistent ·validation· semantics for simple types, regardless of whether they are applied to attributes or elements. Performing it twice in the case of attributes whose [normalized value] has already been subject to replacement or collapse on the basis of information in a DTD is necessary to ensure consistent treatment of attributes regardless of the extent to which DTD-based information has been made use of during infoset construction.
NOTE: Even when DTD-based information has been appealed to, and Attribute Value Normalization has taken place, the above definition of ·normalized value· may mean further normalization takes place, as for instance when character entity references in attribute values result in white space characters other than spaces in their ·initial value·s.
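The whiteSpace facet is defined in XML Schema Part 2: Datatypes and takes one of the values preserve, replace or collapse. The sketch below, with hypothetical type names, declares one simple type definition for each value so that the three behaviours can be compared.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Initial value is used unchanged -->
  <xs:simpleType name="KeptText">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="preserve"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Each tab, line feed and carriage return becomes a space -->
  <xs:simpleType name="ReplacedText">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="replace"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- As for replace, then runs of spaces are collapsed and the ends trimmed -->
  <xs:simpleType name="CollapsedText">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="collapse"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

For an initial value consisting of a tab, the letter a, a line feed, a space, the letter b and a final space, KeptText leaves all six characters as they are, ReplacedText yields six characters (space, a, space, space, b, space), and CollapsedText yields the three characters "a b".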
Attribute declarations provide for:
<xs:attribute name="age" type="xs:positiveInteger" use="required"/>
The attribute declaration schema component has the following properties:
The {name} property must match the local part of the names of attributes being ·validated·.
The value of the attribute must conform to the supplied {type definition}.
A non-·absent· value of the {target namespace} property provides for ·validation· of namespace-qualified attribute information items (which must be explicitly prefixed in the character-level form of XML documents). ·Absent· values of {target namespace} ·validate· unqualified (unprefixed) items.
A {scope} of global identifies attribute declarations available for use in complex type definitions throughout the schema. Locally scoped declarations are available for use only within the complex type definition identified by the {scope} property. This property is ·absent· in the case of declarations within attribute group definitions: their scope will be determined when they are used in the construction of complex type definitions.
{value constraint} reproduces the functions of XML 1.0 default and #FIXED
attribute values. default specifies that the attribute is to appear unconditionally in
the post-schema-validation infoset, with the supplied value used
whenever the attribute is not actually present; fixed indicates that the attribute value if present must equal the supplied
constraint value, and if absent receives the supplied value as for
default. Note that it is values that are supplied and/or
checked, not strings.
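A brief sketch of both kinds of value constraint follows; the element and attribute names are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <!-- If the attribute is absent from the instance, the
           post-schema-validation infoset carries currency="USD" -->
      <xs:attribute name="currency" type="xs:string" default="USD"/>
      <!-- If present, the value must equal 1.0; if absent, 1.0 is supplied -->
      <xs:attribute name="version" type="xs:decimal" fixed="1.0"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because values rather than strings are compared, an instance carrying version="1.00" should satisfy the fixed constraint: both lexical forms denote the same xs:decimal value.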
See Annotations (§3.13) for information on the role of the {annotation} property.
NOTE: A more complete and formal presentation of the semantics of {name}, {target namespace} and {value constraint} is provided in conjunction with other aspects of complex type ·validation· (see Element Locally Valid (Complex Type) (§3.4.4)).
[XML-Infoset] distinguishes attributes with names such as xmlns or xmlns:xsl from
ordinary attributes, identifying them as [namespace attributes]. Accordingly, it is unnecessary and in fact not possible for
schemas to contain attribute declarations corresponding to such
namespace declarations, see xmlns Not Allowed (§3.2.6). No means is provided in
this specification to supply a
default value for a namespace declaration.
The XML representation for an attribute declaration schema component is an <attribute> element information item. It specifies a simple type definition for an attribute either by reference or explicitly, and may provide default information. The correspondences between the properties of the information item and properties of the component are as follows:
attribute Element Information Item
<attribute
default = string
fixed = string
form = (qualified | unqualified)
id = ID
name = NCName
ref = QName
type = QName
use = (optional | prohibited | required) : optional
{any attributes with non-schema namespace . . .}>
Content: (annotation?, (simpleType?))
</attribute>
[Table: Attribute Declaration Schema Component]
ref [attribute] is absent, it corresponds to an attribute use with properties as follows (unless use='prohibited', in which case the item corresponds to nothing at all):
[Table: Attribute Use Schema Component]
[Table: Attribute Declaration Schema Component]
ref [attribute] is present), it corresponds to an attribute use with properties as follows (unless use='prohibited', in which case the item corresponds to nothing at all):
[Table: Attribute Use Schema Component]
Attribute declarations can appear at the top level of a schema document, or within complex
type definitions, either as complete (local) declarations, or by reference to top-level
declarations, or within attribute group definitions. For complete declarations, top-level or local, the type attribute is used when the declaration can use a
built-in or pre-declared simple type definition. Otherwise an
anonymous <simpleType> is provided inline.
The default when no simple type definition is referenced or provided is the simple ·ur-type definition·, which imposes no constraints at all.
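The alternatives just described can be sketched in one schema document. All names are hypothetical; the example shows a top-level declaration, a reference to it, a local declaration with an anonymous inline type, and a declaration with no type at all.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Top-level declaration using a built-in type by reference -->
  <xs:attribute name="age" type="xs:positiveInteger"/>

  <xs:complexType name="Person">
    <!-- Reference to the top-level declaration -->
    <xs:attribute ref="age" use="required"/>

    <!-- Local declaration with an anonymous inline simple type -->
    <xs:attribute name="status">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="active"/>
          <xs:enumeration value="retired"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>

    <!-- No type given: the simple ur-type definition applies -->
    <xs:attribute name="note"/>
  </xs:complexType>
</xs:schema>
```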
Attribute information items ·validated· by a top-level declaration must be qualified with the
{target namespace} of that declaration (if this is ·absent·, the item must be unqualified). Control over whether attribute information items
·validated· by a local declaration must be similarly qualified or not
is provided by the form [attribute], whose default is provided
by the attributeFormDefault [attribute] on the enclosing <schema>, via its determination of {target namespace}.
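A sketch of how form and attributeFormDefault interact follows, assuming a hypothetical target namespace http://example.com/po and hypothetical attribute names.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/po"
           attributeFormDefault="unqualified">
  <xs:element name="item">
    <xs:complexType>
      <!-- Takes the schema-wide default: appears unqualified in instances -->
      <xs:attribute name="sku" type="xs:string"/>
      <!-- Overrides the default: appears namespace-qualified in instances -->
      <xs:attribute name="lot" type="xs:string" form="qualified"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance consistent with this sketch would be written as <po:item xmlns:po="http://example.com/po" sku="A1" po:lot="7"/>, with only the lot attribute carrying a prefix.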
The names for top-level attribute declarations are in their own ·symbol space·. The names of locally-scoped attribute declarations reside in symbol spaces local to the type definition which contains them.
default and fixed must not both be present.
ref or name must be present, but not both.
type and <simpleType> must not both be present.
[Definition:] During ·validation·, associations between element and attribute information items among the [children] and [attributes] on the one hand, and element and attribute declarations on the other, are established as a side-effect. Such declarations are called the context-determined declarations. See clause 3.1 (in Element Locally Valid (Complex Type) (§3.4.4)) for attribute declarations, clause 2 (in Element Sequence Locally Valid (Particle) (§3.9.4)) for element declarations.
For an attribute information item's schema-validity to have been assessed all of the following must be true:
[Definition:] For attributes, there is no difference between assessment and strict assessment, so if the above holds, the attribute information item has been strictly assessed.
Either