XSD Complexity Analysis

XSD Complexity Analysis

Metrics describing the complexity of an XML Schema (XSD) design are helpful in determining aspects that drive the cost of Information Technology (IT) projects relying on XSDs to describe document structure and content.

What constitutes a "complexity" in XSD depends on many things that are often outside of the schema; as an example, binding technologies may impose additional challenges and limitations that are typically thought to be an XSD "complexity", rather than a binding mishap.

Consistent use of XSD patterns may appear to increase complexity due to repeated, scaffolding content. In many cases these patterns are to be treated nominally; in other words, it is not the complexity of the XSD syntax that people want to measure, but that of the model expressed by the XSD language.

Query XSD (QXSD) provides all the necessary building blocks to quantify complexity metrics. Below is an example that demonstrates how to "de-normalize" the type hierarchies (as used to create and populate the AllDescendents table in our own QXSD.

SET CASESENSITIVE OFF

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE INFORMATION_SCHEMA.TABLES.TABLE_NAME = 'TypeHierarchy' AND INFORMATION_SCHEMA.TABLES.TABLE_SCHEMA = 'DESIGN')
DROP TABLE DESIGN.TypeHierarchy
CREATE TABLE DESIGN.TypeHierarchy (
 RowId NVARCHAR,
 LocalName NVARCHAR,
 [Namespace] NVARCHAR,
 BaseTypeRowId NVARCHAR,
 BaseTypeName NVARCHAR,
 BaseTypeNamespace NVARCHAR
)
INSERT into DESIGN.TypeHierarchy
SELECT XSSimpleType.RowId, XSSimpleType.LocalName, XSSimpleType.Namespace, XSSimpleTypeRestriction.BaseTypeRowId, 
XSSimpleTypeRestriction.BaseTypeName, XSSimpleTypeRestriction.BaseTypeNamespace
from XSSimpleType INNER JOIN XSSimpleTypeRestriction
on XSSimpleType.ContentRowId = XSSimpleTypeRestriction.RowId
UNION ALL
SELECT XSSimpleType.RowId, XSSimpleType.LocalName, XSSimpleType.Namespace, XSSimpleTypeList.ItemTypeRowId,
XSSimpleTypeList.ItemTypeName, XSSimpleTypeList.ItemTypeNamespace
from XSSimpleType INNER JOIN XSSimpleTypeList
on XSSimpleType.ContentRowId = XSSimpleTypeList.RowId
UNION ALL
SELECT XSSimpleType.RowId, XSSimpleType.LocalName, XSSimpleType.Namespace, XSSimpleTypeUnionMembers.XSSimpleTypeRowId,
XSType.LocalName, XSType.Namespace
from XSSimpleType INNER JOIN XSSimpleTypeUnionMembers
left JOIN XSType on XSSimpleTypeUnionMembers.XSSimpleTypeRowId = XSType.RowId
on XSSimpleType.ContentRowId = XSSimpleTypeUnionMembers.XSSimpleTypeUnionRowId
UNION ALL
SELECT XSSimpleType.RowId, XSSimpleType.LocalName, XSSimpleType.Namespace, NULL,
NULL, NULL
from XSSimpleType 
WHERE XSSimpleType.ContentRowId is NULL
UNION ALL
SELECT XSComplexType.RowId, XSComplexType.LocalName, XSComplexType.Namespace, XSComplexType.BaseXmlSchemaType,
XSType.LocalName, XSType.Namespace
FROM XSComplexType LEFT JOIN XSType on XSComplexType.BaseXmlSchemaType = XSType.RowId

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE INFORMATION_SCHEMA.TABLES.TABLE_NAME = 'Temp' )
DROP TABLE Temp
WITH DirectDescendents (BaseTypeRowId, RowId, BaseTypeName, BaseTypeNamespace, LocalName, [Namespace], Level)
AS
(
-- Anchor member definition
    SELECT Data.BaseTypeRowId, Data.RowId, Data.BaseTypeName, Data.BaseTypeNamespace, Data.LocalName, Data.[Namespace], 0 AS Level
    from DESIGN.TypeHierarchy Data 
    where Data.BaseTypeRowId is NULL
    UNION ALL    
    SELECT Data.BaseTypeRowId, Data.RowId, Data.BaseTypeName, Data.BaseTypeNamespace, Data.LocalName, Data.[Namespace], Level + 1
    from DESIGN.TypeHierarchy Data
    INNER JOIN DirectDescendents AS d ON Data.BaseTypeRowId = d.RowId
)
SELECT * 
INTO Temp
FROM DirectDescendents
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE INFORMATION_SCHEMA.TABLES.TABLE_NAME = 'Temp1' )
DROP TABLE Temp1
CREATE TABLE Temp1 (
    BaseTypeRowId NVARCHAR, 
    RowId NVARCHAR, 
    BaseTypeName NVARCHAR, 
    BaseTypeNamespace NVARCHAR, 
    LocalName NVARCHAR, 
    [Namespace] NVARCHAR, 
    Level1 int)
INSERT INTO Temp1
SELECT Temp.BaseTypeRowId, Temp.RowId, Temp.BaseTypeName, Temp.BaseTypeNamespace, Temp.LocalName, Temp.[Namespace], 0
FROM Temp
                
DECLARE @level int;
DECLARE @maxLevel int;
SET @level = 1
SET @maxLevel = (SELECT MAX(Level) FROM Temp)
WHILE (@level <= @maxLevel)
BEGIN
    INSERT INTO Temp1
    SELECT Acc.BaseTypeRowId, Data.RowId, Acc.BaseTypeName, Acc.BaseTypeNamespace, Data.LocalName, Data.[Namespace], @level
    from Temp as Data inner JOIN Temp1 as Acc ON Data.BaseTypeRowId = Acc.RowId
    WHERE Data.Level = @level
    
    SET @level = @level + 1
END
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE INFORMATION_SCHEMA.TABLES.TABLE_NAME = 'AllDescendents')
DROP TABLE AllDescendents
CREATE TABLE AllDescendents (
    BaseTypeRowId NVARCHAR, 
    RowId NVARCHAR, 
    BaseTypeName NVARCHAR, 
    BaseTypeNamespace NVARCHAR, 
    LocalName NVARCHAR, 
    [Namespace] NVARCHAR, 
    Level1 int)
INSERT INTO AllDescendents
SELECT DISTINCT * 
FROM Temp1
            
DROP TABLE DESIGN.TypeHierarchy
DROP TABLE Temp
DROP TABLE Temp1

As an example, AllDescendents may be used to determine the depth distribution of the type hierarchy described by XSD.

SELECT Level1, COUNT(*) FROM AllDescendents
GROUP BY  Level1

Combining complexity metrics with XSD Design Quality Check will result in a robust model checking solution.