Processing XML DTDs in Haskell

The following packages, available on Hackage, relate to XML Document Type Definitions (DTDs) as defined in the W3C XML specifications:

dtd-types

This package provides types to represent a DTD. It is intended to be compatible with and extend the set of types in Data.XML.Types provided by the xml-types package.

Following the philosophy of Data.XML.Types, the types in this module are not intended to be a strict and complete representation of the model in the W3C specifications; rather, they are intended to be convenient and type-safe for the kinds of processing of DTDs that are commonly done in practice. As such, this model is compatible with both Version 1.0 and Version 1.1 of the XML specification.

Therefore, these types are not suitable for type-level validation of the syntax of a DTD. For example, these types are more lenient than the specifications about which characters are allowed in various locations in a DTD, and entities of various kinds appear as distinct syntactic elements only in places where they are commonly needed when processing DTDs.

Conditional sections are not represented in these types. They should be handled directly by parsers and renderers, if needed.

Hackage: http://hackage.haskell.org/package/dtd-types
Darcs repo: http://code.haskell.org/dtd/dtd-types

dtd-text

This package provides a parser and renderer for XML DTDs. It implements most of the parts of the W3C XML specification relating to DTDs.

Parser

The parser is based on attoparsec-text. Parsers are provided for all of the semantically important components of a DTD, and for the entire DTD itself. In addition, several functions are provided that parse the DTD while performing resolution of parameter entities.

The results of the parsers are Haskell DTD objects from the dtd-types package.

Synopsis:
          -- Parse a DTD without parameter entity resolution.
          -- See the attoparsec-text package for information
          -- about how to use this parser.
          dtd :: Parser DTD

          -- Parse a DTD from a Data.Text.Lazy while resolving
          -- references to internal parameter entities. If you
          -- are not sure which interface to use, use this one.
          dtdParse :: L.Text -> DTD

          -- Parse a DTD from a Data.Text.Lazy while resolving
          -- references to internal and external parameter entities.
          dtdParseWithExtern :: SymTable -> L.Text -> DTD

          -- where type SymTable = M.Map Text L.Text

This parser does not attempt to go out and fetch the values of external references for you from files and URLs. You need to fetch them yourself and provide them to the parser in a SymTable.
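
As a rough sketch of what preparing a SymTable might look like, the code below reads external values from local files into a map of the shape given in the synopsis. The helper loadSymTable, the sample entity and the choice of key are illustrative assumptions, not part of dtd-text; in particular, the keys are assumed here to be parameter entity names, so check the package documentation for what the parser actually expects. The call to dtdParseWithExtern is left as a comment so that the sketch compiles without dtd-text installed.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Map as M
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Lazy as L
import qualified Data.Text.Lazy.IO as LIO

-- Same shape as the SymTable synonym shown in the synopsis.
type SymTable = M.Map Text L.Text

-- Hypothetical helper (not part of dtd-text): read each external value
-- from a local file that you have already fetched.
-- ASSUMPTION: each key is the name of an external parameter entity.
loadSymTable :: [(Text, FilePath)] -> IO SymTable
loadSymTable sources = fmap M.fromList (mapM load sources)
  where
    load (key, path) = do
      contents <- LIO.readFile path
      return (key, contents)

-- An in-memory table, useful when the values are already at hand.
exampleTable :: SymTable
exampleTable = M.fromList
  [ ("common", "<!ENTITY % inline \"em | strong\">") ]

main :: IO ()
main = mapM_ describe (M.toList exampleTable)
  where
    describe (key, value) =
      putStrLn (T.unpack key ++ ": " ++ show (L.length value) ++ " characters")
    -- With dtd-text installed, the table would then be passed on, e.g.:
    --   let parsed = dtdParseWithExtern exampleTable dtdSource
```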

If you need to extract information, such as system IDs and public IDs, from the DTD before you fetch external values, you might be able to get it by applying dtdParse to all or part of the DTD as an initial parse. The parser tries very hard to give partial results when things are missing, while still doing its best to avoid problems like looping references.
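
To illustrate why looping references need care, here is a deliberately simplified sketch of parameter entity expansion over plain strings. It is not the algorithm used by dtd-text, and it ignores details of the XML specification such as the padding of replacement text. The names expand, resolve and exampleEntities are made up for this example. A reference that is unknown, or that is already being expanded, is left in the output unchanged, which gives a partial result instead of an infinite loop.

```haskell
import Data.Char (isAlphaNum)
import qualified Data.Map as M
import qualified Data.Set as S

-- Illustrative model only: entity name -> replacement text.
type Entities = M.Map String String

-- Simplified test for characters allowed in an entity name.
isNameChar :: Char -> Bool
isNameChar c = isAlphaNum c || c `elem` "._-:"

-- Expand %name; references. The set holds the names currently being
-- expanded, so a reference that loops back on itself is not followed.
expand :: Entities -> S.Set String -> String -> String
expand ents active = go
  where
    go [] = []
    go ('%' : rest)
      | (name, ';' : after) <- break (== ';') rest
      , not (null name)
      , all isNameChar name
      , name `S.notMember` active
      , Just value <- M.lookup name ents
      = expand ents (S.insert name active) value ++ go after
    go (c : cs) = c : go cs

resolve :: Entities -> String -> String
resolve ents = expand ents S.empty

exampleEntities :: Entities
exampleEntities = M.fromList
  [ ("inline", "em | strong")
  , ("block", "p | %inline;")
  , ("loop", "%loop;")
  ]

main :: IO ()
main = do
  putStrLn (resolve exampleEntities "(%block;)*")  -- prints (p | em | strong)*
  putStrLn (resolve exampleEntities "%loop;")      -- prints %loop;
```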

Renderer

The renderer is based on the blaze-builder package. Renderers are provided for all of the semantically important components of a DTD, and for the entire DTD itself.

The builders take as input Haskell DTD objects from the dtd-types package.

See the blaze-builder package for information about how to use these builders.
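
For orientation, a Builder from blaze-builder is typically run with toLazyByteString. The sketch below shows this with a hand-written builder, elementDecl, which is only a hypothetical stand-in for the builders exported by dtd-text; see the package documentation for their actual names and types.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Blaze.ByteString.Builder (Builder, toLazyByteString)
import Blaze.ByteString.Builder.Char.Utf8 (fromText)
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Text (Text)

-- Hypothetical stand-in for a dtd-text builder: renders an element
-- declaration from a name and a content specification, as UTF-8.
elementDecl :: Text -> Text -> Builder
elementDecl name content = mconcat
  [ fromText "<!ELEMENT "
  , fromText name
  , fromText " "
  , fromText content
  , fromText ">"
  ]

main :: IO ()
main =
  -- Running the builder yields a lazy ByteString that can be written out.
  BL.putStrLn (toLazyByteString (elementDecl "p" "(#PCDATA)"))
  -- prints <!ELEMENT p (#PCDATA)>
```

Builders compose with mconcat or mappend, so the output for several declarations can be assembled before anything is written.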

Note: The current version of dtd-text does not support conditional sections.

Hackage: http://hackage.haskell.org/package/dtd-text
Darcs repo: http://code.haskell.org/dtd/dtd-text