DESCRIPTION
How to build Perl classes with roundtrip data binding to XML, painlessly, using W3C XML Schema and XML::Pastor.
Slides from a previous revision of this talk are online at: http://www.slideshare.net/joelbernstein/painless-oo-xml-with-xmlpastorq-presentation/
I will be presenting an expanded, more practical, 2009 version of this talk. Now with more code and less theory!
- XML is hard, right? Some things which are hard.
- XML data binding
- Comparisons of modules
- XML::Twig
- XML::Smart
- XML::Simple
- XML::Pastor
- Pastor howto
- XML schema inference
- Trang, Relaxer
- Relaxer howto
- The future?
For more information on XML::Pastor see: http://search.cpan.org/~aulusoy/XML-Pastor/
Relaxer download: http://www.relaxer.jp/download/relaxer-1.0.zip
Relaxer book (Japanese...): http://www.amazon.co.jp/exec/obidos/ASIN/4894715279/
Trang: http://www.thaiopensource.com/download/trang-20030619.zip
Painless OO <-> XML with XML::Pastor
(2009 remix)
Joel Bernstein
YAPC::EU 2009
It’s all Greek to me
schema
σχήμα (skhēma)
shape, plan
How many of you?
• Use XML
• Hate XML
• Like XML
A Confession
• I do not like XML
• People use it wrong
XML Data Binding
• Binding XML documents to objects specifically designed for the data in those documents.
• I often have to do this.
XML is hard, right?
Some hard things:
• Roundtripping data
• Manipulating XML via DOM API
• Preserving element sibling order, comments, XML entities etc.
Typical horrendous XML document
Sales Order XML Logical data model
XML DOM
I shouldn’t need to care about this
How this makes me feel:
Fundamental problem
• I don’t think in elements and attributes
• I think about my data, not how it’s stored
• This is Perl. DWIM.
Solution
Tools should make both the syntax and the details of the manipulation of XML invisible
Do you write XML
• By hand?
• Programmatically?
• Schemata?
• Validation?
• Transformation?
XML::Pastor is for all of you.
XML::Pastor
• Available on CPAN
• Abstracts away some of the pain of XML
• Ayhan Ulusoy is the author
• I am just a user
What does it do?
• Generates Perl code from W3C XML Schema (XSD)
• Roundtrip and validate XML to/from Perl without loss of schema information
• Lets you program without caring about XML structure
pastorize
• Automates codegen process
• Conceptually similar to DBIx::Class::Schema::Loader
• TMTOWTDI - offline or runtime
• Works on multiple XSDs (caveat, collisions)
pastorize in use
pastorize --mode offline --style multiple \
    --destination /tmp/lib/perl \
    --class_prefix MyApp::Data \
    /some/path/to/schema.xsd
Very simple contrived Album XML demo
Album XML document
Album XML schema
Pastorize the Album XML schema:
Resulting code tree like:
Modify some XML
Roundtrip and modify XML data using Pastor:
# Load XML
# Accessors
# Modify
# Write XML
The result!
Real world Pastor
$HASH1 = { 1 => 'Vodafone UK', 2 => 'O2 UK', 3 => 'Orange UK', 4 => 'T-Mobile UK', 8 => 'Hutchinson 3 UK'};
Country XML
Dynamic schema parsing of Country XML
Query the Country object
Modify elements and attributes with uniform syntax
Manipulate array-like data
Create new City data and combine with existing Country object
Validate modified data against the stored schema
Turn Pastor objects back into XML, or transform to XML::LibXML DOM
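The Country walkthrough above can be sketched end to end as follows. This is a minimal sketch, not the code from the slides: the schema, the sample document, and the `MyApp::Data::` prefix are assumptions made for illustration, and the method calls follow the XML::Pastor synopsis.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::Pastor;

# Hypothetical schema standing in for the Country XSD shown on the slide.
my $dir         = tempdir(CLEANUP => 1);
my $schema_file = "$dir/country.xsd";
open my $out, '>', $schema_file or die "Cannot write $schema_file: $!";
print {$out} <<'XSD';
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="country" type="Country"/>
  <xs:complexType name="Country">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="city" type="City" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="code" type="xs:string"/>
  </xs:complexType>
  <xs:complexType name="City">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="code" type="xs:string"/>
  </xs:complexType>
</xs:schema>
XSD
close $out;

# Dynamic schema parsing: generate the classes in memory and eval them.
XML::Pastor->new->generate(
    mode         => 'eval',
    schema       => $schema_file,
    class_prefix => 'MyApp::Data::',
);

# Load a (hypothetical) document into plain Perl objects.
my $country = MyApp::Data::country->from_xml_string(<<'XML');
<?xml version="1.0"?>
<country code="FR">
  <name>France</name>
  <city code="PAR"><name>Paris</name></city>
  <city code="LYO"><name>Lyon</name></city>
</country>
XML

# Query: elements are plain accessors, repeated elements act like arrays.
print $country->name, "\n";
print $country->city->[0]->name, "\n";

# Modify: attributes use the same accessor syntax, with a leading underscore.
$country->_code('fr');

# Create a new City object and add it to the existing Country.
my $city = $country->xml_field_class('city')->new();
$city->_code('ave');
$city->name('Aix-en-Provence');
push @{ $country->city }, $city;

# Validate against the schema stored in the classes; dies on failure.
$country->xml_validate();

# Back to XML text, or to an XML::LibXML document.
my $xml = $country->to_xml_string();
my $dom = $country->to_xml_dom_document();
print $xml;
```

Asking the object for the class of its `city` field via `xml_field_class` avoids hard-coding the generated class name, which depends on the prefix and on how the type is declared in the schema.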
Parsing with Pastor
• Parse entire XML into an XML::LibXML DOM
• Convert XML DOM tree into native Perl objects
• Throw away DOM, no longer needed
Reasons to not use XML::Pastor
• When you have no XML Schema
• Although several tools can infer XML schemata from documents
• It’s a code-generator
• No stream parsing
XML::Pastor Scope
• Good for “data XML”
• Unsuitable for “mixed markup”
• e.g. XHTML
• Unsuitable for “huge” documents
XML::Pastor known limitations
• Mixed elements unsupported
• Substitution groups unsupported
• ‘any’ and ‘anyAttribute’ elements unsupported
• Encodings (only UTF-8 officially supported)
• Default values for attributes - help needed
Other XML modules
• XML::Twig
• XML::Compile
• XML::Simple
• XML::Smart
XML::Twig
• Manipulates XML directly
• Using code is coupled closely to document structure
• Optimised for processing huge documents as trees
• No schemata, no validation
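To illustrate the coupling and the tree-at-a-time processing described above, here is a small sketch using XML::Twig handlers. The `album`/`track` document is a made-up example, not the one from the slides.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document; element names are assumptions for illustration.
my $xml = '<album><title>Demo</title><track>Intro</track><track>Outro</track></album>';

my @tracks;
my $twig = XML::Twig->new(
    twig_handlers => {
        # The handler is keyed on an element name, so this code
        # has to know the shape of the document.
        track => sub {
            my ($t, $elt) = @_;
            push @tracks, $elt->text;
            $t->purge;    # release what has been parsed so far
        },
    },
);
$twig->parse($xml);

print "$_\n" for @tracks;
```

Purging inside the handler is what keeps memory use flat on very large inputs, at the cost of writing code against element names rather than against your own data model.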
XML::Compile
• Original design rationale is to deal with SOAP envelopes and WSDL documents
• Different approach but similar goals to Pastor - processes XML based on XSD into Perl data structures
• More like XML::Simple with Schema support
XML::Compile pt. 2
• Schema support incomplete
• Shaky support for imports, includes
• Include restriction on targetNamespace
• I haven’t used it yet but it looks good
XML::Simple
• Working roundtrip binding for simple cases
• e.g. XMLout(XMLin($file)) works
• Simple API
• Produces single deep data structure
• Gotchas with element multiplicity
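The multiplicity gotcha can be shown in a few lines. This is a sketch with a made-up `album`/`track` document: the same element comes back as a plain scalar when it occurs once and as an array reference when it occurs twice, unless `ForceArray` is set.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical documents differing only in how many <track> elements they hold.
my $one = '<album><track>Intro</track></album>';
my $two = '<album><track>Intro</track><track>Outro</track></album>';

# KeyAttr => [] switches off array-to-hash folding so only multiplicity varies.
my $single = XMLin($one, KeyAttr => []);
my $multi  = XMLin($two, KeyAttr => []);

print 'one track:  ', (ref($single->{track}) || 'plain scalar'), "\n";
print 'two tracks: ', (ref($multi->{track})  || 'plain scalar'), "\n";

# ForceArray makes the shape independent of the number of occurrences.
my $forced = XMLin($one, KeyAttr => [], ForceArray => ['track']);
print 'forced:     ', ref($forced->{track}), "\n";
```

Code that walks the result without `ForceArray` has to check `ref` on every repeated element, which is exactly the kind of structural detail a schema-driven binding can settle up front.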
XML::Simple pt. 2
• No schemata, no validation
• Can be teamed with a SAX parser
• More suitable for configuration files?
XML::Smart
• Similar implementation to XML::Pastor
• Uses tie() and lots of crac^H^H^H^Hmagic
• Gathers structure information from XML instance, rather than schema
• No code generation!
XML::Smart pt. 2
• No schemata, so no schema validation
• Based on Object::MultiType - overloaded objects as HASH, ARRAY, SCALAR, CODE & GLOB
• Like Pastor, overloads array/hashref access to the data - promotes decoupling
• Reasonable docs, some community growing
Any questions?
Thanks for coming
See you next year
http://search.cpan.org/dist/XML-Pastor/
Bonus Material
If we have enough time
XML::Pastor Supported XML Schema Features
• Simple and Complex Types
• Global Elements
• Groups, Attributes, AttributeGroups
• Derive simpleTypes by extension
• Derive complexTypes by restriction
• W3C built-in Types, Unions, Lists
• (Most) Restriction Facets for Simple types
• External Schema import, include, redefine
XML Schema Inference
• Create an XML schema from an XML document instance
• Every document has an (implicit) schema
• Tools like Relaxer and Trang, as well as System.Xml.Serializer in the .NET Framework, can all infer XML Schemata from document instances
Simple D::HA object
Rekeying data
Rekeying data deeper
Warning, boring bit
XML::Pastor Code Generation
• Write out static code to tree of .pm files
• Write out static code to single .pm file
• Create code in a scalar in memory
• Create code and eval() it for use
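The four options above correspond to the `mode` and `style` arguments of `XML::Pastor->generate`. The following is a sketch under assumptions: the one-element `note` schema, the scratch directories, and the `Module` file name are invented for illustration, and the option names are taken from the XML::Pastor documentation as I understand it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::Pastor;

# Hypothetical one-element schema, written to a scratch directory.
my $dir    = tempdir(CLEANUP => 1);
my $schema = "$dir/note.xsd";
open my $out, '>', $schema or die "Cannot write $schema: $!";
print {$out} <<'XSD';
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note" type="Note"/>
  <xs:complexType name="Note">
    <xs:sequence>
      <xs:element name="body" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
XSD
close $out;

mkdir "$dir/multi"  or die "mkdir failed: $!";
mkdir "$dir/single" or die "mkdir failed: $!";

my $pastor = XML::Pastor->new();
my %common = (schema => $schema, class_prefix => 'MyApp::Data::');

# 1. Static code as a tree of .pm files, one per generated class.
$pastor->generate(%common, mode => 'offline', style => 'multiple',
                  destination => "$dir/multi/");

# 2. Static code in a single .pm file holding every class.
$pastor->generate(%common, mode => 'offline', style => 'single',
                  module => 'Module', destination => "$dir/single/");

# 3. Generated code handed back in a scalar; nothing is written or loaded.
my $code = $pastor->generate(%common, mode => 'return');

# 4. Generated code is eval()ed, so the classes are usable straight away.
$pastor->generate(%common, mode => 'eval');

print "generated ", length($code), " characters of Perl\n";
print "note class loaded\n" if MyApp::Data::note->can('from_xml_string');
```

Offline generation suits schemas that change rarely and lets you inspect or version the generated modules; the eval mode trades startup time for never having stale generated code on disk.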
How Pastor works
Code generation
• Parse schemata into schema model
• Perl data structures containing all the global elements, types, attributes, ...
• “Resolve” Model - determine class names, resolve references, etc
• Create boilerplate code, write out / eval
How Pastor works
Generated classes
• Each generated class (i.e. type) has classdata “XmlSchemaType” containing schema model
• If the class isa SimpleType it may contain restriction facets
• If the class isa ComplexType it will contain info about child elements and attributes
How Pastor works
In use
• If classes generated offline, then “use” them, if online then they are already loaded
• These classes have methods to create, retrieve, save object to/from XML
• Manipulate/query data using OO API to complexType fields
• Validate modified objects against schema