Tech 802: Data, Databases & XML

DESCRIPTION

Monday, January 14, 2012 presentation on three data types (unstructured, structured and semi-structured) and the role XML plays in content management systems, ONIX (bibliographic data sharing), RSS (Really Simple Syndication) and XML-first publishing for ebooks.

Data, Databases & XML

A Crash Course

Monique Sherrett

monique@boxcarmarketing.com

3 Types of Data

Unstructured Data

• eg. Word documents, PDFs, audio/video files, emails

• No search

• No version control

Structured Data

• eg. Inventory management database, WordPress

• Searchable

• Version and user control (secure access)

• Relationship structures (show everything tagged “winter”)

• Import / Export

• Display options

• Machine readable; run queries against the data

Semi-Structured Data

• eg. XML (HTML, ONIX, RSS)

• formal/standardized data

Structured Data: WordPress

• Open Source content management system based on PHP and MySQL

– Open Source: source code is freely available, which encourages development by many independent programmers.

– CMS: a database + presentation layer (set of templates)

– MySQL: a type of database

– PHP: a scripting language designed to produce dynamic web pages

• Plugin architecture (Akismet for spam, SEO by Yoast, WP to Twitter, etc.)

• Pages & Posts

• Categories & Tags

Pages vs Posts

Page (~unstructured)

• Static content, won’t change frequently

• eg. About page

• Can be organized manually in a hierarchy: page (parent) and subpages (child)

– About Us > Team; About Us > History

Post (~structured)

• Frequently updated content dynamically organized in a hierarchy (chronological, category), plus archive

– News articles, Event information

– Frequently published in an RSS feed that is subscribed to by users

Semi-Structured Data: RSS

• Really Simple Syndication or Rich Site Summary

• Publish it. Subscribe to it. Pull it into other websites.

• RSS is a standardized XML file format.
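
Because RSS is a standardized XML format, a generic XML parser can read a feed. The sketch below is illustrative only: the feed text, titles and URLs are made up, and the helper name `read_items` is an assumption, not part of the presentation.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed (titles and URLs are placeholders).
RSS = """<rss version="2.0">
  <channel>
    <title>Example Press News</title>
    <link>https://example.com/</link>
    <description>Hypothetical feed</description>
    <item>
      <title>Spring catalogue released</title>
      <link>https://example.com/spring</link>
    </item>
    <item>
      <title>Author event announced</title>
      <link>https://example.com/event</link>
    </item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return (title, link) pairs for every <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


items = read_items(RSS)
for title, link in items:
    print(title, link)
```

This is the "pull it into other websites" idea in miniature: any site that knows the RSS element names can list another site's posts.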

WordPress As Database

• Instead of a series of HTML files, WordPress offers a system that allows for the organization and efficient storage & retrieval of information.

– Structured data can be exported into semi-structured data (RSS, XML)

RSS is XML

• eXtensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is machine- and human-readable.

• RSS, XHTML (unzipped EPUB) and ONIX (ONline Information eXchange, the standard for sharing bibliographic data) are some of the 100s of XML-based languages that have been developed.

• How might we use XML for the Tech Project?

Diagram: Current db → Export to XML → Rename / Modify XML → Import from XML → New db
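
The export, rename and import flow can be sketched in a few lines. This is a minimal illustration under stated assumptions: the "databases" are plain Python lists of dictionaries, and every field name, tag name and function name here is hypothetical rather than taken from the Tech Project.

```python
import xml.etree.ElementTree as ET


def export_to_xml(rows, record_tag="record"):
    """Export rows (a list of dicts) to an XML string, one element per field."""
    root = ET.Element("export")
    for row in rows:
        rec = ET.SubElement(root, record_tag)
        for field, value in row.items():
            ET.SubElement(rec, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def rename_tags(xml_text, mapping):
    """Rename elements so they match what the new database expects."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.tag = mapping.get(el.tag, el.tag)
    return ET.tostring(root, encoding="unicode")


def import_from_xml(xml_text, record_tag):
    """Read the records back into a list of dicts."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in rec}
            for rec in root.findall(record_tag)]


# Hypothetical "current db" with its own field names.
current_db = [{"name": "Winter Reads", "cat": "winter"}]

xml_out = export_to_xml(current_db)
xml_renamed = rename_tags(
    xml_out, {"record": "post", "name": "title", "cat": "category"})
new_db = import_from_xml(xml_renamed, "post")
print(new_db)
```

The XML file in the middle is what makes the move possible: neither database needs to know about the other, only about the tags.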

ONIX is XML

• International standard for representing and communicating book and product info in electronic form

– text-readable (human & computer)

– tagged/markup

– transferred by email or FTP (file transfer protocol)

– More info: bisg.org

Diagram: Publisher db → Export to ONIX & FTP file to server → Server → Grab file from server & import from ONIX → Bookseller db

EDI: Electronic Data Interchange

• structured (db to db) transmission of data

• Often XML tagged format

Questions on XML?

• Data, database questions?

• Tech project?

WEBCAST

A Roadmap to Efficiently Producing Multi-Format/Multi-Screen eBooks

Lessons from Market Innovators

November 8, 2012

Speakers

• Thad McIlroy – Electronic publishing analyst and author, The Future of Publishing

• Stephen Driver – Vice President, Production Services, The Rowman & Littlefield Publishing Group

XML Workflows for eBooks

XML Adoption by Sector

(Chart comparing sectors: STM, Educational, Trade)

XML Defined

XML is:

• A device-independent, system-independent method of storing and processing electronic text

• Markup for form and/or meaning

• A data interchange format used by many applications on the Web.

XML Provides Real Solutions

• But it is a big, ugly, unwieldy bear

• And its conceptual metaphors bear little resemblance to book publishing

• It’s based on 25-year-old thinking about technical documents and ecommerce

• Yet it’s the only real game in town

• ONIX book metadata is enabled by XML

The Importance of XML

• XML enables content management

• Separates form from content

• Combines style sheets with the power of databases in an extensible language

• Its long-term killer feature is semantic markup – marking up meaning, making text discoverable

• Future-proofing content

XML Tagging

Semantic tagging requires human judgment but offers the benefit of meaning

<book price="49.95" ISBN="string" publicationdate="2012-12-09">
  <title>string</title>
  <author>
    <first-name>string</first-name>
    <last-name>string</last-name>
  </author>
  <genre>string</genre>
</book>
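
To show what "the benefit of meaning" buys in practice, here is a small sketch that reads the sample record with Python's standard XML parser. The element and attribute names come from the slide; the variable names are illustrative, and the `string` placeholders are kept as they appear.

```python
import xml.etree.ElementTree as ET

# The sample record from the slide, with straight quotes throughout.
BOOK = """<book price="49.95" ISBN="string" publicationdate="2012-12-09">
  <title>string</title>
  <author>
    <first-name>string</first-name>
    <last-name>string</last-name>
  </author>
  <genre>string</genre>
</book>"""

book = ET.fromstring(BOOK)

# Attributes describe the book as a product.
price = float(book.get("price"))
pub_date = book.get("publicationdate")

# Child elements carry the semantic content.
title = book.findtext("title")
last_name = book.findtext("author/last-name")

print(price, pub_date, title, last_name)
```

Because each piece of data is labeled by what it is rather than how it looks, a program can ask for the price or the author's last name directly.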

Structured Tagging by Authors?

Typéfi sample approach

If you show this to editors... “They’re going to start drinking at their desks”

Templated Designs

How much book content fits into automatic composition?

The Human Factor

New Internal Skills & Positions

• The production skill set changes substantially

• Much of the existing knowledge base changes or becomes obsolete

• The move from design & composition & production management to content & product architecting and engineering

• There is an enormous training challenge ahead

Key Takeaways

• XML is complex, but packed with value

• XML is not an all-or-nothing deal

– You should start with small steps

• XML’s complexity demands outside help

– Services, consultants, trainers, associations

• The rapid proliferation of output formats can only be mastered with a structured approach like XML

Obstacles to using XML

• XML is intimidating, full of jargon

• We’re editors, not programmers

• And what about the authors?

• You mean I can’t move that line of text half a pica?! And other design concerns

• Editorial, or “my book’s too good for a template”

So how’d we solve it?

• We manipulated XML to our uses, not the other way around

• We still used authors’ Word documents as the source

• Template interiors were something we had already been doing for years

• XML coding was translated into a coding structure virtually all production people know: typesetting short tags

• We adapted existing XML approaches to our specific needs by discarding coding that didn’t fit our content

But weren’t there problems?

A Multi-Channel Workflow Example

1. Word document received from author

2. Word file coded for XML conversion (resembles standard typesetting short tags)

3. Typesetting short tags replaced with XML via conversion process (some file editing required.)

4. Final PDF generated after style template applied to XML file.

EPUB, .mobi and WebPDF generated.
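
Step 3, replacing short tags with XML, can be sketched as below. Everything specific here is an assumption for illustration: the short tag codes (`ct`, `h1`, `tx`), the XML element names and the function name are hypothetical, not the coding scheme the speakers actually used.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical short tags mapped to hypothetical XML element names.
TAG_MAP = {"ct": "chapter-title", "h1": "heading", "tx": "para"}
LINE = re.compile(r"^<(\w+)>(.*)$")


def short_tags_to_xml(text):
    """Convert short-tagged lines to XML; report lines that need hand editing."""
    root = ET.Element("chapter")
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        match = LINE.match(line)
        if match and match.group(1) in TAG_MAP:
            ET.SubElement(root, TAG_MAP[match.group(1)]).text = match.group(2)
        else:
            # Uncoded or unknown tag: this is the "some file editing required" case.
            problems.append(number)
    return ET.tostring(root, encoding="unicode"), problems


coded = "<ct>Data & Databases\n<tx>XML is one tool among many.\nuncoded line"
xml_text, problems = short_tags_to_xml(coded)
print(xml_text)
print("Lines needing attention:", problems)
```

The conversion also escapes characters such as `&` that are not allowed bare in XML, one reason to let a tool do the replacement rather than a manual find-and-replace.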

Insider Tips

• Know your staff: Who can adjust and how will you address those who can’t?

• Know your content: Using the right tool for the job is critical; not all content is suitable for XML composition

• Be realistic about the learning curve: If you’re still paper editing, making the leap straight to XML may be too great, so start small

• Be flexible: You’ll likely revisit several core values of your publishing program; identify the most important things and be honest about the less important ones

Insider Tips, cont.

• XML need not be an off-the-shelf product: You can and should work to customize it to your own production needs

• See it through: It’s taken us two years to arrive at a point where we’re comfortable, and we’re still making changes

• Partner with the right vendors: Find someone willing and capable of adapting to your publishing needs

• When you need a hammer, use a hammer: Remember XML is just another tool; it shouldn’t be your only tool.

Questions?

What’s Next

Tech Course 802

1. Christine on Tues 15th: coming in to talk templates and WordPress

2. Next Tues 22nd: Chloe and Stacey coming in to talk about ebooks and XML

3. Following Mon 28 and Tues 29: Brenda J Walker and Haig Armen on apps

Tech Project 607

1. This Wed 16th: Content to present assignment to Design & Tech so we can all be on the same page and on Thurs carry on with wireframes/design mockups (Design), platform set up (Tech) and discoverability/ed calendar (Content)

2. Following Wed 23rd: Present to Alan and David designs and ideas so far.
