Gambas/Manipulating XML documents

According to the W3C: "XML is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere." And they are not exaggerating. For example, the XHTML format used on the web is an application of XML, and you can find XML applications in almost any data exchange. These days, XML is an essential part of a well-rounded programmer's repertoire. Gambas provides almost everything you need to work with XML.

Tip
It is important to clarify that XHTML is an XML document type, but the two are not the same. XHTML is limited to web pages as its standard format, whereas XML is a general-purpose format that can be used to work with any kind of data.

Basic structure of an XML document

All XML documents begin with

<?xml version="1.0" encoding="UTF-16"?>

which indicates that the rest of the document contains XML data and specifies the XML version and the character encoding used. In version 1.0 you can omit the XML declaration, but in version 1.1 it is mandatory. The XML declaration is followed by a single 'root' element, which can contain any number of sub-elements between its start and end tags. Elements can contain attributes, and an attribute name can appear only once in an element. Elements must be properly nested and must not overlap, so they must be closed in the reverse order in which they were opened. Comments begin with <!-- and end with -->. With this in mind, take a look at the following well-formed XML document.

<?xml version="1.1" encoding="UTF-16" ?>
<!-- The line above is the XML declaration -->
<!-- This line and the preceding are comments -->
<root>
  <element_name attribute_name = "attribute_value">
    Element content
  </element_name>
</root>

As you can see, the document is split across several lines and indented to show the levels of nesting. None of this is required: any document can be written as a single line of text. It is formatted this way only to make the document easier to understand. You will find this layout very useful when you need to read or edit an XML document in a plain text editor.

Writing XML

You’ll write a program that must create the following XML document based on one of the classes you were working on back in the Objects lesson.

<?xml version="1.0" encoding="UTF-16" ?>
<characters serie="Heroes">
  <heroe id="1" name="Claire Bennet">
    <name>Claire Bennet</name>
    <played_by>Hayden Panettiere</played_by>
    <ability>
      Rapid cellular regeneration
    </ability>
  </heroe>
  <heroe id="2" name="Hiro Nakamura">
    <name>Hiro Nakamura</name>
    <played_by>Masi Oka</played_by>
    <ability>
      Space-time manipulation: teleportation &amp; time travel
    </ability>
  </heroe>
  <villain id="1" name="Gabriel Sylar">
    <name>Gabriel Sylar</name>
    <played_by>Zachary Quinto</played_by>
    <ability>
      Understand how things work and multiple other abilities acquired
    </ability>
  </villain>
</characters>

Start a new command-line application named WriteXML, and make sure to check the XML/XSLT programming option in the New Project – Project Type window. This tells Gambas that gb.xml and gb.xml.xslt must be included as components of the project. If you forgot to select the XML/XSLT option, you can still include it by selecting the gb.xml component in Project\Properties…\Components. Then, in the Main() function of MMain, you'll create an XmlWriter object, open it for writing to Heroes.xml in your home directory, and have it generate the helpful indentation discussed earlier.

DIM writer AS XmlWriter
writer = NEW XmlWriter
writer.Open(User.Home & "/Heroes.xml", TRUE)
writer.Comment("Heroes Characters")
' Following code goes here
writer.EndDocument()

The file opened with Open is not actually written to the user's home folder until the EndDocument method is called. This method also adds any missing end tags in order to ensure compliance with the XML rules. The root element is characters; to tell Gambas this, write the following code between the Open and EndDocument lines:

writer.StartElement("characters")
  writer.Attribute("serie", "Heroes")
  ' Elements code will replace this comment line
writer.EndElement 'characters

The Oracle used to say "everything that has a beginning has an end", and for XML this is absolutely true. That is why you wrote the EndElement call right after its corresponding StartElement. It is good practice to do this every time you write an opening statement that needs to be closed later, especially to avoid debugging headaches. Between the StartElement and EndElement calls, you'll write the first heroe element.

writer.StartElement("heroe")
  writer.Attribute("id", "1")
  writer.Attribute("name", "Claire Bennet")
  writer.StartElement("name")
    writer.Text("Claire Bennet")
  writer.EndElement 'name
  writer.StartElement("played_by")
    writer.Text("Hayden Panettiere")
  writer.EndElement 'played_by
  writer.StartElement("ability")
    writer.Text("Rapid cellular regeneration")
  writer.EndElement 'ability
writer.EndElement 'heroe 1

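Right after that, you'll write the code for the second heroe element, using some shortcuts that you'll love: StartElement accepts an array of attribute names and values, and Element writes a start tag, its text and its end tag in a single call.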

writer.StartElement("heroe", ["id", "2", "name", "Hiro Nakamura"])
  writer.Element("name", "Hiro Nakamura")
  writer.Element("played_by", "Masi Oka")
  writer.Element("ability", "Space-time manipulation: teleportation & time travel")
writer.EndElement 'heroe 2
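
The target document also contains a villain element, which the steps above do not write. A minimal sketch, assuming the same shortcut calls shown for the second heroe (StartElement with an attribute array, then Element), placed right after the heroe 2 block and before the closing EndElement of characters:

```gambas
' Sketch only: reuses the calls already shown above for heroe 2
writer.StartElement("villain", ["id", "1", "name", "Gabriel Sylar"])
  writer.Element("name", "Gabriel Sylar")
  writer.Element("played_by", "Zachary Quinto")
  writer.Element("ability", "Understand how things work and multiple other abilities acquired")
writer.EndElement 'villain 1
```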

Run the application and open the Heroes.xml file with a plain text editor or with the web browser to see the resulting XML document.

Reading XML

The easiest way to read an XML document is with the XmlReader class, a pull parser that lets you navigate through the document by moving forward one node at a time and tells you the name, type and value of each node. It works inside a loop that repeatedly calls the Read() method to retrieve the parsing events; the parser then descends through the document, reflecting the nested structure of the XML being parsed.

Now you'll create an application that reads the XML file you just created and uses its contents to populate a TreeView control. Start a new graphical application project called ReadXML and create the following controls with their settings. Don't forget to include the gb.xml component.

Widget      Property   Setting
TreeView1
Button1     Text       "Populate Tree"

When the user clicks the Populate Tree button, the contents of the Heroes.xml file will fill TreeView1. To accomplish this, you'll add the following code to the Click() event of Button1. First, you create a new instance of an XmlReader object; then you try to open the .xml file, and if the file can't be opened, an error message is displayed.

 DIM reader AS XmlReader
 reader = NEW XmlReader
 TRY reader.Open(User.home & "/Heroes.xml")
 IF ERROR THEN 
   Message.Error("Error when trying to open the Heroes.XML file!")
   RETURN 
 ENDIF

Note that you can proactively check whether the file exists before attempting to open it by using the Exist function. Next, you need to write the loop in which the pull parser does its work. On each iteration you call the Read() method, which moves the parser to the next node of the XML file. An error can occur during this step, so you need to handle it. Also, before performing any other task, you need to check whether the parser has reached the end of the file in order to exit the loop; and just before leaving the procedure, you need to close the XML document.

 DO WHILE TRUE
   TRY reader.Read()
   ' Leave the loop on a read error or at the end of the file
   IF ERROR THEN BREAK
   IF reader.Eof THEN BREAK

   ' The rest of the code goes here

 LOOP 
 reader.Close()
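
The existence check mentioned above could look like the following minimal sketch, placed before the call to reader.Open. The variable sPath and the wording of the message are illustrative assumptions, not part of the original program; Exist is the built-in Gambas function named in the text.

```gambas
' Sketch only: sPath is a hypothetical helper variable
DIM sPath AS String
sPath = User.Home & "/Heroes.xml"

' Stop early, with a clear message, if the file is not there
IF NOT Exist(sPath) THEN
  Message.Error("The file " & sPath & " does not exist!")
  RETURN
ENDIF
```

Checking first lets the application tell a missing file apart from a file that exists but cannot be opened or parsed.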

You can anticipate potential errors by performing some validation on the .xml file, to be sure that the file you are opening actually contains the type of data you expect. As discussed a few paragraphs ago, the pull parser descends recursively through the XML structure. To follow it, the application needs to make decisions based on the name of each element. If the element "characters" contains an attribute called "serie", you'll add the first item to the tree view.

   SELECT CASE reader.Node.Name
     CASE "characters"
       FOR EACH reader.Node.Attributes
         ' The attribute is named "serie" in Heroes.xml
         IF reader.Node.Name = "serie" THEN 
           TRY TreeView1.Add(reader.Node.Value, "Characters of the TV Series: " & reader.Node.Value, NULL, NULL)
           PRINT "Characters of the TV Series: " & reader.Node.Value
         ENDIF 
       NEXT
     ' Here goes the next block of code
   END SELECT

To fill the tree view properly, you need to declare some variables that keep track of the id and name attributes of each heroe or villain element.

 DIM iNode AS String
 DIM iName AS String

If a "heroe" or "villain" element is found, you need to insert the parent item into TreeView1, store the values of the parent item so they can be used later to insert the sub-elements into the tree view, and move to the next node.

     CASE "heroe", "villain"
       FOR EACH reader.Node.Attributes
         IF reader.Node.Name = "id" THEN iNode = reader.Node.Value
         IF reader.Node.Name = "name" THEN iName = reader.Node.Value
       NEXT 
       IF iNode <> "" AND iName <> "" THEN 
         TRY TreeView1.Add(iNode, iNode & " - " & iName)
         PRINT iNode & " - " & iName
       ENDIF
       TRY reader.Read()
       IF ERROR THEN RETURN

Then the parser must keep looping over the nested elements, in this case name, played_by and ability, and add them under the parent item in TreeView1.

       DO WHILE TRUE
         IF reader.Node.Type = XmlReaderNodeType.Element THEN 
           SELECT CASE reader.Node.Name
             CASE "name"
               TRY reader.Read()
               TRY TreeView1.Add(iNode & "-n", "Name: " & reader.Node.Value, NULL, iNode)
               PRINT "    Name: " & reader.Node.Value
             CASE "played_by"
               TRY reader.Read()
               TRY TreeView1.Add(iNode & "-p", "Played by: " & reader.Node.Value, NULL, iNode)
               PRINT "    Played by: " & reader.Node.Value
             CASE "ability"
               TRY reader.Read()
               TRY TreeView1.Add(iNode & "-a", "Ability: " & reader.Node.Value, NULL, iNode)
               PRINT "    Ability: " & reader.Node.Value
           END SELECT

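After this, the parser must be moved to the next node. If the current node is not an element and turns out to be the end tag of the enclosing heroe or villain element, the inner loop exits; otherwise the parser simply moves on to the next node.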

           TRY reader.Read()
           IF ERROR THEN BREAK 
         ELSE 
           IF reader.Node.Type = XmlReaderNodeType.EndElement THEN BREAK 
         ENDIF 
         TRY reader.Read()
         IF ERROR THEN BREAK 
       LOOP

The finished ReadXML application should look like this.

File:ReadXML.png

If the previous example is still somewhat confusing, it probably helps to first understand how the Read() method moves the parser over the XML structure. Add a new button, include the following code, and compare the output on the console with the contents of the XML file.

 DIM reader AS XmlReader
 reader = NEW XmlReader
 TRY reader.Open(User.home & "/Heroes.xml")
 IF ERROR THEN 
   Message.Error("Error when trying to open the Heroes.XML file!")
   RETURN 
 ENDIF
 DO WHILE TRUE
   TRY reader.Read()
   IF reader.Eof THEN BREAK

   PRINT "Node: Type=" & reader.Node.Type & ", Name=" & reader.Node.Name & ", Value=" & reader.Node.Value

 LOOP 
 reader.Close()

Using XSLT

XSLT is a language for transforming XML documents into other XML documents. For instance, you can convert any XML document into an HTML document, or perform more complex tasks such as creating a PDF file. The XSLT template contains all the instructions required to transform a given XML document into another XML format. You'll create the following XSLT template to transform Heroes.xml into HTML format. Using a plain text editor, write the following XSLT file and save it in the user's home folder as xml2xhtml.xslt.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml">
<xsl:template match="/">
<html>
  <head>
    <title>Example of XSL Transformation</title>
    <style type="text/css">
      table {padding: 2pt; width: 100%;}
      th {font-weight: bold; background-color: #cccc99; color: #000000}
      tr.odd {background-color: #ffffff; text-align: center}
      tr.even {background-color: #f5f5dc; text-align: center}
    </style>
  </head>
  <body>
    <h1>Characters of the series: Heroes</h1>
    <table>
      <tr>
        <th>Name</th>
        <th>Played by</th>
        <th>Ability</th>
      </tr>
      <xsl:for-each select="characters/*">
        <xsl:choose>
          <xsl:when test="position() mod 2 = 0">
            <tr class="even">
              <td><xsl:value-of select="name" /></td>
              <td><xsl:value-of select="played_by" /></td>
              <td><xsl:value-of select="ability" /></td>
            </tr>
          </xsl:when>
          <xsl:otherwise>
            <tr class="odd">
              <td><xsl:value-of select="name" /></td>
              <td><xsl:value-of select="played_by" /></td>
              <td><xsl:value-of select="ability" /></td>
            </tr>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </table>
  </body>
</html>
</xsl:template>
</xsl:stylesheet>

Create a new console application with support for XML & XSLT. In the Main() function, write the following code:

 DIM docXML AS NEW XmlDocument
 DIM docXSLT AS NEW XmlDocument
 DIM docXHTML AS NEW XmlDocument
 docXML.Open(User.Home & "/Heroes.xml")
 docXSLT.Open(User.Home & "/xml2xhtml.xslt")
 docXHTML = Xslt.Transform(docXML, docXSLT)
 docXHTML.Write(User.Home & "/Heroes.html")

Obviously, you can add any fancy error-handling code you want, or proactively check whether both files (Heroes.xml & xml2xhtml.xslt) exist, or whether there is enough disk space to save the resulting XHTML file. Here we'll focus on understanding how the XSLT template works because, as you have just seen, the Gambas code is pretty straightforward. The application opens the two files Heroes.xml & xml2xhtml.xslt, then calls the Transform() method, which performs the conversion to XHTML, and finally saves the file to the specified location.

I'm going to skip everything related to HTML and the XSL headers. The key element of the XSLT language used here is the for-each loop, which navigates through all the elements under the root tag characters. Inside the loop, the values of the elements name, played_by & ability are accessed using value-of. The example uses one other XSLT construct: choose-when-otherwise, which applies alternating styling to the rows of the HTML report created from the XML document. If you need to learn more about XSLT, why don't you try http://www.w3.org/TR/xslt20/
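
Returning to the error handling mentioned above, the following is a minimal sketch of how the transformation code might be guarded. It assumes the same docXML, docXSLT and docXHTML variables declared earlier; the variables sXml and sXslt and the messages are illustrative additions. It relies only on the built-in Exist function and the TRY / IF ERROR pattern already used in the ReadXML example.

```gambas
' Sketch only: sXml and sXslt are hypothetical helper variables,
' declared together with docXML, docXSLT and docXHTML
DIM sXml AS String
DIM sXslt AS String
sXml = User.Home & "/Heroes.xml"
sXslt = User.Home & "/xml2xhtml.xslt"

' Check that both input files exist before opening them
IF NOT Exist(sXml) OR NOT Exist(sXslt) THEN
  PRINT "Heroes.xml or xml2xhtml.xslt was not found in " & User.Home
  RETURN
ENDIF

docXML.Open(sXml)
docXSLT.Open(sXslt)

' Trap a failed transformation instead of letting the program stop
TRY docXHTML = Xslt.Transform(docXML, docXSLT)
IF ERROR THEN
  PRINT "The XSLT transformation failed"
  RETURN
ENDIF
docXHTML.Write(User.Home & "/Heroes.html")
```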