XBLite Forum  /  XBLite Programming  /  gxml 2.3.x
Posted by: Rhett Thompson, September 4, 2008, 1:40pm
9-3-08 :: 2.3.9 :: Added XmlCompile$() which compiles the currently loaded tree into a string.
-When changing a node's data, the .dataStart/.dataEnd methods are no longer broken.
-XmlAddAttribute() now correctly replaces the specified attribute if it already exists.
-Fixed a bug in XmlGetChildren().
-Fixed a bug in XmlGetNodePoolAddress ().
Posted by: Spyke, September 4, 2008, 1:53pm; Reply: 1
Thanks Rhett.

The new XmlCompile$() function works on my data as well. Many thanks! ;D

I had an issue with parsing the data to start with which I tracked down to badly-formed xml on my part, but you may want to consider it.

In my xml files the data is not held in CDATA constructs, and there are a couple of nodes where the data contains an apostrophe rather than the entity &apos;. When this happens the subsequent nodes are skipped until we reach a node which also has an apostrophe. The remaining nodes are then collected as children of the first node that had an apostrophe.

Clearly my apostrophes should be held as &apos;, but the program I'm dealing with strips these out and returns xml with normal apostrophes in the data values each time it saves the file, and I imagine this could happen elsewhere. It's easy enough to work round, so it's not causing me a problem, but maybe gxml could take this into account?

EDIT 1: Hmm. It is going to cause me a problem: because XmlCompile$() works on the whole tree, I have to save the whole file rather than just the small bit I'm dealing with, and the apostrophes throughout the source document corrupt the output.

EDIT 2: Problem solved by cleaning up the input file by adding the entities first.

Cheers,
Spyke

Posted by: Rhett Thompson, September 4, 2008, 4:56pm; Reply: 2
Hi,

I was contemplating this very problem a few minutes ago, actually :P  Aside from the quotes (which is a simple parser problem: it should ignore containers unless they are between < >'s), I came up with a function that would partially parse the document and translate "dangerous" characters into references.  This would obviously slow things down unnecessarily, as a well-formed document wouldn't have any trouble to begin with.  So I don't know if I'm going to do that or not, but trust that I'll fix that little bug with the container symbols.

Quoted Text
Problem solved by cleaning up the input file by adding the entities first.


You mean replacing the characters with the references right?
Posted by: Spyke, September 5, 2008, 10:29am; Reply: 3
Quoted from Rhett Thompson
You mean replacing the characters with the references right?

Yes, I ran my input file through a function that replaces the characters in data sections with references before using XmlLoadFromString():

Code
FUNCTION UpdateXmlWithEntities (@data$)

  ' Replace quote characters found outside <...> tags with entity references.
  inNode = $$FALSE
  i = 0
  ' data${i} is zero-based, so the last valid index is LEN(data$) - 1.
  ' The length is re-read on every pass because data$ grows with each replacement.
  DO WHILE i < LEN(data$)

    IF data${i} == '<' THEN inNode = $$TRUE
    IF data${i} == '>' THEN inNode = $$FALSE

    IFF inNode THEN
      SELECT CASE data${i}
        CASE '''
          data$ = XstMergeStrings$ (data$, "&apos;", i+1, 1)
        CASE '"'
          data$ = XstMergeStrings$ (data$, "&quot;", i+1, 1)
        ' CASE and so on for the other entities
      END SELECT
    END IF
    i = i + 1
  LOOP
END FUNCTION


Posted by: Rhett Thompson, September 5, 2008, 5:29pm; Reply: 4
Hi,

Yeah, that is a decent solution!  I don't know if I will incorporate something similar into 2.4 or not, but probably :P  I know for a fact that I'm going to rewrite the core parser, as it is getting a little old.  There probably won't be any API changes, just the main parsing routine.  I think I might also steal something from one of my earlier projects and use it for how data is gathered, which would allow for "mutations" to the parser.

P.S.  If you discover any more bugs, please let me know ;)
Posted by: Spyke, September 5, 2008, 6:15pm; Reply: 5
Haven't found any more bugs yet, and I've been working with it most of the day  :)

It works well. A couple of times I've thought "Why isn't there a function to do x" but then realised that it was pretty easy to work round the issue using the data structure.

These included a function to return the child nodes of a specific parent, when you don't know the name of the child elements. I had to resort to parsing the whole data tree to find nodes with the specified parent. But a function where you specified the parent and a number n that returned the nth child node would be useful. You would run this after XmlGetChildren().*

It would also be good if XmlGetChild() returned an error if the Child node didn't exist (or am I missing something here?).

Cheers,
Spyke

* In my file I have a number of lists where I know the list tag name, and the list structure, but each element of the list is given an incremental tag <id-00001>, <id-00002>, etc, and there may be gaps, so I know the parent, but not the exact child name.
Posted by: Rhett Thompson, September 5, 2008, 7:48pm; Reply: 6
Quoted Text
Haven't found any more bugs yet, and I've been working with it most of the day :)


Cool:)

Quoted Text
It would also be good if XmlGetChild() returned an error if the Child node didn't exist (or am I missing something here?).


Well, XmlGetChild() returns an XMLNODE, so if anything went wrong it returns an empty one.  To see if it found the child, you could just test the .name member (i.e. a=XmlGetChild():IF a.name THEN childExists=$$TRUE).

Quoted Text
In my file I have a number of lists where I know the list tag name, and the list structure, but each element of the list is given an incremental tag <id-00001>, <id-00002>, etc, and there may be gaps, so I know the parent, but not the exact child name.


This would work without messing with the lower-level functions, although not exactly, as your example has the leading zeros...

Code
n=XmlGetChildren(parent)
FOR i=1 TO n
	child=XmlGetChild(parent, "id-"+STRING$(i), 0)
	IF child.name THEN ? CSTRING$(child.name)
NEXT i


A simpler solution would be to add a function like the one you described (although I swear I had a function like that already). Expect it in 2.4 :)
Posted by: Spyke, September 6, 2008, 3:19pm; Reply: 7
Thanks for the answers, Rhett. Very useful, and I should have thought of checking for the name!

I can't use the method you describe for finding the unknown children exactly this way, as there can be gaps in the tag sequence I get (if elements were deleted), but I could do it with a DO loop, and keep incrementing until I've found them all. Not ideal for performance but it'll work and will probably be fine.

Cheers,
Spyke
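
For reference, the DO loop workaround mentioned above might look roughly like the sketch below. It is untested and rests on several assumptions: that XmlGetChildren() and XmlGetChild() behave as in the example in Reply 6, that parent is a node already located, and that every child of parent follows the id-NNNNN naming. The names maxId, found and tag$ are made up for illustration and are not part of gxml.

Code
' Sketch only, not gxml code: walk id-00001, id-00002, ... and skip the gaps.
XMLNODE child
maxId = 99999                      ' hypothetical safety limit so the loop always ends
n = XmlGetChildren (parent)        ' assumed to return the number of children
found = 0
i = 0
DO WHILE (found < n) AND (i < maxId)
  i = i + 1
  tag$ = "id-" + RIGHT$ ("00000" + STRING$ (i), 5)   ' zero-pad the counter to five digits
  child = XmlGetChild (parent, tag$, 0)
  IF child.name THEN               ' an empty node means this id is missing (deleted)
    found = found + 1
    PRINT CSTRING$ (child.name)
  END IF
LOOP


The loop stops as soon as all n children have been seen, so the cost depends on how large the highest id is rather than on maxId.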
Posted by: Rhett Thompson, September 8, 2008, 12:27pm; Reply: 8
Hi,

I've rewritten the parser significantly, and the problems that originally plagued it are no more.  It's also loads faster, and handles comments a lot better.  I'm still testing it, but so far I haven't seen any problems, so expect 2.4 soon:)

Later.
Print page generated: February 5, 2012, 2:40pm