Parsing XML/Formatted TXT into a HTML document.



}:8) Supermoo
May 28th, 2001, 03:35
Okay, this probably sounds really easy to you intelligent folk, but not to me. ;)

How do I change the XML data I've got into a pretty html page for my visitors?

I want to make a search engine using SearchFeed's results and incorporate it into my own layout. I would create boxes for it and would most likely call it via http://www.djcl.com/script.php?q=query or however you do it, you get the idea...

Here is what SearchFeed says:


XML format

This feed format is used for server-side development. For example, if you call a URL similar to http://www.searchfeed.com/rd/feed/XMLFeed.jsp?cat=dogs&excID=162&pID=571&nl=5&page=1&ip=1.1.1.1, the search results will be returned as an XML-formatted file. The breakdown of the URL was shown in the JavaScript example. You should provide the URL parameters that are specific to your needs.



XML will have the following format:

<Listings>
  <Page></Page>
  <Count></Count>
  <Listing>
    <Title></Title>
    <URL></URL>
    <URI></URI>
    <Description></Description>
    <Bid></Bid>
  </Listing>
  <Listing>
    <Title></Title>
    <URL></URL>
    <URI></URI>
    <Description></Description>
    <Bid></Bid>
  </Listing>
</Listings>


So how do I make that go into a HTML template, like normal search results?

Thanks,

lucifer
May 28th, 2001, 09:04
the easy way is to

suck all the XML into a variable $xml
do some regular expression search and replace

e.g.

$xml=preg_replace("/<description>([^<]*)<\/description>/i","<p><b>$1</b></p>",$xml);

print to browser: header, $xml, footer
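
A minimal sketch of that search-and-replace idea, assuming a current PHP version and a made-up sample string rather than a real SearchFeed response. The function name listingsToHtml and the sample data are illustrative only.

```php
<?php
// Hypothetical helper: turn feed tags into HTML with plain regex replaces.
function listingsToHtml(string $xml): string
{
    // The /i flag makes the match case-insensitive, so <Title> and <title> both work.
    $xml = preg_replace('/<Title>([^<]*)<\/Title>/i', '<p><b>$1</b><br>', $xml);
    $xml = preg_replace('/<Description>([^<]*)<\/Description>/i', '$1</p>', $xml);
    return $xml;
}

// Made-up sample, not real feed output.
$sample = '<Title>Dog food</Title><Description>Cheap kibble</Description>';
echo listingsToHtml($sample), "\n";
```

Note that [^<]* stops at the first <, so a field wrapped in a CDATA section will not match until the CDATA markers have been stripped.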

lucifer
May 28th, 2001, 09:49
a more cunning idea is to use the XML to create a hash of hashes of the info; then you have more flexibility when doing your template work.

I have a Perl example of this I could dig out if you wanted

}:8) Supermoo
May 28th, 2001, 15:33
I'd love to see it! Thanks lucifer! :)

Also, the lovely code you gave me goes in PHP, right? Or Perl? Or? :confused:

PS. Your new avatar's pretty kewl! Although it don't flash enough... ;)

lucifer
May 28th, 2001, 16:01
yours is pretty cool too now you've got the chessboard going.

I've been having problems with the animated transparency on the K-meleon browser so I'm having to do some reworking updates soon


what language is your preference?


here's the code



use strict;
use warnings;

# Sample response; in real use this would come back from the server.
my $sampleXML = qq(
<?xml version="1.0"?>
<response>
  <portfolio>
    <portID>Test</portID>
    <portname>Consors Test Portfolio</portname>
    <xloss>-5124.88530628508</xloss>
    <riskgrade>129.39826419201</riskgrade>
    <value>114260.469443242</value>
    <divriskgrade>68.4645462588067</divriskgrade>
    <divxloss>-2131.56017715564</divxloss>

    <position>
      <posID>Bayer</posID>
      <xloss>-197.254957013363</xloss>
      <riskgrade>148.955450569846</riskgrade>
      <value>6138</value>
      <riskimpact>-0.00822567904632093</riskimpact>
    </position>
    <position>
      <posID>Bond</posID>
      <xloss>-157.580177492695</xloss>
      <riskgrade>29.3498701802706</riskgrade>
      <value>18922.4694432424</value>
    </position>
  </portfolio>
</response>
);

my $portresults = parsePortfolioResponse($sampleXML);

# Builds a hash of hashes, e.g. $portresults->{portfolio}{value}
# or $portresults->{Bayer}{xloss}.
sub parsePortfolioResponse {
    my ($response) = @_;

    my %portresults;

    my $stats = "riskgrade|riskimpact|xloss|value|divriskgrade|divxloss|posID";

    # assumes the portfolio stats come before the first <position>
    my $ref = "portfolio";
    # \1 makes the closing tag match the opening tag
    while ($response =~ m!<($stats)>(.*?)</\1>!g) {
        if ($1 eq "posID") {
            $ref = $2;    # stats that follow belong to this position
        } else {
            $portresults{$ref}{$1} = $2 if length $2;    # keeps "0", skips empty values
        }
    }

    return \%portresults;
}




you'd need to do a rewrite for what you want but it should give you an insight

have fun ;)

}:8) Supermoo
May 28th, 2001, 16:11
Oh thank you lucifer!

* Kisses satan's feet, oooh... they're hot! ;) *

Thanks! Thanks! Thanks! :)


Originally posted by lucifer
yours is pretty cool too now you've got the chessboard going.

I've been having problems with the animated transparency on the K-meleon browser so I'm having to do some reworking updates soon

Sounds very kewl! I'll be waiting for the new one to come out!


Originally posted by lucifer
what language is your preference?

I'm guessing for this task PHP would be most useful... just a guess, I'll go with most things...


Originally posted by lucifer
here's the code







you'd need to do a rewrite for what you want but it should give you an insight

have fun ;)

Thanks, I'll post my re-written work here first... because I'll most likely make a few mistakes! Thanks again! :)

Lucifer helps the cow out of its problems,

lucifer
May 28th, 2001, 16:18
I should have been a vet doing all this stuff for a cow and that damn cat :D :D

}:8) Supermoo
May 28th, 2001, 16:21
lol, excitement short lived... hitting self in head...

I'll need a variable thingy for the search, $earch, don't know how to do that... :o

Also I'll just use $IP for their IP, hope that works...



use strict;
my $sampleXML = http://www.searchfeed.com/rd/feed/XMLFeed.jsp?cat=/$earch&excID=162&pID=571&nl=5&page=1&ip=$IP);

my $portresults = parseListingsListing($sampleXML);


How's it going? :confused:

Thanks,

lucifer
May 28th, 2001, 16:39
In Perl I think you'll have to use the LWP module

I'd use php



$xml=file("whateverthaturlwas");

$xml=implode("",$xml);


for the IP use theirs ($REMOTE_ADDR) or your site's, or just make one up and see if it works
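
A rough sketch of how the search term and the visitor's IP could be dropped into the feed URL, assuming a current PHP version, where $_SERVER['REMOTE_ADDR'] replaces the old global $REMOTE_ADDR. The helper name buildFeedUrl is made up for illustration; the parameter names are simply copied from the example URL in this thread.

```php
<?php
// Hypothetical helper: build the feed URL from the search term and visitor IP.
// The parameter names (cat, excID, pID, nl, page, ip) come from the example URL
// quoted in this thread; what each one means is SearchFeed's business.
function buildFeedUrl(string $query, string $ip, int $page = 1): string
{
    $params = [
        'cat'   => $query,
        'excID' => 162,
        'pID'   => 571,
        'nl'    => 5,
        'page'  => $page,
        'ip'    => $ip,
    ];
    // http_build_query() URL-encodes each value and joins the pairs with "&".
    return 'http://www.searchfeed.com/rd/feed/XMLFeed.jsp?'
        . http_build_query($params, '', '&');
}

// On a live page these would come from the request,
// e.g. $_GET['q'] and $_SERVER['REMOTE_ADDR'].
$url = buildFeedUrl('dogs', '1.1.1.1');
echo $url, "\n";

// file() returns an array of lines (or false on failure); implode() joins them:
// $lines = file($url);
// $xml   = ($lines === false) ? '' : implode('', $lines);
```

Building the query string with http_build_query() means a search term with spaces or ampersands in it cannot break the URL.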

}:8) Supermoo
May 29th, 2001, 02:43
Aaahhh.... I'm dying here... 2 hours and I give up! :(



<?PHP

$xml=file("http://www.searchfeed.com/rd/feed/XMLFeed.jsp?cat=dogs&excID=162&pID=571&nl=5&page=1&ip=$REMOTE_ADDR");

$xml = str_replace("<title>","<b>",$xml);
$xml = str_replace("</title>","</b>",$xml);
$xml = str_replace("<url>","<i>",$xml);
$xml = str_replace("</url>","</i>",$xml);

// $xml=implode("",$xml); Don't Work
// echo $xml; Comes up with Array
// implode & join don't work... :(

?>


I didn't get very far, although I get the basic idea, I think... I just need to know how to print the array...

Also how do you make <url>*</url> a string?
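
A small sketch of the two things asked here, assuming a current PHP version and a hard-coded array standing in for what file() returns. One possible reason the tag replacement did nothing is that str_replace() is case-sensitive while the feed's tags are capitalised; that is a guess, not a confirmed diagnosis.

```php
<?php
// Stand-in for what file() returns: one array element per line of the response.
$lines = [
    "<Listing>\n",
    "<Title>Dog food</Title>\n",
    "<URL>http://example.com/food</URL>\n",
    "</Listing>\n",
];

// echo on an array prints the word "Array"; join the lines into one string first.
$xml = implode('', $lines);

// str_replace() is case-sensitive, so "<title>" never matches "<Title>".
// str_ireplace() ignores case.
$html = str_ireplace(['<title>', '</title>'], ['<b>', '</b>'], $xml);

// Capture the text between <URL> and </URL> into a plain string.
$url = preg_match('/<URL>(.*?)<\/URL>/i', $xml, $m) ? $m[1] : '';

echo $url, "\n";
```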

Thanks,

lucifer
May 29th, 2001, 04:28
this worked



<?php

# get xml (file() returns an array of lines, implode() joins them into one string)
$xml=file("http://www.searchfeed.com/rd/feed/XMLFeed.jsp?cat=dogs&excID=162&pID=571&nl=5&page=1&ip=1.1.1.1");

$xml=implode("",$xml);


# get page/count info
preg_match("/<Page>(.*?)<\/Page>/i",$xml,$match);
$page=$match[1];

preg_match("/<Count>(.*?)<\/Count>/i",$xml,$match);
$count=$match[1];


# parse xml
$tags="(Title|URL|URI|Description|Bid)";

preg_match_all("/<$tags>(.*?)<\/$tags>/i",$xml,$match,PREG_SET_ORDER);

# now all data in $match[x][y]
# bit of funny order
# $match[x][1] = tag name
# $match[x][2] = value
#
# where x = 0-4 is the first record, 5-9 the second etc.
# could be sorted better


# print data (loop over what was actually matched)

for( $i=0 ; $i<count($match) ; $i++ ){

    # clean up title/description fields due to nasty xml (CDATA wrappers)

    $match[$i][2]=preg_replace("/<!\[CDATA\[/","",$match[$i][2]);
    $match[$i][2]=preg_replace("/\]\]>/","",$match[$i][2]);

    echo $match[$i][1] .": ". $match[$i][2]. "<br>";

}
?>



hope that helps ;)

lucifer
May 29th, 2001, 04:42
It's here (http://www.the-antichrist.com/meow/moo.php3)

it puts the xml at the end too

lucifer
May 29th, 2001, 05:52
Here we go again :) this is a much better version and sorts things into an array of hashes so that it's simpler to use.

I don't know why you had an implode problem - it's a core part of the language.

the output page is poor (no <HTML>, <HEAD> etc.) but it shows how to do the formatting, links etc.




<?php
# get xml
$xml=file("http://www.searchfeed.com/rd/feed/XMLfeed.jsp?cat=dog&execID=162&pID=571&nl=5&page=1&ip=1.1.1.1");
$xml=implode("",$xml);

# pull out page/count tag values
preg_match("/<Page>(.*?)<\/Page>/i",$xml,$match);
$page=$match[1];
preg_match("/<Count>(.*?)<\/Count>/i",$xml,$match);
$count=$match[1];

# get rid of nasty bits in xml (title/description)
$xml=preg_replace("/<!\[CDATA\[/","",$xml);
$xml=preg_replace("/\]\]>/","",$xml);

# pull out the fields/values
$tags="(Title|URL|URI|Description|Bid)";
preg_match_all("/<$tags>(.*?)<\/$tags>/i",$xml,$match,PREG_SET_ORDER);

# arrange into array of hashes
# $info[ result number ][field]
# eg $info[0]['Title'] - title of first result
#    $info[3]['URL']   - URL of 4th record
# (assumes every listing has all 5 tags)
$info=array();
for ($i=0;$i<$count;$i++){
    for ($j=0;$j<5;$j++){
        $key=$match[($i*5+$j)][1];
        $info[$i][$key]=$match[($i*5+$j)][2];
    }
}


# output page

echo "<p>Page $page of $count</p>";

for ($i=0;$i<$count;$i++){
?>
<p><a href="<?=$info[$i]['URL']?>" target="_new"><b><?=$info[$i]['Title']?></b></a><br><?=$info[$i]['Description']?></p><hr>

<?php } ?>
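
The index arithmetic in that script lines up only if every listing carries all five tags. A hedged alternative sketch, assuming a current PHP version: split the feed into <Listing> blocks first, then read whatever fields each block has. The function name parseListings and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: one associative array per <Listing> block.
function parseListings(string $xml): array
{
    // Drop the CDATA wrappers so only the text inside them is left.
    $xml = str_replace(['<![CDATA[', ']]>'], '', $xml);

    $listings = [];
    // /s lets . match newlines, /i ignores case.
    preg_match_all('/<Listing>(.*?)<\/Listing>/is', $xml, $blocks);

    foreach ($blocks[1] as $block) {
        // \1 forces the closing tag to match the opening tag.
        preg_match_all(
            '/<(Title|URL|URI|Description|Bid)>(.*?)<\/\1>/is',
            $block,
            $fields,
            PREG_SET_ORDER
        );
        $row = [];
        foreach ($fields as $f) {
            $row[$f[1]] = trim($f[2]);
        }
        $listings[] = $row;
    }
    return $listings;
}

// Made-up sample; the second listing deliberately has no <Bid>.
$sample = '<Listings><Page>1</Page><Count>2</Count>'
    . '<Listing><Title><![CDATA[Dog food]]></Title><URL>http://example.com/a</URL><Bid>0.10</Bid></Listing>'
    . '<Listing><Title>Dog toys</Title><URL>http://example.com/b</URL></Listing>'
    . '</Listings>';

$info = parseListings($sample);
echo $info[1]['Title'], "\n";
```

Because each block is parsed on its own, a missing field in one listing cannot shift the fields of the next.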

}:8) Supermoo
May 29th, 2001, 15:19
Thank you soooo much lucifer!!! But before I kiss your hot feet again I think I'll be checking it... ;)

{EDIT}

Trying the last one...


Warning: Unknown option 'P' in /home/djcl/public_html/beta/php/31.php on line 11

Warning: Unknown option 'C' in /home/djcl/public_html/beta/php/31.php on line 13

Warning: Compilation failed: missing terminating ] for character class at offset 9 in /home/djcl/public_html/beta/php/31.php on line 17

Warning: Unknown option '(' in /home/djcl/public_html/beta/php/31.php on line 22

Page of


Parse error: parse error in /home/djcl/public_html/beta/php/31.php on line 42


Hmmm... I'll go check my server config for PHP etc...

Apparently I need a header with the code #!/usr/bin/php4 included to access all the php4 stuff... I added it under the <? thingy... and used the first script... no luck, second...



Warning: Unknown option 'P' in /home/djcl/public_html/beta/php/34.php on line 11

Warning: Unknown option 'C' in /home/djcl/public_html/beta/php/34.php on line 14

Warning: Unknown option '(' in /home/djcl/public_html/beta/php/34.php on line 21



Damn! :(

lucifer
May 29th, 2001, 15:35
funny things :o

seems all my \ got eaten when I Ctrl-V




# pull out page/count tag values
preg_match("/<Page>(.*?)</Page>/i",$xml,$match);
$page=$match[1];
preg_match("/<Count>(.*?)</Count>/i",$xml,$match);
$count=$match[1];

# get rid of nasty bits in xml (title/description)
$xml=preg_replace("/<![CDATA[/","",$xml);
$xml=preg_replace("/]]>/","",$xml);

# pull out the fields/values
$tags="(Title|URL|URI|Description|Bid)";
preg_match_all("/<$tags>(.*?)</$tags>/i",$xml,$match,PREG_SET_ORDER);





should be




# pull out page/count tag values
preg_match("/<Page>(.*?)<\/Page>/i",$xml,$match);
$page=$match[1];
preg_match("/<Count>(.*?)<\/Count>/i",$xml,$match);
$count=$match[1];

# get rid of nasty bits in xml (title/description)
$xml=preg_replace("/<!\[CDATA\[/","",$xml);
$xml=preg_replace("/\]\]>/","",$xml);

# pull out the fields/values
$tags="(Title|URL|URI|Description|Bid)";
preg_match_all("/<$tags>(.*?)<\/$tags>/i",$xml,$match,PREG_SET_ORDER);




sorry about that

lucifer
May 29th, 2001, 15:42
the #! thing should be on the first line of the script, before anything else

I think this board must use stripslashes() somewhere in its routine

the stuff above is for the last script; the other one needs similar slashes putting in

I know they work cos I ran them on my server before I posted them

sorry you've been f****d about


let me know how you do and let's see it when it's working. It's got me curious about this bidding stuff

}:8) Supermoo
May 29th, 2001, 16:02
Thanks again! We're getin' closer, well when I say we I really mean you... but anyway...

Okay, that's sort of there, now I've only got...


Parse error: parse error in /home/djcl/public_html/beta/php/37.php on line 41


So I modified the line to:


<p><a href="<?=$info[$i][URL]?>" target="_new"><b><?=$info[$i][Title]?><\/b><\/a><br><?=$info[$i][Description]?><\/p><hr>

Although still no luck... :(

lucifer
May 29th, 2001, 16:19
Originally posted by }:8) Supermoo
Thanks again! We're getin' closer, well when I say we I really mean you... but anyway...

Okay, that's sort of there, now I've only got...


So I modified the line to:


<p><a href="<?=$info[$i][URL]?>" target="_new"><b><?=$info[$i][Title]?><\/b><\/a><br><?=$info[$i][Description]?><\/p><hr>

Although still no luck... :(
good try



echo "<p><a href=\"<?=$info[$i][URL]?>\" target=\"_new\"><b><?=$info[$i][Title]?></b></a><br><?=$info[$i][Description]?></p><hr>";

this is a "quoted" string, so we only need to backslash " and \ to keep them

the others were regular expressions! Different rules: in regexps, backslash anything that is not alphanumeric if you want it to be itself

confusing :)
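
The two escaping rules side by side, as a minimal sketch assuming a current PHP version and made-up sample strings. preg_quote() is PHP's built-in way to add the regex backslashes automatically. One extra detail for the double-quoted case: a literal $ needs a backslash too, otherwise PHP may try to read it as a variable.

```php
<?php
// 1) Inside a double-quoted PHP string, " and \ need a backslash to survive.
$link = "<a href=\"http://example.com/\" target=\"_new\">Example</a>";

// 2) Inside a regex, characters with a special meaning (and the delimiter) need one.
//    preg_quote() adds those backslashes; its second argument is the delimiter.
$pattern = '/' . preg_quote(']]>', '/') . '/';

// Strip the CDATA end marker from a made-up sample string.
$clean = preg_replace($pattern, '', 'Dog food]]>');

echo $link, "\n", $clean, "\n";
```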

}:8) Supermoo
May 29th, 2001, 16:30
lol :confused:

Anyway... I changed it and surprise... got another error!

It returns the following...



#!/usr/bin/php4
Page of


Parse error: parse error in /home/djcl/public_html/beta/php/40.php on line 42


I'm heading off now, so c'ya later... :)

lucifer
May 29th, 2001, 18:14
ignore that last thing it was html not php anyway - me being silly

I've included the file, which is more sensible, and it worked for me!!

big problem

that link is down

my script does not error check (yet)

let me know how you get on

lucifer
May 30th, 2001, 04:41
that last error

you are still running php 3 not 4

<?=$var ?> not allowed you need <? echo $var ?> instead

to get php 4 you'll need the #! thing plus rename it to *.cgi or something like that otherwise the #! will be ignored

hope all is well


mooooooooooooooooooooooooooooooooooooooo :)

}:8) Supermoo
May 30th, 2001, 15:28
Mooooooo! :D


Originally posted by lucifer
...to get php 4 you'll need the #! thing plus rename it to *.cgi or something like that otherwise the #! will be ignored...

Done!

First off thanks for all your help! Here goes...



Page of


Dang! (http://www.djcl.com/beta/php/moo.cgi)

Well I don't think it's printing much... is that because the file's down? I'll check again in an hour or two...

Thanks,

lucifer
May 30th, 2001, 15:38
Originally posted by }:8) Supermoo

Well I don't think it's printing much... is that because the file's down?
Yep 404 error

:( no error checking so no graceful fail

see if you can check the link

have fun
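
What a graceful fail could look like, as a sketch only, assuming PHP 7.1 or later. The helper names fetchFeed and tagValue are hypothetical, and a non-existent local path stands in for the dead feed URL so the example runs without a network.

```php
<?php
// Hypothetical helper: fetch the feed, returning null instead of failing noisily.
function fetchFeed(string $url): ?string
{
    // @ hides the warning file_get_contents() raises when the fetch fails.
    $xml = @file_get_contents($url);
    return ($xml === false || $xml === '') ? null : $xml;
}

// Hypothetical helper: text of the first <$tag>...</$tag>, or null if it is missing.
function tagValue(string $xml, string $tag): ?string
{
    $pattern = '/<' . preg_quote($tag, '/') . '>(.*?)<\/' . preg_quote($tag, '/') . '>/is';
    return preg_match($pattern, $xml, $m) ? $m[1] : null;
}

// A path that does not exist stands in for the dead feed URL.
$xml = fetchFeed('/no/such/feed.xml');

if ($xml === null) {
    $message = 'Sorry, search results are unavailable right now.';
} else {
    $message = 'Page ' . (tagValue($xml, 'Page') ?? '?');
}
echo $message, "\n";
```

Checking the fetch result and each preg_match() before using $match[1] is what stops the page from printing a bare "Page of".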

}:8) Supermoo
May 30th, 2001, 15:54
Well Thankyou so much lucifer! :)

Also, how do I get this admission to your website? looks interesting... ;)

lucifer
May 30th, 2001, 16:04
Originally posted by }:8) Supermoo
Also, how do I get this admission to your website? looks interesting... ;)

do cows have souls to sell ????? ;)

}:8) Supermoo
May 30th, 2001, 16:07
lol, I don't think so... although I've got four to five stomachs! :)

lucifer
May 30th, 2001, 16:27
there's currently a waiting list .....

}:8) Supermoo
May 31st, 2001, 01:33
Damn... :(

[EDIT]

PS. The results are working... although the script ain't getting them! :(

lucifer
May 31st, 2001, 09:52
change the URL in the program to

http://www.searchfeed.com/rd/feed/XMLFeed.jsp?cat=dogs&excID=162&pID=571&nl=5&page=1&ip=1.1.1.1

it was wrong (not sure why), now it's right

that's the only line to change, I hope

live version (http://www.the-antichrist.com/meow/moo2.php3)

meow
May 31st, 2001, 12:09
Originally posted by lucifer
I should have been a vet doing all this stuff for a cow and that damn cat :D :D
:eek: Eeeeeeeeeeeeeeeeeeeeeeeeeeek! I heard that! :mad:
And toe kissing too! Icky.
(sharpening claws and grumbling dangerously)

lucifer
May 31st, 2001, 13:17
You're lucky, think what supermoo has to go through at the vet's

puts on large rubber glove

meow
May 31st, 2001, 13:25
A cow hardly notices. :D

}:8) Supermoo
May 31st, 2001, 15:20
lol, you people are getting sick... ;)

meow
May 31st, 2001, 15:27
Oh, did you notice?!? :D

------------------------------
Green cows are better.
-----------------------------

}:8) Supermoo
May 31st, 2001, 15:33
Green cows are better, are they? lucifer, reckon you could give me a green paint job as well? :D

<EDIT>

All working now lucifer! Thanks again!!! :)

lucifer
May 31st, 2001, 15:40
I have to say I think supermoo does look much healthier

meow
May 31st, 2001, 15:44
Well, if the glove treatment is what it takes to look fit, I'd rather stay sickly and green! :D

}:8) Supermoo
May 31st, 2001, 16:02
No, once you've had glove you can never go back! :)

*lucifer has e-mail*

lucifer
May 31st, 2001, 16:03
you know you want it ;)

lucifer
May 31st, 2001, 16:05
the cow is psychic

}:8) Supermoo
May 31st, 2001, 16:06
lol

meow
May 31st, 2001, 16:09
I'll report you to the moderator. This is green cow abuse. :mad:
FYI we green ones are very tight *ssed.

meow
May 31st, 2001, 16:10
I'll create an Anti You Two Site. :mad:

}:8) Supermoo
May 31st, 2001, 16:13
Meow, settle...

http://www.freewebspace.net/forums/showthread.php?s=&threadid=5653

meow
May 31st, 2001, 16:22
What? :confused:

lucifer
May 31st, 2001, 16:30
too many browser windows

:confused:

Cheap Bastard
June 21st, 2001, 17:33
just wondering...
what exactly does (.*?) mean?

gyrbo
June 22nd, 2001, 09:21
SuperMoo, did the script work? Can you give me a URL?

lucifer
June 22nd, 2001, 10:10
Originally posted by Cheap Bastard
just wondering...
what exactly does (.*?) mean?

. = any character (except new line unless you tell it to include them using /s option)

* = zero or more

? = match as little as possible

() = remember this bit for later, plus it helps group things

so


/x(.*?)x/ on string "x123x456x"

(.*?) matches 123 as the first match
(.*) would match 123x456

takes some time to get your head round, then it's easy, but it can still take time to translate them
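
The same example run through PHP's preg_match(), as a quick sketch assuming a current PHP version:

```php
<?php
$subject = 'x123x456x';

preg_match('/x(.*?)x/', $subject, $lazy);   // lazy: stop at the first x that works
preg_match('/x(.*)x/',  $subject, $greedy); // greedy: run on to the last x that works

echo $lazy[1], "\n";   // 123
echo $greedy[1], "\n"; // 123x456
```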

lucifer
June 22nd, 2001, 10:13
Originally posted by gyrbo
SuperMoo, did the script worked? Can you give me a url?

http://www.the-antichrist.com/meow/moo2.php3

is a demo but it's in further development

atlas
June 22nd, 2001, 22:59
Hmm... there are perl modules to do this kind of thing. Off the top of my head I think XML::Parser does the stuff you're asking for.

-mk
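
XML::Parser is a Perl module. On the PHP side, one comparable route is the SimpleXML extension, which does real XML parsing instead of regex matching. This is a sketch of a different technique from the one used in this thread, assuming a current PHP build with SimpleXML enabled; the sample data is made up.

```php
<?php
// Made-up sample in the feed's documented shape; not real SearchFeed output.
$xml = <<<XML
<Listings>
  <Page>1</Page>
  <Count>2</Count>
  <Listing>
    <Title><![CDATA[Dog food]]></Title>
    <URL>http://example.com/food</URL>
    <Description><![CDATA[Cheap kibble]]></Description>
  </Listing>
  <Listing>
    <Title><![CDATA[Dog toys]]></Title>
    <URL>http://example.com/toys</URL>
    <Description><![CDATA[Squeaky things]]></Description>
  </Listing>
</Listings>
XML;

// LIBXML_NOCDATA folds CDATA sections into ordinary text nodes.
$doc = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

$results = [];
if ($doc !== false) {
    foreach ($doc->Listing as $listing) {
        $results[] = [
            'Title'       => (string) $listing->Title,
            'URL'         => (string) $listing->URL,
            'Description' => (string) $listing->Description,
        ];
    }
}

foreach ($results as $r) {
    // htmlspecialchars() keeps a stray < or & in the feed from breaking the page.
    echo '<p><a href="', htmlspecialchars($r['URL']), '"><b>',
        htmlspecialchars($r['Title']), '</b></a><br>',
        htmlspecialchars($r['Description']), "</p>\n";
}
```

A parser copes with CDATA, attributes and missing tags on its own, which is why the manual CDATA stripping disappears here.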

atlas
June 22nd, 2001, 23:01
I'm pretty sure '?' means the part directly before it is optional.

So 'colou?r' would match both spellings: color and colour

-mk

Cheap Bastard
June 23rd, 2001, 09:19
thanks lucifer

lucifer
June 25th, 2001, 07:09
Originally posted by atlas
I'm pretty sure '?' means the part directly before it is optional.

So 'colou?r' would match both spellings: color and colour

-mk

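? is ungreedy matching when it comes straight after a quantifier like * or +


after an ordinary character it does mean optional, so /colou?r/ is fine; it's the same as /colou{0,1}r/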
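
Both meanings of ? in one place, as a short sketch assuming a current PHP version. After a plain character it marks that character as optional; after a quantifier such as * or + it makes that quantifier lazy.

```php
<?php
// After a plain character, ? means "optional": the u may appear zero or one times.
$optional = [
    preg_match('/^colou?r$/', 'color'),
    preg_match('/^colou?r$/', 'colour'),
    preg_match('/^colou{0,1}r$/', 'colour'), // {0,1} is the long-hand spelling of ?
];

// After a quantifier such as + or *, ? switches that quantifier to lazy matching.
preg_match('/<b>(.+?)<\/b>/', '<b>one</b> and <b>two</b>', $lazy);

print_r($optional);
echo $lazy[1], "\n";
```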

gyrbo
June 25th, 2001, 09:37
Originally posted by lucifer


http://www.the-antichrist.com/meow/moo2.php3

is a demo but it's in further development

Hmm, those are the results from a search engine, but where do I type what I want to search for????:confused:

lucifer
June 25th, 2001, 09:56
try this

http://www.the-antichrist.com/moo/moo5.php3

it is a little lame and the links may not work etc

but as I said just a demo

gyrbo
June 25th, 2001, 10:15
Looks good.