I have been searching for hours for a solution (in PHP) to this:
I have some XML files whose structure may vary:
<page id="this is what I want to extract">
<boh>
<bah>
<other childs (maybe one, maybe ten)>
<ref id="This is all I know!"> Some text Lorem Ipsum</ref>
I need two expressions that can extract the page id by searching the entire file for either a specific ref id or some partial text inside the ref tag.
In brief all I know about this file is: It has a ref tag, which sometimes has an id and always some text inside. I either have the ref id or some portions of the text. I need to find the id of the page node in which ref is contained.
So: Search for "This is all I know!" as ref id would output "this is what I want to extract"
as well as
Search for "Lorem" as text inside ref would output "this is what I want to extract"
How can I accomplish this? I've googled a lot, and I think I should do something with SimpleXML and XPath, but I have never used them this way.
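You can use this XPath expression in your code:
//page[.//ref[contains(., 'Lorem')]]/@id
It checks every <ref> element that is a descendant of a <page> and tests whether its text contains the string 'Lorem' (which, in your code, you should pass as a variable). It returns a set containing the ids of all <page> elements that contain a matching <ref>. The test is placed inside a predicate on <ref> because in XPath 1.0, which is what PHP supports, contains(.//ref/text(), 'Lorem') only looks at the first text node of the node set, so a page with several <ref> children could be missed.
To search by ref id instead of text, use:
//page[.//ref[@id='This is all I know!']]/@id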
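As a minimal sketch of how this kind of lookup could be wired up with SimpleXML: the function names `xpathLiteral` and `findPageIds`, the `<pages>` root element, and the second sample page are all made up for illustration and are not part of your files. Since SimpleXML has no way to bind variables in an XPath expression, the sketch quotes the search string itself before inserting it.

```php
<?php
// Hypothetical helper: turn an arbitrary string into an XPath 1.0 string literal.
// XPath 1.0 has no escape character, so a value containing both quote types
// has to be assembled with concat().
function xpathLiteral(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    $parts = explode("'", $value);
    return "concat('" . implode("', \"'\", '", $parts) . "')";
}

// Hypothetical helper: return the ids of all <page> elements that contain a
// matching <ref>, either by exact ref id or by partial text inside the ref.
function findPageIds(SimpleXMLElement $xml, string $needle, bool $byRefId): array
{
    $literal = xpathLiteral($needle);
    $refTest = $byRefId ? "@id = $literal" : "contains(., $literal)";
    $matches = $xml->xpath("//page[.//ref[$refTest]]/@id");

    $ids = [];
    foreach ($matches ?: [] as $attr) {
        $ids[] = (string) $attr; // cast the attribute node to its string value
    }
    return $ids;
}

// Sample document; the <pages> wrapper and the second page are assumptions.
$source = <<<XML
<pages>
  <page id="this is what I want to extract">
    <boh><bah>
      <ref>Unrelated text</ref>
      <ref id="This is all I know!"> Some text Lorem Ipsum</ref>
    </bah></boh>
  </page>
  <page id="another page"><ref id="x">Nothing here</ref></page>
</pages>
XML;

$xml = simplexml_load_string($source);

print_r(findPageIds($xml, 'Lorem', false));               // search by partial text
print_r(findPageIds($xml, 'This is all I know!', true));  // search by ref id
```

For a real file you would load it with `simplexml_load_file()` instead. Note that `contains()` is case-sensitive, so searching for 'lorem' would not match the sample above.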