I have the following string:

blah blah yo<desc>some text with description - unwanted 
text</desc>um hey now some words yah<desc>some other description text 
stuff - more unwanted here</desc>random word and ; things. Now a hyphen 
outside of desc tag - with other text<desc>yet another description - unwanted
</desc>and that's about it.

(Note: In reality there are no newline/carriage returns in the string. I only added them here for readability.)

Within each desc tag, I want to select the text from the hyphen onward, including the space before the hyphen and the closing desc tag. That part was simple, as I just did this:

\s-.*?<\/desc>

Now, the problem is that the hyphen outside the desc tag is getting selected too. So all my selections are as follows:

- unwanted text</desc>
- more unwanted here</desc>
- with other text<desc>yet another description - unwanted</desc>

So the first two are perfect but see how that last line is messed up because of the - outside the desc tag?

Just FYI, if interested, in my code I am doing a replace like this:

$text = preg_replace('/\s-.*?<\/desc>/', '</desc>', $text);

I tried doing some Lookbehind stuff but could not get it to work.

Any ideas?

Thanks! Mark

Accepted Answer

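You could try [^-<>]* instead of .*?, which makes the full pattern \s-[^-<>]*<\/desc>. The negated character class cannot match a hyphen or an angle bracket, so a match that starts at the hyphen outside the desc tag fails as soon as it reaches the opening <desc>. In effect, the hyphen and the angle brackets act as delimiters that the match cannot cross.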
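As a rough sketch of how that suggestion might be applied to the replace from the question: the sample string below is the one from the question joined into a single line, assuming the last description ends with a closing </desc> as the listed matches imply. The variable names $pattern, $matches and $cleaned are only illustrative.

```php
<?php
// Sample input from the question, joined into one line
// (the real string contains no line breaks).
$text = 'blah blah yo<desc>some text with description - unwanted text</desc>'
      . 'um hey now some words yah<desc>some other description text stuff - more unwanted here</desc>'
      . 'random word and ; things. Now a hyphen outside of desc tag - with other text'
      . '<desc>yet another description - unwanted</desc>and that\'s about it.';

// [^-<>]* cannot cross a hyphen or an angle bracket, so an attempt that
// starts at the hyphen outside the tag fails when it reaches "<desc>".
$pattern = '/\s-[^-<>]*<\/desc>/';

// Collect the matches to check what would be selected.
preg_match_all($pattern, $text, $matches);

// Same replacement as in the question, with the restricted pattern.
$cleaned = preg_replace($pattern, '</desc>', $text);

print_r($matches[0]);
echo $cleaned, PHP_EOL;
```

With this input, only the three in-tag spans should match, and the " - with other text" part outside the tag is left alone. One caveat: because the class also excludes the hyphen itself, unwanted text that contains a second hyphen (for example " - foo-bar</desc>") would not match at all. If that case can occur, [^<]* in place of [^-<>]* might be worth trying, at the cost of matching from the first " -" inside the tag.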

Written by mario
The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki