I'm looking for a regular expression that can find a word or sub-pattern that is NOT between HTML tags.

Say I want to replace (alpha|beta) in: the first two letters in the Greek alphabet are alpha and <b>beta</b>

I only want it to replace alpha, because beta is between tags. So it should ignore anything matching (<(.*?)>(.*?)<\/(.*?)>)
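
To make the "ignore the tagged part" idea concrete, here is a minimal sketch in Python (an assumption, since the question names no language). It is a workaround built on alternation, not a true "not between tags" pattern: the first alternative consumes a whole <tag>...</tag> pair and the second captures the word, so only the second one gets replaced. The names PATTERN and replace_outside_tags are hypothetical, and the sketch assumes one line of simple markup with no nested tags of the same name.

```python
import re

# Hypothetical pattern for illustration only.
# Alternative 1: a complete <tag ...>...</tag> pair on one line (consumed, kept as-is).
# Alternative 2: the target word, captured in group 2.
PATTERN = re.compile(r"<(\w+)\b[^>]*>.*?</\1>|\b(alpha|beta)\b")


def replace_outside_tags(line, replacement):
    """Replace alpha/beta only where they are not inside a tag pair."""

    def swap(match):
        # Group 2 is set only when the word alternative matched.
        if match.group(2) is not None:
            return replacement
        # Otherwise a whole tagged span matched: return it unchanged.
        return match.group(0)

    return PATTERN.sub(swap, line)


line = "the first two letters in the greek alphabet are alpha and <b>beta</b>"
print(replace_outside_tags(line, "X"))
```

Because the tagged span is matched first and handed back untouched, the word inside it is never seen by the second alternative. This breaks down on nested or malformed markup.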



Consider using the code{} button when writing your question

Written by Calum

Sorry, just joined this site. Will use it in the future. :)

Written by Dark Slipstream

It's OK :) It's just that if you try to use tags, they might not show up without the code wrapper.

Written by Calum

It looks to me like everything is between tags in HTML.

Written by sln

@sln, I mean on one line. Limited between \r\n at the beginning and end.

Written by Dark Slipstream

Is the matching word always going to be immediately surrounded by tags (e.g., <b>beta</b>)? What if it's within a tag but with other stuff (e.g., <b>prescribed beta blockers</b>)? (BTW, I just did a whole bunch of work to match all text not in tags until I reread and understood your question. I'll share it later, since it's too good to waste now. :-)

Written by Wiseguy

Accepted Answer

I didn't test the logic used on this page: http://www.phpro.org/examples/Get-Text-Between-Tags.html But I can confirm the point made at the top of that page in big bold letters: you shouldn't do what you're trying to do with regex.

HTML is not uniform, and edge cases will always bite you in the rear if you use regular expressions to handle the content of those tags in any real-world situation. So unless your markup is extremely simplistic, uniform, 100% accurate, and contains only HTML (no CSS, JavaScript or garbage), your best bet is a DOM parser library.
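
As a rough illustration of the parser route, here is a minimal sketch using Python's standard-library html.parser (the language and the names OutsideTagReplacer and replace_outside_elements are assumptions, not something from the question). It rebuilds the markup as it parses and applies the replacement only to text that is not inside any open element. It is not a full DOM library: doctype declarations are dropped, end tags are re-emitted in lowercase, and only a small hand-picked set of void elements is recognized.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial, hand-picked list).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class OutsideTagReplacer(HTMLParser):
    """Rebuilds the markup, replacing a pattern only in text at depth 0."""

    def __init__(self, pattern, replacement):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.pattern = re.compile(pattern)
        self.replacement = replacement
        self.depth = 0  # number of currently open elements
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: does not change the depth.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.depth = max(0, self.depth - 1)
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            data = self.pattern.sub(self.replacement, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def replace_outside_elements(html, pattern, replacement):
    parser = OutsideTagReplacer(pattern, replacement)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


line = "the first two letters in the greek alphabet are alpha and <b>beta</b>"
print(replace_outside_elements(line, r"\b(alpha|beta)\b", "X"))
```

Tracking depth rather than matching tag pairs means text mixed with other content inside an element (e.g., <b>prescribed beta blockers</b>) is also left alone.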

And really, many DOM parser libraries have problems too, but you'll be miles ahead of the regex counterparts. The best way to get the text content of tags is to render the HTML in a browser and access the innerText property of the given DOM node (or have a human copy and paste the contents out manually), but that isn't always an option :D

Written by Marcus Pope
The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki