I looked through related questions before posting this and I couldn't modify any relevant answers to work with my method (not good at regex).

Basically, here are my existing lines:

$code = preg_replace_callback( '/"(.*?)"/', array( &$this, '_getPHPString' ), $code );

$code = preg_replace_callback( "#'(.*?)'#", array( &$this, '_getPHPString' ), $code );

They match strings contained between "" and '' respectively. I need each regex to skip over escaped quotes inside the string it matches. So data between '' should skip over \' and data between "" should skip over \".

Any help would be greatly appreciated.

Comments

Do you need to be able to handle escaped slashes as well? In other words should it assume that any quote preceded by a slash is escaped, even if that slash is itself preceded by a slash?

Written by Phoenix

@Phoenix, if you are referring to \\" and \\', then no I do not.

Written by Dark Slipstream

If you don't handle escaping the escape character, then escaping a particular character is invalid.

Written by sln

Accepted Answer

For most strings, you need to allow escaped anything (not just escaped quotes). For example, you most likely need to allow escaped characters like "\n" and "\t" and, of course, the escaped escape: "\\".
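To illustrate why the escaped escape matters, here is a minimal sketch (not part of the original answer). It compares a hypothetical quotes-only pattern, which treats only \" as special, against a pattern that allows any escaped character. The variable names and the sample input are assumptions made up for the demonstration.

```php
<?php
// Sample source text: two string literals, the first ending in an
// escaped backslash. A nowdoc keeps the backslashes exactly as typed.
$code = <<<'PHP'
"C:\\" . "x"
PHP;

// Hypothetical quotes-only pattern: knows about \" but not about \\.
// (Inside a PHP string literal each regex backslash is doubled.)
$quotesOnly = '/"(?:\\\\"|[^"])*"/s';

// Pattern that allows a backslash followed by ANY character.
$anyEscape = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';

preg_match_all($quotesOnly, $code, $m);
$naiveMatches = $m[0];   // one match that runs past the first literal

preg_match_all($anyEscape, $code, $m);
$robustMatches = $m[0];  // two matches, one per literal

print_r($naiveMatches);
print_r($robustMatches);
```

The quotes-only pattern reads the second backslash plus the closing quote as an escaped quote, so it swallows the text up to the next quote. Treating the backslash pair as one escaped character avoids that.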

This is a frequently asked question, and one which was solved (and optimized) long ago. Jeffrey Friedl covers this question in depth (as an example) in his classic work: Mastering Regular Expressions (3rd Edition). Here is the regex you are looking for:

Good:

"([^"\\]|\\.)*"
Version 1: Works correctly but is not terribly efficient.

Better:

"([^"\\]++|\\.)*" or "((?>[^"\\]+)|\\.)*"
Version 2: More efficient if you have possessive quantifiers or atomic groups (See: sln's correct answer which uses the atomic group method).

Best:

"[^"\\]*(?:\\.[^"\\]*)*"
Version 3: More efficient still. Implements Friedl's "unrolling-the-loop" technique. Does not require possessive quantifiers or atomic groups (i.e. this can be used in JavaScript and other less-featured regex engines).
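As a quick sanity check, the following sketch runs all of the variants above against one sample input to show that they find the same matches. It is only an illustration: the array keys, the sample text, and the added /s modifier (so that an escaped newline is also accepted) are assumptions, and it says nothing about relative speed.

```php
<?php
// Sample text with escaped quotes and an escaped "t" inside the literals.
$subject = <<<'TXT'
say "a \"quoted\" word" and "tab\there"
TXT;

// The variants, written as PHP single-quoted strings
// (each regex backslash is doubled once more for PHP).
$patterns = [
    'v1 alternation' => '/"([^"\\\\]|\\\\.)*"/s',
    'v2 possessive'  => '/"([^"\\\\]++|\\\\.)*"/s',
    'v2 atomic'      => '/"((?>[^"\\\\]+)|\\\\.)*"/s',
    'v3 unrolled'    => '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s',
];

$results = [];
foreach ($patterns as $name => $pattern) {
    preg_match_all($pattern, $subject, $m);
    $results[$name] = $m[0];  // full matches only
}

print_r($results);
```

Each entry in $results should hold the same two quoted sub-strings, which is the point: the variants differ in how much backtracking they do, not in what they match on well-formed input.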

Here are the recommended regexes in PHP syntax for both double and single quoted sub-strings:

$re_dq = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
$re_sq = "/'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'/s";
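A minimal sketch of how patterns like these might be plugged into the two preg_replace_callback calls from the question. This is an illustration, not the asker's actual code: the closure $stash is a hypothetical stand-in for the _getPHPString method, and the placeholder format and variable names are made up.

```php
<?php
// Unrolled-loop patterns for double- and single-quoted sub-strings.
$doubleQuoted = '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"/s';
$singleQuoted = "/'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'/s";

// Hypothetical callback: remembers each matched literal and
// substitutes a numbered placeholder for it.
$found = [];
$stash = function (array $m) use (&$found) {
    $found[] = $m[0];
    return '<STR' . (count($found) - 1) . '>';
};

// Sample input, kept verbatim with a nowdoc.
$code = <<<'PHP'
echo "He said \"hi\"" . 'it\'s';
PHP;

// Same two passes as in the question, with the new patterns.
$result = preg_replace_callback($doubleQuoted, $stash, $code);
$result = preg_replace_callback($singleQuoted, $stash, $result);

echo $result, "\n";
print_r($found);
```

Because the two passes run one after the other, a double quote inside a single-quoted string (or the reverse) can still confuse them. Combining both patterns into one alternation and running a single pass is one possible way around that.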
Written by ridgerunner
The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki