5.09.2007
RegEx Pattern: No Whitespace in String Literal
A colleague of mine, C. Bess, recently asked me for help creating a RegEx pattern that matches whitespace in a string but skips any whitespace inside quotes. Take the following string, for example:
The quick "red fox" jumped over "the lazy" brown dog.
He needed to match the whitespace between each of the words, except the whitespace between "red" and "fox", and "the" and "lazy". Splitting based on the matches would return:
[0] => The
[1] => quick
[2] => "red fox"
[3] => jumped
[4] => over
[5] => "the lazy"
[6] => brown
[7] => dog.
The beast of a pattern I created combines positive and negative lookahead and lookbehind assertions. So, without any further ado:
\s(?![^"]*\")|\s(?<!\"[^"]*)|\s(?=(?:[^"]*\"[^"]*\"[^"]*)+[^"]*$)|
\s(?<=(?:^[^"]*"[^"]*\"[^"]*)+[^"]*)
Note - you'll need to remove the line break from the middle of the pattern; I had to put one in for formatting purposes.
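For readers who want to try the idea outside a regex tester, here is a minimal Python sketch. It is not the full pattern above: the lookbehind alternatives are variable-length, which Python's `re` module rejects, so this version keeps only a single lookahead that accepts whitespace followed by an even number of double quotes (possibly zero) up to the end of the string. The names `OUTSIDE_QUOTES_WS` and `split_outside_quotes` are made up for this illustration, and the sketch assumes the quotes are balanced and never escaped.

```python
import re

# Simplified, lookahead-only variant (an assumption, not the original pattern):
# a whitespace character counts as "outside quotes" when the rest of the
# string contains an even number of double quotes, possibly zero.
OUTSIDE_QUOTES_WS = re.compile(r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_outside_quotes(text):
    """Split text on whitespace that is not inside a double-quoted section."""
    return OUTSIDE_QUOTES_WS.split(text)


if __name__ == "__main__":
    sentence = 'The quick "red fox" jumped over "the lazy" brown dog.'
    for index, part in enumerate(split_outside_quotes(sentence)):
        print(f"[{index}] => {part}")
```

On the example sentence this should reproduce the eight-element list shown earlier, with "red fox" and "the lazy" kept whole. An unbalanced quote would shift the even/odd count for everything before it, so input with stray quotes needs a different approach.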
Labels: literal, pattern, regex, regex pattern, regular expression, string, whitespace