Embedded regex flags

The hardest part of using regular expressions is not crafting regular expressions per se. In my opinion, the two hardest parts are minor syntax variations between implementations, and all the environmental stuff outside of regular expressions per se.

Embedded regular expression modifiers address one of the environmental complications by putting the modifier in the regular expression itself.

For example, if you want to make a grep search case-insensitive, you pass it the -i flag. But if you want to make a regex case-insensitive inside a Python program, you pass a function the argument re.IGNORECASE. But if you put (?i) at the beginning of your regular expression, then the intention to make the match case-insensitive is embedded directly into the regex. You could use the regex in any environment that supports (?i) without having to know how to specify modifiers in that environment.

I was debugging a Python script this morning that worked under one version of Python and not under another. The root of the problem was that it was using re.findall() with several huge regular expression that had embedded modifiers. That was OK up to Python 3.5, then it was a warning between versions 3.6 and 3.10, and it’s an error in versions 3.11 and later.

The problem isn’t with all embedded modifiers, only global modifiers that don’t appear at the beginning of the regex. Older versions of Python, following Perl’s lead, would let you put a modifier like (?i) in the middle of a regex, and apply the modifier from that point to the end of the expression. In the latest versions of Python, you can either place the modifier at the beginning of the regex, or use a scoped modifier like (?:…) in the middle of the expression.

I didn’t want to edit the regular expressions in my code—some had over a thousand characters—so I changed re.findall() to regex.findall(). The third-party regex module is generally more Perl-compatible than Python’s standard re module.

The post Embedded regex flags first appeared on John D. Cook.

Liked Liked