In Swift 5.7, Apple introduces a domain-specific language to handle regular expressions properly.
Regular expressions are a powerful tool when dealing with strings. But the previous implementation in Apple’s programming language Swift is a legacy of Objective-C with all its disadvantages: the expressions are cryptic, difficult to understand and therefore a frequent source of errors. Since they are analyzed only at runtime rather than during compilation, the return values cannot be typed precisely.
One of the Swift paradigms is that the source code should be understandable and that errors should be detected at compile time rather than at runtime. The strong, static typing contributes to the latter.
In 2019, Swift gained the ability to create domain-specific languages (DSLs) that make it easy to assemble complex structures in a human-readable way. A first application of this was the composition of Views in SwiftUI. With version 5.7, which Apple presented at WWDC 2022 and which is scheduled for release in autumn, Swift gains RegexBuilder, a DSL for regular expressions that is intuitively understandable. It is joined by the new generic types Regex<Output> and Regex<Output>.Match. With the regex literal, a new literal is also available that lets the compiler analyze classic regular expressions during compilation. The Output is then strictly typed because it is known at compile time.
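To give a first impression of the DSL, here is a minimal sketch of a small pattern written with RegexBuilder. The pattern and the names builderRegex and builderCapture are chosen purely for illustration.

```swift
import RegexBuilder

// Matches "b", captures zero or more arbitrary characters, then matches "d".
let builderRegex = Regex {
    "b"
    Capture {
        ZeroOrMore(.any)
    }
    "d"
}

// The output is a typed tuple: element 0 is the whole match, element 1 the capture.
let builderCapture = "abcde".firstMatch(of: builderRegex)?.output.1
print(builderCapture ?? "no match") // c
```

Each component sits on its own line, so the structure of the pattern stays readable without having to decode regex metacharacters.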
The icing on the cake is the integration of the formatters from the Foundation framework. The formatters parse types like Date and Double in localized formats, and they can now also do so as part of a regular expression. This makes it easy, for example, to recognize a floating point number or a date via a regular expression, even if the string is localized in Indonesian. The Match then delivers the finished Double or Date as its Output.
Most of the new features are part of Swift’s standard library and are therefore available on macOS, Linux and Windows. Swift 5.7 is in beta until fall 2022. The current beta version of Xcode 14, which can be found in Apple’s developer portal, is suitable for trying them out.
Regex, Output and Match
The new generic struct for regular expressions is Regex<Output>. Most of the time, a regular expression is known at compile time. If this is the case, Output is typically a tuple. Normally, one would expect the elements of the tuple to be of type Substring or, for alternatives, the optional Substring?. But a Regex can also recognize floating point numbers and dates, and it returns Double or Date for them.
For a regular expression created at runtime, for example from a string containing a classic regex, Output is of type AnyRegexOutput. The latter conforms to Collection and contains the recognized results as elements that are accessible by index.
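Because AnyRegexOutput is a Collection, its elements can be iterated or mapped like those of any other collection. The following is a minimal sketch; the helper captures(of:in:) and the sample pattern are hypothetical and only serve the illustration.

```swift
// Hypothetical helper: collects the whole match and all captures as strings.
func captures(of pattern: String, in text: String) -> [String] {
    // Regex(_:) parses the pattern at runtime, so Output is AnyRegexOutput.
    guard let regex = try? Regex(pattern),
          let match = text.firstMatch(of: regex) else { return [] }
    // Element 0 is the whole match, the following elements are the captures.
    return match.output.map { String($0.substring ?? "") }
}

let parts = captures(of: "([a-z]+)@([a-z]+)", in: "mail to user@example now")
print(parts) // ["user@example", "user", "example"]
```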
If you apply a Regex to a String, you get a struct of type Regex<Output>.Match on success or nil on failure. The output property of the Match contains the actual result of the match, which is of type Output.
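The difference between a successful Match and nil can be sketched as follows. The example assumes an extended regex literal (#/…/#), since the bare /…/ form may need to be enabled explicitly depending on the toolchain settings; the pattern and the name pageRange are made up for illustration.

```swift
// Extended regex literal; its Output is (Substring, Substring, Substring).
let pageRange = #/(\d+)-(\d+)/#

// firstMatch(of:) returns an optional Regex<Output>.Match.
if let match = "pages 12-34".firstMatch(of: pageRange) {
    let (whole, lower, upper) = match.output
    print(whole, lower, upper) // 12-34 12 34
}

// wholeMatch(of:) requires the entire string to match, so it returns nil here.
let entire = "pages 12-34".wholeMatch(of: pageRange)
print(entire == nil) // true
```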
Classic regular expressions
A classic regular expression in the form of a string can be used to initialize a Regex:
let regex = try Regex("b(.*)d")
// Matches "b", zero or more arbitrary characters
// (which are captured), and a "d".
if let match = "abcde".firstMatch(of: regex) {
    print(match.output.count) // 2
    print(match.output[0].substring) // Optional("bcd")
    print(match.output[1].substring) // Optional("c")
}
The code parses the regular expression only at runtime. The actual Output is therefore not known at compile time and gets the type AnyRegexOutput. The latter is indexed and contains several elements. The first stands for the entire text that the Regex matched. The following elements are the parts explicitly marked for capture. In the code above, this is the parenthesized part "(.*)". The substring property of an element supplies the recognized text as an optional Substring?.
Since the init in the code example throws an error if the classic regular expression is invalid, the call must be marked with try.
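One way to deal with the thrown error is a do/catch block. The following sketch assumes a hypothetical helper named compilePattern that reports an invalid pattern and returns nil instead of throwing.

```swift
// Hypothetical helper: returns nil instead of throwing when the pattern is invalid.
func compilePattern(_ pattern: String) -> Regex<AnyRegexOutput>? {
    do {
        return try Regex(pattern)
    } catch {
        print("Invalid pattern \(pattern): \(error)")
        return nil
    }
}

print(compilePattern("b(.*)d") != nil) // true
print(compilePattern("b(.*d") != nil)  // false, the group is never closed
```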
A string literal interprets backslashes as escape characters. It is therefore worth pointing to extended string delimiters, which mark a string whose special characters are accepted as written and not evaluated.
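A short sketch shows the difference: with a regular string literal every backslash of the pattern has to be doubled, while a string in extended delimiters (#"…"#) takes it literally. The pattern and the sample text are chosen for illustration only.

```swift
// Regular string literal: each backslash must be escaped.
let escapedPattern = "\\d+\\.\\d+"
// Extended string delimiters: backslashes are taken as written.
let rawPattern = #"\d+\.\d+"#
print(escapedPattern == rawPattern) // true

if let regex = try? Regex(rawPattern),
   let match = "Swift 5.7 beta".firstMatch(of: regex) {
    print(match.output[0].substring ?? "") // 5.7
}
```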
Recognized at compile time
It is desirable to evaluate the regular expression during compilation, because then the types of the captures are also known and Output can be a concrete tuple. For this purpose, Swift 5.7 introduces the regex literal, which starts and ends with a slash instead of a double quote.
With a regex literal, the example above looks like this:
let regex = /b(.*)d/
if let match = "abcde".firstMatch(of: regex) {
    print(match.output.0) // "bcd"
    print(match.output.1) // "c"
}
The Output of regex is of type (Substring, Substring). It is noteworthy that the Substring is not optional for this regular expression, since every capture is always matched when applying the Regex succeeds. In case of failure, the call returns nil instead of a Match.
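The contrast between required and optional captures can be made visible with explicit type annotations, which the compiler checks against the literal. This is a sketch using extended regex literals (#/…/#), since the bare /…/ form may need to be enabled explicitly depending on the toolchain settings; the names required, alternative and missing are illustrative.

```swift
// Every capture takes part in a successful match: non-optional Substring.
let required: Regex<(Substring, Substring)> = #/b(.*)d/#
// Only one branch of the alternation matches: both captures are optional.
let alternative: Regex<(Substring, Substring?, Substring?)> = #/(a)|(b)/#

if let match = "xbx".firstMatch(of: alternative) {
    print(match.output.1 == nil) // true
    print(match.output.2 ?? "")  // b
}

// No match at all: the result is nil rather than a Match.
let missing = "xyz".firstMatch(of: required)
print(missing == nil) // true
```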
For both classic regular expressions and regex literals, the Match mirrors its Output. The expression .output can therefore usually be omitted: match.0 instead of match.output.0, or match[0].substring instead of match.output[0].substring.
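For the typed case, the shorthand can be sketched as follows, again with an extended regex literal (#/…/#) to stay independent of toolchain settings. The names shorthandRegex and shorthandMatch are chosen for illustration.

```swift
let shorthandRegex = #/b(.*)d/#
let shorthandMatch = "abcde".firstMatch(of: shorthandRegex)

if let match = shorthandMatch {
    // Both forms access the same tuple elements of the Output.
    print(match.0 == match.output.0) // true
    print(match.1)                   // c
}
```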