AppleTech News

Programming languages: Regular expressions reorganized in Swift

In Swift 5.7, Apple introduces a domain-specific language to handle regular expressions properly.

Regular expressions are a powerful tool when dealing with strings. But the previous implementation in Apple's programming language Swift is a legacy of Objective-C, with all its disadvantages: regular expressions are cryptic, hard to read and therefore a frequent source of errors. Because they are analyzed only at runtime rather than during compilation, the return values are only weakly typed.

One of the Swift paradigms is that the source code should be understandable and that errors should be detected at compile time rather than at runtime. The strong, static typing contributes to the latter.

In 2019, Swift gained the ability to create domain-specific languages (DSLs) that make it easy to assemble complex structures in a human-readable way. A first application of this was the composition of views in SwiftUI. With version 5.7, which Apple presented at WWDC 2022 and which is scheduled for release in autumn, Swift gains RegexBuilder, a DSL for regular expressions that is intuitively understandable. It is joined by the new generic types Regex<Output> and Regex<Output>.Match. With the regex literal, a new literal is also available that lets the compiler analyze classic regular expressions during compilation. The Output is strictly typed because it is known at compile time.
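To give a first impression of the DSL, here is a minimal sketch, assuming Swift 5.7 or later with the RegexBuilder module available. The pattern, the constant names and the sample string are invented for illustration; the builder describes a key, a colon and a number, which a classic regex would write as (\w+): (\d+).

```swift
import RegexBuilder

// Hypothetical example: match "<key>: <digits>" and capture both parts.
let keyValue = Regex {
    Capture { OneOrMore(.word) }   // the key, e.g. "width"
    ": "
    Capture { OneOrMore(.digit) }  // the value, e.g. "640"
}

let sample = "width: 640"
let keyValueMatch = sample.firstMatch(of: keyValue)

if let match = keyValueMatch {
    // Output is (Substring, Substring, Substring): whole match, key, value.
    print(match.output.1) // width
    print(match.output.2) // 640
}
```

Each Capture adds one element to the Output tuple, so the compiler knows how many results there are and what type they have.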

The icing on the cake is the inclusion of the formatters from the Foundation framework. The formatters parse types like Date and Double in localized formats, and now they can do so as part of a regular expression as well. This makes it easy, for example, to recognize a floating-point number or a date via a regular expression, even if the string is localized in Indonesian. The Match then delivers the finished Double or Date as its Output.

Most of the new features are part of Swift's standard library and are therefore available on macOS, Linux and Windows. Swift 5.7 is in beta until fall 2022. The current beta version of Xcode 14, which can be found in Apple's developer portal, is suitable for trying them out.

The new generic struct for regular expressions is Regex<Output>. Most of the time, a regular expression is known at compile time. In that case, Output is usually a tuple. Normally you would expect the elements of the tuple to be of type Substring or, for alternatives, the optional Substring?. However, a Regex can also recognize floating-point numbers and dates and returns Double or Date for them.
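The two cases can be made concrete with a small sketch. It assumes Swift 5.7 or later; the patterns, names and sample strings are invented for illustration. The literal uses the #/…/# form, which compiles without extra flags (bare /…/ literals have to be enabled in the Swift 5 language mode). The Double conversion uses a plain TryCapture transform as a stand-in and not the Foundation formatters.

```swift
import RegexBuilder

// Alternation: only one branch can match, so both captures are optional.
let wordOrNumber = #/([a-z]+)|([0-9]+)/#
let alternatives: (Substring, Substring?, Substring?)? =
    "42".firstMatch(of: wordOrNumber)?.output

// Typed capture: the transform turns the matched text into a Double.
let price = Regex {
    "EUR "
    TryCapture {
        OneOrMore(.digit)
        "."
        Repeat(.digit, count: 2)
    } transform: { Double($0) }
}
let priced: (Substring, Double)? =
    "Total: EUR 12.50".firstMatch(of: price)?.output

if let output = alternatives {
    print(output.1 as Any, output.2 as Any) // nil Optional("42")
}
if let output = priced {
    print(output.1) // 12.5
}
```

The explicit type annotations are only there to make the Output types visible; the code compiles without them as well.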

For a regular expression created at runtime, for example when processing a string that contains a classic regex, Output is of type AnyRegexOutput. The latter conforms to Collection and holds the recognized results as elements that can be accessed by index.

If you apply a Regex to a String, you get a Regex<Output>.Match struct on success or nil on failure. The output property of Match holds the actual result, which is of type Output.

A Regex can be initialized from a classic regular expression in the form of a string:

let regex = try Regex("b(.*)d")
// Matches "b", zero or more arbitrary characters,
// which are captured, and a "d".
if let match = "abcde".firstMatch(of: regex) {
  print(match.output.count) // 2
  print(match.output[0].substring) // Optional("bcd")
  print(match.output[1].substring) // Optional("c")
}

The code parses the regular expression only at runtime. The actual Output is therefore not known at compile time and gets the type AnyRegexOutput. The latter is indexed and contains several elements. The first stands for the entire text that the Regex matched. The following elements are the captures, that is, the parts explicitly marked to be returned. In the code above, this is the parenthesized part "(.*)". The substring property of an element supplies the matched text as an optional Substring?.
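Because AnyRegexOutput is a Collection, its elements can also be walked in a loop. The following is a minimal sketch under that assumption; the pattern, the names yearMonth and parts, and the sample string are made up for illustration.

```swift
// Hypothetical pattern, parsed at runtime: a year and a month.
let yearMonth = try Regex("([0-9]{4})-([0-9]{2})")

var parts: [String] = []
if let match = "released 2022-09".firstMatch(of: yearMonth) {
    // Element 0 is the whole match, the following elements are
    // the captures in the order of their opening parentheses.
    for element in match.output {
        parts.append(element.substring.map { String($0) } ?? "<none>")
    }
}
print(parts) // ["2022-09", "2022", "09"]
```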

Because init throws an error if the classic regular expression is malformed, the call in the code example must be marked with try.
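How such an error might be handled can be sketched as follows. The helper makeRegex is hypothetical and not part of the standard library; it only wraps the throwing initializer in do/catch.

```swift
// Hypothetical helper: returns nil instead of throwing for a bad pattern.
func makeRegex(_ pattern: String) -> Regex<AnyRegexOutput>? {
    do {
        return try Regex(pattern)
    } catch {
        print("Invalid pattern \(pattern): \(error)")
        return nil
    }
}

let valid = makeRegex("b(.*)d")
let broken = makeRegex("b(.*d") // the closing parenthesis is missing
print(valid != nil, broken != nil) // true false
```

With a regex literal this kind of mistake would already be reported by the compiler.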

A string literal interprets backslashes as escape characters. It is therefore worth pointing to extended string delimiters, which allow a string literal to contain special characters that are taken literally and not evaluated.
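A short sketch of the difference, with an invented pattern and invented names: the same regex is written once with doubled backslashes and once between extended delimiters (#"…"#).

```swift
// Both constants hold the same pattern: digits, a dot, more digits.
let escapedPattern = "\\d+\\.\\d+"  // regular literal: every backslash doubled
let rawPattern = #"\d+\.\d+"#       // extended delimiters: backslashes stay as written

print(escapedPattern == rawPattern) // true

let version = try Regex(rawPattern)
let found = "Swift 5.7 beta".firstMatch(of: version)?.output[0].substring
print(found ?? "no match") // 5.7
```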

It is desirable to evaluate the regular expression during compilation, because then the types of the captures are also known and Output can be a concrete tuple. Swift 5.7 introduces the regex literal for this, which starts and ends with a slash instead of a double quote.

The example above looks like this with a regex literal:

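let regex = /b(.*)d/
// The pattern is checked at compile time; Output is (Substring, Substring).
if let match = "abcde".firstMatch(of: regex) {
  print(match.output.0) // bcd (the whole match)
  print(match.output.1) // c (the capture)
}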

The Output of regex is of type (Substring, Substring). It is noteworthy that the Substring elements are not optional for this regular expression, since every capture is always matched when applying the Regex succeeds. On failure, the call returns nil instead of a Match.
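The split between an optional Match and a non-optional tuple can be sketched like this. The helper capture(in:) is hypothetical, and the literal is written in the #/…/# form so that it compiles without enabling bare-slash literals.

```swift
let regex = #/b(.*)d/#  // same pattern as above

// Hypothetical helper: only the match as a whole is optional,
// the capture inside it is a plain Substring.
func capture(in text: String) -> Substring? {
    guard let match = text.firstMatch(of: regex) else { return nil }
    let (_, inner): (Substring, Substring) = match.output
    return inner
}

print(capture(in: "abcde") ?? "no match") // c
print(capture(in: "xyz") ?? "no match")   // no match
```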

For both classic regular expressions and regex literals, Match mirrors its Output. The term .output can therefore usually be omitted: match.0 instead of match.output.0, or match[0].substring instead of match.output[0].substring.
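For the typed case, the shorthand can be checked with a few lines. This is a sketch with invented variable names that covers only a regex literal, again in the #/…/# form.

```swift
let regex = #/b(.*)d/#

var viaOutput: Substring?
var viaShorthand: Substring?
if let match = "abcde".firstMatch(of: regex) {
    viaOutput = match.output.1  // explicit access through output
    viaShorthand = match.1      // forwarded to output by Match
}
print(viaOutput == viaShorthand) // true
```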
