02 August 2024 | Challenge 280 |
Regular Pairs
Task 1: Twice Appearance
Submitted by: Mohammad Sajid Anwar
You are given a string, $str
, containing lowercase English letters only.
Write a script to print the first letter that appears twice.
Example 1
Input: $str = "acbddbca"
Output: "d"
Example 2
Input: $str = "abccd"
Output: "c"
Example 3
Input: $str = "abcdabbb"
Output: "a"
Solution
Trying to solve this task using a single regular expression, ignoring the restriction to lowercase Latin letters.
First insight: We cannot use a pure look ahead assertion to identify the requested character pair, as this would match the first character that appears twice. Though this is exactly the wording in the task description, the examples suggest something slightly different:
Find the first character that is preceded by itself in the string.
Therefore we need some kind of look behind assertion.
If there were true variable length look behind assertions, we could use such an illegal regex:
/(.)(?<=\1.+)/
and we were done.
But even the length-limited variable look behind assertions available in perl v5.30 cannot be used as the regex engine does not recognize that \1
has a fixed length of one:
/(.)(?<=\1.{1,254})/
results in the same error message Lookbehind longer than 255 not implemented
.
There is another approach mentioned in perlre that can be used for arbitrary look behind assertions. It is explained in detail in “Variable-Length Lookbehinds: actually possible in Perl/PCRE!”. It’s tricky. Basically it is made of a recursive, fixed length look behind assertion stepping backwards that contains a variable length look ahead assertion instead.
Adopting the approach for this task:
use strict;
use warnings;
sub twice_appearance {
shift =~ /
(.) # a single char
(?=(.*)) # capture rest of string
(
\1.+\2 # stop if current char precedes itself
| # or
(?<= # look behind:
(?= # look ahead:
(?3) # recursion: stop or look further behind
)
. # one char
)
)
/x;
$1;
}
See the full solution to task 1.
Task 2: Count Asterisks
Submitted by: Mohammad Sajid Anwar
You are given a string, $str
, where every two consecutive vertical bars are grouped into a pair.
Write a script to return the number of asterisks, *
, excluding any between each pair of vertical bars.
Example 1
Input: $str = "p|*e*rl|w**e|*ekly|"
Ouput: 2
The characters we are looking here are "p" and "w**e".
Example 2
Input: $str = "perl"
Ouput: 0
Example 3
Input: $str = "th|ewe|e**|k|l***ych|alleng|e"
Ouput: 5
The characters we are looking here are "th", "e**", "l***ych" and "e".
Solution
Frankly speaking, it took some time until I had understood this task.
We remove everything between pairs of vertical bars (including the bars) and count the asterisks in the resulting string. A final unpaired bar will be ignored.
use strict;
use warnings;
sub count_asterisks {
shift =~ s/\|[^|]*\|//rg =~ tr/*//;
}
See the full solution to task 2.
If you have a question about this post or if you like to comment on it, feel free to open an issue in my github repository.