# There Is More Than One Way To Regex
**Challenge 280 solutions in Perl by Matthias Muth**
## Task 1: Twice Appearance
> You are given a string, \$str, containing lowercase English letters only.
> Write a script to print the first letter that appears twice.
>
> Example 1
> Input: \$str = "acbddbca"
> Output: "d"
>
> Example 2
> Input: \$str = "abccd"
> Output: "c"
>
> Example 3
> Input: \$str = "abcdabbb"
> Output: "a"
This is my no-frills-easy-reading solution:
```perl
sub twice_appearance( $str ) {
my %seen;
for ( split "", $str ) {
return $_
if $seen{$_};
$seen{$_} = 1;
}
return ();
}
```
I tried to develop a regex-based solution, but I failed!
I started with this:
```perl
sub twice_appearance_WRONG( $str ) {
return $str =~ /(.).*?\g1/ ? $1 : ();
}
```
But this doesn't work, because it finds 'the first letter that is repeated later on', not 'the first letter that is a duplicate of a letter that occurred before'. In Example 1 ("acbddbca") it finds 'a', because it tries 'a' first, but it should find 'd', because that is the first 'duplicating' letter (the first 'second letter', if you will).
Then I tried a solution that captures any 'second' letter, and then checks with a lookbehind that that letter appears before:
```perlin the string
sub twice_appearance_LOOK_BEHIND_NO_GO( $str ) {
return $str =~ /(.)(?
But anyway, it aborts with an error
'Lookbehind longer than 255 not implemented ...'.
I gave up.
If anyone has a regex-based solution for this challenge task,
please post it in
[The Weekly Challenge - Perl & Raku group on Facebook](https://www.facebook.com/groups/theweeklychallengegroup/) or send me an [email](mailto:matthias.muth@gmx.de)!
## Task 2: Count Asterisks
> You are given a string, \$str, where every two consecutive vertical bars are grouped into a pair.
> Write a script to return the number of asterisks, \*, excluding any between each pair of vertical bars.
>
> Example 1
> Input: \$str = "p|\*e\*rl|w\*\*e|\*ekly|"
> Ouput: 2
> The characters we are looking here are "p" and "w\*\*e".
>
> Example 2
> Input: \$str = "perl"
> Ouput: 0
>
> Example 3
> Input: \$str = "th|ewe|e\*\*|k|l\*\*\*ych|alleng|e"
> Ouput: 5
> The characters we are looking here are "th", "e\*\*", "l\*\*\*ych" and "e".
##### Single regex version
I started with a single regex solution, which is, sorry for that, not very easy-to-read:
```perl
sub count_asterisks_single_regex( $str ) {
return scalar( () = $str =~ /\G(?:\|[^|]*\||[^*])*+\*/g );
}
```
What???
Ok, here is what it does, and what it uses.
Let's first add the `x` modifier to better see the pieces:
```perl
return scalar( () = $str =~ / \G (?: \| [^|]* \| | [^*] )*+ \* /xg );
```
Aha. So we loop over the string with the `g` modifier to find all occurrences of `\*` (at the end of the regex). And we use `\G` to always continue where we left off.
We skip over everything that we don't want:
- pairs of vertical bars and anything that is not a vertical bar in between:
`\| [^|]* \|`
- anything that is not an asterisk:
`[^*]`
We want to skip as many of both of these as we can,
so we group them together as alternatives, and add a `*` quantifier.
Actually we use a `*+` ('possessive') quantifier
that keeps the regex engine from backtracking
once it finds a pair of vertical bars.
This inhibits retrying a vertical bar using the `[^*]` part
to find a `*` earlier (which then would also match *within* vertical bar pairs).
What else?
The regex delivers all matches, but we only want a count of the matches.
We get the count using a not so well-known property of the list assignment operator: It returns the number of elements of the *right hand side* of the assignment in scalar context. And it does so no matter what the left hand side is. So this:
```perl
scalar( () = ( ) )
```
has become a programming idiom in Perl to return the number of elements in a list *without assigning the list to an array variable first*.
Good for a one-liner!
(See also [this useful stackoverflow article](https://stackoverflow.com/questions/2225460/how-do-i-find-the-number-of-values-in-a-perl-list).)
##### Two regex version: more easy-to-read
My second solution uses two regexes:
- one to remove all vertical bar pairs,
- and another one to find all asterisks.
I guess it's much easier to read, especially with some parentheses added to help with understanding the operator grouping:
```perl
sub count_asterisks_two_regexes( $str ) {
return scalar( () = ( $str =~ s/ \| [^|]* \| //xgr ) =~ / \* /xg );
}
```
##### One regex and `tr`: my favorite (and shortest!) solution
What I described so far helped me to arrive at my favorite solution.
It is actually the shortest one, and I think it's the most readable.
It uses
- one regex to remove vertical bar pairs (as above),
- the `tr` operator to count the asterisks, by replacing them by - wait a minute - *asterisks*.
The `tr` operator returns the number of characters that it replaced, so what more could we want?
Here we go:
```perl
sub count_asterisks( $str ) {
return ( $str =~ s/ \| [^|]* \| //xgr ) =~ tr/*/*/;
}
```
This was an exercise in evolutionary programming... :-)
#### **Thank you for the challenge!**