# There Is More Than One Way To Regex

**Challenge 280 solutions in Perl by Matthias Muth**

## Task 1: Twice Appearance

> You are given a string, \$str, containing lowercase English letters only.
> Write a script to print the first letter that appears twice.
>
> Example 1
> Input: \$str = "acbddbca"
> Output: "d"
>
> Example 2
> Input: \$str = "abccd"
> Output: "c"
>
> Example 3
> Input: \$str = "abcdabbb"
> Output: "a"

This is my no-frills-easy-reading solution:

```perl
sub twice_appearance( $str ) {
    my %seen;
    for ( split "", $str ) {
        return $_ if $seen{$_};
        $seen{$_} = 1;
    }
    return ();
}
```

I tried to develop a regex-based solution, but I failed!
I started with this:

```perl
sub twice_appearance_WRONG( $str ) {
    return $str =~ /(.).*?\g1/ ? $1 : ();
}
```

But this doesn't work, because it finds 'the first letter that is repeated later on', not 'the first letter that is a duplicate of a letter that occurred before'. In Example 1 ("acbddbca") it finds 'a', because it tries 'a' first, but it should find 'd', because that is the first 'duplicating' letter (the first 'second letter', if you will).

Then I tried a solution that captures any 'second' letter, and then checks with a lookbehind that that letter appears before in the string:

```perl
sub twice_appearance_LOOK_BEHIND_NO_GO( $str ) {
    return $str =~ /(.)(?
```

But anyway, it aborts with an error 'Lookbehind longer than 255 not implemented ...'. I gave up.
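To make the difference between the two readings concrete, here is a small sketch that runs the loop solution and the first regex attempt side by side on the three examples. The two subs are copied from above; the test loop around them and the `use v5.36` line (to enable signatures and `say`) are my own additions for illustration.

```perl
use v5.36;

# The loop-based solution from above.
sub twice_appearance( $str ) {
    my %seen;
    for ( split "", $str ) {
        return $_ if $seen{$_};
        $seen{$_} = 1;
    }
    return ();
}

# The first regex attempt from above (known to be wrong).
sub twice_appearance_WRONG( $str ) {
    return $str =~ /(.).*?\g1/ ? $1 : ();
}

for my $str ( "acbddbca", "abccd", "abcdabbb" ) {
    my ( $loop )  = twice_appearance( $str );
    my ( $regex ) = twice_appearance_WRONG( $str );
    say "$str: loop finds '$loop', regex finds '$regex'";
}
# acbddbca: loop finds 'd', regex finds 'a'
# abccd: loop finds 'c', regex finds 'c'
# abcdabbb: loop finds 'a', regex finds 'a'
```

Only Example 1 separates the two: the regex engine starts at the leftmost position and is happy as soon as the 'a' shows up again at the very end.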
If anyone has a regex-based solution for this challenge task, please post it in [The Weekly Challenge - Perl & Raku group on Facebook](https://www.facebook.com/groups/theweeklychallengegroup/) or send me an [email](mailto:matthias.muth@gmx.de)!

## Task 2: Count Asterisks

> You are given a string, \$str, where every two consecutive vertical bars are grouped into a pair.
> Write a script to return the number of asterisks, \*, excluding any between each pair of vertical bars.
>
> Example 1
> Input: \$str = "p|\*e\*rl|w\*\*e|\*ekly|"
> Output: 2
> The characters we are looking at here are "p" and "w\*\*e".
>
> Example 2
> Input: \$str = "perl"
> Output: 0
>
> Example 3
> Input: \$str = "th|ewe|e\*\*|k|l\*\*\*ych|alleng|e"
> Output: 5
> The characters we are looking at here are "th", "e\*\*", "l\*\*\*ych" and "e".

##### Single regex version

I started with a single regex solution, which is, sorry for that, not very easy-to-read:

```perl
sub count_asterisks_single_regex( $str ) {
    return scalar( () = $str =~ /\G(?:\|[^|]*\||[^*])*+\*/g );
}
```

What??? Ok, here is what it does, and what it uses.
Let's first add the `x` modifier to better see the pieces:

```perl
    return scalar( () = $str =~ /
        \G
        (?: \| [^|]* \| | [^*] )*+
        \*
    /xg );
```

Aha. So we loop over the string with the `g` modifier to find all occurrences of `\*` (at the end of the regex). And we use `\G` to always continue where we left off.

We skip over everything that we don't want:

- pairs of vertical bars and anything that is not a vertical bar in between:
  `\| [^|]* \|`
- anything that is not an asterisk:
  `[^*]`

We want to skip as many of both of these as we can, so we group them together as alternatives, and add a `*` quantifier. Actually we use a `*+` ('possessive') quantifier that keeps the regex engine from backtracking once it finds a pair of vertical bars. This inhibits retrying a vertical bar using the `[^*]` part to find a `*` earlier (which then would also match *within* vertical bar pairs).

What else? The regex delivers all matches, but we only want a count of the matches.
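Before getting to the counting, here is a quick sketch of what the possessive quantifier buys us. The sub names `count_possessive` and `count_greedy` are made up for this comparison, and the greedy variant is a deliberately broken copy that only differs in using `*` instead of `*+`.

```perl
use v5.36;

# Possessive version, same regex as in the solution above.
sub count_possessive( $str ) {
    return scalar( () = $str =~ / \G (?: \| [^|]* \| | [^*] )*+ \* /xg );
}

# Hypothetical comparison variant: plain greedy '*' instead of '*+'.
sub count_greedy( $str ) {
    return scalar( () = $str =~ / \G (?: \| [^|]* \| | [^*] )* \* /xg );
}

my $str = "p|*e*rl|w**e|*ekly|";
say count_possessive( $str );    # 2
say count_greedy( $str );        # 3
```

With the plain `*`, the engine first skips `|*ekly|` as a pair, finds no asterisk after it, backtracks, lets `[^*]` consume the opening `|` alone, and then happily matches the `*` in front of `ekly`. That is exactly the asterisk we wanted to exclude.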
We get the count using a not so well-known property of the list assignment operator: In scalar context, it returns the number of elements on the *right hand side* of the assignment. And it does so no matter what the left hand side is. So this:

```perl
scalar( () = ( ) )
```

has become a programming idiom in Perl to return the number of elements in a list *without assigning the list to an array variable first*.
Good for a one-liner!
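A minimal sketch of the idiom in isolation, in case it looks too magic. The helper `three_things` and the sample string are invented for this illustration only.

```perl
use v5.36;

# Hypothetical helper that simply returns a three-element list.
sub three_things() { return ( "x", "y", "z" ) }

# List assignment in scalar context yields the number of
# elements on the right hand side, even with an empty left side.
my $count = () = three_things();
say $count;      # 3

# The same idiom with a regex match, which is forced into list context:
my $text    = "a1b22c333";
my $numbers = () = $text =~ /\d+/g;
say $numbers;    # 3

# Without the empty list, the match runs in scalar context
# and only reports whether it succeeded.
my $matched = $text =~ /\d+/;
say $matched;    # 1
```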
(See also [this useful stackoverflow article](https://stackoverflow.com/questions/2225460/how-do-i-find-the-number-of-values-in-a-perl-list).)

##### Two regex version: more easy-to-read

My second solution uses two regexes:

- one to remove all vertical bar pairs,
- and another one to find all asterisks.

I guess it's much easier to read, especially with some parentheses added to help with understanding the operator grouping:

```perl
sub count_asterisks_two_regexes( $str ) {
    return scalar( () = ( $str =~ s/ \| [^|]* \| //xgr ) =~ / \* /xg );
}
```

##### One regex and `tr`: my favorite (and shortest!) solution

What I described so far helped me to arrive at my favorite solution.
It is actually the shortest one, and I think it's the most readable. It uses

- one regex to remove vertical bar pairs (as above),
- the `tr` operator to count the asterisks, by replacing them by - wait a minute - *asterisks*.

The `tr` operator returns the number of characters that it replaced, so what more could we want? Here we go:

```perl
sub count_asterisks( $str ) {
    return ( $str =~ s/ \| [^|]* \| //xgr ) =~ tr/*/*/;
}
```

This was an exercise in evolutionary programming... :-)

#### **Thank you for the challenge!**