diff options
| -rw-r--r-- | challenge-280/matthias-muth/README.md | 225 | ||||
| -rw-r--r-- | challenge-280/matthias-muth/blog.txt | 1 | ||||
| -rwxr-xr-x | challenge-280/matthias-muth/perl/ch-1.pl | 38 | ||||
| -rwxr-xr-x | challenge-280/matthias-muth/perl/ch-2.pl | 69 |
4 files changed, 233 insertions, 100 deletions
diff --git a/challenge-280/matthias-muth/README.md b/challenge-280/matthias-muth/README.md index 5df0c1440c..89dfb95dcb 100644 --- a/challenge-280/matthias-muth/README.md +++ b/challenge-280/matthias-muth/README.md @@ -1,148 +1,173 @@ -# More Weight for Examples! +# There Is More Than One Way To Regex -**Challenge 279 solutions in Perl by Matthias Muth** +**Challenge 280 solutions in Perl by Matthias Muth** -## Task 1: Sort Letters +## Task 1: Twice Appearance -> You are given two arrays, @letters and @weights.<br/> -> Write a script to sort the given array @letters based on the @weights.<br/> +> You are given a string, \$str, containing lowercase English letters only.<br/> +> Write a script to print the first letter that appears twice.<br/> > <br/> > Example 1<br/> -> Input: @letters = ('R', 'E', 'P', 'L')<br/> -> @weights = (3, 2, 1, 4)<br/> -> Output: PERL<br/> +> Input: \$str = "acbddbca"<br/> +> Output: "d"<br/> > <br/> > Example 2<br/> -> Input: @letters = ('A', 'U', 'R', 'K')<br/> -> @weights = (2, 4, 1, 3)<br/> -> Output: RAKU<br/> +> Input: \$str = "abccd"<br/> +> Output: "c"<br/> > <br/> > Example 3<br/> -> Input: @letters = ('O', 'H', 'Y', 'N', 'P', 'T')<br/> -> @weights = (5, 4, 2, 6, 1, 3)<br/> -> Output: PYTHON<br/> +> Input: \$str = "abcdabbb"<br/> +> Output: "a"<br/> -Probably the most efficient way to solve this task is *not* to use sort, -but to directly assign letters to their positions in the result string. -This works for all the examples, -because in these examples, the `@weights` are very regular: +This is my no-frills-easy-reading solution: -- they can be mapped one-by-one to result positions, -- no weight appears twice, -- the weights cover all positions completely. +```perl +sub twice_appearance( $str ) { + my %seen; + for ( split "", $str ) { + return $_ + if $seen{$_}; + $seen{$_} = 1; + } + return (); +} +``` -Actually, at least for the examples, `@weights` should be called `@positions` -(minding that they are one-based, not zero-based). -So here's the simple version, to cover the examples.<br/> -We need to initialize the result string to have the correct length, -because we will assign letters to positions in random order. -```perl -use v5.36; +I tried to develop a regex-based solution, but I failed!<br/> +I started with this: -sub sort_letters_1( $letters, $weights ) { - my $result = " " x $letters->@*; - substr $result, $weights->[$_] - 1, 1, $letters->[$_] - for 0..$letters->$#*; - return $result; +```perl +sub twice_appearance_WRONG( $str ) { + return $str =~ /(.).*?\g1/ ? $1 : (); } ``` +But this doesn't work, because it finds 'the first letter that is repeated later on', not 'the first letter that is a duplicate of a letter that occurred before'. In Example 1 ("acbddbca") it finds 'a', because it tries 'a' first, but it should find 'd', because that is the first 'duplicating' letter (the first 'second letter', if you will). + +Then I tried a solution that captures any 'second' letter, and then checks with a lookbehind that that letter appears before: +```perlin the string +sub twice_appearance_LOOK_BEHIND_NO_GO( $str ) { + return $str =~ /(.)(?<!^.*\g1.*)/ ? $1 : (); +} +``` -**What if the weights were not as regular as they are in the examples?** +I know that if this worked, it would be incredibly slow.<br/> +But anyway, it aborts with an error +'Lookbehind longer than 255 not implemented ...'. -We should consider cases where weights do *not* map one-by-one to positions.<br/> -Let's put some 'heavier weights' in, and consider this additional example: +I gave up.<br/> +If anyone has a regex-based solution for this challenge task, +please post it in +[The Weekly Challenge - Perl & Raku group on Facebook](https://www.facebook.com/groups/theweeklychallengegroup/) or send me an [email](mailto:matthias.muth@gmx.de)! -> Example 4<br/> -> Input: @letters = ('R', 'E', 'P', 'L')<br/> -> @weights = (3333, 2222, 1111, 3333)<br/> -> Output: PERL<br/> - Here: -* the lowest weight is not 1, -* weights are not consecutive (they contain gaps), -* same weight values are used more than once for different letters<br/>(behavior is not defined in this case, but we should do something useful), +## Task 2: Count Asterisks -* the weight values are not necessarily small integers<br/>(which can cause memory problems when we incautiously map weight values to string positions or array indexes). +> You are given a string, \$str, where every two consecutive vertical bars are grouped into a pair.<br/> +> Write a script to return the number of asterisks, \*, excluding any between each pair of vertical bars.<br/> +> <br/> +> Example 1<br/> +> Input: \$str = "p|\*e\*rl|w\*\*e|\*ekly|"<br/> +> Ouput: 2<br/> +> The characters we are looking here are "p" and "w\*\*e".<br/> +> <br/> +> Example 2<br/> +> Input: \$str = "perl"<br/> +> Ouput: 0<br/> +> <br/> +> Example 3<br/> +> Input: \$str = "th|ewe|e\*\*|k|l\*\*\*ych|alleng|e"<br/> +> Ouput: 5<br/> +> The characters we are looking here are "th", "e\*\*", "l\*\*\*ych" and "e".<br/> -So what do we do? +##### Single regex version -My ideas are these: +I started with a single regex solution, which is, sorry for that, not very easy-to-read: -* Use the weight values as hash keys instead of string positions or array indexes to store where any letter is going to be put.<br/> - This addresses both the 'not consecutive' and the 'no small integers' issues. - We can have a weight of 6548632 without running out of bounds as we would with a string or an array. -* Store a *list* of letters with each 'weight' hash key.<br/>This deals with the case of multiple letters having the same weight value. - Every hash entry will contain an array-ref to a list of all letters having that weight. +```perl +sub count_asterisks_single_regex( $str ) { + return scalar( () = $str =~ /\G(?:\|[^|]*\||[^*])*+\*/g ); +} +``` -I'm also happy to highlight the `for_list` Perl feature -that was added in Perl 5.36 for iterating over multiple values at a time. -Especially its use together with the `mesh` function from `List::Util` -makes some things simple and nice.<br/> -Where normally I would have to iterate over `0..$#array`, -I can avoid this here, as well as the use of `$_` in the loop.<br/> -Certainly less 'perlish', but easy for the eyes! +What??? -To get the result string from the hash, we sort the hash keys (numerically!) -and concatenate all letters from their entries in order.<br/> -Like this: +Ok, here is what it does, and what it uses.<br/>Let's first add the `x` modifier to better see the pieces: ```perl -use v5.36; -no warnings 'experimental::for_list'; -use List::Util qw( mesh ); - -sub sort_letters( $letters, $weights ) { - my %buckets; - for my ( $letter, $weight ) ( mesh $letters, $weights ) { - push $buckets{$weight}->@*, $letter; - } - return join "", map $buckets{$_}->@*, sort { $a <=> $b } keys %buckets; -} + return scalar( () = $str =~ / \G (?: \| [^|]* \| | [^*] )*+ \* /xg ); ``` +Aha. So we loop over the string with the `g` modifier to find all occurrences of `\*` (at the end of the regex). And we use `\G` to always continue where we left off. +We skip over everything that we don't want: -## Task 2: Split String +- pairs of vertical bars and anything that is not a vertical bar in between:<br/>`\| [^|]* \|` -> You are given a string, \$str.<br/> -> Write a script to split the given string into two containing exactly same number of vowels and return true if you can otherwise false.<br/> -> <br/> -> Example 1<br/> -> Input: \$str = "perl"<br/> -> Ouput: false<br/> -> <br/> -> Example 2<br/> -> Input: \$str = "book"<br/> -> Ouput: true<br/> -> Two possible strings "bo" and "ok" containing exactly one vowel each.<br/> -> <br/> -> Example 3<br/> -> Input: \$str = "good morning"<br/> -> Ouput: true<br/> -> Two possible strings "good " and "morning" containing two vowels each or "good m" and "orning" containing two vowels each.<br/> +- anything that is not an asterisk:<br/>`[^*]` + +We want to skip as many of both of these as we can, +so we group them together as alternatives, and add a `*` quantifier. + +Actually we use a `*+` ('possessive') quantifier +that keeps the regex engine from backtracking +once it finds a pair of vertical bars. +This inhibits retrying a vertical bar using the `[^*]` part +to find a `*` earlier (which then would also match *within* vertical bar pairs). + +What else? -Actually the task assignment 'split the given string into two containing exactly same number of vowels' can be completely ignored. The result is never used. It's like a piece of 'dead code' that never gets executed and that can be removed. +The regex delivers all matches, but we only want a count of the matches.<br/>We get the count using a not so well-known property of the list assignment operator: It returns the number of elements of the *right hand side* of the assignment in scalar context. And it does so no matter what the left hand side is. So this: -The only information we need to return is whether *we can* split the string into two such pieces! +```perl +scalar( () = ( <list> ) ) +``` + +has become a programming idiom in Perl to return the number of elements in a list *without assigning the list to an array variable first*.<br/>Good for a one-liner!<br/> +(See also [this useful stackoverflow article](https://stackoverflow.com/questions/2225460/how-do-i-find-the-number-of-values-in-a-perl-list).) -We can do so if and only if we can split up the string's *vowels* into two equal pieces.<br/> -Which means we need an even number of vowels. +##### Two regex version: more easy-to-read -Ok, then let's count the vowels! -And the rest is easy... +My second solution uses two regexes: + +- one to remove all vertical bar pairs, +- and another one to find all asterisks. + +I guess it's much easier to read, especially with some parentheses added to help with understanding the operator grouping: ```perl -use v5.36; +sub count_asterisks_two_regexes( $str ) { + return scalar( () = ( $str =~ s/ \| [^|]* \| //xgr ) =~ / \* /xg ); +} +``` + +##### One regex and `tr`: my favorite (and shortest!) solution + +What I described so far helped me to arrive at my favorite solution.<br/> +It is actually the shortest one, and I think it's the most readable. + +It uses + +- one regex to remove vertical bar pairs (as above), +- the `tr` operator to count the asterisks, by replacing them by - wait a minute - *asterisks*. + +The `tr` operator returns the number of characters that it replaced, so what more could we want? -sub split_string( $str ) { - my @vowels = $str =~ /[aeiou]/ig; - return scalar @vowels % 2 == 0; +Here we go: + +```perl +sub count_asterisks( $str ) { + return ( $str =~ s/ \| [^|]* \| //xgr ) =~ tr/*/*/; } ``` +This was an exercise in evolutionary programming... :-) + + + #### **Thank you for the challenge!** + diff --git a/challenge-280/matthias-muth/blog.txt b/challenge-280/matthias-muth/blog.txt new file mode 100644 index 0000000000..60ba8c61d5 --- /dev/null +++ b/challenge-280/matthias-muth/blog.txt @@ -0,0 +1 @@ +https://github.com/MatthiasMuth/perlweeklychallenge-club/tree/muthm-280/challenge-280/matthias-muth#readme diff --git a/challenge-280/matthias-muth/perl/ch-1.pl b/challenge-280/matthias-muth/perl/ch-1.pl new file mode 100755 index 0000000000..e19c337656 --- /dev/null +++ b/challenge-280/matthias-muth/perl/ch-1.pl @@ -0,0 +1,38 @@ +#!/usr/bin/env perl +# +# The Weekly Challenge - Perl & Raku +# (https://theweeklychallenge.org) +# +# Challenge 280 Task 1: Twice Appearance +# +# Perl solution by Matthias Muth. +# + +use v5.36; + +sub twice_appearance_WRONG( $str ) { + return $str =~ /(.).*?\g1/ ? $1 : ""; +} + +sub twice_appearance( $str ) { + my %seen; + for ( split "", $str ) { + return $_ + if $seen{$_}; + $seen{$_} = 1; + } + return (); +} + +sub twice_appearance_WRONG( $str ) { + return $str =~ /(.)(?<!^.*\g1.*)/ ? $1 : ""; +} + +use Test2::V0 qw( -no_srand ); +is twice_appearance( "acbddbca" ), "d", + 'Example 1: twice_appearance( "acbddbca" ) == "d"'; +is twice_appearance( "abccd" ), "c", + 'Example 2: twice_appearance( "abccd" ) == "c"'; +is twice_appearance( "abcdabbb" ), "a", + 'Example 3: twice_appearance( "abcdabbb" ) == "a"'; +done_testing; diff --git a/challenge-280/matthias-muth/perl/ch-2.pl b/challenge-280/matthias-muth/perl/ch-2.pl new file mode 100755 index 0000000000..392a6c978f --- /dev/null +++ b/challenge-280/matthias-muth/perl/ch-2.pl @@ -0,0 +1,69 @@ +#!/usr/bin/env perl +# +# The Weekly Challenge - Perl & Raku +# (https://theweeklychallenge.org) +# +# Challenge 280 Task 2: Count Asterisks +# +# Perl solution by Matthias Muth. +# + +use v5.36; + +# My 'single regex' version. +# Use a list assignment to an empty list in scalar context +# to get the number of elements returned by the regex match operator. +# (see here: +# https://stackoverflow.com/questions/2225460/how-do-i-find-the-number-of-values-in-a-perl-list). +sub count_asterisks_single_regex( $str ) { + return scalar( () = $str =~ /\G(?:\|[^|]*\||[^*])*+\*/g ); +} + +sub count_asterisks_single_regex_x( $str ) { + return scalar( () = $str =~ / \G (?: \| [^|]* \| | [^*] )*+ \* /xg ); +} + +sub count_asterisks_verbose( $str ) { + my $bar = qr( \| )x; + my $no_bar = qr( [^|] )x; + my $bar_pair = qr( $bar ${no_bar}* $bar )x; + my $wanted = qr( \* )x; + my $not_wanted = qr( [^*] )x; + my $skip = qr( (?: ${bar_pair} | ${not_wanted} )*+ )x; + my $next_wanted = qr( ${skip} ${wanted} )x; + + # while ( $str =~ /\G $next_wanted /xg ) { + # say "match '$&'"; + # } + return scalar( () = $str =~ /\G $next_wanted /xg ); +} + +# My 'two regexes' version. +sub count_asterisks_two_regexes( $str ) { + return scalar( () = ( $str =~ s/ \| [^|]* \| //xgr ) =~ / \* /xg ); +} + +# My favorite (and shortest) solution, which looks quite readable. +# Remove pairs of '|' using the 'r' flag of s/// to return what is left, +# then let an identity 'tr' count the asterisks. +# Using redundant parentheses to show the order of execution. +sub count_asterisks( $str ) { + return ( $str =~ s/ \| [^|]* \| //xgr ) =~ tr/*/*/; +} + + +use Test2::V0 qw( -no_srand ); + +my $sub_name = "count_asterisks"; +for my $sub ( sort grep /^${sub_name}/, keys %:: ) { + note "Testing $sub:"; + + no strict 'refs'; + is $sub->( "p|*e*rl|w**e|*ekly|" ), 2, + 'Example 1: count_asterisks( "p|*e*rl|w**e|*ekly|" ) == 2'; + is $sub->( "perl" ), 0, + 'Example 2: count_asterisks( "perl" ) == 0'; + is $sub->( "th|ewe|e**|k|l***ych|alleng|e" ), 5, + 'Example 3: count_asterisks( "th|ewe|e**|k|l***ych|alleng|e" ) == 5'; +} +done_testing; |
