Merge pull request #4077 from drbaggy/master

Finished the readme... blog page...
author: Mohammad S Anwar <Mohammad.Anwar@yahoo.com> 2021-05-14 14:28:21 +0100
committer: GitHub <noreply@github.com> 2021-05-14 14:28:21 +0100
commit: 2023334b99b3c28e54d5dd17f59b4496a0370867 (patch)
tree: 443d0cc2d791a2a219b4348125c185d6f2f5848f /challenge-112
parent: 15502017ba3d59e96e00453936f6ea9d075f00ae (diff)
parent: 9fb1f7c6553ec7338e3e86ed0c35818d74f8b421 (diff)
download: perlweeklychallenge-club-2023334b99b3c28e54d5dd17f59b4496a0370867.tar.gz
perlweeklychallenge-club-2023334b99b3c28e54d5dd17f59b4496a0370867.tar.bz2
perlweeklychallenge-club-2023334b99b3c28e54d5dd17f59b4496a0370867.zip
4 files changed, 500 insertions, 506 deletions
diff --git a/challenge-112/james-smith/README.md b/challenge-112/james-smith/README.md
index 71e2905051..3bc69d2fb2 100644
--- a/challenge-112/james-smith/README.md
+++ b/challenge-112/james-smith/README.md
@@ -1,4 +1,4 @@
-# Perl Weekly Challenge #111
+# Perl Weekly Challenge #112
 
 You can find more information about this weeks, and previous weeks challenges at:
 
@@ -8,592 +8,557 @@ If you are not already doing the challenge - it is a good place to practise your
 **perl** or **raku**. If it is not **perl** or **raku** you develop in - you can
 submit solutions in whichever language you feel comfortable with.
 
-# Challenge 1 - Search Matrix
+# Challenge 1 - Canonical Path
 
-**You are given 5x5 matrix filled with integers such that each row is sorted from left to
-right and the first integer of each row is greater than the last integer of the previous
-row.**
+**You are given a string path, starting with a slash ‘/'. Write a script to
+convert the given absolute path to the simplified canonical path.**
 
-**Write a script to find a given integer in the matrix using an efficient search algorithm.**
+In a Unix-style file system:
 
-## Solution to challenge 1
+ * A period '.' refers to the current directory
+ * A double period '..' refers to the directory up a level
+ * Multiple consecutive slashes ('//') are treated as a single slash '/'
 
-Today's is an interesting challenge ... to check to see if a value is in the matrix....
+The canonical path format:
 
-Initially this looks like we need to define some efficient sorting algorithm that flattens
-the array and then searches it {e.g. binary search}.
+ * The path starts with a single slash '/'.
+ * Any two directories are separated by a single slash '/'.
+ * The path does not end with a trailing '/'.
+ * The path only contains the directories on the path from the root directory to the target file or directory
 
-We implement this... but we compare it against some alternative simple methods. We don't
-get the result we expect...
+## Note....
 
-## Binary search
+Please note there is an ambiguity in the question - when then path contains no
+files - as it cannot start with a '/' and not end with a '/' - so we have
+to make a choice do we return '/' or do we return ''.
 
-We first flatten the array. Then we loop until the array is empty or it has a solitary value
+In our case we decide to return it as the empty string.
+This has the advantage that there is a level of consistency if you do...
 
-  If we get a match with the "middle" value we return 1;
-  Otherwise we remove the half of the list using `splice` which
-  the number isn't in.
-  
-We return 0 if the array has length 0. If the length is 1 we return
-1 or 0 depending on whether the array is search value or not.
+$parent_dir.canonical_path('/a');
 
-```perl
-sub find_val_search {
-  my( $val, $m, @list ) = ( $_[0], 0, map { @{$_} } @{$_[1]} );
-
-    $list[ $m = @list >> 1 ] == $val ? ( return 1              )
-  : $list[ $m              ] >  $val ? ( splice @list, $m      )
-  :                                    ( splice @list, 0, $m+1 )
-    while @list>1;
-  return @list && $list[0] == $val ? 1 : 0;
-}
-```
+or
 
-## The other (better) methods:
+$parent_dir.canonical_path('/');
 
-### grep on the flattened array
+then it will always end without a "/";
 
-```perl
-sub find_val_grep_map {
-  my($v,$m)=@_;
-  return 0 + grep { $_ == $v } map { @{$_} } @{$m};
-}
-```
+To change the value the functions return you can replace the return
+statement with either `return q(/). join q(/), @list` or
+`return $str || q(/)`, depending on whether or not the function
+stores the path elements in an array or a string.
 
-### reducing array with grep and then combining with map
+## Solution to challenge 1
 
-Uses `grep` on each sub array to return either an
-empty array or an array containing the match.
-We then use `grep` to further filter these results
-for existances of the array.
+Again another interesting challenge ... we can see if we can improve
+performance.
 
-The resultant array will have length `0` or `1`.
+Initially it looks quite complex - there are two solutions classes:
 
-```perl
-sub find_val_grep_grep {
-  my($v,$m)=@_;
-  return 0 + grep { grep { $_ == $v } @{$_} } @{$m};
-}
-```
+ * splitting the string and creating/modifying an array of the
+   individual parts
 
-Now to avoid as many of the `grep`s as possible we can do two things. First check the
-number is in the range of the matrix - this skips the outer `grep` for some values and
-secondly - check the value is in the range for each line you are looping over.... This
-leads us to:
+ * splitting the string and creating/modifying a string
 
-```perl
-sub find_val_grep_grep_ext {
-  my($v,$m)=@_;
-  return $v<$m->[0][0] || $v > $m->[-1][-1]
-       ? 0
-       : 0 + grep { $v>=$_->[0] && $v<=$_->[-1] && grep { $_ == $v } @{$_} } @{$m};
-}
-```
-## Efficiency
+## "Expanded perl code"
 
-We use `cmpthese` to compare the performance...
+### Array - two loops...
 
-Our methods are:
-  * **Search**:   `find_val_search`
-  * **GrepMap**:  `find_val_grep_map`
-  * **GrepGrep**: `find_val_grep_grep`
-  * **GrepExt**: `find_val_grep_grep_ext`
-  * **Flatten**:  just flattens the array - basic benchmark to compare efficiency to that function....
+ * We first split the directory into path parts and remove any that
+   are empty or "`.`".
 
-Timings using `Benchmark::cmpthese`
+ * We loop through the array until we find a '`..`' if we do we
+   remove it and the previous entry.
 
-|           |   Rate    | Search | GrepMap | GrepGrep | *Flatten* | GrepExt |
-| --------- | --------: | -----: | ------: | -------: | --------: | ------: |
-| Search    |  4,857/s  |    --  |   -3%   |    -34%  |  *-42%*   |  -68%   |
-| GrepMap   |  4,995/s  |    3%  |    --   |    -32%  |  *-40%*   |  -67%   |
-| GrepGrep  |  7,391/s  |   52%  |   48%   |      --  |  *-11%*   |  -52%   |
-| *Flatten* | *8,326/s* |  *71%* |  *67%*  |    *13%* |   *--*    | *-46%*  |
-| GrepExt   | 15,337/s  |  216%  |  207%   |    108%  |   *84%*   |   --    |
+ * We then repeat this until we don't find a '`..`'
 
+ * To jump out of the loop we use `next "label"` to not just skip out
+   of the inner loop, but to also to restart the parent loop at the
+   same time.
 
-Notes:
+ * Finally remove an initial "`..`" which wouldn't get removed by this
+   algorithm.
 
-  *  *Flatten* is for comparison only - it actually does nothing other than flatten
-     the array - this highlights how efficient each algorithm is (and can be)
- 
-  * So we see that the grep_grep solution is nearly 60% more efficient than the search
-    algorithm (this is true for all search method algorithms which flatten
-    the array first), add the filtering to the outer grep - this becomes faster than
-    flatten - and around 3 times the speed of search.
+ * and join the array together with '`/`' - we add the `''` so that we
+   get the leading '`/`'.
 
-## Not flattening array
-
-As we have a limit on performance with the flattening operation. To
-improve efficiency we will need to consider a different approach: **Which
-therefore must not flatten!**
-
-### Do not flatten - find the row containing the number
+```perl
+sub canonical_path_double {
+  my $directory_path = shift;
+  my @directory_names   = grep { $_ ne '' &&
+                                 $_ ne '.' }
+                          split m{/},
+                          $directory_path;
+
+  OUTER: while(1) {
+    foreach (1..$#directory_names) {
+      next unless $directory_names[$_] eq '..';
+      splice @directory_names,$_-1,2;
+      next OUTER;
+    }
+    last;
+  }
+  shift @directory_names if @directory_names && $directory_names[0] eq '..';
+  return join '/','',@directory_names;
+}
+```
 
- * Rather than using `map`/`grep` on the outer array, we instead use
-   a "hard coded" binary search.
+### Array - 1-loop with backtracking
 
- * First we return 0 if the number is outside the range of numbers in
-   the list.
+To avoid the double loop - we can use back-tracking so that when we
+find a "`..`" instead of restarting the loop we just step back one
+entry in the array - we see `$pointer--` after the `splice`
+statement below.
 
- * Then we look to see if the number is below the first value of the
-   4th row. This means it is in rows 1, 2 or 3. So we test for that.
-   Otherwise it is in rows 4 or 5, so we can test that as well.
- 
- * This function expression returns which row the number *could be in*
+ * If `$pointer == 0` then the `..` is at the start of the directory
+   list so we just remove it (we can't go back beyond the root
+   directory).
 
 ```perl
-   $v < $m->[3][0]
- ? ( $v < $m->[1][0] ? 0 : $v < $m->[2][0] ? 1 : 2 )
- : ( $v < $m->[4][0] ? 3 : 4                       )
+sub canonical_path_array {
+  my $directory_path = shift;
+  my @directory_names   = grep { $_ ne '' &&
+                                 $_ ne '.' }
+                          split m{/},
+                          $directory_path;
+  my $pointer = 1;
+  while( $pointer < @directory_names ) {
+    if( $directory_names[$pointer] eq '..' ) {
+      if( $pointer > 0 ) {
+        splice @directory_names, $pointer - 1, 2;
+        $pointer --;
+      } else {
+        shift @directory_names;
+      }
+    } else {
+      $pointer ++;
+    }
+  }
+  return join '/','', @directory_names;
+}
 ```
 
-This leads the whole function to be:
+### String - 1-loop
 
-```perl
-sub find_val_dnf {
-  my($v,$m) = @_;
-  return $v < $m->[0][0] || $v > $m->[4][4]
-       ? 0
-       : 0 + grep { $v == $_ } @{$m->[ $v < $m->[3][0]
-           ? ( $v < $m->[1][0] ? 0 : $v < $m->[2][0] ? 1 : 2 )
-           : ( $v < $m->[4][0] ? 3 : 4                       )
-         ]};
-}
-```
+Rather than store the parts in a list - we use a string to store
+this canonical path - and we either add directories to the end of
+it or remove them if we come across a "`..`".
 
-### Do not flatten - optimal solution
+  * We achieve the former - by just concatenating a "`/`" and the
+    name to the end of the string.
 
- * As well as not using `map` or `grep` on the outside array we
-   do so on the inner array.
+  * The latter we strip this string off with a regex substitution:
+    `s{/[^/]+\Z}{}`. This works in all cases wherever the "`..`"
+    is in the list.
 
- * Once we have chosen the correct row then we split it into
-   two halves and check for a match in each half depending on
-   whether it matches the middle value or is above/below.
+  * Note as we are looping through the array we can ignore the
+    grep and just skip out of the loop if the name is either ""
+    or "`.`".
 
 ```perl
-sub find_val_dnf_optimal {
-  my($v,$m,$t) = @_;
-
-  return $v < $m->[0][0] || $v > $m->[4][4]
-       ? 0
-       : ( $t = $m->[ $v < $m->[3][0]
-           ? ( $v < $m->[1][0] ? 0 : $v < $m->[2][0] ? 1 : 2 )
-           : ( $v < $m->[4][0] ? 3 : 4                       )
-         ] ) &&
-         ( return $v == $t->[2] ? 1 :
-                  $v < $t->[2] ?
-                  ( $v == $t->[0] || $v == $t->[1] ? 1 : 0) :
-                  ( $v == $t->[4] || $v == $t->[3] ? 1 : 0) );
-}
+sub canonical_path_string {
+  my $path = shift;
+  my @directories    = split m{/}, $path;
+  my $canonical_path = '';
+  foreach my $directory_name ( @directories ) {
+    next if $directory_name eq '';
+    next if $directory_name eq '.';
+    if( $directory_name eq q(..) ) {
+      $canonical_path =~ s{/[^/]+\Z}{};
+    } else {
+      $canonical_path .= q(/) . $directory_name;
+    }
+  }
+  return $canonical_path;
+}  
 ```
+### String fast - 1-loop
 
-### Do not flatten - generalised solution
+Regexs are not the fastest way to perform simple matches strings
+(and intern to modify them). We can speed up the trimming of the
+canonical path by replacing the regex solution by using `rindex`
+and the four-parameter version of `substr`.
 
- * Above we used the information that we are supplied with a 5x5 matrix
-   if the matrix is an arbitrary size we can change the two loops to
-   use a binary search algorithm.
-   
- * Notes:
+  * `rindex $str, $needle` finds the index of `$needle` in `$str`.
+    Here we use it to find the last `/` in string.
 
-    * We use `($l+$r)>>1` to find the mid point - this is more efficient
-      than using `int(($l+$r)/2)`; Bit-shift operators are useful if you
-      are multiplying or dividing by 2.
+  * `substr $str, $offset, $length, $substitute` finds the chunk of
+    the string `$str` from `$offset` of given length `$length`. If a
+    fourth parameter is set then this region of the string is replaced
+    by `$substitute`.
 
-    * The loops differ slightly because one is looking for an entry which
-      is within a range - the other for a specific value. The differencs
-      you will note are (1) the first loop do `$r=$n` where in the
-      second we do `$r=$n-1` - as the first is a range based search the
-      second an exact match; (2) The second loop checks for equality before
-      the binary search check - and returns 1 if true. The end eheck also
-      becomes $l<=$r as $r becomes less than $l when we have missed the
-      entry.
+  * We can use this to truncate the string by doing:
+  
+    `substr $path, rindex( $path, '/' ), ~0, '';`
+
+    In the two-parameter version of `substr` if we omit length then
+    this returns to the end of the string. In the four-parameter
+    version - we can't omit this - so have to use an alternative
+    value - it has to be bigger than the longest string possible.
+    We use the "bitwise-negation" operator "`~`" to generate the
+    largest value possible. This is: 18,446,744,073,709,551,615 or
+    just shy of 16 Exabytes - I believe this should be big enough!
+
+    (Note you can use -ve numbers but there is no way of doing `-0`
+    to trim to the end of the line)
+
+The script then becomes:
 
 ```perl
-sub find_val_general_dnf {
-  my($v,$m)=@_;
-  return 0 if$v<$m->[0][0]||$v>$m->[-1][-1];
-  my($n,$l,$r)=(0,0,@{$m}-1);
-  $v>$m->[$n=($l+$r)>>1][-1]?($l=$n+1):($r=$n)while$r!=$l;
-  ($l,$r)=(0,@{$m=$m->[$l]}-1);
-  ($v==$m->[$n=($l+$r)>>1])?(return 1):$v>$m->[$n]?($l=$n+1):($r=$n-1)while$l<=$r;
-  return 0;
-}
+sub canonical_path_string_fast {
+  my $path = shift;
+  my @directories    = split m{/}, $path;
+  my $canonical_path = '';
+  foreach my $directory_name ( @directories ) {
+    next if $directory_name eq '';
+    next if $directory_name eq '.';
+    if( $directory_name eq q(..) ) {
+      substr $canonical_path,
+             rindex( $canonical_path, '/' ),
+             ~0, '';
+    } else {
+      $canonical_path .= q(/) . $directory_name;
+    }
+  }
+  return $canonical_path;
+}  
 ```
-Timings using `Benchmark::cmpthese`
 
-|           |   Rate    | Search | GrepMap | GrepGrep | *Flatten* | DNFGen | GrepExt | DNF    | DNFOpt |
-| --------- | --------: | -----: | ------: | -------: | --------: | -----: | ------: | -----: | -----: |
-| Search    |  4,316/s  |    --  |   -20%  |    -41%  |   *-47%*  |  -64%  |  -71%   |  -79%  |  -81%  |
-| GrepMap   |  5,391/s  |   25%  |     --  |    -26%  |   *-34%*  |  -55%  |  -64%   |  -73%  |  -76%  |
-| GrepGrep  |  7,310/s  |   69%  |    36%  |      --  |   *-10%*  |  -39%  |  -52%   |  -64%  |  -68%  |
-| *Flatten* | *8,123/s* |  *88%* |   *51%* |   *11%*  |     *--*  | *-33%* | *-46%*  | *-60%* | *-65%* |
-| DNFGen    | 12,063/s  |  179%  |   124%  |     65%  |    *48%*  |    --  |  -20%   |  -40%  |  -47%  |
-| GrepExt   | 15,106/s  |  250%  |   180%  |    107%  |    *86%*  |   25%  |    --   |  -25%  |  -34%  |
-| DNF       | 20,161/s  |  367%  |   274%  |    176%  |   *148%*  |   67%  |   33%   |    --  |  -12%  |
-| DNFOpt    | 22,883/s  |  430%  |   324%  |    213%  |   *182%*  |   90%  |   51%   |   14%  |   --   |
+## "Compact perl code" AKA 1-liners..
 
-We can see that this "optimal method" is about 5 times more efficient than the "search"
-function we originally tried - even the generic double binary search is abour 2.5 times as
-efficient than the binary search over the flattened matrix.
+Now we can look at how we can compact this code. Here though we
+need to consider the trade off between size and performance - the
+smallest code is not necessarily the fastest - as some of the
+tricks to make the code compact also make it slower.
 
-## Test script
+### The array code...
 
-For completeness - this is the test and benchmarking script
+We have two versions of the code - which are slightly different
+The `canonical_path_compact_opt` function is probably closer to
+the 1-loop array function above. We use nested-ternaries to replace
+the `if else` blocks.
 
-```perl
-#!/usr/local/bin/perl
-
-use strict;
-
-use warnings;
-use feature qw(say);
-use Test::More;
-use Benchmark qw(cmpthese);
-
-my $N = @ARGV ? $ARGV[0] : 100_000;
-
-my $matrix = [
-  [  1,  2,  3,  5,  7 ],
-  [  9, 11, 15, 19, 20 ],
-  [ 23, 24, 25, 29, 31 ],
-  [ 32, 33, 39, 40, 42 ],
-  [ 45, 47, 48, 49, 50 ],
-];
-
-## Create a test set - numbers from -10 to 60...
-my %TEST_SET = map { $_ => 0 } (my @KEYS = -10..60);
- 
-## Set all to 0, and then iterate through the elements of the matrix
-## and set the numbers in the list to 1....
-
-$TEST_SET{$_} = 1 foreach map { @{$_} } @{$matrix};
-
-## Run the original PWC test examples...
-is( find_val_search(      35, $matrix ), 0 );
-is( find_val_search(      39, $matrix ), 1 );
-
-is( find_val_grep_grep(    35, $matrix ), 0 );
-is( find_val_grep_grep(    39, $matrix ), 1 );
-#...
-#...
-#...
-is( find_val_find_val_dnf_optimal(    35, $matrix ), 0 );
-is( find_val_find_val_dnf_optimal(    39, $matrix ), 1 );
-
-## Now run our full test set - from -10 to 60. This covers
-## all cases within the list and a few either side...
-
-is( find_val_search(      $_, $matrix ), $TEST_SET{$_} ) foreach @KEYS;
-is( find_val_grep_grep(   $_, $matrix ), $TEST_SET{$_} ) foreach @KEYS;
-#...
-#...
-#...
-is( find_val_dnf_optimal( $_, $matrix ), $TEST_SET{$_} ) foreach @KEYS;
-
-done_testing();
-
-cmpthese( $N, {
-  'Search'    => sub { find_val_search(      $_, $matrix ) foreach @KEYS; },
-  'GrepGrep'  => sub { find_val_grep_grep(   $_, $matrix ) foreach @KEYS; },
-  #...
-  #...
-  #...
-  'DNFOpt'    => sub { find_val_dnf_optimal( $_, $matrix ) foreach @KEYS; },
-} );
-```
+As well as using ternaries to make the code shorter - we use a few
+other of our "shortening" tricks:
+
+ * We use variable `$a` as this is a variable we don't need to `my`.
+
+ * We also store the split path name in `@_`, this has two advantages:
+   we don't need to `my` it and secondly - we can just use `shift`
+   without using a variable in the while loop - as `shift` in a
+   function shifts off `@_`
+
+ * We use **yoda** comparisons ( `"value" eq $variable` ) rather tha
+   the more normal `$variable eq "value"` as it means we can save a
+   byte { `$var eq''` vs `''eq$var` } as we don't need the extra space.
 
-## Addendum 1 - using hash keys
+ * We re-order the if/else so that the we unravel it into an:
+   `if() { } ifelse() {} else {}` format which is better for
+   nested ternaries - even if it may seen to be less readable.
 
-As some people have used hash slices to solve this problem - I thought I
-would look into this to -- Colin thought this would help in the perl
-review! I've added the following function:
+ * In the code above we separated the `$pointer - 1` in the `splice`
+   line with the `$pointer--` in the next line to back-track. We can
+   use `--$pointer` to achieve this all within the splice statement.
+
+ * To futher shorten the code in `canonical_path_compact` we
+   replace the `grep` clause with a regular expression which is
+   7-bytes shorter.
 
 ```perl
-sub find_val_hash {
-  my %list;
-  @list{ map { @{$_} } @{$_[1]} } = (); ## Quickest 
-  return exists $list{$_[0]} ? 1: 0;
+sub canonical_path_compact_opt {
+## Remove the regex in the grep and use ternarys...
+$a=1,@_=grep{''ne$_&&'.'ne$_}split/\//,shift;
+$_[$a]ne'..'?$a++:$a?splice@_,--$a,2:shift while$a<@_;
+join'/','',@_
+}
+```
+```perl
+sub canonical_path_compact {
+## Make shorted by using regex and ternarys
+$a=1,@_=grep{!/^\.?$/}split/\//,shift;
+$_[$a]ne'..'?$a++:$a?splice@_,--$a,2:shift while$a<@_;
+join'/','',@_
 }
 ```
 
-which converts the nested arrayref into an array for the keys of `%list`
-and then returns 1 if the value of a key in the hash. Like other methods
-this throws away the fact that the array is originally sorted...
+### The string code
 
-Looking at performance - this is approximately 53% slower than the search
-algorithm, and therefore less than a tenth of the speed of the optimal
-solution.
+Here we re-implement the two string algorithms in compact 1-liners.
 
-## Addendum 2 - passing an array rather than an arrayref
+ * `canonical_path_fast` and `canonical_path_fastest` correspond
+   directly to the two methods above.
 
-I quickly checked to see if by passing the matrix as an array rather than
-an arrayref was any faster - one less derefernce - but it is not - it
-appears to be roughtly the same and speed sometimes faster.
+ * `canonical_path_short` replaces the equality checks for "" and
+   "`.`" with the regex as we saw in the array code.
 
+ * `canonical_path_shortest` removes the need for one of the
+   ternary operators by performing string multiplication when
+   adding the directory to the list. If the regex returns true,
+   then `"/$_"x!/.../` is `"/$_"x 0` or "". If the regex returns
+   false then `"/$_"x!/.../` is `"/$_"x 1` or "`/$_`".
+   
+```perl 
+sub canonical_path_shortest {
+$a='';
+'..'ne$_?$a.="/$_"x!/^\.?$/:$a=~s#/[^/]+$## for split'/',shift;
+$a
+}
+```
 ```perl
-sub flatten_array {
-  shift @_;
-  my @list = map { @{$_} } @_;
-  return 1;
+sub canonical_path_short {
+$a='';
+/^\.?$/?0:'..'ne$_?$a.="/$_":$a=~s#/[^/]+$## for split'/',shift;
+$a
 }
 ```
-
-## Addendum 3 - using `List::Util` methods
-
-Just out of interest to compare the core `map` and `grep` against the various `List::Util` functions `first`/`any`, I've added 3 more functions:
-
 ```perl
-sub find_val_list_util {
-  my($v,$m) = @_;
-  my $t =  first { $_->[-1] >= $v } @{$m};
-  return ($t && any { $v == $_ } @{$t}) ? 1 : 0;
+sub canonical_path_fast {
+$a='';
+'.'ne$_&&''ne$_&&('..'ne$_?$a.="/$_":$a=~s#/[^/]+$##)for split'/',shift;
+$a
 }
-
-sub find_val_any_any {
-  my($v,$m) = @_;
-  return (any { $v>=$_->[0] && $v <=$_->[-1] && (any { $_ == $v } @{$_}) } @{$m}) ? 1 : 0;
+```
+```perl
+sub canonical_path_fastest {
+$a='';
+'.'ne$_&&''ne$_&&('..'ne$_?$a.='/'.$_:substr$a,rindex($a,'/'),~0,'')for split'/',shift;
+$a
 }
-
-sub find_val_any_any_naive {
-  my($v,$m) = @_;
-  return (any { any { $_ == $v } @{$_} } @{$m}) ? 1 : 0;
+```
+```perl
+my $s;
+sub canonical_path_global {
+$s='';
+'.'ne$_&&''ne$_&&('..'ne$_?$s.='/'.$_:substr$s,rindex($s,'/'),~0,'')for split'/',shift;
+$s
 }
 ```
 
-The last one is similar to the `grep - grep` method above but uses `any` - now in theory this
-should be faster as `any` stops at the first occcurance - but as `grep` is core and `any` is
-not there will be a performance penalty - especially for such a small set of values.
+## Performance of different methods
+
+We will look at some different versions of the code:
+
+  * whether we use an array or string to accumulate the resultant path
+  * Whether we use "readable" code or Perl hacks and tricks
+
+To see what aspects of our code makes it faster or slower
+
+### Summary of methods..
+ * "Long form" Perl...
+   * `canonical_path_double`      - Using a double loop
+   * `canonical_path_array`       - Using backtracking instead of inner loop
+   * `canonical_path_string`      - Use a string as the accumulator and mapping
+   * `canonical_path_string_fast` - As above - but using substr/rindex
+ * "One-liner" perl {arrays}
+   * `canonical_path_compact`     - short version of array code
+   * `canonical_path_compact_opt` - optimized version of above - 1-less regex
+ * "One-liner" perl {strings}
+   * `canonical_path_shortest` - most compact method
+   * `canonical_path_short`    - compact method
+   * `canonical_path_fast`     - replace one of the regex with equality checks
+   * `canonical_path_fastest`  - replace other regex with substr/rindex
+   * `canonical_path_global`   - as fastest but with global variable...
+
+### Performance of each method:
+
+|                |     Rate | @-st | $-st | $-sh | @-2l | @-cf | @-ft | $-cd | $-fa | $-cf | $-ft | $-gl |
+| -------------- | -------: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
+| @-shortest     | 17,483/s |   -- | -21% | -22% | -29% | -30% | -37% | -38% | -43% | -52% | -59% | -59% | 
+| $-shortest     | 22,026/s |  26% |   -- |  -2% | -11% | -11% | -21% | -21% | -28% | -40% | -48% | -49% | 
+| $-short        | 22,472/s |  29% |   2% |   -- |  -9% | -10% | -20% | -20% | -26% | -39% | -47% | -48% | 
+| @-code-2loop   | 24,631/s |  41% |  12% |  10% |   -- |  -1% | -12% | -12% | -19% | -33% | -42% | -43% | 
+| @-code-fastest | 24,876/s |  42% |  13% |  11% |   1% |   -- | -11% | -11% | -18% | -32% | -42% | -42% | 
+| @-fastest      | 27,933/s |  60% |  27% |  24% |  13% |  12% |   -- |  -0% |  -8% | -24% | -35% | -35% | 
+| $-code-fast    | 28,011/s |  60% |  27% |  25% |  14% |  13% |   0% |   -- |  -8% | -24% | -34% | -35% | 
+| $-fast         | 30,488/s |  74% |  38% |  36% |  24% |  23% |   9% |   9% |   -- | -17% | -29% | -29% | 
+| $-code-fastest | 36,765/s | 110% |  67% |  64% |  49% |  48% |  32% |  31% |  21% |   -- | -14% | -15% | 
+| $-fastest      | 42,735/s | 144% |  94% |  90% |  74% |  72% |  53% |  53% |  40% |  16% |   -- |  -1% | 
+| $-global-fast  | 43,103/s | 147% |  96% |  92% |  75% |  73% |  54% |  54% |  41% |  17% |   1% |   -- |
+
+### Summary
+What we see is:
+ * that the string code is faster than the array code,
+   by around 20-40%
+ * using compact "1-liner" code can be approximately 10%
+   faster.
+ * but using less regex's and replacing them with
+   eq/ne for comparisons and `substr`/`rindex` for
+   replacement/trimming improves the speed the most.
+    * approx 25-30% for removing the comparison regex for checking
+      `' '` or `'.'` and replacing with two `eq`/`ne`
+    * approx 30-40% for removing the substitute of the string
+      from the last `'/'` to the end of the string, with `rindex`
+      and the the four parameter version of `substr`.
+    * combining the two seems to double the performance!
+  * switching from local to global variables gets a minor
+    gain (about 1%) again due to memory management.
+
+## Conclusion
+
+So short code is interesting - but is not by a long shot the
+most efficient especially in respect of converting regexes into
+`substr`/`index`/`rindex`, allocation of variables, even if we
+keep it to a 1-liner.
+
+*e.g.* with the short code - we see the optimal short string code is
+twice as efficient as the shortest version - and only about 33% longer.
+
+One of the interesting things is that there is some discussion that
+avoiding concatenating strings by pushing them into an array and
+joining them is supposedly faster than just concatenating.... This
+seems to prove otherwise.. So don't assume everything you read - but
+check it yourself!
+
+# Challenge 2 - Climb Stairs
+
+You are given `$n` steps to climb - Write a script to find out the
+distinct ways to climb to the top. You are allowed to climb either
+1 or 2 steps at a time.
+
+## Assumption
+
+Although not clear - I just assumed that the response was a single
+numeric value.
+
+## Solution
+
+We first note that the formula for number of steps climbed can be
+seen to be.
+
+  `count_n = count_(n-1) + count_(n-2)`
+
+As the last step is either a 1-step (when there are therefore `count_(n-1)`
+options to get to that step) or 2-step (when there are therefore `count_(n-2)`
+options to get to that step)...
+
+This is a recognisable formula - it is just a fibonnaci sequence.
+
+## Brute force solution
+
+We could use a recursive method to get the fibonnaci values out - but
+that would have function call overheads - rather we can use just two
+variables to store the sequence, we define `$a` & `$b` to both be `1`
+and then each iteration through we set `$a` to `$b` and `$b` to the sum
+of `$a` & `$b`. We just then return the last value of `$b`.
 
-To speed things upw ithe add a check to see if the value is inside the row greater/equal to the
-first value and less than/equal to the last value. This makes a big difference.
+```perl
+sub climb {
+  my($a,$b) = (1,1);
+  ($a,$b) = ($b,$a+$b) foreach 2..$_[0];
+  return $b;
+}
+```
 
+This uses one of the nice features of perl in the fact that you can
+assign to more than one variable with the same statement, you often
+see this when you flip two values over.
 
-## Addendum 4 - caching the "flattening"
+Classically you would write:
 
-Optimizations like these are great if you are in the middle of a tight loop and calling the function a number
-of times - if this is the same matrix each time - we may get some performance boosts by pre-flattening the matrix.
+```perl
+my $t = $a;
+$a = $b;
+$b = $t;
+```
+but you in perl can write this as:
+```perl
+($a,$b)=($b,$a);
+```
+without the need of the additionaly (temporary) variable.
 
-I've included two more functions "Grep-@" & "Hash-%" which replicate "Grep-Map" and "Hash" to show how this can
-speed things up.. The compute on the fly hash is the slowest method - but pre-cacheing the hash is 30 times faster,
-the pre-flattened array speeds up the grep - but only by a factor of 2 and so is still only half the speed of
-the *do not flatten* versions.
+## Building a cache using state or global variables - or pre-computing
 
-**Note:** The second parameter in each of these is the respective flattened arrayref or hashref.
+If the call is being made repeatedly we can cache results - either
+using a "`state`" variable within the function or a "`global`" variable.
 
 ```perl
-sub find_val_grep_pre {
-  my($v,$m)=@_;
-  return 0 + grep { $_ == $v } @{$m};
+sub climb_cache {
+  state @cache = (1,1);
+  $cache[$_]=$cache[$_-1]+$cache[$_-2] foreach @cache .. $_[0];
+  return $cache[$_[0]];
 }
 
-sub find_val_hash_pre {
-  return exists $_[1]{$_[0]} ? 1: 0;
+my @glob_cache = (1,1);
+sub climb_cache_glob {
+  $glob_cache[$_]=$glob_cache[$_-1]+$glob_cache[$_-2] foreach @glob_cache .. $_[0];
+  return $glob_cache[$_[0]];
 }
 ```
 
-| Method      | Rate         | Hash      | AANaive   | Search    | GrepMap   | AnyAny    | ListUtil  | GrepGrep  | *Flatten*  | *Flatten@* | **preGrep** | DNFGen   | GrepExt  | DNF      | DNFOpt   | **preHash** |
-| ----------- | -----------: | --------: | --------: | --------: | --------: | --------: | --------: | --------: | ---------: | ---------: | ----------: | -------: | -------: | -------: | -------: | ---------:  |
-| Hash        |    2,019/s   |      --   |    -26%   |    -59%   |    -63%   |    -67%   |    -69%   |    -73%   |   *-76%*   |   *-76%*   |   **-82%**  |   -83%   |   -87%   |   -90%   |   -91%   |   **-98%**  | 
-| AANaive     |    2,714/s   |     34%   |      --   |    -44%   |    -50%   |    -56%   |    -58%   |    -64%   |   *-68%*   |   *-68%*   |   **-75%**  |   -78%   |   -82%   |   -87%   |   -88%   |   **-97%**  | 
-| Search      |    4,873/s   |    141%   |     80%   |      --   |    -10%   |    -20%   |    -24%   |    -35%   |   *-42%*   |   *-42%*   |   **-56%**  |   -60%   |   -68%   |   -76%   |   -79%   |   **-94%**  | 
-| GrepMap     |    5,435/s   |    169%   |    100%   |     12%   |      --   |    -11%   |    -15%   |    -27%   |   *-35%*   |   *-36%*   |   **-51%**  |   -55%   |   -64%   |   -73%   |   -76%   |   **-94%**  | 
-| AnyAny      |    6,127/s   |    204%   |    126%   |     26%   |     13%   |      --   |     -5%   |    -18%   |   *-27%*   |   *-27%*   |   **-44%**  |   -50%   |   -60%   |   -70%   |   -73%   |   **-93%**  | 
-| ListUtil    |    6,427/s   |    218%   |    137%   |     32%   |     18%   |      5%   |      --   |    -14%   |   *-24%*   |   *-24%*   |   **-42%**  |   -47%   |   -58%   |   -68%   |   -72%   |   **-92%**  | 
-| GrepGrep    |    7,452/s   |    269%   |    175%   |     53%   |     37%   |     22%   |     16%   |      --   |   *-11%*   |   *-12%*   |   **-32%**  |   -39%   |   -51%   |   -63%   |   -68%   |   **-91%**  | 
-| *Flatten*   |    8,418/s*  |   *317%*  |   *210%*  |    *73%*  |    *55%*  |    *37%*  |    *31%*  |    *13%*  |     *--*   |    *-0%*   |  ***-24%*** |  *-31%*  |  *-45%*  |  *-58%*  |  *-63%*  |  ***-90%*** | 
-| *Flatten@*  |    8,446/s*  |   *318%*  |   *211%*  |    *73%*  |    *55%*  |    *38%*  |    *31%*  |    *13%*  |     *0%*   |     *--*   |  ***-23%*** |  *-31%*  |  *-45%*  |  *-58%*  |  *-63%*  |  ***-90%*** | 
-| **preGrep** | **11,013/s** |  **446%** |  **306%** |  **126%** |  **103%** |   **80%** |   **71%** |   **48%** |  ***31%*** |  ***30%*** |     **--**  | **-10%** | **-28%** | **-46%** | **-52%** |   **-87%**  | 
-| DNFGen      |   12,195/s   |    504%   |    349%   |    150%   |    124%   |     99%   |     90%   |     64%   |    *45%*   |    *44%*   |    **11%**  |     --   |   -20%   |   -40%   |   -47%   |   **-86%**  | 
-| GrepExt     |   15,244/s   |    655%   |    462%   |    213%   |    180%   |    149%   |    137%   |    105%   |    *81%*   |    *80%*   |    **38%**  |    25%   |     --   |   -25%   |   -34%   |   **-82%**  | 
-| DNF         |   20,243/s   |    903%   |    646%   |    315%   |    272%   |    230%   |    215%   |    172%   |   *140%*   |   *140%*   |    **84%**  |    66%   |    33%   |     --   |   -12%   |   **-76%**  | 
-| DNFOpt      |   22,936/s   |   1036%   |    745%   |    371%   |    322%   |    274%   |    257%   |    208%   |   *172%*   |   *172%*   |   **108%**  |    88%   |    50%   |    13%   |     --   |   **-73%**  | 
-| **preHash** | **84,746/s** | **4098%** | **3022%** | **1639%** | **1459%** | **1283%** | **1219%** | **1037%** | ***907%*** | ***903%*** |   **669%**  | **595%** | **456%** | **319%** | **269%** |     **--**  |
-
-## Summary...
-
-So as we can see our **DNFOpt** (`find_val_dnf_optimal`) method is the optimal method running at approx 1,700,000 calculations per second, followed by the unoptimized **DNF** (`find_val_dnf`) method at approximately 1,500,000 caclulations per second - these are both optimized to know the matrix is 5x5.
-
-Interestingly the fastest method (without explicitly coding for 5x5) is the optimized version of the `grep`x`grep` of the **GrepExt** (`find_val_grep_grep_ext`) method at approximately 1,100,000 calculations per second. This slighlty nudges out the generic form of the nested binary search **DNFGen** (`find_val_general_dnf`) method at 900,000 calculations per second.
-
-Other things to note..:
-  * `grep`x`grep` method **GrepGrep** (`find_val_grep_grep`) is faster than all the versions using `List::Util` using `any` and `first` by approximately 15%, actually comparing like for like with **AANaive** (`find_val_any_any_naive`) is 2.75 times faster... I guess this is because `grep` is built in... There may be a sweet point where this changes as the matrix gets large though.
-  * The worst methods are: **Hash** (`find_val_hash`), **AANaive** (`find_val_any_any_naive`) and **Search** (`find_val_search`) at approximately 150,000, 200,000 and 350,000 calculations per second.
-  * The range between best and worst is an order of magnitude.
-  * I haven't included in this summary the two **preGrep** and **preHash** methods - which pre-flatten the matrix - and note that the **preHash** method is the fastest by far - nearly 4 times faster than are *best* method **DNFOpt**, even though the **Hash** method was worse (the ration of the two hash methods is approximately 40:1). This shows that if it is possible to cache some part of the calculation it can greatly improve performance if called lots of time.
-  * All these comparisons are based on the 5x5 matrix in the question - and relative speeds may change as the matrix dimensions get larger.
-
-** Quick extra addendum
-
-I've added another method which implements a binary search across the unflattened matrix - by effectively flattening the matrix and a 0..n location back into x,y-coorinates...
+Finally we look at the cache check overhead by pre-computing the values into 
+a cache and then run:
 
 ```perl
-sub find_val_binary {
-  my($v,$m)=@_;
-  return 0 if$v<$m->[0][0]||$v>$m->[-1][-1];
-  my $size = @{$m};
-  my($v2,$n,$l,$r)=(0,0,0,$size*$size-1);
-  while($l<=$r) {
-    $n=($l+$r)>>1;
-    return 1 if $v == ($v2 = $m->[$n/$size][$n%$size]);
-    $v>$v2?($l=$n+1):($r=$n-1);
-  }
-  return 0;
+sub climb_lookup {
+  return $ans[$_[0]];
 }
 ```
 
-Interestingly this is approximately 25% slower than the **DNFGen** method... I would think because of the two *division operations* when getting the matrix value.
-
-|        |    Rate | Binary | DNFGen |
-| ------ | ------: | -----: | -----: |
-| Binary |  9200/s |     -- |   -25% |
-| DNFGen | 122
author	Mohammad S Anwar <Mohammad.Anwar@yahoo.com>	2021-05-14 14:28:21 +0100
committer	GitHub <noreply@github.com>	2021-05-14 14:28:21 +0100
commit	2023334b99b3c28e54d5dd17f59b4496a0370867 (patch)
tree	443d0cc2d791a2a219b4348125c185d6f2f5848f /challenge-112
parent	15502017ba3d59e96e00453936f6ea9d075f00ae (diff)
parent	9fb1f7c6553ec7338e3e86ed0c35818d74f8b421 (diff)
download	perlweeklychallenge-club-2023334b99b3c28e54d5dd17f59b4496a0370867.tar.gz perlweeklychallenge-club-2023334b99b3c28e54d5dd17f59b4496a0370867.tar.bz2 perlweeklychallenge-club-2023334b99b3c28e54d5dd17f59b4496a0370867.zip