- Added template for week 113.

author: Mohammad S Anwar <mohammad.anwar@yahoo.com> 2021-05-17 06:57:07 +0100
committer: Mohammad S Anwar <mohammad.anwar@yahoo.com> 2021-05-17 06:57:07 +0100
commit: c3cd45087006d3f63b05219b8280a25dc1ea7ba9 (patch)
tree: ebd94e12f33bd5860791fb2d7ac49d4074c989f3 /challenge-113/james-smith
parent: b848049a94b216d459403b8590b26ec59e5746fd (diff)
download: perlweeklychallenge-club-c3cd45087006d3f63b05219b8280a25dc1ea7ba9.tar.gz
perlweeklychallenge-club-c3cd45087006d3f63b05219b8280a25dc1ea7ba9.tar.bz2
perlweeklychallenge-club-c3cd45087006d3f63b05219b8280a25dc1ea7ba9.zip
1 files changed, 551 insertions, 0 deletions
diff --git a/challenge-113/james-smith/README.md b/challenge-113/james-smith/README.md
new file mode 100644
index 0000000000..35cdf3b77a
--- /dev/null
+++ b/challenge-113/james-smith/README.md
@@ -0,0 +1,551 @@
+# Perl Weekly Challenge #112
+
+You can find more information about this weeks, and previous weeks challenges at:
+
+  https://perlweeklychallenge.org/
+
+If you are not already doing the challenge - it is a good place to practise your
+**perl** or **raku**. If it is not **perl** or **raku** you develop in - you can
+submit solutions in whichever language you feel comfortable with.
+
+# Challenge 1 - Canonical Path
+
+**You are given a string path, starting with a slash ‘/'. Write a script to
+convert the given absolute path to the simplified canonical path.**
+
+In a Unix-style file system:
+
+ * A period '.' refers to the current directory
+ * A double period '..' refers to the directory up a level
+ * Multiple consecutive slashes ('//') are treated as a single slash '/'
+
+The canonical path format:
+
+ * The path starts with a single slash '/'.
+ * Any two directories are separated by a single slash '/'.
+ * The path does not end with a trailing '/'.
+ * The path only contains the directories on the path from the root directory to the target file or directory
+
+## Note....
+
+Please note there is an ambiguity in the question - when then path contains no
+files - as it cannot start with a '/' and not end with a '/' - so we have
+to make a choice do we return '/' or do we return ''.
+
+In our case we decide to return it as the empty string.
+This has the advantage that there is a level of consistency if you do...
+
+$parent_dir.canonical_path('/a');
+
+or
+
+$parent_dir.canonical_path('/');
+
+then it will always end without a "/";
+
+To change the value the functions return you can replace the return
+statement with either `return q(/). join q(/), @list` or
+`return $str || q(/)`, depending on whether or not the function
+stores the path elements in an array or a string.
+
+## Solution to challenge 1
+
+Again another interesting challenge ... we can see if we can improve
+performance.
+
+Initially it looks quite complex - there are two solutions classes:
+
+ * splitting the string and creating/modifying an array of the
+   individual parts
+
+ * splitting the string and creating/modifying a string
+
+## "Expanded perl code"
+
+### Array - two loops...
+
+ * We first split the directory into path parts and remove any that
+   are empty or "`.`".
+
+ * We loop through the array until we find a '`..`' if we do we
+   remove it and the previous entry.
+
+ * We then repeat this until we don't find a '`..`'
+
+ * To jump out of the loop we use `next "label"` to not just skip out
+   of the inner loop, but to also to restart the parent loop at the
+   same time.
+
+ * Finally remove an initial "`..`" which wouldn't get removed by this
+   algorithm.
+
+ * and join the array together with '`/`' - we add the `''` so that we
+   get the leading '`/`'.
+
+```perl
+sub canonical_path_double {
+  my $directory_path = shift;
+  my @directory_names   = grep { $_ ne '' &&
+                                 $_ ne '.' }
+                          split m{/},
+                          $directory_path;
+
+  OUTER: while(1) {
+    foreach (1..$#directory_names) {
+      next unless $directory_names[$_] eq '..';
+      splice @directory_names,$_-1,2;
+      next OUTER;
+    }
+    last;
+  }
+  shift @directory_names if @directory_names && $directory_names[0] eq '..';
+  return join '/','',@directory_names;
+}
+```
+
+### Array - 1-loop
+
+We don't need to use a double loop - we can just treat
+the resultant array as a queue either pulling "`..`" or
+pushing (not "` `" or "`.`") onto the queue.
+
+```perl
+sub canonical_path_array {
+  my $directory_path = shift;
+  my @parts          = split m{/}, $directory_path;
+  my @directory_names;
+  foreach my $part ( @parts ) {
+    next if $part eq '';
+    next if $part eq '.';
+    if($part eq '..' ) {
+      pop @directory_names;
+    } else {
+      push @directory_names, $part;
+    }
+  }
+  return join '/','',@directory_names;
+}
+```
+
+### String - 1-loop
+
+Rather than store the parts in a list - we use a string to store
+this canonical path - and we either add directories to the end of
+it or remove them if we come across a "`..`", in a similar way to
+the `push`/`pop` that we used above.
+
+  * We achieve the former - by just concatenating a "`/`" and the
+    name to the end of the string.
+
+  * The latter we strip this string off with a regex substitution:
+    `s{/[^/]+\Z}{}`. This works in all cases wherever the "`..`"
+    is in the list.
+
+  * Note as we are looping through the array we can ignore the
+    grep and just skip out of the loop if the name is either ""
+    or "`.`".
+
+```perl
+sub canonical_path_string {
+  my $path = shift;
+  my @directories    = split m{/}, $path;
+  my $canonical_path = '';
+  foreach my $directory_name ( @directories ) {
+    next if $directory_name eq '';
+    next if $directory_name eq '.';
+    if( $directory_name eq q(..) ) {
+      $canonical_path =~ s{/[^/]+\Z}{};
+    } else {
+      $canonical_path .= q(/) . $directory_name;
+    }
+  }
+  return $canonical_path;
+}  
+```
+### String fast - 1-loop
+
+Regexs are not the fastest way to perform simple matches strings
+(and intern to modify them). We can speed up the trimming of the
+canonical path by replacing the regex solution by using `rindex`
+and the four-parameter version of `substr`.
+
+  * `rindex $str, $needle` finds the index of `$needle` in `$str`.
+    Here we use it to find the last `/` in string.
+
+  * `substr $str, $offset, $length, $substitute` finds the chunk of
+    the string `$str` from `$offset` of given length `$length`. If a
+    fourth parameter is set then this region of the string is replaced
+    by `$substitute`.
+
+  * We can use this to truncate the string by doing:
+  
+    `substr $path, rindex( $path, '/' ), ~0, '';`
+
+    In the two-parameter version of `substr` if we omit length then
+    this returns to the end of the string. In the four-parameter
+    version - we can't omit this - so have to use an alternative
+    value - it has to be bigger than the longest string possible.
+    We use the "bitwise-negation" operator "`~`" to generate the
+    largest value possible. This is: 18,446,744,073,709,551,615 or
+    just shy of 16 Exabytes - I believe this should be big enough!
+
+    (Note you can use -ve numbers but there is no way of doing `-0`
+    to trim to the end of the line)
+
+The script then becomes:
+
+```perl
+sub canonical_path_string_fast {
+  my $path = shift;
+  my @directories    = split m{/}, $path;
+  my $canonical_path = '';
+  foreach my $directory_name ( @directories ) {
+    next if $directory_name eq '';
+    next if $directory_name eq '.';
+    if( $directory_name eq q(..) ) {
+      substr $canonical_path,
+             rindex( $canonical_path, '/' ),
+             ~0, '';
+    } else {
+      $canonical_path .= q(/) . $directory_name;
+    }
+  }
+  return $canonical_path;
+}  
+```
+
+## "Compact perl code" AKA 1-liners..
+
+Now we can look at how we can compact this code. Here though we
+need to consider the trade off between size and performance - the
+smallest code is not necessarily the fastest - as some of the
+tricks to make the code compact also make it slower.
+
+### The array code...
+
+We have two versions of the code - which are slightly different
+The `canonical_path_compact_opt` function is probably closer to
+the 1-loop array function above. We use nested-ternaries to replace
+the `if else` blocks.
+
+As well as using ternaries to make the code shorter - we use a few
+other of our "shortening" tricks:
+
+ * We use **yoda** comparisons ( `"value" eq $variable` ) rather tha
+   the more normal `$variable eq "value"` as it means we can save a
+   byte { `$var eq''` vs `''eq$var` } as we don't need the extra space.
+
+ * We re-order the if/else so that the we unravel it into an:
+   `if() { } elsif() {} else {}` format which is better for
+   nested ternaries - even if it may seenm to be less readable.
+
+ * To futher shorten the code in `canonical_path_compact` we
+   replace the filtering clause with a regular expression which is
+   7-bytes shorter.
+
+```perl
+
+sub canonical_path_compact_opt {
+  my @d=();
+  ''ne$_&&'.'ne$_&&('..'eq$_?pop@d:push@d,$_)for split/\//,shift;
+  join'/','',@d;
+}
+
+my @g;
+sub canonical_path_compact_glob {
+  @g=();
+  ''ne$_&&'.'ne$_&&('..'eq$_?pop@g:push@g,$_)for split/\//,shift;
+  join'/','',@g;
+}
+
+sub canonical_path_compact {
+  my @d=();
+  /^\.?$/||('..'eq$_?pop@d:push@d,$_)for split/\//,shift;
+  return join'/','',@d;
+}
+```
+
+### The string code
+
+Here we re-implement the two string algorithms in compact 1-liners.
+
+ * `canonical_path_fast` and `canonical_path_fastest` correspond
+   directly to the two methods above.
+
+ * `canonical_path_short` replaces the equality checks for "" and
+   "`.`" with the regex as we saw in the array code.
+
+ * `canonical_path_shortest` removes the need for one of the
+   ternary operators by performing string multiplication when
+   adding the directory to the list. If the regex returns true,
+   then `"/$_"x!/.../` is `"/$_"x 0` or "". If the regex returns
+   false then `"/$_"x!/.../` is `"/$_"x 1` or "`/$_`".
+   
+```perl 
+sub canonical_path_shortest {
+$a='';
+'..'ne$_?$a.="/$_"x!/^\.?$/:$a=~s#/[^/]+$## for split'/',shift;
+$a
+}
+```
+```perl
+sub canonical_path_short {
+$a='';
+/^\.?$/?0:'..'ne$_?$a.="/$_":$a=~s#/[^/]+$## for split'/',shift;
+$a
+}
+```
+```perl
+sub canonical_path_fast {
+$a='';
+'.'ne$_&&''ne$_&&('..'ne$_?$a.="/$_":$a=~s#/[^/]+$##)for split'/',shift;
+$a
+}
+```
+```perl
+sub canonical_path_fastest {
+$a='';
+'.'ne$_&&''ne$_&&('..'ne$_?$a.='/'.$_:substr$a,rindex($a,'/'),~0,'')for split'/',shift;
+$a
+}
+```
+```perl
+my $s;
+sub canonical_path_global {
+$s='';
+'.'ne$_&&''ne$_&&('..'ne$_?$s.='/'.$_:substr$s,rindex($s,'/'),~0,'')for split'/',shift;
+$s
+}
+```
+
+## Performance of different methods
+
+We will look at some different versions of the code:
+
+  * whether we use an array or string to accumulate the resultant path
+  * Whether we use "readable" code or Perl hacks and tricks
+
+To see what aspects of our code makes it faster or slower
+
+### Summary of methods..
+ * "Long form" Perl...
+   * `canonical_path_double`      - Using a double loop
+   * `canonical_path_array`       - Using backtracking instead of inner loop
+   * `canonical_path_string`      - Use a string as the accumulator and mapping
+   * `canonical_path_string_fast` - As above - but using substr/rindex
+ * "One-liner" perl {arrays}
+   * `canonical_path_compact`      - short version of array code
+   * `canonical_path_compact_opt`  - optimized version of above - 1-less regex
+   * `canonical_path_compact_glob` - as above but with global variable
+ * "One-liner" perl {strings}
+   * `canonical_path_shortest` - most compact method
+   * `canonical_path_short`    - compact method
+   * `canonical_path_fast`     - replace one of the regex with equality checks
+   * `canonical_path_fastest`  - replace other regex with substr/rindex
+   * `canonical_path_global`   - as fastest but with global variable...
+
+### Performance of each method:
+
+|              |    Rate  | @-sh | $-st | $-sh | @&2l | $&fa | $-fa | @&ft | $&ft | @-ft | @-gl | $-ft | $-gl |
+| ------------ | -------- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
+| @-short      | 20,877/s |   -- |  -6% |  -8% | -15% | -25% | -32% | -38% | -43% | -45% | -45% | -50% | -51% |
+| $-shortest   | 22,124/s |   6% |   -- |  -2% | -10% | -21% | -28% | -34% | -40% | -41% | -42% | -47% | -48% |
+| $-short      | 22,573/s |   8% |   2% |   -- |  -8% | -19% | -27% | -33% | -39% | -40% | -41% | -46% | -47% |
+| @-&-2-loop   | 24,631/s |  18% |  11% |   9% |   -- | -12% | -20% | -26% | -33% | -35% | -35% | -41% | -42% |
+| $-&-fast     | 27,933/s |  34% |  26% |  24% |  13% |   -- |  -9% | -16% | -24% | -26% | -27% | -34% | -35% |
+| $-fast       | 30,769/s |  47% |  39% |  36% |  25% |  10% |   -- |  -8% | -17% | -18% | -19% | -27% | -28% |
+| @-&-fastest  | 33,445/s |  60% |  51% |  48% |  36% |  20% |   9% |   -- |  -9% | -11% | -12% | -20% | -22% |
+| $-&-fastest  | 36,900/s |  77% |  67% |  63% |  50% |  32% |  20% |  10% |   -- |  -2% |  -3% | -12% | -14% |
+| @-fastest    | 37,736/s |  81% |  71% |  67% |  53% |  35% |  23% |  13% |   2% |   -- |  -1% | -10% | -12% |
+| @-global     | 38,168/s |  83% |  73% |  69% |  55% |  37% |  24% |  14% |   3% |   1% |   -- |  -9% | -11% |
+| $-fastest    | 42,017/s | 101% |  90% |  86% |  71% |  50% |  37% |  26% |  14% |  11% |  10% |   -- |  -2% |
+| $-global     | 42,735/s | 105% |  93% |  89% |  74% |  53% |  39% |  28% |  16% |  13% |  12% |   2% |   -- |
+
+
+### Summary
+What we see is:
+ * that the optimized string code is faster than the array code,
+   by around 12-15%
+ * using compact "1-liner" code can be approximately 10%
+   faster.
+ * but using less regex's and replacing them with
+   eq/ne for comparisons and `substr`/`rindex` for
+   replacement/trimming improves the speed the most.
+    * approx 25-30% for removing the comparison regex for checking
+      `' '` or `'.'` and replacing with two `eq`/`ne`
+    * approx 30-40% for removing the substitute of the string
+      from the last `'/'` to the end of the string, with `rindex`
+      and the the four parameter version of `substr`.
+    * combining the two seems to double the performance!
+  * switching from local to global variables gets a minor
+    gain (about 1-2%) again due to memory management.
+
+## Conclusion
+
+So short code is interesting - but is not by a long shot the
+most efficient especially in respect of converting regexes into
+`substr`/`index`/`rindex`, allocation of variables, even if we
+keep it to a 1-liner.
+
+*e.g.* with the short code - we see the optimal short string code is
+twice as efficient as the shortest version - and only about 33% longer.
+
+One of the interesting things is that there is some discussion that
+avoiding concatenating strings by pushing them into an array and
+joining them is supposedly faster than just concatenating.... This
+seems to prove otherwise.. So don't assume everything you read - but
+check it yourself!
+
+# Challenge 2 - Climb Stairs
+
+You are given `$n` steps to climb - Write a script to find out the
+distinct ways to climb to the top. You are allowed to climb either
+1 or 2 steps at a time.
+
+## Assumption
+
+Although not clear - I just assumed that the response was a single
+numeric value.
+
+## Solution
+
+We first note that the formula for number of steps climbed can be
+seen to be.
+
+  `count_n = count_(n-1) + count_(n-2)`
+
+As the last step is either a 1-step (when there are therefore `count_(n-1)`
+options to get to that step) or 2-step (when there are therefore `count_(n-2)`
+options to get to that step)...
+
+This is a recognisable formula - it is just a fibonnaci sequence.
+
+## Brute force solution
+
+We could use a recursive method to get the fibonnaci values out - but
+that would have function call overheads - rather we can use just two
+variables to store the sequence, we define `$a` & `$b` to both be `1`
+and then each iteration through we set `$a` to `$b` and `$b` to the sum
+of `$a` & `$b`. We just then return the last value of `$b`.
+
+```perl
+sub climb {
+  my($a,$b) = (1,1);
+  ($a,$b) = ($b,$a+$b) foreach 2..$_[0];
+  return $b;
+}
+```
+
+This uses one of the nice features of perl in the fact that you can
+assign to more than one variable with the same statement, you often
+see this when you flip two values over.
+
+Classically you would write:
+
+```perl
+my $t = $a;
+$a = $b;
+$b = $t;
+```
+but you in perl can write this as:
+```perl
+($a,$b)=($b,$a);
+```
+without the need of the additionaly (temporary) variable.
+
+## Building a cache using state or global variables - or pre-computing
+
+If the call is being made repeatedly we can cache results - either
+using a "`state`" variable within the function or a "`global`" variable.
+
+```perl
+sub climb_cache {
+  state @cache = (1,1);
+  $cache[$_]=$cache[$_-1]+$cache[$_-2] foreach @cache .. $_[0];
+  return $cache[$_[0]];
+}
+
+my @glob_cache = (1,1);
+sub climb_cache_glob {
+  $glob_cache[$_]=$glob_cache[$_-1]+$glob_cache[$_-2] foreach @glob_cache .. $_[0];
+  return $glob_cache[$_[0]];
+}
+```
+
+Finally we look at the cache check overhead by pre-computing the values into 
+a cache and then run:
+
+```perl
+sub climb_lookup {
+  return $ans[$_[0]];
+} 
+```
+
+## Mathematical formula solution
+
+There is Binet's formula for the `n`th fibonacci number which is:
+
+```
+     phi^n - 1/(-phi)^n
+fn = ------------------
+         sqrt 5
+```
+
+Where `phi` is the golden ratio or 1.618,033,988 == (1+sqrt 5)/2, this
+number crops up in many different places from art to nature.
+
+To speed up the calculation we compute `(phi^n)` and to get the second
+value we note that this can be written as `(-1)^n/(phi^n)`. So we only
+need to calculate `(phi^n)` once. Also we note `(-1)^n` can be
+rewritten as `n&1?1:-1`;
+
+In reality we don't even need to do this last trick, the contribution
+to the sum of '(-1)^n/(phi^n)/sqrt 5' is going to be less than `0.5`
+for all `n>=0` we can just reduce the formula to the first part to
+
+```
+fn = floor  (phi^n/sqrt 5)
+```
+
+```perl
+sub climb_fib {
+  my $q = ((1 + sqrt 5)/2)**($_[0]+1);
+  return int(0.001+ ($q - ($_[0]&1?1:-1)/$q)*sqrt 0.2);
+}
+
+sub climb_fib_1liner {
+  return int(0.001 + (($a = ((1+sqrt 5)/2)**($_[0]+1)) - ($_[0]&1?1:-1)/$a)*sqrt 0.2);
+}
+
+sub climb_fib_approx {
+  return int(0.4 + (0.5+sqrt 1.25)**($_[0]+1)*sqrt 0.2);
+}
+```
+
+## Analysis and conclusion
+
+The following are data for computing all values up to "50 steps".
+
+|       |      Rate |  climb |   fib | fib-1 | cache | g-cch | fib-a | look  |
+| ----- |  -------: |  ----: | ----: | ----: | ----: | ----: | ----: | ----: |
+| climb |   7,145/s |     -- |  -86% |  -88% |  -89% |  -89% |  -92% |  -96% |
+| fib   |  52,854/s |   640% |    -- |   -8% |  -16% |  -21% |  -39% |  -72% |
+| fib-1 |  57,208/s |   701% |    8% |    -- |   -9% |  -14% |  -34% |  -70% |
+| cache |  62,657/s |   777% |   19% |   10% |    -- |   -6% |  -28% |  -67% |
+| g-cch |  66,489/s |   831% |   26% |   16% |    6% |    -- |  -23% |  -65% |
+| fib-a |  86,505/s | 1,111% |   64% |   51% |   38% |   30% |    -- |  -54% |
+| look  | 189,394/s | 2,551% |  258% |  231% |  202% |  185% |  119% |    -- |
+
+
+ * Using "Binet's" formula we see we get approx `8x` the speed of
+   the original `climb` function.
+ * Using the approximation to Binet's formulate we see we get a factor
+   of about `12x` speed up.
+ * Using the cache seems to give about a `9x` speed gain - the `global`
+   variable version is better than the `state` version.
+ * Interestingly if you pre-compute the cache then the speed gain is
+   over `25x` to the original and `3x` times the speed of the basic
+   cache function - this is probably due to the overhead of checking
+   to see if the number is already in the cache.
+
+So some food for thought on how to best handle calls within tight loops.
author	Mohammad S Anwar <mohammad.anwar@yahoo.com>	2021-05-17 06:57:07 +0100
committer	Mohammad S Anwar <mohammad.anwar@yahoo.com>	2021-05-17 06:57:07 +0100
commit	c3cd45087006d3f63b05219b8280a25dc1ea7ba9 (patch)
tree	ebd94e12f33bd5860791fb2d7ac49d4074c989f3 /challenge-113/james-smith
parent	b848049a94b216d459403b8590b26ec59e5746fd (diff)
download	perlweeklychallenge-club-c3cd45087006d3f63b05219b8280a25dc1ea7ba9.tar.gz perlweeklychallenge-club-c3cd45087006d3f63b05219b8280a25dc1ea7ba9.tar.bz2 perlweeklychallenge-club-c3cd45087006d3f63b05219b8280a25dc1ea7ba9.zip