aboutsummaryrefslogtreecommitdiff
path: root/challenge-112
diff options
context:
space:
mode:
authorMohammad S Anwar <Mohammad.Anwar@yahoo.com>2021-05-14 14:28:21 +0100
committerGitHub <noreply@github.com>2021-05-14 14:28:21 +0100
commit2023334b99b3c28e54d5dd17f59b4496a0370867 (patch)
tree443d0cc2d791a2a219b4348125c185d6f2f5848f /challenge-112
parent15502017ba3d59e96e00453936f6ea9d075f00ae (diff)
parent9fb1f7c6553ec7338e3e86ed0c35818d74f8b421 (diff)
downloadperlweeklychallenge-club-2023334b99b3c28e54d5dd17f59b4496a0370867.tar.gz
perlweeklychallenge-club-2023334b99b3c28e54d5dd17f59b4496a0370867.tar.bz2
perlweeklychallenge-club-2023334b99b3c28e54d5dd17f59b4496a0370867.zip
Merge pull request #4077 from drbaggy/master
Finished the readme... blog page...
Diffstat (limited to 'challenge-112')
-rw-r--r--challenge-112/james-smith/README.md935
-rw-r--r--challenge-112/james-smith/blog.txt1
-rw-r--r--challenge-112/james-smith/perl/ch-1.pl10
-rw-r--r--challenge-112/james-smith/perl/ch-2.pl60
4 files changed, 500 insertions, 506 deletions
diff --git a/challenge-112/james-smith/README.md b/challenge-112/james-smith/README.md
index 71e2905051..3bc69d2fb2 100644
--- a/challenge-112/james-smith/README.md
+++ b/challenge-112/james-smith/README.md
@@ -1,4 +1,4 @@
-# Perl Weekly Challenge #111
+# Perl Weekly Challenge #112
You can find more information about this weeks, and previous weeks challenges at:
@@ -8,592 +8,557 @@ If you are not already doing the challenge - it is a good place to practise your
**perl** or **raku**. If it is not **perl** or **raku** you develop in - you can
submit solutions in whichever language you feel comfortable with.
-# Challenge 1 - Search Matrix
+# Challenge 1 - Canonical Path
-**You are given 5x5 matrix filled with integers such that each row is sorted from left to
-right and the first integer of each row is greater than the last integer of the previous
-row.**
+**You are given a string path, starting with a slash ‘/'. Write a script to
+convert the given absolute path to the simplified canonical path.**
-**Write a script to find a given integer in the matrix using an efficient search algorithm.**
+In a Unix-style file system:
-## Solution to challenge 1
+ * A period '.' refers to the current directory
+ * A double period '..' refers to the directory up a level
+ * Multiple consecutive slashes ('//') are treated as a single slash '/'
-Today's is an interesting challenge ... to check to see if a value is in the matrix....
+The canonical path format:
-Initially this looks like we need to define some efficient sorting algorithm that flattens
-the array and then searches it {e.g. binary search}.
+ * The path starts with a single slash '/'.
+ * Any two directories are separated by a single slash '/'.
+ * The path does not end with a trailing '/'.
+ * The path only contains the directories on the path from the root directory to the target file or directory
-We implement this... but we compare it against some alternative simple methods. We don't
-get the result we expect...
+## Note....
-## Binary search
+Please note there is an ambiguity in the question - when then path contains no
+files - as it cannot start with a '/' and not end with a '/' - so we have
+to make a choice do we return '/' or do we return ''.
-We first flatten the array. Then we loop until the array is empty or it has a solitary value
+In our case we decide to return it as the empty string.
+This has the advantage that there is a level of consistency if you do...
- If we get a match with the "middle" value we return 1;
- Otherwise we remove the half of the list using `splice` which
- the number isn't in.
-
-We return 0 if the array has length 0. If the length is 1 we return
-1 or 0 depending on whether the array is search value or not.
+$parent_dir.canonical_path('/a');
-```perl
-sub find_val_search {
- my( $val, $m, @list ) = ( $_[0], 0, map { @{$_} } @{$_[1]} );
-
- $list[ $m = @list >> 1 ] == $val ? ( return 1 )
- : $list[ $m ] > $val ? ( splice @list, $m )
- : ( splice @list, 0, $m+1 )
- while @list>1;
- return @list && $list[0] == $val ? 1 : 0;
-}
-```
+or
-## The other (better) methods:
+$parent_dir.canonical_path('/');
-### grep on the flattened array
+then it will always end without a "/";
-```perl
-sub find_val_grep_map {
- my($v,$m)=@_;
- return 0 + grep { $_ == $v } map { @{$_} } @{$m};
-}
-```
+To change the value the functions return you can replace the return
+statement with either `return q(/). join q(/), @list` or
+`return $str || q(/)`, depending on whether or not the function
+stores the path elements in an array or a string.
-### reducing array with grep and then combining with map
+## Solution to challenge 1
-Uses `grep` on each sub array to return either an
-empty array or an array containing the match.
-We then use `grep` to further filter these results
-for existances of the array.
+Again another interesting challenge ... we can see if we can improve
+performance.
-The resultant array will have length `0` or `1`.
+Initially it looks quite complex - there are two solutions classes:
-```perl
-sub find_val_grep_grep {
- my($v,$m)=@_;
- return 0 + grep { grep { $_ == $v } @{$_} } @{$m};
-}
-```
+ * splitting the string and creating/modifying an array of the
+ individual parts
-Now to avoid as many of the `grep`s as possible we can do two things. First check the
-number is in the range of the matrix - this skips the outer `grep` for some values and
-secondly - check the value is in the range for each line you are looping over.... This
-leads us to:
+ * splitting the string and creating/modifying a string
-```perl
-sub find_val_grep_grep_ext {
- my($v,$m)=@_;
- return $v<$m->[0][0] || $v > $m->[-1][-1]
- ? 0
- : 0 + grep { $v>=$_->[0] && $v<=$_->[-1] && grep { $_ == $v } @{$_} } @{$m};
-}
-```
-## Efficiency
+## "Expanded perl code"
-We use `cmpthese` to compare the performance...
+### Array - two loops...
-Our methods are:
- * **Search**: `find_val_search`
- * **GrepMap**: `find_val_grep_map`
- * **GrepGrep**: `find_val_grep_grep`
- * **GrepExt**: `find_val_grep_grep_ext`
- * **Flatten**: just flattens the array - basic benchmark to compare efficiency to that function....
+ * We first split the directory into path parts and remove any that
+ are empty or "`.`".
-Timings using `Benchmark::cmpthese`
+ * We loop through the array until we find a '`..`' if we do we
+ remove it and the previous entry.
-| | Rate | Search | GrepMap | GrepGrep | *Flatten* | GrepExt |
-| --------- | --------: | -----: | ------: | -------: | --------: | ------: |
-| Search | 4,857/s | -- | -3% | -34% | *-42%* | -68% |
-| GrepMap | 4,995/s | 3% | -- | -32% | *-40%* | -67% |
-| GrepGrep | 7,391/s | 52% | 48% | -- | *-11%* | -52% |
-| *Flatten* | *8,326/s* | *71%* | *67%* | *13%* | *--* | *-46%* |
-| GrepExt | 15,337/s | 216% | 207% | 108% | *84%* | -- |
+ * We then repeat this until we don't find a '`..`'
+ * To jump out of the loop we use `next "label"` to not just skip out
+ of the inner loop, but to also to restart the parent loop at the
+ same time.
-Notes:
+ * Finally remove an initial "`..`" which wouldn't get removed by this
+ algorithm.
- * *Flatten* is for comparison only - it actually does nothing other than flatten
- the array - this highlights how efficient each algorithm is (and can be)
-
- * So we see that the grep_grep solution is nearly 60% more efficient than the search
- algorithm (this is true for all search method algorithms which flatten
- the array first), add the filtering to the outer grep - this becomes faster than
- flatten - and around 3 times the speed of search.
+ * and join the array together with '`/`' - we add the `''` so that we
+ get the leading '`/`'.
-## Not flattening array
-
-As we have a limit on performance with the flattening operation. To
-improve efficiency we will need to consider a different approach: **Which
-therefore must not flatten!**
-
-### Do not flatten - find the row containing the number
+```perl
+sub canonical_path_double {
+ my $directory_path = shift;
+ my @directory_names = grep { $_ ne '' &&
+ $_ ne '.' }
+ split m{/},
+ $directory_path;
+
+ OUTER: while(1) {
+ foreach (1..$#directory_names) {
+ next unless $directory_names[$_] eq '..';
+ splice @directory_names,$_-1,2;
+ next OUTER;
+ }
+ last;
+ }
+ shift @directory_names if @directory_names && $directory_names[0] eq '..';
+ return join '/','',@directory_names;
+}
+```
- * Rather than using `map`/`grep` on the outer array, we instead use
- a "hard coded" binary search.
+### Array - 1-loop with backtracking
- * First we return 0 if the number is outside the range of numbers in
- the list.
+To avoid the double loop - we can use back-tracking so that when we
+find a "`..`" instead of restarting the loop we just step back one
+entry in the array - we see `$pointer--` after the `splice`
+statement below.
- * Then we look to see if the number is below the first value of the
- 4th row. This means it is in rows 1, 2 or 3. So we test for that.
- Otherwise it is in rows 4 or 5, so we can test that as well.
-
- * This function expression returns which row the number *could be in*
+ * If `$pointer == 0` then the `..` is at the start of the directory
+ list so we just remove it (we can't go back beyond the root
+ directory).
```perl
- $v < $m->[3][0]
- ? ( $v < $m->[1][0] ? 0 : $v < $m->[2][0] ? 1 : 2 )
- : ( $v < $m->[4][0] ? 3 : 4 )
+sub canonical_path_array {
+ my $directory_path = shift;
+ my @directory_names = grep { $_ ne '' &&
+ $_ ne '.' }
+ split m{/},
+ $directory_path;
+ my $pointer = 1;
+ while( $pointer < @directory_names ) {
+ if( $directory_names[$pointer] eq '..' ) {
+ if( $pointer > 0 ) {
+ splice @directory_names, $pointer - 1, 2;
+ $pointer --;
+ } else {
+ shift @directory_names;
+ }
+ } else {
+ $pointer ++;
+ }
+ }
+ return join '/','', @directory_names;
+}
```
-This leads the whole function to be:
+### String - 1-loop
-```perl
-sub find_val_dnf {
- my($v,$m) = @_;
- return $v < $m->[0][0] || $v > $m->[4][4]
- ? 0
- : 0 + grep { $v == $_ } @{$m->[ $v < $m->[3][0]
- ? ( $v < $m->[1][0] ? 0 : $v < $m->[2][0] ? 1 : 2 )
- : ( $v < $m->[4][0] ? 3 : 4 )
- ]};
-}
-```
+Rather than store the parts in a list - we use a string to store
+this canonical path - and we either add directories to the end of
+it or remove them if we come across a "`..`".
-### Do not flatten - optimal solution
+ * We achieve the former - by just concatenating a "`/`" and the
+ name to the end of the string.
- * As well as not using `map` or `grep` on the outside array we
- do so on the inner array.
+ * The latter we strip this string off with a regex substitution:
+ `s{/[^/]+\Z}{}`. This works in all cases wherever the "`..`"
+ is in the list.
- * Once we have chosen the correct row then we split it into
- two halves and check for a match in each half depending on
- whether it matches the middle value or is above/below.
+ * Note as we are looping through the array we can ignore the
+ grep and just skip out of the loop if the name is either ""
+ or "`.`".
```perl
-sub find_val_dnf_optimal {
- my($v,$m,$t) = @_;
-
- return $v < $m->[0][0] || $v > $m->[4][4]
- ? 0
- : ( $t = $m->[ $v < $m->[3][0]
- ? ( $v < $m->[1][0] ? 0 : $v < $m->[2][0] ? 1 : 2 )
- : ( $v < $m->[4][0] ? 3 : 4 )
- ] ) &&
- ( return $v == $t->[2] ? 1 :
- $v < $t->[2] ?
- ( $v == $t->[0] || $v == $t->[1] ? 1 : 0) :
- ( $v == $t->[4] || $v == $t->[3] ? 1 : 0) );
-}
+sub canonical_path_string {
+ my $path = shift;
+ my @directories = split m{/}, $path;
+ my $canonical_path = '';
+ foreach my $directory_name ( @directories ) {
+ next if $directory_name eq '';
+ next if $directory_name eq '.';
+ if( $directory_name eq q(..) ) {
+ $canonical_path =~ s{/[^/]+\Z}{};
+ } else {
+ $canonical_path .= q(/) . $directory_name;
+ }
+ }
+ return $canonical_path;
+}
```
+### String fast - 1-loop
-### Do not flatten - generalised solution
+Regexs are not the fastest way to perform simple matches strings
+(and intern to modify them). We can speed up the trimming of the
+canonical path by replacing the regex solution by using `rindex`
+and the four-parameter version of `substr`.
- * Above we used the information that we are supplied with a 5x5 matrix
- if the matrix is an arbitrary size we can change the two loops to
- use a binary search algorithm.
-
- * Notes:
+ * `rindex $str, $needle` finds the index of `$needle` in `$str`.
+ Here we use it to find the last `/` in string.
- * We use `($l+$r)>>1` to find the mid point - this is more efficient
- than using `int(($l+$r)/2)`; Bit-shift operators are useful if you
- are multiplying or dividing by 2.
+ * `substr $str, $offset, $length, $substitute` finds the chunk of
+ the string `$str` from `$offset` of given length `$length`. If a
+ fourth parameter is set then this region of the string is replaced
+ by `$substitute`.
- * The loops differ slightly because one is looking for an entry which
- is within a range - the other for a specific value. The differencs
- you will note are (1) the first loop do `$r=$n` where in the
- second we do `$r=$n-1` - as the first is a range based search the
- second an exact match; (2) The second loop checks for equality before
- the binary search check - and returns 1 if true. The end eheck also
- becomes $l<=$r as $r becomes less than $l when we have missed the
- entry.
+ * We can use this to truncate the string by doing:
+
+ `substr $path, rindex( $path, '/' ), ~0, '';`
+
+ In the two-parameter version of `substr` if we omit length then
+ this returns to the end of the string. In the four-parameter
+ version - we can't omit this - so have to use an alternative
+ value - it has to be bigger than the longest string possible.
+ We use the "bitwise-negation" operator "`~`" to generate the
+ largest value possible. This is: 18,446,744,073,709,551,615 or
+ just shy of 16 Exabytes - I believe this should be big enough!
+
+ (Note you can use -ve numbers but there is no way of doing `-0`
+ to trim to the end of the line)
+
+The script then becomes:
```perl
-sub find_val_general_dnf {
- my($v,$m)=@_;
- return 0 if$v<$m->[0][0]||$v>$m->[-1][-1];
- my($n,$l,$r)=(0,0,@{$m}-1);
- $v>$m->[$n=($l+$r)>>1][-1]?($l=$n+1):($r=$n)while$r!=$l;
- ($l,$r)=(0,@{$m=$m->[$l]}-1);
- ($v==$m->[$n=($l+$r)>>1])?(return 1):$v>$m->[$n]?($l=$n+1):($r=$n-1)while$l<=$r;
- return 0;
-}
+sub canonical_path_string_fast {
+ my $path = shift;
+ my @directories = split m{/}, $path;
+ my $canonical_path = '';
+ foreach my $directory_name ( @directories ) {
+ next if $directory_name eq '';
+ next if $directory_name eq '.';
+ if( $directory_name eq q(..) ) {
+ substr $canonical_path,
+ rindex( $canonical_path, '/' ),
+ ~0, '';
+ } else {
+ $canonical_path .= q(/) . $directory_name;
+ }
+ }
+ return $canonical_path;
+}
```
-Timings using `Benchmark::cmpthese`
-| | Rate | Search | GrepMap | GrepGrep | *Flatten* | DNFGen | GrepExt | DNF | DNFOpt |
-| --------- | --------: | -----: | ------: | -------: | --------: | -----: | ------: | -----: | -----: |
-| Search | 4,316/s | -- | -20% | -41% | *-47%* | -64% | -71% | -79% | -81% |
-| GrepMap | 5,391/s | 25% | -- | -26% | *-34%* | -55% | -64% | -73% | -76% |
-| GrepGrep | 7,310/s | 69% | 36% | -- | *-10%* | -39% | -52% | -64% | -68% |
-| *Flatten* | *8,123/s* | *88%* | *51%* | *11%* | *--* | *-33%* | *-46%* | *-60%* | *-65%* |
-| DNFGen | 12,063/s | 179% | 124% | 65% | *48%* | -- | -20% | -40% | -47% |
-| GrepExt | 15,106/s | 250% | 180% | 107% | *86%* | 25% | -- | -25% | -34% |
-| DNF | 20,161/s | 367% | 274% | 176% | *148%* | 67% | 33% | -- | -12% |
-| DNFOpt | 22,883/s | 430% | 324% | 213% | *182%* | 90% | 51% | 14% | -- |
+## "Compact perl code" AKA 1-liners..
-We can see that this "optimal method" is about 5 times more efficient than the "search"
-function we originally tried - even the generic double binary search is abour 2.5 times as
-efficient than the binary search over the flattened matrix.
+Now we can look at how we can compact this code. Here though we
+need to consider the trade off between size and performance - the
+smallest code is not necessarily the fastest - as some of the
+tricks to make the code compact also make it slower.
-## Test script
+### The array code...
-For completeness - this is the test and benchmarking script
+We have two versions of the code - which are slightly different
+The `canonical_path_compact_opt` function is probably closer to
+the 1-loop array function above. We use nested-ternaries to replace
+the `if else` blocks.
-```perl
-#!/usr/local/bin/perl
-
-use strict;
-
-use warnings;
-use feature qw(say);
-use Test::More;
-use Benchmark qw(cmpthese);
-
-my $N = @ARGV ? $ARGV[0] : 100_000;
-
-my $matrix = [
- [ 1, 2, 3, 5, 7 ],
- [ 9, 11, 15, 19, 20 ],
- [ 23, 24, 25, 29, 31 ],
- [ 32, 33, 39, 40, 42 ],
- [ 45, 47, 48, 49, 50 ],
-];
-
-## Create a test set - numbers from -10 to 60...
-my %TEST_SET = map { $_ => 0 } (my @KEYS = -10..60);
-
-## Set all to 0, and then iterate through the elements of the matrix
-## and set the numbers in the list to 1....
-
-$TEST_SET{$_} = 1 foreach map { @{$_} } @{$matrix};
-
-## Run the original PWC test examples...
-is( find_val_search( 35, $matrix ), 0 );
-is( find_val_search( 39, $matrix ), 1 );
-
-is( find_val_grep_grep( 35, $matrix ), 0 );
-is( find_val_grep_grep( 39, $matrix ), 1 );
-#...
-#...
-#...
-is( find_val_find_val_dnf_optimal( 35, $matrix ), 0 );
-is( find_val_find_val_dnf_optimal( 39, $matrix ), 1 );
-
-## Now run our full test set - from -10 to 60. This covers
-## all cases within the list and a few either side...
-
-is( find_val_search( $_, $matrix ), $TEST_SET{$_} ) foreach @KEYS;
-is( find_val_grep_grep( $_, $matrix ), $TEST_SET{$_} ) foreach @KEYS;
-#...
-#...
-#...
-is( find_val_dnf_optimal( $_, $matrix ), $TEST_SET{$_} ) foreach @KEYS;
-
-done_testing();
-
-cmpthese( $N, {
- 'Search' => sub { find_val_search( $_, $matrix ) foreach @KEYS; },
- 'GrepGrep' => sub { find_val_grep_grep( $_, $matrix ) foreach @KEYS; },
- #...
- #...
- #...
- 'DNFOpt' => sub { find_val_dnf_optimal( $_, $matrix ) foreach @KEYS; },
-} );
-```
+As well as using ternaries to make the code shorter - we use a few
+other of our "shortening" tricks:
+
+ * We use variable `$a` as this is a variable we don't need to `my`.
+
+ * We also store the split path name in `@_`, this has two advantages:
+ we don't need to `my` it and secondly - we can just use `shift`
+ without using a variable in the while loop - as `shift` in a
+ function shifts off `@_`
+
+ * We use **yoda** comparisons ( `"value" eq $variable` ) rather tha
+ the more normal `$variable eq "value"` as it means we can save a
+ byte { `$var eq''` vs `''eq$var` } as we don't need the extra space.
-## Addendum 1 - using hash keys
+ * We re-order the if/else so that the we unravel it into an:
+ `if() { } ifelse() {} else {}` format which is better for
+ nested ternaries - even if it may seen to be less readable.
-As some people have used hash slices to solve this problem - I thought I
-would look into this to -- Colin thought this would help in the perl
-review! I've added the following function:
+ * In the code above we separated the `$pointer - 1` in the `splice`
+ line with the `$pointer--` in the next line to back-track. We can
+ use `--$pointer` to achieve this all within the splice statement.
+
+ * To futher shorten the code in `canonical_path_compact` we
+ replace the `grep` clause with a regular expression which is
+ 7-bytes shorter.
```perl
-sub find_val_hash {
- my %list;
- @list{ map { @{$_} } @{$_[1]} } = (); ## Quickest
- return exists $list{$_[0]} ? 1: 0;
+sub canonical_path_compact_opt {
+## Remove the regex in the grep and use ternarys...
+$a=1,@_=grep{''ne$_&&'.'ne$_}split/\//,shift;
+$_[$a]ne'..'?$a++:$a?splice@_,--$a,2:shift while$a<@_;
+join'/','',@_
+}
+```
+```perl
+sub canonical_path_compact {
+## Make shorted by using regex and ternarys
+$a=1,@_=grep{!/^\.?$/}split/\//,shift;
+$_[$a]ne'..'?$a++:$a?splice@_,--$a,2:shift while$a<@_;
+join'/','',@_
}
```
-which converts the nested arrayref into an array for the keys of `%list`
-and then returns 1 if the value of a key in the hash. Like other methods
-this throws away the fact that the array is originally sorted...
+### The string code
-Looking at performance - this is approximately 53% slower than the search
-algorithm, and therefore less than a tenth of the speed of the optimal
-solution.
+Here we re-implement the two string algorithms in compact 1-liners.
-## Addendum 2 - passing an array rather than an arrayref
+ * `canonical_path_fast` and `canonical_path_fastest` correspond
+ directly to the two methods above.
-I quickly checked to see if by passing the matrix as an array rather than
-an arrayref was any faster - one less derefernce - but it is not - it
-appears to be roughtly the same and speed sometimes faster.
+ * `canonical_path_short` replaces the equality checks for "" and
+ "`.`" with the regex as we saw in the array code.
+ * `canonical_path_shortest` removes the need for one of the
+ ternary operators by performing string multiplication when
+ adding the directory to the list. If the regex returns true,
+ then `"/$_"x!/.../` is `"/$_"x 0` or "". If the regex returns
+ false then `"/$_"x!/.../` is `"/$_"x 1` or "`/$_`".
+
+```perl
+sub canonical_path_shortest {
+$a='';
+'..'ne$_?$a.="/$_"x!/^\.?$/:$a=~s#/[^/]+$## for split'/',shift;
+$a
+}
+```
```perl
-sub flatten_array {
- shift @_;
- my @list = map { @{$_} } @_;
- return 1;
+sub canonical_path_short {
+$a='';
+/^\.?$/?0:'..'ne$_?$a.="/$_":$a=~s#/[^/]+$## for split'/',shift;
+$a
}
```
-
-## Addendum 3 - using `List::Util` methods
-
-Just out of interest to compare the core `map` and `grep` against the various `List::Util` functions `first`/`any`, I've added 3 more functions:
-
```perl
-sub find_val_list_util {
- my($v,$m) = @_;
- my $t = first { $_->[-1] >= $v } @{$m};
- return ($t && any { $v == $_ } @{$t}) ? 1 : 0;
+sub canonical_path_fast {
+$a='';
+'.'ne$_&&''ne$_&&('..'ne$_?$a.="/$_":$a=~s#/[^/]+$##)for split'/',shift;
+$a
}
-
-sub find_val_any_any {
- my($v,$m) = @_;
- return (any { $v>=$_->[0] && $v <=$_->[-1] && (any { $_ == $v } @{$_}) } @{$m}) ? 1 : 0;
+```
+```perl
+sub canonical_path_fastest {
+$a='';
+'.'ne$_&&''ne$_&&('..'ne$_?$a.='/'.$_:substr$a,rindex($a,'/'),~0,'')for split'/',shift;
+$a
}
-
-sub find_val_any_any_naive {
- my($v,$m) = @_;
- return (any { any { $_ == $v } @{$_} } @{$m}) ? 1 : 0;
+```
+```perl
+my $s;
+sub canonical_path_global {
+$s='';
+'.'ne$_&&''ne$_&&('..'ne$_?$s.='/'.$_:substr$s,rindex($s,'/'),~0,'')for split'/',shift;
+$s
}
```
-The last one is similar to the `grep - grep` method above but uses `any` - now in theory this
-should be faster as `any` stops at the first occcurance - but as `grep` is core and `any` is
-not there will be a performance penalty - especially for such a small set of values.
+## Performance of different methods
+
+We will look at some different versions of the code:
+
+ * whether we use an array or string to accumulate the resultant path
+ * Whether we use "readable" code or Perl hacks and tricks
+
+To see what aspects of our code makes it faster or slower
+
+### Summary of methods..
+ * "Long form" Perl...
+ * `canonical_path_double` - Using a double loop
+ * `canonical_path_array` - Using backtracking instead of inner loop
+ * `canonical_path_string` - Use a string as the accumulator and mapping
+ * `canonical_path_string_fast` - As above - but using substr/rindex
+ * "One-liner" perl {arrays}
+ * `canonical_path_compact` - short version of array code
+ * `canonical_path_compact_opt` - optimized version of above - 1-less regex
+ * "One-liner" perl {strings}
+ * `canonical_path_shortest` - most compact method
+ * `canonical_path_short` - compact method
+ * `canonical_path_fast` - replace one of the regex with equality checks
+ * `canonical_path_fastest` - replace other regex with substr/rindex
+ * `canonical_path_global` - as fastest but with global variable...
+
+### Performance of each method:
+
+| | Rate | @-st | $-st | $-sh | @-2l | @-cf | @-ft | $-cd | $-fa | $-cf | $-ft | $-gl |
+| -------------- | -------: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
+| @-shortest | 17,483/s | -- | -21% | -22% | -29% | -30% | -37% | -38% | -43% | -52% | -59% | -59% |
+| $-shortest | 22,026/s | 26% | -- | -2% | -11% | -11% | -21% | -21% | -28% | -40% | -48% | -49% |
+| $-short | 22,472/s | 29% | 2% | -- | -9% | -10% | -20% | -20% | -26% | -39% | -47% | -48% |
+| @-code-2loop | 24,631/s | 41% | 12% | 10% | -- | -1% | -12% | -12% | -19% | -33% | -42% | -43% |
+| @-code-fastest | 24,876/s | 42% | 13% | 11% | 1% | -- | -11% | -11% | -18% | -32% | -42% | -42% |
+| @-fastest | 27,933/s | 60% | 27% | 24% | 13% | 12% | -- | -0% | -8% | -24% | -35% | -35% |
+| $-code-fast | 28,011/s | 60% | 27% | 25% | 14% | 13% | 0% | -- | -8% | -24% | -34% | -35% |
+| $-fast | 30,488/s | 74% | 38% | 36% | 24% | 23% | 9% | 9% | -- | -17% | -29% | -29% |
+| $-code-fastest | 36,765/s | 110% | 67% | 64% | 49% | 48% | 32% | 31% | 21% | -- | -14% | -15% |
+| $-fastest | 42,735/s | 144% | 94% | 90% | 74% | 72% | 53% | 53% | 40% | 16% | -- | -1% |
+| $-global-fast | 43,103/s | 147% | 96% | 92% | 75% | 73% | 54% | 54% | 41% | 17% | 1% | -- |
+
+### Summary
+What we see is:
+ * that the string code is faster than the array code,
+ by around 20-40%
+ * using compact "1-liner" code can be approximately 10%
+ faster.
+ * but using less regex's and replacing them with
+ eq/ne for comparisons and `substr`/`rindex` for
+ replacement/trimming improves the speed the most.
+ * approx 25-30% for removing the comparison regex for checking
+ `' '` or `'.'` and replacing with two `eq`/`ne`
+ * approx 30-40% for removing the substitute of the string
+ from the last `'/'` to the end of the string, with `rindex`
+ and the the four parameter version of `substr`.
+ * combining the two seems to double the performance!
+ * switching from local to global variables gets a minor
+ gain (about 1%) again due to memory management.
+
+## Conclusion
+
+So short code is interesting - but is not by a long shot the
+most efficient especially in respect of converting regexes into
+`substr`/`index`/`rindex`, allocation of variables, even if we
+keep it to a 1-liner.
+
+*e.g.* with the short code - we see the optimal short string code is
+twice as efficient as the shortest version - and only about 33% longer.
+
+One of the interesting things is that there is some discussion that
+avoiding concatenating strings by pushing them into an array and
+joining them is supposedly faster than just concatenating.... This
+seems to prove otherwise.. So don't assume everything you read - but
+check it yourself!
+
+# Challenge 2 - Climb Stairs
+
+You are given `$n` steps to climb - Write a script to find out the
+distinct ways to climb to the top. You are allowed to climb either
+1 or 2 steps at a time.
+
+## Assumption
+
+Although not clear - I just assumed that the response was a single
+numeric value.
+
+## Solution
+
+We first note that the formula for number of steps climbed can be
+seen to be.
+
+ `count_n = count_(n-1) + count_(n-2)`
+
+As the last step is either a 1-step (when there are therefore `count_(n-1)`
+options to get to that step) or 2-step (when there are therefore `count_(n-2)`
+options to get to that step)...
+
+This is a recognisable formula - it is just a fibonnaci sequence.
+
+## Brute force solution
+
+We could use a recursive method to get the fibonnaci values out - but
+that would have function call overheads - rather we can use just two
+variables to store the sequence, we define `$a` & `$b` to both be `1`
+and then each iteration through we set `$a` to `$b` and `$b` to the sum
+of `$a` & `$b`. We just then return the last value of `$b`.
-To speed things upw ithe add a check to see if the value is inside the row greater/equal to the
-first value and less than/equal to the last value. This makes a big difference.
+```perl
+sub climb {
+ my($a,$b) = (1,1);
+ ($a,$b) = ($b,$a+$b) foreach 2..$_[0];
+ return $b;
+}
+```
+This uses one of the nice features of perl in the fact that you can
+assign to more than one variable with the same statement, you often
+see this when you flip two values over.
-## Addendum 4 - caching the "flattening"
+Classically you would write:
-Optimizations like these are great if you are in the middle of a tight loop and calling the function a number
-of times - if this is the same matrix each time - we may get some performance boosts by pre-flattening the matrix.
+```perl
+my $t = $a;
+$a = $b;
+$b = $t;
+```
+but you in perl can write this as:
+```perl
+($a,$b)=($b,$a);
+```
+without the need of the additionaly (temporary) variable.
-I've included two more functions "Grep-@" & "Hash-%" which replicate "Grep-Map" and "Hash" to show how this can
-speed things up.. The compute on the fly hash is the slowest method - but pre-cacheing the hash is 30 times faster,
-the pre-flattened array speeds up the grep - but only by a factor of 2 and so is still only half the speed of
-the *do not flatten* versions.
+## Building a cache using state or global variables - or pre-computing
-**Note:** The second parameter in each of these is the respective flattened arrayref or hashref.
+If the call is being made repeatedly we can cache results - either
+using a "`state`" variable within the function or a "`global`" variable.
```perl
-sub find_val_grep_pre {
- my($v,$m)=@_;
- return 0 + grep { $_ == $v } @{$m};
+sub climb_cache {
+ state @cache = (1,1);
+ $cache[$_]=$cache[$_-1]+$cache[$_-2] foreach @cache .. $_[0];
+ return $cache[$_[0]];
}
-sub find_val_hash_pre {
- return exists $_[1]{$_[0]} ? 1: 0;
+my @glob_cache = (1,1);
+sub climb_cache_glob {
+ $glob_cache[$_]=$glob_cache[$_-1]+$glob_cache[$_-2] foreach @glob_cache .. $_[0];
+ return $glob_cache[$_[0]];
}
```
-| Method | Rate | Hash | AANaive | Search | GrepMap | AnyAny | ListUtil | GrepGrep | *Flatten* | *Flatten@* | **preGrep** | DNFGen | GrepExt | DNF | DNFOpt | **preHash** |
-| ----------- | -----------: | --------: | --------: | --------: | --------: | --------: | --------: | --------: | ---------: | ---------: | ----------: | -------: | -------: | -------: | -------: | ---------: |
-| Hash | 2,019/s | -- | -26% | -59% | -63% | -67% | -69% | -73% | *-76%* | *-76%* | **-82%** | -83% | -87% | -90% | -91% | **-98%** |
-| AANaive | 2,714/s | 34% | -- | -44% | -50% | -56% | -58% | -64% | *-68%* | *-68%* | **-75%** | -78% | -82% | -87% | -88% | **-97%** |
-| Search | 4,873/s | 141% | 80% | -- | -10% | -20% | -24% | -35% | *-42%* | *-42%* | **-56%** | -60% | -68% | -76% | -79% | **-94%** |
-| GrepMap | 5,435/s | 169% | 100% | 12% | -- | -11% | -15% | -27% | *-35%* | *-36%* | **-51%** | -55% | -64% | -73% | -76% | **-94%** |
-| AnyAny | 6,127/s | 204% | 126% | 26% | 13% | -- | -5% | -18% | *-27%* | *-27%* | **-44%** | -50% | -60% | -70% | -73% | **-93%** |
-| ListUtil | 6,427/s | 218% | 137% | 32% | 18% | 5% | -- | -14% | *-24%* | *-24%* | **-42%** | -47% | -58% | -68% | -72% | **-92%** |
-| GrepGrep | 7,452/s | 269% | 175% | 53% | 37% | 22% | 16% | -- | *-11%* | *-12%* | **-32%** | -39% | -51% | -63% | -68% | **-91%** |
-| *Flatten* | 8,418/s* | *317%* | *210%* | *73%* | *55%* | *37%* | *31%* | *13%* | *--* | *-0%* | ***-24%*** | *-31%* | *-45%* | *-58%* | *-63%* | ***-90%*** |
-| *Flatten@* | 8,446/s* | *318%* | *211%* | *73%* | *55%* | *38%* | *31%* | *13%* | *0%* | *--* | ***-23%*** | *-31%* | *-45%* | *-58%* | *-63%* | ***-90%*** |
-| **preGrep** | **11,013/s** | **446%** | **306%** | **126%** | **103%** | **80%** | **71%** | **48%** | ***31%*** | ***30%*** | **--** | **-10%** | **-28%** | **-46%** | **-52%** | **-87%** |
-| DNFGen | 12,195/s | 504% | 349% | 150% | 124% | 99% | 90% | 64% | *45%* | *44%* | **11%** | -- | -20% | -40% | -47% | **-86%** |
-| GrepExt | 15,244/s | 655% | 462% | 213% | 180% | 149% | 137% | 105% | *81%* | *80%* | **38%** | 25% | -- | -25% | -34% | **-82%** |
-| DNF | 20,243/s | 903% | 646% | 315% | 272% | 230% | 215% | 172% | *140%* | *140%* | **84%** | 66% | 33% | -- | -12% | **-76%** |
-| DNFOpt | 22,936/s | 1036% | 745% | 371% | 322% | 274% | 257% | 208% | *172%* | *172%* | **108%** | 88% | 50% | 13% | -- | **-73%** |
-| **preHash** | **84,746/s** | **4098%** | **3022%** | **1639%** | **1459%** | **1283%** | **1219%** | **1037%** | ***907%*** | ***903%*** | **669%** | **595%** | **456%** | **319%** | **269%** | **--** |
-
-## Summary...
-
-So as we can see our **DNFOpt** (`find_val_dnf_optimal`) method is the optimal method running at approx 1,700,000 calculations per second, followed by the unoptimized **DNF** (`find_val_dnf`) method at approximately 1,500,000 caclulations per second - these are both optimized to know the matrix is 5x5.
-
-Interestingly the fastest method (without explicitly coding for 5x5) is the optimized version of the `grep`x`grep` of the **GrepExt** (`find_val_grep_grep_ext`) method at approximately 1,100,000 calculations per second. This slighlty nudges out the generic form of the nested binary search **DNFGen** (`find_val_general_dnf`) method at 900,000 calculations per second.
-
-Other things to note..:
- * `grep`x`grep` method **GrepGrep** (`find_val_grep_grep`) is faster than all the versions using `List::Util` using `any` and `first` by approximately 15%, actually comparing like for like with **AANaive** (`find_val_any_any_naive`) is 2.75 times faster... I guess this is because `grep` is built in... There may be a sweet point where this changes as the matrix gets large though.
- * The worst methods are: **Hash** (`find_val_hash`), **AANaive** (`find_val_any_any_naive`) and **Search** (`find_val_search`) at approximately 150,000, 200,000 and 350,000 calculations per second.
- * The range between best and worst is an order of magnitude.
- * I haven't included in this summary the two **preGrep** and **preHash** methods - which pre-flatten the matrix - and note that the **preHash** method is the fastest by far - nearly 4 times faster than are *best* method **DNFOpt**, even though the **Hash** method was worse (the ration of the two hash methods is approximately 40:1). This shows that if it is possible to cache some part of the calculation it can greatly improve performance if called lots of time.
- * All these comparisons are based on the 5x5 matrix in the question - and relative speeds may change as the matrix dimensions get larger.
-
-** Quick extra addendum
-
-I've added another method which implements a binary search across the unflattened matrix - by effectively flattening the matrix and a 0..n location back into x,y-coorinates...
+Finally we look at the cache check overhead by pre-computing the values into
+a cache and then run:
```perl
-sub find_val_binary {
- my($v,$m)=@_;
- return 0 if$v<$m->[0][0]||$v>$m->[-1][-1];
- my $size = @{$m};
- my($v2,$n,$l,$r)=(0,0,0,$size*$size-1);
- while($l<=$r) {
- $n=($l+$r)>>1;
- return 1 if $v == ($v2 = $m->[$n/$size][$n%$size]);
- $v>$v2?($l=$n+1):($r=$n-1);
- }
- return 0;
+sub climb_lookup {
+ return $ans[$_[0]];
}
```
-Interestingly this is approximately 25% slower than the **DNFGen** method... I would think because of the two *division operations* when getting the matrix value.
-
-| | Rate | Binary | DNFGen |
-| ------ | ------: | -----: | -----: |
-| Binary | 9200/s | -- | -25% |
-| DNFGen | 122