diff options
| author | Mohammad S Anwar <Mohammad.Anwar@yahoo.com> | 2021-05-14 14:28:21 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2021-05-14 14:28:21 +0100 |
| commit | 2023334b99b3c28e54d5dd17f59b4496a0370867 (patch) | |
| tree | 443d0cc2d791a2a219b4348125c185d6f2f5848f /challenge-112 | |
| parent | 15502017ba3d59e96e00453936f6ea9d075f00ae (diff) | |
| parent | 9fb1f7c6553ec7338e3e86ed0c35818d74f8b421 (diff) | |
| download | perlweeklychallenge-club-2023334b99b3c28e54d5dd17f59b4496a0370867.tar.gz perlweeklychallenge-club-2023334b99b3c28e54d5dd17f59b4496a0370867.tar.bz2 perlweeklychallenge-club-2023334b99b3c28e54d5dd17f59b4496a0370867.zip | |
Merge pull request #4077 from drbaggy/master
Finished the readme... blog page...
Diffstat (limited to 'challenge-112')
| -rw-r--r-- | challenge-112/james-smith/README.md | 935 | ||||
| -rw-r--r-- | challenge-112/james-smith/blog.txt | 1 | ||||
| -rw-r--r-- | challenge-112/james-smith/perl/ch-1.pl | 10 | ||||
| -rw-r--r-- | challenge-112/james-smith/perl/ch-2.pl | 60 |
4 files changed, 500 insertions, 506 deletions
diff --git a/challenge-112/james-smith/README.md b/challenge-112/james-smith/README.md index 71e2905051..3bc69d2fb2 100644 --- a/challenge-112/james-smith/README.md +++ b/challenge-112/james-smith/README.md @@ -1,4 +1,4 @@ -# Perl Weekly Challenge #111 +# Perl Weekly Challenge #112 You can find more information about this weeks, and previous weeks challenges at: @@ -8,592 +8,557 @@ If you are not already doing the challenge - it is a good place to practise your **perl** or **raku**. If it is not **perl** or **raku** you develop in - you can submit solutions in whichever language you feel comfortable with. -# Challenge 1 - Search Matrix +# Challenge 1 - Canonical Path -**You are given 5x5 matrix filled with integers such that each row is sorted from left to -right and the first integer of each row is greater than the last integer of the previous -row.** +**You are given a string path, starting with a slash ‘/'. Write a script to +convert the given absolute path to the simplified canonical path.** -**Write a script to find a given integer in the matrix using an efficient search algorithm.** +In a Unix-style file system: -## Solution to challenge 1 + * A period '.' refers to the current directory + * A double period '..' refers to the directory up a level + * Multiple consecutive slashes ('//') are treated as a single slash '/' -Today's is an interesting challenge ... to check to see if a value is in the matrix.... +The canonical path format: -Initially this looks like we need to define some efficient sorting algorithm that flattens -the array and then searches it {e.g. binary search}. + * The path starts with a single slash '/'. + * Any two directories are separated by a single slash '/'. + * The path does not end with a trailing '/'. + * The path only contains the directories on the path from the root directory to the target file or directory -We implement this... but we compare it against some alternative simple methods. We don't -get the result we expect... +## Note.... -## Binary search +Please note there is an ambiguity in the question - when then path contains no +files - as it cannot start with a '/' and not end with a '/' - so we have +to make a choice do we return '/' or do we return ''. -We first flatten the array. Then we loop until the array is empty or it has a solitary value +In our case we decide to return it as the empty string. +This has the advantage that there is a level of consistency if you do... - If we get a match with the "middle" value we return 1; - Otherwise we remove the half of the list using `splice` which - the number isn't in. - -We return 0 if the array has length 0. If the length is 1 we return -1 or 0 depending on whether the array is search value or not. +$parent_dir.canonical_path('/a'); -```perl -sub find_val_search { - my( $val, $m, @list ) = ( $_[0], 0, map { @{$_} } @{$_[1]} ); - - $list[ $m = @list >> 1 ] == $val ? ( return 1 ) - : $list[ $m ] > $val ? ( splice @list, $m ) - : ( splice @list, 0, $m+1 ) - while @list>1; - return @list && $list[0] == $val ? 1 : 0; -} -``` +or -## The other (better) methods: +$parent_dir.canonical_path('/'); -### grep on the flattened array +then it will always end without a "/"; -```perl -sub find_val_grep_map { - my($v,$m)=@_; - return 0 + grep { $_ == $v } map { @{$_} } @{$m}; -} -``` +To change the value the functions return you can replace the return +statement with either `return q(/). join q(/), @list` or +`return $str || q(/)`, depending on whether or not the function +stores the path elements in an array or a string. -### reducing array with grep and then combining with map +## Solution to challenge 1 -Uses `grep` on each sub array to return either an -empty array or an array containing the match. -We then use `grep` to further filter these results -for existances of the array. +Again another interesting challenge ... we can see if we can improve +performance. -The resultant array will have length `0` or `1`. +Initially it looks quite complex - there are two solutions classes: -```perl -sub find_val_grep_grep { - my($v,$m)=@_; - return 0 + grep { grep { $_ == $v } @{$_} } @{$m}; -} -``` + * splitting the string and creating/modifying an array of the + individual parts -Now to avoid as many of the `grep`s as possible we can do two things. First check the -number is in the range of the matrix - this skips the outer `grep` for some values and -secondly - check the value is in the range for each line you are looping over.... This -leads us to: + * splitting the string and creating/modifying a string -```perl -sub find_val_grep_grep_ext { - my($v,$m)=@_; - return $v<$m->[0][0] || $v > $m->[-1][-1] - ? 0 - : 0 + grep { $v>=$_->[0] && $v<=$_->[-1] && grep { $_ == $v } @{$_} } @{$m}; -} -``` -## Efficiency +## "Expanded perl code" -We use `cmpthese` to compare the performance... +### Array - two loops... -Our methods are: - * **Search**: `find_val_search` - * **GrepMap**: `find_val_grep_map` - * **GrepGrep**: `find_val_grep_grep` - * **GrepExt**: `find_val_grep_grep_ext` - * **Flatten**: just flattens the array - basic benchmark to compare efficiency to that function.... + * We first split the directory into path parts and remove any that + are empty or "`.`". -Timings using `Benchmark::cmpthese` + * We loop through the array until we find a '`..`' if we do we + remove it and the previous entry. -| | Rate | Search | GrepMap | GrepGrep | *Flatten* | GrepExt | -| --------- | --------: | -----: | ------: | -------: | --------: | ------: | -| Search | 4,857/s | -- | -3% | -34% | *-42%* | -68% | -| GrepMap | 4,995/s | 3% | -- | -32% | *-40%* | -67% | -| GrepGrep | 7,391/s | 52% | 48% | -- | *-11%* | -52% | -| *Flatten* | *8,326/s* | *71%* | *67%* | *13%* | *--* | *-46%* | -| GrepExt | 15,337/s | 216% | 207% | 108% | *84%* | -- | + * We then repeat this until we don't find a '`..`' + * To jump out of the loop we use `next "label"` to not just skip out + of the inner loop, but to also to restart the parent loop at the + same time. -Notes: + * Finally remove an initial "`..`" which wouldn't get removed by this + algorithm. - * *Flatten* is for comparison only - it actually does nothing other than flatten - the array - this highlights how efficient each algorithm is (and can be) - - * So we see that the grep_grep solution is nearly 60% more efficient than the search - algorithm (this is true for all search method algorithms which flatten - the array first), add the filtering to the outer grep - this becomes faster than - flatten - and around 3 times the speed of search. + * and join the array together with '`/`' - we add the `''` so that we + get the leading '`/`'. -## Not flattening array - -As we have a limit on performance with the flattening operation. To -improve efficiency we will need to consider a different approach: **Which -therefore must not flatten!** - -### Do not flatten - find the row containing the number +```perl +sub canonical_path_double { + my $directory_path = shift; + my @directory_names = grep { $_ ne '' && + $_ ne '.' } + split m{/}, + $directory_path; + + OUTER: while(1) { + foreach (1..$#directory_names) { + next unless $directory_names[$_] eq '..'; + splice @directory_names,$_-1,2; + next OUTER; + } + last; + } + shift @directory_names if @directory_names && $directory_names[0] eq '..'; + return join '/','',@directory_names; +} +``` - * Rather than using `map`/`grep` on the outer array, we instead use - a "hard coded" binary search. +### Array - 1-loop with backtracking - * First we return 0 if the number is outside the range of numbers in - the list. +To avoid the double loop - we can use back-tracking so that when we +find a "`..`" instead of restarting the loop we just step back one +entry in the array - we see `$pointer--` after the `splice` +statement below. - * Then we look to see if the number is below the first value of the - 4th row. This means it is in rows 1, 2 or 3. So we test for that. - Otherwise it is in rows 4 or 5, so we can test that as well. - - * This function expression returns which row the number *could be in* + * If `$pointer == 0` then the `..` is at the start of the directory + list so we just remove it (we can't go back beyond the root + directory). ```perl - $v < $m->[3][0] - ? ( $v < $m->[1][0] ? 0 : $v < $m->[2][0] ? 1 : 2 ) - : ( $v < $m->[4][0] ? 3 : 4 ) +sub canonical_path_array { + my $directory_path = shift; + my @directory_names = grep { $_ ne '' && + $_ ne '.' } + split m{/}, + $directory_path; + my $pointer = 1; + while( $pointer < @directory_names ) { + if( $directory_names[$pointer] eq '..' ) { + if( $pointer > 0 ) { + splice @directory_names, $pointer - 1, 2; + $pointer --; + } else { + shift @directory_names; + } + } else { + $pointer ++; + } + } + return join '/','', @directory_names; +} ``` -This leads the whole function to be: +### String - 1-loop -```perl -sub find_val_dnf { - my($v,$m) = @_; - return $v < $m->[0][0] || $v > $m->[4][4] - ? 0 - : 0 + grep { $v == $_ } @{$m->[ $v < $m->[3][0] - ? ( $v < $m->[1][0] ? 0 : $v < $m->[2][0] ? 1 : 2 ) - : ( $v < $m->[4][0] ? 3 : 4 ) - ]}; -} -``` +Rather than store the parts in a list - we use a string to store +this canonical path - and we either add directories to the end of +it or remove them if we come across a "`..`". -### Do not flatten - optimal solution + * We achieve the former - by just concatenating a "`/`" and the + name to the end of the string. - * As well as not using `map` or `grep` on the outside array we - do so on the inner array. + * The latter we strip this string off with a regex substitution: + `s{/[^/]+\Z}{}`. This works in all cases wherever the "`..`" + is in the list. - * Once we have chosen the correct row then we split it into - two halves and check for a match in each half depending on - whether it matches the middle value or is above/below. + * Note as we are looping through the array we can ignore the + grep and just skip out of the loop if the name is either "" + or "`.`". ```perl -sub find_val_dnf_optimal { - my($v,$m,$t) = @_; - - return $v < $m->[0][0] || $v > $m->[4][4] - ? 0 - : ( $t = $m->[ $v < $m->[3][0] - ? ( $v < $m->[1][0] ? 0 : $v < $m->[2][0] ? 1 : 2 ) - : ( $v < $m->[4][0] ? 3 : 4 ) - ] ) && - ( return $v == $t->[2] ? 1 : - $v < $t->[2] ? - ( $v == $t->[0] || $v == $t->[1] ? 1 : 0) : - ( $v == $t->[4] || $v == $t->[3] ? 1 : 0) ); -} +sub canonical_path_string { + my $path = shift; + my @directories = split m{/}, $path; + my $canonical_path = ''; + foreach my $directory_name ( @directories ) { + next if $directory_name eq ''; + next if $directory_name eq '.'; + if( $directory_name eq q(..) ) { + $canonical_path =~ s{/[^/]+\Z}{}; + } else { + $canonical_path .= q(/) . $directory_name; + } + } + return $canonical_path; +} ``` +### String fast - 1-loop -### Do not flatten - generalised solution +Regexs are not the fastest way to perform simple matches strings +(and intern to modify them). We can speed up the trimming of the +canonical path by replacing the regex solution by using `rindex` +and the four-parameter version of `substr`. - * Above we used the information that we are supplied with a 5x5 matrix - if the matrix is an arbitrary size we can change the two loops to - use a binary search algorithm. - - * Notes: + * `rindex $str, $needle` finds the index of `$needle` in `$str`. + Here we use it to find the last `/` in string. - * We use `($l+$r)>>1` to find the mid point - this is more efficient - than using `int(($l+$r)/2)`; Bit-shift operators are useful if you - are multiplying or dividing by 2. + * `substr $str, $offset, $length, $substitute` finds the chunk of + the string `$str` from `$offset` of given length `$length`. If a + fourth parameter is set then this region of the string is replaced + by `$substitute`. - * The loops differ slightly because one is looking for an entry which - is within a range - the other for a specific value. The differencs - you will note are (1) the first loop do `$r=$n` where in the - second we do `$r=$n-1` - as the first is a range based search the - second an exact match; (2) The second loop checks for equality before - the binary search check - and returns 1 if true. The end eheck also - becomes $l<=$r as $r becomes less than $l when we have missed the - entry. + * We can use this to truncate the string by doing: + + `substr $path, rindex( $path, '/' ), ~0, '';` + + In the two-parameter version of `substr` if we omit length then + this returns to the end of the string. In the four-parameter + version - we can't omit this - so have to use an alternative + value - it has to be bigger than the longest string possible. + We use the "bitwise-negation" operator "`~`" to generate the + largest value possible. This is: 18,446,744,073,709,551,615 or + just shy of 16 Exabytes - I believe this should be big enough! + + (Note you can use -ve numbers but there is no way of doing `-0` + to trim to the end of the line) + +The script then becomes: ```perl -sub find_val_general_dnf { - my($v,$m)=@_; - return 0 if$v<$m->[0][0]||$v>$m->[-1][-1]; - my($n,$l,$r)=(0,0,@{$m}-1); - $v>$m->[$n=($l+$r)>>1][-1]?($l=$n+1):($r=$n)while$r!=$l; - ($l,$r)=(0,@{$m=$m->[$l]}-1); - ($v==$m->[$n=($l+$r)>>1])?(return 1):$v>$m->[$n]?($l=$n+1):($r=$n-1)while$l<=$r; - return 0; -} +sub canonical_path_string_fast { + my $path = shift; + my @directories = split m{/}, $path; + my $canonical_path = ''; + foreach my $directory_name ( @directories ) { + next if $directory_name eq ''; + next if $directory_name eq '.'; + if( $directory_name eq q(..) ) { + substr $canonical_path, + rindex( $canonical_path, '/' ), + ~0, ''; + } else { + $canonical_path .= q(/) . $directory_name; + } + } + return $canonical_path; +} ``` -Timings using `Benchmark::cmpthese` -| | Rate | Search | GrepMap | GrepGrep | *Flatten* | DNFGen | GrepExt | DNF | DNFOpt | -| --------- | --------: | -----: | ------: | -------: | --------: | -----: | ------: | -----: | -----: | -| Search | 4,316/s | -- | -20% | -41% | *-47%* | -64% | -71% | -79% | -81% | -| GrepMap | 5,391/s | 25% | -- | -26% | *-34%* | -55% | -64% | -73% | -76% | -| GrepGrep | 7,310/s | 69% | 36% | -- | *-10%* | -39% | -52% | -64% | -68% | -| *Flatten* | *8,123/s* | *88%* | *51%* | *11%* | *--* | *-33%* | *-46%* | *-60%* | *-65%* | -| DNFGen | 12,063/s | 179% | 124% | 65% | *48%* | -- | -20% | -40% | -47% | -| GrepExt | 15,106/s | 250% | 180% | 107% | *86%* | 25% | -- | -25% | -34% | -| DNF | 20,161/s | 367% | 274% | 176% | *148%* | 67% | 33% | -- | -12% | -| DNFOpt | 22,883/s | 430% | 324% | 213% | *182%* | 90% | 51% | 14% | -- | +## "Compact perl code" AKA 1-liners.. -We can see that this "optimal method" is about 5 times more efficient than the "search" -function we originally tried - even the generic double binary search is abour 2.5 times as -efficient than the binary search over the flattened matrix. +Now we can look at how we can compact this code. Here though we +need to consider the trade off between size and performance - the +smallest code is not necessarily the fastest - as some of the +tricks to make the code compact also make it slower. -## Test script +### The array code... -For completeness - this is the test and benchmarking script +We have two versions of the code - which are slightly different +The `canonical_path_compact_opt` function is probably closer to +the 1-loop array function above. We use nested-ternaries to replace +the `if else` blocks. -```perl -#!/usr/local/bin/perl - -use strict; - -use warnings; -use feature qw(say); -use Test::More; -use Benchmark qw(cmpthese); - -my $N = @ARGV ? $ARGV[0] : 100_000; - -my $matrix = [ - [ 1, 2, 3, 5, 7 ], - [ 9, 11, 15, 19, 20 ], - [ 23, 24, 25, 29, 31 ], - [ 32, 33, 39, 40, 42 ], - [ 45, 47, 48, 49, 50 ], -]; - -## Create a test set - numbers from -10 to 60... -my %TEST_SET = map { $_ => 0 } (my @KEYS = -10..60); - -## Set all to 0, and then iterate through the elements of the matrix -## and set the numbers in the list to 1.... - -$TEST_SET{$_} = 1 foreach map { @{$_} } @{$matrix}; - -## Run the original PWC test examples... -is( find_val_search( 35, $matrix ), 0 ); -is( find_val_search( 39, $matrix ), 1 ); - -is( find_val_grep_grep( 35, $matrix ), 0 ); -is( find_val_grep_grep( 39, $matrix ), 1 ); -#... -#... -#... -is( find_val_find_val_dnf_optimal( 35, $matrix ), 0 ); -is( find_val_find_val_dnf_optimal( 39, $matrix ), 1 ); - -## Now run our full test set - from -10 to 60. This covers -## all cases within the list and a few either side... - -is( find_val_search( $_, $matrix ), $TEST_SET{$_} ) foreach @KEYS; -is( find_val_grep_grep( $_, $matrix ), $TEST_SET{$_} ) foreach @KEYS; -#... -#... -#... -is( find_val_dnf_optimal( $_, $matrix ), $TEST_SET{$_} ) foreach @KEYS; - -done_testing(); - -cmpthese( $N, { - 'Search' => sub { find_val_search( $_, $matrix ) foreach @KEYS; }, - 'GrepGrep' => sub { find_val_grep_grep( $_, $matrix ) foreach @KEYS; }, - #... - #... - #... - 'DNFOpt' => sub { find_val_dnf_optimal( $_, $matrix ) foreach @KEYS; }, -} ); -``` +As well as using ternaries to make the code shorter - we use a few +other of our "shortening" tricks: + + * We use variable `$a` as this is a variable we don't need to `my`. + + * We also store the split path name in `@_`, this has two advantages: + we don't need to `my` it and secondly - we can just use `shift` + without using a variable in the while loop - as `shift` in a + function shifts off `@_` + + * We use **yoda** comparisons ( `"value" eq $variable` ) rather tha + the more normal `$variable eq "value"` as it means we can save a + byte { `$var eq''` vs `''eq$var` } as we don't need the extra space. -## Addendum 1 - using hash keys + * We re-order the if/else so that the we unravel it into an: + `if() { } ifelse() {} else {}` format which is better for + nested ternaries - even if it may seen to be less readable. -As some people have used hash slices to solve this problem - I thought I -would look into this to -- Colin thought this would help in the perl -review! I've added the following function: + * In the code above we separated the `$pointer - 1` in the `splice` + line with the `$pointer--` in the next line to back-track. We can + use `--$pointer` to achieve this all within the splice statement. + + * To futher shorten the code in `canonical_path_compact` we + replace the `grep` clause with a regular expression which is + 7-bytes shorter. ```perl -sub find_val_hash { - my %list; - @list{ map { @{$_} } @{$_[1]} } = (); ## Quickest - return exists $list{$_[0]} ? 1: 0; +sub canonical_path_compact_opt { +## Remove the regex in the grep and use ternarys... +$a=1,@_=grep{''ne$_&&'.'ne$_}split/\//,shift; +$_[$a]ne'..'?$a++:$a?splice@_,--$a,2:shift while$a<@_; +join'/','',@_ +} +``` +```perl +sub canonical_path_compact { +## Make shorted by using regex and ternarys +$a=1,@_=grep{!/^\.?$/}split/\//,shift; +$_[$a]ne'..'?$a++:$a?splice@_,--$a,2:shift while$a<@_; +join'/','',@_ } ``` -which converts the nested arrayref into an array for the keys of `%list` -and then returns 1 if the value of a key in the hash. Like other methods -this throws away the fact that the array is originally sorted... +### The string code -Looking at performance - this is approximately 53% slower than the search -algorithm, and therefore less than a tenth of the speed of the optimal -solution. +Here we re-implement the two string algorithms in compact 1-liners. -## Addendum 2 - passing an array rather than an arrayref + * `canonical_path_fast` and `canonical_path_fastest` correspond + directly to the two methods above. -I quickly checked to see if by passing the matrix as an array rather than -an arrayref was any faster - one less derefernce - but it is not - it -appears to be roughtly the same and speed sometimes faster. + * `canonical_path_short` replaces the equality checks for "" and + "`.`" with the regex as we saw in the array code. + * `canonical_path_shortest` removes the need for one of the + ternary operators by performing string multiplication when + adding the directory to the list. If the regex returns true, + then `"/$_"x!/.../` is `"/$_"x 0` or "". If the regex returns + false then `"/$_"x!/.../` is `"/$_"x 1` or "`/$_`". + +```perl +sub canonical_path_shortest { +$a=''; +'..'ne$_?$a.="/$_"x!/^\.?$/:$a=~s#/[^/]+$## for split'/',shift; +$a +} +``` ```perl -sub flatten_array { - shift @_; - my @list = map { @{$_} } @_; - return 1; +sub canonical_path_short { +$a=''; +/^\.?$/?0:'..'ne$_?$a.="/$_":$a=~s#/[^/]+$## for split'/',shift; +$a } ``` - -## Addendum 3 - using `List::Util` methods - -Just out of interest to compare the core `map` and `grep` against the various `List::Util` functions `first`/`any`, I've added 3 more functions: - ```perl -sub find_val_list_util { - my($v,$m) = @_; - my $t = first { $_->[-1] >= $v } @{$m}; - return ($t && any { $v == $_ } @{$t}) ? 1 : 0; +sub canonical_path_fast { +$a=''; +'.'ne$_&&''ne$_&&('..'ne$_?$a.="/$_":$a=~s#/[^/]+$##)for split'/',shift; +$a } - -sub find_val_any_any { - my($v,$m) = @_; - return (any { $v>=$_->[0] && $v <=$_->[-1] && (any { $_ == $v } @{$_}) } @{$m}) ? 1 : 0; +``` +```perl +sub canonical_path_fastest { +$a=''; +'.'ne$_&&''ne$_&&('..'ne$_?$a.='/'.$_:substr$a,rindex($a,'/'),~0,'')for split'/',shift; +$a } - -sub find_val_any_any_naive { - my($v,$m) = @_; - return (any { any { $_ == $v } @{$_} } @{$m}) ? 1 : 0; +``` +```perl +my $s; +sub canonical_path_global { +$s=''; +'.'ne$_&&''ne$_&&('..'ne$_?$s.='/'.$_:substr$s,rindex($s,'/'),~0,'')for split'/',shift; +$s } ``` -The last one is similar to the `grep - grep` method above but uses `any` - now in theory this -should be faster as `any` stops at the first occcurance - but as `grep` is core and `any` is -not there will be a performance penalty - especially for such a small set of values. +## Performance of different methods + +We will look at some different versions of the code: + + * whether we use an array or string to accumulate the resultant path + * Whether we use "readable" code or Perl hacks and tricks + +To see what aspects of our code makes it faster or slower + +### Summary of methods.. + * "Long form" Perl... + * `canonical_path_double` - Using a double loop + * `canonical_path_array` - Using backtracking instead of inner loop + * `canonical_path_string` - Use a string as the accumulator and mapping + * `canonical_path_string_fast` - As above - but using substr/rindex + * "One-liner" perl {arrays} + * `canonical_path_compact` - short version of array code + * `canonical_path_compact_opt` - optimized version of above - 1-less regex + * "One-liner" perl {strings} + * `canonical_path_shortest` - most compact method + * `canonical_path_short` - compact method + * `canonical_path_fast` - replace one of the regex with equality checks + * `canonical_path_fastest` - replace other regex with substr/rindex + * `canonical_path_global` - as fastest but with global variable... + +### Performance of each method: + +| | Rate | @-st | $-st | $-sh | @-2l | @-cf | @-ft | $-cd | $-fa | $-cf | $-ft | $-gl | +| -------------- | -------: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | +| @-shortest | 17,483/s | -- | -21% | -22% | -29% | -30% | -37% | -38% | -43% | -52% | -59% | -59% | +| $-shortest | 22,026/s | 26% | -- | -2% | -11% | -11% | -21% | -21% | -28% | -40% | -48% | -49% | +| $-short | 22,472/s | 29% | 2% | -- | -9% | -10% | -20% | -20% | -26% | -39% | -47% | -48% | +| @-code-2loop | 24,631/s | 41% | 12% | 10% | -- | -1% | -12% | -12% | -19% | -33% | -42% | -43% | +| @-code-fastest | 24,876/s | 42% | 13% | 11% | 1% | -- | -11% | -11% | -18% | -32% | -42% | -42% | +| @-fastest | 27,933/s | 60% | 27% | 24% | 13% | 12% | -- | -0% | -8% | -24% | -35% | -35% | +| $-code-fast | 28,011/s | 60% | 27% | 25% | 14% | 13% | 0% | -- | -8% | -24% | -34% | -35% | +| $-fast | 30,488/s | 74% | 38% | 36% | 24% | 23% | 9% | 9% | -- | -17% | -29% | -29% | +| $-code-fastest | 36,765/s | 110% | 67% | 64% | 49% | 48% | 32% | 31% | 21% | -- | -14% | -15% | +| $-fastest | 42,735/s | 144% | 94% | 90% | 74% | 72% | 53% | 53% | 40% | 16% | -- | -1% | +| $-global-fast | 43,103/s | 147% | 96% | 92% | 75% | 73% | 54% | 54% | 41% | 17% | 1% | -- | + +### Summary +What we see is: + * that the string code is faster than the array code, + by around 20-40% + * using compact "1-liner" code can be approximately 10% + faster. + * but using less regex's and replacing them with + eq/ne for comparisons and `substr`/`rindex` for + replacement/trimming improves the speed the most. + * approx 25-30% for removing the comparison regex for checking + `' '` or `'.'` and replacing with two `eq`/`ne` + * approx 30-40% for removing the substitute of the string + from the last `'/'` to the end of the string, with `rindex` + and the the four parameter version of `substr`. + * combining the two seems to double the performance! + * switching from local to global variables gets a minor + gain (about 1%) again due to memory management. + +## Conclusion + +So short code is interesting - but is not by a long shot the +most efficient especially in respect of converting regexes into +`substr`/`index`/`rindex`, allocation of variables, even if we +keep it to a 1-liner. + +*e.g.* with the short code - we see the optimal short string code is +twice as efficient as the shortest version - and only about 33% longer. + +One of the interesting things is that there is some discussion that +avoiding concatenating strings by pushing them into an array and +joining them is supposedly faster than just concatenating.... This +seems to prove otherwise.. So don't assume everything you read - but +check it yourself! + +# Challenge 2 - Climb Stairs + +You are given `$n` steps to climb - Write a script to find out the +distinct ways to climb to the top. You are allowed to climb either +1 or 2 steps at a time. + +## Assumption + +Although not clear - I just assumed that the response was a single +numeric value. + +## Solution + +We first note that the formula for number of steps climbed can be +seen to be. + + `count_n = count_(n-1) + count_(n-2)` + +As the last step is either a 1-step (when there are therefore `count_(n-1)` +options to get to that step) or 2-step (when there are therefore `count_(n-2)` +options to get to that step)... + +This is a recognisable formula - it is just a fibonnaci sequence. + +## Brute force solution + +We could use a recursive method to get the fibonnaci values out - but +that would have function call overheads - rather we can use just two +variables to store the sequence, we define `$a` & `$b` to both be `1` +and then each iteration through we set `$a` to `$b` and `$b` to the sum +of `$a` & `$b`. We just then return the last value of `$b`. -To speed things upw ithe add a check to see if the value is inside the row greater/equal to the -first value and less than/equal to the last value. This makes a big difference. +```perl +sub climb { + my($a,$b) = (1,1); + ($a,$b) = ($b,$a+$b) foreach 2..$_[0]; + return $b; +} +``` +This uses one of the nice features of perl in the fact that you can +assign to more than one variable with the same statement, you often +see this when you flip two values over. -## Addendum 4 - caching the "flattening" +Classically you would write: -Optimizations like these are great if you are in the middle of a tight loop and calling the function a number -of times - if this is the same matrix each time - we may get some performance boosts by pre-flattening the matrix. +```perl +my $t = $a; +$a = $b; +$b = $t; +``` +but you in perl can write this as: +```perl +($a,$b)=($b,$a); +``` +without the need of the additionaly (temporary) variable. -I've included two more functions "Grep-@" & "Hash-%" which replicate "Grep-Map" and "Hash" to show how this can -speed things up.. The compute on the fly hash is the slowest method - but pre-cacheing the hash is 30 times faster, -the pre-flattened array speeds up the grep - but only by a factor of 2 and so is still only half the speed of -the *do not flatten* versions. +## Building a cache using state or global variables - or pre-computing -**Note:** The second parameter in each of these is the respective flattened arrayref or hashref. +If the call is being made repeatedly we can cache results - either +using a "`state`" variable within the function or a "`global`" variable. ```perl -sub find_val_grep_pre { - my($v,$m)=@_; - return 0 + grep { $_ == $v } @{$m}; +sub climb_cache { + state @cache = (1,1); + $cache[$_]=$cache[$_-1]+$cache[$_-2] foreach @cache .. $_[0]; + return $cache[$_[0]]; } -sub find_val_hash_pre { - return exists $_[1]{$_[0]} ? 1: 0; +my @glob_cache = (1,1); +sub climb_cache_glob { + $glob_cache[$_]=$glob_cache[$_-1]+$glob_cache[$_-2] foreach @glob_cache .. $_[0]; + return $glob_cache[$_[0]]; } ``` -| Method | Rate | Hash | AANaive | Search | GrepMap | AnyAny | ListUtil | GrepGrep | *Flatten* | *Flatten@* | **preGrep** | DNFGen | GrepExt | DNF | DNFOpt | **preHash** | -| ----------- | -----------: | --------: | --------: | --------: | --------: | --------: | --------: | --------: | ---------: | ---------: | ----------: | -------: | -------: | -------: | -------: | ---------: | -| Hash | 2,019/s | -- | -26% | -59% | -63% | -67% | -69% | -73% | *-76%* | *-76%* | **-82%** | -83% | -87% | -90% | -91% | **-98%** | -| AANaive | 2,714/s | 34% | -- | -44% | -50% | -56% | -58% | -64% | *-68%* | *-68%* | **-75%** | -78% | -82% | -87% | -88% | **-97%** | -| Search | 4,873/s | 141% | 80% | -- | -10% | -20% | -24% | -35% | *-42%* | *-42%* | **-56%** | -60% | -68% | -76% | -79% | **-94%** | -| GrepMap | 5,435/s | 169% | 100% | 12% | -- | -11% | -15% | -27% | *-35%* | *-36%* | **-51%** | -55% | -64% | -73% | -76% | **-94%** | -| AnyAny | 6,127/s | 204% | 126% | 26% | 13% | -- | -5% | -18% | *-27%* | *-27%* | **-44%** | -50% | -60% | -70% | -73% | **-93%** | -| ListUtil | 6,427/s | 218% | 137% | 32% | 18% | 5% | -- | -14% | *-24%* | *-24%* | **-42%** | -47% | -58% | -68% | -72% | **-92%** | -| GrepGrep | 7,452/s | 269% | 175% | 53% | 37% | 22% | 16% | -- | *-11%* | *-12%* | **-32%** | -39% | -51% | -63% | -68% | **-91%** | -| *Flatten* | 8,418/s* | *317%* | *210%* | *73%* | *55%* | *37%* | *31%* | *13%* | *--* | *-0%* | ***-24%*** | *-31%* | *-45%* | *-58%* | *-63%* | ***-90%*** | -| *Flatten@* | 8,446/s* | *318%* | *211%* | *73%* | *55%* | *38%* | *31%* | *13%* | *0%* | *--* | ***-23%*** | *-31%* | *-45%* | *-58%* | *-63%* | ***-90%*** | -| **preGrep** | **11,013/s** | **446%** | **306%** | **126%** | **103%** | **80%** | **71%** | **48%** | ***31%*** | ***30%*** | **--** | **-10%** | **-28%** | **-46%** | **-52%** | **-87%** | -| DNFGen | 12,195/s | 504% | 349% | 150% | 124% | 99% | 90% | 64% | *45%* | *44%* | **11%** | -- | -20% | -40% | -47% | **-86%** | -| GrepExt | 15,244/s | 655% | 462% | 213% | 180% | 149% | 137% | 105% | *81%* | *80%* | **38%** | 25% | -- | -25% | -34% | **-82%** | -| DNF | 20,243/s | 903% | 646% | 315% | 272% | 230% | 215% | 172% | *140%* | *140%* | **84%** | 66% | 33% | -- | -12% | **-76%** | -| DNFOpt | 22,936/s | 1036% | 745% | 371% | 322% | 274% | 257% | 208% | *172%* | *172%* | **108%** | 88% | 50% | 13% | -- | **-73%** | -| **preHash** | **84,746/s** | **4098%** | **3022%** | **1639%** | **1459%** | **1283%** | **1219%** | **1037%** | ***907%*** | ***903%*** | **669%** | **595%** | **456%** | **319%** | **269%** | **--** | - -## Summary... - -So as we can see our **DNFOpt** (`find_val_dnf_optimal`) method is the optimal method running at approx 1,700,000 calculations per second, followed by the unoptimized **DNF** (`find_val_dnf`) method at approximately 1,500,000 caclulations per second - these are both optimized to know the matrix is 5x5. - -Interestingly the fastest method (without explicitly coding for 5x5) is the optimized version of the `grep`x`grep` of the **GrepExt** (`find_val_grep_grep_ext`) method at approximately 1,100,000 calculations per second. This slighlty nudges out the generic form of the nested binary search **DNFGen** (`find_val_general_dnf`) method at 900,000 calculations per second. - -Other things to note..: - * `grep`x`grep` method **GrepGrep** (`find_val_grep_grep`) is faster than all the versions using `List::Util` using `any` and `first` by approximately 15%, actually comparing like for like with **AANaive** (`find_val_any_any_naive`) is 2.75 times faster... I guess this is because `grep` is built in... There may be a sweet point where this changes as the matrix gets large though. - * The worst methods are: **Hash** (`find_val_hash`), **AANaive** (`find_val_any_any_naive`) and **Search** (`find_val_search`) at approximately 150,000, 200,000 and 350,000 calculations per second. - * The range between best and worst is an order of magnitude. - * I haven't included in this summary the two **preGrep** and **preHash** methods - which pre-flatten the matrix - and note that the **preHash** method is the fastest by far - nearly 4 times faster than are *best* method **DNFOpt**, even though the **Hash** method was worse (the ration of the two hash methods is approximately 40:1). This shows that if it is possible to cache some part of the calculation it can greatly improve performance if called lots of time. - * All these comparisons are based on the 5x5 matrix in the question - and relative speeds may change as the matrix dimensions get larger. - -** Quick extra addendum - -I've added another method which implements a binary search across the unflattened matrix - by effectively flattening the matrix and a 0..n location back into x,y-coorinates... +Finally we look at the cache check overhead by pre-computing the values into +a cache and then run: ```perl -sub find_val_binary { - my($v,$m)=@_; - return 0 if$v<$m->[0][0]||$v>$m->[-1][-1]; - my $size = @{$m}; - my($v2,$n,$l,$r)=(0,0,0,$size*$size-1); - while($l<=$r) { - $n=($l+$r)>>1; - return 1 if $v == ($v2 = $m->[$n/$size][$n%$size]); - $v>$v2?($l=$n+1):($r=$n-1); - } - return 0; +sub climb_lookup { + return $ans[$_[0]]; } ``` -Interestingly this is approximately 25% slower than the **DNFGen** method... I would think because of the two *division operations* when getting the matrix value. - -| | Rate | Binary | DNFGen | -| ------ | ------: | -----: | -----: | -| Binary | 9200/s | -- | -25% | -| DNFGen | 122 |
