From a55ad5eaa86651991f89035202fc71e8eacdc931 Mon Sep 17 00:00:00 2001 From: drbaggy Date: Thu, 13 May 2021 09:53:04 +0100 Subject: Update README.md --- challenge-112/james-smith/README.md | 807 ++++++++++++++---------------------- 1 file changed, 303 insertions(+), 504 deletions(-) diff --git a/challenge-112/james-smith/README.md b/challenge-112/james-smith/README.md index 71e2905051..40f82a0eec 100644 --- a/challenge-112/james-smith/README.md +++ b/challenge-112/james-smith/README.md @@ -1,4 +1,4 @@ -# Perl Weekly Challenge #111 +# Perl Weekly Challenge #112 You can find more information about this weeks, and previous weeks challenges at: @@ -8,592 +8,391 @@ If you are not already doing the challenge - it is a good place to practise your **perl** or **raku**. If it is not **perl** or **raku** you develop in - you can submit solutions in whichever language you feel comfortable with. -# Challenge 1 - Search Matrix +# Challenge 1 - Canonical Path -**You are given 5x5 matrix filled with integers such that each row is sorted from left to -right and the first integer of each row is greater than the last integer of the previous -row.** +**You are given a string path, starting with a slash ‘/'. Write a script to +convert the given absolute path to the simplified canonical path.** -**Write a script to find a given integer in the matrix using an efficient search algorithm.** +In a Unix-style file system: -## Solution to challenge 1 - -Today's is an interesting challenge ... to check to see if a value is in the matrix.... + * A period '.' refers to the current directory + * A double period '..' refers to the directory up a level + * Multiple consecutive slashes ('//') are treated as a single slash '/' -Initially this looks like we need to define some efficient sorting algorithm that flattens -the array and then searches it {e.g. binary search}. +The canonical path format: -We implement this... but we compare it against some alternative simple methods. We don't -get the result we expect... + * The path starts with a single slash '/'. + * Any two directories are separated by a single slash '/'. + * The path does not end with a trailing '/'. + * The path only contains the directories on the path from the root directory to the target file or directory -## Binary search +## Note.... -We first flatten the array. Then we loop until the array is empty or it has a solitary value +Please note there is an ambiguity in the question - when then path contains no +files - as it cannot start with a '/' and not end with a '/' - so we have +to make a choice do we return '/' or do we return ''. - If we get a match with the "middle" value we return 1; - Otherwise we remove the half of the list using `splice` which - the number isn't in. - -We return 0 if the array has length 0. If the length is 1 we return -1 or 0 depending on whether the array is search value or not. +In our case we decide to return it as the empty string. +This has the advantage that there is a level of consistency if you do... -```perl -sub find_val_search { - my( $val, $m, @list ) = ( $_[0], 0, map { @{$_} } @{$_[1]} ); - - $list[ $m = @list >> 1 ] == $val ? ( return 1 ) - : $list[ $m ] > $val ? ( splice @list, $m ) - : ( splice @list, 0, $m+1 ) - while @list>1; - return @list && $list[0] == $val ? 1 : 0; -} -``` +$parent_dir.canonical_path('/a'); -## The other (better) methods: - -### grep on the flattened array - -```perl -sub find_val_grep_map { - my($v,$m)=@_; - return 0 + grep { $_ == $v } map { @{$_} } @{$m}; -} -``` +or -### reducing array with grep and then combining with map +$parent_dir.canonical_path('/'); -Uses `grep` on each sub array to return either an -empty array or an array containing the match. -We then use `grep` to further filter these results -for existances of the array. - -The resultant array will have length `0` or `1`. - -```perl -sub find_val_grep_grep { - my($v,$m)=@_; - return 0 + grep { grep { $_ == $v } @{$_} } @{$m}; -} -``` - -Now to avoid as many of the `grep`s as possible we can do two things. First check the -number is in the range of the matrix - this skips the outer `grep` for some values and -secondly - check the value is in the range for each line you are looping over.... This -leads us to: - -```perl -sub find_val_grep_grep_ext { - my($v,$m)=@_; - return $v<$m->[0][0] || $v > $m->[-1][-1] - ? 0 - : 0 + grep { $v>=$_->[0] && $v<=$_->[-1] && grep { $_ == $v } @{$_} } @{$m}; -} -``` -## Efficiency +then it will always end without a "/"; -We use `cmpthese` to compare the performance... +To change the value the functions return you can replace the return +statement with either `return q(/). join q(/), @list` or +`return $str || q(/)`, depending on whether or not the function +stores the path elements in an array or a string. -Our methods are: - * **Search**: `find_val_search` - * **GrepMap**: `find_val_grep_map` - * **GrepGrep**: `find_val_grep_grep` - * **GrepExt**: `find_val_grep_grep_ext` - * **Flatten**: just flattens the array - basic benchmark to compare efficiency to that function.... +## Solution to challenge 1 -Timings using `Benchmark::cmpthese` +Again another interesting challenge ... we can see if we can improve +performance. -| | Rate | Search | GrepMap | GrepGrep | *Flatten* | GrepExt | -| --------- | --------: | -----: | ------: | -------: | --------: | ------: | -| Search | 4,857/s | -- | -3% | -34% | *-42%* | -68% | -| GrepMap | 4,995/s | 3% | -- | -32% | *-40%* | -67% | -| GrepGrep | 7,391/s | 52% | 48% | -- | *-11%* | -52% | -| *Flatten* | *8,326/s* | *71%* | *67%* | *13%* | *--* | *-46%* | -| GrepExt | 15,337/s | 216% | 207% | 108% | *84%* | -- | +Initially it looks quite complex - there are two solutions classes: + * splitting the string and creating/modifying an array of the + individual parts -Notes: + * splitting the string and creating/modifying a string - * *Flatten* is for comparison only - it actually does nothing other than flatten - the array - this highlights how efficient each algorithm is (and can be) - - * So we see that the grep_grep solution is nearly 60% more efficient than the search - algorithm (this is true for all search method algorithms which flatten - the array first), add the filtering to the outer grep - this becomes faster than - flatten - and around 3 times the speed of search. +## "Expanded perl code" -## Not flattening array +### Array - two loops... -As we have a limit on performance with the flattening operation. To -improve efficiency we will need to consider a different approach: **Which -therefore must not flatten!** + * We first split the directory into path parts and remove any that + are empty or "`.`". -### Do not flatten - find the row containing the number + * We loop through the array until we find a '`..`' if we do we + remove it and the previous entry. - * Rather than using `map`/`grep` on the outer array, we instead use - a "hard coded" binary search. + * We then repeat this until we don't find a '`..`' - * First we return 0 if the number is outside the range of numbers in - the list. + * To jump out of the loop we use `next "label"` to not just skip out + of the inner loop, but to also to restart the parent loop at the + same time. - * Then we look to see if the number is below the first value of the - 4th row. This means it is in rows 1, 2 or 3. So we test for that. - Otherwise it is in rows 4 or 5, so we can test that as well. - - * This function expression returns which row the number *could be in* + * Finally remove an initial "`..`" which wouldn't get removed by this + algorithm. -```perl - $v < $m->[3][0] - ? ( $v < $m->[1][0] ? 0 : $v < $m->[2][0] ? 1 : 2 ) - : ( $v < $m->[4][0] ? 3 : 4 ) -``` - -This leads the whole function to be: + * and join the array together with '`/`' - we add the `''` so that we + get the leading '`/`'. ```perl -sub find_val_dnf { - my($v,$m) = @_; - return $v < $m->[0][0] || $v > $m->[4][4] - ? 0 - : 0 + grep { $v == $_ } @{$m->[ $v < $m->[3][0] - ? ( $v < $m->[1][0] ? 0 : $v < $m->[2][0] ? 1 : 2 ) - : ( $v < $m->[4][0] ? 3 : 4 ) - ]}; +sub canonical_path_double { + my $directory_path = shift; + my @directory_names = grep { $_ ne '' && + $_ ne '.' } + split m{/}, + $directory_path; + + OUTER: while(1) { + foreach (1..$#directory_names) { + next unless $directory_names[$_] eq '..'; + splice @directory_names,$_-1,2; + next OUTER; + } + last; + } + shift @directory_names if @directory_names && $directory_names[0] eq '..'; + return join '/','',@directory_names; } ``` -### Do not flatten - optimal solution +### Array - 1-loop with backtracking - * As well as not using `map` or `grep` on the outside array we - do so on the inner array. +To avoid the double loop - we can use back-tracking so that when we +find a "`..`" instead of restarting the loop we just step back one +entry in the array - we see `$pointer--` after the `splice` +statement below. - * Once we have chosen the correct row then we split it into - two halves and check for a match in each half depending on - whether it matches the middle value or is above/below. + * If `$pointer == 0` then the `..` is at the start of the directory + list so we just remove it (we can't go back beyond the root + directory). ```perl -sub find_val_dnf_optimal { - my($v,$m,$t) = @_; - - return $v < $m->[0][0] || $v > $m->[4][4] - ? 0 - : ( $t = $m->[ $v < $m->[3][0] - ? ( $v < $m->[1][0] ? 0 : $v < $m->[2][0] ? 1 : 2 ) - : ( $v < $m->[4][0] ? 3 : 4 ) - ] ) && - ( return $v == $t->[2] ? 1 : - $v < $t->[2] ? - ( $v == $t->[0] || $v == $t->[1] ? 1 : 0) : - ( $v == $t->[4] || $v == $t->[3] ? 1 : 0) ); +sub canonical_path_array { + my $directory_path = shift; + my @directory_names = grep { $_ ne '' && + $_ ne '.' } + split m{/}, + $directory_path; + my $pointer = 1; + while( $pointer < @directory_names ) { + if( $directory_names[$pointer] eq '..' ) { + if( $pointer > 0 ) { + splice @directory_names, $pointer - 1, 2; + $pointer --; + } else { + shift @directory_names; + } + } else { + $pointer ++; + } + } + return join '/','', @directory_names; } ``` -### Do not flatten - generalised solution - - * Above we used the information that we are supplied with a 5x5 matrix - if the matrix is an arbitrary size we can change the two loops to - use a binary search algorithm. - - * Notes: - - * We use `($l+$r)>>1` to find the mid point - this is more efficient - than using `int(($l+$r)/2)`; Bit-shift operators are useful if you - are multiplying or dividing by 2. - - * The loops differ slightly because one is looking for an entry which - is within a range - the other for a specific value. The differencs - you will note are (1) the first loop do `$r=$n` where in the - second we do `$r=$n-1` - as the first is a range based search the - second an exact match; (2) The second loop checks for equality before - the binary search check - and returns 1 if true. The end eheck also - becomes $l<=$r as $r becomes less than $l when we have missed the - entry. +### String - 1-loop -```perl -sub find_val_general_dnf { - my($v,$m)=@_; - return 0 if$v<$m->[0][0]||$v>$m->[-1][-1]; - my($n,$l,$r)=(0,0,@{$m}-1); - $v>$m->[$n=($l+$r)>>1][-1]?($l=$n+1):($r=$n)while$r!=$l; - ($l,$r)=(0,@{$m=$m->[$l]}-1); - ($v==$m->[$n=($l+$r)>>1])?(return 1):$v>$m->[$n]?($l=$n+1):($r=$n-1)while$l<=$r; - return 0; -} -``` -Timings using `Benchmark::cmpthese` +Rather than store the parts in a list - we use a string to store +this canonical path - and we either add directories to the end of +it or remove them if we come across a "`..`". -| | Rate | Search | GrepMap | GrepGrep | *Flatten* | DNFGen | GrepExt | DNF | DNFOpt | -| --------- | --------: | -----: | ------: | -------: | --------: | -----: | ------: | -----: | -----: | -| Search | 4,316/s | -- | -20% | -41% | *-47%* | -64% | -71% | -79% | -81% | -| GrepMap | 5,391/s | 25% | -- | -26% | *-34%* | -55% | -64% | -73% | -76% | -| GrepGrep | 7,310/s | 69% | 36% | -- | *-10%* | -39% | -52% | -64% | -68% | -| *Flatten* | *8,123/s* | *88%* | *51%* | *11%* | *--* | *-33%* | *-46%* | *-60%* | *-65%* | -| DNFGen | 12,063/s | 179% | 124% | 65% | *48%* | -- | -20% | -40% | -47% | -| GrepExt | 15,106/s | 250% | 180% | 107% | *86%* | 25% | -- | -25% | -34% | -| DNF | 20,161/s | 367% | 274% | 176% | *148%* | 67% | 33% | -- | -12% | -| DNFOpt | 22,883/s | 430% | 324% | 213% | *182%* | 90% | 51% | 14% | -- | + * We achieve the former - by just concatenating a "`/`" and the + name to the end of the string. -We can see that this "optimal method" is about 5 times more efficient than the "search" -function we originally tried - even the generic double binary search is abour 2.5 times as -efficient than the binary search over the flattened matrix. + * The latter we strip this string off with a regex substitution: + `s{/[^/]+\Z}{}`. This works in all cases wherever the "`..`" + is in the list. -## Test script - -For completeness - this is the test and benchmarking script + * Note as we are looping through the array we can ignore the + grep and just skip out of the loop if the name is either "" + or "`.`". ```perl -#!/usr/local/bin/perl - -use strict; - -use warnings; -use feature qw(say); -use Test::More; -use Benchmark qw(cmpthese); - -my $N = @ARGV ? $ARGV[0] : 100_000; - -my $matrix = [ - [ 1, 2, 3, 5, 7 ], - [ 9, 11, 15, 19, 20 ], - [ 23, 24, 25, 29, 31 ], - [ 32, 33, 39, 40, 42 ], - [ 45, 47, 48, 49, 50 ], -]; - -## Create a test set - numbers from -10 to 60... -my %TEST_SET = map { $_ => 0 } (my @KEYS = -10..60); - -## Set all to 0, and then iterate through the elements of the matrix -## and set the numbers in the list to 1.... - -$TEST_SET{$_} = 1 foreach map { @{$_} } @{$matrix}; - -## Run the original PWC test examples... -is( find_val_search( 35, $matrix ), 0 ); -is( find_val_search( 39, $matrix ), 1 ); - -is( find_val_grep_grep( 35, $matrix ), 0 ); -is( find_val_grep_grep( 39, $matrix ), 1 ); -#... -#... -#... -is( find_val_find_val_dnf_optimal( 35, $matrix ), 0 ); -is( find_val_find_val_dnf_optimal( 39, $matrix ), 1 ); - -## Now run our full test set - from -10 to 60. This covers -## all cases within the list and a few either side... - -is( find_val_search( $_, $matrix ), $TEST_SET{$_} ) foreach @KEYS; -is( find_val_grep_grep( $_, $matrix ), $TEST_SET{$_} ) foreach @KEYS; -#... -#... -#... -is( find_val_dnf_optimal( $_, $matrix ), $TEST_SET{$_} ) foreach @KEYS; - -done_testing(); - -cmpthese( $N, { - 'Search' => sub { find_val_search( $_, $matrix ) foreach @KEYS; }, - 'GrepGrep' => sub { find_val_grep_grep( $_, $matrix ) foreach @KEYS; }, - #... - #... - #... - 'DNFOpt' => sub { find_val_dnf_optimal( $_, $matrix ) foreach @KEYS; }, -} ); +sub canonical_path_string_fast { + my $path = shift; + my @directories = split m{/}, $path; + my $canonical_path = ''; + foreach my $directory_name ( @directories ) { + next if $directory_name eq ''; + next if $directory_name eq '.'; + if( $directory_name eq q(..) ) { + $canonical_path =~ s{/[^/]+\Z}{}; + } else { + $canonical_path .= q(/) . $directory_name; + } + } + return $canonical_path; +} ``` +### String fast - 1-loop -## Addendum 1 - using hash keys +Regexs are not the fastest way to perform simple matches strings +(and intern to modify them). We can speed up the trimming of the +canonical path by replacing the regex solution by using `rindex` +and the four-parameter version of `substr`. -As some people have used hash slices to solve this problem - I thought I -would look into this to -- Colin thought this would help in the perl -review! I've added the following function: + * `rindex $str, $needle` finds the index of `$needle` in `$str`. + Here we use it to find the last `/` in string. -```perl -sub find_val_hash { - my %list; - @list{ map { @{$_} } @{$_[1]} } = (); ## Quickest - return exists $list{$_[0]} ? 1: 0; -} -``` + * `substr $str, $offset, $length, $substitute` finds the chunk of + the string `$str` from `$offset` of given length `$length`. If a + fourth parameter is set then this region of the string is replaced + by `$substitute`. -which converts the nested arrayref into an array for the keys of `%list` -and then returns 1 if the value of a key in the hash. Like other methods -this throws away the fact that the array is originally sorted... + * We can use this to truncate the string by doing: + + `substr $path, rindex( $path, '/' ), ~0, '';` -Looking at performance - this is approximately 53% slower than the search -algorithm, and therefore less than a tenth of the speed of the optimal -solution. + In the two-parameter version of `substr` if we omit length then + this returns to the end of the string. In the four-parameter + version - we can't omit this - so have to use an alternative + value - it has to be bigger than the longest string possible. + We use the "bitwise-negation" operator "`~`" to generate the + largest value possible. This is: 18,446,744,073,709,551,615 or + just shy of 16 Exabytes - I believe this should be big enough! -## Addendum 2 - passing an array rather than an arrayref + (Note you can use -ve numbers but there is no way of doing `-0` + to trim to the end of the line) -I quickly checked to see if by passing the matrix as an array rather than -an arrayref was any faster - one less derefernce - but it is not - it -appears to be roughtly the same and speed sometimes faster. +The script then becomes: ```perl -sub flatten_array { - shift @_; - my @list = map { @{$_} } @_; - return 1; -} +sub canonical_path_string_fast { + my $path = shift; + my @directories = split m{/}, $path; + my $canonical_path = ''; + foreach my $directory_name ( @directories ) { + next if $directory_name eq ''; + next if $directory_name eq '.'; + if( $directory_name eq q(..) ) { + substr $canonical_path, + rindex( $canonical_path, '/' ), + ~0, ''; + } else { + $canonical_path .= q(/) . $directory_name; + } + } + return $canonical_path; +} ``` -## Addendum 3 - using `List::Util` methods +## "Compact perl code" AKA 1-liners.. -Just out of interest to compare the core `map` and `grep` against the various `List::Util` functions `first`/`any`, I've added 3 more functions: +Now we can look at how we can compact this code. Here though we +need to consider the trade off between size and performance - the +smallest code is not necessarily the fastest - as some of the +tricks to make the code compact also make it slower. -```perl -sub find_val_list_util { - my($v,$m) = @_; - my $t = first { $_->[-1] >= $v } @{$m}; - return ($t && any { $v == $_ } @{$t}) ? 1 : 0; -} - -sub find_val_any_any { - my($v,$m) = @_; - return (any { $v>=$_->[0] && $v <=$_->[-1] && (any { $_ == $v } @{$_}) } @{$m}) ? 1 : 0; -} +### The array code... -sub find_val_any_any_naive { - my($v,$m) = @_; - return (any { any { $_ == $v } @{$_} } @{$m}) ? 1 : 0; -} -``` +We have two versions of the code - which are slightly different +The `canonical_path_compact_opt` function is probably closer to +the 1-loop array function above. We use nested-ternaries to replace +the `if else` blocks. -The last one is similar to the `grep - grep` method above but uses `any` - now in theory this -should be faster as `any` stops at the first occcurance - but as `grep` is core and `any` is -not there will be a performance penalty - especially for such a small set of values. +As well as using ternaries to make the code shorter - we use a few +other of our "shortening" tricks: -To speed things upw ithe add a check to see if the value is inside the row greater/equal to the -first value and less than/equal to the last value. This makes a big difference. + * We use variable `$a` as this is a variable we don't need to `my`. + * We also store the split path name in `@_`, this has two advantages: + we don't need to `my` it and secondly - we can just use `shift` + without using a variable in the while loop - as `shift` in a + function shifts off `@_` -## Addendum 4 - caching the "flattening" + * We use **yoda** comparisons ( `"value" eq $variable` ) rather tha + the more normal `$variable eq "value"` as it means we can save a + byte { `$var eq''` vs `''eq$var` } as we don't need the extra space. -Optimizations like these are great if you are in the middle of a tight loop and calling the function a number -of times - if this is the same matrix each time - we may get some performance boosts by pre-flattening the matrix. + * We re-order the if/else so that the we unravel it into an: + `if() { } ifelse() {} else {}` format which is better for + nested ternaries - even if it may seen to be less readable. -I've included two more functions "Grep-@" & "Hash-%" which replicate "Grep-Map" and "Hash" to show how this can -speed things up.. The compute on the fly hash is the slowest method - but pre-cacheing the hash is 30 times faster, -the pre-flattened array speeds up the grep - but only by a factor of 2 and so is still only half the speed of -the *do not flatten* versions. + * In the code above we separated the `$pointer - 1` in the `splice` + line with the `$pointer--` in the next line to back-track. We can + use `--$pointer` to achieve this all within the splice statement. -**Note:** The second parameter in each of these is the respective flattened arrayref or hashref. + * To futher shorten the code in `canonical_path_compact` we + replace the `grep` clause with a regular expression which is + 7-bytes shorter. ```perl -sub find_val_grep_pre { - my($v,$m)=@_; - return 0 + grep { $_ == $v } @{$m}; +sub canonical_path_compact_opt { +## Remove the regex in the grep and use ternarys... +$a=1,@_=grep{''ne$_&&'.'ne$_}split/\//,shift; +$_[$a]ne'..'?$a++:$a?splice@_,--$a,2:shift while$a<@_; +join'/','',@_ } -sub find_val_hash_pre { - return exists $_[1]{$_[0]} ? 1: 0; -} -``` - -| Method | Rate | Hash | AANaive | Search | GrepMap | AnyAny | ListUtil | GrepGrep | *Flatten* | *Flatten@* | **preGrep** | DNFGen | GrepExt | DNF | DNFOpt | **preHash** | -| ----------- | -----------: | --------: | --------: | --------: | --------: | --------: | --------: | --------: | ---------: | ---------: | ----------: | -------: | -------: | -------: | -------: | ---------: | -| Hash | 2,019/s | -- | -26% | -59% | -63% | -67% | -69% | -73% | *-76%* | *-76%* | **-82%** | -83% | -87% | -90% | -91% | **-98%** | -| AANaive | 2,714/s | 34% | -- | -44% | -50% | -56% | -58% | -64% | *-68%* | *-68%* | **-75%** | -78% | -82% | -87% | -88% | **-97%** | -| Search | 4,873/s | 141% | 80% | -- | -10% | -20% | -24% | -35% | *-42%* | *-42%* | **-56%** | -60% | -68% | -76% | -79% | **-94%** | -| GrepMap | 5,435/s | 169% | 100% | 12% | -- | -11% | -15% | -27% | *-35%* | *-36%* | **-51%** | -55% | -64% | -73% | -76% | **-94%** | -| AnyAny | 6,127/s | 204% | 126% | 26% | 13% | -- | -5% | -18% | *-27%* | *-27%* | **-44%** | -50% | -60% | -70% | -73% | **-93%** | -| ListUtil | 6,427/s | 218% | 137% | 32% | 18% | 5% | -- | -14% | *-24%* | *-24%* | **-42%** | -47% | -58% | -68% | -72% | **-92%** | -| GrepGrep | 7,452/s | 269% | 175% | 53% | 37% | 22% | 16% | -- | *-11%* | *-12%* | **-32%** | -39% | -51% | -63% | -68% | **-91%** | -| *Flatten* | 8,418/s* | *317%* | *210%* | *73%* | *55%* | *37%* | *31%* | *13%* | *--* | *-0%* | ***-24%*** | *-31%* | *-45%* | *-58%* | *-63%* | ***-90%*** | -| *Flatten@* | 8,446/s* | *318%* | *211%* | *73%* | *55%* | *38%* | *31%* | *13%* | *0%* | *--* | ***-23%*** | *-31%* | *-45%* | *-58%* | *-63%* | ***-90%*** | -| **preGrep** | **11,013/s** | **446%** | **306%** | **126%** | **103%** | **80%** | **71%** | **48%** | ***31%*** | ***30%*** | **--** | **-10%** | **-28%** | **-46%** | **-52%** | **-87%** | -| DNFGen | 12,195/s | 504% | 349% | 150% | 124% | 99% | 90% | 64% | *45%* | *44%* | **11%** | -- | -20% | -40% | -47% | **-86%** | -| GrepExt | 15,244/s | 655% | 462% | 213% | 180% | 149% | 137% | 105% | *81%* | *80%* | **38%** | 25% | -- | -25% | -34% | **-82%** | -| DNF | 20,243/s | 903% | 646% | 315% | 272% | 230% | 215% | 172% | *140%* | *140%* | **84%** | 66% | 33% | -- | -12% | **-76%** | -| DNFOpt | 22,936/s | 1036% | 745% | 371% | 322% | 274% | 257% | 208% | *172%* | *172%* | **108%** | 88% | 50% | 13% | -- | **-73%** | -| **preHash** | **84,746/s** | **4098%** | **3022%** | **1639%** | **1459%** | **1283%** | **1219%** | **1037%** | ***907%*** | ***903%*** | **669%** | **595%** | **456%** | **319%** | **269%** | **--** | - -## Summary... - -So as we can see our **DNFOpt** (`find_val_dnf_optimal`) method is the optimal method running at approx 1,700,000 calculations per second, followed by the unoptimized **DNF** (`find_val_dnf`) method at approximately 1,500,000 caclulations per second - these are both optimized to know the matrix is 5x5. - -Interestingly the fastest method (without explicitly coding for 5x5) is the optimized version of the `grep`x`grep` of the **GrepExt** (`find_val_grep_grep_ext`) method at approximately 1,100,000 calculations per second. This slighlty nudges out the generic form of the nested binary search **DNFGen** (`find_val_general_dnf`) method at 900,000 calculations per second. - -Other things to note..: - * `grep`x`grep` method **GrepGrep** (`find_val_grep_grep`) is faster than all the versions using `List::Util` using `any` and `first` by approximately 15%, actually comparing like for like with **AANaive** (`find_val_any_any_naive`) is 2.75 times faster... I guess this is because `grep` is built in... There may be a sweet point where this changes as the matrix gets large though. - * The worst methods are: **Hash** (`find_val_hash`), **AANaive** (`find_val_any_any_naive`) and **Search** (`find_val_search`) at approximately 150,000, 200,000 and 350,000 calculations per second. - * The range between best and worst is an order of magnitude. - * I haven't included in this summary the two **preGrep** and **preHash** methods - which pre-flatten the matrix - and note that the **preHash** method is the fastest by far - nearly 4 times faster than are *best* method **DNFOpt**, even though the **Hash** method was worse (the ration of the two hash methods is approximately 40:1). This shows that if it is possible to cache some part of the calculation it can greatly improve performance if called lots of time. - * All these comparisons are based on the 5x5 matrix in the question - and relative speeds may change as the matrix dimensions get larger. - -** Quick extra addendum - -I've added another method which implements a binary search across the unflattened matrix - by effectively flattening the matrix and a 0..n location back into x,y-coorinates... - -```perl -sub find_val_binary { - my($v,$m)=@_; - return 0 if$v<$m->[0][0]||$v>$m->[-1][-1]; - my $size = @{$m}; - my($v2,$n,$l,$r)=(0,0,0,$size*$size-1); - while($l<=$r) { - $n=($l+$r)>>1; - return 1 if $v == ($v2 = $m->[$n/$size][$n%$size]); - $v>$v2?($l=$n+1):($r=$n-1); - } - return 0; +sub canonical_path_compact { +## Make shorted by using regex and ternarys +$a=1,@_=grep{!/^\.?$/}split/\//,shift; +$_[$a]ne'..'?$a++:$a?splice@_,--$a,2:shift while$a<@_; +join'/','',@_ } ``` -Interestingly this is approximately 25% slower than the **DNFGen** method... I would think because of the two *division operations* when getting the matrix value. - -| | Rate | Binary | DNFGen | -| ------ | ------: | -----: | -----: | -| Binary | 9200/s | -- | -25% | -| DNFGen | 12225/s | 33% | -- | - -# Challenge 2 - Ordered Letters - -**Given a word, you can sort its letters alphabetically (case insensitive). -For example, “beekeeper” becomes “beeeeekpr” and “dictionary” becomes -“acdiinorty”.** - -**Write a script to find the longest English words that don’t change when -their letters are sorted.** - -## Dictionaries available - -In this example we will consider the dictionaries available in Ubuntu, -as I am in UK - I will use the British English dictionaries: +### The string code - * `/usr/share/dict/british-english-small` - * `/usr/share/dict/british-english-large` - * `/usr/share/dict/british-english-huge` - * `/usr/share/dict/british-english-insane` +Here we re-implement the two string algorithms in compact 1-liners. -## Assumptions into what really is a word + * `canonical_path_fast` and `canonical_path_fastest` correspond + directly to the two methods above. -These dictionaries contain a number of "Words" which we wouldn't -necessarily consider to be words. To simplify this we will only -chose words which are all lowercase. + * `canonical_path_short` replaces the equality checks for "" and + "`.`" with the regex as we saw in the array code. -So we will filter these words out... - -A summary of the four dictionaries above give the following counts: - -| Name: | # words | # trimmed | -| ------ | ------: | --------: | -| small | 50,790 | 39,781 | -| large | 166,828 | 113,695 | -| huge | 344,861 | 245,593 | -| insane | 654,299 | 427,891 | - -## Solution to challenge 2 - -We will collect all of the words that meet this requirement. -We will collect them in an array `@max`. The first value will -be the length of the words in the list and the rest - -There are 4 parts to the loop... - - * The filters as above... - - `!/[^a-z]/` - -* A filter that skips words shorter than the max length - - `$max[0] <= length $_` - - * The calculation to see if the word matches - - `lc $_ eq join q(), sort split //, lc $_` - - * The code to either replace the array `@max` with the - newer longer word *or* to push the new word to the end - of the list. + * `canonical_path_shortest` removes the need for one of the + ternary operators by performing string multiplication when + adding the directory to the list. If the regex returns true, + then `"/$_"x!/.../` is `"/$_"x 0` or "". If the regex returns + false then `"/$_"x!/.../` is `"/$_"x 1` or "`/$_`". - `$max[0] == length $_ ? ( push @max, $_ ) : ( @max = (length $_, $_) )` +```perl +sub canonical_path_shortest { +$a=''; +'..'ne$_?$a.="/$_"x!/^\.?$/:$a=~s#/[^/]+$## for split'/',shift; +$a +} -The full function is.... +sub canonical_path_short { +$a=''; +/^\.?$/?0:'..'ne$_?$a.="/$_":$a=~s#/[^/]+$## for split'/',shift; +$a +} -```perl -sub longest { - open my $fh, q(<), $_[0]; - my @max = (0); - (chomp) ## Remove newline character - && !/[^a-z]/ ## Remove non-lower case characters - && ( $max[0] <= length $_ ) - ## Remove words that are too short - && ( $_ eq join q(), sort split //, $_ ) - ## Check the word is unchanged when the - ## letters are sorted - && ( $max[0] == length $_ - ? ( push @max, $_ ) - : ( @max = (length $_, $_) ) - ) - ## If the word is the same length as the maximal word - ## push it onto @max - so we store all the longest words - ## with maximum length. - ## If the word is longer than the max length (1st entry - ## in @max - reset max to include the new max length and - ## the word. - while <$fh>; - return "$_[0] > @max"; - ## Return the name of the file used, the size of the words - ## and a complete list of the words of that length. +sub canonical_path_fast { +$a=''; +'.'ne$_&&''ne$_&&('..'ne$_?$a.="/$_":$a=~s#/[^/]+$##)for split'/',shift; +$a } -``` -If you like the code more compact - here it is without the comments... +sub canonical_path_fastest { +$a=''; +'.'ne$_&&''ne$_&&('..'ne$_?$a.='/'.$_:substr$a,rindex($a,'/'),~0,'')for split'/',shift; +$a +} -```perl -sub longest_no_comments { - open my $fh, q(<), $_[0]; - my @m = (0); - (chomp)&&!/[^a-z]/&&($m[0]<=length$_) - &&($_ eq join q(),sort split//,$_) - &&($m[0]==length$_?(push@m,$_):(@m=(length$_,$_))) - while <$fh>; - return "$_[0] > @m"; +my $s; +sub canonical_path_global { +$s=''; +'.'ne$_&&''ne$_&&('..'ne$_?$s.='/'.$_:substr$s,rindex($s,'/'),~0,'')for split/\//,shift; +$s } ``` -## The results... - -Here are the results for each of the four databases: - - * `british-english-small`: max length 6 - 21 words - * abhors - * accent - * accept - * access - * accost - * almost - * begins - * bellow - * billow - * cellos - * chills - * chilly - * chimps - * chintz - * choosy - * choppy - * effort - * floors - * floppy - * glossy - * knotty -* `british-english-large`: max length 7 - 1 word - * billowy -* `british-english-huge`: max length 7 - 4 words - * beefily - * billowy - * chikors - * dikkops -* `british-english-insane`: max length 8 - 1 word - * aegilops - -## Some definitions... - -All the 6 letter words and billowy and beefily are quite common words, but there are three that people may not have heard of these are all species names. - - * **chikors** - An alternative spelling of chukars - A species of partridge native to central Asia (*Alectoris chukar*). - * **dikkops** - A bird of the family Burhinidae. The stone curlew, thick-knee. (From afrikaans) *dik*-*kop* or thick head - * **aegilops** - A genus of Eurasian and North American plants in the grass family, Poaceae. They are known generally as goat grasses. Some species are known as invasive weeds in parts of North America. +## Performance of different methods + +We will look at some different versions of the code: + + * whether we use an array or string to accumulate the resultant path + * Whether we use "readable" code or Perl hacks and tricks + +To see what aspects of our code makes it faster or slower + +### Summary of methods.. + * "Long form" Perl... + * `canonical_path_double` - Using a double loop + * `canonical_path_array` - Using backtracking instead of inner loop + * `canonical_path_string` - Use a string as the accumulator and mapping + * `canonical_path_string_fast` - As above - but using substr/rindex + * "One-liner" perl {arrays} + * `canonical_path_compact` - short version of array code + * `canonical_path_compact_opt` - optimized version of above - 1-less regex + * "One-liner" perl {strings} + * `canonical_path_shortest` - most compact method + * `canonical_path_short` - compact method + * `canonical_path_fast` - replace one of the regex with equality checks + * `canonical_path_fastest` - replace other regex with substr/rindex + * `canonical_path_global` - as fastest but with global variable... + +### Performance of each method: + +| | Rate | @-st | $-st | $-sh | @-2l | @-cf | @-ft | $-cd | $-fa | $-cf | $-ft | $-gl | +| -------------- | -------: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | +| @-shortest | 17,483/s | -- | -21% | -22% | -29% | -30% | -37% | -38% | -43% | -52% | -59% | -59% | +| $-shortest | 22,026/s | 26% | -- | -2% | -11% | -11% | -21% | -21% | -28% | -40% | -48% | -49% | +| $-short | 22,472/s | 29% | 2% | -- | -9% | -10% | -20% | -20% | -26% | -39% | -47% | -48% | +| @-code-2loop | 24,631/s | 41% | 12% | 10% | -- | -1% | -12% | -12% | -19% | -33% | -42% | -43% | +| @-code-fastest | 24,876/s | 42% | 13% | 11% | 1% | -- | -11% | -11% | -18% | -32% | -42% | -42% | +| @-fastest | 27,933/s | 60% | 27% | 24% | 13% | 12% | -- | -0% | -8% | -24% | -35% | -35% | +| $-code-fast | 28,011/s | 60% | 27% | 25% | 14% | 13% | 0% | -- | -8% | -24% | -34% | -35% | +| $-fast | 30,488/s | 74% | 38% | 36% | 24% | 23% | 9% | 9% | -- | -17% | -29% | -29% | +| $-code-fastest | 36,765/s | 110% | 67% | 64% | 49% | 48% | 32% | 31% | 21% | -- | -14% | -15% | +| $-fastest | 42,735/s | 144% | 94% | 90% | 74% | 72% | 53% | 53% | 40% | 16% | -- | -1% | +| $-global-fast | 43,103/s | 147% | 96% | 92% | 75% | 73% | 54% | 54% | 41% | 17% | 1% | -- | + +### Summary +What we see is: + * that the string code is faster than the array code, + by around 20-40% + * using compact "1-liner" code can be approximately 10% + faster. + * but using less regex's and replacing them with + eq/ne for comparisons and `substr`/`rindex` for + replacement/trimming improves the speed the most. + * approx 25-30% for removing the comparison regex for checking + `' '` or `'.'` and replacing with two `eq`/`ne` + * approx 30-40% for removing the substitute of the string + from the last `'/'` to the end of the string, with `rindex` + and the the four parameter version of `substr`. + * combining the two seems to double the performance! + * switching from local to global variables gets a minor + gain (about 1%) again due to memory management. + +## Conclusion + +So short code is interesting - but is not by a long shot the +most efficient especially in respect of converting regexes into +`substr`/`index`/`rindex`, allocation of variables, even if we +keep it to a 1-liner. + +# Challenge 2 - Ordered Letters -- cgit From 2fd3fd6d3443c4c08c789d2116cb2bfa3d50be64 Mon Sep 17 00:00:00 2001 From: drbaggy Date: Thu, 13 May 2021 09:55:42 +0100 Subject: Update README.md --- challenge-112/james-smith/README.md | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/challenge-112/james-smith/README.md b/challenge-112/james-smith/README.md index 40f82a0eec..f34255640c 100644 --- a/challenge-112/james-smith/README.md +++ b/challenge-112/james-smith/README.md @@ -156,7 +156,7 @@ it or remove them if we come across a "`..`". or "`.`". ```perl -sub canonical_path_string_fast { +sub canonical_path_string { my $path = shift; my @directories = split m{/}, $path; my $canonical_path = ''; @@ -271,7 +271,8 @@ $a=1,@_=grep{''ne$_&&'.'ne$_}split/\//,shift; $_[$a]ne'..'?$a++:$a?splice@_,--$a,2:shift while$a<@_; join'/','',@_ } - +``` +```perl sub canonical_path_compact { ## Make shorted by using regex and ternarys $a=1,@_=grep{!/^\.?$/}split/\//,shift; @@ -302,25 +303,29 @@ $a=''; '..'ne$_?$a.="/$_"x!/^\.?$/:$a=~s#/[^/]+$## for split'/',shift; $a } - +``` +```perl sub canonical_path_short { $a=''; /^\.?$/?0:'..'ne$_?$a.="/$_":$a=~s#/[^/]+$## for split'/',shift; $a } - +``` +```perl sub canonical_path_fast { $a=''; '.'ne$_&&''ne$_&&('..'ne$_?$a.="/$_":$a=~s#/[^/]+$##)for split'/',shift; $a } - +``` +```perl sub canonical_path_fastest { $a=''; '.'ne$_&&''ne$_&&('..'ne$_?$a.='/'.$_:substr$a,rindex($a,'/'),~0,'')for split'/',shift; $a } - +``` +```perl my $s; sub canonical_path_global { $s=''; -- cgit From f134365b6625a05b5b74dac7d09401e9b7ef0141 Mon Sep 17 00:00:00 2001 From: drbaggy Date: Thu, 13 May 2021 23:37:25 +0100 Subject: Updated docs --- challenge-112/james-smith/README.md | 125 +++++++++++++++++++++++++++++++++++- challenge-112/james-smith/blog.txt | 1 + 2 files changed, 124 insertions(+), 2 deletions(-) create mode 100644 challenge-112/james-smith/blog.txt diff --git a/challenge-112/james-smith/README.md b/challenge-112/james-smith/README.md index f34255640c..eb5ea5a115 100644 --- a/challenge-112/james-smith/README.md +++ b/challenge-112/james-smith/README.md @@ -329,7 +329,7 @@ $a my $s; sub canonical_path_global { $s=''; -'.'ne$_&&''ne$_&&('..'ne$_?$s.='/'.$_:substr$s,rindex($s,'/'),~0,'')for split/\//,shift; +'.'ne$_&&''ne$_&&('..'ne$_?$s.='/'.$_:substr$s,rindex($s,'/'),~0,'')for split'/',shift; $s } ``` @@ -400,4 +400,125 @@ most efficient especially in respect of converting regexes into `substr`/`index`/`rindex`, allocation of variables, even if we keep it to a 1-liner. -# Challenge 2 - Ordered Letters +*e.g.* with the short code - we see the optimal short string code is +twice as efficient as the shortest version - and only about 33% longer. + +One of the interesting things is that there is some discussion that +avoiding concatenating strings by pushing them into an array and +joining them is supposedly faster than just concatenating.... This +seems to prove otherwise.. So don't assume everything you read - but +check it yourself! + +# Challenge 2 - Climb Stairs + +You are given `$n` steps to climb - Write a script to find out the +distinct ways to climb to the top. You are allowed to climb either +1 or 2 steps at a time. + +## Assumption + +Although not clear - I just assumed that the response was a single +numeric value. + +## Solution + +We first note that the formula for number of steps climbed can be +seen to be. + + `count_n = count_(n-1) + count_(n-2)` + +As the last step is either a 1-step (when there are therefore `count_(n-1)` +options to get to that step) or 2-step (when there are therefore `count_(n-2)` +options to get to that step)... + +This is a recognisable formula - it is just a fibonnaci sequence. + +## Brute force solution + +We could use a recursive method to get the fibonnaci values out - but +that would have function call overheads - rather we can use just two +variables to store the sequence, we define `$a` & `$b` to both be `1` +and then each iteration through we set `$a` to `$b` and `$b` to the sum +of `$a` & `$b`. We just then return the last value of `$b`. + +```perl +sub climb { + my($a,$b) = (1,1); + ($a,$b) = ($b,$a+$b) foreach 2..$_[0]; + return $b; +} +``` + +This uses one of the nice features of perl in the fact that you can +assign to more than one variable with the same statement, you often +see this when you flip two values over. + +Classically you would write: + +```perl +my $t = $a; +$a = $b; +$b = $t; +``` +but you in perl can write this as: +```perl +($a,$b)=($b,$a); +``` +without the need of the additionaly (temporary) variable. + +## Mathematical formula solution + +There is a formula for the `n`th fibonacci number which is: + +``` +fn = phi^n - 1/(-phi)^n + ------------------ + sqrt 5 +``` + +Where `phi` is the golden ratio or 1.618,033,988 == (1+sqrt 5)/2, this +number crops up in many different places from art to nature. + +To speed up the calculation we compute `(phi^n)` and to get the second +value we note that this can be written as `(-1)^n/(phi^n)`. So we only +need to calculate `(phi^n)` once. Also we note `(-1)^n` can be +rewritten as (n&1)||-1; + +I offer a number of different versions of this calculation... Each +one gets slightly faster... really depending on whether we define +the variable in the return statement or define it separately and +whether we use a local or global variable... + +```perl +sub climb_fib { + my $q = ((1 + sqrt 5)/2)**($_[0]+1); + return int(0.001+ ($q - ($_[0]&1?1:-1)/$q)*sqrt 0.2); +} + +sub climb_fib_1liner { + return int(0.001 + (($a = ((1+sqrt 5)/2)**($_[0]+1)) - ($_[0]&1?1:-1)/$a)*sqrt 0.2); +} + +sub climb_fib_global { + $p = ((1 + sqrt 5)/2)**($_[0]+1); + return int(0.001+ ($p - ($_[0]&1?1:-1)/$p)*sqrt 0.2); +} + +sub climb_fib_1liner_global { + return int(0.001 + (($p = ((1+sqrt 5)/2)**($_[0]+1)) - ($_[0]&1?1:-1)/$p)*sqrt 0.2); +} +``` + +For up to 50 steps then we note that the `climb_fib` function is about +6-7 times faster than the original `climb` function, by using the 1-liner +and global tricks this gets to about 7-8 times the speed. + +| | Rate | climb | fib | fib-1 | fib-g | fib1g | +| ------ | -------: | ----: | ----: | ----: | ----: | ----: | +| climb | 7,933/s | -- | -85% | -86% | -86% | -87% | +| fib | 52,770/s | 565% | -- | -6% | -8% | -11% | +| fib-1 | 56,180/s | 608% | 6% | -- | -3% | -5% | +| fib-g | 57,637/s | 627% | 9% | 3% | -- | -3% | +| fib-1g | 59,347/s | 648% | 12% | 6% | 3% | -- | + + diff --git a/challenge-112/james-smith/blog.txt b/challenge-112/james-smith/blog.txt new file mode 100644 index 0000000000..cbbc43d0ec --- /dev/null +++ b/challenge-112/james-smith/blog.txt @@ -0,0 +1 @@ +https://github.com/drbaggy/perlweeklychallenge-club/tree/master/challenge-112/james-smith -- cgit From d35f1bb53a1bea3868c34b4cab329b805c3fe2e5 Mon Sep 17 00:00:00 2001 From: drbaggy Date: Thu, 13 May 2021 23:38:35 +0100 Subject: some last minute changes --- challenge-112/james-smith/perl/ch-1.pl | 10 +++++----- challenge-112/james-smith/perl/ch-2.pl | 29 ++++++++++++++++++----------- 2 files changed, 23 insertions(+), 16 deletions(-) diff --git a/challenge-112/james-smith/perl/ch-1.pl b/challenge-112/james-smith/perl/ch-1.pl index 826c5edf30..88ff4642f7 100644 --- a/challenge-112/james-smith/perl/ch-1.pl +++ b/challenge-112/james-smith/perl/ch-1.pl @@ -283,7 +283,7 @@ sub canonical_path_short { ## Shorter by using regex rather than, interpolation ## and `$` rather than `/Z` $a=''; -/^\.?$/?0:'..'ne$_?$a.="/$_":$a=~s{/[^/]+$}{} for split/\//,shift; +/^\.?$/?0:'..'ne$_?$a.="/$_":$a=~s#/[^/]+$## for split'/',shift; $a } @@ -294,7 +294,7 @@ sub canonical_path_shortest { ## or //. This is the shortest script in terms of ## bytes - but also the slowest string version. $a=''; -'..'ne$_?$a.="/$_"x!/^\.?$/:$a=~s{/[^/]+$}{} for split/\//,shift; +'..'ne$_?$a.="/$_"x!/^\.?$/:$a=~s#/[^/]+$## for split'/',shift; $a } @@ -302,7 +302,7 @@ $a sub canonical_path_fast { ## Skip the regex for ''/'.' and replace with compares $a=''; -'.'ne$_&&''ne$_&&('..'ne$_?$a.="/$_":$a=~s{/[^/]+$}{})for split/\//,shift; +'.'ne$_&&''ne$_&&('..'ne$_?$a.="/$_":$a=~s#/[^/]+$##)for split'/',shift; $a } @@ -326,7 +326,7 @@ sub canonical_path_fastest { ## 16 EB (Exabytes) - I think that should be enough! $a=''; -'.'ne$_&&''ne$_&&('..'ne$_?$a.='/'.$_:substr$a,rindex($a,'/'),~0,'')for split/\//,shift; +'.'ne$_&&''ne$_&&('..'ne$_?$a.='/'.$_:substr$a,rindex($a,'/'),~0,'')for split'/',shift; $a } @@ -338,7 +338,7 @@ sub canonical_path_global { ## this can be something useful to know in very tight ## loops... $s=''; -'.'ne$_&&''ne$_&&('..'ne$_?$s.='/'.$_:substr$s,rindex($s,'/'),~0,'')for split/\//,shift; +'.'ne$_&&''ne$_&&('..'ne$_?$s.='/'.$_:substr$s,rindex($s,'/'),~0,'')for split'/',shift; $s } diff --git a/challenge-112/james-smith/perl/ch-2.pl b/challenge-112/james-smith/perl/ch-2.pl index a5d78154ba..7650a2a06d 100644 --- a/challenge-112/james-smith/perl/ch-2.pl +++ b/challenge-112/james-smith/perl/ch-2.pl @@ -14,18 +14,22 @@ close $fh; my $N = @ARGV ? $ARGV[0] : 30; my $I = @ARGV > 1 ? $ARGV[1] : 100_000; +my $cache; -is(climb( $_), $ans[$_] ) foreach 1..$N; -is(climb_fib( $_), $ans[$_] ) foreach 1..$N; -is(climb_fib_1liner( $_), $ans[$_] ) foreach 1..$N; +is(climb( $_), $ans[$_] ) foreach 1..$N; +is(climb_fib( $_), $ans[$_] ) foreach 1..$N; +is(climb_fib_1liner( $_), $ans[$_] ) foreach 1..$N; +is(climb_fib_global( $_), $ans[$_] ) foreach 1..$N; +is(climb_fib_1liner_global( $_), $ans[$_] ) foreach 1..$N; done_testing(); cmpthese($I,{ - 'climb' => sub { climb( $_ ) foreach 0..$N; }, - 'fib-g' => sub { climb_fib_global( $_ ) foreach 0..$N; }, - 'fib-1' => sub { climb_fib_1liner( $_ ) foreach 0..$N; }, - 'fib' => sub { climb_fib( $_ ) foreach 0..$N; }, + 'climb' => sub { climb( $_ ) foreach 0..$N; }, + 'fib' => sub { climb_fib( $_ ) foreach 0..$N; }, + 'fib-g' => sub { climb_fib_global( $_ ) foreach 0..$N; }, + 'fib-1' => sub { climb_fib_1liner( $_ ) foreach 0..$N; }, + 'fib-1g' => sub { climb_fib_1liner_global( $_ ) foreach 0..$N; }, }); ## Once we look at the formula for climb - we @@ -42,9 +46,9 @@ cmpthese($I,{ ## temporarily sub climb { - my @climb = (1,1); - @climb = ($climb[1],$climb[0]+$climb[1]) foreach 2..$_[0]; - return $climb[1]; + my($a,$b) = (1,1); + ($a,$b) = ($b,$a+$b) foreach 2..$_[0]; + return $b; } my $p; @@ -59,6 +63,9 @@ sub climb_fib { } sub climb_fib_1liner { - return int(0.001 + (($p = ((1+sqrt 5)/2)**($_[0]+1)) - ($_[0]&1?1:-1)/$p)*sqrt 0.2); + return int(0.001 + (($a = ((1+sqrt 5)/2)**($_[0]+1)) - ($_[0]&1?1:-1)/$a)*sqrt 0.2); } +sub climb_fib_1liner_global { + return int(0.001 + (($p = ((1+sqrt 5)/2)**($_[0]+1)) - ($_[0]&1?1:-1)/$p)*sqrt 0.2); +} -- cgit From 299611489f84e646577d3bbb170bfcea5cf3eab6 Mon Sep 17 00:00:00 2001 From: drbaggy Date: Fri, 14 May 2021 09:25:48 +0100 Subject: added approx to make faster --- challenge-112/james-smith/perl/ch-2.pl | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/challenge-112/james-smith/perl/ch-2.pl b/challenge-112/james-smith/perl/ch-2.pl index 7650a2a06d..541c0eb97b 100644 --- a/challenge-112/james-smith/perl/ch-2.pl +++ b/challenge-112/james-smith/perl/ch-2.pl @@ -16,20 +16,18 @@ my $N = @ARGV ? $ARGV[0] : 30; my $I = @ARGV > 1 ? $ARGV[1] : 100_000; my $cache; -is(climb( $_), $ans[$_] ) foreach 1..$N; -is(climb_fib( $_), $ans[$_] ) foreach 1..$N; -is(climb_fib_1liner( $_), $ans[$_] ) foreach 1..$N; -is(climb_fib_global( $_), $ans[$_] ) foreach 1..$N; -is(climb_fib_1liner_global( $_), $ans[$_] ) foreach 1..$N; +is(climb( $_), $ans[$_] ) foreach 0..$N; +is(climb_fib( $_), $ans[$_] ) foreach 0..$N; +is(climb_fib_1liner( $_), $ans[$_] ) foreach 0..$N; +is(climb_fib_approx( $_), $ans[$_] ) foreach 0..$N; done_testing(); cmpthese($I,{ 'climb' => sub { climb( $_ ) foreach 0..$N; }, 'fib' => sub { climb_fib( $_ ) foreach 0..$N; }, - 'fib-g' => sub { climb_fib_global( $_ ) foreach 0..$N; }, 'fib-1' => sub { climb_fib_1liner( $_ ) foreach 0..$N; }, - 'fib-1g' => sub { climb_fib_1liner_global( $_ ) foreach 0..$N; }, + 'fib-a' => sub { climb_fib_approx( $_ ) foreach 0..$N; }, }); ## Once we look at the formula for climb - we @@ -44,6 +42,13 @@ cmpthese($I,{ ## speeds it up futher - this is down to ## storing the caclulation of phi^(n+!) only ## temporarily +## +## Also we note that actually we can forgo this - by +## noting that the contribution to the fibonnaci +## value from the (1/phi)^n part is less than 1. So +## we can replace the formula with the approximation below. +## +## This works for all values of n>0; sub climb { my($a,$b) = (1,1); @@ -51,12 +56,6 @@ sub climb { return $b; } -my $p; -sub climb_fib_global { - $p = ((1 + sqrt 5)/2)**($_[0]+1); - return int(0.001+ ($p - ($_[0]&1?1:-1)/$p)*sqrt 0.2); -} - sub climb_fib { my $q = ((1 + sqrt 5)/2)**($_[0]+1); return int(0.001+ ($q - ($_[0]&1?1:-1)/$q)*sqrt 0.2); @@ -66,6 +65,6 @@ sub climb_fib_1liner { return int(0.001 + (($a = ((1+sqrt 5)/2)**($_[0]+1)) - ($_[0]&1?1:-1)/$a)*sqrt 0.2); } -sub climb_fib_1liner_global { - return int(0.001 + (($p = ((1+sqrt 5)/2)**($_[0]+1)) - ($_[0]&1?1:-1)/$p)*sqrt 0.2); +sub climb_fib_approx { + return int(0.4 + (0.5+sqrt 1.25)**($_[0]+1)*sqrt 0.2); } -- cgit From ddbafbec215102ed8c9c783cb86854a3ebe2d170 Mon Sep 17 00:00:00 2001 From: drbaggy Date: Fri, 14 May 2021 11:05:26 +0100 Subject: push some changes to cacheing --- challenge-112/james-smith/perl/ch-2.pl | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/challenge-112/james-smith/perl/ch-2.pl b/challenge-112/james-smith/perl/ch-2.pl index 541c0eb97b..65bd0ad0c9 100644 --- a/challenge-112/james-smith/perl/ch-2.pl +++ b/challenge-112/james-smith/perl/ch-2.pl @@ -3,7 +3,7 @@ use strict; use warnings; -use feature qw(say); +use feature qw(say state); use Test::More; use Benchmark qw(cmpthese); @@ -15,16 +15,23 @@ close $fh; my $N = @ARGV ? $ARGV[0] : 30; my $I = @ARGV > 1 ? $ARGV[1] : 100_000; my $cache; +my @glob_cache = (1,1); is(climb( $_), $ans[$_] ) foreach 0..$N; +is(climb_cache( $_), $ans[$_] ) foreach 0..$N; +is(climb_cache_glob( $_), $ans[$_] ) foreach 0..$N; is(climb_fib( $_), $ans[$_] ) foreach 0..$N; is(climb_fib_1liner( $_), $ans[$_] ) foreach 0..$N; is(climb_fib_approx( $_), $ans[$_] ) foreach 0..$N; +is(climb_lookup( $_), $ans[$_] ) foreach 0..$N; done_testing(); cmpthese($I,{ 'climb' => sub { climb( $_ ) foreach 0..$N; }, + 'cache' => sub { climb_cache( $_ ) foreach 0..$N; }, + 'g-cch' => sub { climb_cache_glob( $_ ) foreach 0..$N; }, + 'look' => sub { climb_lookup( $_ ) foreach 0..$N; }, 'fib' => sub { climb_fib( $_ ) foreach 0..$N; }, 'fib-1' => sub { climb_fib_1liner( $_ ) foreach 0..$N; }, 'fib-a' => sub { climb_fib_approx( $_ ) foreach 0..$N; }, @@ -56,6 +63,17 @@ sub climb { return $b; } +sub climb_cache { + state @cache = (1,1); + $cache[$_]=$cache[$_-1]+$cache[$_-2] foreach @cache .. $_[0]; + return $cache[$_[0]]; +} + +sub climb_cache_glob { + $glob_cache[$_]=$glob_cache[$_-1]+$glob_cache[$_-2] foreach @glob_cache .. $_[0]; + return $glob_cache[$_[0]]; +} + sub climb_fib { my $q = ((1 + sqrt 5)/2)**($_[0]+1); return int(0.001+ ($q - ($_[0]&1?1:-1)/$q)*sqrt 0.2); @@ -68,3 +86,7 @@ sub climb_fib_1liner { sub climb_fib_approx { return int(0.4 + (0.5+sqrt 1.25)**($_[0]+1)*sqrt 0.2); } + +sub climb_lookup { + return $ans[$_[0]]; +} -- cgit From 9fb1f7c6553ec7338e3e86ed0c35818d74f8b421 Mon Sep 17 00:00:00 2001 From: drbaggy Date: Fri, 14 May 2021 11:06:05 +0100 Subject: Update README.md --- challenge-112/james-smith/README.md | 96 ++++++++++++++++++++++++++----------- 1 file changed, 68 insertions(+), 28 deletions(-) diff --git a/challenge-112/james-smith/README.md b/challenge-112/james-smith/README.md index eb5ea5a115..3bc69d2fb2 100644 --- a/challenge-112/james-smith/README.md +++ b/challenge-112/james-smith/README.md @@ -466,13 +466,41 @@ but you in perl can write this as: ``` without the need of the additionaly (temporary) variable. +## Building a cache using state or global variables - or pre-computing + +If the call is being made repeatedly we can cache results - either +using a "`state`" variable within the function or a "`global`" variable. + +```perl +sub climb_cache { + state @cache = (1,1); + $cache[$_]=$cache[$_-1]+$cache[$_-2] foreach @cache .. $_[0]; + return $cache[$_[0]]; +} + +my @glob_cache = (1,1); +sub climb_cache_glob { + $glob_cache[$_]=$glob_cache[$_-1]+$glob_cache[$_-2] foreach @glob_cache .. $_[0]; + return $glob_cache[$_[0]]; +} +``` + +Finally we look at the cache check overhead by pre-computing the values into +a cache and then run: + +```perl +sub climb_lookup { + return $ans[$_[0]]; +} +``` + ## Mathematical formula solution -There is a formula for the `n`th fibonacci number which is: +There is Binet's formula for the `n`th fibonacci number which is: ``` -fn = phi^n - 1/(-phi)^n - ------------------ + phi^n - 1/(-phi)^n +fn = ------------------ sqrt 5 ``` @@ -482,12 +510,15 @@ number crops up in many different places from art to nature. To speed up the calculation we compute `(phi^n)` and to get the second value we note that this can be written as `(-1)^n/(phi^n)`. So we only need to calculate `(phi^n)` once. Also we note `(-1)^n` can be -rewritten as (n&1)||-1; +rewritten as `n&1?1:-1`; + +In reality we don't even need to do this last trick, the contribution +to the sum of '(-1)^n/(phi^n)/sqrt 5' is going to be less than `0.5` +for all `n>=0` we can just reduce the formula to the first part to -I offer a number of different versions of this calculation... Each -one gets slightly faster... really depending on whether we define -the variable in the return statement or define it separately and -whether we use a local or global variable... +``` +fn = floor (phi^n/sqrt 5) +``` ```perl sub climb_fib { @@ -499,26 +530,35 @@ sub climb_fib_1liner { return int(0.001 + (($a = ((1+sqrt 5)/2)**($_[0]+1)) - ($_[0]&1?1:-1)/$a)*sqrt 0.2); } -sub climb_fib_global { - $p = ((1 + sqrt 5)/2)**($_[0]+1); - return int(0.001+ ($p - ($_[0]&1?1:-1)/$p)*sqrt 0.2); -} - -sub climb_fib_1liner_global { - return int(0.001 + (($p = ((1+sqrt 5)/2)**($_[0]+1)) - ($_[0]&1?1:-1)/$p)*sqrt 0.2); +sub climb_fib_approx { + return int(0.4 + (0.5+sqrt 1.25)**($_[0]+1)*sqrt 0.2); } ``` -For up to 50 steps then we note that the `climb_fib` function is about -6-7 times faster than the original `climb` function, by using the 1-liner -and global tricks this gets to about 7-8 times the speed. - -| | Rate | climb | fib | fib-1 | fib-g | fib1g | -| ------ | -------: | ----: | ----: | ----: | ----: | ----: | -| climb | 7,933/s | -- | -85% | -86% | -86% | -87% | -| fib | 52,770/s | 565% | -- | -6% | -8% | -11% | -| fib-1 | 56,180/s | 608% | 6% | -- | -3% | -5% | -| fib-g | 57,637/s | 627% | 9% | 3% | -- | -3% | -| fib-1g | 59,347/s | 648% | 12% | 6% | 3% | -- | - - +## Analysis and conclusion + +The following are data for computing all values up to "50 steps". + +| | Rate | climb | fib | fib-1 | cache | g-cch | fib-a | look | +| ----- | -------: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | +| climb | 7,145/s | -- | -86% | -88% | -89% | -89% | -92% | -96% | +| fib | 52,854/s | 640% | -- | -8% | -16% | -21% | -39% | -72% | +| fib-1 | 57,208/s | 701% | 8% | -- | -9% | -14% | -34% | -70% | +| cache | 62,657/s | 777% | 19% | 10% | -- | -6% | -28% | -67% | +| g-cch | 66,489/s | 831% | 26% | 16% | 6% | -- | -23% | -65% | +| fib-a | 86,505/s | 1,111% | 64% | 51% | 38% | 30% | -- | -54% | +| look | 189,394/s | 2,551% | 258% | 231% | 202% | 185% | 119% | -- | + + + * Using "Binet's" formula we see we get approx `8x` the speed of + the original `climb` function. + * Using the approximation to Binet's formulate we see we get a factor + of about `12x` speed up. + * Using the cache seems to give about a `9x` speed gain - the `global` + variable version is better than the `state` version. + * Interestingly if you pre-compute the cache then the speed gain is + over `25x` to the original and `3x` times the speed of the basic + cache function - this is probably due to the overhead of checking + to see if the number is already in the cache. + +So some food for thought on how to best handle calls within tight loops. -- cgit