perl 5 solutions - need to rewrite for perl 6

author: bagheera-sands <git@sandsscouts.org.uk> 2019-04-23 09:19:55 +0100
committer: bagheera-sands <git@sandsscouts.org.uk> 2019-04-23 09:19:55 +0100
commit: 75c977b4ce2b187f170e9d1c739071707b4e9237 (patch)
tree: 77b6048723ee4e60b664d0dd6036c1a2475bda16
parent: 5ee061e57de12fb9071bb2fcd3a26ea7ce2cecf6 (diff)
download: perlweeklychallenge-club-75c977b4ce2b187f170e9d1c739071707b4e9237.tar.gz
perlweeklychallenge-club-75c977b4ce2b187f170e9d1c739071707b4e9237.tar.bz2
perlweeklychallenge-club-75c977b4ce2b187f170e9d1c739071707b4e9237.zip
9 files changed, 241 insertions, 24 deletions
diff --git a/challenge-005/james-smith/README.md b/challenge-005/james-smith/README.md
index f58242a0a0..9bb2f55bdd 100644
--- a/challenge-005/james-smith/README.md
+++ b/challenge-005/james-smith/README.md
@@ -7,49 +7,116 @@ sugar....
 
 # Problem 1
 
-This is quirky same code between Perl 5 and Perl 6 produce different output - Perl 6 displays one more digit of Pi... so we need to extend the script by one byte (adding ";") to the end of the line...
-
+This is similar to last week's problems, but actually simpler: we create a signature (the sorted
+letters) for the input word and compare it with the signature of each word in the main dictionary.
 ```
-perl -Mfeature=say perl5/ch-1.pl
-perl6              perl6/ch-1.p6
+perl   perl5/ch-1.pl teams < /usr/share/dict/british-english-insane
+perl6  perl6/ch-1.p6 teams < /usr/share/dict/british-english-insane
 ```
 
-# Problem 2
+This can be slightly slow, as it sorts the letters of every word in the dictionary. We can
+short cut this when a word has a different number of letters from the input, so an extra length
+check saves us about 66-75% of the time (see `ch-1a.pl`). [ As this is the optimized version, we
+complete the optimization by inlining the signature code to remove the function call overhead. ]
 
-## Perl 5
+# Problem 2
 
-This is a nice problem - I solved this in Perl 5 in two ways - firstly with nested loops - but this requires a label (which is ugly code) to break out of the inner loop and get to the next word. The neater solution requires encapsulating the inner loop inside a function and calling that. Returning true if the word can be matched and false otherwise.
+Using the same signature function as above, we build a hash that maps every signature we
+found to the words that share it, and list the signatures that have the most words.
 
-The first part collects together the counts of the letters and the second loops through each of the words to see if there are sufficient letters.
+```
+bash run-5.bash
+```
 
-The loop is destructive of the counts array so we pass a copy into the function rather than the usual pass by reference.. This effectively clones the counts array so we don't have to do this explicitly
+This loops through the four Ubuntu dictionaries:
 
-For 2a see notes about perl 6 (this is non-destructive...)
-```
-perl perl5/ch-2.pl back < /usr/share/dict/british-english-insane
-perl perl5/ch-2a.pl back < /usr/share/dict/british-english-insane
 ```
+## Small
+  7 aeprs        pares parse pears rapes reaps spare spear
+
+Time taken: 0.24 (perl perl5/ch-2.pl /usr/share/dict/british-english-small)
+
+## Large
+ 10 aeprs        asper pares parse pears prase presa rapes reaps spare spear
 
-## Perl 6
+Time taken: 0.78 (perl perl5/ch-2.pl /usr/share/dict/british-english-large)
 
-Because Perl 6 can't pass by value - I've rewritten this to be non-destructive (the checkword function counts up and compares to the main count)... need to check for performance....
+## Huge
+ 12 aelrst       alerts alters artels lastre rastle ratels salter slater staler stelar talers tarsel
+ 12 aelst        least leats salet setal slate stale steal stela taels tales teals tesla
+ 12 aeprs        apers asper pares parse pears prase presa rapes reaps spaer spare spear
+ 12 aerst        arets aster rates reast resat stare stear strae tares taser tears teras
 
-The other change is split - you don't use the empty regex "//" to perform the split - rather the empty string (something frowned upon in Perl5) and to remove the rogue white-space split adds you need to include the additional flag `:skip-empty`
+Time taken: 1.70 (perl perl5/ch-2.pl /usr/share/dict/british-english-huge)
+
+## Insane
+ 17 aerst        arest arets aster astre earst rates reast resat serta stare stear strae tares tarse taser tears teras
+
+Time taken: 2.91 (perl perl5/ch-2.pl /usr/share/dict/british-english-insane)
+```
+
+Due to some quirks in the dictionary this isn't quite that simple: the above only includes
+words without capital letters. If we want to include those as well we can slightly modify the
+code to allow them through, but we end up with duplicates, with the same word appearing twice, once
+with an initial capital and once without (e.g. Taser - taser). To resolve this, we replace
+the inner array with a hash keyed on the lower-case version of the word.
 
-I haven't golfed this one entirely - but have used "golf" techniques along the way to make the code in someways more readable using grep rather than if for instance!
 ```
-perl6 perl6/ch-2.p6 back < /usr/share/dict/british-english-insane
+bash run-5a.bash
+```
+
+This handles more words and is slightly more complex, so it is slower:
+
 ```
+## Small
+  7 aeprs        pares parse pears rapes reaps spare spear
 
-## Timings
+Time taken: 0.25 (perl perl5/ch-2a.pl /usr/share/dict/british-english-small)
 
-Yet again perl5 out performs perl 6 - perhaps I need to know how to optimize perl 6 code...
+## Large
+ 10 aelst        least slate Stael stale steal stela taels tales teals tesla
+ 10 aeprs        asper pares parse pears prase presa rapes reaps spare spear
 
+Time taken: 0.92 (perl perl5/ch-2a.pl /usr/share/dict/british-english-large)
+
+## Huge
+ 13 aeginrst     angriest astringe ganister gantries granites gratines ingrates rangiest reasting stearing Tangiers tasering Tigreans
+ 13 aelst        least leats salet setal slate Stael stale steal stela taels tales teals tesla
+
+Time taken: 2.13 (perl perl5/ch-2a.pl /usr/share/dict/british-english-huge)
+
+## Insane
+ 18 aerst        arest arets aster astre earst rates reast resat serta stare stear strae tares tarse taser tears teras Tresa
+
+Time taken: 4.45 (perl perl5/ch-2a.pl /usr/share/dict/british-english-insane)
 ```
-  perl5    ch-2.pl        1.9 seconds
-  perl5    ch-2a.pl       1.3 seconds
-  perl6    ch-2.p6       27.1 seconds
+
+One issue is the large grep over every key to find the ones with the maximum number of words. We can avoid this by keeping track of the leading keys as we go, which gives the third solution: it records them on the way through.
+
+```
+bash run-5b.bash
 ```
 
-This time by what looks like a factor of 20.... need a Perl 6 expert to suggest why....
+```
+## Small
+  7 aeprs        pares parse pears rapes reaps spare spear
 
+Time taken: 0.22 (perl perl5/ch-2b.pl /usr/share/dict/british-english-small)
+
+## Large
+ 10 aelst        least slate Stael stale steal stela taels tales teals tesla
+ 10 aeprs        asper pares parse pears prase presa rapes reaps spare spear
+
+Time taken: 0.79 (perl perl5/ch-2b.pl /usr/share/dict/british-english-large)
+
+## Huge
+ 13 aeginrst     angriest astringe ganister gantries granites gratines ingrates rangiest reasting stearing Tangiers tasering Tigreans
+ 13 aelst        least leats salet setal slate Stael stale steal stela taels tales teals tesla
+
+Time taken: 1.65 (perl perl5/ch-2b.pl /usr/share/dict/british-english-huge)
+
+## Insane
+ 18 aerst        arest arets aster astre earst rates reast resat serta stare stear strae tares tarse taser tears teras Tresa
+
+Time taken: 3.12 (perl perl5/ch-2b.pl /usr/share/dict/british-english-insane)
+```
diff --git a/challenge-005/james-smith/perl5/ch-1.pl b/challenge-005/james-smith/perl5/ch-1.pl
new file mode 100644
index 0000000000..9709294f34
--- /dev/null
+++ b/challenge-005/james-smith/perl5/ch-1.pl
@@ -0,0 +1,17 @@
+use strict;
+use warnings;
+use feature qw(say);
+
+## Read in letters from command line... and store in signature
+
+my $kw = signature( "@ARGV");
+
+print grep { $kw eq signature($_) } <STDIN>;
+
+## Signature algorithm - remove non alpha characters, lc,
+## split and sort....
+
+sub signature {
+  return join q(), sort split //, (lc $_[0]) =~ tr/a-z//cdr;
+}
+
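A minimal sketch of the signature idea used by `ch-1.pl`, written as a standalone helper (the `$text` and `$letters` names are illustrative, not from the commit). It relies on `tr` treating its search list as a plain character list, so the range is written `a-z` without square brackets; inside `tr`, brackets would be ordinary characters. The `/r` flag assumes perl 5.14 or later.

```perl
use strict;
use warnings;

# Illustrative helper: lower-case the input, delete every character
# that is not a-z, then sort the remaining letters.
sub signature {
    my ($text) = @_;
    my $letters = lc($text) =~ tr/a-z//cdr;   # c = complement, d = delete, r = return copy
    return join q(), sort split //, $letters;
}

# Anagrams share a signature; case, newlines and punctuation are ignored.
print signature('Teams'), "\n";     # aemst
print signature("steam\n"), "\n";   # aemst
print signature('[mates]'), "\n";   # aemst
```

Because the newline is stripped by `tr`, lines read from `<STDIN>` can be passed in without a `chomp`.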
diff --git a/challenge-005/james-smith/perl5/ch-1a.pl b/challenge-005/james-smith/perl5/ch-1a.pl
new file mode 100644
index 0000000000..060b6a161d
--- /dev/null
+++ b/challenge-005/james-smith/perl5/ch-1a.pl
@@ -0,0 +1,23 @@
+use strict;
+use warnings;
+use feature 'say';
+
+## Read in letters from command line... and store sorted in string $kw...
+
+my $kw = join q(), sort split //, (lc "@ARGV") =~ tr/a-z//cdr;
+my $ln = length $kw;
+
+## We are going to optimize this loop... - to avoid sorting long strings of letters...
+## We only need to check those that have the same number of letters as the supplied
+## word ... so we generate the unsorted keyword - and then only check the match if
+## we have the same length...
+
+## By applying this optimization we see a considerable gain in speed
+say join "\n", grep {
+  chomp;
+  ! m{[^a-z]} &&                   ## Only include words that are all lower case (no capitals or punctuation)
+  length $_ == $ln &&              ## Only words that are the same length as the input word...
+  $kw eq join q(), sort split //;  ## Only those whose letters are the same...
+} <STDIN>;
+
+
diff --git a/challenge-005/james-smith/perl5/ch-2.pl b/challenge-005/james-smith/perl5/ch-2.pl
new file mode 100644
index 0000000000..3a2b4bc4a5
--- /dev/null
+++ b/challenge-005/james-smith/perl5/ch-2.pl
@@ -0,0 +1,31 @@
+use strict;
+use warnings;
+use feature qw(say);
+
+## %ds maps each signature to its list of words; $max is the largest list size seen so far
+my %ds;
+my $max = 0;
+
+## Loop 1 - read all the words in - skip any that are not entirely lower-case letters
+foreach (<>) {
+  chomp;
+  next if m{[^a-z]};
+  my $kw = signature($_);
+  push @{$ds{$kw}}, $_;
+  $max = @{$ds{$kw}} if @{$ds{$kw}} > $max;
+}
+
+## Loop 2 - loop through all the signatures and print those with the maximum number of words
+printf "%3d %-12s %s\n", $max, $_, "@{[ sort @{$ds{$_}} ]}" for
+  sort
+  grep { @{$ds{$_}} == $max }
+  keys %ds;
+say '';
+
+## Signature algorithm - split into letters and sort
+## (the input has already been filtered to lower-case letters only)
+
+sub signature {
+  return join q(), sort split //, $_[0];
+}
+
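The grouping step in `ch-2.pl` can be sketched on a tiny in-memory list, assuming lower-case input only. The `@words` list and the `%groups` and `@best` names are hypothetical stand-ins for the dictionary file and the script's `%ds`.

```perl
use strict;
use warnings;

# Hypothetical word list standing in for a dictionary file.
my @words = qw(pares parse pears stale steal apple);

# Group words under their sorted-letter signature.
my %groups;
for my $word (@words) {
    my $sig = join q(), sort split //, $word;
    push @{ $groups{$sig} }, $word;
}

# Find the largest group size, then list every signature of that size.
my $max = 0;
for my $list (values %groups) {
    $max = @$list if @$list > $max;
}
my @best = sort grep { @{ $groups{$_} } == $max } keys %groups;

# Same output layout as the script: count, signature, sorted words.
printf "%3d %-12s %s\n", $max, $_, "@{[ sort @{ $groups{$_} } ]}" for @best;
```

With this list the only line printed is for `aeprs`, which has three words. The final `grep` visits every signature once, which is the cost the later variant tries to avoid.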
diff --git a/challenge-005/james-smith/perl5/ch-2a.pl b/challenge-005/james-smith/perl5/ch-2a.pl
new file mode 100644
index 0000000000..c2d2a91dd0
--- /dev/null
+++ b/challenge-005/james-smith/perl5/ch-2a.pl
@@ -0,0 +1,30 @@
+use strict;
+use warnings;
+use feature qw(say);
+
+## %ds maps each signature to a hash of its words; $max is the largest word count seen so far
+my %ds;
+my $max = 0;
+
+## Loop 1 - read all the words in - skip any that contain non-letters
+foreach (<>) {
+  chomp;
+  next if m{[^a-zA-Z]};
+  my $kw = signature($_);
+  $ds{$kw}{lc $_} = $_; ## Removes duplicates with differing capitalization!
+  $max = keys %{$ds{$kw}} if keys %{$ds{$kw}} > $max;
+}
+
+printf "%3d %-12s %s\n", $max, $_, "@{[sort { lc $a cmp lc $b } values %{$ds{$_}}]}" for
+  sort
+  grep { keys %{$ds{$_}} == $max }
+  keys %ds;
+say '';
+
+## Signature algorithm - lc, split and sort
+## (the input has already been filtered to letters only)
+
+sub signature {
+  return join q(), sort split //, lc $_[0];
+}
+
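A small sketch of the de-duplication used in `ch-2a.pl`: the inner container is a hash keyed on the lower-cased word, so two spellings that differ only in capitalization share one slot. The `@words` list and the `%group` and `@members` names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input containing the same word with and without a capital.
my @words = qw(taser Taser tears Tresa);

my %group;
for my $word (@words) {
    my $sig = join q(), sort split //, lc $word;
    # Keying on the lower-cased word means "Taser" overwrites "taser"
    # rather than being stored next to it; the spelling read last wins.
    $group{$sig}{ lc $word } = $word;
}

for my $sig (sort keys %group) {
    my @members = sort { lc $a cmp lc $b } values %{ $group{$sig} };
    printf "%3d %-12s %s\n", scalar @members, $sig, "@members";
}
```

Here four input words collapse to three entries under `aerst`. Which capitalization survives depends only on input order, so the displayed spelling follows the order of the word list.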
diff --git a/challenge-005/james-smith/perl5/ch-2b.pl b/challenge-005/james-smith/perl5/ch-2b.pl
new file mode 100644
index 0000000000..35151f58df
--- /dev/null
+++ b/challenge-005/james-smith/perl5/ch-2b.pl
@@ -0,0 +1,31 @@
+use strict;
+use warnings;
+use feature qw(say);
+
+## %ds maps each signature to a hash of its words; $max is the largest word count seen so far
+
+my %ds;
+my $max = 5; ## mates/meats/steam/teams/tames has at least 5 so we will use this as a base!
+
+my %maxkeys; ## We keep a copy of the max keys so we don't need to do the big grep later!
+
+## Loop 1 - read all the words in - skip any that contain non-letters
+foreach (<>) {
+  chomp;
+  next if m{[^a-zA-Z]};
+  my $kw = join q(), sort split //, lc $_;
+  $ds{$kw}{lc $_} = $_;                ## Removes duplicates with differing capitalization!
+  my $t = keys %{$ds{$kw}};
+  next if $t < $max;
+  ($max,%maxkeys) = ($t) if $t > $max; ## reset hash keys....
+  $maxkeys{$kw}=1;                     ## $kw could appear multiple times so we need to
+                                       ## keep this unique!
+}
+
+## Now we dump the results out - we loop through the maxkeys hash - saves the big grep loop
+## from the previous version!
+printf "%3d %-12s %s\n", $max, $_, "@{[sort { lc $a cmp lc $b } values %{$ds{$_}}]}" for
+  sort
+  keys %maxkeys;
+say '';
+
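The running-maximum bookkeeping in `ch-2b.pl` can be sketched in isolation. This is a simplified illustration, not the script itself: it keeps a plain count per signature (`%count`, a hypothetical name), so it assumes the input has no repeated words, and it starts `$max` at 0 rather than 5 so that very small inputs still produce a result.

```perl
use strict;
use warnings;

# Hypothetical word list; a real run would read a dictionary file instead.
my @words = qw(pares parse stale pears steal least apple);

my ( %count, %maxkeys );
my $max = 0;    # assumption: start from 0 so tiny inputs still report a leader

for my $word (@words) {
    my $sig = join q(), sort split //, $word;
    my $n   = ++$count{$sig};
    next if $n < $max;          # below the current record: cannot be a leader
    if ( $n > $max ) {          # new record: forget the previous leaders
        $max     = $n;
        %maxkeys = ();
    }
    $maxkeys{$sig} = 1;         # new record or tie: remember this signature
}

print "$max: @{[ sort keys %maxkeys ]}\n";    # 3: aelst aeprs
```

Only the signatures in `%maxkeys` need visiting at the end, so no pass over every key is required once the input has been read.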
diff --git a/challenge-005/james-smith/run-5.bash b/challenge-005/james-smith/run-5.bash
new file mode 100644
index 0000000000..3b06919763
--- /dev/null
+++ b/challenge-005/james-smith/run-5.bash
@@ -0,0 +1,6 @@
+for i in 'small' 'large' 'huge' 'insane'
+do
+  echo '## '${i^};
+  /usr/bin/time -f 'Time taken: %U (%C)' perl perl5/ch-2.pl /usr/share/dict/british-english-$i
+  echo '';
+done
diff --git a/challenge-005/james-smith/run-5a.bash b/challenge-005/james-smith/run-5a.bash
new file mode 100644
index 0000000000..2fc5cf1878
--- /dev/null
+++ b/challenge-005/james-smith/run-5a.bash
@@ -0,0 +1,6 @@
+for i in 'small' 'large' 'huge' 'insane'
+do
+  echo '## '${i^};
+  /usr/bin/time -f 'Time taken: %U (%C)' perl perl5/ch-2a.pl /usr/share/dict/british-english-$i
+  echo '';
+done
diff --git a/challenge-005/james-smith/run-5b.bash b/challenge-005/james-smith/run-5b.bash
new file mode 100644
index 0000000000..c1fadac428
--- /dev/null
+++ b/challenge-005/james-smith/run-5b.bash
@@ -0,0 +1,6 @@
+for i in 'small' 'large' 'huge' 'insane'
+do
+  echo '## '${i^};
+  /usr/bin/time -f 'Time taken: %U (%C)' perl perl5/ch-2b.pl /usr/share/dict/british-english-$i
+  echo '';
+done