[< Previous 185](https://github.com/drbaggy/perlweeklychallenge-club/tree/master/challenge-185/james-smith) | [Next 187 >](https://github.com/drbaggy/perlweeklychallenge-club/tree/master/challenge-187/james-smith) # Ʈʯḗ Ŵꭂḵⱡʏ Chåɫleŋȅ 186 Ỷȣ ʗầņ fɪƞẟ ḿồṛĕ ĩƞꬵõɽɯɑẖıọǹ ꬰƀȣƫ ðiŝ ŵꭂḵꭍ, ẚꬻḋ ṗʁeỽíȣẜ ẅꭂks cẗâỻɛŋɘṧ ẩṱ: https://theweeklychallenge.org/ Ïf ỿȣ ḁrẽ ṇọť aľɻəǻďƴ ẟƣŋ ðe ĉʮảꬸḽeŋḗ - íṯ ȉṥ ẫ ġỏoď plẵꭢ ʈⱺ ṗꭉaꬿŧiśe ɏỗůꭆ **ƥěꭈⱡ** öṟ **ꭉakū**. Įƒ ȋt iȿ nóť **ṗḙɽl** oṙ **rakʊ** ẏȣ ƍȩvȅŀơƥ in - ỿȣ ĉẚṅ ṡūḃꭑĭƫ šôȴuʈīoňẛ įṋ ẉɥḯċʯḝʋḕr ʟaŋʉaĝe ꭅȣ ꬵꭂⱡ cőmḟởrṯaƀłꬳ ẁið. Ȳȣ ɕañ ḟḭꬻḑ þe sȯḹȕțḭǫṋs ʯeɿe ốņ gɩʇḥùɓ ấţ: https://github.com/drbaggy/perlweeklychallenge-club/tree/master/challenge-186/james-smith # Task 1 - Zip List ***You are given two list `@a` and `@b` of same size. Create a subroutine `sub zip(@a, @b)` that merge the two list as shown in the example below.*** ## Solution This is a problem where perl signatures come into play. We use a signature so that we can use arrays in call rather than references to arrays. The signature `(\@\@)` then converts the arrays in the call into references. ### Solution 1 - Simple code: Rather using a loop variable and select out of the two arrays we loop through 1 and use a pointer to loop through the other. ```perl sub zip (\@\@) { my $c = 0; map { $_, $_[1][$c++] } @{$_[0]} } ``` ### Solution 2 - Using a pointer for both arrays: This is the naive version where we have a counter and use it to index both arrays. The plus is that it is a single line, no need for a variable. ```perl sub zipt (\@\@) { map { $_[0][$_], $_[1][$_] } 0..@{$_[0]}-1; } ``` ### Solution 3 - Golfed code: Now I couldn't go a week without putting in some golf code - which is slightly off the wall! We loop through one array and shift off the other As we are golfing we try and avoid new variables, we can therefore reuse/abuse `@_` (if we are careful)... this is the reason for `local` which stops our changes affecting the loop variable outside the function call. Not only does using `@_` avoid using `my` when we loop through the array as we `shift` off the first entry we can use the bare `shift` as it assumes that the array is `@_`... Finally we have to do `@{shift()}` as perl can't distinguish between this meaning the variable `@shift` or `@{$_[-1]}`. ```perl sub zipx (\@\@) { local($_,@_)=(shift,@{shift()}); map{$_,shift}@$_ } ``` ## Performance With out original tests of up to 10 entries in the array we came up with an ordering of `zip`, `zipt`, `zipx` as faster to slowest - although with only about a 10-15% difference between all 3. We then tried with a larger array (initially 1000) and sptted that `zipx` is now the fastest, about 20% faster than `zip` and 50% faster then `zipt`. The break even point is around `35` entries in each array. So the `zipx` is probably a better method to use as if the size is small all are lightning fast - around 1 micro-second, but as the array gets larger this gets around the millisecond - which could well be more significant. # Task 2 - Unicode Makeover ***You are given a string with possible unicode characters. Create a subroutine `sub makeover($str)` that replace the unicode characters with ascii equivalent. For this task, let us assume it only contains alphabets.*** ## Solution We could use packages/built in support for UTF-8. But where would be the fun in that... can we replicate this in compact perl! We first need to define a data structure where we can define the mapping between unicode symbol and plain latin ASCII character. We could use a straight hash. But as the blocks of letters are in Unicode 'code blocks' we can use this to define the arrays more efficiently. `%blocks` is a hash whose key is the numeric value of the first character in the block `ord($ch)`. the second is a series of letter substitutions for that block. If the value in array is `?` then we assume there is no direct mapping. ```perl %blocks = ( 192 => [qw( A A A A A A A C E E E E I I I I TH N O O O O O x O U U U U Y TH SS ... q ? ? dz ? dz ts ? tc ? ls lz ? ? h h)], 7680 => [qw( A a B b B b B b C c D d D d D d D d D d E e E e E e E e E e F f ... O o O o U u U u U u U u U u U u U u Y y Y y Y y Y y LL ll V v Y y)], ... 43824 => [qw( a ae e e e f g l l l m n n ? ? c oe oe ee co co y r r r r r r r s n n m m n x x x x x x x y ? ? ? ? ? lb ie ce uo a)], } ``` Now note we allow "ligatures" & "diagraphs" in the array so there is a potential for a 1-many mapping. (Diagraphs may be written as two letters, but sometimes the letters are merged into a ligature, e.g. oe as in Phoenix; ss; ll in welsh ...) And now `makeover` - this might be seen as nasty code! But we will explain below ```perl sub makeover { join '', map { my $t = ord $_; ( grep( { $_ ne '?' } map { $t >= $_ && $t-$_ < @{$blocks{$_}} ? $blocks{$_}[$t-$_] : () } keys %blocks ), chr $t )[0] } split //, shift } ``` ``` shift -> split -> map -> join ``` Remember with nested blocks we work backwards. The outer map is simple, we split the parameter, and then process each character in turn, and join them together. We split the string into chunks we assume it is a UTF-8 string, and not a byte string, as the examples are the perl code - we can guarantee that as we `use utf8`, for file/command line - we can use *e.g.* ```perl binmode(STDIN, "encoding(UTF-8)"); ``` There are similar was to ensure that database results are UTF-8 ``` keys -> map -> grep ``` Now for the inner loop - We loop through each block and see if the character would be in that block - if it is we grab the mapped version of the character - this is the `map` portion. This `map` returns at most one character. We then `grep` out any `?` character We then add the original symbol to the list... So the results of the outer bracket will be an array with one or two values `[ mapped, original ]` or `[ original ]`. So getting the zero-th element we have the mapped character if there is one or the original character if there isn't. We know output is going to be UTF-8 ans we changed the *binmode* of STDOUT to be UTF-8. This is a bit messy and slow - so can we think of a different way of using our data structure: ### A simpler solution!! Well we can convert the data structure into a hash... and see how the code works - this is an initial overhead but each symbol is a lot quicker! ```perl my %map = map { my $c = $_; map { $_ eq '?' ? ($c++,$_) : ($c++) x 0 } @{$blocks{$_}} } keys %blocks; ``` This then gives us a `makeover` function of: ```perl sub makeover { join '', map { $map{ord $_}//$_ } split //, shift } ``` ### Some examples and performance... Here are some examples.... ``` ÃÊÍÒÙ! => AEIOU! âÊíÒÙ => aEiOU Ƭȟȩ Ẇḕȅǩȴƴ Ćħąỻḝṅḡể => The Weekly Challenge ``` With these three examples the 2nd method is 4-5x faster than the previous method... Pre-computing is always the way to go, if you have a complex function and not too much memory is required to store the mappings. ## Now lets go the other way... This is interesting - but it was hard to write examples - so I thought why don't I go the other way and produce the "decorated" code from the plain text - I find it weird calling it "`makeunder`" but as it is opposite over `makeover` so be it. I think switching names would be more understandable. Now there are two ways we can do this - we can either just use the single character mappings or we can look to see if we can include the ligatures/digraphs. We also add in a small random feature meaning we don't necessarily modify all characters... ### Getting the mapping ```perl my @inv; push @{$inv[length $map{$_}]{$map{$_}}},$_ for keys %map; ``` We have made this slightly more complex by splitting the mapping into two hashs - `$inv[1]` the mapping of characters and `$inv[2]` the mapping of digraphs. ### First pass no ligatures This one is relatively simple - lookup character in list - if is mappable map it - if not just copy it, and we use the `join map split` mode... ```perl sub makeunder_nolig { join '', map { $inv[1]{$_} && 0.8 > rand ? chr $inv[1]{$_}[rand @{$inv[1]{$_}}] : $_ } split //,$_[0]; } ``` ### Now with ligatures ```perl sub makeunder { my $res = ''; my @T = split //,$_[0]; while(@T) { my $x = shift @T; $res .= @T && exists $inv[2]{$x.$T[0]} && 0.8 > rand ? chr($inv[2]{$x.$T[0]}[rand @{$inv[2]{$x.$T[0]}}]).(shift @T)x 0 : exists $inv[1]{$x} && 0.8 > rand ? chr $inv[1]{$x}[rand @{$inv[1]{$x}}] : $x } $res; } ``` We can't use the `join map split` model this time - as the `map` section is looking at one or two characters at a time. So through the loop we first look for a ligature - if we have one we use that (OK - we might not with a random change). If we don't we then try a single character, or if not fall through. **Notes:** We use our favourite `x 0` trick to shift the 2nd character from a digraph, without it affecting the output... Here is some output (this weeks challenges) you will already have seen some at the top of this document! ``` Ṫaṩḵ 1: Zįƥ Ḽiẛƭ Ṧuɓṃïtŧȩƍ ḅꭅ: Ḿoḧǟmꭐaƌ Ṧ Ḁʼnẁẳr Ỵồŭ äɺɞ ɡɪʌȅṅ țẅồ ƚīṧẖ @à ãɴḓ @b ōf ṥąꬺⱻ ṥĩȥe. Čʁeꭤťꬲ a ṧuɓꭆȣtȋñḛ ẜȕƃ ẕĩṗ(@ḁ, @ɓ) ðằṫ ṃềrĝe ðɚ ʈẇợ łıẛƫ aṣ ẜhowṇ ɪn ðⱻ ềẋaꭑpɭȇ ḃʚꬸȯw. Ṭẫꭍḳ 2: Ʊꬼıꭄȡɜ Ṁắƙẹȱỽėȓ ſȗḇɯïţⱦɜȡ bʏ: Ṁổɦẩɰṃåḋ S Ấꬼwǡɾ Ÿȣ äꭋe ǥiṿȩȵ ȃ śẖȑḭnĝ ẁịʇḧ ṕȱṣẝɨḇlé uņĩꭄdě ċȟắꭇäȼṱḕȓȿ. Çɹḛaƭe ậ ẝửḃꭊôuṫiꭎḗ ṩṳƃ ɯắkeṓỽeꭆ($ṩẖꭋ) ðat rȩpḷāče ƫḣẹ uꭎịʗɔdē cḫắɾȁḉțễꭈʂ ẁĭð aẜḉɨĭ ɘɋữǐṿaļėꭎť. Ƒỏꭊ ðis ṯꭤśk, ḷeƭ ǘʂ aʂṣụɯȩ ḯŧ ȭṉḹꭅ ꭄṋʈấïṋș alṕẗǎƀeʦ. ```