The Bear's Den

Enter at your own risk

Common Encodings

Task 1: Count Common

Submitted by: Mohammad Sajid Anwar


You are given two array of strings, @str1 and @str2.

Write a script to return the count of common strings in both arrays.

Example 1

Input: @str1 = ("perl", "weekly", "challenge")
       @str2 = ("raku", "weekly", "challenge")
Output: 2

Example 2

Input: @str1 = ("perl", "raku", "python")
       @str2 = ("python", "java")
Output: 1

Example 3

Input: @str1 = ("guest", "contribution")
       @str2 = ("fun", "weekly", "challenge")
Output: 0

Solution

Generalizing the task to any number of arrays.

As the task makes no restrictions on the strings in each array, we must not assume the strings being unique.

This raises the question, how to deal with non-unique strings. I presume they count in their multiplicity, i.e. the same string appearing n times in the first array and m times in the second array counts as min(n, m) common strings.

This assumption directly leads to the following solution:
For each string we record the number of appearances in each array. Then we take the sum over the minimum number of appearances over all arrays for each word.

use strict;
use warnings;
use List::Util qw(min sum);

sub count_common {
    my %count;
    my @z = (0) x @_;
    while (my ($i, $words) = each @_) {
        ($count{$_} //= [@z])->[$i]++ for @$words;
    }
    sum map min(@$_), values %count;
}

See the full solution to task 1.

Task 2: Decode XOR

Submitted by: Mohammad Sajid Anwar


You are given an encoded array and an initial integer.

Write a script to find the original array that produced the given encoded array. It was encoded such that encoded[i] = orig[i] XOR orig[i + 1].

Example 1

Input: @encoded = (1, 2, 3), $initial = 1
Output: (1, 0, 2, 1)

Encoded array created like below, if the original array was (1, 0, 2, 1)
$encoded[0] = (1 xor 0) = 1
$encoded[1] = (0 xor 2) = 2
$encoded[2] = (2 xor 1) = 3

Example 2

Input: @encoded = (6, 2, 7, 3), $initial = 4
Output: (4, 2, 0, 7, 4)

Solution

We may XOR both sides of the given equation with orig[i]. Taking into account that

x XOR x == 0
x XOR 0 == x

we find:

encoded[i] XOR orig[i] = orig[i + 1]

Using List-Utilities

This is a nice use case for reductions. $initial is given as the first argument.

use strict;
use warnings;
use List::Util 'reductions';
use experimental 'bitwise';

sub decode_xor {
    reductions {$a ^ $b} @_;
}

For the sake of completeness, here is the corresponding encoder. In this case slide comes handy.

use List::MoreUtils 'slide';

sub encode_xor {
    slide {$a ^ $b} @_;
}

Using PDL

Though the implementations based on List::Util and List::MoreUtils appear as simple as possible, it is interesting to take a look at PDL.

The encoder is straightforward. We shift the original list by one element and perform an XOR operation between both.

use PDL;
use PDL::NiceSlice;

sub encode_xor_pdl {
    my $orig = long @_;
    $orig(0:-2) ^ $orig(1:-1);
}

When it comes to decoding, things are a bit more complicated. Here the input for the XOR operation and its output access the same data and therefore we cannot use the overloaded ^ operator. Instead we need to call the core function explicitly, providing all parameters. The second operand to XOR and the result are overlapping slices of the same ndarray $orig. It is initialized with $init followed by zeroes. The zeroes are being overwritten with the recovered original values in the course of the internal loop.

sub decode_xor_pdl {
    my $init = shift;
    my $enc = long @_;
    my $orig = zeroes(long, $enc->dim(0) + 1)->set(0, $init);
    PDL::xor($enc, $orig(0:-2), $orig(1:-1), 0);
    $orig;
}

Both PDL subs may be called with the list of numbers as an array, an array ref or an ndarray. While it doesn’t make a big difference for the list implementations if the interface is based on an array or an array ref, the impact on the PDL implementations is significant. When given an array or an array reference, most effort involves converting the input into an ndarray, where in the former case this takes even longer than the whole process in the list implementations. To unleash the full power of PDL, the input has to be provided as an ndarray. Though such a comparison between the list and PDL implementations might not be fair, it is at least impressive.

Here is a benchmark for all four subs, operating on \(2^{18}\) random long numbers.

           Rate dec_list enc_list  dec_pdl  enc_pdl
dec_list 77.4/s       --      -9%     -96%     -99%
enc_list 84.9/s      10%       --     -96%     -99%
dec_pdl  1895/s    2350%    2134%       --     -70%
enc_pdl  6218/s    7938%    7227%     228%       --

See the full solution to task 2.


If you have a question about this post or if you like to comment on it, feel free to open an issue in my github repository.