The Bear's Den

Enter at your own risk

Stepping Back and Forth

Task 1: Distinct Average

Submitted by: Mohammad Sajid Anwar


You are given an array of numbers with even length.

Write a script to return the count of distinct average. The average is calculate by removing the minimum and the maximum, then average of the two.

Example 1

Input: @nums = (1, 2, 4, 3, 5, 6)
Output: 1

Step 1: Min = 1, Max = 6, Avg = 3.5
Step 2: Min = 2, Max = 5, Avg = 3.5
Step 3: Min = 3, Max = 4, Avg = 3.5

The count of distinct average is 1.

Example 2

Input: @nums = (0, 2, 4, 8, 3, 5)
Output: 2

Step 1: Min = 0, Max = 8, Avg = 4
Step 2: Min = 2, Max = 5, Avg = 3.5
Step 3: Min = 3, Max = 4, Avg = 3.5

The count of distinct average is 2.

Example 3

Input: @nums = (7, 3, 1, 0, 5, 9)
Output: 2

Step 1: Min = 0, Max = 9, Avg = 4.5
Step 2: Min = 1, Max = 7, Avg = 4
Step 3: Min = 3, Max = 5, Avg = 4

The count of distinct average is 2.

Solution

When calculating the averages as described, this would result in a complexity of \(\mathcal{O}(n^2)\). The effort may be reduced to \(\mathcal{O}(n \log n)\) by sorting the list and pairing elements forwards from the head and backwards from the tail of the sorted list.

Furthermore, as the number of distinct averages is requested only, it is not necessary to calculate the actual averages and consider the sum of minimum and maximum only.

There remains an issue: We are given numbers, not integers and thus we need to handle floating point numbers with all of their pecularities. Consider:

$ perl -e 'printf "%.16f %.16f\n", -1.3 + 1.7, -.3 + .7'
0.3999999999999999 0.4000000000000000

These should be regarded as equal and therefore need to be rounded in some way.

For a PDL implementation we sort the numbers and split the list into two columns. Then we reverse the values in the second column - here the sever-operation is essential, which is also available as a NiceSlice-modifier.

Finally we sum over the rows, multipy these with a scale factor and round to integer, reduce to unique values and count these.

use strict;
use warnings;
use PDL;
use PDL::NiceSlice;

use constant SCALE => 1e8;

sub distinct_avg {
    my $n = pdl(@_)->qsort->splitdim(0, 2);
    $n(1) .= $n(1,-1:0;|);
    ($n->sumover * SCALE)->rint->uniq->nelem;
}

See the full solution to task 1.

Task 2: Backspace Compare

Submitted by: Mohammad Sajid Anwar


You are given two strings containing zero or more #.

Write a script to return true if the two given strings are same by treating # as backspace.

Example 1

Input: $str1 = "ab#c"
       $str2 = "ad#c"
Output: true

For first string,  we remove "b" as it is followed by "#".
For second string, we remove "d" as it is followed by "#".
In the end both strings became the same.

Example 2

Input: $str1 = "ab##"
       $str2 = "a#b#"
Output: true

Example 3

Input: $str1 = "a#b"
       $str2 = "c"
Output: false

Solution

When seeing this task, my first instinct was “use a regex substitution”. Though it doesn’t seem possible to apply all backspaces in a single s///g operation (see notes in perlop), it would be easy to achieve this in a simple while loop:

1 while s/.#//;

But then I realized this is not efficient as it would parse the input string over and over from the beginning. With \(n\) as the length of the string and \(k\) as the number of backspaces, the complexity of this approach would be \(\mathcal{O}(n k)\).

Regex matching and substitution is known to be very fast, but if used in an inefficient way it cannot keep up with something more efficient.

An efficient implementation was an issue in task 1 and should be such in task 2, too.

A better approach consists in copying the input string character-wise and stepping back whenever a backspace is seen. I’ll present a benchmark later.

As it is not specified how to handle the situation when a backspace appears at the beginning of a string, for simplicity a leading # will delete nothing and just vanish.

use strict;
use warnings;
use experimental 'signatures';

sub apply_bs {
    my @out;
    $_ eq '#' ? pop @out : push @out, $_ for split //, shift;
    join '', @out;
}

sub bs_cmp ($str1, $str2) {
    apply_bs($str1) eq apply_bs($str2);
}

The initial implementation was:

sub bs_cmp_subst ($str1, $str2) {
    for ($str1, $str2) {
        1 while s/.#//;
    }
    $str1 eq $str2;
}

Comparing both implementations on the string 'a' x 1e5 . '#' x 1e5 (resulting in the empty string) shows an unambiguous score:

        Rate subst  copy
subst 3.05/s    --  -91%
copy  34.4/s 1027%    --

See the full solution to task 2.


If you have a question about this post or if you like to comment on it, feel free to open an issue in my github repository.