The Bear's Den

Enter at your own risk

Weighing and Counting Pairs

Task 1: Secret Santa

Submitted by: Andreas Voegele


Secret Santa is a Christmas tradition in which members of a group are randomly assigned a person to whom they give a gift.

You are given a list of names. Write a script that tries to team persons from different families.

Example 1

The givers are randomly chosen but don't share family names with the receivers.

Input: @names = ('Mr. Wall',
                 'Mrs. Wall',
                 'Mr. Anwar',
                 'Mrs. Anwar',
                 'Mr. Conway',
                 'Mr. Cross',
                );

Output:

    Mr. Conway -> Mr. Wall
    Mr. Anwar -> Mrs. Wall
    Mrs. Wall -> Mr. Anwar
    Mr. Cross -> Mrs. Anwar
    Mr. Wall -> Mr. Conway
    Mrs. Anwar -> Mr. Cross

Example 2

One gift is given to a family member.

Input: @names = ('Mr. Wall',
                 'Mrs. Wall',
                 'Mr. Anwar',
                );

Output:

    Mr. Anwar -> Mr. Wall
    Mr. Wall -> Mrs. Wall
    Mrs. Wall -> Mr. Anwar

Solution

Frankly, I don’t understand the task. The formulation

…a script that tries to team persons…

leaves room for for too much interpretation and I see some inconsistency in the examples.

In the first example there are pairs where A hands a gift to B and vice versa.

In the second example there is an odd number of persons. Irrespective of any family memberships there are no pairings covering all persons. The solution looks completely different than the first. There are no longer pairs, but a cycle of three instead.

To get over these inconsistencies, I’d like to add some specifications:

Sadly, this interpretation does not agree with example 2.

Further assumptions:

With these presuppositions the task may be regarded as maximum weight matching. Using a weight of one for pairs from different families and zero within the same family we come to a complete graph where (for an even number of vertices) a maximum cardinality matching always exists.

This problem is known to be solvable in polynomial time, though the known algorithms are rather complicated. Luckily there is Graph::Matching on CPAN that will get the job done.

There are just two steps:

  1. Create a weighted, undirected graph with the persons as vertices and an edge between all pairs of persons with a weight of zero if both belong to the same family and a weight of one otherwise.
  2. Find a maximum cardinality matching having maximum weight for the constructed graph.

As the inner-family matchings do not contribute to the weight of the matching, they will be chosen only for cardinality if there is no better solution.

These steps may be implemented as:

use Math::Prime::Util 'forcomb';
use Graph::Matching 'max_weight_matching';

sub same_family {
    my ($x, $y) = map +(split)[-1], @_;
    $x eq $y;
}

sub secret_santa (@names) {
    return {} if @names % 2;
    my @graph;
    forcomb {push @graph, [@names[@_], 1 - same_family(@names[@_])]} @names, 2;

    +{max_weight_matching(\@graph, 1)};
}

See the full solution.

Task 2: Most Frequent Letter Pair

Submitted by: Jörg Sommrey


You are given a string S of lower case letters 'a'..'z'.

Write a script that finds the pair of consecutive letters in S that appears most frequently. If there is more than one such pair, chose the one that is the lexicographically first.

Example 1

Input: $s = 'abcdbca'
Output: 'bc'

'bc' appears twice in $s

Example 2

Input: $s = 'cdeabeabfcdfabgcd'
Output: 'ab'

'ab' and 'cd' both appear three times in $s and 'ab' is lexicographically smaller than 'cd'.

Solution

There are a few steps to the solution:

  1. Find all consecutive letter pairs.
  2. Lexicographically sort the pairs.
  3. Count the number of repetitions for each pair.
  4. Find the index of the maximum count.
  5. Pick the pair at the found index.

Here PDL::Char comes handy.

We may solve example 1 step by step in the PDL shell.

First load PDL::Char and toggle ‘print-by-default’:

pdl> use PDL::Char

pdl> do_print
1

Construct a PDL::Char object:

pdl> $s = PDL::Char->new('abcdbca')
'abcdbca' 

Then use lags to build the character pairs:

pdl> $p = $s->lags(0, 1, $s->dim(0) - 1)
[ 'ca' 'bc' 'db' 'cd' 'bc' 'ab' ] 

Perform a vector sort:

pdl> $sv = $p->qsortvec
[ 'ab' 'bc' 'bc' 'ca' 'cd' 'db' ] 

Runlength-encode the sorted pairs.

pdl> ($count, $pairs) = $sv->rlevec
$PDL_Char1 = '' ;
$PDL_Char2 = [ 'ab' 'bc' 'ca' 'cd' 'db' '' ] 
;

As the $count ndarray inherits the class PDL::Char from $sv, it needs to be converted to a numerical type:

pdl> $count->long
[1 2 1 1 1 0]

Find the index of the most frequent pair.

pdl> $maxind = $count->long->maximum_ind
1

Pick the result:

pdl> $pairs->atstr($maxind)
bc

Everything put together in a sub:

use PDL;
use PDL::Char;

sub most_frequent_pair {
    my $s = PDL::Char->new(shift);
    my ($count, $pairs) = rlevec($s->lags(0, 1, $s->dim(0) - 1)->qsortvec);
    $pairs->atstr($count->long->maximum_ind);
}

See the full solution.


If you have a question about this post or if you like to comment on it, feel free to open an issue in my github repository.