It's not hard to see that, for 32-bit numbers this occurs
precisely when bits 15 and 31 of z are 0. (Note: the LSB of z is bit 0).
�This occurs with probability 1/4. �So the probability that a pair
of inputs w and 2w produce outputs y and 2y is the probability that the differential
is preserved at every swapping step. �Since this happens with probabilitity
1/4 at each swap, we expect this to occur with probability (1/4)^4=1/256.
So suppose the input pair (w,2w) produces output pair (v,2v). �We
call the pair (w,2w) a right pair. �Then with high probability bits
15 and 31 of k6*w are 0. �This is a two-bit condition on k6*w that one
can use to filter the set of potential values of k6; 1/4 of all k6 values
will pass this test.� One can repeat this test for 16 right input pairs
(w1,2*w1)...(w16,2*w16) chosen uniformly at random, and the probability of
a given k6 value surviving all 16 tests is roughly (1/4)^16 = 2^-32, so we
expect about one value of k6 to survive.
We now show that, having determined k6, an attacker can determine k7 using
very few additional queries. �Note that if (w,2w) is a right pair, then

(k6*w) and

(k6(2w)) form a right pair for determining k7. �Thus the right pairs
used to determine k6 can be used to determine k7, too. �In the rare case
that the right pairs from k6 do not completely determine k7, one may need
to make a few more queries. �But since� k6 is known, an attacker
can control the input to the cipher fragment starting with multiplication
by k7. �This yields a differential with probability 1/64. �So we
may very pessimistically estimate that k7 can be determined with an additional
1024 queries.
Once k7 has been determined, the right pairs can be applied to determing
k8, and so on. �Continuing in this way, we see that k6,...,k10 can
all be determined with high probability using fewer than 8192 chosen-plaintexts.
�An attacker can then apply the same trick to k0,...,k4. �Thus
the whole cipher can be broken with about 2^14 chosen-plaintexts. �This
is surprisingly small considering the large key size.
We should now mention that the work factor of breaking the cipher is quite
low, as well. �Suppose an attacker has right pairs (w1,2*w1),...,(w16,2*w16)
which determine k6. �By definition of being right, bits 15 and 31 of
k6*wi are 0 for all i. �These constraints can be translated into nonlinear
equations on the bits of k6. �Unfortunately, the degree of the equations
is as large as 31, so solving them directly is impossible. One could iterate
over all possible values of k6, throwing out the ones that don't satisfy the
equations, but this will require testing 2^32 keys. �Observe that bit
15 of k6*wi is independent of bits 16,...,31 of k6, though. �Thus an
attacker can try all possible values for the low 16 bits of k6, checking whether
they satisfy this equation. �After discovering the lower 16 bits, he
can then do the same thing for the upper 16 bits. �Since we have to
test each half of a key against each right pair, the total number of tests
performed is 2*2^16*2^4=2^21. �Repeating for k7,...,k10, and then again
for k0,...,k4 yields that the whole cipher can be broken with10*2^19~=2^25
tests. �But how expensive is each test? �Testing the lower or upper
16 bits of a key against a wi involves multiplying by wi, masking bit 15 (or
31), and testing for 0. �This is about 1/8th as expensive as the MultiSwap
encryption, which requires 10 multiplies, 10 swaps, and 6 adds. �So
the work factor is about (2^25)/8=2^22 encryptions.
Converting to known-plaintext attack
Recall there are two stages to the attack: recover k5 and k11, and recover
the rest of the key. �The attack on k5 and k11 can be converted to
a known-plaintext attack as follows. �Referring to Figures 1 and 2,
observe that w=c1-c0+x1. �With probability 2^-32, this value is 0, and
that situation can be detected. �When this happens, c0=k11. �So
a set of 2^32 known-plaintexts should suffice to recover k11. �Similarly,
s0'=c1-c0-s1. �Out of a set of 2^32 known-plaintexts, on average one
plaintext should satisfy x0+s0=0, in which case s0'=k5. �Since these
two events are roughly independent, an attacker should be able to recover
k5 and k11 with 2^32 known-plaintexts.
One can also convert the second stage of the attack to use known-plaintexts.
�We first have to see that the inputs and outputs of the two halves
of the cipher can be isolated. �So suppose an attacker knows k5 and
k11. �First observe that c1 = c0 + s0'. �So the input to Figure
2 can be computed as w=c1 - c0 + x1. �Since he knows k11, an attacker
can also obviously compute the output, v,� of the fragment in Figure
2. �For the first half of the cipher, the input is x0 (or x0 + s0, which
is known, if used in a chaining mode). �The output (immediately after
multiplication by k4) is

(c0-c1-k5).
All that's left is to figure out the number of messages one needs to capture
before expecting to have 8192 pairs (w,2w) for the second round and 8192
pairs (w,2w) for the first round. �With 2^22.5 known-plaintexts, we
get 2^44 pairs, and the probability that any one of these pairs is of the
form (w,2w) is 1/2^31. �Hence we expect to have 2^13 such pairs. �Experiments
confirm this estimate. �Thus with 2^22.5 known plaintexts, we expect
that the 2^22.5 inputs to the second round will contain about 2^13 pairs,
enough to recover k6,...,k10. �But these same messages yield 2^22.5 inputs
to the first round, which should also contain 2^13 pairs. �Since these
events are independent, one should be able to break the system with 2^22.5
known plaintexts. �Detecting the pairs in a set of known plaintexts is
easy if the pairs are stored in a hash-table, so the work factor is just 2^25,
as above.
A better known-plaintext attack
The known-plaintext attack described above requires 2^32 texts, which seems
like a waste since those texts are only required to recover k5 and k11. �By
performing both halves of the attack simultaneously, one can get by with just
2^22.5 known-plaintexts.
Recall that, even without knowledge of k5 and k11, we can derive the input
to the cipher fragment in Figure 2 via w=c1-c0+x1.� If we extend our
differential through the additional swap immediately preceding the addition
of k11, we get a differential (w,2w) -> (v,2v) -> (v+k11,2v+k11) with
probability 1/2^10. �Given such a right pair with outputs c0 and c0',
we can compute k11=c0'-2*c0.
So collect 2^22.5 known-plaintexts, and collect from them 2^13 pairs whose
input to Figure 2 is (w,2w). �Each such pair suggest a candidate for
k11=c0'-2*c0. �The right value of k11 will be suggested once for each
right pair, or 2^13/2^10=8 times. �Wrong pairs will suggest a random
value for k11, and so no other value for k11 should be suggested more than
once or twice.
With k11, one can use the previously described attack to recover k6,...,k10.
�This attack can then be repeated for k5, and then for k0,...,k4. �The
total work factor is about the same as for the previous attacks. �The
storage is also quite small, since we don't have two keep a counter for every
possible value of k11, only the ones suggested by a pair. �Since we use
only about 2^13 pairs, the storage requirement is about 2^16 bytes.
Conclusion
We have seen that MultiSwap can be broken with a 2^14 chosen-plaintext
attack or a 2^22.5 known-plaintext attack, requiring 2^25 work. �We
believe this shows that MultiSwap is not safe for any use.