jack: (Default)
[personal profile] jack
Problem: What is the shortest base-d sequence which contains all sequences of n d-digits?

I think I have a solution, but it's a bit of a sketch. The best possible is length d^n, assuming all adjacent sequences overlap (n-1), and I aim to show how to make one that length.

Instead of a sequence, consider a loop. We write this as a sequence, but consider the ends to be joined. This is because there's enough symmetry we can probably solve the one problem if we can solve the other, and because we can combine sequences without having to have n-1 tails hanging off the ends.

For instance, for 3-sequences of 2-digits, the loop 00010111 contains all 2^3 3-sequences, two of them wrapping round the end.

Lemma: If two loops contain a lot of n-sequences once each, and somewhere contain the same (n-1)-sequence, they can be combined into one loop which does so, having the sum of their lengths.

Write them as [n-1][loop1] and [n-1][loop2]. Then [n-1][loop1][n-1][loop2] is the sum of the lengths, and contains every sequence the preceeding did, since any n-subsequence corresponds to one in one of the originals.

We use (strong) Induction on d and n[1] to show that there is always such a sequence

Find a solution for n=d=2, and then for arbitrary n,d assume we can produce a desired sequence S(m,c) for any s,m smaller than n,d.

What are the n-sequences we need to include? Consider the groups:

Those which contain no 0s.
Those which contain one 0.
...
Those which contain n-1 0s.
The sequence of n 0s.

The first is all the sequences of d-1 digits length n, so we have a loop s(d-1,n) of length (d-1)^n.

The second. For each position of the zero, the rest of the sequence is determined by an n-1-sequence of d-1 digits, producing n*(d-1)^(n-1). Construct a loop s(d-1,n-1) of length (d-1)^(n-1), writing it as abc...m... (where m represents the n-1'th digit)

Consider loops like that but with a 0 inserted every n digits:

0abc...m0...
a0bc...m.0..
ab0c...m..0.
abc0...m...0
...
abc...0m...

Observe that there are n-1 copies of the original loop, and an extra 0 for each position in the original loop, making a total length of n*(d-1)^(n-1). Also, sequence in the original loop is in one of the new loops with a 0 inserted into any position (just read down the column, noting the zeros move one place right each time).

So these loops contain every sequence of the second category exactly once.

They can be stitched together by considering sequences of nearly all 1s -- one loop must have 0111..1 and a different 1011..1, which contain the n-1 subsequence 0111..1 so can be attatched, etc.

This loops can be stitched the first category loop between 0111..1 and 1111..1. Now we have a loop containing all d-digit n-sequences containing no or one 0.

This is approaching the order desired, and some examples make me believe the other categories can be treated the same way with more concentration on the permutations, but I can't formalise it. I think I've done it by hand for up to 4-sequences of any-digits.

If anyone else wants to object, or fill in the gaps, feel free.

[1] OK, I'm using induction a lot, but look, I'm not using proof by contradiction. It's an actual constructive proof you could in theory turn into an algorithm :)

Date: 2006-08-27 10:05 pm (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
There is a known algorithm which solves this problem, and you're right that it can always be done in d^n. I know the algorithm (and in fact I have a small script that implements it lying around somewhere or other), but I don't think I've ever seen a proof of why it always works.

Date: 2006-08-28 06:02 am (UTC)
From: [identity profile] angoel.livejournal.com
I'm not sure that the above is a proof, rather than just wishful thinking ;). In particular, I can't seem to get it to calculate binary sequences, due to the fact that the thing that I'm trying to splice zeros into is only one character long.

Date: 2006-08-28 10:28 am (UTC)
From: [identity profile] dragonwoodshed.livejournal.com
Burma!

(I panicked)

Date: 2006-08-28 11:30 am (UTC)
From: [identity profile] cartesiandaemon.livejournal.com
I wouldn't be at all surprised I was.

You're right about binary. It does get me the answer, but it consists almost entirely of special cases not in the "proof". For instance, n=3,d=2:

(i) A loop containing sequences with no 0s, d(3,1)=111
(ii) A loop containing 011, 101 and 110. Consider d(2,1)=11. I want to insert a 0 after the first place and the second place, but in fact, 11 is symmetric, only one position is necessary to represent all its sequences, so 011 *or* 101 contain all three sequences we need.
(iii) Ditto d(1,1).
(iv) 000

However, we end up with four loops 111, 011, 001, 000. We can stitch the middle two together normally [01]1[01]0, but then the first and last are also strange, we insert 000 into a sequence of 00, but only extending the long loop by 1.

But none of this yet convinces me the proof can't be patched up to deal with it. You can easily work out any n=1,2 or d=1,2 as a special case, and then work with higher values.

Date: 2006-08-28 11:30 am (UTC)
From: [identity profile] cartesiandaemon.livejournal.com
What?

(Ironically, I'm very confused :))

Date: 2006-08-28 11:31 am (UTC)
From: [identity profile] cartesiandaemon.livejournal.com
Thank you.

(I assumed someone must have worked it out already, but Ian brought it up at Sonic's party, and everyone else was hot to work it out, so I felt honour-bound to work it out for myself.)

Date: 2006-08-28 04:38 pm (UTC)
From: [identity profile] angoel.livejournal.com
My opinion is that you need to get the proof working in the binary case, and then the rest of the proof will follow. Saying that it's trivial to work out a loop containing all binary sequences of length 256 is in my opinion incorrect, and writing that out would probably give you the structure needed for higher cases.

Broadly speaking I do think that you're correct and that this proof works. The main bits that you need to formalise are:

a) Proof that you've got all combinations in the loops
b) Proof that you have no duplication of combinations in the loops
c) Proof that it's always possible to stick the loops together, and you don't (say) have situations in which the loops form disjoint sets.

Until you can provide proofs of that, though, it'll still claim it's wishful thinking ;)

[What I mean by wishful thinking is that it *may* be possible to get it to work, but it may equally possibly not be possible to get it to work. It's a bit like saying that you can solve the four colour theorum as follows:

"Assume for contractiction that there's a smallest map that cannot be four-coloured. Evidently we can five-colour this map with at most one country being the fifth colour. [Proof: Take out the country with the fewest borders. Four colour the resulting map. Return the country. Turn it into the fifth colour]. If this country borders three countries, it can be coloured the fourth colour. Contradiction, QED. If this country borders four countries, follow this set of proceedures. Contradiction, QED. If this country borders five countries, follow this set of procedures. Contradiction, QED. If the country borders six countries, you have a special case in which all countries have six borders, and this can be three-coloured. Contradiction, QED."

Which works, except for the irritating fact that proving a set of procedures works for the five border case is ... difficult. And while nothing convinces me that the proof can't be patched up to deal with this fact, the fact remains that I, and a lot of very clever people, have spent ages attempting to do the aforementioned patching, and the lack of result speaks for itself ;).]

Date: 2006-08-29 03:53 pm (UTC)
From: [identity profile] damerell.livejournal.com
I just wrote a script which starts 9999 and appends the lowest digit that does not repeat a previously seen sequence. I don't know if this is your algorithm but it generates a 10003 digit sequence ending 999 (hence equivalent to a 10000 digit loop).

At the party I was thinking "assume some sequence ABCD is repeated" but didn't get very far showing a contradiction from that.

Date: 2006-08-29 04:08 pm (UTC)
simont: A picture of me in 2016 (Default)
From: [personal profile] simont
I don't know if this is your algorithm

It is, yes. The source I got it from (which I think was a book by Ian Stewart) asserted that this method always succeeds, for any n and d, but unfortunately didn't include a proof of this claim. However, I've tried it myself on a variety of (n,d) pairs and not seen a failure, and Ian Stewart is generally reliable, so I have no reason to think there isn't a proof. I just don't know what it is.