Saturday 17 March 2012

Games computers play, part n

While we wait for the next RoboCup, today's The New York Times has a story titled 'The Computer’s Next Conquest: Crosswords'. Crossword puzzles may seem like a trivial task solved quickly by brute force and a good word list without even having to look at the clues, but that depends a lot on the puzzle and a bit on the language as well. Morphology is of course not much of an issue for English, but as far as I know (and I'm by no means an expert here), neither is it usually one for Finnish, either, since the only inflections used in Finnish crosswords are typically only the singular and plural nominative and the first infinitive. Where it gets challenging are answers containing several words without indicated positions for intra-word spaces. The simplest brute force approach is to treat such an answer just as a string of random characters subject to a check for correctness afterwards. With a 15-character string and 26 characters to choose from, we get over 10^21 alternatives. Evaluating all those at a million per second would take only about five million years, and there might be two to four of them in a single puzzle... Of course, not all of the characters are unknown, but rather some characters can be determined based on orthogonal answers, which in turn limit the number of phonotactically permissible characters next to the ones already known. I suppose something like Markov chains together with some heuristics about typical patterns should do a much better job. Still, figuring out an answer like SNOISSIWNOOW might still remain a bit of a challenge.

No comments:

Post a Comment