teaching | Mohammad Moshtaghi

Elegant Derivatives of Large Products

Tue, 30 Nov 2021 11:57:40 -0700

An ongoing research project on modeling computer security has thrown me back into calculus for the first time since my undergraduate differential equations course in 2014, which as an aside was the only college course that forced me to buy the professor’s custom textbook just to get access to homework problems. (I’m still mad about that.) Anyway, my research led me to take the derivative of an expression of the form:

$$\prod_{i=1}^n f_i(x)$$

For the uninitiated, $\prod$ is a mathematical symbol meaning “product” (and is analogous to the $\sum$ symbol meaning “sum”). It says to take each $f_i$ term as $i$ goes from $1$ to $n$ and multiply them all together:

$$\prod_{i=1}^n f_i(x) = f_1(x) \times f_2(x) \times \cdots \times f_n(x)$$

Taking this derivative is something high-school me was much better prepared for than current postdoc me, but after some head scratching I found the elegant solution:

$$\frac{d}{dx} \prod_{i=1}^n f_i(x) = \prod_{i=1}^n f_i(x) \cdot \sum_{i=1}^n \frac{f_i'(x)}{f_i(x)}$$

As is common in mathematics, there’s a direct but tedious way to get this answer and another more elegant way to get the same thing. I’m not the first to derive this solution, but I’m writing it up for posterity because I will inevitably forget this trick in a few days. A PDF of the two derivations without all the handholding I’m about to do is available here.

Method 1: Product Rule

I had to sweep away years of mental cobwebs to remember the product rule for derivatives. My expression is a product, after all, so why not start there? The product rule states:

$$\frac{d}{dx}(u \cdot v) = u' \cdot v + u \cdot v'$$

This works for the product of two terms, but $\prod_{i=1}^n f_i(x)$ has $n$ terms. Let’s try peeling off terms one-by-one, starting with $f_1(x)$, and then applying the product rule.

$$\begin{align*} \frac{d}{dx} \prod_{i=1}^n f_i(x) &= \frac{d}{dx} \left(f_1(x) \cdot \prod_{i=2}^n f_i(x)\right) \\ &= f_1'(x) \cdot \prod_{i=2}^n f_i(x) + f_1(x) \cdot \frac{d}{dx} \left(\prod_{i=2}^n f_i(x)\right) \end{align*}$$

That’s progress, but now we have to repeat the process to deal with the $\frac{d}{dx} \left(\prod_{i=2}^n f_i(x)\right)$ term, peeling off $f_2(x)$ and applying the product rule again:

$$\begin{align*} \frac{d}{dx} \prod_{i=1}^n f_i(x) &= f_1'(x) \cdot \prod_{i=2}^n f_i(x) + f_1(x) \cdot \frac{d}{dx} \left(\prod_{i=2}^n f_i(x)\right) \\ &= f_1'(x) \cdot \prod_{i=2}^n f_i(x) + f_1(x) \\ &\quad \cdot \left(f_2'(x) \cdot \prod_{i=3}^n f_i(x) + f_2(x) \cdot \frac{d}{dx} \left(\prod_{i=3}^n f_i(x)\right)\right) \end{align*}$$

This expression’s going to have more layers than an ogre! If we were to do the peel-and-product rule strategy $n-1$ times, we would arrive at:

$$\begin{align*} \frac{d}{dx} \prod_{i=1}^n f_i(x) &= f_1'(x) \cdot \prod_{i=2}^n f_i(x) + f_1(x) \\ &\quad \cdot \left(f_2'(x) \cdot \prod_{i=3}^n f_i(x) + f_2(x) \right. \\ &\quad \cdot \left(f_3'(x) \cdot \prod_{i=4}^n f_i(x) + f_3(x) \cdots + f_{n-2}(x) \right. \\ &\quad \left. \left. \cdot \left(f_{n-1}'(x) \cdot f_n(x) + f_n'(x) \cdot f_{n-1}(x)\right) \cdots \right) \right) \end{align*}$$

Do you see the pattern? If we distribute the $f_i(x)$ terms at the end of each of the lines above into the parentheses that immediately follow, we get:

$$\begin{align*} \frac{d}{dx} \prod_{i=1}^n f_i(x) &= f_1'(x) \cdot \prod_{i=2}^n f_i(x) + f_2'(x) \cdot f_1(x) \cdot \prod_{i=3}^n f_i(x) \\ &\quad + f_3'(x) \cdot f_1(x) \cdot f_2(x) \cdot \prod_{i=4}^n f_i(x) + \cdots \\ &\quad + f_{n-1}'(x) \cdot f_1(x) \cdots f_{n-2}(x) \cdot f_n(x) \\ &\quad + f_n'(x) \cdot f_1(x) \cdots f_{n-1}(x) \end{align*}$$

We’re adding up a bunch of terms, each of which is a derivative $f_i'(x)$ multiplied by all the other $f_j(x)$’s, where $j \neq i$. Landing this messy, notation-heavy plane, we can rewrite this long sum as

$$\frac{d}{dx} \prod_{i=1}^n f_i(x) = \sum_{i=1}^n f_i'(x) \cdot \frac{\prod_{j=1}^n f_j(x)}{f_i(x)} = \prod_{i=1}^n f_i(x) \cdot \sum_{i=1}^n \frac{f_i'(x)}{f_i(x)},$$

which is the solution we came for.
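As a sanity check on Method 1, here is a minimal Python sketch that peels factors off recursively with the product rule and compares the result to the closed-form sum. The factors $f_i(x) = x + i$ and the function names are hypothetical choices for illustration, not anything from my research problem.

```python
import math

def product(values):
    """Multiply a sequence of numbers together."""
    result = 1.0
    for v in values:
        result *= v
    return result

def derivative_by_peeling(fs, dfs, x):
    """Method 1: peel off the first factor and apply the product rule recursively."""
    if len(fs) == 1:
        return dfs[0](x)
    rest = product(f(x) for f in fs[1:])
    return dfs[0](x) * rest + fs[0](x) * derivative_by_peeling(fs[1:], dfs[1:], x)

def derivative_by_formula(fs, dfs, x):
    """Closed form: (product of f_i) * (sum of f_i' / f_i); needs every f_i(x) != 0."""
    total = product(f(x) for f in fs)
    return total * sum(df(x) / f(x) for f, df in zip(fs, dfs))

# Hypothetical factors f_i(x) = x + i for i = 1..4, so every f_i'(x) = 1.
fs = [lambda x, i=i: x + i for i in range(1, 5)]
dfs = [lambda x: 1.0 for _ in range(1, 5)]

x0 = 0.5
peeled = derivative_by_peeling(fs, dfs, x0)
closed = derivative_by_formula(fs, dfs, x0)
print(peeled, closed)  # the two values should agree up to floating-point error
```

The recursive version mirrors the nested parentheses above, while the closed form does a single pass over the factors.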

Method 2: Leveraging Logarithms

Mathematicians crave elegance, and Method 1 wasn’t that. We had to keep keen eyes out for patterns as they emerged and carefully distribute/rearrange many terms to get the solution. This second method will be much cleaner, but as many clever methods do, it begins with an unintuitive step.

Let $F(x) = \prod_{i=1}^n f_i(x)$. Taking the natural logarithm of both sides,

$$\ln(F(x)) = \ln\left(\prod_{i=1}^n f_i(x)\right) = \sum_{i=1}^n \ln(f_i(x)),$$

where the last equality follows from the fact that the logarithm of a product is equal to the sum of logarithms. The derivative of the natural logarithm is:

$$\frac{d}{dx}\ln(x) = \frac{1}{x}$$

Using the chain rule to take the derivative of both sides of the above equality,

$$\begin{align*} \frac{d}{dx}\ln(F(x)) &= \frac{d}{dx}\sum_{i=1}^n \ln(f_i(x)) \\ \frac{1}{F(x)} \cdot \frac{dF}{dx} &= \sum_{i=1}^n \frac{1}{f_i(x)} \cdot f_i'(x) \\ \frac{dF}{dx} &= F(x) \cdot \sum_{i=1}^n \frac{f_i'(x)}{f_i(x)} \end{align*}$$

If we plug $F(x) = \prod_{i=1}^n f_i(x)$ back in, out pops the solution:

$$\frac{d}{dx} \prod_{i=1}^n f_i(x) = \prod_{i=1}^n f_i(x) \cdot \sum_{i=1}^n \frac{f_i'(x)}{f_i(x)}$$
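To see the formula in action, here is a small worked example with factors I picked for convenience (an assumption for illustration only): $f_i(x) = x^i$ with $x > 0$, so $f_i'(x) = i x^{i-1}$. The product collapses to a single power of $x$, which we can differentiate directly:

$$\prod_{i=1}^n x^i = x^{n(n+1)/2} \quad\Longrightarrow\quad \frac{d}{dx} \prod_{i=1}^n x^i = \frac{n(n+1)}{2} \cdot x^{n(n+1)/2 - 1}$$

The formula gives the same answer:

$$\begin{align*} \prod_{i=1}^n f_i(x) \cdot \sum_{i=1}^n \frac{f_i'(x)}{f_i(x)} &= x^{n(n+1)/2} \cdot \sum_{i=1}^n \frac{i x^{i-1}}{x^i} \\ &= x^{n(n+1)/2} \cdot \frac{1}{x} \sum_{i=1}^n i \\ &= \frac{n(n+1)}{2} \cdot x^{n(n+1)/2 - 1} \end{align*}$$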

Concluding Thoughts

Having initially stumbled upon Method 1, I was surprised that this derivative came out so beautifully in the end. Credit where it’s due: I adapted the elegant Method 2 from this post regarding infinite products. I have some technical notes at the end of the linked PDF about when Method 2 breaks down, though I believe Method 1 does not have any of these problems.

Hopefully this was fun (and if it wasn’t, I doubt you read this far 😉). This is one of many examples in mathematics of finding multiple solutions to the same problem and learning something different each time. Next time you emerge victorious from a tedious derivation, see if you can find a clever alternative!

Pool Testing is k-ary Search for COVID-19

Thu, 09 Jul 2020 14:04:44 -0700

I’ve seen a handful of articles in the last week about pool testing and sample pooling for COVID-19. The gist of this technique is simple: to make our limited supply of test kits (reagents) stretch farther, mix several people’s samples together and then test the mixture for COVID-19. Ideally, if the test comes back negative, everyone in that “sample pool” could be declared COVID-free using just one test. Otherwise, if the mixture comes back positive, then additional tests are needed to find the individual positive samples.

I’m far from a public health and epidemiology expert, but as an enthusiastic computer scientist I can’t help but see binary search (or more generally, $k$-ary search) all over this pool testing technique. My goal in this post is to draw out this connection, illustrating why the efficiency of $k$-ary search (and of pool testing, by extension) is so attractive. I’ll also do some crude math to explore what happens to pool testing if there are too many infections in the population and how many tests it could actually save.

If you’re not interested in any of that but want to know more about pool testing, NPR has a short primer and the FDA has put out a statement on the topic. On the other hand, if you’re interested in more serious research on the topic, this footnote¹ is for you.

Finding Things Fast

In a search problem, the goal is to find a specific item $x^*$ amidst a whole bunch of possibilities $(x_1, x_2, \ldots, x_n)$. It’s like Where’s Waldo if Waldo were $x^*$ and each $x_i$ was a person on the page. Or, perhaps more practically, it’s the way you give your favorite maps app a specific address $x^*$ and it has to find that place’s data (latitude/longitude, reviews, etc.) amongst all possible places $x_i$ in its gigantic database.

A not-so-great approach to playing Where’s Waldo is to make a list of every person on the page and check them one-by-one to see if they’re Waldo. (This is, perhaps, an entertaining way to play since you get to appreciate all the fun and ridiculous details, but we’re going for efficiency here). In computer science, we call this strategy — or “algorithm” — brute force search: when all else fails, just try everything. How bad can this be? Well, in the worst case Waldo might be the very last person in our list, so we would have had to check all $n$ people before we found him. So, knowing that the least clever search strategy makes us try $n$ times before we find Waldo, can we do better?

Imagine that you have a friend who already knows where Waldo is and has agreed to help you out by telling you which half of the page he’s on (left or right). Now you only have to search half of the possibilities, or roughly $n/2$ people. Say your friend’s really helpful, and they’ll let you do this as many times as you want: they’ll keep telling you which half of the remaining section Waldo’s in until you find him. So at first you have $n$ people to look through, then after your friend helps once you have $n/2$, and after helping again you have $n/4$, and then $n/8$, and so on. In general, if your friend has helped you $i$ times, then you only have $n/(2^i)$ people to look through. Doing a little algebra, after your friend has helped you $\log_2(n)$ times, you only have $n/(2^{\log_2(n)}) = n/n = 1$ person left, and that person is Waldo. This strategy is binary search: “binary” because your friend splits the possibilities in two each time they help.

It may have been a while since you’ve brushed up on your logarithms, so let’s look at some numbers to get a sense of how fast this is. An average Where’s Waldo puzzle has roughly $n = 500$ people on it. So, if we were to use brute force search, we’d be looking through all $n = 500$ people. However, if we had our helpful friend to do binary search with us, we’d only need to ask their help $\log_2(500) \approx 9$ times before we found Waldo! Not bad, hey?
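If you'd rather see it than trust the logarithm, here is a minimal Python sketch of the helpful-friend game. It assumes the $n$ people are numbered $0$ to $n-1$ and that the friend answers "left half or right half" each time; the function name `find_waldo` is made up for this example.

```python
def find_waldo(n, waldo):
    """Binary search over positions 0..n-1 with a friend who says which half holds Waldo.

    Returns the number of times the friend had to help.
    """
    lo, hi = 0, n  # Waldo is somewhere in the half-open range [lo, hi)
    questions = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        questions += 1  # ask the friend: "is Waldo in the left half?"
        if waldo < mid:
            hi = mid
        else:
            lo = mid
    assert lo == waldo
    return questions

n = 500
# Try every possible hiding spot and record the most help we ever needed.
worst_case = max(find_waldo(n, w) for w in range(n))
print(worst_case)
```

Because $500$ isn't a power of two, some hiding spots take one fewer question than others; the worst case is what matches the rounded-up logarithm.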

The key to binary search is in throwing away whole chunks of possibilities all at once, rapidly narrowing our search field. $k$-ary search is just a generalization, where instead of splitting the possibilities in half each time, we split them into $k$ equal parts. We then keep the $n/k$-sized section containing our target and repeat, requiring a total of $\log_k(n)$ search operations to find our target (which is even faster than binary search!). Before we call it quits with Waldo and get back to pool testing, it’s worth pointing out that there are plenty of ways to make searching even faster (especially with parallel algorithms). There are even search algorithms specifically for finding Waldo, if that’s your jam.
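To get a feel for how the number of rounds shrinks as $k$ grows, here is a rough sketch that counts rounds of $k$-ary search. It assumes the parts are as equal as possible and that we always end up in the largest part (the worst case); `kary_rounds` is a hypothetical helper name.

```python
def kary_rounds(n, k):
    """Count the rounds needed to narrow n candidates to 1 by keeping one of k parts."""
    rounds = 0
    remaining = n
    while remaining > 1:
        remaining = -(-remaining // k)  # ceiling division: size of the largest part
        rounds += 1
    return rounds

for k in (2, 4, 10):
    print(k, kary_rounds(500, k))
```

One caveat worth keeping in mind: this counts rounds, not individual yes/no questions. Figuring out which of the $k$ parts holds the target can itself take several checks per round, which is exactly the trade-off pool testing has to manage.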

Pool Testing, Meet k-ary Search

Shifting our perspective on testing from personal experience (“Have I contracted COVID-19?”) to the population level (“Who in our communities is positive?”), we can frame testing as a search problem. Among all individuals $(x_1, x_2, \ldots, x_n)$, we want to find those $x_i$’s that are positive for COVID-19 so they can isolate and get timely medical care. In this setting, brute force search would be individually testing every single person, and that’s way more tests than we have. The current US strategy of only testing symptomatic cases isn’t more “efficient” than brute force, it’s just brute force search on a (hopefully) small subset of the total population.

Pool testing, on the other hand, is similar to $k$-ary search. Just like our helpful friend telling us which half of the page Waldo isn’t on, a single pool test ideally shows us a whole group of people that we can declare COVID-19 negative; in our search for positive cases, we can look elsewhere.

Consider the situation depicted above, with $128$ total individuals needing tests but only $4$ of them COVID-19 positive (shown in red). If we tested all $128$ people individually (brute force), we would need $128$ tests. But suppose we could reliably test pools of up to $16$ people at once, as shown in the first row. Only three of the $128/16 = 8$ pool tests would come back positive, allowing us to immediately rule out $5 \times 16 = 80$ negative individuals. For the three positive pools, we need to do additional tests to find the positive cases. This could be done by individually testing everyone in the positive pools² — effectively using smaller brute force — requiring $3 \times 16 = 48$ additional tests. Alternatively (assuming we had enough samples), we could do pool testing again, this time on smaller pools.³ So long as there aren’t too many positive samples (something we’ll revisit shortly), this will help us save on tests: in the picture, we only end up needing $30$ additional tests to handle the $48$ individuals in positive pools. The table below compares the number of tests used by each testing strategy in this example, along with the number of samples per person required.

Testing Strategy	Tests Used	Tests Saved	Samples per Person
Individual (brute force)	$128$	$0\%$	$1$
Pool Testing (one round)	$56$	$56\%$	$1$
Pool Testing (repeated)	$38$	$70\%$	$> 1$
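The test counts in the table can be reproduced with a short Python sketch. The positions of the four positive individuals below are hypothetical: I chose them so that they land in three different pools of $16$, with the two that share a pool falling in different halves of it. The repeated strategy here splits each positive pool in half and retests, which is one possible way to subdivide.

```python
def one_round_tests(positives, n, pool_size):
    """Pool tests plus individual retests of everyone in a positive pool."""
    pools = [range(start, start + pool_size) for start in range(0, n, pool_size)]
    positive_pools = [p for p in pools if any(i in positives for i in p)]
    return len(pools) + len(positive_pools) * pool_size

def split_tests(lo, hi, positives):
    """Tests used to resolve a known-positive pool [lo, hi) by repeated halving."""
    if hi - lo == 1:
        return 0  # a pool of one was already tested by the previous split
    mid = (lo + hi) // 2
    tests = 2  # test both halves
    for a, b in ((lo, mid), (mid, hi)):
        if any(a <= i < b for i in positives):
            tests += split_tests(a, b, positives)
    return tests

def repeated_tests(positives, n, pool_size):
    """First-round pool tests plus halving tests inside each positive pool."""
    total = n // pool_size  # assumes pool_size divides n evenly
    for start in range(0, n, pool_size):
        if any(start <= i < start + pool_size for i in positives):
            total += split_tests(start, start + pool_size, positives)
    return total

# Hypothetical positions of the 4 positive individuals among 128 people.
positives = {5, 40, 100, 110}
print(one_round_tests(positives, 128, 16))
print(repeated_tests(positives, 128, 16))
```

Moving the positives around changes the repeated count (two positives sitting side by side are cheaper to isolate than two in different halves), so the savings from repeated pooling depend on how the cases cluster.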

What Can Go Wrong?

At least a few serious things:

Added logistical complications for labs, including tracking which individuals’ samples are in which pools, added mixing steps, etc.
The possibility of larger false-negative rates (i.e., testing negative when the sample was, in fact, positive) due to samples getting diluted when they’re mixed into pools.⁴
A loss of efficiency when there are too many positive samples.

For the rest of this post, I’m going to focus on that last problem. Pool testing’s strength lies in its ability to rule out large groups of negative cases at once, as we saw in the example. But with too many positive cases, the likelihood of pools testing positive becomes much higher, requiring a greater number of subsequent tests. At what point does pool testing end up using more tests than if we just tested everyone individually?

Instead of repeated pool testing (like in our example above), consider the more practical protocol of doing only one round of pool testing and then testing all samples in positive pools individually. If we have $n$ total samples, our pool size is $p$, and $x$ pool tests come back positive, then we will use a total of $n/p + xp$ tests: $n/p$ pool and $xp$ individual. Since we’d use $n$ tests doing brute force, pool testing only saves us tests if $n/p + xp < n$, or equivalently, if: $$x < (n/p)(1 - 1/p)$$

To relate the number of positive pools $x$ to how many positive cases are in the population, let $r$ be the probability that any given sample is COVID-19 positive. The probability that a pool of size $p$ contains a positive sample is $1 - (1 - r)^p$. There are $n/p$ total pools, so the expected (average) number of positive pools is: $$E[x] = (n/p)(1 - (1 - r)^p)$$ Therefore, we save on tests using pool testing in expectation if: $$\begin{align*} (n/p)(1 - (1 - r)^p) &< (n/p)(1 - 1/p) \\ r &< 1 - p^{-1/p} \end{align*}$$

The above graph plots this efficient region. The x-axis is the pool size $p$ and the y-axis is the infection rate $r$. Any point in the blue region represents a $(p,r)$ pair for which pool testing uses fewer tests than brute force in expectation. For example, with a pool size of $p = 16$, our crude math suggests that pool testing is more efficient for infection rates up to, coincidentally, $r \approx 16\%$. This would be a terrifyingly high infection rate (at the time of this writing, the CDC reports a $\approx 1\%$ positive case rate in the US), so it seems that in realistic settings even one round of pool testing would save a lot of tests.
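The boundary of the efficient region is easy to evaluate directly. This is a minimal sketch of the threshold $r < 1 - p^{-1/p}$ derived above; the function name and the particular pool sizes are my own choices for illustration.

```python
def break_even_rate(p):
    """Largest infection rate r for which one round of pooling is expected to save tests."""
    return 1 - p ** (-1 / p)

# A few hypothetical pool sizes; the threshold is only meaningful for p >= 2.
for p in (4, 8, 16, 32):
    print(p, round(break_even_rate(p), 3))
```

Larger pools tolerate lower infection rates before they stop paying off, which matches the shape of the region described above.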

Taking Our Savings to the Bank

To recap, we’ve seen how pool testing is related to $k$-ary search, how both of these techniques leverage the ability to rule out large groups of possibilities to narrow things down quickly, and how pool testing will save us tests so long as the infection rate is low enough. The last thing to nail down is how many tests are actually saved. From our work above, we know that the expected number of tests saved using one round of pool testing is: $$\begin{align*} E[\text{# tests saved}] &= \text{# tests for brute force} - E[\text{# tests for pool testing}] \\ &= n - (n/p + E[\text{# positive pools}] \cdot p) \\ &= n - n/p - (n/p)(1 - (1-r)^p)p \\ &= n(-1/p + (1-r)^p) \end{align*}$$ The fraction of tests saved is just the number of tests saved divided by $n$, so a single round of pool testing in pools of size $p$ uses a fraction $(1-r)^p - 1/p$ fewer tests. The following plot shows this equation for different pool sizes $p$ as a function of the infection rate $r$.

This shows that larger pool sizes have the potential for saving more tests, but also degrade in their efficiency more rapidly as the infection rate $r$ grows. Eventually, if the infection rate becomes too large, all four curves become negative and pool testing ends up using more tests than brute force. This corresponds to leaving the efficient region shown in the previous graph. If you’re having a hard time interpreting these curves, the following table compares pool testing with $p = 4$ (blue) and $p = 16$ (green).

Pool Size	Tests Saved at $r = 1\%$	Tests Saved at $r = 5\%$	Infection Rate for No Savings
$p = 4$	$71\%$	$56\%$	$r = 29\%$
$p = 16$	$79\%$	$38\%$	$r = 16\%$
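The savings columns of this table come straight from the formula $(1-r)^p - 1/p$. As a rough cross-check, the sketch below also estimates the same quantity by simulating random pools. The simulation parameters (`num_pools`, `seed`) are arbitrary assumptions, and like the crude math above, it treats every sample as independently positive with probability $r$ and ignores false negatives.

```python
import random

def fraction_saved(p, r):
    """Expected fraction of tests saved by one round of pooling: (1 - r)^p - 1/p."""
    return (1 - r) ** p - 1 / p

def simulate_fraction_saved(p, r, num_pools=20000, seed=0):
    """Estimate the same quantity by sampling random pools (hypothetical parameters)."""
    rng = random.Random(seed)
    n = num_pools * p
    tests = num_pools  # one test per pool
    for _ in range(num_pools):
        if any(rng.random() < r for _ in range(p)):
            tests += p  # retest everyone in a positive pool individually
    return (n - tests) / n

for p in (4, 16):
    for r in (0.01, 0.05):
        print(p, r, round(fraction_saved(p, r), 2), round(simulate_fraction_saved(p, r), 2))
```

The simulated values should hover near the closed-form ones, with small differences from sampling noise.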

The key takeaway is that from a purely test-savings perspective, pool testing does a remarkable job for any reasonable pool size. While it is true that pool testing can end up using more tests than traditional testing if the infection rate grows too large, my crude math suggests that the infection rate would need to be quite high ($r > 10\%$) before this inefficiency would be noticeable. At today’s US infection rate of $r \approx 1\%$, we could use over $70\%$ fewer tests, stretching our limited supply of testing reagents and helping more people get tested more often. Perhaps if this technique were widely used, we would all be that much closer to safely getting on with our lives.

¹ Cherif et al. 2020 detail a recent mathematical model for pool testing and Yelin et al. 2020 characterize its dilution and false-negative rates. The references in the Yelin paper also cover deeper mathematical analysis (e.g., Dorfman 1943) and how pool testing has been used to combat infectious disease in the past (e.g., for malaria in Taylor et al. 2010).
² From my brief sifting of the literature, practical implementations of pool testing only use one round of pool tests: roughly $65\%$ of each sample is used in a pool test, and the remaining $35\%$ is saved for an individual test in case the pool comes back positive. Repeated pool testing doesn’t seem to be used in practice, presumably because it would rely on collecting additional swab samples per person.
³ I’ve depicted the subpool tests in a binary search kind of way where each positive pool is split in half and retested, but they could be broken into any number of groups.
⁴ Yelin et al. 2020 estimate that the false-negative rate for pools as large as $16$ or $32$ is roughly $10\%$ based on their experiments. I chose not to do my speculative math for dilution and false-negative rates because I don’t understand how PCR tests actually work and don’t believe in blindly oversimplifying things for the sake of nice math.

Solving the Molecube by Reduction

Mon, 06 Jan 2020 13:00:00 -0700

My sister-in-law surprised me with a Molecube for Christmas, which combines the logic of a Sudoku puzzle with the mechanics of a Rubik’s cube. Each ball on the Molecube is one of nine colors, and the goal is to reconfigure a shuffled Molecube so each of its faces has all nine colors on it.

It turns out that solving the Molecube is a wonderful exercise in what computer scientists call reduction, which involves transforming a problem we don’t know how to solve into one that we do, solving that version, and then translating the solution back into the original problem. In this post, I’ll give a reader-friendly primer on reduction, outline a reduction from the Molecube to the Rubik’s cube, and then wrap up by solving the Molecube with a standard Rubik’s cube algorithm.

If you’re interested in solving the Molecube yourself (and perhaps you’re here looking for hints), I’ve created a worksheet you can use and will point out when you should skip ahead in the post so as to avoid spoiling this delightful puzzle.

Reduction: There and Back Again

Say we have a problem $A$ that we don’t know how to solve and another problem $B$ that we do know how to solve. Said another way, we have an “algorithm” that can answer any question asked in the form of problem $B$, but we have no such algorithm for problem $A$ questions. The whole idea of reduction is to:

Take a problem $A$ question (which we don’t know how to answer) and “transform” it into a problem $B$ question (which we do know how to answer).
Use the “algorithm” for problem $B$ questions to get an answer for our transformed problem $A$ question.
Translate the algorithm’s answer back into the context of problem $A$.

That’s it. We use a known algorithm (for problem $B$) as our workhorse for a new and unknown problem (problem $A$), and out pop solutions for our unknown problem! This powerful technique underlies almost all of theoretical computer science, giving us a tool to relate difficult problems to one another (as opposed to treating every new problem as something totally unique). I recently heard a high-profile professor in Computer Science claim that reduction is one of only two truly new ideas our discipline has ever contributed to science (though she called this idea “hierarchy”, with the other idea being “abstraction”).
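The three steps can be sketched in a few lines of Python. This toy example is my own and has nothing to do with cubes yet: problem $A$ is "find the largest number in a list" and problem $B$ is "find the smallest number in a list", which we pretend is the only thing we know how to do. The function names are hypothetical.

```python
def solve_b(numbers):
    """Problem B (already solved): return the smallest number in a list."""
    return min(numbers)

def solve_a_by_reduction(numbers):
    """Problem A: find the largest number, using only the solver for problem B."""
    transformed = [-x for x in numbers]  # Step 1: turn the A question into a B question
    answer_b = solve_b(transformed)      # Step 2: run the known algorithm for B
    return -answer_b                     # Step 3: translate the answer back to A

print(solve_a_by_reduction([3, -7, 12, 5]))
```

All the cleverness lives in Steps 1 and 3; the solver for problem $B$ is used as a black box.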

A natural question to ask next would be if all problems can be tackled with reduction. Unfortunately, in practice, finding the right translation between a pair of problems (Step 1, above) can be prohibitively difficult. Reduction is easiest when the two problems seem to have an obvious relationship we can exploit, which brings us to the Molecube and the Rubik’s cube.

A Sudoku-Like Transformation

It’s difficult to overstate how much of the Molecube’s solution is given away in its advertising as “Sudoku + Rubik’s cube”. My solution will treat this equation quite literally, starting with a transformation that relies on a Sudoku puzzle. I found this transformation by asking two simple questions:

How are the Molecube and the Rubik’s cube similar?

They’re both 3×3×3 cubes, meaning they both have 6 face centers, 8 corners, and 12 edges. This totals 26 balls (on the Molecube) or blocks (on the Rubik’s cube).
Their physical mechanics (spinning, twisting, etc.) are identical.
Their goals are, in a way, also identical: from a shuffled configuration, reach a goal configuration.

How are the Molecube and the Rubik’s cube different?

Their goal configurations are different: the Molecube wants one ball of each color on each face, while the Rubik’s cube wants each face to be all the same color.
There are nine colors on the Molecube, but only six on the Rubik’s cube.
A block on the Rubik’s cube has 1–3 colors (one for each face it touches), while a ball on the Molecube is the same color on all “sides”.

The similarities hint at a solution: though the goal configurations are different, the cubes’ structures and mechanics are the same. So if I find a goal configuration for the Molecube, I can use the Rubik’s cube algorithm to handle all the tricky rearranging involved in actually getting there.

If that’s enough of a framework for you to attempt your own solution, feel free to download the worksheet I’ve created to help you visualize the Molecube as a Sudoku puzzle (with colors instead of numbers). You’ll want to stop reading here and come back once you’ve completed a goal configuration or if you’re stuck and need hints.

Speaking of hints, the best way to unlock this tricky Sudoku-like color puzzle (getting one color on each face) is to study the Molecube’s colors and structure. I asked myself the following questions (which culminated in the table at the top of the worksheet):

How does the Molecube fit nine colors on a cube with 26 balls?
How are the colors distributed over the different types of balls (centers, corners, and edges)?
Are there any patterns that appear when trying to place balls of a certain color so it appears on each face exactly once?

The answers to these questions are revealing. There are three balls of each color, with the important exception of Green, which only has two. Further, Green is the unique color that is on two corners. Red and Purple are each on three edges, and the remaining six colors (White, Black, Orange, Yellow, Light Blue, and Blue) are each on one center, one corner, and one edge. This information — after some careful thinking — reveals the patterns we need:

As in the Rubik’s cube, centers opposite one another are always opposite one another (i.e., Black is always opposite White, Orange is always opposite Yellow, and Blue is always opposite Light Blue).
The only way the two Green corners avoid being on the same face is if they’re opposite one another (e.g., upper-right-back and lower-left-front).
The only way the three Red (or Purple) edges avoid being on the same face is if a Red (or Purple) appears exactly once in each “middle band” (shown below, left). So each middle band contains exactly one Red and one Purple.
For any of the remaining colors, (e.g., Blue) there is a center of that color. This blocks the corner of that color from being in the same “layer”, so the corner must go in the layer opposite the center (shown below, right). Once the position of the corner is fixed, there is only one position the edge of that color can go.
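The Green and Red/Purple rules can be double-checked by brute force. This sketch models the cube with hypothetical coordinates (each ball at a point with entries in $\{-1, 0, 1\}$, minus the hidden core) and enumerates corner pairs and edge triples that never share a face. In this model, an edge’s “middle band” is the axis along which its coordinate is $0$.

```python
from itertools import combinations, product

# Hypothetical coordinates: each ball sits at (x, y, z) with entries in {-1, 0, 1},
# excluding the hidden core at (0, 0, 0).
balls = [b for b in product((-1, 0, 1), repeat=3) if b != (0, 0, 0)]

def kind(ball):
    """Classify a ball by how many faces it touches (its nonzero coordinates)."""
    return {1: "center", 2: "edge", 3: "corner"}[sum(c != 0 for c in ball)]

def faces(ball):
    """The faces a ball lies on, as (axis, sign) pairs."""
    return {(axis, c) for axis, c in enumerate(ball) if c != 0}

counts = {k: sum(kind(b) == k for b in balls) for k in ("center", "edge", "corner")}

corners = [b for b in balls if kind(b) == "corner"]
# Pairs of corners that never share a face (the Green rule).
green_pairs = [(a, b) for a, b in combinations(corners, 2) if not faces(a) & faces(b)]

edges = [b for b in balls if kind(b) == "edge"]
# Triples of edges with no two on a common face (the Red/Purple rule).
edge_triples = [t for t in combinations(edges, 3)
                if all(not faces(a) & faces(b) for a, b in combinations(t, 2))]

print(counts, len(green_pairs), len(edge_triples))
```

Every face-disjoint corner pair found this way is a pair of opposite corners, and every face-disjoint edge triple has one edge in each middle band, which agrees with the rules above.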

Using these rules and some trial and error, I found the solution shown below, also detailed in the worksheet solution. (An interesting aside: I don’t know if this is the only solution, but any solution works for the reduction.) To relate the Molecube solution to the Rubik’s cube solution, we simply treat each 3×3 face on the Molecube as a “color” on the Rubik’s cube. (For example, the Black ball at the top-left of the White-center face becomes the Red-White-Green block on the Rubik’s cube.) This completes the transformation step of the reduction.

Rubik-ing the Molecube

With the difficult transformation step out of the way, the rest of the reduction is easy. You can solve any shuffled Molecube just like you would a Rubik’s cube (assuming you know how to do that), but instead of aiming to have faces with all the same color, you aim to build the goal configuration we got from the transformation (above, left). To make this easier for me to visualize, I made a 3D rendering of my solution (front view on the left, back view on the right).

I was pleasantly surprised at how much Rubik’s cube muscle memory I still had from speedcubing in junior high (though if you need a refresher, I found this tutorial helpful). Interestingly, some of the steps in the Rubik’s cube “algorithm” are unnecessary for the Molecube. Remember the earlier observation that the Molecube’s balls are each a single color while the Rubik’s cube blocks can have 1–3 colors each? This means that the Molecube doesn’t care if its balls are rotated “in place”, though this is a problem for the Rubik’s cube (see the example below). So any steps in the Rubik’s cube algorithm that are meant to fix things like this can be skipped entirely.

But Does It Work?

To show that the reduction approach not only works but is also reasonably fast, here’s me solving the Molecube in just under 2 minutes.

There are a lot of Rubik’s cube-inspired puzzles these days (for example, the Ghost Cube and the Pyraminx). But solving the Molecube by reduction makes me wonder just how many of these new puzzles share a similar relationship to the original Rubik’s cube. If these relationships exist, we’d find that these new puzzles aren’t really new at all; they’re just an old puzzle we know how to solve, but with new names and nice packaging.