Turns out this is not strictly necessary for this PR but it's
still a good thing to do and makes it clearer that the stats
are not modified in a possibly racy way.
We will want to query `keep_cache` from parallel threads. If we compute
the results on-demand, that means we need synchronization for cache
access in those queries, which adds complexity and overhead. Instead, prefill
the cache with the status of all relevant modules. Note that this doesn't
actually do more work --- we always consult `keep_cache` for all cells of
all selected modules, so scanning all those cells and determining the kept
status of all dependency modules is always required.
Later in this PR we're going to parallelize `scan_module` itself, and that's also
much easier to do when no other parallel threads are running.
These updates should not be necessary. In fact, if they were necessary, this code
would be buggy, because the results would depend on the order in which wires are traversed:
If wire A is retained, which causes an update to `used_signals`, which then causes wire B
to be retained when it otherwise wouldn't be, then we would get different results depending
on whether A is visited before B.
These updates will also make it difficult to process these wires in parallel.
The negation here is confusing. The intent of the code is "if `s1` is preferred
over `s2` as the canonical `SigBit` for this signal, make `s1` the canonical `SigBit`
in `assign_map`", so write the code that way instead of "if `s2` is not preferred
over `s1` ...".
This doesn't change any behavior now that `compare_signals()` is a total order,
i.e. `s1` is preferred over `s2`, `s2` is preferred over `s1`, or `s1` and `s2` are equal.
Now, when `s1` and `s2` are equal, we don't call `assign_map.add(s1)`, but that's
already a noop in that case.
Currently when `s1` and `s2` are different bits of the same wire,
it is possible for both `compare_signals(s1, s2)` and `compare_signals(s2, s1)` to
return false. This means the calling code will call `assign_map.add()` for
both `s1` and `s2`, which doesn't make much sense --- one of `s1` or `s2`
should be consistently preferred.
So fix that by preferring the `SigBit` with the smaller bit offset.