Doing ABC runs in parallel can actually make things slower when every ABC run requires
spawning an ABC subprocess --- especially when using popen(), which on glibc does not
use vfork(). What seems to happen is that constant fork()ing keeps making the main
process data pages copy-on-write, so the main process code that is setting up each ABC
call takes a lot of minor page-faults, slowing it down.
The solution is pretty straightforward although a little tricky to implement.
We just reuse ABC subprocesses. Instead of passing the ABC script name on the command
line, we spawn an ABC REPL and pipe a command into it to source the script. When that's
done we echo an `ABC_DONE` token instead of exiting. Yosys then puts the ABC process
onto a stack which we can pull from the next time we do an ABC run.
For one of our large designs, this is an additional 5x speedup of the primary AbcPass.
It does 5155 ABC runs, all very small; runtime of the AbcPass goes from 760s to 149s
(not very scientific benchmarking but the effect size is large).
Large circuits can run hundreds or thousands of ABCs in a single AbcPass.
For some circuits, some of those ABC runs can run for hundreds of seconds.
Running ABCs in parallel with each other and in parallel with main-thread
processing (reading and writing BLIF files, copying ABC BLIF output into
the design) can give large speedups.
The YosysHQ Verific Extensions are compiled separately using their own
stripped-down version of the Yosys headers. To maintain ABI
compatibility with older extension builds post C++-ification of Yosys's
logging APIs, which are backwards compatible on the API but not ABI
level, this commit adds ABI compatible versions of a subset of the old
logging API used by the extensions.
The ID(OVERFLOW) IdString isn't used widely enough that we require a
statically allocated IdString, but I think it's good to have an example
workaround in place in case more collisions come up.
I've used this shell command to obtain the list:
rg -I -t cpp -t yacc -o \
'ID\((\$?[a-zA-Z0-9_]+)\)|ID::($?[a-zA-Z0-9_]+)' -r 'X($1$2)' \
| LC_ALL=C sort -u
This removed the entries X(_TECHMAP_FAIL_) and X(nomem2init).
The vast majority of ID(...) uses are in a context that is overloaded
for StaticIdString or will cause implicit conversion to an IdString
constant reference. For some sufficently overloaded contexts, implicit
conversion may fail, so it's useful to have a method to force obtaining
a `IdString const &` from an ID(...) use.
When turning all literal IdStrings of the codebase into StaticIdStrings
this was needed in exactly one place, for which this commit adds an
`id_string()` call.
The simple XOR `commutative_eat()` implementation produces a lot of collisions.
https://www.preprints.org/manuscript/201710.0192/v1/download is a useful reference on this topic.
Running the included `hashTest.cc` without the hashlib changes, I get 49,580,349 collisions.
The 49,995,000 (i,j) pairs (0 <= i < 10000, i < j < 10000) hash into only 414,651 unique hash values.
We get simple collisions like (0,1) colliding with (2,3).
With the hashlib changes, we get only 707,099 collisions and 49,287,901 unique hash values.
Much better! The `commutative_hash` implementation corresponds to `Sum(4)` in the paper
mentioned above.