For consistency.
Also trying a new thing: only rebuilding objects that use the pybind11 library. The idea is these are the only objects that include the Python/pybind headers and thus the only ones that depend on the Python ABI in any capacity, so other objects can be reused across wheel builds. This has the potential to cut down build times.
- Rewrite all Python features to use the pybind11 library instead of boost::python.
Unlike boost::python, pybind11 is a header-only library that is just included by Pyosys code, saving a lot of compile time on wheels.
- Factor out as much "translation" code from the generator into proper C++ files
- Fix running the embedded interpreter not supporting "from pyosys import libyosys as ys" like wheels
- Move Python-related elements to `pyosys` directory at the root of the repo
- Slight shift in bridging semantics:
- Containers are declared as "opaque types" and are passed by reference to Python - many methods have been implemented to make them feel right at home without the overhead/ambiguity of copying to Python and then copying back after mutation
- Monitor/Pass use "trampoline" pattern to support virual methods overridable in Python: virtual methods no longer require `py_` prefix
- Create really short test set for pyosys that just exercises basic functionality
The simple XOR `commutative_eat()` implementation produces a lot of collisions.
https://www.preprints.org/manuscript/201710.0192/v1/download is a useful reference on this topic.
Running the included `hashTest.cc` without the hashlib changes, I get 49,580,349 collisions.
The 49,995,000 (i,j) pairs (0 <= i < 10000, i < j < 10000) hash into only 414,651 unique hash values.
We get simple collisions like (0,1) colliding with (2,3).
With the hashlib changes, we get only 707,099 collisions and 49,287,901 unique hash values.
Much better! The `commutative_hash` implementation corresponds to `Sum(4)` in the paper
mentioned above.
Checking only happens at compile time if -std=c++20 (or greater) is enabled. Otherwise
the checking happens at run time.
This requires the format string to be a compile-time constant (when compiling with
C++20), so fix a few places where that isn't true.
The format string behavior is a bit more lenient than C printf. For %d/%u
you can pass any integer type and it will be converted and output without
truncating bits, i.e. any length specifier is ignored and the conversion is
always treated as 'll'. Any truncation needs to be done by casting the argument itself.
For %f/%g you can pass anything that converts to double, including integers.
Performance results with clang 19 -O3 on Linux:
```
hyperfine './yosys -dp "read_rtlil /usr/local/google/home/rocallahan/Downloads/jpeg.synth.il; dump"'
```
C++17 before: Time (mean ± σ): 101.3 ms ± 0.8 ms [User: 85.6 ms, System: 15.6 ms]
C++17 after: Time (mean ± σ): 98.4 ms ± 1.2 ms [User: 82.1 ms, System: 16.1 ms]
C++20 before: Time (mean ± σ): 100.9 ms ± 1.1 ms [User: 87.0 ms, System: 13.8 ms]
C++20 after: Time (mean ± σ): 97.8 ms ± 1.4 ms [User: 83.1 ms, System: 14.7 ms]
The generated code is reasonably efficient. E.g. with clang 19, `stringf()` with a format
with no %% escapes and no other parameters (a weirdly common case) often compiles to a fully
inlined `std::string` construction. In general the format string parsing is often (not always)
compiled away.
The lexer for liberty files was using istream's `get` and `unget` which
are notorious for bad performance and that showed up during profiling.
This replaces the direct `istream` use with a custom LibertyInputStream
that does its own buffering to provide `get` and `unget` that behave the
same way but are implemented with a fast path that is easy to inline and
optimize.
In commit ac988cf we made sure to undefine the CONST/VOID macros left
defined by `tcl.h`, but this in turn makes it an issue to include
additional Tcl headers later on (see issue #4808).
One way out is to avoid a global `tcl.h` include. In the process we drop
support for Tcl-enabled MXE builds, which were likely broken anyway due
to the additional Tcl APIs used from `tclapi.cc`.
This adjusts the way the headers kernel/{yosys,rtlil,register,log}.h
include each other to avoid the need of including headers outside of
include guards as well as avoiding the inclusion of rtlil.h in the
middle of yosys.h with rtlil.h depending on the prefix of yosys.h, and
the suffix of yosys.h depending on rtlil.h.
To do this I moved some of the declaration in yosys.h into a new header
yosys_common.h. I'm not sure if that is strictly necessary.
Including any of these files still results in the declarations of all
these headers being included, so this shouldn't be a breaking change for
any passes or external plugins.
My main motivation for this is that ccls's (clang based language server)
include guard handling gets confused by the previous way the includes
were done. It often ends up treating the include guard as a generic
disabled preprocessor conditional, breaking navigation and highlighting
for the core RTLIL data structures.
Additionally I think avoiding cyclic includes in the middle of header
files that depend on includes being outside of include guards will also
be less confusing for developers reading the code, not only for tools
like ccls.