mirror of
				https://github.com/YosysHQ/yosys
				synced 2025-11-04 05:19:11 +00:00 
			
		
		
		
	There are some leftovers, but this is an easy regex-based approach that removes most of them.
		
			
				
	
	
		
			166 lines
		
	
	
	
		
			7.5 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			166 lines
		
	
	
	
		
			7.5 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
Hashing and associative data structures in Yosys
 | 
						|
------------------------------------------------
 | 
						|
 | 
						|
Container classes based on hashing
 | 
						|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 | 
						|
 | 
						|
Yosys uses ``dict<K, T>`` and ``pool<T>`` as main container classes.
 | 
						|
``dict<K, T>`` is essentially a replacement for ``std::unordered_map<K, T>``
 | 
						|
and ``pool<T>`` is a replacement for ``std::unordered_set<T>``.
 | 
						|
The main characteristics are:
 | 
						|
 | 
						|
* ``dict<K, T>`` and ``pool<T>`` are about 2x faster than the std containers
 | 
						|
   (though this claim hasn't been verified for over 10 years)
 | 
						|
 | 
						|
* references to elements in a ``dict<K, T>`` or ``pool<T>`` are invalidated by
 | 
						|
   insert and remove operations (similar to ``std::vector<T>`` on ``push_back()``).
 | 
						|
 | 
						|
* some iterators are invalidated by ``erase()``. specifically, iterators
 | 
						|
   that have not passed the erased element yet are invalidated. (``erase()``
 | 
						|
   itself returns valid iterator to the next element.)
 | 
						|
 | 
						|
* no iterators are invalidated by ``insert()``. elements are inserted at
 | 
						|
   ``begin()``. i.e. only a new iterator that starts at ``begin()`` will see the
 | 
						|
   inserted elements.
 | 
						|
 | 
						|
* the method ``.count(key, iterator)`` is like ``.count(key)`` but only
 | 
						|
   considers elements that can be reached via the iterator.
 | 
						|
 | 
						|
* iterators can be compared. ``it1 < it2`` means that the position of ``t2``
 | 
						|
   can be reached via ``t1`` but not vice versa.
 | 
						|
 | 
						|
* the method ``.sort()`` can be used to sort the elements in the container
 | 
						|
   the container stays sorted until elements are added or removed.
 | 
						|
 | 
						|
* ``dict<K, T>`` and ``pool<T>`` will have the same order of iteration across
 | 
						|
   all compilers, standard libraries and architectures.
 | 
						|
 | 
						|
In addition to ``dict<K, T>`` and ``pool<T>`` there is also an ``idict<K>`` that
 | 
						|
creates a bijective map from ``K`` to the integers. For example:
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
   idict<string, 42> si;
 | 
						|
   log("%d\n", si("hello"));      // will print 42
 | 
						|
   log("%d\n", si("world"));      // will print 43
 | 
						|
   log("%d\n", si.at("world"));   // will print 43
 | 
						|
   log("%d\n", si.at("dummy"));   // will throw exception
 | 
						|
   log("%s\n", si[42]));  // will print hello
 | 
						|
   log("%s\n", si[43]));  // will print world
 | 
						|
   log("%s\n", si[44]));  // will throw exception
 | 
						|
 | 
						|
It is not possible to remove elements from an idict.
 | 
						|
 | 
						|
Finally ``mfp<K>`` implements a merge-find set data structure (aka. disjoint-set
 | 
						|
or union-find) over the type ``K`` ("mfp" = merge-find-promote).
 | 
						|
 | 
						|
The hash function
 | 
						|
~~~~~~~~~~~~~~~~~
 | 
						|
 | 
						|
The hash function generally used in Yosys is the XOR version of DJB2:
 | 
						|
 | 
						|
::
 | 
						|
 | 
						|
   state = ((state << 5) + state) ^ value
 | 
						|
 | 
						|
This is an old-school hash designed to hash ASCII characters. Yosys doesn't hash
 | 
						|
a lot of ASCII text, but it still happens to be a local optimum due to factors
 | 
						|
described later.
 | 
						|
 | 
						|
Hash function quality is multi-faceted and highly dependent on what is being
 | 
						|
hashed. Yosys isn't concerned by any cryptographic qualities, instead the goal
 | 
						|
is minimizing total hashing collision risk given the data patterns within Yosys.
 | 
						|
In general, a good hash function typically folds values into a state accumulator
 | 
						|
with a mathematical function that is fast to compute and has some beneficial
 | 
						|
properties. One of these is the avalanche property, which demands that a small
 | 
						|
change such as flipping a bit or incrementing by one in the input produces a
 | 
						|
large, unpredictable change in the output. Additionally, the bit independence
 | 
						|
criterion states that any pair of output bits should change independently when
 | 
						|
any single input bit is inverted. These properties are important for avoiding
 | 
						|
hash collision on data patterns like the hash of a sequence not colliding with
 | 
						|
its permutation, not losing from the state the information added by hashing
 | 
						|
preceding elements, etc.
 | 
						|
 | 
						|
DJB2 lacks these properties. Instead, since Yosys hashes large numbers of data
 | 
						|
structures composed of incrementing integer IDs, Yosys abuses the predictability
 | 
						|
of DJB2 to get lower hash collisions, with regular nature of the hashes
 | 
						|
surviving through the interaction with the "modulo prime" operations in the
 | 
						|
associative data structures. For example, some most common objects in Yosys are
 | 
						|
interned ``IdString``\ s of incrementing indices or ``SigBit``\ s with bit
 | 
						|
offsets into wire (represented by its unique ``IdString`` name) as the typical
 | 
						|
case. This is what makes DJB2 a local optimum. Additionally, the ADD version of
 | 
						|
DJB2 (like above but with addition instead of XOR) is used to this end for some
 | 
						|
types, abandoning the general pattern of folding values into a state value.
 | 
						|
 | 
						|
Making a type hashable
 | 
						|
~~~~~~~~~~~~~~~~~~~~~~
 | 
						|
 | 
						|
Let's first take a look at the external interface on a simplified level.
 | 
						|
Generally, to get the hash for ``T obj``, you would call the utility function
 | 
						|
``run_hash<T>(const T& obj)``, corresponding to ``hash_ops<T>::hash(obj)``,
 | 
						|
the default implementation of which uses ``hash_ops<T>::hash_into(Hasher(), obj)``.
 | 
						|
``Hasher`` is the class actually implementing the hash function, hiding its
 | 
						|
initialized internal state, and passing it out on ``hash_t yield()`` with
 | 
						|
perhaps some finalization steps.
 | 
						|
 | 
						|
``hash_ops<T>`` is the star of the show. By default it pulls the ``Hasher h``
 | 
						|
through a ``Hasher T::hash_into(Hasher h)`` method. That's the method you have to
 | 
						|
implement to make a record (class or struct) type easily hashable with Yosys
 | 
						|
hashlib associative data structures.
 | 
						|
 | 
						|
``hash_ops<T>`` is specialized for built-in types like ``int`` or ``bool`` and
 | 
						|
treats pointers the same as integers, so it doesn't dereference pointers. Since
 | 
						|
many RTLIL data structures like ``RTLIL::Wire`` carry their own unique index
 | 
						|
``Hasher::hash_t hashidx_;``, there are specializations for ``hash_ops<Wire*>``
 | 
						|
and others in ``kernel/hashlib.h`` that actually dereference the pointers and
 | 
						|
call ``hash_into`` on the instances pointed to.
 | 
						|
 | 
						|
``hash_ops<T>`` is also specialized for simple compound types like
 | 
						|
``std::pair<U>`` by calling hash_into in sequence on its members. For flexible
 | 
						|
size containers like ``std::vector<U>`` the size of the container is hashed
 | 
						|
first. That is also how implementing hashing for a custom record data type
 | 
						|
should be - unless there is strong reason to do otherwise, call ``h.eat(m)`` on
 | 
						|
the ``Hasher h`` you have received for each member in sequence and ``return
 | 
						|
h;``.
 | 
						|
 | 
						|
The ``hash_ops<T>::hash(obj)`` method is not indended to be called when
 | 
						|
context of implementing the hashing for a record or other compound type.
 | 
						|
When writing it, you should connect it to ``hash_ops<T>::hash_into(Hasher h)``
 | 
						|
as shown below. If you have a strong reason to do so, and you have
 | 
						|
to create a special implementation for top-level hashing, look at how
 | 
						|
``hash_ops<RTLIL::SigBit>::hash(...)`` is implemented in ``kernel/rtlil.h``.
 | 
						|
 | 
						|
Porting plugins from the legacy interface
 | 
						|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 | 
						|
 | 
						|
Previously, the interface to implement hashing on custom types was just
 | 
						|
``unsigned int T::hash() const``. This meant hashes for members were computed
 | 
						|
independently and then ad-hoc combined with the hash function with some xorshift
 | 
						|
operations thrown in to mix bits together somewhat. A plugin can stay compatible
 | 
						|
with both versions prior and after the break by implementing both interfaces
 | 
						|
based on the existance and value of ``YS_HASHING_VERSION``.
 | 
						|
 | 
						|
.. code-block:: cpp
 | 
						|
   :caption: Example hash compatibility wrapper
 | 
						|
   :name: hash_plugin_compat
 | 
						|
 | 
						|
   #ifndef YS_HASHING_VERSION
 | 
						|
   unsigned int T::hash() const {
 | 
						|
      return mkhash(a, b);
 | 
						|
   }
 | 
						|
   #elif YS_HASHING_VERSION == 1
 | 
						|
   Hasher T::hash_into(Hasher h) const {
 | 
						|
      h.eat(a);
 | 
						|
      h.eat(b);
 | 
						|
      return h;
 | 
						|
   }
 | 
						|
   Hasher T::hash() const {
 | 
						|
      Hasher h;
 | 
						|
      h.eat(*this);
 | 
						|
      return h;
 | 
						|
   }
 | 
						|
   #else
 | 
						|
   #error "Unsupported hashing interface"
 | 
						|
   #endif
 | 
						|
 | 
						|
Feel free to contact Yosys maintainers with related issues.
 |