mirror of
https://github.com/YosysHQ/yosys
synced 2025-06-06 06:03:23 +00:00
Reorganising documentation
Also changing to furo theme.
This commit is contained in:
parent
4f1cd66829
commit
045c04096e
40 changed files with 661 additions and 1282 deletions
9
docs/source/using_yosys/index.rst
Normal file
|
@ -0,0 +1,9 @@
|
|||
Using Yosys (advanced)
|
||||
======================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
more_scripting
|
||||
memory_mapping
|
||||
yosys_flows
|
654
docs/source/using_yosys/memory_mapping.rst
Normal file
|
@ -0,0 +1,654 @@
|
|||
.. _chapter:memorymap:
|
||||
|
||||
Memory mapping
|
||||
==============
|
||||
|
||||
Documentation for the Yosys ``memory_libmap`` memory mapper. Not all supported patterns are
included in this document. In particular, combinations of multiple patterns should generally
work. For example, `Write port with byte enables`_ could be used in conjunction with any of the
simple dual port (SDP) models. In general, if a hardware memory definition does not support a
given configuration, additional logic will be instantiated to guarantee that behaviour is
consistent with simulation.
|
||||
|
||||
See also: `passes/memory/memlib.md <https://github.com/YosysHQ/yosys/blob/master/passes/memory/memlib.md>`_
|
||||
|
||||
Additional notes
|
||||
----------------
|
||||
|
||||
Memory kind selection
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The memory inference code will automatically pick the target memory primitive based on the memory
geometry and the features used. Depending on the target, there can be up to four memory primitive
classes available for selection:
|
||||
|
||||
- FF RAM (aka logic): no hardware primitive used, memory lowered to a bunch of FFs and multiplexers
|
||||
|
||||
- Can handle arbitrary number of write ports, as long as all write ports are in the same clock domain
|
||||
- Can handle arbitrary number and kind of read ports
|
||||
|
||||
- LUT RAM (aka distributed RAM): uses LUT storage as RAM
|
||||
|
||||
- Supported on most FPGAs (with notable exception of ice40)
|
||||
- Usually has one synchronous write port, one or more asynchronous read ports
|
||||
- Small
|
||||
- Will never be used for ROMs (lowering to plain LUTs is always better)
|
||||
|
||||
- Block RAM: dedicated memory tiles
|
||||
|
||||
- Supported on basically all FPGAs
|
||||
- Supports only synchronous reads
|
||||
- Two ports with separate clocks
|
||||
- Usually supports true dual port (with notable exception of ice40 that only supports SDP)
|
||||
- Usually supports asymmetric memories and per-byte write enables
|
||||
- Several kilobits in size
|
||||
|
||||
- Huge RAM:
|
||||
|
||||
- Only supported on several targets:
|
||||
|
||||
- Some Xilinx UltraScale devices (UltraRAM)
|
||||
|
||||
- Two ports, both with mutually exclusive synchronous read and write
|
||||
- Single clock
|
||||
- Initial data must be all-0
|
||||
|
||||
- Some ice40 devices (SPRAM)
|
||||
|
||||
- Single port with mutually exclusive synchronous read and write
|
||||
- Does not support initial data
|
||||
|
||||
- Nexus (large RAM)
|
||||
|
||||
- Two ports, both with mutually exclusive synchronous read and write
|
||||
- Single clock
|
||||
|
||||
- Will not be automatically selected by memory inference code, needs explicit opt-in via
|
||||
ram_style attribute
|
||||
|
||||
In general, you can expect the automatic selection process to work roughly like this:
|
||||
|
||||
- If any read port is asynchronous, only LUT RAM (or FF RAM) can be used.
|
||||
- If there is more than one write port, only block RAM can be used, and this needs to be a
|
||||
hardware-supported true dual port pattern
|
||||
|
||||
- … unless all write ports are in the same clock domain, in which case FF RAM can also be used,
|
||||
but this is generally not what you want for anything but really small memories
|
||||
|
||||
- Otherwise, either FF RAM, LUT RAM, or block RAM will be used, depending on memory size
|
||||
|
||||
This process can be overridden by attaching a ram_style attribute to the memory:
|
||||
|
||||
- ``(* ram_style = "logic" *)`` selects FF RAM
- ``(* ram_style = "distributed" *)`` selects LUT RAM
- ``(* ram_style = "block" *)`` selects block RAM
- ``(* ram_style = "huge" *)`` selects huge RAM
|
||||
|
||||
It is an error if this override cannot be realized for the given target.
|
||||
|
||||
Many alternate spellings of the attribute are also accepted, for compatibility with other software.
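
As a minimal sketch of such an override, the module below requests block RAM for a memory that is
small enough that automatic selection would likely pick something else. The module name, port
names and parameter defaults are illustrative assumptions; only the ``ram_style`` attribute and
its ``"block"`` value come from the list above.

.. code:: verilog

   module ram_style_example #(
       parameter ADDR_WIDTH = 4,
       parameter DATA_WIDTH = 8
   ) (
       input  wire                  clk,
       input  wire                  write_enable,
       input  wire                  read_enable,
       input  wire [ADDR_WIDTH-1:0] write_addr,
       input  wire [ADDR_WIDTH-1:0] read_addr,
       input  wire [DATA_WIDTH-1:0] write_data,
       output reg  [DATA_WIDTH-1:0] read_data
   );
       // The attribute is attached to the memory declaration itself.
       (* ram_style = "block" *)
       reg [DATA_WIDTH-1:0] mem [2**ADDR_WIDTH-1:0];

       // Synchronous read, so the pattern is compatible with block RAM.
       always @(posedge clk) begin
           if (write_enable)
               mem[write_addr] <= write_data;
           if (read_enable)
               read_data <= mem[read_addr];
       end
   endmodule

If the read port were asynchronous instead, this override could not be realized as block RAM and
would be reported as an error.
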
|
||||
|
||||
Initial data
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Most FPGA targets support initializing all kinds of memory to user-provided values. If explicit
initialization is not used, the initial memory value is undefined. Initial data can be provided
either by ``initial`` statements writing memory cells one by one, or by the ``$readmemh`` or
``$readmemb`` system tasks. For an example pattern, see `Synchronous read port with initial value`_.
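
The two ways of providing initial data can be sketched as follows. This is an illustration under
assumptions: the module name, the memory names and the file name ``init.hex`` are hypothetical, and
the file is assumed to hold sixteen 8-bit hexadecimal values.

.. code:: verilog

   module mem_init_example (
       input  wire       clk,
       input  wire [3:0] addr,
       output reg  [7:0] data_loop,
       output reg  [7:0] data_file
   );
       reg [7:0] mem_loop [0:15];
       reg [7:0] mem_file [0:15];
       integer i;

       // Option 1: write the memory cells one by one in an initial block.
       initial begin
           for (i = 0; i < 16; i = i + 1)
               mem_loop[i] = i * 3;
       end

       // Option 2: load the contents from a file (hypothetical file name).
       initial $readmemh("init.hex", mem_file);

       // Synchronous read ports for both memories.
       always @(posedge clk) begin
           data_loop <= mem_loop[addr];
           data_file <= mem_file[addr];
       end
   endmodule

Neither memory has a write port here, so both behave as ROMs.
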
|
||||
|
||||
Write port with byte enables
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Byte enables can be used with any supported pattern
|
||||
- To ensure that multiple writes will be merged into one port, they need to have disjoint bit
|
||||
ranges, have the same address, and the same clock
|
||||
- Any write enable granularity will be accepted (down to per-bit write enables), but using smaller
|
||||
granularity than natively supported by the target is very likely to be inefficient (eg. using
|
||||
4-bit bytes on ECP5 will result in either padding the bytes with 5 dummy bits to native 9-bit
|
||||
units or splitting the RAM into two block RAMs)
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [31 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable[0])
|
||||
mem[write_addr][7:0] <= write_data[7:0];
|
||||
if (write_enable[1])
|
||||
mem[write_addr][15:8] <= write_data[15:8];
|
||||
if (write_enable[2])
|
||||
mem[write_addr][23:16] <= write_data[23:16];
|
||||
if (write_enable[3])
|
||||
mem[write_addr][31:24] <= write_data[31:24];
|
||||
if (read_enable)
|
||||
read_data <= mem[read_addr];
|
||||
end
|
||||
|
||||
Simple dual port (SDP) memory patterns
|
||||
--------------------------------------
|
||||
|
||||
Asynchronous-read SDP
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- This will result in LUT RAM on supported targets
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
always @(posedge clk)
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
assign read_data = mem[read_addr];
|
||||
|
||||
Synchronous SDP with clock domain crossing
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Will result in block RAM or LUT RAM depending on size
|
||||
- No behavior guarantees in case of simultaneous read and write to the same address
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge write_clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
end
|
||||
|
||||
always @(posedge read_clk) begin
|
||||
if (read_enable)
|
||||
read_data <= mem[read_addr];
|
||||
end
|
||||
|
||||
Synchronous SDP read first
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- The read and write parts can be in the same or different processes.
|
||||
- Will result in block RAM or LUT RAM depending on size
|
||||
- As long as the same clock is used for both, yosys will ensure read-first behavior. This may
|
||||
require extra circuitry on some targets for block RAM. If this is not necessary, use one of the
|
||||
patterns below.
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
if (read_enable)
|
||||
read_data <= mem[read_addr];
|
||||
end
|
||||
|
||||
Synchronous SDP with undefined collision behavior
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Like above, but the read value is undefined when read and write ports target the same address in
|
||||
the same cycle
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
|
||||
if (read_enable) begin
|
||||
read_data <= mem[read_addr];
|
||||
|
||||
// 👇 this if block 👇
|
||||
if (write_enable && read_addr == write_addr)
|
||||
read_data <= 'x;
|
||||
end
|
||||
end
|
||||
|
||||
- Or below, using the no_rw_check attribute
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
(* no_rw_check *)
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
|
||||
if (read_enable)
|
||||
read_data <= mem[read_addr];
|
||||
end
|
||||
|
||||
Synchronous SDP with write-first behavior
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Will result in block RAM or LUT RAM depending on size
|
||||
- May use additional circuitry for block RAM if write-first is not natively supported. Will always
|
||||
use additional circuitry for LUT RAM.
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
|
||||
if (read_enable) begin
|
||||
read_data <= mem[read_addr];
|
||||
if (write_enable && read_addr == write_addr)
|
||||
read_data <= write_data;
|
||||
end
|
||||
end
|
||||
|
||||
Synchronous SDP with write-first behavior (alternate pattern)
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- This pattern is supported for compatibility, but is much less flexible than the above
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
read_addr_reg <= read_addr;
|
||||
end
|
||||
|
||||
assign read_data = mem[read_addr_reg];
|
||||
|
||||
Single-port RAM memory patterns
|
||||
-------------------------------
|
||||
|
||||
Asynchronous-read single-port RAM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Will result in single-port LUT RAM on supported targets
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
always @(posedge clk)
|
||||
if (write_enable)
|
||||
mem[addr] <= write_data;
|
||||
assign read_data = mem[addr];
|
||||
|
||||
Synchronous single-port RAM with mutually exclusive read/write
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Will result in single-port block RAM or LUT RAM depending on size
|
||||
- This is the correct pattern to infer ice40 SPRAM (with manual ram_style selection)
|
||||
- On targets that don't support read/write block RAM ports (eg. ice40), will result in SDP block RAM instead
|
||||
- For block RAM, will use "NO_CHANGE" mode if available
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[addr] <= write_data;
|
||||
else if (read_enable)
|
||||
read_data <= mem[addr];
|
||||
end
|
||||
|
||||
Synchronous single-port RAM with read-first behavior
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Will only result in single-port block RAM when read-first behavior is natively supported;
|
||||
otherwise, SDP RAM with additional circuitry will be used
|
||||
- Many targets (Xilinx, ECP5, …) can only natively support read-first/write-first single-port RAM
|
||||
(or TDP RAM) where the write_enable signal implies the read_enable signal (ie. can never write
|
||||
without reading). The memory inference code will run a simple SAT solver on the control signals to
|
||||
determine if this is the case, and insert emulation circuitry if it cannot be easily proven.
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[addr] <= write_data;
|
||||
if (read_enable)
|
||||
read_data <= mem[addr];
|
||||
end
|
||||
|
||||
Synchronous single-port RAM with write-first behavior
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Will result in single-port block RAM or LUT RAM when supported
|
||||
- Block RAMs will require extra circuitry if write-first behavior not natively supported
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[addr] <= write_data;
|
||||
if (read_enable)
|
||||
if (write_enable)
|
||||
read_data <= write_data;
|
||||
else
|
||||
read_data <= mem[addr];
|
||||
end
|
||||
|
||||
Synchronous read port with initial value
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Initial read port values can be combined with any other supported pattern
|
||||
- If block RAM is used and initial read port values are not natively supported by the target, a
  small emulation circuit will be inserted
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
reg [DATA_WIDTH - 1 : 0] read_data;
|
||||
initial read_data = 'h1234;
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
if (read_enable)
|
||||
read_data <= mem[read_addr];
|
||||
end
|
||||
|
||||
Read register reset patterns
|
||||
----------------------------
|
||||
|
||||
Resets can be combined with any other supported pattern (except that synchronous reset and
asynchronous reset cannot both be used on a single read port). If block RAM is used and the
selected reset (synchronous or asynchronous) is not natively supported by the target, a small
emulation circuit will be inserted.
|
||||
|
||||
Synchronous reset, reset priority over enable
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
|
||||
if (read_reset)
|
||||
read_data <= 'h1234;
|
||||
else if (read_enable)
|
||||
read_data <= mem[read_addr];
|
||||
end
|
||||
|
||||
Synchronous reset, enable priority over reset
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
if (read_enable)
|
||||
if (read_reset)
|
||||
read_data <= 'h1234;
|
||||
else
|
||||
read_data <= mem[read_addr];
|
||||
end
|
||||
|
||||
Synchronous read port with asynchronous reset
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
end
|
||||
|
||||
always @(posedge clk, posedge reset_read) begin
|
||||
if (reset_read)
|
||||
read_data <= 'h1234;
|
||||
else if (read_enable)
|
||||
read_data <= mem[read_addr];
|
||||
end
|
||||
|
||||
Asymmetric memory patterns
|
||||
--------------------------
|
||||
|
||||
To construct an asymmetric memory (memory with read/write ports of differing widths):
|
||||
|
||||
- Declare the memory with the width of the narrowest intended port
|
||||
- Split all wide ports into multiple narrow ports
|
||||
- To ensure the wide ports will be correctly merged:
|
||||
|
||||
- For the address, use a concatenation of actual address in the high bits and a constant in the
|
||||
low bits
|
||||
- Ensure the actual address is identical for all ports belonging to the wide port
|
||||
- Ensure that clock is identical
|
||||
- For read ports, ensure that enable/reset signals are identical (for write ports, the enable
|
||||
signal may vary — this will result in using the byte enable functionality)
|
||||
|
||||
Asymmetric memory is supported on all targets, but may require emulation circuitry where not
|
||||
natively supported. Note that when the memory is larger than the underlying block RAM primitive,
|
||||
hardware asymmetric memory support is likely not to be used even if present as it is more expensive.
|
||||
|
||||
Wide synchronous read port
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [7:0] mem [0:255];
|
||||
wire [7:0] write_addr;
|
||||
wire [5:0] read_addr;
|
||||
wire [7:0] write_data;
|
||||
reg [31:0] read_data;
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
if (read_enable) begin
|
||||
read_data[7:0] <= mem[{read_addr, 2'b00}];
|
||||
read_data[15:8] <= mem[{read_addr, 2'b01}];
|
||||
read_data[23:16] <= mem[{read_addr, 2'b10}];
|
||||
read_data[31:24] <= mem[{read_addr, 2'b11}];
|
||||
end
|
||||
end
|
||||
|
||||
Wide asynchronous read port
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Note: the only target natively supporting this pattern is Xilinx UltraScale
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [7:0] mem [0:511];
|
||||
wire [8:0] write_addr;
|
||||
wire [5:0] read_addr;
|
||||
wire [7:0] write_data;
|
||||
wire [63:0] read_data;
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
end
|
||||
|
||||
assign read_data[7:0] = mem[{read_addr, 3'b000}];
|
||||
assign read_data[15:8] = mem[{read_addr, 3'b001}];
|
||||
assign read_data[23:16] = mem[{read_addr, 3'b010}];
|
||||
assign read_data[31:24] = mem[{read_addr, 3'b011}];
|
||||
assign read_data[39:32] = mem[{read_addr, 3'b100}];
|
||||
assign read_data[47:40] = mem[{read_addr, 3'b101}];
|
||||
assign read_data[55:48] = mem[{read_addr, 3'b110}];
|
||||
assign read_data[63:56] = mem[{read_addr, 3'b111}];
|
||||
|
||||
Wide write port
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [7:0] mem [0:255];
|
||||
wire [5:0] write_addr;
|
||||
wire [7:0] read_addr;
|
||||
wire [31:0] write_data;
|
||||
reg [7:0] read_data;
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable[0])
|
||||
mem[{write_addr, 2'b00}] <= write_data[7:0];
|
||||
if (write_enable[1])
|
||||
mem[{write_addr, 2'b01}] <= write_data[15:8];
|
||||
if (write_enable[2])
|
||||
mem[{write_addr, 2'b10}] <= write_data[23:16];
|
||||
if (write_enable[3])
|
||||
mem[{write_addr, 2'b11}] <= write_data[31:24];
|
||||
if (read_enable)
|
||||
read_data <= mem[read_addr];
|
||||
end
|
||||
|
||||
True dual port (TDP) patterns
|
||||
-----------------------------
|
||||
|
||||
- Many different variations of true dual port memory can be created by combining two single-port RAM
|
||||
patterns on the same memory
|
||||
- When TDP memory is used, the memory inference code has much less room to maneuver when creating
  the requested semantics than with individual single-port patterns (which can end up lowered to
  SDP memory where necessary); supported patterns depend strongly on the target
|
||||
- In particular, when both ports have the same clock, it's likely that "undefined collision" mode
|
||||
needs to be manually selected to enable TDP memory inference
|
||||
- The examples below are non-exhaustive — many more combinations of port types are possible
|
||||
- Note: if two write ports are in the same process, this defines a priority relation between them
|
||||
(if both ports are active in the same clock, the later one wins). On almost all targets, this will
|
||||
result in a bit of extra circuitry to ensure the priority semantics. If this is not what you want,
|
||||
put them in separate processes.
|
||||
|
||||
- Priority is not supported when using the Verific frontend; any priority semantics are ignored.
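
The difference between write ports in a shared process and in separate processes can be sketched
as follows. The module name, the two memory names and the extra read port are assumptions made
for illustration. The sketch shows the language-level difference only, not a guaranteed mapping
to TDP block RAM.

.. code:: verilog

   module write_priority_example #(
       parameter ADDR_WIDTH = 4,
       parameter DATA_WIDTH = 8
   ) (
       input  wire                  clk,
       input  wire                  write_enable_a,
       input  wire                  write_enable_b,
       input  wire [ADDR_WIDTH-1:0] addr_a,
       input  wire [ADDR_WIDTH-1:0] addr_b,
       input  wire [DATA_WIDTH-1:0] write_data_a,
       input  wire [DATA_WIDTH-1:0] write_data_b,
       input  wire [ADDR_WIDTH-1:0] read_addr,
       output reg  [DATA_WIDTH-1:0] read_prio,
       output reg  [DATA_WIDTH-1:0] read_noprio
   );
       reg [DATA_WIDTH-1:0] mem_prio   [2**ADDR_WIDTH-1:0];
       reg [DATA_WIDTH-1:0] mem_noprio [2**ADDR_WIDTH-1:0];

       // Same process: if both ports write the same address in the same
       // cycle, the later assignment (port B) wins.
       always @(posedge clk) begin
           if (write_enable_a)
               mem_prio[addr_a] <= write_data_a;
           if (write_enable_b)
               mem_prio[addr_b] <= write_data_b;
       end

       // Separate processes: no priority relation between the two ports.
       always @(posedge clk)
           if (write_enable_a)
               mem_noprio[addr_a] <= write_data_a;

       always @(posedge clk)
           if (write_enable_b)
               mem_noprio[addr_b] <= write_data_b;

       // Read ports so that neither memory is unused.
       always @(posedge clk) begin
           read_prio   <= mem_prio[read_addr];
           read_noprio <= mem_noprio[read_addr];
       end
   endmodule
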
|
||||
|
||||
TDP with different clocks, exclusive read/write
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk_a) begin
|
||||
if (write_enable_a)
|
||||
mem[addr_a] <= write_data_a;
|
||||
else if (read_enable_a)
|
||||
read_data_a <= mem[addr_a];
|
||||
end
|
||||
|
||||
always @(posedge clk_b) begin
|
||||
if (write_enable_b)
|
||||
mem[addr_b] <= write_data_b;
|
||||
else if (read_enable_b)
|
||||
read_data_b <= mem[addr_b];
|
||||
end
|
||||
|
||||
TDP with same clock, read-first behavior
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- This requires hardware inter-port read-first behavior, and will only work on some targets (Xilinx, Nexus)
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable_a)
|
||||
mem[addr_a] <= write_data_a;
|
||||
if (read_enable_a)
|
||||
read_data_a <= mem[addr_a];
|
||||
end
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable_b)
|
||||
mem[addr_b] <= write_data_b;
|
||||
if (read_enable_b)
|
||||
read_data_b <= mem[addr_b];
|
||||
end
|
||||
|
||||
TDP with multiple read ports
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- The combination of a single write port with an arbitrary amount of read ports is supported on all
|
||||
targets — if a multi-read port primitive is available (like Xilinx RAM64M), it'll be used as
|
||||
appropriate. Otherwise, the memory will be automatically split into multiple primitives.
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [31:0] mem [0:31];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] <= write_data;
|
||||
end
|
||||
|
||||
assign read_data_a = mem[read_addr_a];
|
||||
assign read_data_b = mem[read_addr_b];
|
||||
assign read_data_c = mem[read_addr_c];
|
||||
|
||||
Not yet supported patterns
|
||||
--------------------------
|
||||
|
||||
Synchronous SDP with write-first behavior via blocking assignments
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Would require modifications to the Yosys Verilog frontend.
|
||||
- Use `Synchronous SDP with write-first behavior`_ instead
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr] = write_data;
|
||||
|
||||
if (read_enable)
|
||||
read_data <= mem[read_addr];
|
||||
end
|
||||
|
||||
Asymmetric memories via part selection
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Would require major changes to the Verilog frontend.
|
||||
- Build wide ports out of narrow ports instead (see `Wide synchronous read port`_)
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [31:0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
wire [1:0] byte_lane;
|
||||
wire [7:0] write_data;
|
||||
|
||||
always @(posedge clk) begin
|
||||
if (write_enable)
|
||||
mem[write_addr][byte_lane * 8 +: 8] <= write_data;
|
||||
|
||||
if (read_enable)
|
||||
read_data <= mem[read_addr];
|
||||
end
|
||||
|
||||
|
||||
Undesired patterns
|
||||
------------------
|
||||
|
||||
Asynchronous writes
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- Not supported in modern FPGAs
|
||||
- Not supported in yosys code anyhow
|
||||
|
||||
.. code:: verilog
|
||||
|
||||
reg [DATA_WIDTH - 1 : 0] mem [2**ADDR_WIDTH - 1 : 0];
|
||||
|
||||
always @* begin
|
||||
if (write_enable)
|
||||
mem[write_addr] = write_data;
|
||||
end
|
||||
|
||||
assign read_data = mem[read_addr];
|
||||
|
8
docs/source/using_yosys/more_scripting.rst
Normal file
|
@ -0,0 +1,8 @@
|
|||
More scripting
|
||||
--------------
|
||||
|
||||
.. toctree::
|
||||
|
||||
opt_passes
|
||||
selections
|
||||
troubleshooting
|
332
docs/source/using_yosys/opt_passes.rst
Normal file
|
@ -0,0 +1,332 @@
|
|||
.. _chapter:opt:
|
||||
|
||||
Optimization passes
|
||||
===================
|
||||
|
||||
.. TODO: copypaste
|
||||
|
||||
Yosys employs a number of optimizations to generate better and cleaner results.
|
||||
This chapter outlines these optimizations.
|
||||
|
||||
Simple optimizations
|
||||
--------------------
|
||||
|
||||
The Yosys pass opt runs a number of simple optimizations. This includes removing
|
||||
unused signals and cells and const folding. It is recommended to run this pass
|
||||
after each major step in the synthesis script. At the time of this writing the
|
||||
opt pass executes the following passes that each perform a simple optimization:
|
||||
|
||||
- Once at the beginning of opt:
|
||||
|
||||
- opt_expr
|
||||
- opt_merge -nomux
|
||||
|
||||
- Repeat until result is stable:
|
||||
|
||||
- opt_muxtree
|
||||
- opt_reduce
|
||||
- opt_merge
|
||||
- opt_rmdff
|
||||
- opt_clean
|
||||
- opt_expr
|
||||
|
||||
The following section describes each of the opt\_ passes.
|
||||
|
||||
The opt_expr pass
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
||||
This pass performs const folding on the internal combinational cell types
|
||||
described in :ref:`chapter:celllib`. This means a cell with all
|
||||
constant inputs is replaced with the constant value this cell drives. In some
|
||||
cases this pass can also optimize cells with some constant inputs.
|
||||
|
||||
.. table:: Const folding rules for $_AND\_ cells as used in opt_expr.
   :name: tab:opt_expr_and
   :align: center

   ========= ========= ===========
   A-Input   B-Input   Replacement
   ========= ========= ===========
   any       0         0
   0         any       0
   1         1         1
   --------- --------- -----------
   X/Z       X/Z       X
   1         X/Z       X
   X/Z       1         X
   --------- --------- -----------
   any       X/Z       0
   X/Z       any       0
   --------- --------- -----------
   :math:`a` 1         :math:`a`
   1         :math:`b` :math:`b`
   ========= ========= ===========
|
||||
|
||||
.. How to format table?
|
||||
|
||||
:numref:`Table %s <tab:opt_expr_and>` shows the replacement rules used for
optimizing an $_AND\_ gate. The first three rules implement the obvious const
folding rules. Note that 'any' might include dynamic values calculated by other
parts of the circuit. The following three lines propagate undef (X) states.
These are the only three cases in which it is allowed to propagate an undef
according to Sec. 5.1.10 of IEEE Std. 1364-2005 :cite:p:`Verilog2005`.

The next two lines assume the value 0 for undef states. These two rules are only
used if no other substitutions are possible in the current module. If other
substitutions are possible they are performed first, in the hope that the 'any'
will change to an undef value or a 1 and therefore the output can be set to
undef.
|
||||
|
||||
The last two lines simply replace an $_AND\_ gate with one constant-1 input with
|
||||
a buffer.
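
As a small illustration of the first and last groups of rules, consider the module below. The
module and signal names are made up for this sketch, and the comments describe what opt_expr is
expected to do according to the table, not verified tool output.

.. code:: verilog

   module opt_expr_and_example (
       input  wire a,
       output wire y_zero,
       output wire y_buf
   );
       // "any AND 0" rule: the result is expected to fold to constant 0.
       assign y_zero = a & 1'b0;

       // "a AND 1" rule: the gate is expected to be replaced by a buffer
       // that simply forwards a.
       assign y_buf = a & 1'b1;
   endmodule
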
|
||||
|
||||
Besides this basic const folding the opt_expr pass can replace 1-bit wide $eq
|
||||
and $ne cells with buffers or not-gates if one input is constant.
|
||||
|
||||
The opt_expr pass is very conservative regarding optimizing $mux cells, as these
|
||||
cells are often used to model decision-trees and breaking these trees can
|
||||
interfere with other optimizations.
|
||||
|
||||
The opt_muxtree pass
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This pass optimizes trees of multiplexer cells by analyzing the select inputs.
|
||||
Consider the following simple example:
|
||||
|
||||
.. code:: verilog
   :number-lines:

   module uut(a, y);
       input a;
       output [1:0] y = a ? (a ? 1 : 2) : 3;
   endmodule
|
||||
|
||||
The output can never be 2, as this would require ``a`` to be 1 for the outer
|
||||
multiplexer and 0 for the inner multiplexer. The opt_muxtree pass detects this
|
||||
contradiction and replaces the inner multiplexer with a constant 1, yielding the
|
||||
logic for ``y = a ? 1 : 3``.
|
||||
|
||||
The opt_reduce pass
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This is a simple optimization pass that identifies and consolidates identical
|
||||
input bits to $reduce_and and $reduce_or cells. It also sorts the input bits to
|
||||
ease identification of shareable $reduce_and and $reduce_or cells in other
|
||||
passes.
|
||||
|
||||
This pass also identifies and consolidates identical inputs to multiplexer
|
||||
cells. In this case the new shared select bit is driven using a $reduce_or cell
|
||||
that combines the original select bits.
|
||||
|
||||
Lastly this pass consolidates trees of $reduce_and cells and trees of $reduce_or
|
||||
cells to single large $reduce_and or $reduce_or cells.
|
||||
|
||||
These three simple optimizations are performed in a loop until a stable result
|
||||
is produced.
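
The following sketch shows input that is meant to exercise two of these optimizations. The module
and signal names are assumptions, and the comments state the expected effect based on the
description above instead of actual tool output.

.. code:: verilog

   module opt_reduce_example (
       input  wire a,
       input  wire b,
       input  wire c,
       output wire y_dup,
       output wire y_tree
   );
       // Identical input bits: a appears twice in the reduction, so the
       // $reduce_and cell is expected to shrink to the inputs a, b and c.
       assign y_dup = &{a, b, a, c};

       // Tree of reductions: the inner and outer $reduce_or cells are
       // expected to be consolidated into a single $reduce_or cell.
       assign y_tree = |{ |{a, b}, c };
   endmodule
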
|
||||
|
||||
The opt_rmdff pass
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This pass identifies single-bit d-type flip-flops ($_DFF\_, $dff, and $adff
|
||||
cells) with a constant data input and replaces them with a constant driver.
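
A minimal input that matches this description might look as follows. The module name is
hypothetical, and the comment states the expected result, assuming the register has no
conflicting initial value.

.. code:: verilog

   module opt_rmdff_example (
       input  wire clk,
       output reg  q
   );
       // The data input of this flip-flop is the constant 0, so the
       // flip-flop is expected to be replaced by a constant driver.
       always @(posedge clk)
           q <= 1'b0;
   endmodule
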
|
||||
|
||||
The opt_clean pass
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This pass identifies unused signals and cells and removes them from the design.
|
||||
It also creates an ``\unused_bits`` attribute on wires with unused bits. This
|
||||
attribute can be used for debugging or by other optimization passes.
|
||||
|
||||
The opt_merge pass
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This pass performs trivial resource sharing. This means that this pass
|
||||
identifies cells with identical inputs and replaces them with a single instance
|
||||
of the cell.
|
||||
|
||||
The option -nomux can be used to disable resource sharing for multiplexer cells
($mux and $pmux). This can be useful as it prevents multiplexer trees from being
merged, which might prevent opt_muxtree from identifying possible optimizations.
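
A simple case of such sharing is sketched below. The module and signal names are illustrative
assumptions; the comments describe the expected effect of opt_merge.

.. code:: verilog

   module opt_merge_example (
       input  wire [7:0] a,
       input  wire [7:0] b,
       output wire [7:0] y1,
       output wire [7:0] y2
   );
       // Both assignments create an adder with identical inputs.
       // After opt_merge, a single adder is expected to drive y1 and y2.
       assign y1 = a + b;
       assign y2 = a + b;
   endmodule
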
|
||||
|
||||
FSM extraction and encoding
|
||||
---------------------------
|
||||
|
||||
The fsm pass performs finite-state-machine (FSM) extraction and recoding. The
|
||||
fsm pass simply executes the following other passes:
|
||||
|
||||
- Identify and extract FSMs:
|
||||
|
||||
- fsm_detect
|
||||
- fsm_extract
|
||||
|
||||
- Basic optimizations:
|
||||
|
||||
- fsm_opt
|
||||
- opt_clean
|
||||
- fsm_opt
|
||||
|
||||
- Expanding to nearby gate-logic (if called with -expand):
|
||||
|
||||
- fsm_expand
|
||||
- opt_clean
|
||||
- fsm_opt
|
||||
|
||||
- Re-code FSM states (unless called with -norecode):
|
||||
|
||||
- fsm_recode
|
||||
|
||||
- Print information about FSMs:
|
||||
|
||||
- fsm_info
|
||||
|
||||
- Export FSMs in KISS2 file format (if called with -export):
|
||||
|
||||
- fsm_export
|
||||
|
||||
- Map FSMs to RTL cells (unless called with -nomap):
|
||||
|
||||
- fsm_map
|
||||
|
||||
The fsm_detect pass identifies FSM state registers and marks them using the
``\fsm_encoding = "auto"`` attribute. The fsm_extract pass extracts all FSMs
marked using the ``\fsm_encoding`` attribute (unless ``\fsm_encoding`` is set to
"none") and replaces the corresponding RTL cells with a $fsm cell. All other
fsm\_ passes operate on these $fsm cells. The fsm_map call finally replaces the
$fsm cells with RTL cells.

Note that these optimizations operate on an RTL netlist, i.e. the fsm pass
should be executed after the proc pass has transformed all RTLIL::Process
objects to RTL cells.
|
||||
|
||||
The algorithms used for FSM detection and extraction are influenced by a more
|
||||
general reported technique :cite:p:`fsmextract`.
|
||||
|
||||
FSM detection
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
The fsm_detect pass identifies FSM state registers. It sets the ``\fsm_encoding
|
||||
= "auto"`` attribute on any (multi-bit) wire that matches the following
|
||||
description:
|
||||
|
||||
- Does not already have the ``\fsm_encoding`` attribute.
|
||||
- Is not an output of the containing module.
|
||||
- Is driven by a single $dff or $adff cell.
|
||||
- The ``\D``-Input of this $dff or $adff cell is driven by a multiplexer tree
|
||||
that only has constants or the old state value on its leaves.
|
||||
- The state value is only used in the said multiplexer tree or by simple
|
||||
relational cells that compare the state value to a constant (usually $eq
|
||||
cells).
|
||||
|
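The criteria above can be written down as a single predicate. The Python sketch below does so over a hypothetical dictionary description of a wire; the keys (``attributes``, ``is_module_output``, ``drivers``, ``d_mux_leaves``, ``users``) and the string tags are assumptions made for illustration and do not correspond to RTLIL data structures.

```python
def is_fsm_state_candidate(wire):
    """Apply the fsm_detect criteria listed above to a toy wire description."""
    # Must not already carry the fsm_encoding attribute.
    if "fsm_encoding" in wire["attributes"]:
        return False
    # Must not be an output of the containing module.
    if wire["is_module_output"]:
        return False
    # Must be driven by exactly one $dff or $adff cell.
    drivers = wire["drivers"]
    if len(drivers) != 1 or drivers[0] not in ("$dff", "$adff"):
        return False
    # The mux tree on the D input may only have constants or the old state as leaves.
    if any(leaf not in ("const", "state") for leaf in wire["d_mux_leaves"]):
        return False
    # The state may only feed that mux tree or comparisons against constants.
    return all(user in ("mux_tree", "compare_const") for user in wire["users"])

wire = {
    "attributes": {},
    "is_module_output": False,
    "drivers": ["$dff"],
    "d_mux_leaves": ["const", "state", "const"],
    "users": ["mux_tree", "compare_const"],
}
marked = is_fsm_state_candidate(wire)
```

A wire that already has ``fsm_encoding`` set, or whose value is consumed by some other cell such as an adder, is rejected by this predicate.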
||||
This heuristic has proven to work very well. It is possible to override it by
|
||||
setting ``\fsm_encoding = "auto"`` on registers that should be considered FSM
|
||||
state registers and setting ``\fsm_encoding = "none"`` on registers that match
|
||||
the above criteria but should not be considered FSM state registers.
|
||||
|
||||
Note however that marking state registers with ``\fsm_encoding`` that are not
|
||||
suitable for FSM recoding can cause synthesis to fail or produce invalid
|
||||
results.
|
||||
|
||||
FSM extraction
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
The fsm_extract pass operates on all state signals whose ``\fsm_encoding``
|
||||
attribute is set to a value other than "none". For each state signal the following
|
||||
information is determined:
|
||||
|
||||
- The state registers
|
||||
|
||||
- The asynchronous reset state if the state registers use asynchronous reset
|
||||
|
||||
- All states and the control input signals used in the state transition
|
||||
functions
|
||||
|
||||
- The control output signals calculated from the state signals and control
|
||||
inputs
|
||||
|
||||
- A table of all state transitions and corresponding control inputs and
|
||||
outputs
|
||||
|
||||
The state registers (and asynchronous reset state, if applicable) are simply
|
||||
determined by identifying the driver for the state signal.
|
||||
|
||||
From there the $mux-tree driving the state register inputs is recursively
|
||||
traversed. All select inputs are control signals and the leaves of the $mux-tree
|
||||
are the states. The algorithm fails if a non-constant leaf that is not the state
|
||||
signal itself is found.
|
||||
|
||||
The list of control outputs is initialized with the bits from the state signal.
|
||||
It is then extended by adding all values that are calculated by cells that
|
||||
compare the state signal with a constant value.
|
||||
|
||||
In most cases this will cover all uses of the state register, thus rendering the
|
||||
state encoding arbitrary. If however a design uses e.g. a single bit of the
|
||||
state value to drive a control output directly, this bit of the state signal
|
||||
will be transformed to a control output of the same value.
|
||||
|
||||
Finally, a transition table for the FSM is generated. This is done by using the
|
||||
ConstEval C++ helper class (defined in kernel/consteval.h) that can be used to
|
||||
evaluate parts of the design. The ConstEval class can be asked to calculate a
|
||||
given set of result signals using a set of signal-value assignments. It can also
|
||||
be passed a list of stop-signals that abort the ConstEval algorithm if the value
|
||||
of a stop-signal is needed in order to calculate the result signals.
|
||||
|
||||
The fsm_extract pass uses the ConstEval class in the following way to create a
|
||||
transition table. For each state:
|
||||
|
||||
1. Create a ConstEval object for the module containing the FSM
|
||||
2. Add all control inputs to the list of stop signals
|
||||
3. Set the state signal to the current state
|
||||
4. Try to evaluate the next state and control output
|
||||
5. If step 4 was not successful:
|
||||
|
||||
- Recursively go to step 4 with the offending stop-signal set to 0.
|
||||
- Recursively go to step 4 with the offending stop-signal set to 1.
|
||||
|
||||
6. If step 4 was successful: Emit transition
|
||||
|
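The numbered steps above amount to a recursive case split on control inputs. The Python sketch below imitates that idea without using the real ConstEval class: a hypothetical ``Inputs`` wrapper raises ``NeedsSignal`` when an unassigned control input is read, which plays the role of a stop-signal, and ``extract_transitions`` retries with that input set to 0 and then to 1. The example ``step`` function and its signal names are invented for illustration.

```python
class NeedsSignal(Exception):
    """Raised when evaluation needs a control input that has no value yet."""
    def __init__(self, name):
        super().__init__(name)
        self.name = name

class Inputs:
    """Toy stand-in for stop-signals: reading an unassigned input aborts evaluation."""
    def __init__(self, assigned):
        self.assigned = assigned
    def __getitem__(self, name):
        if name not in self.assigned:
            raise NeedsSignal(name)
        return self.assigned[name]

def extract_transitions(states, step):
    """Build a transition table of (state, inputs, next_state, outputs) tuples."""
    table = []
    def explore(state, assigned):
        try:
            next_state, outputs = step(state, Inputs(assigned))   # step 4
        except NeedsSignal as stop:                               # step 5
            for value in (0, 1):
                explore(state, {**assigned, stop.name: value})
            return
        table.append((state, dict(assigned), next_state, outputs))  # step 6
    for state in states:
        explore(state, {})
    return table

def step(state, ctrl):
    """Hypothetical next-state and output logic of a two-state machine."""
    if state == "IDLE":
        return ("RUN" if ctrl["start"] else "IDLE"), {"busy": 0}
    if ctrl["abort"]:
        return "IDLE", {"busy": 1}
    return ("IDLE" if ctrl["done"] else "RUN"), {"busy": 1}

table = extract_transitions(["IDLE", "RUN"], step)
```

Inputs that are never read on a given path stay unassigned in the emitted row, so they act as don't-care bits. In the example, the ``abort = 1`` row of the ``RUN`` state does not mention ``done`` at all.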
||||
Finally a $fsm cell is created with the generated transition table and added to
|
||||
the module. This new cell is connected to the control signals and the old
|
||||
drivers for the control outputs are disconnected.
|
||||
|
||||
FSM optimization
|
||||
~~~~~~~~~~~~~~~~
|
||||
|
||||
The fsm_opt pass performs basic optimizations on $fsm cells (not including state
|
||||
recoding). The following optimizations are performed (in this order):
|
||||
|
||||
- Unused control outputs are removed from the $fsm cell. The attribute
|
||||
``\unused_bits`` (that is usually set by the opt_clean pass) is used to
|
||||
determine which control outputs are unused.
|
||||
|
||||
- Control inputs that are connected to the same driver are merged.
|
||||
|
||||
- When a control input is driven by a control output, the control input is
|
||||
removed and the transition table is altered to give the same behaviour without
|
||||
the external feedback path.
|
||||
|
||||
- Entries in the transition table that yield the same output and only differ in
|
||||
the value of a single control input bit are merged and the differing bit is
|
||||
removed from the sensitivity list (turned into a don't-care bit).
|
||||
|
||||
- Constant inputs are removed and the transition table is altered to give an
|
||||
unchanged behaviour.
|
||||
|
||||
- Unused inputs are removed.
|
||||
|
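The merging of transition table entries described above can be sketched as follows. Rows are modelled as hypothetical ``(state, input_pattern, next_state, output)`` tuples, with the input pattern written as a string of ``0``, ``1`` and ``-`` (don't-care) characters of equal length. Requiring the same state and next state in addition to the same output is an assumption of this sketch, and the function names are illustrative only.

```python
def merge_once(rows):
    """Merge the first pair of rows that differ in exactly one input bit, or return None."""
    for i, (s1, p1, n1, o1) in enumerate(rows):
        for j in range(i + 1, len(rows)):
            s2, p2, n2, o2 = rows[j]
            if (s1, n1, o1) != (s2, n2, o2):
                continue
            diff = [k for k in range(len(p1)) if p1[k] != p2[k]]
            # Exactly one differing position, and it must be 0 in one row and 1 in the other.
            if len(diff) == 1 and {p1[diff[0]], p2[diff[0]]} == {"0", "1"}:
                k = diff[0]
                merged = (s1, p1[:k] + "-" + p1[k + 1:], n1, o1)
                return rows[:i] + [merged] + rows[i + 1:j] + rows[j + 1:]
    return None

def merge_transitions(rows):
    """Repeat pairwise merging until no further pair can be merged."""
    rows = list(rows)
    while True:
        merged = merge_once(rows)
        if merged is None:
            return rows
        rows = merged

rows = [
    ("A", "00", "A", "0"),
    ("A", "01", "A", "0"),
    ("A", "10", "B", "1"),
    ("A", "11", "B", "1"),
]
merged_rows = merge_transitions(rows)
```

In the example the second input bit never influences the result, so the four rows collapse into two rows with a don't-care in that position.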
||||
FSM recoding
|
||||
~~~~~~~~~~~~
|
||||
|
||||
The fsm_recode pass assigns new bit patterns to the states. Usually this also
|
||||
implies a change in the width of the state signal. At the time of this writing
|
||||
only one-hot encoding with all-zero for the reset state is supported.
|
||||
|
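One literal reading of "one-hot encoding with all-zero for the reset state" is sketched below in Python: the reset state gets the all-zero pattern and every other state gets its own single set bit. This is an assumption about what the sentence means, not a description of the fsm_recode implementation, and ``recode_one_hot_zero_reset`` is a hypothetical name.

```python
def recode_one_hot_zero_reset(states, reset_state):
    """Assign all-zero to the reset state and one distinct set bit to every other state."""
    others = [s for s in states if s != reset_state]
    width = max(len(others), 1)          # at least one bit even for a single-state FSM
    codes = {reset_state: "0" * width}
    for i, state in enumerate(others):
        bits = ["0"] * width
        bits[width - 1 - i] = "1"        # bit i counted from the least significant end
        codes[state] = "".join(bits)
    return codes

codes = recode_one_hot_zero_reset(["IDLE", "RUN", "DONE"], "IDLE")
```

Under this reading a three-state machine needs a two-bit state signal, which shows why recoding usually changes the width of the state register.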
||||
The fsm_recode pass can also write a text file with the changes performed by it
|
||||
that can be used when verifying designs synthesized by Yosys using Synopsys
|
||||
Formality.
|
||||
|
||||
Logic optimization
|
||||
------------------
|
||||
|
||||
Yosys can perform multi-level combinational logic optimization on gate-level
|
||||
netlists using the external program ABC. The abc pass extracts the
|
||||
combinational gate-level parts of the design, passes it through ABC, and
|
||||
re-integrates the results. The abc pass can also be used to perform other
|
||||
operations using ABC, such as technology mapping (see :ref:`sec:techmap_extern`
|
||||
for details).
|
6
docs/source/using_yosys/selections.rst
Normal file
|
@ -0,0 +1,6 @@
|
|||
Selections
|
||||
~~~~~~~~~~
|
||||
|
||||
See :doc:`/cmd/select`
|
||||
|
||||
Also :doc:`/cmd/show`
|
4
docs/source/using_yosys/troubleshooting.rst
Normal file
|
@ -0,0 +1,4 @@
|
|||
Troubleshooting
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
See :doc:`/cmd/bugpoint`
|
8
docs/source/using_yosys/yosys_flows.rst
Normal file
8
docs/source/using_yosys/yosys_flows.rst
Normal file
|
@ -0,0 +1,8 @@
|
|||
Flows, command types, and order
|
||||
-------------------------------
|
||||
|
||||
Synthesis granularity
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Formal verification
|
||||
~~~~~~~~~~~~~~~~~~~
|