add task list

2025-07-21 20:32:25 -07:00 · b731339fff
commit b731339fff
parent 3b278a848c
1 changed file with 65 additions and 0 deletions
--- a/nlnet-2024-12-324.txt
+++ b/nlnet-2024-12-324.txt
@@ -0,0 +1,65 @@
+takentaal v1.0
+
+# {50000} Libre-Chip's First CPU Architecture And Formal Proof of No Spectre bugs
+
+Modern computers suffer from a constant stream of new speculative-execution security flaws (Spectre-style bugs). To address this major category of flaws, we are working towards building a high-performance computer processor (CPU) with speculative execution and working on a mathematical proof that it doesn't suffer from any speculative-execution data leaks, thereby demonstrating that this major category of flaws can be eliminated without crippling the computer's performance.
+
+https://en.wikipedia.org/wiki/Transient_execution_CPU_vulnerability#Timeline
+
+## {6000} Adding Rocq output in Fayalite
+
+This is for adding the code for translating Fayalite HDL to Rocq, as well as determining how exactly we'll describe HDL in Rocq. I expect the translation code to be of comparable size to the compiler portion of the simulator, so somewhere around 5000 LoC. (The simulator is broken into three main parts: a compiler to an IR optimized for interpretation, the interpreter itself, and the code for reading/writing simulator I/O and handling time simulation.)
+
+- {2000} Figure out how exactly we should represent HDL in Rocq, writing down a manually-translated version of common HDL components (e.g. how to translate a register, a memory, an add/sub/mul/div, etc.).
+- {4000} Write the code to do the translation in Fayalite.
+
+## {4000} Adding supporting code for generating FPGA bitstreams from Fayalite
+
+This is for adding the tooling to run all the right programs to generate FPGA bitstreams, as well as adding code to handle connecting I/O ports to the FPGA pins. I expect this to be on the order of 1000-2000 LoC for the FPGA pins code, as well as a few hundred for running all the right programs in sequence.
+
+- {2000} Write support for board interface descriptions and the code for running the FPGA toolchain (similar to the existing code for running SymbiYosys -- the current formal verification toolchain).
+- {1000} Add support for the OrangeCrab since both Cesar and Jacob have one.
+- {1000} Add support for the Arty A7 100T since that's what we're using for CI.
+
+## {10000} Register Renaming, Execution, and Instruction Retire
+
+This covers getting register renaming working, as well as scheduling, executing simple ALU and Branch instructions, and properly handling instruction retire. (Some of that work is already done.)
+
+A lot of this is the work to come up with a detailed low-level plan for the CPU, so I don't have a good estimate of how complex it is, though I expect it to be roughly 40% of the CPU's complexity.
+
+- {1500} Add to the simulator in Fayalite the ability to transfer non-HDL data (e.g. HashMap) through the digital signalling mechanism; this allows using those data types when writing procedural models.
+- {6000} Create a model of the whole rename/execute/retire control system, using procedural implementations of the most complex HDL modules where appropriate.
+- {2500} Translate the procedural model to use actual synthesizable HDL. This includes a proof of correctness of the out-of-order CPU in relation to a sequential CPU (probably most easily done by adding the proof to the retire stage).
+
+## {8000} Instruction Fetch/Decode
+
+This covers instruction fetch, decoding, and caching. For the decoder, unless OpenPOWER has gotten around to releasing the LaTeX source code, I'm expecting to use a parser I wrote that parses the instruction descriptions out of the PowerISA v3.1C PDF and writes out XML.
+
+https://git.libre-chip.org/libre-chip/parse_powerisa_pdf
+
+- {1000} Create the next-instruction logic -- includes some sort of branch prediction or branch target buffer so we can actually keep the rest of the CPU pipeline full. This should support fetching more than one instruction per clock.
+- {1000} Create the fetch and i-cache logic.
+- {2000} Create the PowerISA decoder -- it translates to the internal microcode. For now, it only needs to support a reasonable subset of 64-bit LE integer instructions in problem mode (a.k.a. user mode); FP and VMX/VSX can be disabled.
+- {2000} Create a model of the instruction fetch/decode control system, using procedural implementations of the most complex HDL modules where appropriate.
+- {2000} Translate the procedural model to use actual synthesizable HDL.
+
+## {10000} Load/Store instructions
+
+This covers implementing the load/store hierarchy, including an L1 cache. For now, the CPU will only target on-FPGA memory blocks, as well as simple I/O devices. (Support for DRAM can be added at a later point outside of this grant.)
+It should include a d-cache, some kind of memory, and at least one IO device.
+
+It should include at least lr/sc, some atomic fetch-op, cached load/store, and IO load/store (IO needs to wait until non-speculative to start executing).
+
+- {1000} memory system: main memory and IO devices -- I'm expecting just a big SRAM to be good enough for simulation of memory; on the FPGA we could probably get away with a relatively small SRAM and put off a DRAM interface for later. For the IO device, I'm thinking we'd have a simple fixed-frequency UART.
+- {1000} d-cache
+- {2500} memory load execution unit (we'll want to be able to do more than one load at once)
+- {2500} memory store execution unit
+- {2000} adding atomics: lr/sc, atomic fetch-add (or other fetch-op)
+- {1000} adding order-violation detection logic, so we can make memory look like it has total-store-order (for x86), or even sequential consistency (meaning we can ignore all non-IO fences)
+
+## {12000} Work towards the Formal Proof of No Spectre bugs
+
+This covers working on the Formal Proof of No Spectre bugs as well as improvements to other software/hardware needed for that. A major portion of this is to figure out what exact properties we need to formally prove for each part of the CPU, so we can put those parts together to make a working proof for the entire CPU. I'm hoping this will be enough work to get nearly all of the proof written out, even if there are some flaws we discover that we'll have to put off fixing for a later grant.
+
+- {3000} Write Rocq and HDL logic for tracking which instructions will eventually be cancelled and which will eventually be retired.
+- {9000} Attempt a proof that our CPU, with outputs zeroed for all eventually-cancelled instructions, is equivalent to our real CPU design. This may need significant modifications to the CPU. This task may be too big and need further subdividing once we've made some progress on the proof so we know where to subdivide it.
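
Not part of the committed file: the `{amount}` markers in the plan are meant to add up, with each `##` section equal to the sum of its `-` tasks and the `#` total equal to the sum of the sections. Below is a minimal sketch of that check in Python. The `PLAN` text copies the amounts from the diff with shortened titles, and `check_budget` plus its regular expression are hypothetical helpers that only approximate takentaal syntax; they are not an official parser.

```python
import re

# Amounts copied from the plan; titles are shortened for brevity.
PLAN = """\
# {50000} Libre-Chip's First CPU Architecture
## {6000} Rocq output
- {2000} represent HDL in Rocq
- {4000} translation code
## {4000} FPGA bitstreams
- {2000} board interfaces and toolchain
- {1000} OrangeCrab
- {1000} Arty A7 100T
## {10000} Rename/Execute/Retire
- {1500} non-HDL data in simulator
- {6000} procedural model
- {2500} synthesizable HDL
## {8000} Fetch/Decode
- {1000} next-instruction logic
- {1000} fetch and i-cache
- {2000} PowerISA decoder
- {2000} procedural model
- {2000} synthesizable HDL
## {10000} Load/Store
- {1000} memory system
- {1000} d-cache
- {2500} load unit
- {2500} store unit
- {2000} atomics
- {1000} order-violation detection
## {12000} Spectre proof
- {3000} cancel/retire tracking
- {9000} equivalence proof
"""

# Assumed line shape: "#", "##", or "-" followed by "{amount}".
LINE = re.compile(r"^(##?|-) \{(\d+)\}")


def check_budget(text):
    """Return (declared total, sum of sections, mismatched section amounts)."""
    total = None
    sections = []  # each entry: [declared amount, running sum of its tasks]
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # prose, URLs, and blank lines carry no amount
        marker, amount = match.group(1), int(match.group(2))
        if marker == "#":
            total = amount
        elif marker == "##":
            sections.append([amount, 0])
        else:
            sections[-1][1] += amount
    section_sum = sum(declared for declared, _ in sections)
    mismatched = [declared for declared, tasks in sections if declared != tasks]
    return total, section_sum, mismatched


print(check_budget(PLAN))  # prints (50000, 50000, [])
```

An empty third element means every section's tasks sum to the section's declared amount, and the first two elements matching means the sections sum to the grant total.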