From 0a89fbeebb0885883596bf3c12ff836a3f044f88 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Thu, 28 Nov 2024 19:45:11 -0800 Subject: [PATCH 01/29] add WIP nlnet grant proposal --- src/SUMMARY.md | 2 ++ src/grants/index.md | 1 + src/grants/nlnet-first.md | 62 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 65 insertions(+) create mode 100644 src/grants/index.md create mode 100644 src/grants/nlnet-first.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index b3d79d5..8f2ce4d 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -1,6 +1,8 @@ * [Libre-Chip](README.md) * [Proposal for Libre-Chip's First CPU Architecture](first_arch/index.md) * [Register Renaming](first_arch/register_renaming.md) + * [Grants](grants/index.md) + * [First NLNet Grant Proposal](grants/nlnet-first.md) * [Conduct](Conduct.md) * [License](LICENSE.md) * [GPL 3.0](gpl-3.0.md) diff --git a/src/grants/index.md b/src/grants/index.md new file mode 100644 index 0000000..6533138 --- /dev/null +++ b/src/grants/index.md @@ -0,0 +1 @@ +# Grants diff --git a/src/grants/nlnet-first.md b/src/grants/nlnet-first.md new file mode 100644 index 0000000..4c7fd51 --- /dev/null +++ b/src/grants/nlnet-first.md @@ -0,0 +1,62 @@ +# First NLNet Grant Proposal -- First CPU Architecture And Formal Proof of No Spectre bugs + +Project Name: Libre-Chip's First CPU Architecture And Formal Proof of No Spectre bugs + +Website/Wiki: + +# Abstract + +Modern CPUs suffer from a [constant stream of new speculative-execution security flaws](https://en.wikipedia.org/wiki/Transient_execution_CPU_vulnerability#Timeline) ([Spectre](https://meltdownattack.com/)-style bugs), additionally software has to constantly add more convoluted and slow mitigations to properly inforce security boundaries, such as a WebAssembly VM needing to add code to prevent leaking data from outside the VM to whatever programs are running inside the VM, or a web browser needing to stop JavaScript code from seeing memory that it should not have access to. + +To address this major category of CPU flaws, Jacob Lifshay invented a way to formally prove that a CPU with speculative execution doesn't suffer from any speculative-execution data leaks -- this should work for nearly any kind of CPU design. + +We are working towards building a high-performance superscalar CPU with speculative execution and working on proving that it doesn't suffer from any speculative-execution data leaks, thereby demonstrating that this major category of CPU flaws can be fixed once and for all without crippling CPU performance. + +## Relevant Previous Involvement + +* Jacob Lifshay -- TODO + +* Others... TODO + +# Requested support + +Requested Amount €50000 + +## Cost Explanation + +Nearly all of the cost is for labor, with some amount set aside for general project management (e.g. tracking funding, server maintenance, etc.), and for hardware costs such as buying bigger FPGAs so the CPU design will fit. + +TODO, come up with estimated budgets. + +## Compare with existing/historical efforts + +# UPEC, as described in "An Exhaustive Approach to Detecting Transient Execution Side Channels in RTL Designs of Processors" + + + +According to , UPEC is limited in that it only works on a specific conservative mechanism (preventing speculative load instructions from executing untill all prior branches are resolved). Jacob Lifshay's idea, by contrast, should work even if the CPU runs speculative loads before resolving branches, with much less stringient conditions on those loads. + +# "RTL Verification for Secure Speculation Using Contract Shadow Logic" + + + +TODO + +## Technical challenges we expect to solve during the project + +We expect to solve 2 technical challenges: + +* To create a working OoO Superscalar CPU with Speculative Execution. +* To make substantial progress on creating a formal proof that the CPU doesn't have any Spectre-style vulnerabilities. This is complex enough and has enough potential problems we may run into that we may not finish by the end of this grant, and may need additional grants afterward to finish, hence why the goal is making substantial progress instead of having the proof 100% complete and working. + +To address this general category of CPU flaws, I (Jacob Lifshay) invented a way to formally prove that a CPU with speculative execution doesn't suffer from any speculative-execution data leaks: + +1. Assume that to leak data, you must be able to observe some change in the digital signals going from/to the CPU core that depends on the results (defined to include any side-effects that change any state, such as cache state changes or branch predictor changes) of some transient instructions (instructions that were speculatively executed and then canceled). +2. Take the concrete cycle-accurate CPU design, and construct a theoretical equivalent that has the additional feature of looking into the future to determine which instructions will be canceled and preemptively forces their results to be zeros (or any other value that doesn't depend on transient instructions' data; since we're doing math here, we don't have to limit ourselves to physically possible designs that respect time flow). The reasoning is that if all transient instructions' results are always zeros, they can't possibly be leaking any useful data, therefore the theoretical equivalent CPU design doesn't have any speculative-execution data leaks. +3. Formally prove that all digital signals from/to the concrete CPU core always exactly match the timing and data in all respects of the theoretical equivalent CPU design. This proves that the concrete CPU core can't have any speculative-execution data leaks. + +It is expected that automated formal proof software (stuff like SMT solvers such as Z3) will probably be too slow to be usable for the formal correctness proof, therefore the plan is to use Fayalite to generate a translation of the CPU design to Coq or some other similar language for writing formal proofs. Fayalite is a HDL software library that we have been developing since around April 2024 that targets the FIRRTL intermediate language (used by the Chisel ecosystem for generating Verilog; firtool, the FIRRTL to Verilog translator, is part of LLVM Circt, so is well maintained). + +## Ecosystem + +TODO From 3c95a139950b5ccafdc3531eb8ffd35a27d4d474 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Thu, 28 Nov 2024 19:50:21 -0800 Subject: [PATCH 02/29] add TODO european dimension --- src/grants/nlnet-first.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/grants/nlnet-first.md b/src/grants/nlnet-first.md index 4c7fd51..f4003c1 100644 --- a/src/grants/nlnet-first.md +++ b/src/grants/nlnet-first.md @@ -60,3 +60,5 @@ It is expected that automated formal proof software (stuff like SMT solvers such ## Ecosystem TODO + +TODO European dimension? benefits people everywhere by having CPUs with better security, including Europeans. From 67d6d929e41246ee7fc3da24f3a0460c18596b6b Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Thu, 28 Nov 2024 21:09:23 -0800 Subject: [PATCH 03/29] fix subheader sizes --- src/grants/nlnet-first.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/grants/nlnet-first.md b/src/grants/nlnet-first.md index f4003c1..e76cf45 100644 --- a/src/grants/nlnet-first.md +++ b/src/grants/nlnet-first.md @@ -30,13 +30,13 @@ TODO, come up with estimated budgets. ## Compare with existing/historical efforts -# UPEC, as described in "An Exhaustive Approach to Detecting Transient Execution Side Channels in RTL Designs of Processors" +### UPEC, as described in "An Exhaustive Approach to Detecting Transient Execution Side Channels in RTL Designs of Processors" According to , UPEC is limited in that it only works on a specific conservative mechanism (preventing speculative load instructions from executing untill all prior branches are resolved). Jacob Lifshay's idea, by contrast, should work even if the CPU runs speculative loads before resolving branches, with much less stringient conditions on those loads. -# "RTL Verification for Secure Speculation Using Contract Shadow Logic" +### "RTL Verification for Secure Speculation Using Contract Shadow Logic" From 007421bcd8a27b882ff2a202f983d7861ef7c05f Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Fri, 29 Nov 2024 14:43:15 -0800 Subject: [PATCH 04/29] fill out most sections of NLNet grant proposal --- src/grants/nlnet-first.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/grants/nlnet-first.md b/src/grants/nlnet-first.md index e76cf45..b929dc2 100644 --- a/src/grants/nlnet-first.md +++ b/src/grants/nlnet-first.md @@ -6,7 +6,7 @@ Website/Wiki: # Abstract -Modern CPUs suffer from a [constant stream of new speculative-execution security flaws](https://en.wikipedia.org/wiki/Transient_execution_CPU_vulnerability#Timeline) ([Spectre](https://meltdownattack.com/)-style bugs), additionally software has to constantly add more convoluted and slow mitigations to properly inforce security boundaries, such as a WebAssembly VM needing to add code to prevent leaking data from outside the VM to whatever programs are running inside the VM, or a web browser needing to stop JavaScript code from seeing memory that it should not have access to. +Modern CPUs suffer from a [constant stream of new speculative-execution security flaws](https://en.wikipedia.org/wiki/Transient_execution_CPU_vulnerability#Timeline) ([Spectre](https://meltdownattack.com/)-style bugs), additionally software has to constantly add more convoluted and slow mitigations to properly enforce security boundaries, such as a WebAssembly VM needing to add code to prevent leaking data from outside the VM to whatever programs are running inside the VM, or a web browser needing to stop JavaScript code from seeing memory that it should not have access to. To address this major category of CPU flaws, Jacob Lifshay invented a way to formally prove that a CPU with speculative execution doesn't suffer from any speculative-execution data leaks -- this should work for nearly any kind of CPU design. @@ -14,7 +14,7 @@ We are working towards building a high-performance superscalar CPU with speculat ## Relevant Previous Involvement -* Jacob Lifshay -- TODO +* Jacob Lifshay -- Worked on designing PowerISA CPUs with Libre-SOC for 5yr, built a simple OoO Superscalar CPU simulator , built a RV32I CPU with VGA output in a few weeks that runs a 2.5D maze game . Also is the main author of the [Fayalite](https://git.libre-chip.org/libre-chip/fayalite) HDL library * Others... TODO @@ -34,19 +34,19 @@ TODO, come up with estimated budgets. -According to , UPEC is limited in that it only works on a specific conservative mechanism (preventing speculative load instructions from executing untill all prior branches are resolved). Jacob Lifshay's idea, by contrast, should work even if the CPU runs speculative loads before resolving branches, with much less stringient conditions on those loads. +According to , UPEC is limited in that it only works on a specific conservative mechanism (preventing speculative load instructions from executing until all prior branches are resolved). by contrast our formal proof method should work even if the CPU runs speculative loads before resolving branches, with much less stringent conditions on those loads. ### "RTL Verification for Secure Speculation Using Contract Shadow Logic" -TODO +Contract Shadow Logic works by assuming that memory is partitioned into secret and non-secret regions, by contrast our formal proof method is based on verifying that no data processed by transient instructions (instructions that were speculatively executed and then canceled) can affect any digital signals that leave the processor. ## Technical challenges we expect to solve during the project We expect to solve 2 technical challenges: -* To create a working OoO Superscalar CPU with Speculative Execution. +* To create a working OoO Superscalar CPU with Speculative Execution, trying to design it such that it doesn't have any Spectre-style vulnerabilities (mostly by doing a better job of tracking any state whatsoever that has speculative modifications). * To make substantial progress on creating a formal proof that the CPU doesn't have any Spectre-style vulnerabilities. This is complex enough and has enough potential problems we may run into that we may not finish by the end of this grant, and may need additional grants afterward to finish, hence why the goal is making substantial progress instead of having the proof 100% complete and working. To address this general category of CPU flaws, I (Jacob Lifshay) invented a way to formally prove that a CPU with speculative execution doesn't suffer from any speculative-execution data leaks: @@ -55,10 +55,10 @@ To address this general category of CPU flaws, I (Jacob Lifshay) invented a way 2. Take the concrete cycle-accurate CPU design, and construct a theoretical equivalent that has the additional feature of looking into the future to determine which instructions will be canceled and preemptively forces their results to be zeros (or any other value that doesn't depend on transient instructions' data; since we're doing math here, we don't have to limit ourselves to physically possible designs that respect time flow). The reasoning is that if all transient instructions' results are always zeros, they can't possibly be leaking any useful data, therefore the theoretical equivalent CPU design doesn't have any speculative-execution data leaks. 3. Formally prove that all digital signals from/to the concrete CPU core always exactly match the timing and data in all respects of the theoretical equivalent CPU design. This proves that the concrete CPU core can't have any speculative-execution data leaks. -It is expected that automated formal proof software (stuff like SMT solvers such as Z3) will probably be too slow to be usable for the formal correctness proof, therefore the plan is to use Fayalite to generate a translation of the CPU design to Coq or some other similar language for writing formal proofs. Fayalite is a HDL software library that we have been developing since around April 2024 that targets the FIRRTL intermediate language (used by the Chisel ecosystem for generating Verilog; firtool, the FIRRTL to Verilog translator, is part of LLVM Circt, so is well maintained). +It is expected that automated formal proof software (stuff like SMT solvers such as Z3) will probably be too slow to be usable for the formal correctness proof, therefore the plan is to use Fayalite to generate a translation of the CPU design to Coq or some other similar language for writing formal proofs. [Fayalite](https://git.libre-chip.org/libre-chip/fayalite) is a HDL library written in Rust that we have been developing since around April 2024 that targets the [FIRRTL](https://github.com/chipsalliance/firrtl-spec) intermediate language (used by the [Chisel](https://www.chisel-lang.org/) ecosystem for generating Verilog; firtool, the FIRRTL to Verilog translator, is part of [LLVM Circt](https://circt.llvm.org/), so is well maintained): ## Ecosystem -TODO +We plan on working with the OpenPower Foundation for any custom ISA extensions we may need. We are already working with the FIRRTL specification GitHub repo to resolve problems we encounter, as well as with LLVM Circt, and with the Rust Language. -TODO European dimension? benefits people everywhere by having CPUs with better security, including Europeans. +This project benefits Europeans (as well as everyone else) by providing a CPU with good performance that doesn't suffer from Spectre-style flaws, as well as working on providing a method of formally proving if any particular CPU design doesn't suffer from Spectre-style flaws, hopefully encouraging CPU designers everywhere to formally prove their CPU designs to not have Spectre-style flaws instead of the reactionary fixes that they have been using that only covers known Spectre-style flaws and has a constant stream of failures. From c60c00243ae145925e5f9236c82dea9ecf06f234 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Fri, 29 Nov 2024 14:48:38 -0800 Subject: [PATCH 05/29] add link to WIP high-level CPU design --- src/grants/nlnet-first.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/grants/nlnet-first.md b/src/grants/nlnet-first.md index b929dc2..780b505 100644 --- a/src/grants/nlnet-first.md +++ b/src/grants/nlnet-first.md @@ -55,7 +55,9 @@ To address this general category of CPU flaws, I (Jacob Lifshay) invented a way 2. Take the concrete cycle-accurate CPU design, and construct a theoretical equivalent that has the additional feature of looking into the future to determine which instructions will be canceled and preemptively forces their results to be zeros (or any other value that doesn't depend on transient instructions' data; since we're doing math here, we don't have to limit ourselves to physically possible designs that respect time flow). The reasoning is that if all transient instructions' results are always zeros, they can't possibly be leaking any useful data, therefore the theoretical equivalent CPU design doesn't have any speculative-execution data leaks. 3. Formally prove that all digital signals from/to the concrete CPU core always exactly match the timing and data in all respects of the theoretical equivalent CPU design. This proves that the concrete CPU core can't have any speculative-execution data leaks. -It is expected that automated formal proof software (stuff like SMT solvers such as Z3) will probably be too slow to be usable for the formal correctness proof, therefore the plan is to use Fayalite to generate a translation of the CPU design to Coq or some other similar language for writing formal proofs. [Fayalite](https://git.libre-chip.org/libre-chip/fayalite) is a HDL library written in Rust that we have been developing since around April 2024 that targets the [FIRRTL](https://github.com/chipsalliance/firrtl-spec) intermediate language (used by the [Chisel](https://www.chisel-lang.org/) ecosystem for generating Verilog; firtool, the FIRRTL to Verilog translator, is part of [LLVM Circt](https://circt.llvm.org/), so is well maintained): +It is expected that automated formal proof software (stuff like SMT solvers such as Z3) will probably be too slow to be usable for the formal correctness proof, therefore the plan is to use Fayalite to generate a translation of the CPU design to Coq or some other similar language for writing formal proofs. [Fayalite](https://git.libre-chip.org/libre-chip/fayalite) is a HDL library written in Rust that we have been developing since around April 2024 that targets the [FIRRTL](https://github.com/chipsalliance/firrtl-spec) intermediate language (used by the [Chisel](https://www.chisel-lang.org/) ecosystem for generating Verilog; firtool, the FIRRTL to Verilog translator, is part of [LLVM Circt](https://circt.llvm.org/), so is well maintained). + +A WIP high-level design of our CPU: ## Ecosystem From 8fbabfef40f0b46bfc810910b465bdb4c4d2a961 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Fri, 29 Nov 2024 15:34:34 -0800 Subject: [PATCH 06/29] add estimated budget --- src/grants/nlnet-first.md | 42 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 40 insertions(+), 2 deletions(-) diff --git a/src/grants/nlnet-first.md b/src/grants/nlnet-first.md index 780b505..8a39933 100644 --- a/src/grants/nlnet-first.md +++ b/src/grants/nlnet-first.md @@ -26,7 +26,45 @@ Requested Amount €50000 Nearly all of the cost is for labor, with some amount set aside for general project management (e.g. tracking funding, server maintenance, etc.), and for hardware costs such as buying bigger FPGAs so the CPU design will fit. -TODO, come up with estimated budgets. +We're aiming for a rate of somewhere around $69305.60/yr per person which is the minimum salary for small employers for 2025 in Washington State USA (where Jacob Lifshay lives). + +Estimated Budget: + +### General Project Management + +€2000 + +For tracking funding, server maintenance, and similar. + +### Hardware Costs + +€3000 + +For buying bigger FPGAs, CI server hardware improvements, Oscilloscopes (if necessary), and similar. + +### Improvements to Fayalite + +€10000 + +For adding the code for translation to Coq, adding tooling for generating FPGA bitstreams, as well as other improvements to Fayalite as necessary. + +### Building enough of the CPU to execute multiple instructions + +€10000 + +This covers getting register renaming and ALU and/or branch instructions working, as well as other portions of the CPU design as necessary. This may or may not include actually fetching those instructions from memory, instead the instructions may be supplied by a testing harness. + +### Implementing Memory Subsystem + +€15000 + +This covers the rest needed for the CPU to be able to fetch instructions from memory and execute them as well as implementing Load/Store instructions. + +### Work towards the Formal Proof of No Spectre bugs + +€10000 + +This covers working on the Formal Proof of No Spectre bugs as well as improvements to other software/hardware needed for that. ## Compare with existing/historical efforts @@ -46,7 +84,7 @@ Contract Shadow Logic works by assuming that memory is partitioned into secret a We expect to solve 2 technical challenges: -* To create a working OoO Superscalar CPU with Speculative Execution, trying to design it such that it doesn't have any Spectre-style vulnerabilities (mostly by doing a better job of tracking any state whatsoever that has speculative modifications). +* To create a working OoO Superscalar CPU with Speculative Execution, trying to design it such that it doesn't have any Spectre-style vulnerabilities (mostly by doing a better job of tracking any state whatsoever that has speculative modifications). The CPU probably will not yet support some of the features required for high performance (e.g. a complex cache hierarchy) and/or supporting running an operating system and/or floating-point. The plan is to add those features later if they are not completed as part of this grant. * To make substantial progress on creating a formal proof that the CPU doesn't have any Spectre-style vulnerabilities. This is complex enough and has enough potential problems we may run into that we may not finish by the end of this grant, and may need additional grants afterward to finish, hence why the goal is making substantial progress instead of having the proof 100% complete and working. To address this general category of CPU flaws, I (Jacob Lifshay) invented a way to formally prove that a CPU with speculative execution doesn't suffer from any speculative-execution data leaks: From 906907689edf32aade49c4f11e2e41c3e25e8480 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Fri, 29 Nov 2024 15:36:00 -0800 Subject: [PATCH 07/29] minor wording adjustment --- src/grants/nlnet-first.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/grants/nlnet-first.md b/src/grants/nlnet-first.md index 8a39933..9fa8ede 100644 --- a/src/grants/nlnet-first.md +++ b/src/grants/nlnet-first.md @@ -26,7 +26,7 @@ Requested Amount €50000 Nearly all of the cost is for labor, with some amount set aside for general project management (e.g. tracking funding, server maintenance, etc.), and for hardware costs such as buying bigger FPGAs so the CPU design will fit. -We're aiming for a rate of somewhere around $69305.60/yr per person which is the minimum salary for small employers for 2025 in Washington State USA (where Jacob Lifshay lives). +We're aiming for a rate of somewhere around $69305.60/yr per person which is the minimum salary for small employers for 2025 in Washington State USA (where Jacob Lifshay lives at the time). Estimated Budget: From 4af662dea89b38bd1af1316cf653897d098f52d3 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Fri, 29 Nov 2024 16:03:46 -0800 Subject: [PATCH 08/29] wording fixes --- src/grants/nlnet-first.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/grants/nlnet-first.md b/src/grants/nlnet-first.md index 9fa8ede..6a94b9f 100644 --- a/src/grants/nlnet-first.md +++ b/src/grants/nlnet-first.md @@ -6,9 +6,9 @@ Website/Wiki: # Abstract -Modern CPUs suffer from a [constant stream of new speculative-execution security flaws](https://en.wikipedia.org/wiki/Transient_execution_CPU_vulnerability#Timeline) ([Spectre](https://meltdownattack.com/)-style bugs), additionally software has to constantly add more convoluted and slow mitigations to properly enforce security boundaries, such as a WebAssembly VM needing to add code to prevent leaking data from outside the VM to whatever programs are running inside the VM, or a web browser needing to stop JavaScript code from seeing memory that it should not have access to. +Modern CPUs suffer from a [constant stream of new speculative-execution security flaws](https://en.wikipedia.org/wiki/Transient_execution_CPU_vulnerability#Timeline) ([Spectre](https://meltdownattack.com/)-style bugs). Because of that, software has to constantly add more convoluted and slow mitigations to properly enforce security boundaries, such as a WebAssembly VM needing to add code to prevent leaking data from outside the VM to whatever programs are running inside the VM, or a web browser needing to stop JavaScript code from seeing memory that it should not have access to. -To address this major category of CPU flaws, Jacob Lifshay invented a way to formally prove that a CPU with speculative execution doesn't suffer from any speculative-execution data leaks -- this should work for nearly any kind of CPU design. +To address this major category of CPU flaws, Jacob Lifshay formulated a way to formally prove that a CPU with speculative execution doesn't suffer from any speculative-execution data leaks -- this should work for nearly any kind of CPU design. We are working towards building a high-performance superscalar CPU with speculative execution and working on proving that it doesn't suffer from any speculative-execution data leaks, thereby demonstrating that this major category of CPU flaws can be fixed once and for all without crippling CPU performance. @@ -87,13 +87,13 @@ We expect to solve 2 technical challenges: * To create a working OoO Superscalar CPU with Speculative Execution, trying to design it such that it doesn't have any Spectre-style vulnerabilities (mostly by doing a better job of tracking any state whatsoever that has speculative modifications). The CPU probably will not yet support some of the features required for high performance (e.g. a complex cache hierarchy) and/or supporting running an operating system and/or floating-point. The plan is to add those features later if they are not completed as part of this grant. * To make substantial progress on creating a formal proof that the CPU doesn't have any Spectre-style vulnerabilities. This is complex enough and has enough potential problems we may run into that we may not finish by the end of this grant, and may need additional grants afterward to finish, hence why the goal is making substantial progress instead of having the proof 100% complete and working. -To address this general category of CPU flaws, I (Jacob Lifshay) invented a way to formally prove that a CPU with speculative execution doesn't suffer from any speculative-execution data leaks: +To address this general category of CPU flaws, Jacob Lifshay formulated a way to formally prove that a CPU with speculative execution doesn't suffer from any speculative-execution data leaks: 1. Assume that to leak data, you must be able to observe some change in the digital signals going from/to the CPU core that depends on the results (defined to include any side-effects that change any state, such as cache state changes or branch predictor changes) of some transient instructions (instructions that were speculatively executed and then canceled). 2. Take the concrete cycle-accurate CPU design, and construct a theoretical equivalent that has the additional feature of looking into the future to determine which instructions will be canceled and preemptively forces their results to be zeros (or any other value that doesn't depend on transient instructions' data; since we're doing math here, we don't have to limit ourselves to physically possible designs that respect time flow). The reasoning is that if all transient instructions' results are always zeros, they can't possibly be leaking any useful data, therefore the theoretical equivalent CPU design doesn't have any speculative-execution data leaks. 3. Formally prove that all digital signals from/to the concrete CPU core always exactly match the timing and data in all respects of the theoretical equivalent CPU design. This proves that the concrete CPU core can't have any speculative-execution data leaks. -It is expected that automated formal proof software (stuff like SMT solvers such as Z3) will probably be too slow to be usable for the formal correctness proof, therefore the plan is to use Fayalite to generate a translation of the CPU design to Coq or some other similar language for writing formal proofs. [Fayalite](https://git.libre-chip.org/libre-chip/fayalite) is a HDL library written in Rust that we have been developing since around April 2024 that targets the [FIRRTL](https://github.com/chipsalliance/firrtl-spec) intermediate language (used by the [Chisel](https://www.chisel-lang.org/) ecosystem for generating Verilog; firtool, the FIRRTL to Verilog translator, is part of [LLVM Circt](https://circt.llvm.org/), so is well maintained). +It is expected that automated formal proof software (like SMT solvers such as Z3) will probably be too slow to be usable for the formal correctness proof, therefore the plan is to use Fayalite to generate a translation of the CPU design to Coq or some other similar language for writing formal proofs. [Fayalite](https://git.libre-chip.org/libre-chip/fayalite) is a HDL library written in Rust that we have been developing since around April 2024 that targets the [FIRRTL](https://github.com/chipsalliance/firrtl-spec) intermediate language (used by the [Chisel](https://www.chisel-lang.org/) ecosystem for generating Verilog; firtool, the FIRRTL to Verilog translator, is part of [LLVM Circt](https://circt.llvm.org/), so is well maintained). A WIP high-level design of our CPU: From 80c7463550aa660f95598563dc151de10ee0fb1d Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Sat, 30 Nov 2024 16:23:03 -0800 Subject: [PATCH 09/29] add Cesar as requested in https://forum.libre-chip.org/t/starting-working-on-a-nlnet-grant-proposal/29/4 --- src/grants/nlnet-first.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/grants/nlnet-first.md b/src/grants/nlnet-first.md index 6a94b9f..cacfb59 100644 --- a/src/grants/nlnet-first.md +++ b/src/grants/nlnet-first.md @@ -16,6 +16,11 @@ We are working towards building a high-performance superscalar CPU with speculat * Jacob Lifshay -- Worked on designing PowerISA CPUs with Libre-SOC for 5yr, built a simple OoO Superscalar CPU simulator , built a RV32I CPU with VGA output in a few weeks that runs a 2.5D maze game . Also is the main author of the [Fayalite](https://git.libre-chip.org/libre-chip/fayalite) HDL library +* Cesar Strauss - Contributed to the Libre-SOC project, mostly on digital +design and formal verification. Presented the talk "An introduction to +Formal Verification of Digital Circuits" on FOSDEM 2024 +(). + * Others... TODO # Requested support From 364340fcaa64ca91a9fb5a4befcaba847fc08e7a Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Sat, 30 Nov 2024 20:37:02 -0800 Subject: [PATCH 10/29] submitted to NLNet, add NLNet's grant proposal ID --- src/SUMMARY.md | 2 +- src/grants/nlnet-first.md | 4 +--- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 8f2ce4d..60b3b40 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -2,7 +2,7 @@ * [Proposal for Libre-Chip's First CPU Architecture](first_arch/index.md) * [Register Renaming](first_arch/register_renaming.md) * [Grants](grants/index.md) - * [First NLNet Grant Proposal](grants/nlnet-first.md) + * [First NLNet Grant Proposal -- 2024-12-324](grants/nlnet-first.md) * [Conduct](Conduct.md) * [License](LICENSE.md) * [GPL 3.0](gpl-3.0.md) diff --git a/src/grants/nlnet-first.md b/src/grants/nlnet-first.md index cacfb59..556604a 100644 --- a/src/grants/nlnet-first.md +++ b/src/grants/nlnet-first.md @@ -1,4 +1,4 @@ -# First NLNet Grant Proposal -- First CPU Architecture And Formal Proof of No Spectre bugs +# First NLNet Grant Proposal -- First CPU Architecture And Formal Proof of No Spectre bugs -- 2024-12-324 Project Name: Libre-Chip's First CPU Architecture And Formal Proof of No Spectre bugs @@ -21,8 +21,6 @@ design and formal verification. Presented the talk "An introduction to Formal Verification of Digital Circuits" on FOSDEM 2024 (). -* Others... TODO - # Requested support Requested Amount €50000 From f62d5b01ea73a81dbe988a00267b2331dfe30243 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Wed, 15 Jan 2025 20:33:54 -0800 Subject: [PATCH 11/29] add link to https://forum.libre-chip.org/t/idea-for-maybe-allowing-much-wider-fetch-rename-etc-widths-in-a-cpu/33 --- src/first_arch/index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/src/first_arch/index.md b/src/first_arch/index.md index 3501a70..9ca3f99 100644 --- a/src/first_arch/index.md +++ b/src/first_arch/index.md @@ -11,6 +11,7 @@ Possible follow-on work, order TBD: * Expand CPU to support more operations (e.g. floating-point, vector ops., paging) * Add FPGA-style programmable decoder and μOp Cache, with trap-to-software fallback for uncommon/complex ops, which allows running more ISAs such as RISC-V, older x86-32/64, more PowerISA support, and then if we can figure out the legal situation, ARM and modern x86-64. The idea is QEMU (or similar) will be compiled with a special compiler that generates both the fallback software and the bitstream for the decoder, the compiled output will be covered by QEMU's license which has some patent clauses, which hopefully helps with the legal situation. Jacob Lifshay came up with this idea [back in 2022](https://web.archive.org/web/20220612082603/https://bugs.libre-soc.org/show_bug.cgi?id=841) * Formal proof that the CPU doesn't have any spectre-style bugs even though it still is OoO superscalar with speculative execution. Jacob Lifshay came up with this idea [back in 2020](https://web.archive.org/web/20201021124234/https://bugs.libre-soc.org/show_bug.cgi?id=209) +* [Idea for much wider fetch/decode/rename width based on parallel-prefix-sum and taking more than 1 cycle per batch of renamed instructions](https://forum.libre-chip.org/t/idea-for-maybe-allowing-much-wider-fetch-rename-etc-widths-in-a-cpu/33) ## Register Renaming & Internal Registers From 9407bdf273f3dfe14a7d10d3f11f4c21fb746973 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Tue, 24 Jun 2025 15:49:35 -0700 Subject: [PATCH 12/29] switch chat links to zulip since it seems as though we don't control who has mod privileges on irc, probably because we all have matrix accounts and matrix can't automatically authenticate with oftc yet. Cesar privately gave his permission in the meeting today. --- src/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/README.md b/src/README.md index 98e23ea..4f92a06 100644 --- a/src/README.md +++ b/src/README.md @@ -7,7 +7,7 @@ More content is coming soon! * Sign up for Announcements by joining us on [our Forum](https://forum.libre-chip.org/) * Join our discussions on [our Forum][forum-general] and enable notifications for [General][forum-general] (click the bell icon). * Join us on [our Git instance](https://git.libre-chip.org/) -* Chat with us at [#libre-chip:matrix.org](https://matrix.to/#/#libre-chip:matrix.org) or at #libre-chip on [OFTC IRC](https://www.oftc.net/) +* Chat with us on [Zulip](https://libre-chip.zulipchat.com/#narrow/channel/486103-general) * If you're interested in supporting us, sign up for announcements and we'll announce when we're ready! * We have a [Code of Conduct](Conduct.md) From b87072562e572c5a0d9174e28286f9d0038de6df Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Tue, 24 Jun 2025 15:56:03 -0700 Subject: [PATCH 13/29] use libre-chip mirror for CI actions --- .forgejo/workflows/deploy.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.forgejo/workflows/deploy.yml b/.forgejo/workflows/deploy.yml index c61df54..ded530e 100644 --- a/.forgejo/workflows/deploy.yml +++ b/.forgejo/workflows/deploy.yml @@ -10,7 +10,7 @@ jobs: permissions: contents: write steps: - - uses: https://code.forgejo.org/actions/checkout@v3 + - uses: https://git.libre-chip.org/mirrors/checkout@v3 with: fetch-depth: 0 - name: Install dependencies From 1267a00d7f6ff30ca6b4e006e9bee880a529cab2 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Wed, 13 Aug 2025 22:25:16 -0700 Subject: [PATCH 14/29] note grant was approved by NLnet --- src/grants/nlnet-first.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/grants/nlnet-first.md b/src/grants/nlnet-first.md index 556604a..404f0e6 100644 --- a/src/grants/nlnet-first.md +++ b/src/grants/nlnet-first.md @@ -1,5 +1,7 @@ # First NLNet Grant Proposal -- First CPU Architecture And Formal Proof of No Spectre bugs -- 2024-12-324 +(Edit: [approved by NLnet](https://nlnet.nl/project/Libre-Chip-proof/)) + Project Name: Libre-Chip's First CPU Architecture And Formal Proof of No Spectre bugs Website/Wiki: From 67f6c5c6f169cbd175258be557ea91c11f5ac0db Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Wed, 13 Aug 2025 22:25:39 -0700 Subject: [PATCH 15/29] group tasks for first_arch based off what is actually in the grant --- src/first_arch/index.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/first_arch/index.md b/src/first_arch/index.md index 9ca3f99..d67ac0b 100644 --- a/src/first_arch/index.md +++ b/src/first_arch/index.md @@ -4,13 +4,14 @@ The CPU architecture will be developed in several stages: -1. Getting an initial working CPU -- First NLNet/etc. Grant +1. [First NLNet Grant Proposal -- First CPU Architecture And Formal Proof of No Spectre bugs -- 2024-12-324](../first_arch/index.md) + 1. Getting an initial working CPU + 2. Formal proof that the CPU doesn't have any spectre-style bugs even though it still is OoO superscalar with speculative execution. Jacob Lifshay came up with this idea [back in 2020](https://web.archive.org/web/20201021124234/https://bugs.libre-soc.org/show_bug.cgi?id=209) Possible follow-on work, order TBD: * Expand CPU to support more operations (e.g. floating-point, vector ops., paging) * Add FPGA-style programmable decoder and μOp Cache, with trap-to-software fallback for uncommon/complex ops, which allows running more ISAs such as RISC-V, older x86-32/64, more PowerISA support, and then if we can figure out the legal situation, ARM and modern x86-64. The idea is QEMU (or similar) will be compiled with a special compiler that generates both the fallback software and the bitstream for the decoder, the compiled output will be covered by QEMU's license which has some patent clauses, which hopefully helps with the legal situation. Jacob Lifshay came up with this idea [back in 2022](https://web.archive.org/web/20220612082603/https://bugs.libre-soc.org/show_bug.cgi?id=841) -* Formal proof that the CPU doesn't have any spectre-style bugs even though it still is OoO superscalar with speculative execution. Jacob Lifshay came up with this idea [back in 2020](https://web.archive.org/web/20201021124234/https://bugs.libre-soc.org/show_bug.cgi?id=209) * [Idea for much wider fetch/decode/rename width based on parallel-prefix-sum and taking more than 1 cycle per batch of renamed instructions](https://forum.libre-chip.org/t/idea-for-maybe-allowing-much-wider-fetch-rename-etc-widths-in-a-cpu/33) ## Register Renaming & Internal Registers From 1ee95335c0cb5ade054f9ae544a6fc08879e36e1 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Thu, 9 Oct 2025 23:33:07 -0700 Subject: [PATCH 16/29] switch to use server's new actions repo --- .forgejo/workflows/deploy.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.forgejo/workflows/deploy.yml b/.forgejo/workflows/deploy.yml index ded530e..1037826 100644 --- a/.forgejo/workflows/deploy.yml +++ b/.forgejo/workflows/deploy.yml @@ -10,7 +10,7 @@ jobs: permissions: contents: write steps: - - uses: https://git.libre-chip.org/mirrors/checkout@v3 + - uses: actions/checkout@v3 with: fetch-depth: 0 - name: Install dependencies From ec7930739a8aa261aefdb12da0883a0e10eca705 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Tue, 4 Nov 2025 02:16:33 -0800 Subject: [PATCH 17/29] fix broken link to first nlnet grant --- src/first_arch/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/first_arch/index.md b/src/first_arch/index.md index d67ac0b..8dbf9f8 100644 --- a/src/first_arch/index.md +++ b/src/first_arch/index.md @@ -4,7 +4,7 @@ The CPU architecture will be developed in several stages: -1. [First NLNet Grant Proposal -- First CPU Architecture And Formal Proof of No Spectre bugs -- 2024-12-324](../first_arch/index.md) +1. [First NLNet Grant Proposal -- First CPU Architecture And Formal Proof of No Spectre bugs -- 2024-12-324](../grants/nlnet-first.md) 1. Getting an initial working CPU 2. Formal proof that the CPU doesn't have any spectre-style bugs even though it still is OoO superscalar with speculative execution. Jacob Lifshay came up with this idea [back in 2020](https://web.archive.org/web/20201021124234/https://bugs.libre-soc.org/show_bug.cgi?id=209) From 1a6f8b5714c3bfa63f2b751700a4400b2e8d1115 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Fri, 28 Nov 2025 14:13:38 -0800 Subject: [PATCH 18/29] add WIP grant proposal: Libre-Chip's CPU with a Programmable Decoder to Run Multiple ISAs at Full Speed --- src/grants/cpu_with_programmable_decoder.md | 68 +++++++++++++++++++++ 1 file changed, 68 insertions(+) create mode 100644 src/grants/cpu_with_programmable_decoder.md diff --git a/src/grants/cpu_with_programmable_decoder.md b/src/grants/cpu_with_programmable_decoder.md new file mode 100644 index 0000000..040a25f --- /dev/null +++ b/src/grants/cpu_with_programmable_decoder.md @@ -0,0 +1,68 @@ +# NLNet Grant Proposal -- Libre-Chip's CPU with a Programmable Decoder to Run Multiple ISAs at Full Speed + +Project Name: Libre-Chip's CPU with a Programmable Decoder to Run Multiple ISAs at Full Speed + +Website/Wiki: + +# Abstract + +Modern computers are built on several different mutually-incompatible popular ISAs such as x86_64, PowerISA, AArch64, and RISC-V. Many of the most popular ISAs have no high-speed libre/open-source implementations, which makes them much harder to trust since you can't inspect their source-code to look for bugs or secret backdoors. Additionally, there are basically no existing modern CPUs which can run more than one of those ISAs without requiring software emulation, which is slow and can often be buggy. + +To solve those issues, we are building a libre-licensed CPU with speculative out-of-order superscalar execution that will support a programmable decoder (loosely inspired by FPGAs) followed by a µOp cache so the CPU can be programmed to decode and run just about any ISA you like at full speed, by handling the most common instructions entirely in hardware, with software fallback for decoding some of the less common instructions which can still easily be executed in hardware (storing the decoded instructions in the µOp cache), and full software emulation for the remaining instructions. + +We are planning on building a custom compiler so the user can select an ISA and that compiler can compile the source code of QEMU to automatically generate the required bitstream for the FPGA, as well as generating the software required for decoding and/or emulating the remaining parts of the chosen ISA. + +Our CPU is a continuation of the CPU from , which we will also be extending to support more features such as memory paging, floating-point instructions, and better compatibility with the PowerISA specification. + +## Relevant Previous Involvement + +* Jacob Lifshay -- Currently working on . Worked on designing PowerISA CPUs with Libre-SOC for 5yr, built a simple OoO Superscalar CPU simulator , built a RV32I CPU with VGA output in a few weeks that runs a 2.5D maze game . Also is the main author of the [Fayalite](https://git.libre-chip.org/libre-chip/fayalite) HDL library. Has some experience writing compilers, e.g. Fayalite's simulator is implemented as a simple compiler to an internal p-code, also wrote a compiler from a language based on QuickBASIC to x86 (it successfully compiled a program he also wrote that animates a 3D diagram of a molecule), also wrote a JavaScript interpreter that properly handles running generators and is a compiler to internal p-code and has some optimization passes for specialization based on deduced types. Has experience writing and using LLVM IR, as well as being a contributor to LLVM and Rust. + +* Cesar Strauss - Currently working on . Contributed to the Libre-SOC project, mostly on digital +design and formal verification. Presented the talk "An introduction to +Formal Verification of Digital Circuits" on FOSDEM 2024 +(). + +* TODO: does Tobias Platen want to participate? + +# Requested support + +Requested Amount €100000 + +## Cost Explanation + +Libre-Chip is currently funded by NLnet for , which we expect to be mostly completed by the time we can start working on this grant. + +We are requesting more than €50000 based on Jacob Lifshay having previously completed a grant as part of Libre-SoC. If that is not sufficient, we can adjust our budget down by removing some tasks, and possibly by not implementing as much of the custom compiler, leaving some of it as future work. + +All of the cost is for labor. + +We're aiming for a rate of $69305.60/yr per person which is rate used in our grant application for + +Estimated Budget: + +* € 40000 Adding missing features to our CPU, such as memory paging, floating-point instructions, a better cache hierarchy, and better compatibility with the PowerISA specification. +* € 20000 Add the programmable decoder and µOp cache to our CPU design. +* € 20000 Build a compiler that can extract the decoder portion of QEMU using pattern matching and some symbolic execution of LLVM IR, converting to a HDL IR more suitable for hardware. +* € 15000 Write code to convert the HDL IR to a bitstream we can program into the decoder. +* € 10000 Get the fallback emulator to work, as well as misc. other parts of the compiler needed to make the whole system work together. + +## Compare with existing/historical efforts + +The [Transmeta Crusoe](https://en.wikipedia.org/wiki/Transmeta_Crusoe) is somewhat similar in that it implemented x86 by translating to an internal VLIW instruction set by using software JIT compilation, however our grant proposal differs in that we have hardware to handle the most common instructions instead of relying on software for everything, also we aim for compatibility with much more than just x86, unlike the Transmeta Crusoe. + +## Technical challenges we expect to solve during the project + +We expect to solve 3 technical challenges: + +* To design and write a working programmable decoder and µOp cache such that our CPU can run arbitrary ISAs. For now we're planning on it being able to support older x86_64 (where the patents have expired), PowerISA, and RISC-V, though we don't necessarily expect to have complete support for all of those within the scope of this grant. +* To build a custom compiler that can successfully extract a decoder for the most common instructions of the user's selected ISA, as well as generate a bitstream to program our decoder. +* To continue work on getting our CPU to be more complete and able to run more complex software. + +A WIP high-level design of our CPU: + +## Ecosystem + +We are likely to work with QEMU upstream, as well as LLVM and Clang. We are already working with the FIRRTL specification GitHub repo to resolve problems we encounter, as well as with LLVM Circt, and with the Rust Language. + +This project benefits Europeans (as well as everyone else) by providing a libre/open-source CPU with good performance that supports many of the most-popular ISAs all on the same CPU. From ea3967903428a0feddbfe696f80747feede6b29a Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Fri, 28 Nov 2025 14:47:26 -0800 Subject: [PATCH 19/29] update wording --- src/grants/cpu_with_programmable_decoder.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/grants/cpu_with_programmable_decoder.md b/src/grants/cpu_with_programmable_decoder.md index 040a25f..868bd1b 100644 --- a/src/grants/cpu_with_programmable_decoder.md +++ b/src/grants/cpu_with_programmable_decoder.md @@ -8,7 +8,7 @@ Website/Wiki: Modern computers are built on several different mutually-incompatible popular ISAs such as x86_64, PowerISA, AArch64, and RISC-V. Many of the most popular ISAs have no high-speed libre/open-source implementations, which makes them much harder to trust since you can't inspect their source-code to look for bugs or secret backdoors. Additionally, there are basically no existing modern CPUs which can run more than one of those ISAs without requiring software emulation, which is slow and can often be buggy. -To solve those issues, we are building a libre-licensed CPU with speculative out-of-order superscalar execution that will support a programmable decoder (loosely inspired by FPGAs) followed by a µOp cache so the CPU can be programmed to decode and run just about any ISA you like at full speed, by handling the most common instructions entirely in hardware, with software fallback for decoding some of the less common instructions which can still easily be executed in hardware (storing the decoded instructions in the µOp cache), and full software emulation for the remaining instructions. +To solve those issues, we are building a libre-licensed CPU with speculative out-of-order superscalar execution that will support a programmable decoder (loosely inspired by FPGAs) followed by a µOp cache so the CPU can be programmed to decode and run at full speed just about any ISA you select, by handling the most common instructions entirely in hardware, with software fallback for decoding some of the less common instructions which can still easily be executed in hardware (storing the decoded instructions in the µOp cache), and full software emulation for the remaining instructions. Additionally you can easily switch between selected ISAs by reprogramming the decoder, allowing you to e.g. run an x86_64 program alongside a RISC-V program in the same OS and quickly context-switch between them. We are planning on building a custom compiler so the user can select an ISA and that compiler can compile the source code of QEMU to automatically generate the required bitstream for the FPGA, as well as generating the software required for decoding and/or emulating the remaining parts of the chosen ISA. From 7331330d68f0619151d550d0d255aa24da224943 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Fri, 28 Nov 2025 15:02:14 -0800 Subject: [PATCH 20/29] more wording adjustments --- src/grants/cpu_with_programmable_decoder.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/grants/cpu_with_programmable_decoder.md b/src/grants/cpu_with_programmable_decoder.md index 868bd1b..4c0a339 100644 --- a/src/grants/cpu_with_programmable_decoder.md +++ b/src/grants/cpu_with_programmable_decoder.md @@ -33,9 +33,7 @@ Requested Amount €100000 Libre-Chip is currently funded by NLnet for , which we expect to be mostly completed by the time we can start working on this grant. -We are requesting more than €50000 based on Jacob Lifshay having previously completed a grant as part of Libre-SoC. If that is not sufficient, we can adjust our budget down by removing some tasks, and possibly by not implementing as much of the custom compiler, leaving some of it as future work. - -All of the cost is for labor. +We are requesting more than €50000 based on Jacob Lifshay having previously completed a grant as part of Libre-SoC. If that is not allowed, we can adjust our budget down by removing some tasks, and possibly by not implementing as much of the custom compiler, leaving some of it as future work. We're aiming for a rate of $69305.60/yr per person which is rate used in our grant application for From 0b279f1f0fc627946895ee84d45a477080296228 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Fri, 28 Nov 2025 15:03:20 -0800 Subject: [PATCH 21/29] wording --- src/grants/cpu_with_programmable_decoder.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/grants/cpu_with_programmable_decoder.md b/src/grants/cpu_with_programmable_decoder.md index 4c0a339..32e1add 100644 --- a/src/grants/cpu_with_programmable_decoder.md +++ b/src/grants/cpu_with_programmable_decoder.md @@ -35,7 +35,7 @@ Libre-Chip is currently funded by NLnet for +We're aiming for a FTE rate of $69305.60/yr per person which is rate used in our grant application for Estimated Budget: From 29cc01dfddf2cb4a905590d25f5a901cfb18763c Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Fri, 28 Nov 2025 15:06:58 -0800 Subject: [PATCH 22/29] add programmable decoder grant to summary.md --- src/SUMMARY.md | 1 + 1 file changed, 1 insertion(+) diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 60b3b40..311ac28 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -3,6 +3,7 @@ * [Register Renaming](first_arch/register_renaming.md) * [Grants](grants/index.md) * [First NLNet Grant Proposal -- 2024-12-324](grants/nlnet-first.md) + * [NLNet Grant Proposal: Programmable Decoder](grants/cpu_with_programmable_decoder.md) * [Conduct](Conduct.md) * [License](LICENSE.md) * [GPL 3.0](gpl-3.0.md) From 259afa1fa9a74a0465e4b89ef791ce3291aef30f Mon Sep 17 00:00:00 2001 From: Tobias Alexandra Platen Date: Sun, 30 Nov 2025 11:24:17 +0100 Subject: [PATCH 23/29] add my paragraph --- src/grants/cpu_with_programmable_decoder.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/grants/cpu_with_programmable_decoder.md b/src/grants/cpu_with_programmable_decoder.md index 32e1add..5051abf 100644 --- a/src/grants/cpu_with_programmable_decoder.md +++ b/src/grants/cpu_with_programmable_decoder.md @@ -23,7 +23,10 @@ design and formal verification. Presented the talk "An introduction to Formal Verification of Digital Circuits" on FOSDEM 2024 (). -* TODO: does Tobias Platen want to participate? +* Tobias Platen - Currently working on . Contributed to the Libre-SOC project, mostly on +ECP5 FPGA prototypes and DDR SDRAM memory interfaces. Presented the talk "Using the ECP5 +for Libre-SOC prototyping" on FOSDEM 2024 (). + # Requested support From 2f742e7606a4f5fdfdd86aea20d6f9ebd81656af Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Sun, 30 Nov 2025 02:47:02 -0800 Subject: [PATCH 24/29] clean up whitespace --- src/grants/cpu_with_programmable_decoder.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/grants/cpu_with_programmable_decoder.md b/src/grants/cpu_with_programmable_decoder.md index 5051abf..7d4bf22 100644 --- a/src/grants/cpu_with_programmable_decoder.md +++ b/src/grants/cpu_with_programmable_decoder.md @@ -23,11 +23,10 @@ design and formal verification. Presented the talk "An introduction to Formal Verification of Digital Circuits" on FOSDEM 2024 (). -* Tobias Platen - Currently working on . Contributed to the Libre-SOC project, mostly on -ECP5 FPGA prototypes and DDR SDRAM memory interfaces. Presented the talk "Using the ECP5 +* Tobias Platen - Currently working on . Contributed to the Libre-SOC project, mostly on +ECP5 FPGA prototypes and DDR SDRAM memory interfaces. Presented the talk "Using the ECP5 for Libre-SOC prototyping" on FOSDEM 2024 (). - # Requested support Requested Amount €100000 From 8bf2a914fa26e749744375fb1db3e51734a74561 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Sun, 30 Nov 2025 11:52:22 -0800 Subject: [PATCH 25/29] apply wording change suggested by cesar in https://git.libre-chip.org/libre-chip/website/pulls/1#issuecomment-380 --- src/grants/cpu_with_programmable_decoder.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/grants/cpu_with_programmable_decoder.md b/src/grants/cpu_with_programmable_decoder.md index 7d4bf22..28ffddc 100644 --- a/src/grants/cpu_with_programmable_decoder.md +++ b/src/grants/cpu_with_programmable_decoder.md @@ -45,7 +45,7 @@ Estimated Budget: * € 20000 Add the programmable decoder and µOp cache to our CPU design. * € 20000 Build a compiler that can extract the decoder portion of QEMU using pattern matching and some symbolic execution of LLVM IR, converting to a HDL IR more suitable for hardware. * € 15000 Write code to convert the HDL IR to a bitstream we can program into the decoder. -* € 10000 Get the fallback emulator to work, as well as misc. other parts of the compiler needed to make the whole system work together. +* € 10000 Get the fallback software decoder and the software instruction emulator to work, as well as misc. other parts of the compiler needed to make the whole system work together. ## Compare with existing/historical efforts From df1721f487e0f8cd6b37e1eb9cc08898e80d0beb Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Sun, 30 Nov 2025 11:58:03 -0800 Subject: [PATCH 26/29] adjust requested amount to actually be the sum of the tasks' amounts --- src/grants/cpu_with_programmable_decoder.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/grants/cpu_with_programmable_decoder.md b/src/grants/cpu_with_programmable_decoder.md index 28ffddc..e9dbe86 100644 --- a/src/grants/cpu_with_programmable_decoder.md +++ b/src/grants/cpu_with_programmable_decoder.md @@ -29,7 +29,7 @@ for Libre-SOC prototyping" on FOSDEM 2024 ( Date: Sun, 30 Nov 2025 19:01:25 -0800 Subject: [PATCH 27/29] replace xml entities with unicode characters --- src/grants/cpu_with_programmable_decoder.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/src/grants/cpu_with_programmable_decoder.md b/src/grants/cpu_with_programmable_decoder.md index e9dbe86..7a9ac01 100644 --- a/src/grants/cpu_with_programmable_decoder.md +++ b/src/grants/cpu_with_programmable_decoder.md @@ -8,7 +8,7 @@ Website/Wiki: Modern computers are built on several different mutually-incompatible popular ISAs such as x86_64, PowerISA, AArch64, and RISC-V. Many of the most popular ISAs have no high-speed libre/open-source implementations, which makes them much harder to trust since you can't inspect their source-code to look for bugs or secret backdoors. Additionally, there are basically no existing modern CPUs which can run more than one of those ISAs without requiring software emulation, which is slow and can often be buggy. -To solve those issues, we are building a libre-licensed CPU with speculative out-of-order superscalar execution that will support a programmable decoder (loosely inspired by FPGAs) followed by a µOp cache so the CPU can be programmed to decode and run at full speed just about any ISA you select, by handling the most common instructions entirely in hardware, with software fallback for decoding some of the less common instructions which can still easily be executed in hardware (storing the decoded instructions in the µOp cache), and full software emulation for the remaining instructions. Additionally you can easily switch between selected ISAs by reprogramming the decoder, allowing you to e.g. run an x86_64 program alongside a RISC-V program in the same OS and quickly context-switch between them. +To solve those issues, we are building a libre-licensed CPU with speculative out-of-order superscalar execution that will support a programmable decoder (loosely inspired by FPGAs) followed by a µOp cache so the CPU can be programmed to decode and run at full speed just about any ISA you select, by handling the most common instructions entirely in hardware, with software fallback for decoding some of the less common instructions which can still easily be executed in hardware (storing the decoded instructions in the µOp cache), and full software emulation for the remaining instructions. Additionally you can easily switch between selected ISAs by reprogramming the decoder, allowing you to e.g. run an x86_64 program alongside a RISC-V program in the same OS and quickly context-switch between them. We are planning on building a custom compiler so the user can select an ISA and that compiler can compile the source code of QEMU to automatically generate the required bitstream for the FPGA, as well as generating the software required for decoding and/or emulating the remaining parts of the chosen ISA. @@ -29,23 +29,23 @@ for Libre-SOC prototyping" on FOSDEM 2024 (, which we expect to be mostly completed by the time we can start working on this grant. -We are requesting more than €50000 based on Jacob Lifshay having previously completed a grant as part of Libre-SoC. If that is not allowed, we can adjust our budget down by removing some tasks, and possibly by not implementing as much of the custom compiler, leaving some of it as future work. +We are requesting more than €50000 based on Jacob Lifshay having previously completed a grant as part of Libre-SoC. If that is not allowed, we can adjust our budget down by removing some tasks, and possibly by not implementing as much of the custom compiler, leaving some of it as future work. We're aiming for a FTE rate of $69305.60/yr per person which is rate used in our grant application for Estimated Budget: -* € 40000 Adding missing features to our CPU, such as memory paging, floating-point instructions, a better cache hierarchy, and better compatibility with the PowerISA specification. -* € 20000 Add the programmable decoder and µOp cache to our CPU design. -* € 20000 Build a compiler that can extract the decoder portion of QEMU using pattern matching and some symbolic execution of LLVM IR, converting to a HDL IR more suitable for hardware. -* € 15000 Write code to convert the HDL IR to a bitstream we can program into the decoder. -* € 10000 Get the fallback software decoder and the software instruction emulator to work, as well as misc. other parts of the compiler needed to make the whole system work together. +* € 40000 Adding missing features to our CPU, such as memory paging, floating-point instructions, a better cache hierarchy, and better compatibility with the PowerISA specification. +* € 20000 Add the programmable decoder and µOp cache to our CPU design. +* € 20000 Build a compiler that can extract the decoder portion of QEMU using pattern matching and some symbolic execution of LLVM IR, converting to a HDL IR more suitable for hardware. +* € 15000 Write code to convert the HDL IR to a bitstream we can program into the decoder. +* € 10000 Get the fallback software decoder and the software instruction emulator to work, as well as misc. other parts of the compiler needed to make the whole system work together. ## Compare with existing/historical efforts @@ -55,7 +55,7 @@ The [Transmeta Crusoe](https://en.wikipedia.org/wiki/Transmeta_Crusoe) is somewh We expect to solve 3 technical challenges: -* To design and write a working programmable decoder and µOp cache such that our CPU can run arbitrary ISAs. For now we're planning on it being able to support older x86_64 (where the patents have expired), PowerISA, and RISC-V, though we don't necessarily expect to have complete support for all of those within the scope of this grant. +* To design and write a working programmable decoder and µOp cache such that our CPU can run arbitrary ISAs. For now we're planning on it being able to support older x86_64 (where the patents have expired), PowerISA, and RISC-V, though we don't necessarily expect to have complete support for all of those within the scope of this grant. * To build a custom compiler that can successfully extract a decoder for the most common instructions of the user's selected ISA, as well as generate a bitstream to program our decoder. * To continue work on getting our CPU to be more complete and able to run more complex software. From 991b15c3209ecfa1e051a483718bdeb6fec0cca8 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Sun, 30 Nov 2025 19:06:05 -0800 Subject: [PATCH 28/29] move text to fit within length constraint --- src/grants/cpu_with_programmable_decoder.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/src/grants/cpu_with_programmable_decoder.md b/src/grants/cpu_with_programmable_decoder.md index 7a9ac01..bf89baa 100644 --- a/src/grants/cpu_with_programmable_decoder.md +++ b/src/grants/cpu_with_programmable_decoder.md @@ -8,11 +8,7 @@ Website/Wiki: Modern computers are built on several different mutually-incompatible popular ISAs such as x86_64, PowerISA, AArch64, and RISC-V. Many of the most popular ISAs have no high-speed libre/open-source implementations, which makes them much harder to trust since you can't inspect their source-code to look for bugs or secret backdoors. Additionally, there are basically no existing modern CPUs which can run more than one of those ISAs without requiring software emulation, which is slow and can often be buggy. -To solve those issues, we are building a libre-licensed CPU with speculative out-of-order superscalar execution that will support a programmable decoder (loosely inspired by FPGAs) followed by a µOp cache so the CPU can be programmed to decode and run at full speed just about any ISA you select, by handling the most common instructions entirely in hardware, with software fallback for decoding some of the less common instructions which can still easily be executed in hardware (storing the decoded instructions in the µOp cache), and full software emulation for the remaining instructions. Additionally you can easily switch between selected ISAs by reprogramming the decoder, allowing you to e.g. run an x86_64 program alongside a RISC-V program in the same OS and quickly context-switch between them. - -We are planning on building a custom compiler so the user can select an ISA and that compiler can compile the source code of QEMU to automatically generate the required bitstream for the FPGA, as well as generating the software required for decoding and/or emulating the remaining parts of the chosen ISA. - -Our CPU is a continuation of the CPU from , which we will also be extending to support more features such as memory paging, floating-point instructions, and better compatibility with the PowerISA specification. +To solve those issues, we are building a libre-licensed CPU with speculative out-of-order superscalar execution that will support a programmable decoder (loosely inspired by FPGAs) followed by a µOp cache so the CPU can be programmed to decode and run at full speed just about any ISA you select, by handling the most common instructions entirely in hardware, with software fallback for decoding some of the less common instructions which can still easily be executed in hardware (storing the decoded instructions in the µOp cache), and full software emulation for the remaining instructions. ## Relevant Previous Involvement @@ -61,6 +57,13 @@ We expect to solve 3 technical challenges: A WIP high-level design of our CPU: +Text that was intended to be part of the Abstract, but didn't fit within the 1200 character length constraint: +Additionally you can easily switch between selected ISAs by reprogramming the decoder, allowing you to e.g. run an x86_64 program alongside a RISC-V program in the same OS and quickly context-switch between them. + +We are planning on building a custom compiler so the user can select an ISA and that compiler can compile the source code of QEMU to automatically generate the required bitstream for the FPGA, as well as generating the software required for decoding and/or emulating the remaining parts of the chosen ISA. + +Our CPU is a continuation of the CPU from , which we will also be extending to support more features such as memory paging, floating-point instructions, and better compatibility with the PowerISA specification. + ## Ecosystem We are likely to work with QEMU upstream, as well as LLVM and Clang. We are already working with the FIRRTL specification GitHub repo to resolve problems we encounter, as well as with LLVM Circt, and with the Rust Language. From 2543b1ce72de2085abda7111ab25473d8924e834 Mon Sep 17 00:00:00 2001 From: Jacob Lifshay Date: Sun, 30 Nov 2025 19:10:03 -0800 Subject: [PATCH 29/29] add NLnet project code --- src/SUMMARY.md | 2 +- src/grants/cpu_with_programmable_decoder.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 311ac28..6a4250d 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -3,7 +3,7 @@ * [Register Renaming](first_arch/register_renaming.md) * [Grants](grants/index.md) * [First NLNet Grant Proposal -- 2024-12-324](grants/nlnet-first.md) - * [NLNet Grant Proposal: Programmable Decoder](grants/cpu_with_programmable_decoder.md) + * [NLNet Grant Proposal: Programmable Decoder -- 2025-12-681](grants/cpu_with_programmable_decoder.md) * [Conduct](Conduct.md) * [License](LICENSE.md) * [GPL 3.0](gpl-3.0.md) diff --git a/src/grants/cpu_with_programmable_decoder.md b/src/grants/cpu_with_programmable_decoder.md index bf89baa..d622560 100644 --- a/src/grants/cpu_with_programmable_decoder.md +++ b/src/grants/cpu_with_programmable_decoder.md @@ -1,4 +1,4 @@ -# NLNet Grant Proposal -- Libre-Chip's CPU with a Programmable Decoder to Run Multiple ISAs at Full Speed +# NLNet Grant Proposal -- Libre-Chip's CPU with a Programmable Decoder to Run Multiple ISAs at Full Speed -- 2025-12-681 Project Name: Libre-Chip's CPU with a Programmable Decoder to Run Multiple ISAs at Full Speed