ETHDam(n)
During ETHDam's Quadratic Funding round, run on clr.fund, we discovered a critical bug in MACI. The issue stemmed from the lack of validation on MACI public keys within the Poll contract. A user (spoiler alert, it was a self-inflicted denial of service (DoS)) was able to submit a MACI public key which was not a point of the Baby JubJub elliptic curve, and it broke everything.
So... what happened, really?
Well, during the ETHDam round, there was an issue with the subgraph that caused the frontend web app to incorrectly display that the voting period had ended, which prevented users from voting.
A quick way for us to identify the issue was to directly call the Poll's smart contract to submit an invalid vote. If the contract accepted votes, that meant that it was not a MACI issue. Given the contract accepted votes, it immediately helped us confirm that the bug was a frontend issue. What we didn't know at the time was that this vote, as well as its associated key, would cause a denial of service.
The message in question can be seen on Gnosis Chain's block explorer. Below is proof that the wallet which submitted the invalid key is from one of our teammates (just in case you thought we'd want to censor it :P).
{
  "address": "0xc59975735ed4774b3Ee8479D0b5A26388B929a34",
  "msg": "This is proof that I control this wallet and dossed MACI by mistake while doing an invalid vote",
  "sig": "0x8962b66462630f12476d7bdb348f08af574ba40dd32c6f149ea26717830f13f50f4e95574e8fc909dd3dd1e20bcd85ae2c3caaf41bed6b123973353635483b7f1b",
  "version": "2"
}
One can verify the message using etherscan, paste the address, message and signature. Don't take our word for it, verify!
The message was sent alongside an invalid MACI public key. As a result, due to how MACI messages are processed, the zk-SNARK circuit failed to generate a proof, which prevented the QF round from finalizing.
A MACI public key is (supposedly, at least) a point on the Baby JubJub elliptic curve. The mistake that MACI's code made was to not validate at the smart contract level that the keys accepted as arguments were actually valid points on the curve. At the time the bug was triggered, the two coordinates of the point (x and y) were only checked to be less or equal to the curve prime order.
The MACI key in question looked like the below object:
{
  "x": 1,
  "y": 1
}
In MACI, messages are (well, should be) encrypted using a shared key generated using a random keypair and the coordinator's public key. This allows for the coordinator to reverse the process - as long as they have the random public key - and decrypt the message using the same shared key. This is achieved using Elliptic-curve Diffie–Hellman (ECDH).
Shared key generation from the user's side:
ECDH(randomKeyPair.privateKey, coordinatorPublicKey)
Shared key generation from the coordinator's side:
ECDH(coordinatorPrivateKey, randomKeyPair.publicKey)
The randomKeyPair.publicKey that was sent as {x: 1, y: 1} was eventually passed by the coordinator to a zk-SNARK circuit to try and decrypt the corresponding message and perform further processing.
When passed to the circuit, the code would perform the ECDH operation by performing a scalar multiplication between the public and the coordinator private key. This happens inside the MessageToCommand template, which calls the ECDH template. Here, there is a call to escalarMulAny which in turns calls SegmentMulAny, where we encounter the final call to Edwards2Montgomery where the error pops up.
Looking inside the Edwards2Montgomery template, we can see the operation that is performed on the public key. In short, the point is converted from the Twisted Edwards form to Montgomery form, as it allows for cheaper operations inside the circuit (whereas outside it is represented in its original Twisted Edward form).
The equation to convert between the two forms is presented below:
                1 + y       1 + y
    [u, v] = [ -------  , ---------- ]
                1 - y      (1 - y)x
We can see how passing a y of 1, the equation would result in a division by zero. Below you can see the full stack trace of the error:
Error in template Edwards2Montgomery_349 line: 38
Error in template SegmentMulAny_362 line: 81
Error in template EscalarMulAny_364 line: 163
Error in template Ecdh_365 line: 23
Please note that passing
xas 0 would also trigger this error, though within theEscalarMulAnytemplate, this case is handled and the G8 point is passed instead, preventing an error while generating the proof.
Given this issue, we weren't able to complete the proof generation for message processing. It is somewhat good news, as once a message is posted to MACI's smart contracts, it's not possible to censor it even if these messages trigger bugs.
How was the round saved?
After understanding the bug, the team came up with a solution that would both allow EthDAM to pay out projects with cryptographic guarantee of the results, as well as provide a solution that is fully transparent and verifiable. After all, MACI is all about verifiability.
Given that the clr.fund round was run using a token with no monetary value (EthDAMToken), instead of spinning up a new round and asking users to re-submit their votes, we opted for an automated solution which would scrap all signups and contribution of this token, as well as all messages, and to re-submit them to the new clr.fund round contracts.
This way, voters can validate that all signups and votes were included, and by not submitting this invalid message, being able to generate zk-SNARK proofs and validate publicly the final tally result.
The script in question can be found in a gist. In a nutshell, the script pulled all signups from the original round fundingRound contract, and submitted them on the new contract, after having approved the new contract to spend all EthDAMToken tokens. Then, it pulls the messages, aside from the invalid one, and posts them to the new Poll contract.
After this, the EthDAM team was able to complete the tally alongside ZK proofs to be submitted on chain.
How did we fix MACI?
To fix the bug, we added code to validate that a given point is on the curve. The Solidity library, found within the original Baby JubJub paper, was originally developed by yondonfu and for sake of simplicity, it was copied over to the MACI's repo. The code needed just a couple of small upgrades to work with more recent Solidity versions (0.8.20).
/**
 * @dev Check if a given point is on the curve
 * (168700x^2 + y^2) - (1 + 168696x^2y^2) == 0
 */
function isOnCurve(uint256 _x, uint256 _y) internal pure returns (bool) {
  uint256 xSq = mulmod(_x, _x, Q);
  uint256 ySq = mulmod(_y, _y, Q);
  uint256 lhs = addmod(mulmod(A, xSq, Q), ySq, Q);
  uint256 rhs = addmod(1, mulmod(mulmod(D, xSq, Q), ySq, Q), Q);
  return submod(lhs, rhs, Q) == 0;
}
As seen in the code above, the function will evaluate the Baby JubJub equation using the input public key x and y values. Any value that is not on the curve, will be rejected, as shown below:
// check if the public key is on the curve
if (!CurveBabyJubJub.isOnCurve(_encPubKey.x, _encPubKey.y)) {
  revert InvalidPubKey();
}
The bug was fixed with this PR. On top of preventing such invalid values from being accepted by the contract, we also added more validation across the TypeScript libraries to make it harder for users to make such mistakes.
Footnote
The MACI team would like to first of all thank EthDAM's team for their patience while we navigated through this issue, and trusting us with coming up with a transparent solution for the round.
Furthermore, we thank mikerah from HashCloak for helping to review the fix, and their expertise in handling security incidents like this. Together with them, we looked at other repositories using similar code, looking for the same mistake, though we did not find similar protocols where an invalid key would cause a denial of service. If you are a ZK developer reading this, we hope this was a lesson that would help remind us all to always validate user input for future scenarios.
Finally, a big thank you to Raphael for explaining the math behind the issue, and suggesting that full point validation would end up being a much better solution than just not accepting y = 1.
Contract addresses
Below is a list of the contract addresses for this EthDAM round. All code is verified on Gnosis's block explorer and can be reviewed.
RoundToken
clr.fund Instance
Original round
New Round
MACI New Version
We decided to publish a new version of MACI with the fix (and other changes made since the 1.2 release). The new version is 1.2.1.
Please do not use the 1.2 version of the contracts for any new deployments, as it contains the bug described in this post. All other packages are safe, though 1.2.1 is the recommended version to use.