I’ve started to describe the work that went into enabling hot maximum connection setting in PR 2698. This work took longer than anticipated for several reasons, but those are all normal and sensible.
When Patrick and I were discussing the shape of the code, we started to discuss the scope of the work as well. What does it mean to finish a feature? Are there concrete steps along the way where:
- someone can review each step as a sufficiently-independent whole
- the design of the entire system is evident at each step
- each step provides more value than it costs
- we can release at each step (not releasing subsequent steps immediately or ever) with minimal disruption to the system as a whole
These are the questions I try to ask of all of my development work, and they’re good questions to keep in mind whatever you’re doing.
You Must be an Engineer; You Must be a Manager
I made one comment early in the process to the effect of “If I were to write this feature myself, with no other code backing it up, I wouldn’t have started from here.”
It’s not that the system was designed to make this feature impossible, but the system was not designed to make this feature easy.
In a situation like this, you have four options:
- don’t do anything
- rush through and make a mess
- change the system first to make your feature easy
- implement your feature, then change the system so it looks easy
There are good reasons to choose any of these options. We went with the fourth option.
Tomorrow Never Codes
The problem with the fourth option is that it’s a promise to yourself and others. It’s like paying for a feature with a credit card; at some point, you may have to pay it back. If you don’t, it’ll hurt more and more.
Unlike a credit card, you can walk away from the code with few repercussions, so it’s not a great metaphor.
You may have heard of the concept of Technical Debt, originally introduced in the Portland Pattern Repository wiki. This concept has grown past its original definition, but the core principle is too important to be diluted: anything you choose not to do now that will impede future development.
That includes documentation, potential bugs, duplication, complexity, lack of tests, bad design.
When I turned over this rock, I found some interesting things. They’re not bad and they weren’t obviously bad when they were first introduced, but my change revealed that they’re no longer correct and they needed to be changed.
Why So Negative?
If you look at the first commit on PR 2959, you’ll see that it looks very simple. It might even look a little silly. Why do I care so much about the definition of variables?
I care because the way we think about them and work with them should reflect reality.
The way the code was originally written, we use integers to count the total number of allowed maximum connections, the number of file descriptors available on the system, and the current number of connections. This is all well and good; you can’t have half a connection lying around, or pi connections, or something like that. You can even have zero current connections or zero file descriptors available (although that’s probably an error case, we should be able to represent that).
The C++ type int
makes sense, but it also makes no sense. It allows the
numbers we want, but it also allows numbers we don’t want.
What does it mean to have -1 file descriptors available? What does it mean to set a maximum limit of -20 incoming connections?
I don’t have a good answer to that, and I don’t think there is a good answer. If there is, I’m not going to spend much time thinking about it; there are other things far more important for me to do. Let’s treat those as errors.
If they’re errors, why should the system allow them in the first place? I do have validation in the RPC code, but I’d rather defend against bad values in lots of places, including places where the compiler can help me out when we’re building the Core.
The right solution here, I believe, is to switch from int
as a type to
unsigned int
. You could also make an argument for any other unsigned type; I
went for the minimal change possible so that someone could skim the code and
understand my intent.
What’s an unsigned integer? It’s always zero or positive. (If you’re interested in the deep details, you need to understand the bitwise representation of numbers in computer systems. It gets really interesting when you understand why an unsigned 8-bit integer can hold the values 0 through 255 but a signed 8-bit integer can hold the values -128 to 127.)
Always Wear Your Badge
This kind of change has a ripple effect, which may or may not touch other parts of the system.
If the number of max connections stored in nMaxConnections
must always be a
positive integer, any time I change that value, I have to make sure I store a
positive integer. All of the std::max()
calls in that first commit of the PR
make sure that any negative number turns into 0.
Of course, there’s some oddness in C++ that I don’t quite understand that the
compiler doesn’t realize that 0 is a perfectly valid unsigned integer, so I
have to cast it explicitly with (unsigned int)0
, but that’s another one of
those things that I’ll clean up in a subsequent commit or learn something new,
so the fun continues.
If I take this further, which I did, any function or method call intended to set or retrieve the number of maximum connections also has to change. Why let other code I’ve written be sloppy?
The nice part is that the code I actually wrote to manage the values in PR 2698, not the scaffolding code, barely needed to change, except for one spot, specifically univalue.
This library provides a convenient way of grabbing values from RPC commands. Whatever you pass in, it’ll try to interpret a wallet address as a string, the strings “true” and “false” as boolean values, and numbers as either integer or rational numbers. That’s great!
… except it has no concept of unsigned integers.
This brings this post full circle. I wrote a little bit of code to work around this for this one specific use case. There’s no reason to set a negative number for the number of maximum allowed connections. The RPC code should throw an error.
Should my code detect that, or should univalue?
There’s no single obvious answer, but this question will come up again in work such as PR 2942, where it again makes no sense to allow a negative block height as an RPC parameter.
Your homework for this week is to think through the alternatives and come up with your own answer as to how you’d approach this problem. I can think of multiple solutions. None of them are too much code. None would take too much effort. Each has positives and negatives.
To me, that’s what makes work like this both interesting and frustrating. I get to make interesting decisions with no obvious answers and I get to discuss these situations with smart, motivated people.
On the other hand, I get it wrong sometimes. Then again, PRs are cheap and easy to open.
Happy hacking!