EffVer: Version your code by the effort required to upgrade (jacobtomlinson.dev)
Most languages these days have a built-in test suite. They can define "no breaking changes" so that it actually means something. Have a set of tests called API. During a major release cycle, you can add tests, but you can't change the tests you have, and the tests have to keep passing. The package registry can run those tests, and if any fail, you don't get to post a minor version release with that code.
This goes from an underdefined "our API will have no breaking changes" to "this is the guaranteed behavior of the API, and cannot change until the major version number is bumped". If a downstream user of the package sees some behavior they want added to the API contract, they can write a test and submit it as a PR, and that test can go into the next release if the maintainers agree that it's a stable behavior which they don't intend to change.
When you move from e.g. 1.0 to 2.0, the tests which now fail are moved to "1.0 API", but they're never removed. No test which is ever in an "API" testset can ever be removed; the package manager enforces this. Provide some mechanism so users of the package can annotate API tests in packages they use as a part of their own test suite, so that when they upgrade, those tests failing is an immediate message about what no longer works. If you only rely on behavior which is in common from 1.0 to 2.0, it should be safe to upgrade.
No more taking people's word for it when they say "no breaking changes", no more bikeshedding about what is or isn't a breaking change, just... tests. End of.
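A minimal sketch of the registry-side check described above, assuming each release carries its API tests as a name-to-source mapping plus the set of names that pass. `Release` and `violations` are hypothetical names for illustration, not part of any real package manager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    version: tuple[int, int, int]  # (major, minor, patch)
    api_tests: dict[str, str]      # API test name -> test source text
    passing: frozenset[str]        # names of API tests that pass on this release


def violations(previous: Release, candidate: Release) -> list[str]:
    """Return the reasons a registry would reject `candidate`; empty means accept."""
    same_major = candidate.version[0] == previous.version[0]
    problems = []
    for name, source in previous.api_tests.items():
        if name not in candidate.api_tests:
            # API tests are never removed, even across a major bump.
            problems.append(f"{name}: removed")
        elif candidate.api_tests[name] != source:
            problems.append(f"{name}: modified")
        elif same_major and name not in candidate.passing:
            # Failing is only tolerated when the major version changes.
            problems.append(f"{name}: failing")
    return problems
```

Adding new tests in a minor release passes this check; breaking an existing one only passes if the major number moves.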
https://fireproof.storage/posts/roadmap-to-1.0/
We’d define 1.0 in exactly the way you describe, where we can add tests for 1.1 but not remove them without triggering 2.0
I did draft a git-based implementation (https://github.com/abathur/tdverpy), but it just obviously can't be as compelling as one that was part of a language's native tooling/ecosystem could be.
I do think that having an API subset of tests is better than basing the system on all tests. Packages should have as many tests as possible, I frequently write tests which I know will break when I do further work on the code, so that I notice when it happens, and because if it happens accidentally it's probably a bug. Wouldn't want a versioning system to have a side effect of making people reluctant to write a test, because it would commit them to the results. I envision tests migrating from the rest of the suite to the API set over time.
I do like that your system completely specifies the meaning of minor and patch numbers, and wonder if there's a way to tweak my proposal so that it does so as well.
If a maintainer staunchly refuses to define an API, that's useful information, the kind you can't get with standard SemVer, where the only mechanism is trusting strangers to do the right thing. Which, to be fair, works ok, some of the time.
Existing package managers could even implement it in a completely backwards-compatible way: if you as a package maintainer don't care for it, you simply never add "API tests".
1. For a library there is API and there are implementation details. What if test depended on implementation detail?
2. What if tests had undisputable bug?
3. Test refactoring requires major release now?
4. Realistically test suite will have some execution paths not covered.
I like the idea of running same tests over multiple versions, to observe changes. But I disagree that it would automate semver. (Maybe in very limited subset of cases)
P.S. Not an actual downvoter, but if I would have downvoted, these would have been the reasons.
If you want to communicate impact it might make more sense to add on to semver in some way with a 2 axis "amount of effort" and "likelihood it impacts you" as say "-b7" or something. That said, start trying to include so much information in the version string and eventually you'll just end up with a compressed version of the release notes and not a version number.
Maybe if people did that for their dependencies, we wouldn't have certain software stacks with thousands of them for a simple helloworld-ish backend.
I'm of the opinion that SemVer or any other version arrangement is not to be trusted blindly. When I see a minor version upgrade, it gives me some hope I can upgrade without much trouble but I've been burned too many times to go in blind like that.
I hear ya. So what we should be doing is to make a 4096-dimensional vector based on an embedding created from our release notes. And use that as the release version :D
SocialVer:
- upvote or downvote major release changes
- emojis to communicate the level of upgrade pain
- emojis to communicate the level of disaster after upgrading
There are many 'pros' to this approach: it's stupidly simple and tools can autogenerate it easily, it's trivially sortable, it tells you how long ago the release happened, but above all it intentionally conveys nothing about your perception of the magnitude of changes and therefore is never misleading. The real meaning is conveyed via release notes: high level changes (with emphasis on any breaking changes) followed by a detailed changelog.
I understand the desire to convey more meaning in the version number itself, but every alternative approach I've tried always falls apart in some way and/or becomes more trouble than it's worth, especially when it's a marketing person who wants version numbers to get bigger faster or a "humble" team member who is anxious to call this the 1.0 release.
Stuff like SemVer seems like a good idea initially, but even with a rigorous test suite there are cases where a bug fix or new feature isn't quite as backwards compatible as intended, so trust in the version number only goes so far. Or it tends to give undue emphasis, e.g. in this release you are pushing out several backwards compatible bug fixes and you are finally pulling a feature you deprecated a long time ago. You have good evidence that nobody has used this feature in years, and for all intents and purposes this is a very small patch release, but you instead have to bump the major version, implying that it's a big release.
Something like EffVer is an interesting approach, but when it ends up being inaccurate for you (i.e. when a supposedly painless upgrade is anything but), then all it has done is pour salt on the wound.
Either way, the amount of work to do for an upgrade depends on which parts of the product you are using and whether those parts have any changes in the new version. For this reason, most projects also have a changelog which gives you more detailed information about the upgrade. When preparing for an upgrade it is advised to read the changelog.
The more breaking changes there are, the more effort is required to take them into account. Semver only applies to APIs. Effver could apply to UIs too, but for APIs, it would be similar, just not as well defined (because it is more general).
zero version still denotes a codebase under development
A human wrote that and said "Yeah, this makes sense to me." All code is under development until it's not. A major version of zero means pre-release code, i.e. a codebase under active development (with the implicit assumption that there will likely be major breaking changes).
A major version of zero just means "I am not committing to a stable API until 1.0" which is a completely fair stance. I'm not going to write code that's very clearly unstable and in active churn and try to pretend it's stable. I'm also not going to keep around a legacy API at that point yet.
Compare that to a standard bump in major version (i.e. 1.0 to 2.0). In this case there is an expectation of a migration path and in all likelihood a versioned legacy API that'll stick around so that users can slowly migrate across the breaking changes between API versions.
Frankly I'm not going to commit to doing that for 0.X.Y/indev projects.
You have just described all actively written software as "major zero". This is why it's a silly concept.
Though that'd communicate something totally different from EffVer... Bumps in my X are not "macro effort".
In reality projects/vendors often make versioning decisions for marketing reasons. If you add a ton of killer features with no backwards incompatibility and trivial upgrade path, you might still bump the major version number even though normally that would denote radical backwards-incompatible change.
The need for marketing versioning will not go away, so maybe what we need is an upgrade quantifier modifier to the version number.
E.g., 8.0.0 can be a major functionality release, and 8.0.0-ez can be a major functionality release that has an easy upgrade path while 8.0.0-hd can be a major release that has a difficult (hd == headache) upgrade path.
(I realise that TLAs have a limited namespace and are bound to have multiple meanings in many contexts, but The EFF is quite a prominent and well-established use in the computing/software arena.)
If the micro version you're running is 100 versions behind, is it still expected to be micro effort?
There are almost certainly better paths through, and I suspect the idea isn't broadly usable without different kinds of tests that have different kinds of rules. It's probably not helpful to have to increment your major version just because you use snapshot testing and some dependency update causes a trivial shift in the output.
I also fiddled around a little with an idea I called "earmarks", which are basically just version-bound tests. You could use these to express the idea that, say, test_x shouldn't pass or fail until the version is >= a.b.c.
This would make it easy to deprecate an API today and go ahead and ship a test that requires the API to be present and functioning until the next major release but not after. Or, for example, to make a commitment device that asserts the project will hit some doc/lint/typing goals by some clear point.
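One way the "earmark" idea could look, as a rough sketch: a decorator that enforces a test only while the current version is below a stated limit. The names `required_until`, `parse`, and `old_api` are made up for this example, and the version strings are assumed to be plain dotted integers.

```python
import functools


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def required_until(limit: str, current: str):
    """Enforce the wrapped test only while `current` is below `limit`."""
    def decorate(test):
        @functools.wraps(test)
        def wrapper(*args, **kwargs):
            if parse(current) >= parse(limit):
                return "skipped"  # the commitment has expired
            test(*args, **kwargs)  # raises AssertionError if the promise is broken
            return "enforced"
        return wrapper
    return decorate


def old_api() -> int:
    """Stand-in for a deprecated function that must survive until 2.0.0."""
    return 42


@required_until("2.0.0", current="1.4.0")
def test_old_api_still_works():
    assert old_api() == 42
```

In a real project `current` would presumably come from the package metadata rather than being hard-coded.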
Since it's an open-ended mechanism, I imagine something like it is the lower-friction way for a real project to explore applying these concepts without full toolchain support.
SemVer makes sense but every piece of software using it doesn't. Not every piece of software makes an interface commitment. Those that do should use SemVer, the rest should just use the date or some monotonically increasing number.
Is your claim that a release which introduces bad algorithmic complexity requires a major release to fix in semver? Who thinks this?
If the test is in the API testset, it's API. If it isn't, it's an implementation detail.
> What if tests had undisputable bug?
If it's in the API testset, time for a major version bump. If not, fix it.
> Test refactoring requires major release now?
Only if you're refactoring the API, as defined by the API testset, thereby producing a breaking change.
> Realistically test suite will have some execution paths not covered.
Doesn't matter. If the behavior isn't in the API testset, it's not a part of the API.
> But I disagree that it would automate semver.
The point isn't to automate semver, I'm not even sure what that would mean. It's to define it, in a useful and objective way.
I don't think your proposed scheme needs to be perfect in that regard, but acknowledging the concern and at least putting it in perspective would probably help.
SemVer is just a pinky-swear not to break people's code. In the real world, people's code breaks anyway, and then you get an argument about what's API, and expected behavior, and so on, and so forth.
What I'm proposing is simply to replace the pinky promise with tests. From some of the other comments, I think this point may have been missed: it isn't every test in your test suite, it's the ones marked "API", only.
This is a strict improvement over social-contract SemVer in two ways: one is that the package manager won't let the maintainers break the API tests without a major version bump. The other is that, if you, as a user, are unsure if some behavior is part of the stable API, you can write and submit a test to that package. If that test is accepted, great: that behavior now cannot change without a major version bump, because, again, the package manager will not bundle the package if that test breaks. Furthermore, even on a major version bump, it is instantly clear if that test is still valid, or not, you can just check before upgrading. If they don't accept the PR, you know that it isn't considered part of the API, so you add the test to your own test suite, so that at least you know quickly what broke if they change it.
assert(true); is a thing. I don't think this solution would actually work. Tests might be refactored or improved and that shouldn't trigger a major release.
But the package registry checks all API tests against the last version and rejects the registration if they change at all. That can be relaxed for non-semantic parts of the test, like a description of what the test means, but none of the code is allowed to change. It would be better if this were based on the AST, so that whitespace tweaks don't trigger a build failure, that's practical to achieve in most languages.
Refactoring an API test isn't worth losing the guarantees a system like this provides, and it's only the API tests which come with any restrictions, maintainers may do as they please with the rest of the test suite. An improved API test has to be provided as a new test. Part of the proposal is that users can refer to API tests in their own code, as a way to determine if tests they rely on break in a major release, so the tests need unique names, which means they can be rearranged in the file. It also means that if there's a typo in the name, or the name sucks, well, you're stuck with that until the next major release, and even then it goes on to live in infamy, forever, in the obsolete-API portion of the test suite. Not ideal, but it can't be avoided.
As you should be, it's a great contribution. I was referring to the downvotes mentioned in a comment further up.
I agree that what you propose is an improvement, but it can be misunderstood to claim that it can prevent _any_ real-world breakage. There will always be aspects not covered by tests and which other people still rely on.
I've had this experience with API contract tests between systems. Despite covering a lot of details and preventing deployments that failed these tests, we would occasionally run into problems where passing changes would break stuff in production. There was always an area of uncodified assumptions, and that was with only tens of different clients, whereas public libraries can have millions. So, I believe this is also applicable to your proposed solution.
You can argue that your solution significantly shrinks this area of uncertainty while also _defining_ it, which helps when reasoning about what you can depend on - and I agree. But it does not eliminate the gap, and this is what people were pointing out.
I was just a little frustrated that the discussion even went there, because I didn't think you were even claiming what they were arguing against. I think that happened because you left a gap by not addressing it clearly, and I wanted to point it out, because people seemed to be talking past each other.
It's only intended as an improvement to practice. SQLite has a test suite which exhaustively tests every single branch of the code, which D. Richard Hipp wrote on contract to one of his clients. It takes more than a day to run. We might all aspire to such a level of professionalism, but realistically, most programs and libraries will fall short of glory here. And while this exceptional test harness has in fact limited the scope of SQLite bugs a great deal, it hasn't eliminated them entirely.
So with a TestVer system, or whatever we want to call it, there will still be breaking updates, there will still be bugs. But it provides a mechanism for defining the invariant behaviors of a major release number.
It's possible some of the early respondents thought that I considered an appropriate response to a minor change which breaks downstream code to be "lol, too bad". That might happen sometimes, minus the lol we may hope, but more often the response should be more proactive: a revert, adding a test which clarifies the new behavior, something.
The best part of this system is that users of a package can write additional tests of that package and submit them as PRs, if there's some behavior they see the package exhibiting which doesn't appear to be in the test suite. This is easier by far than making changes to the package itself: just add the tests to your own suite, and if they pass, make a fork of the repo, add those tests to their API suite, and submit a PR. Whether they accept the patch or not, you have it in your own test suite, so you'll be informed immediately if a later release breaks it.
If there's no stable behavior because the software is still at that stage of development, it's 0.x software still. That's true in SemVer as well as this refinement of it.
Contrariwise, if you think software is ready for 1.0 and you can't come up with any tests which display guaranteed behavior which won't change without that major version bump, then no, it's not ready.
During the ramp-up to 1.0, release candidates and such, some of the tests, the ones which evidently demonstrate the expected behavior of the API, get moved to the API testset. Since it's still zero season, this imposes no restrictions. The tests could have bugs, the code could have bugs, the API tests might need tweaking, the API can still change, all of this is fine.
Then you release 1.0 and the API tests have to stay green until 2.0. I think we have different estimates of how likely it is that it would make sense to change those tests and not call that a breaking change, because that's what those tests are for. They are a subset of the (often much larger) test suite, designed specifically on the premise that if these behaviors change, it will break the API. I don't think it's hard to write those, I've never found that difficult in my own code. If you can't make a few assertions about the behavior of a package, does it even make sense to describe it as having a stable API? What would that even mean?
A realistic version of this system would have to allow preludes to the API tests, and those can change. Setup might involve internal details, mockups, other things which it isn't prudent to lock in. That theoretically lets maintainers change behavior while pretending they didn't (dumb example, changing the meaning of `lowercase` to `uppercase` and replacing the setup string with digits), but the point of this isn't to stop people from being jerks, that isn't possible.
There aren't restrictions on what the API tests can be, either. Someone with a philosophical objection to all of this can engage in malicious compliance and write "assert 1 + 1 = 2" to pacify the package manager. A minimal API test set which is still in the spirit of the concept is to assert that the exported/public names in the package are defined. That already provides a hard guarantee, after all, and if it's a subset of the exports, that shows which ones are intended to be stable and which ones are experimental.
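That minimal "the exported names exist" API test might look roughly like this in Python. The standard library's `json` module stands in for the package under test, and `missing_exports` is a hypothetical helper.

```python
import importlib


def missing_exports(module_name: str, stable_names: list[str]) -> list[str]:
    """Return the promised names that the module does not actually define."""
    module = importlib.import_module(module_name)
    return [name for name in stable_names if not hasattr(module, name)]


def test_stable_exports_exist():
    # Only the names listed here are promised; anything else the module
    # exports is implicitly experimental.
    assert missing_exports("json", ["dumps", "loads", "JSONDecodeError"]) == []
```

The list itself doubles as documentation of which exports the maintainers consider stable.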
Users can build on that by writing tests which use those exported values, demonstrating expected behavior, there's no need or expectation that every possible corner is covered by the test suite right at 1.0. Maintainers can add those tests to the API set if they agree that the assertions should be invariant.
Part of why I like this idea is there's a reluctance, which you're showing, to make major version releases. It's stressful for the maintainers and the users. In this system, the broken tests stay in the suite, and get moved out to the version 1 set. Users can assert the tests in the suite which fix behavior they rely on, and automated tooling can tell them if it's definitely not safe to upgrade (nothing can assure that any upgrade is completely safe, certainly not Scout's-honor SemVer). Making major releases should be less fraught. It's sadly common for package maintainers to make changes which are actually breaking, and claim that's not what happened, so they can keep a 1 in the front of their version string. That's a perverse incentive.
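On the user side, the tooling could be as small as a set intersection. A sketch, assuming the user keeps a list of the API test names they rely on and the new major release publishes which tests it retired to the old-version set (`blocking_tests` and the test names are invented for illustration):

```python
def blocking_tests(relied_on: set[str], retired: set[str]) -> list[str]:
    """API tests the user depends on that the new major release no longer passes."""
    return sorted(relied_on & retired)


# Hypothetical example: the user relies on two behaviors, and 2.0 retired one of them.
relied_on = {"test_parse_iso_dates", "test_round_trip"}
retired_in_2_0 = {"test_parse_iso_dates", "test_legacy_flags"}
print(blocking_tests(relied_on, retired_in_2_0))
```

An empty result doesn't prove the upgrade is safe, as noted above; a non-empty one says it definitely isn't, and names exactly what to look at.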
The worst case scenario which seems to concern you is what? The code is good but the tests are bad? Ok, write a sheepish NEWS.md assuring everyone that nothing is really changing but the test suite, and cut another major version. Laugh about it later.
Example: now that 1.0 is released, I want to add two new massively breaking changes. The team opens two PRs. The first one merges and bumps the version to 2.0, then the next one merges, and it gets bumped to 3.0. That sounds ridiculous.