Packagers don't know best

Packagers don't know best (vagabond.github.io)

119 points by decklin 13 years ago | 126 comments

tmoertel 13 years ago |

The packagers actually do know what’s best. What they do makes patches flow faster not only downstream but also upstream. Improvements and fixes get to more people and get to them faster.

Unbundling upstream libraries from downstream projects flattens the change-flow network, reducing the time it takes for things to get fixed and for the fixes to propagate. For example, say that project P uses library L and bundles a slightly modified L in its release. Whenever L’s developers fix or improve or security-patch L, P’s users don’t get the new code. They have to wait for P’s developers to get around to pulling the new code from L, applying their own modifications, and re-releasing P.

Packagers say that’s crazy. They ask: Why does P need a modified L? Is it to add fixes or new features? If so, let’s get them into L proper so that L proper will not only meet P’s needs but also provide those fixes and new features to everyone else. Is it because P’s version of L is no longer L but in name? Then let’s stop calling it L and confusing everybody. Fold the no-longer-L into P or release it as a fork of L called M that can have a life of its own.

The point is that keeping L out of P makes two things happen: (1) It ensures that when L’s developers improve L, all users, including P’s downstream users, get those improvements right away. (2) It ensures that when P’s developers improve L, those improvements flow upstream to L quickly and reach all of L’s users, too.

More improvements, to more people, faster. That's the idea.

usea 13 years ago | |

I don't want to speak with any authority on this subject since it's mostly foreign to me. However, I will say that your explanation comes with a pretty big assumption: that a library sits on some one-dimensional spectrum from bad to good. If you don't subscribe to this notion, libraries don't improve; they simply change. If you accept this, then you begin to see why arbitrarily changing parts of a software package without even trying to understand the consequences of those changes is madness.

I'm not saying I disagree with you, just trying to point out a spot you might have overlooked.

sounds 13 years ago | | |

Ok, I definitely have a bias toward what the grandparent is saying: flatten the hierarchy.

But just from a logical standpoint, couldn't you apply the following equally to the Riak guy who is complaining: "arbitrarily changing parts of a software package without even trying to understand the consequences of those changes is madness"

It absolutely is madness. If the Riak guys want to use leveldb in a way Google won't support, they should rally with the package managers and get Google to stop being "pretend open source." (Hint, Google: just releasing the source doesn't work if you ignore all bug reports and patches from outside.)

I suspect the real issue here is too much "Not Invented Here" syndrome by all parties involved.

freshhawk 13 years ago | |

Well put, and this particular problem can easily be handled by having a prominent doc section called "Information for packagers" that outlines all this stuff. This isn't a new problem and seems to be best handled by engaging with the packagers and putting a small amount of effort into helping them; it's easy and pays enormous dividends.

dvanduzer 13 years ago | | |

The "information for packagers" doc section is often referred to as a Makefile.

It enumerates minimum versions of shared libraries, as well as explicit versions of static libraries.

dvanduzer 13 years ago | |

They really don't.

This is part of the reason for the plethora of Linux distributions. Some deployments can afford the rapid pace (and consequent instability) of the short term Ubuntu releases or Fedora. Other deployments really do require the longer term stability of the more methodical Ubuntu LTS releases or CentOS / RHEL.

Any improvement requires change, but not all changes are an improvement.

tmoertel 13 years ago | | |

I'm not sure I was able to get my point across to you. Let me try another approach.

The improvement I'm talking about occurs upstream of the distributions, even though it is caused by the distributions' packaging policies.

Libraries are upstream from projects, and projects are upstream from distributions. If the distributions discourage projects from bundling libraries, this policy will encourage project developers to talk to the upstream library developers to get desired changes into the libraries, rather than go the customize-and-bundle route. This improved coordination and patch-flow benefits the users of the libraries and the users of the projects, regardless of whether those users rely on any particular distribution to get the software. Users are, as always, still free to pick whatever distribution best suits their preferences, or no distribution at all. Still, they benefit from the distributions' debundling policy.

npsimons 13 years ago | |

Yeah, I love how the article is so myopic, they can't imagine a world in which they might be using a package that some other package is also using, and which therefore might need to be upgraded separately from their package. So the author has worked on two large projects that have dependencies, yet he thinks he has the experience to say that splitting a package up (say, into docs, libs and executables) is a bad thing? How many embedded devices has he administered? Or clusters? Or simple networks where things are set up to have NFS mounts across machines, and it's obvious that while you can install the docs on the NFS-doc-server once, you may need to have separate binary and library installs for each architecture/OS on the NFS-binary-servers. There's a reason sysadmins love well packaged software.

dvanduzer 13 years ago | | |

Sysadmin here.

Authors of well packaged software know when they need fine grained library features, and include static versions of that library. Authors of well packaged software also pay attention to the distribution of commonly used libraries, and make careful decisions when using system-provided shared versions of those libraries.

The author of the article is complaining when someone downstream overrides those decisions. If you're asking how many clusters one of the primary developers of Riak has run, you may not be reading closely enough.

regularfry 13 years ago | | |

Splitting a package into docs, libs and executables makes a lot of sense. Splitting those further, so you've got umpteen "independent" packages which 95% of users are just going to have to manually recombine to get the functionality the upstream package provides out of the box, can get pathological. Debian has historically been particularly bad at this, and Ubuntu inherited that tendency.

dagw 13 years ago | |

In principle I agree; however, in my experience the time from me submitting a patch to L that I need for my new feature in P to work until that patch makes it into a stable release of L that packagers actually ship can be months.

In that time frame I'm stuck between not shipping a new version of P (often unacceptable as I've got users to answer to) or shipping my own slightly modified version of L.

jordigh 13 years ago |

The point of not using embedded libraries isn't about saving space. It's about not having several slightly different versions of the same bug spread out across several slightly different versions of the same library.

Saving space is just a nice side effect, so why not have that too?

The DLL hell problem doesn't exist in a GNU-based system because we have sonames. Windows and Mac OS X don't have those; instead, the software libraries there can't coordinate with each other harmoniously, so each program has to have all of its libraries packaged with its own set of bugs while making a hostile and rude gesture to the rest of the programs in the OS.
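To make the soname point concrete, here is a minimal sketch, assuming the common `lib<name>.so.<major>` naming convention. The real soname is recorded inside the ELF file (the `DT_SONAME` entry), so this only mimics the convention from file names; the helper names and the `libfoo` files are hypothetical.

```python
import re

# Hypothetical helper: split a versioned shared-library file name such as
# "libfoo.so.1.2.3" into its parts. By convention the soname keeps only the
# major number ("libfoo.so.1"), because only a major bump signals an ABI break.
_LIB_RE = re.compile(r"^(lib[\w+-]+)\.so\.(\d+)((?:\.\d+)*)$")


def soname_of(filename: str) -> str:
    match = _LIB_RE.match(filename)
    if match is None:
        raise ValueError(f"not a versioned shared library name: {filename}")
    name, major, _rest = match.groups()
    return f"{name}.so.{major}"


def abi_compatible(file_a: str, file_b: str) -> bool:
    # Two builds can stand in for each other only if they share a soname.
    return soname_of(file_a) == soname_of(file_b)


print(abi_compatible("libfoo.so.1.2.3", "libfoo.so.1.4.0"))  # same major
print(abi_compatible("libfoo.so.1.2.3", "libfoo.so.2.0.0"))  # major bump
```

Under this convention a bug fix ships as a new minor version behind the same soname, so every program linked against it picks up the fix, while an incompatible release gets a new soname and can be installed alongside the old one.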

JoshTriplett 13 years ago |

"I know you have all these rules that try to make packages consistent so sysadmins don't have to give any extra thought to each individual one, but I'm a special snowflake that you should treat differently."

I care about having a system with hundreds or thousands of packages installed on it that all work consistently.

Linux is not OS X, and packages are not .dmg files; I want your package using the system version of libfoo, not your own fork of libfoo. If you have awesome changes to libfoo, then you should either get them into upstream libfoo, or go all the way and actually fork libfoo into libfooier upstream to allow packaging it separately.

dvanduzer 13 years ago | |

Reading your comment, I am starting to understand why everyone is so confused about this.

Linux and OS X both have the same underlying options for static or shared libraries. There is a large amount of "enterprise" software that is distributed just like a .dmg file.

There is plenty of middle ground between having everything dynamically linked and everything statically linked. The author of the article believes that packagers should trust developers to make good decisions. (Granted, there are plenty of bad developers and abandonware galore. Packagers are justified in stepping in to make new decisions here.)

mwcampbell 13 years ago | | |

As a sysadmin, which do you prefer: self-contained software distributions with all the dependencies included, or packages that use the host distro's package manager, use system versions of libraries wherever possible, and otherwise integrate well with the host system? The latter seems better to me, but it's an honest question, not rhetorical.

kiallmacinnes 13 years ago | |

This.

Very few people understand the importance, and benefits, when they have never seen anything other than .msi[1], .dmg, or worse, .zip.

[1]: Let's just pretend for a minute that Windows software only comes as plain MSI, and that all the various .exe's which splatter stuff all over the disk simply don't exist.

dfc 13 years ago |

Look at this ubuntu erlang package, it depends on 40 other packages, as well. That isn’t even the worst of it, if you type ‘erl’ it tells you to install ‘erlang-base’, which only has a handful of dependencies, none of which are any of these erlang libraries!

That package is a dummy package that depends on erlang-base and the rest of the base erlang platform. You would have to force dpkg to ignore dependencies in order to install erlang without erlang-base. I would love to hear how that happened.

Splitting things up into multiple packages makes distributions easier to manage. One person can take the lead on package-dev while another person can take the lead on package-doc. Splitting things up into multiple smaller packages also makes distributing fixes a lot easier. With a one line fix to one include would you rather send out the entire erlang environment or just the small package that needed the fix?

And yes splitting things up to save storage requirements is most useful for resource constrained devices, not new servers/laptops. But it means that a user who is comfortable with Debian or Fedora on the server/desktop can use their same trusty OS on their next project when the device places serious restrictions on system overhead.

omaranto 13 years ago | |

You misunderstood the point the author made about erlang-base: it's not that he or she somehow installed Erlang without installing erlang-base, but rather that if in an Ubuntu system you try to run 'erl' before installing any Erlang packages at all, you receive a message telling you something like "to get the erl command install the package 'erlang-base'" and if you go and do that, you don't get the Erlang standard library! The point is that Ubuntu should either suggest 'erlang' instead, or not have all those separate tiny packages in the first place.

YokoZar 13 years ago | | |

Ahh, this is just a simple bug in the command-not-found package that makes those recommendations when you type in a missing binary, not an underlying problem with the entire philosophy of splitting packages!

dfc 13 years ago | | |

Nice catch. It did not occur to me that people would run a program without installing it, use command-not-found and/or ignore the suggests list when installing a package. As another commenter pointed out, this has nothing to do with splitting packages up. It's a toss-up between user error and a bug in c-n-f.

andrewflnr 13 years ago |

I'm going to go ahead and plug the Nix package manager here:

  Nix is a purely functional package manager. This means
  that it can ensure that an upgrade to one package cannot
  break others, that you can always roll back to previous
  version, that multiple versions of a package can coexist
  on the same system, and much more.

So you can all have your own versions of lager or whatever, and still have everything managed sort of nicely. Doesn't solve the "include the docs or not" problem, though. And I'm not sure if it does anything for tmoertel's patching concerns.

http://nixos.org/nix/
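A toy model of the idea in that quote, and explicitly not Nix's actual implementation: if each package is installed under a path derived from a hash of its name, version, and dependencies, two versions never overwrite each other. Everything here (`store_path`, `/toy-store`, the version numbers, `app-a` and `app-b`) is hypothetical.

```python
import hashlib
from typing import Sequence


def store_path(name: str, version: str, deps: Sequence[str] = ()) -> str:
    # Hash everything that defines the build, so any change yields a new path.
    key = "|".join((name, version, *sorted(deps)))
    digest = hashlib.sha256(key.encode()).hexdigest()[:12]
    return f"/toy-store/{digest}-{name}-{version}"


# Two versions of the same library get distinct paths and can coexist.
lager_old = store_path("lager", "1.0")
lager_new = store_path("lager", "2.0")

# Each application records exactly which build it depends on.
app_a = store_path("app-a", "1.0", deps=(lager_old,))
app_b = store_path("app-b", "1.0", deps=(lager_new,))

print(lager_old != lager_new)  # upgrading lager never overwrites the old build
```

In this model a rollback is just pointing back at the old path, which is still on disk because nothing was modified in place.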

zaphar 13 years ago | |

Nix is possibly the only system I know of other than maybe homebrew, strangely enough, that has designed itself sanely.

It solves the security issue, the space issue, and also the "I need special patches for my version of this lib in my unique application" issue.

nbouscal 13 years ago | |

I had never heard of this and it looks awesome, thanks for the link.

schmonz 13 years ago |

Upstream developers don't know best, either. Packagers sometimes make bad decisions, just like upstream does, because we're all people. "Install our software the way we think you should" is a point of view, but not a very smart one unless it's accompanied by a willingness to be persuaded otherwise. This particular upstream developer clearly hasn't seen an OS-agnostic cross-platform package manager like pkgsrc, where one of the packager's tasks is often to make software more portable than upstream cares to bother with. To take one obvious example, we make sure libtool works on all our supported platforms, and then we make sure software with its own precious all-the-world's-a-Linux way of linking shlibs uses libtool instead. Do we try to feed back our portability fixes upstream? Of course. Does upstream always want them? Of course not. Are they wrong to not care? We sometimes think so. Are we wrong to patch their code? They sometimes think so. They have their goals, we have ours. If anyone reliably knows best about anything, it's users.

m-r-a-m 13 years ago |

My favorite interesting packaging choice is TeX Live in Fedora 18 [1]. There are about 4500 texlive-* packages (out of around 35000 binary packages in Fedora total). The packagers split up the packages based on upstream metadata.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=949626

bcl 13 years ago | |

s/interesting/insane/

andor 13 years ago | | |

As a TeX user, I find this extremely useful. The Fedora packages map 1:1 to Texlive packages. There is no need to research if a LaTeX package is available and in which Fedora package it is hidden, you can just install "tex-packagename".

m-r-a-m 13 years ago | | |

It does sound insane to have so many packages, but it also follows Fedora's policy of staying as close to upstream as possible. The entire package building process is automated since TeX Live provides the packaging metadata with the source.

They also include the meta packages [1] so you don't have to install every package individually if disk space is not an issue.

[1] https://fedoraproject.org/wiki/Features/TeXLive#Benefit_to_F...

homeomorphic 13 years ago | | |

Why is this insane? Seriously, what problems does it cause, hypothetical or otherwise?

ochs 13 years ago |

The solution is easy: if you fork a project and it becomes incompatible with the upstream, rename it. How is anyone supposed to discriminate between the two versions if they have the same name?

Also, I'd say, if your software needs lots of modified dependencies, you're not communicating with those projects properly.

If every single project were to fork every one of their dependencies, the result would be a maintenance nightmare.

Nux 13 years ago |

There's also the security factor (that many devs today like to ignore); using shared stuff simplifies it. Maybe packagers don't know best, but neither does this guy.

kevingadd 13 years ago | |

You say this guy doesn't know better, but given that he's talking about shipping a security sensitive application that relies on custom tuned, tested forks of libraries, how can you say that he's wrong for not wanting his library fork replaced with some arbitrary version on an end-user's machine? How can that possibly be safer?

It's certainly nice to be able to take an existing library an app depends on, patch it to fix a security hole, and drop that in. But that isn't what's happening in this context...

binarycrusader 13 years ago | | |

So the developer wants to reduce their cost of properly engineering and documenting their application's usage of a particular library in exchange for significantly increasing the costs of rebuilding and updating every software package that uses the same libraries onto the OS developer and their customers?

tehwalrus 13 years ago |

I have two projects which follow different ideas on this (mostly due to size).

In both cases, I've basically written "lazy python bindings" for something in C++ (lazy because I only support the features I want in pythonland). Neither of the C++ projects is on github or anything, they're just hosted out there somewhere else (one on SVN, and one only available as archives, I think.)

In the archive case, and since the codebase is small, I just included the whole codebase in my git repo, and added a few small cpp, pyx and py files around it. This library already has a fork, and has the most stars (like, 3) of all my github repos - embedding all the required code and statically linking (indeed, compiling) it as part of my `setup.py` works great, and is easy for 3rd party users too.

In the SVN case, the main project is huge, like a few hundred MB of source (and they use some crazy code generation, so that's not even the half of it.) It also comes with its own very very basic python driver. So, my approach is to give people two or three small patches, build instructions (the project is a nightmare to build correctly,) and then my python code just installs on its own and talks to the project as a normal python library. This version is useless - it's permanently out of date, I can't even get the build instructions I wrote 3 months ago to work when I'm trying to set it up for someone else, and the whole thing is a massive nightmare. If I'd forked it and provided the huge source tree myself, that would be reduced - but that project is also under active development and it'd be great to actually use their latest, least buggy version!

Each of these decisions was made the way it was for real, sensible reasons - I'd hate for a package manager to have to contend with the mess of the second project, and yet apparently that's the way they'd prefer to go with both!

Good job no one needs to use any of my code, really.

binarycrusader 13 years ago |

While I sympathise with some of the complaints the developer has, the idea that every software component should live as an isolated stack that duplicates its entire set of dependencies is misguided.

OS administrators want a maintainable, supportable system that minimises the number of security vulnerabilities they're exposed to and packages software in a consistent fashion. They also want deterministic, repeatable results across systems when performing installations or updates.

Likewise, keeping various components from loading multiple copies of the same libraries in memory saves memory, which helps the overall performance of the system.

Also, statements like this aren't particularly helpful and are factually inaccurate:

  So package maintainers, I know you have your particular
  package manager’s bible codified in 1992 by some grand
  old hacker beard, and that’s cool. However, that was
  twenty years ago, software has changed, hardware has
  changed and maybe it is time to think about these choices
  again. At least grant us, the developers of the software,
  the benefit of the doubt. We know how our software works
  and how it should be packaged. Honest.

Some packaging systems are actually fairly new (< 10 years old), and the rules determined for packaging software with that system have actually been determined in the last five years, not twenty years ago as the author claims. Nor are the people working on them grand, old, bearded hackers.

OS designers are tasked with providing administrators and the users of the administrated systems with an integrated stack of components tailored and optimised for that OS platform. So developers, by definition, are generally not the ones that know how to best package their software for a given platform.

As for documentation not being installed by default? Many people would be surprised at how many administrators care a great deal about not having to install the documentation, header files, or unused locale support on their systems.

Every software project has its own view of how its software should be packaged, and while many OS vendors try to respect that, consistency is key to supportability and satisfaction for administrators.

So, in summary:

* preventing shipping duplicate versions of dependencies can significantly reduce:

- maintenance costs (packaging isn't free)

- support provision costs (think technical support)

- potential exposure to security vulnerabilities

- disk space usage (which does actually matter on high multi-tenancy systems)

- downtime (less to download and install during updates means system is up and running faster)

- potential memory usage (important for multi-tenancy environments or virtualised systems)

* administrators expect software to be packaged consistently regardless of the component being packaged

* some distributors make packaging choices due to lack of functionality in their packaging system (e.g. -dev and -doc packaging splits)

* administrators actually really care about having unused components on their systems, whether that's header files, documentation, or locales

* in high multi-tenancy environments (think virtualisation), 100MB of documentation doesn't sound like much, until you realise that 10 tenants mean 10 copies of the docs, which is a wasted gigabyte; then consider thousands of virtualised hosts on the same system and now it's suddenly a bit more important

* stability and compatibility guarantees may require certain choices that developers may not agree with

* supportability requirements may cause differences in build choices developers do not agree with (e.g. compiling with -fno-omit-frame-pointer to guarantee useful core files at minor cost in perf. for 32-bit)
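The documentation arithmetic in the summary above, as a quick sketch. The 100MB and 10-tenant figures come from the list; the 1,000-tenant case is an assumed stand-in for "thousands of virtualised hosts", and 1 GB is taken as 1000 MB.

```python
DOCS_MB_PER_TENANT = 100  # figure from the summary above


def wasted_docs_gb(tenants: int, docs_mb: int = DOCS_MB_PER_TENANT) -> float:
    # Every tenant carries its own copy, so the cost scales linearly.
    return tenants * docs_mb / 1000


for tenants in (10, 1000):
    print(f"{tenants} tenants: {wasted_docs_gb(tenants):g} GB of documentation")
```

Under these assumptions 10 tenants cost 1 GB and 1,000 tenants cost 100 GB, which is the scale at which splitting out a -doc package starts to pay for itself.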

I'd like to see the author post a more reasoned blog entry with specific technical concerns that are actually addressable.

mkhattab 13 years ago |

I love that the OP brought up FreeSWITCH because this is one example where I believe it's most troubling for package maintainers, software engineers and system implementers alike. From a software engineer's perspective, including 3rd party libraries in one source tree transfers the burden of maintenance and support to one project maintainer. Not reinventing the wheel is good and all, but you still have to maintain its integrity.

From a package maintainer's perspective, especially in the case of Debian, they must ensure that packages are stable and secure. It's their job to make sure security updates are released. In the case of FreeSWITCH, there's no distinction between the main source and its dependencies. Package maintainers might as well not bother with including software like FreeSWITCH in their repos or risk the integrity of their system.

System implementers are mostly ambivalent about these issues until their distro's FreeSWITCH package includes broken dependencies or until their FreeSWITCH installation has a security exploit due to a library that can't be patched independently.

I love FreeSWITCH but I'm sorry to say that it's poorly architected. However, I'm a system implementer, so I don't care.

mey 13 years ago |

Along the same issue, see Debian (and as a result Ubuntu) and Ruby Gems. Used to drive me up the wall (until I stopped bothering).

claudius 13 years ago | |

Yeah, these Ruby guys releasing incompatible versions every other weekend and expecting to be allowed to blurt their stuff all over the system was rather strange. Good thing Debian provided decent packages.

mwcampbell 13 years ago |

Having read the arguments on this thread, and having seen the pathologies of a single mega-repository of packages as in Debian (e.g. long release cycles, breaking stability policies for the major web browsers), I think that Ian Murdock's former company Progeny was on the right track with its component-based Debian derivative. As I remember it, the idea was to have a small base system, then have separate components for things like GNOME, Firefox, OpenOffice.org (now LibreOffice), etc.

Meanwhile, Ubuntu's split between main and universe/multiverse is a pretty good compromise. I wouldn't be disappointed if Ubuntu jettisoned universe and multiverse, the better to focus on having a solid main repository, and let a thousand small, focused repositories pick up the slack. As long as all of those repositories leave the packages in main alone, as EPEL does with Red Hat-based systems.

malkia 13 years ago |

There is something to be said about the APIs themselves. For example, sqlite is backwards compatible (interface-wise), but then my recent worst example was the perforce (p4) client library. It uses C++ and the folks keep changing member variables in the exposed interfaces, forcing us to recompile.
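A small sketch of why changing member variables in an exposed interface forces a recompile. It uses `ctypes` structures to stand in for a C++ class layout; the struct and field names are hypothetical and have nothing to do with the actual p4 client library.

```python
import ctypes


class ClientV1(ctypes.Structure):
    # Layout that existing callers were compiled against.
    _fields_ = [("port", ctypes.c_int), ("timeout", ctypes.c_int)]


class ClientV2(ctypes.Structure):
    # A new member inserted in the middle shifts everything after it.
    _fields_ = [
        ("port", ctypes.c_int),
        ("retries", ctypes.c_int),
        ("timeout", ctypes.c_int),
    ]


# Code built against V1 still reads "timeout" at the old offset, which in
# V2 now holds "retries"; the object size has changed as well.
print("timeout offset:", ClientV1.timeout.offset, "->", ClientV2.timeout.offset)
print("object size:   ", ctypes.sizeof(ClientV1), "->", ctypes.sizeof(ClientV2))
```

A C-style interface that hides its data behind opaque handles and functions avoids this, since callers never bake member offsets into their own binaries.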

spullara 13 years ago |

The real issue with bundling software is that you can't pull in a security patch. You actually have the same issue with internal packages at large companies. If you can stay on the current release you can drastically reduce the effect of security bugs.

mwcampbell 13 years ago |

I wonder which packagers the author is griping about primarily. I don't see Riak in Debian.

jeremiahjordan 13 years ago |

This is the reason our company creates their own packages and runs our own repositories.

grigio 13 years ago |

I think the right approach is packages for the operating system layer and bundles for the applications.

It isn't very smart that a user must be root to install a GUI.

MostAwesomeDude 13 years ago |

If only Erlang had versioning in modules like other languages do. Modules are hard, most languages get them wrong, and this should be fixed, but you shouldn't blame packagers.

mononcqc 13 years ago | |

Erlang applications get versions, and you can specify them when you build releases, which are Erlang's way of packaging self-executables, through a mechanism that is platform independent.

The thing is, Erlang assumes that things work from said releases, and find the newest available applications in their library path. This makes sense because it is entirely possible for an Erlang application that was upgraded without ever being shut down to want to roll back to older versions.

When this happens, this application has a path with all the libraries and dependencies it ever needed and can rollback to an older one (without shutting down), or start fresh from the newest one automatically.

Other metadata may be added by each release as required.
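A rough sketch of the "newest available application in the library path" idea, in Python rather than Erlang. It assumes library directories named `<app>-<version>` with dotted numeric versions; the directory names are made up, and this is not how the Erlang runtime is actually implemented.

```python
from typing import List, Tuple


def parse_app_dir(dirname: str) -> Tuple[str, Tuple[int, ...]]:
    # "lager-2.0.0" -> ("lager", (2, 0, 0)); split on the last hyphen.
    app, _, version = dirname.rpartition("-")
    return app, tuple(int(part) for part in version.split("."))


def newest(app: str, lib_dirs: List[str]) -> str:
    # Pick the highest version of one application among all installed copies.
    candidates = [d for d in lib_dirs if parse_app_dir(d)[0] == app]
    return max(candidates, key=lambda d: parse_app_dir(d)[1])


# Hypothetical library path: old versions stay installed next to new ones.
lib_dirs = ["stdlib-1.18", "stdlib-1.19.1", "lager-1.2.2", "lager-2.0.0"]

print(newest("lager", lib_dirs))
print(newest("stdlib", lib_dirs))
```

Because the older directories stay in place, rolling back amounts to selecting a specific earlier entry instead of the maximum. A packager who collapses these into a single system-wide copy removes that option.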

The thing is that experienced Erlang developers who write and ship products in Erlang will know this and try to build releases and packages that respect this. Then package managers will (often) undo it to fit whatever pattern they have in mind. They did it, for example, with Ubuntu, removing one of the test frameworks that is part of the standard library and setting it in a different package.

Users who tried the language for the first time couldn't run things that depended on the standard library because it was separated in many different packages.

nosequel 13 years ago | |

You are missing the point. Even if erlang did support versioning of modules the problem would still exist. Package maintainers arbitrarily break things up because they immediately see a dependency and think it needs to be a separate package. They do this completely ignoring the big picture of shipping solid / tested code.

asuffield 13 years ago | | |

On the contrary, the people maintaining packages in the distribution are firmly on the side of shipping solid, tested code. That hacked up duplicate of a library that you copied into your source tree? It does not have one hundredth of the testing that has been applied to the version of the library which every other package on the system uses. You have to think about the system as a whole, not assume it is a bootloader for one application.

mst 13 years ago | |

Erlang releases appear to be their solution to this.

I fail to understand how, exactly, packagers' choice to not use the release is erlang's (or riak's) fault.