Ask HN: What open source project, in your opinion, has the highest code quality? |
Here's bitcoind: https://www.cvedetails.com/product/22744/Bitcoin-Bitcoind.ht...
[Can't speak for the 'highest' part of the qn, but Gensim upholds very high code quality standards]
https://github.com/withspectrum/spectrum https://spectrum.chat
It is not yet on Github.
http://pari.math.u-bordeaux.fr/git/pari.git
I prefer the early versions, before it was softened up for the vulgo.
https://boringssl.googlesource.com/boringssl
If some portion of the library is overly complex, look into the use case and delete it wherever possible. It maintains a long-term bound on code complexity, which I quite like.
Edit: a nice explanation on the design philosophy here https://www.imperialviolet.org/2015/10/17/boringssl.html
1. Elegant structure
2. Strict code style
3. Project size is not too large
4. Detailed documentation
The only thing I can say is that with this in mind it's actually a lot better than I'd expect - testament to Linus's iron fist, perhaps.
Software metric: https://en.wikipedia.org/wiki/Software_metric
''' Common software measurements include:
- Balanced scorecard
- Bugs per line of code
- Code coverage
- Cohesion
- Comment density
- Connascent software components
- Constructive Cost Model
- Coupling
- Cyclomatic complexity (McCabe's complexity)
- DSQI (design structure quality index)
- Function Points and Automated Function Points, an Object Management Group standard
- Halstead Complexity
- Instruction path length
- Maintainability index
- Number of classes and interfaces
- Number of lines of code
- Number of lines of customer requirements
- Program execution time
- Program load time
- Program size (binary)
- Weighted Micro Function Points
- CISQ automated quality characteristics measures
'''
Category:Software metrics https://en.wikipedia.org/wiki/Category:Software_metrics
In most cases "Hello World" is open source, but I still don't know if it can be called a "project".
What value is there in being the project with the highest code quality in the world? How can a consensus exist if we can't even agree on best practices?
If it's for learning purposes, why even look for the ONE project with the HIGHEST quality? Just go with any GOOD ENOUGH project.
I see this all the time: what's the best editor, the best color scheme, the best font, etc.
How about we just start saying: what's a good enough X for my purpose?
I'd actually considered making a similar comment on seeing the question.
This is great for open source, because you can easily discover and navigate to the part you want, and change it. You might need to understand the plugin interface - or you might not. This flat architecture makes it easy for people to contribute, an important aspect of a successful open source project.
But it's not the ideal architecture for every project. In some cases, a cleverer, harder to understand approach is more elegant, shorter, more efficient, simpler.
Of course... one might argue that ease of understanding is more important than anything else.
I was amazed both by the simplicity of the architecture (a huge single event loop) and by the attention to code presentation and indentation.
Twisted is more timeless, more patterned, and more self-aware.
I can imagine Twisted's asyncio reactor becoming its default (and the Twisted flow control slowly declining in importance), but Twisted's protocols, control structures, and execution models becoming more popular.
Twisted has undergone a great resurgence in quality engineering since asyncio became more viable. This was surprising to me, but it is probably reasonably consistent with the historical influence of the standard library.
Overall, I think that Twisted is a great project; I almost always reach for it when my Python codebase becomes mature enough to need more thoughtful abstractions around network I/O.
For C: Process Hacker and some similar code that is designed like and written around Windows kernel APIs: https://github.com/processhacker/processhacker/blob/master/p...
For C++: Some of the Boost code, and stuff like it, such as P-Stade Oven: https://github.com/himura/p-stade/blob/master/pstade/pstade/...
For others: (need to look later, I forget)
    const createStore = (reducer) => {
      let state;
      let listeners = [];

      const getState = () => state;

      const dispatch = (action) => {
        state = reducer(state, action);
        // notify every subscriber after each state change
        listeners.forEach(listener => listener());
      };

      const subscribe = (listener) => {
        listeners.push(listener);
        // unsubscribe
        return () => {
          listeners = listeners.filter(l => l !== listener);
        };
      };

      // populate initial state
      dispatch({});

      return { getState, dispatch, subscribe };
    };
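For anyone who wants to poke at the store above, here is a minimal usage sketch. The `counter` reducer and its action types are made up for illustration and are not from the lesson; `createStore` is repeated only so the snippet runs on its own.

```javascript
// createStore repeated from the snippet above so this example is self-contained.
const createStore = (reducer) => {
  let state;
  let listeners = [];
  const getState = () => state;
  const dispatch = (action) => {
    state = reducer(state, action);
    listeners.forEach((listener) => listener());
  };
  const subscribe = (listener) => {
    listeners.push(listener);
    return () => {
      listeners = listeners.filter((l) => l !== listener);
    };
  };
  dispatch({}); // populate initial state
  return { getState, dispatch, subscribe };
};

// Hypothetical reducer (an assumption for this example): a plain counter.
const counter = (state = 0, action) => {
  switch (action.type) {
    case "INCREMENT":
      return state + 1;
    case "DECREMENT":
      return state - 1;
    default:
      return state;
  }
};

const store = createStore(counter);
const seen = [];

// Record every state the listener observes.
const unsubscribe = store.subscribe(() => seen.push(store.getState()));

store.dispatch({ type: "INCREMENT" });
store.dispatch({ type: "INCREMENT" });
unsubscribe(); // later dispatches are no longer recorded
store.dispatch({ type: "DECREMENT" });

console.log(seen, store.getState());
```

The listener sees the two increments; the decrement after `unsubscribe()` still changes the state but is not recorded.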
[0]: https://egghead.io/lessons/react-redux-implementing-store-fr...
https://github.com/google/leveldb/blob/master/table/table_bu...
https://github.com/ProseMirror/
It's not typical JS, but very good nonetheless.
Therefore the highest code quality is likely to be in projects where I do not have to go under the hood, e.g. the Chromium project where all contributors are vastly more educated and capable than myself.
Anyhow, while we await the publication of that book, John has been working at Bloomberg. Some of the code written there has been published on GitHub [1]. He has also done a five-hour lecture series [2], available on Safari Books Online (paid service), that covers the topics of his book and introduces the open source Bloomberg repo as an example of code written in that style.
I can't offer you a review as I've just found this all myself, but I'll be eagerly studying it along with some of the other items mentioned here.
[1] https://github.com/bloomberg/bde [2] https://www.safaribooksonline.com/videos/large-scale-c-livel...
Best source code layout, architecture, maintainability: https://github.com/rg3/youtube-dl
I'd also say GHC is quite good.
And Pandoc as well.
I don't think I can compute enough variables to consider the "highest" though... so the aforementioned are only examples of what I think are good.
I hold up the Redis codebase as an example of what good C code should be, and the OpenCV codebase as an example of what C code should not be. The OpenCV codebase is really inconsistent, with quite a bit of unreadable spaghetti sauce.
While I think it has some clear deficiencies, I found a lot of e.g. the optimization passes in GCC a lot easier to read. It's probably above par, but e.g. https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-ssa... is really well explained imo.
Analyzer TheAnalyzer;
but more commonly:
Analyzer A;
with A being utterly unhelpful to read many lines later.
I have heard good things about sqlite, and some day, I plan to read it :-)
That seems like it'd be terrible to try to get running reliably.
e.g. The most recent https://dolphin-emu.org/blog/2018/09/01/dolphin-progress-rep...
I consider the Python stdlib in a similar vein as the C++ stdlib or Boost: Yes, some useful bits in there, but (1) lots of rot (2) you don't want to have your code look anything like it.
The authors have good quality repos :
Hands down Symfony.
TeX (plain TeX, not LaTeX) has phenomenally good logging and error messages IMO — everything you need is there, each error message comes in a “formal” and “informal” form and points you to exactly the place the error happened, and TeX lets you fix things on-the-fly without restarting the program. All this of course assumes you use TeX the way it is described in the manual (The TeXbook). The experience is opposite with LaTeX, so I find it worth giving up all the convenience of LaTeX just for the wonderful experience with TeX.
As for “the TeX language”, there is no such thing. As Knuth has said many times, TeX is designed for typesetting, not programming. Sure it has macros to save some typing, but if you're writing elaborate programs in it (as is nearly inevitable if you're using LaTeX) you're doing something wrong. Knuth said:
> When I put in the calculation of prime numbers into the TeX manual I was not thinking of this as the way to use TeX. I was thinking, “Oh, by the way, look at this: dogs can stand on their hind legs and TeX can calculate prime numbers.”
But of course LaTeX does every such thing imaginable :-)
More on TeX not being a programming language: https://cstheory.stackexchange.com/a/40282/115
On the TeX error experience: https://news.ycombinator.com/item?id=15734980
https://github.com/appleseedhq/appleseed
Even though I can't code C++, I can read it here and understand most of it (besides the maths).
Very well organized code and it feels like they got the project off the ground, fixed bugs for a few months, and now have largely trailed off from maintaining it largely because it just works (I use it) which lends some credibility to their coding style. Of course, I'd like to see the project evolve conceptually, but, right now it does what it says it does reliably for a project that hasn't even cut a single release.
https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf....
ah yes, good quality code
Also big fan of Sidekiq for similar reasons.
..."virgin" TeX...knows just primitive commands, no macros. Plain TeX is the set of macros (developed by Knuth) which makes TeX usable in everyday life of a typist. ... The available commands can be classified into primitive commands and macros. ... The "virgin" TeX knows only the primitive commands. ... Formats (plain TeX, LaTeX, etc.) extend TeX's vocabulary by defining macros. ...For example, plain TeX defines macros \item, \rm, \newdimen, \loop, etc. Plain TeX defines about 600 macros.
https://tex.stackexchange.com/questions/97520/what-is-plain-...
Knuth wrote both the TeX program and the “plain” set of macros; when you start `tex` it is with `plain` that it starts up, and The TeXbook describes both the TeX program and the plain format without being careful to distinguish what comes from where (you have to look at Appendix B to see the proper definition of plain.tex), so when we speak of TeX as Knuth intended/imagined it to be used, it is plain TeX that is meant.
At the same time, the apparent simplicity should not be mistaken for lack of effort; on the contrary, I feel every line oozes with purpose, practicality, and to-the-point-ness, like a well sharpened knife, or a great piece of art where it's not about that you cannot add more, but that you cannot remove more.
Kubernetes is another great example of a project that is so unbelievably complex in its function, it should be completely impenetrable to anyone who isn't a language expert. But go check it out; it's certainly complex and huge, but actually grokkable.
[1]: https://medium.com/@arschles/go-experience-report-generics-i...
In short, your comment is wrong from beginning to end. What led you to believe that anything in it was true?
Reading this brought to mind the JDK. All well structured, neatly formatted and well documented. I’ll often just click thru to the source to get the nitty-gritty on a function, I rarely need to consult the actual docs!
For me, code readability is such a high value that, on these grounds alone, I oppose the introduction of generics and hope the current proposals ultimately fail.
and for this reason alone!
https://www.sqlite.org/testing.html
As of version 3.23.0 (2018-04-02), the SQLite library consists of approximately
128.9 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in
other words, lines of code excluding blank lines and comments.)
By comparison, the project has 711 times as much test code and test scripts -
91772.0 KSLOC.
Why? I was able to do substantial changes to the kernel when I was a teenager (late 90s), mostly on my first try. There was no giant wall of abstraction I had to climb over or some huge swath of mutually interacting code I had to comprehend. There was also nothing that required fancy code navigation and the creation of something like the ctags database in order to find out what on earth was happening.
No action at a distance or lasagna style dereferencing or mysterious type names that are just typedef'd and #define'd around dozens of times back to something basic like char. No fancy obscure GNU preprocessor extensions or exotic programming patterns.
Nothing had obtuse documentation that tried my patience or required much more than enthusiasm and basic C knowledge.
I did things like getting a wireless card working from code written for one with a similar chipset, and getting various other things, like the IrDA transmitter on my laptop at the time, to do a slattach and thus work as a primitive wireless network, all in the late 90s.
I likely had no idea what, say, the difference between network byte order and host byte order was at the time or how the 802.11b protocol worked or what a radiotap header was or any of that. The separation of concerns was so good however, that none of that knowledge was actually needed.
Compare that to say, the Qualcomm compatible WWAN I just dealt with over the past few weeks where I needed to have in-depth knowledge of an exhaustive number of things (very specific chipset and network details) to get a basic ipv4 address working. Then I needed to read up on GNSS technology and NMEA data to debug codes over USBmon to get the GPS from the wwan working. Then after I had the qmi kernel modules doing what I wanted and the qmi userland toolsets, I had to write some python scripts to talk to dbus to get the data from the modemmanager that I needed in order to log the GPS. All the maintainers of these pieces were very nice and helpful and I have nothing negative to say. This is just how it usually is these days.
Back then, however, I wasn't a good programmer. I was likely pretty terrible, in fact, but with the NetBSD codebase I was able to knock out whatever I wanted every time, fast, on a 486.
I miss those days.
The performance reached by the game was considered impossible until Carmack showed us otherwise. So I expected lots of ASM and weird hacks, especially as compiler optimization wasn't as good as it is today.
Surprise, surprise, the thing was easy to read, easy to get going, easy to port, and reasonably documented. It showed me what a good balance between nice code and usable code looks like.
If you want to browse: https://github.com/id-Software/DOOM
https://www.merriam-webster.com/dictionary/surprise,%20surpr...
I can speak English! I learn it from a book!
Though not always very good. Thanks for the bugfix.
Many thanks to Bernd Kreimeier for taking the time to clean up the
project and make sure that it actually works. Projects tends to rot if
you leave it alone for a few years, and it takes effort for someone to
deal with it again.
On the more serious side, I wanted to say something about the TODOs as an example of the balance, but couldn't find any. I thought I was confusing it with Quake, but the cleanup might explain it better.
+1
Thanks for uploading RCP100. Your comment is a timely one. I wanted to learn how a router works and is built and was looking for a simpler implementation.
Can you recommend any resources from which I could learn more about network programming, so that I could understand RCP100 code better?
Thanks!
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
It is quite clean when you consider the task that it accomplishes.
Being able to compile across multiple architectures, endiannesses, and 32/64-bit, and to scale up and down from server to desktop to router to phone, while accepting contributions from thousands of people...
Want a clean kernel, go look at the BSDs.
(Sadly, most of those good kernel books were written in the 90's and early 2000's. I don't know if there are any recent kernel hacking books.)
- The tooling is excellent.
- The code is well-documented and readable.
- The core team committed to never needing to introduce breaking changes.
The Elixir community tends to produce work that is actually considered "Done". An Elixir package is not stale when it hasn't seen a commit in a few months. Instead, the feeling is: "It's feature complete and only needs maintenance from here on out."
Is this why Elixir seems to have many different ways of doing the same thing though?
For example their Cross Validation documentation is amazing:
http://scikit-learn.org/stable/modules/cross_validation.html...
Often codebases written in C are a mess to understand and a mess to read. The Redis source code is understandable even without deep knowledge of C.
Came here to say exactly this - Redis is very cleanly written.
C: Redis, SQLite, Lua.
Java: Joda Time, Guava
After struggling with JVM stdlib time nonsense, JodaTime was a breath of fresh air and actually made programming with time fun.
[1] https://github.com/ARMmbed/mbedtls
However, I opt for jQuery here. It is one of the greatest examples of how constant refactoring and thoughtful use of design patterns get you a very long way.
If you are designing JavaScript libraries, please have a look at jQuery. So many great design decisions, and great code quality as a result.
The quality of the code is amazing, it's simple to use and even simpler to look through the docs to reason about.
I also want to praise the author of the library (Jeremy Evans), his support through the IRC is second to none, you can talk directly with him pretty much on a daily basis.
And even after 8+ years, the project is still constantly being updated (last commit 4 days ago). I haven't seen too many projects of this calibre, especially ones run mostly by a single person.
But as I said, I have not really checked the validity of this claim myself...
But anyways, I was talking about the Julia Base library and its numerical routines. I just look at the Julia code and don't touch the build systems.
One look at any of the assembly files and you can get a sense of how properly organized the source code is.
djb is a legend
1. Don't simply list projects.
2. Give some notion of why you're nominating code.
3. A sense of what you consider to be quality.
Enough to spark discussion, inquiry, or comparison. Doesn't have to be much.
This is rudimentary, but affords purchase: https://news.ycombinator.com/item?id=18037815
This does not: https://news.ycombinator.com/item?id=18038047
(Both reference the same project.)
Put another way, there's more to a product that's easy and sensible to work with than code quality.
Projects written in C require a fair amount of care and discipline to be scaled up to larger codebases and teams. PostgreSQL is such a codebase.
I've also seen various parts of Spring's codebase and found all of it to be consistently solid and careful. They take a lot of care to structure carefully and comment immaculately.
Disclosure: I work for Pivotal, which sponsors Spring. Which is why Spring is highly visible in my working life.
Even though it’s a fairly complex transpiler, the authors did a good job modularizing and leaving lots of contextual comments on what each part does.
Also typescript baseline tests are a simple but very effective way to get lots of coverage on the compiler.
I’ve read source code for Babel, typescript, coffeescript and flow. Typescript architecture stands out.
Typescript not only does fascinating things like magical code completion abilities and great tooling for IDEs but their codebase has been an inspiration for me to build better front end code.
I may be a bit biased since I’ve worked at Microsoft before.
Testing code is code which needs to be written, read, maintained, refactored. Very often nowadays I have to wade through tests which test nothing useful, except syntax. Even worse, with developers who adopt the mock-everything approach, I often find tests which only verify that the implementation is exactly the one they wrote, which is even worse: it makes refactoring a pain, because, even if you rewrote a method in a better way which produces exactly the results you wanted, the test will fail.
So, the ratio of testing code vs implementation code is a completely wrong proxy for code quality.
EDIT: I'm not criticising SQLite and their code quality, which I never studied, but the idea that you can judge code quality for a project just by the ratio of test code vs implementation code.
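To make the mock-everything complaint concrete, here is a small sketch. All names are hypothetical, and `makeSpy` is a hand-rolled stand-in for a mocking library. Both implementations return the same sum, but the implementation-coupled check only passes for the one that happens to call the injected helper once per item.

```javascript
// Two hypothetical implementations with the same observable behaviour.
const sumWithLoop = (items, add) => {
  let total = 0;
  for (const item of items) total = add(total, item);
  return total;
};
// A refactor that no longer needs the injected helper at all.
const sumWithReduce = (items) => items.reduce((total, item) => total + item, 0);

// Hand-rolled spy: wraps a function and counts how often it is called.
const makeSpy = (fn) => {
  const spy = (...args) => {
    spy.calls++;
    return fn(...args);
  };
  spy.calls = 0;
  return spy;
};

// Implementation-coupled check: passes only if `add` is called once per item.
const implementationTest = (sum) => {
  const add = makeSpy((a, b) => a + b);
  sum([1, 2, 3], add);
  return add.calls === 3;
};

// Behaviour check: only the returned result matters.
const behaviourTest = (sum) => sum([1, 2, 3], (a, b) => a + b) === 6;

console.log(implementationTest(sumWithLoop), behaviourTest(sumWithLoop)); // true true
console.log(implementationTest(sumWithReduce), behaviourTest(sumWithReduce)); // false true
```

The refactored version is correct, yet the spy-based check fails, which is the refactoring pain described above.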
Dr. Hipp said he started really following it when Android came out and included SQLite and suddenly there were 200M mobile SQLite users finding edge cases: https://youtu.be/Jib2AmRb_rk?t=3413
Lightly edited transcript here:
> It made a huge difference. That that was when Android was just kicking off. In fact Android might not have been publicly announced, but we had been called in to help with getting Android going with SQLite. [Actually], they had been publicly announced and there were a bunch of Android phones out and we were getting flooded with problems coming in from Android.
> I mean it worked great in the lab it worked great in all the testing and then [...] you give it to 200 million people and let them start clicking on their phone all day and suddenly bugs come up. And this is a big problem for us.
> So I started doing following this DO-178b process and it took a good solid year to get us there. Good solid year of 12 hour days, six days a week, I mean we really really pushed but we got it there. And you know, once we got SQLite to the point where it was at that DO-178b level, standard, we still get bugs but you know they're very manageable. They're infrequent and they don't affect nearly as many people.
> So it's been a huge huge thing. If you're writing an application deal ones, you know a website, a DO-178b/a is way overkill, okay? It's just because it's very expensive and very time-consuming, but if you're running an infrastructure thing like SQL, it's the only way to do it.
[0]: https://youtu.be/Jib2AmRb_rk?t=677 "SQLite: The Database at the Edge of the Network with Dr. Richard Hipp"
One suite I'm particularly impressed with will run tests from zero bytes with slowly increasing available memory until the program passes. The tests verify that at no point the DB is corrupted by an OOM event.
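A rough sketch of that idea, to show the shape of the loop. This is not SQLite's actual harness: `BudgetAllocator`, `TinyDb` and `runWithIncreasingMemory` are hypothetical names, and "memory" here is just a counter that throws when it runs out.

```javascript
// Hypothetical allocator: throws once the budget is exhausted.
class BudgetAllocator {
  constructor(budget) {
    this.remaining = budget;
  }
  alloc(units) {
    if (units > this.remaining) throw new Error("out of memory");
    this.remaining -= units;
  }
}

// Hypothetical store: stages new rows off to the side and commits only
// after every allocation has succeeded.
class TinyDb {
  constructor(rows) {
    this.rows = rows;
  }
  insertAll(values, allocator) {
    const staged = [];
    for (const value of values) {
      allocator.alloc(1); // each row "costs" one unit
      staged.push(value);
    }
    this.rows = this.rows.concat(staged);
  }
}

// Re-run the same operation with budget 0, 1, 2, ... until it succeeds.
// After every failed attempt, verify the data is exactly what it was before.
function runWithIncreasingMemory(values, maxBudget = 1000) {
  for (let budget = 0; budget <= maxBudget; budget++) {
    const db = new TinyDb(["seed"]);
    try {
      db.insertAll(values, new BudgetAllocator(budget));
      return { budget, rows: db.rows };
    } catch (err) {
      if (db.rows.length !== 1 || db.rows[0] !== "seed") {
        throw new Error(`data corrupted at budget ${budget}`);
      }
    }
  }
  throw new Error("operation never succeeded within maxBudget");
}

const result = runWithIncreasingMemory(["a", "b", "c"]);
console.log(result.budget, result.rows);
```

With three rows at one unit each, the first three budgets fail cleanly and the run succeeds at a budget of 3.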
Some of the test code I've encountered recently has been more voluminous, complex and has taken more man hours to develop and maintain than the application or library it's assigned to.
For the love of God, develop the damned software! It's either going to work or it's not.
https://www.sqlite.org/malloc.html
"SQLite can be configured so that, subject to certain usage constraints detailed below, it is guaranteed to never fail a memory allocation or fragment the heap."
Is there a distinction between the best codebase and the best test suite? Probably.
Again I mostly admire OpenBSD. But OpenNTPD is not the best example of their work.
Maybe I'll check that at home (where I replaced FreeBSD's ntpd with OpenNTPd).
Once had to make some changes to OpenSSH for an internal project and it was surprisingly easy to find the relevant code and make the necessary changes. One of the few times my code worked on the first compile.
https://www.openbsd.org/faq/pf/perf.html
Ah, I see you've also looked at the Linux kernel code.
I don't really use it these days because I need systems that future cheap devs can maintain and once you enter userland it takes commitment and time I simply don't have to stay with netbsd.
Debian permits me to usually not have to care and that's pretty invaluable
To expound on that, designing for testability allows you to sidestep the need for mocks almost entirely, and forces you into easier, more reliable and more consistent code. Then when you choose to test it, the tests are simple, straightforward and valuable.
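A tiny sketch of what that can look like, with made-up names (`isExpiredImplicit`, `isExpired`, and the `token` shape are assumptions for illustration). The first version reads the clock itself, so a test would have to patch or mock time; the second takes the current time as an ordinary argument.

```javascript
// Harder to test: depends on the global clock internally.
const isExpiredImplicit = (token) => Date.now() >= token.expiresAt;

// Designed for testability: the current time is just another input.
const isExpired = (token, now) => now >= token.expiresAt;

const token = { expiresAt: 1000 };

// No mocks needed: pass whatever "now" the test case calls for.
console.log(isExpired(token, 999), isExpired(token, 1000)); // false true

// Production code supplies the real clock at the edge.
console.log(isExpired(token, Date.now()) === isExpiredImplicit(token));
```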
There are better NTP implementations now. Chrony is great, it's the default in Ubuntu now. NTPsec is coming along although I haven't tried to use it myself. Also good ol' ntpd is greatly improved.
As a heuristic the code versus test-code ratio serves well as an indicator of quality. Just like consistent indentation does. You don't know whether a well-indented program is good. But if the indentation is inconsistent you'll expect worse.
Essentially, git is designed for the "bazaar" development model, while Fossil is designed for the "cathedral" one, which is what SQLite uses, being developed by just three guys working very closely together.
But that is perfectly normal when developing quality code.
There is no rule that says that test code development should take LESS time.
Certainly different applications have different quality requirements. Perhaps the software you are developing doesn't have that high requirements?
I just don't have a high tolerance for needless complexity and gee-whiz-look-what-I-can-do while the clock is running.
:
A single "noop" in a 755 file.
A C true would be: https://cvsweb.openbsd.org/src/usr.bin/true/true.c?rev=1.1&c...
Here's a much faster true(1) if you need it: https://github.com/coreutils/coreutils/blob/master/src/true....
I don’t see how those examples are relevant. Why would that last one be faster?
I agree that the OpenBSD code here is good, no more and no less than needed.
I assumed the grandparent was referring to cases where an O(n) algorithm is used where it might be O(log n) or O(1) with just a little more effort. It’s a tradeoff, sure, and in some cases linear searches can work surprisingly well, but in general I think this kind of thing should always be considered in good code.
Micro-optimizations like inline assembly for inner loops may or may not be a good idea, depending on the application. All else being equal, I’d certainly agree that good clean code would not use assembly.
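For the O(n) versus O(log n) point above, a minimal sketch that counts comparisons on a sorted array. The function names and the test data are mine, not from any project mentioned in the thread.

```javascript
// Linear scan: up to n comparisons.
function linearSearch(sorted, target) {
  let steps = 0;
  for (let i = 0; i < sorted.length; i++) {
    steps++;
    if (sorted[i] === target) return { index: i, steps };
  }
  return { index: -1, steps };
}

// Binary search: halves the remaining range on every step.
function binarySearch(sorted, target) {
  let low = 0;
  let high = sorted.length - 1;
  let steps = 0;
  while (low <= high) {
    steps++;
    const mid = low + Math.floor((high - low) / 2);
    if (sorted[mid] === target) return { index: mid, steps };
    if (sorted[mid] < target) low = mid + 1;
    else high = mid - 1;
  }
  return { index: -1, steps };
}

// 1024 sorted even numbers; look for the last one (worst case for the scan).
const data = Array.from({ length: 1024 }, (_, i) => i * 2);
const target = data[data.length - 1];

const linear = linearSearch(data, target);
const binary = binarySearch(data, target);
console.log(linear.steps, binary.steps);
```

The scan needs one comparison per element, while the binary search needs at most 11 steps for 1024 elements. For small arrays the scan can still win in practice, which is the tradeoff mentioned above.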
I would expect the OpenBSD true to be the fastest: it doesn't need to spawn a subshell and it doesn't do more than the POSIX specification requires (afaik --help/--version should be ignored).
They say on their site that:
> Airbus confirms that SQLite is being used in the flight software for the A350 XWB family of aircraft.
Flight software does not imply safety critical parts of avionics. It can be the entertainment system or some logging that is not critical.
While all that was happening 10+ years ago, I learned about DO-178B. I have a copy of the DO-178B spec within arm's reach. And I found that, unlike most other "quality" standards I have encountered, DO-178B is actually useful for improving quality.
I originally developed the TH3 test suite for SQLite with the idea that I could sell it to companies interested in using SQLite in safety-critical applications, and thereby help pay for the open-source side of SQLite. That plan didn't work out as nobody ever bought it. But TH3 and the discipline of 100% MC/DC testing was and continues to be enormously helpful in keeping bugs out of SQLite, and so TH3 and all the other DO-178B-inspired testing and refactoring of SQLite has turned out to be well worth the thousands of hours of effort invested.
The SQLite project is not 100% DO-178B compliant. We have gotten slack on some of the more mundane paperwork aspects. Also, we aggressively optimize the SQLite code base for performance, whereas in a real safety-critical application the focus would be on extreme simplicity at the cost of reduced performance.
However, if some company does call us tomorrow and says that they want to purchase a complete set of DO-178B/C Level-A certification artifacts from us, I think we could deliver that with a few months of focused effort.
Hipp's Hwaci consulting company would probably help do the work, but it has no relation to SQLite as a library.
I didn't know that, and that's very cool.
Makes me think that the name SQLite is a misnomer.
Recently, it was also shown that Timsort doesn't optimally use the information it has about runs. As an alternative, powersort was proposed, which seems to outperform Timsort both on randomly ordered inputs as well as inputs with long runs: https://arxiv.org/pdf/1805.04154.pdf
Didn't one of the simplest algorithms, binary search, suffer from a bug in a standard library (was it Java?) a few years ago? IIRC it was a corner case. I should check because I don't recall the details, but the code looked robust.
Edit: I think it's this one https://ai.googleblog.com/2006/06/extra-extra-read-all-about...
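For reference, the corner case in that post is the midpoint calculation: with 32-bit ints, `low + high` can overflow for very large arrays. A small sketch of the effect follows. JavaScript numbers do not overflow like Java's `int`, so `| 0` is used here to emulate 32-bit wraparound, and the function names are mine.

```javascript
// Emulates Java's `(low + high) / 2` with int arithmetic:
// `| 0` wraps the sum to a signed 32-bit integer, Math.trunc mimics int division.
const brokenMid = (low, high) => Math.trunc(((low + high) | 0) / 2);

// Fix 1: never form the large sum.
const safeMid = (low, high) => low + Math.trunc((high - low) / 2);

// Fix 2: unsigned right shift treats the sum as an unsigned 32-bit value.
const shiftMid = (low, high) => (low + high) >>> 1;

// Indices that only occur in arrays with more than 2^30 elements.
const low = 2 ** 30;
const high = 2 ** 30 + 2;

console.log(brokenMid(low, high)); // negative: the sum wrapped around
console.log(safeMid(low, high), shiftMid(low, high)); // both 2^30 + 1
```

For small arrays all three give the same answer, which is why the bug went unnoticed for so long.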
Edit: The article linked in the other comment says the Java dev team didn't even bother to implement the "proper" fix, but merely adjusted how much space is allocated.
time { for i in $(seq 1 10000); do /path/to/true; done; }