Sophia – An embeddable key-value database (sphia.org)
Sophia is a modern [should add comma] embeddable key-value database designed for a high [should hyphenate] load environment.
It has a unique architecture that was born as a result of research and rethinking primary alghorithmical [sic, should say "algorithm"] constraints associated with a [sic] getting popular Log-file based data structures, such as LSM-tree [should say "trees"], it's [sic] variations based on Fractional Cascading ideas and a B-Tree. (see architecture) [run-on, meaning unclear]
It is very fast. (see benchmarks)
it [sic] is easy to use. (see documentation)
Implemented as a small C [should probably hyphenate, or just rewrite] written, BSD [should probably hyphenate] licensed library.
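The architecture blurb above mentions log-structured designs such as LSM-trees. As a rough illustration of that family only (this is not Sophia's design, which the thread does not describe in detail), here is a toy Python sketch with hypothetical names. Writes land in an in-memory table, full tables are flushed as sorted immutable runs, reads check the newest data first, and compaction merges the runs. Deletes (tombstones), the write-ahead log, and actual on-disk storage are deliberately left out.

```python
from bisect import bisect_left


class TinyLSM:
    """Toy log-structured merge store (hypothetical, in-memory only)."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}            # recent writes, mutable
        self.runs = []                # flushed (keys, values) runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # On disk this would be one sequential write of a sorted file,
        # instead of many random in-place page updates as in a B-tree.
        items = sorted(self.memtable.items())
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newer runs shadow older ones, so search from newest to oldest.
        for keys, values in reversed(self.runs):
            i = bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def compact(self):
        # Merge all runs oldest-to-newest so newer values overwrite older ones.
        merged = {}
        for keys, values in self.runs:
            merged.update(zip(keys, values))
        items = sorted(merged.items())
        self.runs = [([k for k, _ in items], [v for _, v in items])] if items else []
```

Reads get slower as runs pile up, which is why compaction (and, in real systems, per-run indexes or Bloom filters) matters; the write path stays sequential.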
English is not an easy second language to learn as an adult, and technical English doubly so.
To cite just one example, Redis is widely considered excellent software, but much of its documentation is written in very imperfect English.
Next time (by the way, you could start with a capital letter), before you abuse your keyboard, better go take prep courses in Russian or Chinese, or at least in English.
It's not hard at all to leave garbage comments that, by the way, have nothing to do with the topic of conversation, and call yourself a Viking coder. It's much harder to live up to your name and "viking" the architecture, the code, the thoughts, the philosophy of creation.
If the Internet and/or Hacker News had a common trash bin, and you could vote on the things that get sent there, you could count on my vote.
Hey VikingCoder, I heard you're a fan of grammar? See above? Grammar that.
I know my spoken and written English is far from perfect, but I can live with it.
"Algorithm" isn't an adjective. I accept "algorithmical", although it should probably be "algorithmic".
document.getElementById("main").style.fontFamily = "sans-serif";
What really concerns me is why the benchmarks are never run against an already-filled database (say, 14G, 28G, or 60G). "Add 100k random keys into an empty database" is very different from "add 100k random keys into a large database", and that is where more novel algorithms start to shine.
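A sketch of the kind of measurement the comment asks for: time the same batch of random inserts once against an empty store and once after a large, untimed preload. Sophia and LevelDB have no standard-library Python bindings, so SQLite (via the standard sqlite3 module) stands in as a placeholder store. The function names, the 16-byte key / 100-byte value shape, and the single commit per batch are assumptions for illustration.

```python
import os
import random
import sqlite3
import tempfile
import time


def random_record(rng):
    # Hypothetical record shape: 16-byte random key, 100-byte value.
    return rng.getrandbits(128).to_bytes(16, "big"), b"x" * 100


def insert_rate(prefill, batch, seed=0):
    """Inserts/s for `batch` random keys after an untimed preload of `prefill` keys."""
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        try:
            conn.execute("CREATE TABLE kv (k BLOB PRIMARY KEY, v BLOB)")
            # Preload phase: fills the database, not timed.
            conn.executemany(
                "INSERT OR REPLACE INTO kv VALUES (?, ?)",
                (random_record(rng) for _ in range(prefill)),
            )
            conn.commit()
            # Timed phase: identical workload, now against a non-empty database.
            start = time.perf_counter()
            conn.executemany(
                "INSERT OR REPLACE INTO kv VALUES (?, ?)",
                (random_record(rng) for _ in range(batch)),
            )
            conn.commit()
            elapsed = time.perf_counter() - start
        finally:
            conn.close()
    return batch / elapsed
```

To see the effect the comment describes, the preload has to be large enough that the working set no longer fits in RAM; comparing insert_rate(0, 100_000) with a run preloaded with tens of millions of keys is the shape of the test, not a result.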
Yes, the read speed of LevelDB (and, I assume, Sophia) with its fancy SSTs is lower than that of plain old B-trees or hash tables (kctree/kchash), but it is still high enough for most tasks. The write performance of kc* (and of B-tree-based libraries in general), however, is unacceptable, at least on hard drives: even with a reasonably sized database (~90% of RAM), writes degrade to random I/O, so throughput is capped by the drive's random IOPS (200-300 writes per second on a consumer-grade HDD, or up to 1000 on two 10k RPM SAS HDDs in RAID 0, if I remember correctly).
It may be reasonable to use kc* on an SSD, but I have not tested that.
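The arithmetic behind the comment can be written down as two back-of-the-envelope bounds. This is a hedged sketch with hypothetical function names. It assumes each cache-missing B-tree insert costs a fixed number of random I/Os, and that a log-structured store writes sequentially but rewrites each byte several times during compaction (write amplification). The inputs are parameters to fill in from your own hardware, not measurements.

```python
def btree_insert_rate(random_iops, ios_per_insert):
    """Upper bound on inserts/s when each insert misses the page cache and
    costs `ios_per_insert` random disk I/Os (assumption: no write batching)."""
    return random_iops / ios_per_insert


def log_insert_rate(seq_bytes_per_s, record_bytes, write_amplification):
    """Upper bound on inserts/s for an append-only (log-structured) design:
    writes are sequential, but compaction writes each byte
    `write_amplification` times in total (assumption: reads and seeks ignored)."""
    return seq_bytes_per_s / (record_bytes * write_amplification)
```

With a hypothetical drive doing 250 random IOPS and one random I/O per insert, btree_insert_rate(250, 1) returns 250.0, in the range the comment quotes. The log-structured bound divides sequential bandwidth by bytes written instead of dividing IOPS by seeks, which is why such designs tend to win on spinning disks.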
https://groups.google.com/forum/#!topic/h2-database/Q8K-nbCh...
Concurrency:
- No mention of it. There appear to be spin locks in the source. No multi-threaded tests.
Stability and data safety:
- The GitHub repo has 2 days of history and 4 kLoC of test code. Why should I trust my data to you?
"high load environment":
- So what exactly does it do in a "high load environment"? How do you define "high load" in the first place? CPU load? I/O load from other processes? What shortcomings of the competition under a "high load environment" are you trying to overcome?
Backup:
- How do I do hot backup?
Benchmark:
- LevelDB is not a fair comparison, as it offers additional non-trivial functionality (snapshots) that cannot be built on top of Sophia. LevelDB APIs are also safe for concurrent use, which adds overhead. Kyoto Cabinet would have been a more suitable peer to benchmark against.
- 3 million records with 16-byte keys and 100-byte values is not really an interesting benchmark dataset.
- Iteration over a static database is not interesting, either. Is there any alternative other than locking an entire mutating database for the duration of iteration?
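On the last question: the usual alternative to locking is multi-version concurrency control, which is roughly how LevelDB-style snapshots work. Writers append new versions tagged with an increasing sequence number, and an iterator only returns versions at or below the sequence number it pinned when it started. The toy Python class below (hypothetical names, single-threaded, no deletes, no garbage collection of old versions) shows only that visibility rule, not how Sophia or LevelDB implement it.

```python
class VersionedStore:
    """Toy multi-version store: writes never overwrite, readers pin a sequence number."""

    def __init__(self):
        self.last_seq = 0
        self.versions = {}  # key -> list of (seq, value), oldest first

    def put(self, key, value):
        self.last_seq += 1
        self.versions.setdefault(key, []).append((self.last_seq, value))

    def snapshot(self):
        # A snapshot is just the newest sequence number handed out so far.
        return self.last_seq

    def iterate(self, snap):
        """Yield (key, value) in key order as the store looked at `snap`.

        Keys are listed once up front, and versions newer than `snap` are
        skipped, so writes made during iteration are invisible to it."""
        for key in sorted(self.versions):
            visible = [v for seq, v in self.versions[key] if seq <= snap]
            if visible:
                yield key, visible[-1]
```

The cost moves from lock contention to memory: old versions must be kept until no snapshot can see them, which a real engine reclaims during compaction.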
This is a non-trivial effort, and all people do is complain about the font face or whether the punctuation is right?
First, in the benchmarks it simply crushes LevelDB, which is already a great achievement by itself. Can you challenge the benchmarks? Have you run one yourself with a different configuration? No?
Second, if you are not a database expert who can offer proper criticism (constructive or not), just keep it to yourself. I wonder how so many people come up with all of these conclusions so fast, without a proper look at the source code and without taking a reasonable amount of time to know what they are talking about.
It's very hard to create things like this, but very easy to criticize without any background. Don't forget that.
If you have something to say about a small thing that has no direct relation to the product itself, and there's already one comment about it, that's enough! Don't spam by replying to it or by creating new comments about it; that is just so rude and disrespectful.
Really, things are getting creepy on HN, and it's not only in this thread.
If I'm being kind to you, HN commenters (myself included) should do a better job of commenting politely, and spend more effort making sure their criticism comes off as constructive rather than just whining and aggressive... ...but I think you make it sound like criticism of anything outside of the source code itself is creepy, rude, and disrespectful.
It doesn't appear to be in their primary repo.
I would like to run my own test against another embedded data store like Berkeley DB, but I want to know more about the test conditions: how many threads were used, that kind of thing.
1. Can multiple processes use the same database concurrently? (Separate processes with separate address spaces, not fork()ed.)
2. Have you tested this with uClibc or a cross-compiler? (I would like to use it on a MIPS embedded router.)
I ask because I recently had the displeasure of having to hack a non-volatile RAM library to work with shared memory and be thread-safe, and something small like this would be a perfect replacement with a lot less pain.
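For question 1, one common and conservative answer in embedded stores is to forbid concurrent multi-process access outright: LevelDB, for example, locks a file named LOCK inside the database directory. Whether Sophia does anything similar is not stated here. Below is a minimal POSIX-only sketch of that pattern in Python, with a hypothetical class name, using an advisory fcntl.flock lock; it only stops cooperating processes, and advisory locks may not be reliable on network filesystems.

```python
import fcntl
import os


class ExclusiveOpen:
    """Hypothetical guard: hold an advisory lock on <db_dir>/LOCK while in use."""

    def __init__(self, db_dir):
        os.makedirs(db_dir, exist_ok=True)
        self.path = os.path.join(db_dir, "LOCK")
        self.fd = None

    def __enter__(self):
        self.fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # LOCK_NB: fail immediately instead of blocking if someone else holds it.
            fcntl.flock(self.fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(self.fd)
            self.fd = None
            raise RuntimeError(f"{self.path} is locked by another user") from None
        return self

    def __exit__(self, *exc):
        fcntl.flock(self.fd, fcntl.LOCK_UN)
        os.close(self.fd)
        self.fd = None
        return False
```

flock treats each separate open() of the file independently, so a second ExclusiveOpen on the same directory fails even within one process; genuinely shared multi-process access would need coordination in shared memory or a server process instead.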
I would avoid using this for realzies, if only because trying to fix anything that broke in that code base would be prohibitive.
Memory isn't an adjective either, but the phrase "rethinking primary memory constraints" strikes me as perfectly valid. That's why I concluded that "algorithm" would be the clearest word. But yes, I think "algorithmic" would have been fine as well.
Since you're so good at rummaging through English grammar, and the lack of commas and hyphens annoys you so much, tell me how to better phrase and "proofread" this comment, or... does that same emptiness in one part of your grammatically sharpened brain (where, incidentally, you could put a hyphen) keep you from meeting the challenge? Next time (by the way, you could start with a capital letter), before you abuse your keyboard, better go take prep courses in Russian or Chinese, or at least in English. It's not hard at all to leave garbage comments that, by the way, have nothing to do with the topic of conversation, and call yourself a Viking coder. It's much harder to live up to your name and "viking" the architecture, the code, the thoughts, the philosophy of creation. If the Internet and/or Hacker News had a common trash bin, and you could vote on the things that get sent there, you could count on my vote.
Then, in English:
> ... see above? grammar that.
First time I've heard "grammar" used as a verb. :)
The author's primary (and perhaps only) contact with users is through their documentation. Incorrect capitalization and apostrophe use is distracting and will put off some potential users. The author looks sloppy and uncaring because these types of mistakes are preventable.
Being a non-native speaker is not an excuse for basic errors (excepting novices). Proof-read your docs! If you're not confident in your language skills, ask someone else to!
I doubt it is a lack of time. But native English-speaking acquaintances who can proof-read technical documentation can be very hard to come by. Most projects have to make do with the folks they have. (I am in need of such a person myself. If anyone has the time, Searchdaimon (https://github.com/searchdaimon/enterprise-search) could use some help :) )
One trick can be to at least wikify the documentation so others can pitch in more easily.
Instead of bashing someone in HN comments, you could send them an errata patch.
rgbrgb, I agree that it's superficial, but in direct contrast to your praise of the superficial style of the documentation, I was unfortunately distracted by the superficial mistakes in grammar and spelling. I wish someone would take the time to help the author of this code, Dmitry Simonenko, to improve the documentation. To that point, here are my hopefully constructive criticisms specifically of the language used in the documentation. I have performed no audit of the library itself, and cannot speak to its quality.
Does this still come off to you as poor form?
The problem with the docs isn't these small grammatical errors. In most of the cases you cite, the meaning is clear, so this is just pedantry.
What bothers me about the documentation is that the explanation of his algorithms is a bit opaque.
But all told, I'd much rather someone contribute than hold back because of concerns about language mastery.
But of course you want to make it clear that you feel no obligation to contribute despite your criticism. So it's lazy for him, but not for you?
Note that I comment exclusively on the author's presentation, not their personal behavior: obviously they are not lazy. I have no obligation to contribute, and no income from or personal interest in this project. I contribute to the projects that benefit me and the people I care about.