UUID, serial or identity columns for PostgreSQL auto-generated primary keys?(cybertec-postgresql.com) |
UUID, serial or identity columns for PostgreSQL auto-generated primary keys?(cybertec-postgresql.com) |
You know, you think that, but it's never that simple. The field was added incorrectly and nobody noticed until the value is in countless tables that you now need to simultaneously update or the value is something that's supposed to be semi-secret, so now a low level support staff can't reference the row when dealing with a request. Or the table's requirements change and now you need to track two different kinds of data or data that is missing the field.
Me, I always just have the table make its own ID. It is just simpler, even when you think it is overkill.
Except that I knew a coworker who had a duplicate ID. An extremely rare event, they messed up the pre-assignment and there is another dude somewhere with his same ID. So from time to time, some system would tell him that his ID was already registered. A lot of banks and stuff like private healthcare systems like to use the DNI as usernames.
He tried to get his ID changed, but that was such a foreign concept to any of the involved institutions, that he had to give up because there simply is no such procedure. I guess he could have taken it to court, but the guy decided to just live with it (the justice system is quite slow here).
For instance, a driver's license number is printed on the card itself, so a human sees it. Therefore, it's a natural key, just like a name.
When you decide that whatever natural keys already exist aren't good enough for your organization, and you make a new key, it's not good to think of that as a surrogate key. The number will make it out somehow (as a "record locator" in a customer support call or something), and eventually become a natural key.
It's best to just plan for any new key to be a natural key, which means using best practices for natural keys. That means it should be something reasonable to print, read, say, and hear; and it should also follow a pattern so it can be distinguished from other special numbers.
Auto-increment is a shortcut, but usually not great in the long term unless it's something that will be well-contained inside the database as an implementation detail (e.g. a join key designed to refer to rarely-accessed fields of a wide table).
> You know, you think that, but it's never that simple.
It’s that simple if you’re the Social Security Administration and its a table of Social Security Accounts, not people.
Other than that, using SSNs as a primary key is just plain wrong.
Nah, they keep track of duplicate usage: https://www.nbcnews.com/technolog/odds-someone-else-has-your...
> The IRS often knows when this happens, when the imposter pays taxes. The Social Security Administration knows, too, for the same reason. And the nation's credit bureaus usually know, because the imposter often ends up applying for some form of credit. Plenty of financial institutions also have access to this information.
We moved to a new HR system and they have a set of "reserved" employee numbers that cannot be used and we have employee numbers in that range. Arg!
And we were supposed to teach our customers good design principles...
A projection in ES, is more a cache, not your primary store. The primary store is the eventlog. The latter should, obviuosly, never use natural ID's.
Essentially, they observed sizeable performance improvements by using UUID generators that are tweaked to get more sequentia resultsl. It results in better indexes. The articles compares sequences, random UUIDs and 2 kinds of sequentialish UUID generators.
They are not serially incrementing but still sortable. Thus prevent index fragmentation issues observed with UUIDS. Are 8 bytes in length. So index size is smaller compared to UUIDs. So you get all benefits of serial IDs but they are not easily guessable thus preventing sequential access attacks.
What unimaginable scale
Today, a decade later, they're at 1.074B
I don't see how that's true. From reading the article you linked, you only need a valid shard ID (which you can extract from known IDs), the millisecond (which is guessable) and a 10-bit sequence (which you can easily brute-force).
(And that's completely fine if their security model doesn't require unguessable IDs.)
It will results in a very high number of 404s. These can be monitored and the origin IPs can be banned.
So it's good to know that performances are not bad.
Use integer primary keys internally for identifiers and relationships.
Use English/Other Language permalinks for URL's
Use UUID's in places like API's one-time action links and "private" links that you only want to share with other people.
Worked fine for me for many, many years.
IME it's much more often I've quickly made a table with a serial PK and later wished it were uuid; just about never made a uuid and later wished for the compactness or natural clustering of bigint. Maybe for a table of millions and millions of time-ordered events.
Unless you're changing a foreign key, joins will always be correct.
Unless I'm doing something wrong in the last 30 years of using SQL.
Side question: can I get Postgres to throw an error if I try to join on two IDs where neither of the IDs have a foreign key reference to the other?
I end up doing this a lot when I'm trying to figure out how my applications are currently being used.
Strictly incrementing UUIDs can offer the same benefit.
I used to work on a reconciliation system which inserted all its results into the database. Only the most recent results were heavily queried, with a long tail of occasional lookups into older results. We never had a problem with primary key indexes (though this was in MySQL, which uses a clustered index on the primary key for row storage, so it's an even bigger benefit); the MD5 column used for identifying repeating data, on the other hand, would blow out the cache on large customers' instances.
PG will say it's doing a hash look up and you'd think it'd be fast but it will take quite sometime relative to joining two large tables with integer IDs. With UUIDS PG will give up doing a hash look up sometimes and try to do table scans unless you adjust random_page_cost.
In general joining on UUIDs for large tables is a bad idea. It can be great if you are joining a single row to another row.
I eventually came across the idea of using maximal period linear-feedback shift registers to transform an integer variable through every possible value (minus one), but in a non-incremental sequence that depends on the LFSR arrangement.
I never ended up putting the idea to use, but I've always been curious about people who have and how it worked out for them. [Edit to clarify: It was meant for obfuscation, not security against a determined attacker.]
This works well against the German Tank Problem when there's no oracle allowing an attacker to guess lots of IDs quickly (such as when there are reasonable rate limits). It does not provide enough entropy when such an oracle exists (especially an offline one).
For something like a password reset token, it still needs to be paired with suitably random bytes.
So far I haven't encountered any problems in the short term by using the approach described.
1 https://blog.twitter.com/engineering/en_us/a/2010/announcing...
I was thinking of generating random character strings and simply retry when the db throws duplicate key error on insert. No sharding is necessary and I’d like to have efficient foreign keys. Any thoughts?
However, I cannot imagine creating table entries without a datestamp. No matter what else you are doing, or what you index by, I would want YYYY-MM-DD_HH-MM-SS in every row.
Maybe I'm just weird that way ...
Recently I built a new system (typical business-type backend) and forced to use sqlite + C# + dapper. Using this combination I cannot use guid/uuid as dapper cannot properly map it back to c# from sqlite, and my dislike of int's got me thinking. I have a random string generator (have used it for years for things like OTP's and other reference numbers), where I give it an alphabet + length of the desired string. Using 8 to 12 characters, I can get a few million unique permutations. That is, if used as a primary key, few million per database table. Then I hear in the back of my head, guys from work who would argue I would run out of unique combinations or would have to do lookups to see if they exist. So I decided slap the year and month on it as a prefix, so a key might look like this: 2105HSUAMWPA. This gets indexed really well too and there is some inherent information that can be seen from looking at the key: Year 21, Month 5 and then the unique bits.It's basically 4 lines of code that gets called on every new database entity. I think it will be easy to shard/partition the data too if the need arise in the future, by simply looking at the first 4 digits.
Thus to summarize:
Data is sliced by entity type (customer, invoice, etc), then by date (2105 for May 2021) then by unique string.
What do you guys think about this approach? Anyone been burnt by something like this?
It would be nice to see real benchmarking on millions of rows to compare the three, but my gut tells me you use int by default, bigint if you outgrow int, and UUID if you have plenty of money for hardware and need distribution capabilities a UUID would enable.
I'm generally fairly comfortable using int for business relationships, and bigint (long) for transaction data.
For performance, insertion speed often seems to be dominated by 'commit latency' to sync to the disk; rather than by record size. I would agree that record size affects table scan, but for many datamodels keying may often be a relatively small proportion compared to the size of text fields and other data.
I like to model keyspaces to work for 200 years, for the largest forseeable market growth, times at least a factor of 10 for safety.
One of the most memorable anecdotes of my professional career is a production environment going down because we hit maxint on an important (and busy) table. The dirty hack we used to get the site back up (hint: int is _signed_), and the weeks it took to plan, test, and execute the migration.
in one sense I agree with the author that things are generally just easier when you use surrogate primary keys, however they really should note here that the FOREIGN KEY constraint itself is not a problem at all as you can just use ON UPDATE CASCADE.
(Actually, all serials are bigserial’s but the “base type” they add to the table differs, and it’ll always come back to bite you later. Ask me how I know…)
For one thing you can represent a range of data more efficiently by just storing offsets. This means that instead of having to store a 'start' and 'end' at 8 + 8 bytes you can store something like 'start' and 'offset', where offset could be based on your window size, like 2 bytes.
You can leverage those offsets in metadata too. For example, I could cache something like 'rows (N..N+Offset) all have field X set to null' or some such thing. Now I can query my cache for a given value and avoid the db lookup, but I can also store way more data in the cache since I can encode ranges. Obviously which things you cache are going to be data dependent.
Sequential ints make great external indexes for this reason. Maybe I tombstone rows in big chunks to some other data store - again, I can just encode that as a range, and then given a lookup within that range I know to look in the other datastore. With a uuid approach I'd have to tombstone each row individually.
These aren't universal optimizations but if you can leverage them they can be significant.
- isntauuid[1] (mentioned in this thread, I've given it a name here)
- timeflake[2]
- HiLo[3][4]
- ulid[5]
- ksuid[6] (made popular by segment.io)
- v1-v6 UUIDs (the ones we all know and some love)
- sequential interval based UUIDs in Postgres[7]
Just add a UUID -- this almost surely isn't going to be what bricks your architecture unless you have some crazy high write use case like time series or IoT or something maybe.
[0]: http://gh.peabody.io/uuidv6/
[1]: https://instagram-engineering.com/sharding-ids-at-instagram-...
[2]: https://github.com/anthonynsimon/timeflake
[3]: https://en.wikipedia.org/wiki/Hi/Lo_algorithm
[4]: https://www.npgsql.org/efcore/modeling/generated-properties....
[5]: https://github.com/edoceo/pg-ulid
[6]: https://github.com/segmentio/ksuid
[7]: https://www.2ndquadrant.com/en/blog/sequential-uuid-generato...
There's another benefit to UUID - You can generate them anywhere including application side. Doing this on application side would have tremendous batching benefits or inserting objects with relationships at the same time (Vs waiting first insert to return an ID to be used in the FK).
There are a few mitigations but my favourite is the "casino chips" approach: pregenerate them server side, and allocate to clients on demand, including en masse if need be ("here kid, have a few million UUIDs to get you started"). Verify with whatever simple signature scheme comes with your application server framework, or at small scale just toss them in a crude LRU store.
Or, remember where the UUID came from, and apply their organisational scope to any trust you place upon it. This might work particularly for multi-tenanted SaaS. However it requires that all usage is tenant-bounded end-through-end throughout your application. This may be in conflict with a) your framework, b) your zenlike contemplation of simplicity in data management, or c) programmers in a hurry forgetting to scope their queries properly.
Ultimately, relying on UUIDs as intrinsically unguessable security tokens is probably not a great idea, but it's one that remains thoroughly embedded in the programming zeitgeist. As ever, nothing useful comes without a compromise. Keep your eyes open to the systemic consequences of design choices, and don't leave traps for your fellow developers.
Allowing simple caching, easy async, easy handling of offline, or far simpler clientside code for dealing with those objects. etc.
For mobile app development which relies on an online (http) backend, clientside generatable UUIDs offer almost only benefits.
You can also use one sequence for everything on the server, and then you can also pre-create ID based relationships client side.
Clocks aren't reliable enough for timestamps anyways so garbage collection is the only thing you kinda wanna rely on them for.
A good sweet spot seems to be, 32bit milliseconds + 96bit of entropy. This overflows appeoximately every 50 days, allowing for 50 day rolling data retention.
I ended up deciding to use sharded bigints because it enables better indexing, even though there are drawbacks when first writing the instances; The benefit of faster search was more important for me.
[1]: https://instagram-engineering.com/sharding-ids-at-instagram-...
[2]: http://www.livinginthepast.org/2016/02/24/sharding-into-bigi...
Yes, they are guessable but your application should not rely solely on the "secrecy" of the ID to authorize access to a record. If you are worried about someone crawling your public API with wget or curl and an incrementing counter you should re-think whether your data are really public or not, or maybe rate-limit anonymous users, etc.
They also reveal something about the total number of records in your database, I guess that could matter in some contexts but it's never really been an issue in practice for me.
I have definitely used the technique of defining non-overlapping sequences in different shards (with Oracle, not Postgres, but they are very similar in this regard). It worked very well and was easy to reason about.
As a developer, the big issue I have with UUIDs is that they are impossible to read or type. You have to copy/paste and it isn't easy to look at two UUIDs and tell if they are different.
I use integers in general unless the specific use case demands something else.
Any information you give to a potentially malicious actor can help them attack you. If you have a choice between leaking information and not leaking information, I can’t imagine why you would ever intentionally go with the former, unless you didn’t actually have a choice (feasibility, etc.).
As an example, maybe I needed to CSRF someone but needed their ID (say, in a b2b app where IDs are not generally public) - with sequential IDs I have a decent chance of cracking it with a small number of requests, especially if it’s a small userbase. Sure, the CSRF was the main issue, but this is a contrived example to illustrate the point.
Admittedly, IDs are oftentimes public information by necessity - but there’s no need to allow them to leak extra information.
Of course, there are many ways to solve that situation, but UUIDs is one.
They can inherently leak how much data you do or don't have, which you may not want your competitors to know.
Doing this on application side would have tremendous batching benefits or inserting objects with relationships at the same time (Vs waiting first insert to return an ID to be used in the FK).
By separating that out you can get a lot:
1. You can extend your delegate system by modifying the delegate table, rather than having to muddy your data model with authority information
2. You can TTL the mappings in the delegate table
3. You can have many mappings to the same data, but each one can have its own restrictions.
It's a bit more complex but you end up with a system that really hones in on the benefit that you're bringing up.
Are you sure about this? This is pretty poor of a well known solution in the ORM world, SQLite or not.
If you were going for sortability/understandability then I understand slapping your own together, but why not generate v1/v4/v6[0] UUIDs in your application and then send them along to teh database, possibly prefixed with whatever you want it to be sorted by (though IMO you should just add that metadata to the thing being saved and sort on that properly)?
But of a rabbit hole: https://github.com/DapperLib/Dapper/pull/1082
[1] https://en.wikipedia.org/wiki/Birthday_problem#Probability_t...
It's an interesting tradeoff. The UX of the smaller YouTube video id links is probably of some benefit to them. Plus they have private videos for when you really don't want your video to be viewed, with unlisted being the middle ground of easy sharing but also keeping it exclusive.
What’s the use case for this where UUIDv4 or sequential ID isn’t better? Because it sounds like a solution in search of a problem.
> Are there any downsides besides making the application side a tiny bit more complex?
Are there any upsides to warrant the complexity?
Seems too easy to screw up.
https://www.postgresql.org/docs/current/plpgsql-control-stru...
For your use-case you could use incremental IDs.
Or if you're expanding to countries whose SSNs clash with each other?
SSN as PK is wrong regardless of whether you're doing SSN as password.
I’m suspecting you meant “easily guessable” in the human sense, not the cryptographic/security sense. My bad if I misunderstood you. Again, I’m not saying Instagram has any security problem, I’m just saying that this ID scheme in particular isn’t a security feature.
HiLo explanation below:
A client simply gets a range of ids to be used, exclusive to them. Then can use it without any roundtrip to server.
This can be achieved with a "hi" stored on service, and a batchSize that's constant in the system.
Each time a "hi" is requested, it's incremented by 1. Now the client can generate (hi * batchsize, (hi+1) * batchsize - 1).
For example, if some customer ID is used by customers in communication e.g. when calling you on the phone over some billing issue, then there would be strong advantages if your IDs aren't unnecessarily long and if they include some explicit redundancy (e.g. a check-digit with Luhn formula) to protect against communication mistakes.
For another example, if your IDs aren't visible to outsiders but are used in your internal business processes then it may be quite valuable to ensure that IDs of different types (e.g. customer ID vs account ID vs transaction ID) are obviously distinguishable in some way so that someone seeing XXXXXXX knows that it likely is a transaction ID and definitely can't be a customer ID; and it's quite valuable to ensure that you can't have accidental collisions where the same number is a valid ID in different key tables so a bug or miscommunication that confuses them would result in data corruption or information disclosure instead of simply failing.
So "ID design" deserves some attention from UX perspective and blocks of random data aren't optimal UX.
If you have 10 million rows, you're looking at 16MB for the storage of a UUID, vs 8MB for storage of a 64 bit int.
Both of those are entirely cacheable.
Personally I wouldn’t do this. A scenario where for each individual millisecond of elapsed time, 96 bits of entropy is an upgrade over 80 bits of entropy, is fairly extreme. I don't think there are many databases in the world which would ever need more collision mitigation than that.
Individual instances? Maybe not. But for those an autoincrement key would also work. That is not the scenario that ULIDs and GUIDs are advertised for.
The goal is to have an universally/globally unique ID. So whenever you encounter two IDs you can be (resonably, probability wise) sure that they won't collide.
Any such sheme thus must, by definition, serve every single use case now and forever everywhere. That's a tough one.
Also it's not really 80bit vs 96bit (which due to the birthday paradox is already a huge difference) but more 80bits vs. 128bit as the timestamp is recycled with sufficient usage.
I'm actually concerned that 96bit isn't enough, as it relies on the assumption that you'll use this scheme for for data spanning years, in order to properly use the timestamp as entropy.
If you read the linked doc, you'll see an EXCEPT clause. That can be used for a retry loop inserting into a table with a UNIQUE constraint. No read locks are necessary, because the UNIQUE constraint will catch violations safely (regardless of concurrent activity), and the retry loop can simply retry until that succeeds. For instance:
CREATE TABLE u(i INT8 UNIQUE);
-- insert random unique value in the range 0..n
-- into table u, retrying if it's already present
--
-- NOTE: this will not terminate if 0..n are all
-- present
CREATE OR REPLACE FUNCTION insert_uniq(n INT8)
RETURNS VOID
LANGUAGE plpgsql AS $$
DECLARE
x INT8;
BEGIN
<<retry_loop>>
LOOP
BEGIN
x := (random() * n)::int8;
INSERT INTO u VALUES(x);
RAISE NOTICE 'inserted unique value %', x;
EXIT retry_loop;
EXCEPTION
WHEN unique_violation THEN
RAISE NOTICE 'collision with value %; retrying', x;
END;
END LOOP;
END;
$$;
This will obviously loop forever if 0..n are all occupied, but if you choose n as (2::numeric^63 - 1)::int8, that won't happen.Dev experience truly is a case of death by thousand cuts. I avoid every little cut I can like the plague so energy goes into making cool stuff.
> Should develop better tooling to visualize debugging information.
Thing is, why would I spend time overengineering tooling I don't need if I can get away with incrementing ids? When optimizing for value, I'd rather spend time solving business problems.
Your editor is very cool btw. And it's clearly a case where incrementing ids are not optimal.
I really was talking about serious/ complicated systems, where you want some consistency in the components it is made of even if a particular case could just use an incremental id. I my view, you don't spend the mental energy on switching between models. I have just adjusted/ rewritten a part of the system, where we for historical reasons used logins instead of UUIDs. By using logins in this particular case, we had subtle bugs that wouldn't occur with UUIDs. It would never have happened, had we used UUIDs everywhere from the start. Those already have standard validation functions and you don't have to think about stuff that really isn't your business problem. We are also talking engineering days thrown out of the window just for the change from logins to UUIDs later on. I don't think you waste so much time copy&pasting stuff.
I guess, we arrive at the same conclusion but have our differences about the ways leading to it. We both want to focus on business problems as much as possible.
I'm more familiar with MySQL where I don't have anything other than basic, automatically converted types.
But in Postgres, where there's more types, can I create a type that is representationally equivalent to a builtin type but not automatically convertable? So that if I say "select * from donkey left join plant on plant.id = donkey.animal" it says "error: cannot compare PlantId with AnimalId, use an explicit cast if you really want to do this". If this were possible and ergonomic, it would be better than a relatively cryptic null in the case that plants and animals have distinct (UUID) PKs.
Get off my lawn...
But seriously, UUIDs work, they don't need application code to avoid collisions. If you want something a bit more compact/shardable, use ULIDs.
I also cache certain prepared statements in a static concurrent dictionary, so I can short circuit if the it the query/insert/update has been run before for the specific object type. That way all of the string manipulation can be avoid for each query - I just use a previous statement but with different sql parameters. On every deployment the cache gets cleared because the app basically get restarted, so it won't go stale. It's very cool (at least in my mind). The only indirect rule I have is that nothing else should modify the main schema besides my migrations and not while the app is using it (using FluentMigrator code that execute on app startup).
If I may ask, why don't you like C#? I totally get the php thing - I cannot stand it at all either.
I'm just not good at it -- I came in expecting it to be better Java (better async ergonomics, slightly less verbose, etc) -- but after working with it (while working with a client no less) I found that I disliked it just as much as Java and had a relatively rough go of it. There are a lot of things that I think contributed:
- .NET4 was becoming .NET core / .NET standard while this project was happening
- I needed a windows VM to work on the project (dotnet was not ready for prime time yet, so I couldn't depend on it to build the legacy code)
- Packaging was uncomfortable and a source of pain (also had to do with the core/standard transition) -- after you've used NPM/yarn/cargo/stack(haskell) nuget+chocolatey (IIRC?) felt like a huge step backwards, basically had to be IDE driven.
- EF was so incomplete (due in part to the core/standard transition), and so much worse than my then-and-now favorite TypeORM.
- The async paradigm and how they handle results are a little weird, IIRC awaiting a task would mean you received the task, not the result of the computation of the task?
- IIRC there was no Option type. Java learned this lesson in 1.8 IIRC (and it rocked my world a bit, in a good way), but C# not taking it up is weird to me.
- The codebase was just like you'd expect an old Java codebase to be -- i.e. terrible notfun.
- Everything was heavily IDE/visual studio driven which was not fun for me. Visual studio is an amazing tool no doubt, but a lot of it is unintuitive to me at this point after years of straight emacs/vim and occasionally sublime/atom/vscode. This is more personal than any other reason.
I just felt like it was so much worse than Typescript/Javascript for questionable gain -- I had worked with this same client to deliver a JS codebase that was easy for them to work on, on time and at budget. The C# side felt much worse to work on, and I just took it to mean I'm not good at C#, and I have no desire to be. The ecosystem you're almost forced to accept (appveyor, windows machines + powershell, etc) is just not my cup of tea.
I'm spoiled for choice these days -- if I need performance for a backend thing (and need to hand the project off to someone) I can choose Go. If I really need performance I can pick Rust. If I don't need performance-per-say I pick Typescript (it's still generally better perf than Ruby/Python). If I want to get it right (and really craft software) I pick Haskell (and of course it's near impossible to hand that off to most companies).
In the end C# is a great language (whether I like it or not), but it fell short of my expectations as a "better java". If I'm going to do Java I'll just do Java/Kotlin/Scala/Clojure. I think I could go my whole life without ever touching C# again, so was an easy rule to make for myself.
Serial IDs, with some light assumptions, leak information about the total count of items.
Is the value of making that mistake strong enough that you use so much more data to store a UUID value??
Legacy C# projects can be a mess, I've stepped away after a month or from one or two projects in the past simply because it was beyond fixable - I think there is clean way to use c# and there is an over-engineered way (typically java devs that try to use the exact same patterns in c#, which bloats it).
You are right, the past 3 years or with .net core and .net standard was messy too! They should've just named it something different. Same with Entity Framework. I'm still steaming about them using the same names for everything, as there are plenty cases when things don't work the same way as before. It completely screwed googling things.
And any desktop development was broken too for a time (still bugging out with new .net core xaml editor) - so yeah, skip.
I primarily use C# for api's and tooling (my own build server is code I wrote myself in C#, hosting my own git repo's (not using gitlab/bitbucket etc; rather bare repositories!), and my own nuget server). But for anything web/desktop related, meh skip. In C# api land, you can get such clean/unbloated code with full type safety and compiler checks, I love it. Most C# devs use/throw exceptions but they really muddy the water and make it feel like Java. My own code only have try/catch when talking to the outside world (harddrive, network, etc), the rest of the layers are clean and with minimal null checks (cause at those layers I never have nulls). Exceptions are expensive/slow too, so rather catch it as early as possible and return a Result object (not built in, my own envelope) instead of rethrowing multiple times and doing a crazy amount of null checks. So it is possible to write "clean" C# but most commercial projects I've seen is pretty darn ugly.
Just an fyi, I don't use appveyor/windows/powershell/Azure at all. I work primarily from Fedora with Jetbrains products, with most of my tooling being hand rolled. I use the Digital Ocean Api to manage all my boxes and use SSH.NET (nuget package that can open ssh tunnels in C#, I use it for all sorts of things, incl reaching my db servers (ssh.net + Npgsql)) and FluentFtp if needed. So I don't really touch Microsoft's ecosystem at all. My builder code calls the dotnet sdk and I copy the build artifacts around as needed - no third parties involved. In the beginning I stressed about going this way but I see now how great it is - no need to worry about CI minutes or paying Gitlab or Bitbucket, or dealing with 10 different api keys or dealing with Azure etc. So it's totally possible to have sanity - but to be honest I cannot take "my way" to work as everything is hand rolled - people want to use Gitlab/Bitbucket etc else they lose their minds (other developers seems to be my biggest enemy these days - everything needs to match to their exact pattern else they quit). So yeah, own clients = own stack.
On the visual studio side, the IDE has gotten slower and slower... I have a 6 core cpu, 32Gb ram and a Samsung nvme drive, RX580.. yet visual studio lags like crazy, intellisense take 2 minutes to update etc.. that's with only VS open and no extra extensions installed. It also crashes sometimes while doing nothing out of the ordinary. So not sure what the heck they are doing to the project at Microsoft. On the otherhand, the jetbrains products have super slow startup times (try 10 to 30 seconds for Rider and Datagrip) but once in, they work great!
Another cool thing to note, Gitkraken sees my bare git repositories as valid remotes (provided ssh keys are in place) and pushes to them without issue (and near instantly, bitbucket push takes 20s no matter what I do).
Sorry for dumping all of this on you!
> I primarily use C# for api's and tooling (my own build server is code I wrote myself in C#, hosting my own git repo's (not using gitlab/bitbucket etc; rather bare repositories!), and my own nuget server). But for anything web/desktop related, meh skip. In C# api land, you can get such clean/unbloated code with full type safety and compiler checks, I love it. Most C# devs use/throw exceptions but they really muddy the water and make it feel like Java. My own code only have try/catch when talking to the outside world (harddrive, network, etc), the rest of the layers are clean and with minimal null checks (cause at those layers I never have nulls). Exceptions are expensive/slow too, so rather catch it as early as possible and return a Result object (not built in, my own envelope) instead of rethrowing multiple times and doing a crazy amount of null checks. So it is possible to write "clean" C# but most commercial projects I've seen is pretty darn ugly.
That sounds interesting! Yeah I definitely prefer having types these days (basically don't pick languages without them anymore), and errors-as-values is one of the patterns that is a pearl of modern PL development. Languages that come out these days that still rely on exceptions are an instant turn off for me (ex. Dart). I remember something like 8-10 years ago having discussions around whether checked exceptions had meaning in Java and now that I look back all those discussions were so silly.
> Just an fyi, I don't use appveyor/windows/powershell/Azure at all. I work primarily from Fedora with Jetbrains products, with most of my tooling being hand rolled. I use the Digital Ocean Api to manage all my boxes and use SSH.NET (nuget package that can open ssh tunnels in C#, I use it for all sorts of things, incl reaching my db servers (ssh.net + Npgsql)) and FluentFtp if needed. So I don't really touch Microsoft's ecosystem at all. My builder code calls the dotnet sdk and I copy the build artifacts around as needed - no third parties involved. In the beginning I stressed about going this way but I see now how great it is - no need to worry about CI minutes or paying Gitlab or Bitbucket, or dealing with 10 different api keys or dealing with Azure etc. So it's totally possible to have sanity - but to be honest I cannot take "my way" to work as everything is hand rolled - people want to use Gitlab/Bitbucket etc else they lose their minds (other developers seems to be my biggest enemy these days - everything needs to match to their exact pattern else they quit). So yeah, own clients = own stack.
Wow thanks for this level of detail, I exclusively use linux on everything now, and it's great to know that this kind of setup is possible without resorting to a VM. As far as infrastructure/deployment goes, you manage your C# projects with C#, SSHing to them and doing stuff? What are the commands you run like? I'm not super familiar with easily calling C# from the command line, something like `dotnet run deploy.csharp`?
SO I'm a huge proponent of CI (I think GitLab has the best CI out there) but it definitely is more complex than it should be. Setting up tokens, figuring out how things interact is really annoying 90% of the time, and the other 10% is bliss.
I'm actually about to relaunch a product I was working on -- one that makes it cheaper to run CI ($10 for 1000 minutes is what GitLab charges and I'm going to offer $15 for unlimited minutes per dedicated vCore/2GB RAM). More to your problem though, I actually make all my stuff work with Makefiles (so much so that I even deploy over-complicated infrastructure you wouldn't like with Makefiles[0][1]), has that not worked for you as far as fixing CI inconsistency? I find that generally knowing that to build I just need to run `make <target>` (assuming you have the correct system-level libraries installed of course) has fixed most of my issues in this area.
> On the visual studio side, the IDE has gotten slower and slower... I have a 6 core cpu, 32Gb ram and a Samsung nvme drive, RX580.. yet visual studio lags like crazy, intellisense take 2 minutes to update etc.. that's with only VS open and no extra extensions installed. It also crashes sometimes while doing nothing out of the ordinary. So not sure what the heck they are doing to the project at Microsoft. On the otherhand, the jetbrains products have super slow startup times (try 10 to 30 seconds for Rider and Datagrip) but once in, they work great!
This is unfortunate, crazy wild conjecture but do you think it's possible visual studio will be subsumed into VS Code? And yeah what you're describing is why I live in emacs/vim (and even emacs is too slow sometimes). I know that I'm giving up a LOT of creature comforts but for me it fits like a worn glove. I'm also constantly impressed with JetBrains -- the IDE experience has generally been good for me as well (I've had to use it before because everything else was just too painful), and it seems like they just keep their heads down and put out good product.
> Another cool thing to note, Gitkraken sees my bare git repositories as valid remotes (provided ssh keys are in place) and pushes to them without issue (and near instantly, bitbucket push takes 20s no matter what I do).
Interesting, is bitbucket push having an issue with web requests or something I wonder, 20s is a long time to wait! Since I use emacs magit[2] works for me and it is amazing.
[0]: https://vadosware.io/post/using-makefiles-and-envsubst-as-an...
[1]: https://vadosware.io/post/setting-up-mailtrain-on-k8s/#step-...
[2]: https://magit.vc/
Basically yes. C# console apps can be run like any other linux cli application, provided the dotnet runtime or sdk is installed (~40mb vs ~150mb). If you installed the sdk, you can build/publish/pack projects for production. You can specify the input directory and output directory, so you can easily call something like this: "Process.Start("dotnet publish ~/code/app1/app1.sln --output ~/builds");". Since you can start a process from C#, I can also git clone and git checkout like this: "Process.Start($"git clone {repo path} {output}");" where Process.Start lives in the System.Diagnostics namespace. So I built my own tooling as Console Apps that call other things (since I dislike bash scripts, yaml files (insert ci dsl) - the c# compiler protects me against a whole bunch of things and their System.IO namespace is awesome when working with files/directories).
So I can clone it, build it, open tunnel somewhere, copy it, straight from c# (although I have to setup ssh keys everywhere when needed, I use some trickery with "ssh-copy-id" that comes with ssh to add keys to new servers). So far I have 3 tools: a builder, a server manager (for droplets) and a tool that makes backups, but I'm looking at wrapping borg in the future as it is waaay better than mine. Next I need to build a tiny api to call "git init --bare" for new repositories, as I have to run it manually at the moment to make new project repo's.
> I'm going to offer $15 for unlimited minutes per dedicated vCore/2GB RAM
Careful, don't get abused by the crypto community; there has been some more incidents lately where miners abuse CI systems.
The concepts he's talking about are required for cqrs. Which is a popular concept applied with mostly DDD or microservices.
I’ve also personally experienced UUID collisions due to badly set up VM environments under Windows. It isn’t a good idea to blindly trust any value - and that includes supposedly ‘never collide’ id’s like UUID.
For what it’s worth, I also had the joy of debugging someone’s distributed hash table that was using md5 as the hash bucket key (this was... 2 decades ago?) and had no way to handle collisions because obviously that is impossible.
Eg. I use guids ( .net) and i have never seen an issue.
Maybe I'm just thinking of OAuth where there are multiple hops involved?
That should also make it easier to, say, verify and unwrap the request at the gateway to the server, before sending it to the rest of the application-proper.
It's whether the UUIDs in your system can be reliably presumed to be unguessable - including the UUIDs that were generated by code which was written after you wrote your query that assumes unguessability.
Today you might say "Oh, this SecureRandom-based UUID generator is unguessable and meets all of our requirements". Tomorrow you might say "Ah, this SecureRandom-based UUID generator is too slow, let's generate them in our Android app instead of on the server". But now the UUIDs stored in your database aren't reliably unguessable, because you accept whatever your client API tells you without verification. How plausible is it, within the timeframes you actually get, to review every query for whether it assumes the trustworthiness of the UUID generation? Better to assume UUIDs have some convenient properties, than to assume that they're unguessable just because the API is cryptographically secure today.
> Careful, don't get abused by the crypto community; there has been some more incidents lately where miners abuse CI systems.
Will do! I’ll have to do some detection of the workloads. I know other platforms are fighting this as well.
Unfortunately that doesn't mean much.
Since nodejs is a server side language and can handle that package too. And it's not "solely" for js/spa's.