Show HN: Transform a CSV into a JSON and vice versa (jsonmatic.com)
https://www.npmjs.com/package/web-vitals/v/0.1.0 https://www.npmjs.com/package/@fingerprintjs/fingerprintjs
Harvesting user's data, most likely...
https://github.com/erikmartinjordan/jsonmatic/blob/master/sr...
I get the fingerprint as a UID (which is like a random number for me). I don't harvest any user's data. Code is open-source, you can verify what I'm saying if you wish.
Putting analytics in a deployed app is your prerogative. Putting it in what touts itself as a reusable component is at best frowned upon.
PyPi: https://pypi.org/project/flattentool/
Source: https://github.com/OpenDataServices/flatten-tool
Docs: https://flatten-tool.readthedocs.io/en/latest/
It converts JSON to CSV and vice versa but also Spreadsheet files, XML ...
It has recently had some work to make it memory efficient for large files.
Work, BTW, is an Open Data Workers Co-op working on data and standards. We use this tool a lot directly, but also as a library in other tools. https://dataquality.threesixtygiving.org/ for instance - this is a website that checks data against the 360 Giving Data Standard [ https://www.threesixtygiving.org/ ].
For instance I note with interest their examples on nested data and arrays. We have various different ways you can work with arrays, so you can design user-friendly spreadsheets as you want and still get JSON of the right structure out: https://flatten-tool.readthedocs.io/en/latest/examples/#one-... (Letting people work on data in user-friendly spreadsheets and converting it to JSON when they are done is one of the big use cases we have)
mlr --c2j cat documents.csv > documents.jsonl
Converts a CSV file to a JSONL file
This reminds me of something my boss at a previous job would say: "I am morally opposed to CSV."
Why? Because we worked at an NLP company, where we would frequently have tabular data featuring commas, which means if we used CSV we'd have a lot of overhead involving quoting all our CSV data. Instead my boss chose TSV (T = tab) as our tabular data format, which was much simpler for us to parse since we didn't really deal with any fields that had \t in them.
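To make the quoting overhead concrete, here is a minimal Python sketch using only the stdlib `csv` module. The sample rows are made up for illustration, and it assumes (as in the comment above) that no field contains a tab:

```python
import csv
import io

# Hypothetical NLP-style rows: free text that contains commas but no tabs.
rows = [
    ["id", "sentence"],
    ["1", "Well, that was unexpected, wasn't it?"],
    ["2", "No commas here"],
]

def dump(rows, delimiter):
    """Serialise rows with the stdlib csv writer using the given delimiter."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()

csv_text = dump(rows, ",")   # the sentence with commas gets wrapped in quotes
tsv_text = dump(rows, "\t")  # nothing needs quoting because no field has a tab

# With TSV and tab-free data, a plain split is enough to parse each line.
naive_tsv = [line.split("\t") for line in tsv_text.splitlines()]

# The same shortcut on the CSV output cuts the quoted sentence into pieces.
naive_csv = [line.split(",") for line in csv_text.splitlines()]
```

The naive split only stays safe for as long as the "no tabs in the data" assumption holds. Once a field can contain the delimiter, TSV needs the same quoting rules as CSV.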
https://jqplay.org/s/ub-WvXCcPn
... from there it's just a row->column rotation to CSV.
Plus you have the type looseness for both and lack of standards for CSV.
A trivial 2-D case is handled well by a Python library such as Pandas. Here OP could be an alternative.
When I say trivial I mean flat 2 dimensional data, such as you would get from Mockaroo or similar source.
However in real life - data is messy.
As you get into 3, 4 and deeper hierarchies in JSON you can't really translate that into nice flat 2d CSV.
Then you have missing keys, mixed up types and you end up rolling your own hand-written converters.
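For what it's worth, this is roughly what such a hand-rolled converter tends to look like. It is a sketch only: the dotted-path column naming and the sample records are assumptions for illustration, not how jsonmatic or flatten-tool do it.

```python
import csv
import io

def flatten(value, prefix=""):
    """Flatten nested dicts/lists into one dict keyed by dotted paths."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat

# Hypothetical messy input: the second record lacks "zip" and "tags".
records = [
    {"name": "Ada", "address": {"city": "London", "zip": "N1"}, "tags": ["a", "b"]},
    {"name": "Bob", "address": {"city": "Paris"}},
]

flat_rows = [flatten(record) for record in records]

# Union of all columns, in first-seen order, so missing keys become empty cells.
columns = list(dict.fromkeys(key for row in flat_rows for key in row))

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=columns, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(flat_rows)
csv_text = buf.getvalue()
```

Even this small version has to make choices the data doesn't dictate: how to name array positions, what to write for a missing key, and what to do with empty nested objects (they silently disappear here). That is where the hand-written converters start to diverge.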
https://codepen.io/theprojectsomething/pen/OwppWW
Note: click the Toggle Info to read the "spec" (groan) :)
If there is any data that is a) not publicly accessible or b) contains personal information, I cannot authorise the use of a web based third party tool. There is just too much risk that some bad actor uses this as a method to soak up data.
I would love to verify /validate that all of the processing is local and have some way to certify if this hasn't changed.
Shameless plug: a similar solution, working all client side, not imposing to use a key as first column, and with options regarding CSV format.
also, clearly javascript is a bit too ambitious for the job when e.g. PHP could provide the intended functionality with two lines of code: foreach($arrays as $values){echo implode(',', $values) . "\n";} echo json_encode($arrays);
also, CSV is more for storing rigidly-structured uniform columns and rows, whereas JSON is more for storing loosely-structured varying objects, otherwise you're redeclaring column headings in every array, which wouldn't make much difference for gzipped transport but still wasteful and verbose nonetheless. column headings are usually the first line of a CSV
Personally I’d recommend jsonlines over regular CSVs these days but I’ve had so many issues with CSV parsers being incompatible over the years that I’d welcome anything which offers stricter formatting rules.
I’d definitely recommend you check it out. https://jsonlines.org
i wonder if it requires a "key" column to serve as the dictionary key
EDIT: i kind of wouldn't mind a HN discussion on which is better... ruby is my go-to scripting language, so i find this structure very natural, i've found when trying to do the same things with Go, i much prefer things to be structured more like an array of objects
would be interesting to hear the merits of both
cat file.csv | ConvertFrom-Csv | ConvertTo-Json
First attempt is to simply read each line in as raw and split on `,`. That sort of does the job, but it isn't the array of arrays that you might expect:
$ echo -e "foo,bar,quux\n1,2,3\n4,5,6\n7,8,9" > foo.csv
$ jq -cR 'split(",")' foo.csv
["foo","bar","quux"]
["1","2","3"]
["4","5","6"]
["7","8","9"]
Pipe that back to `jq` in slurp mode, though:
$ jq -R 'split(",")' foo.csv | jq -cs
[["foo","bar","quux"],["1","2","3"],["4","5","6"],["7","8","9"]]
And if you prefer objects, this output can be combined with the csv2json recipe from the jq cookbook[0], without requiring `any-json` or any other external tool:
$ jq -cR 'split(",")' foo.csv | jq -csf csv2json.jq
[{"foo":1,"bar":2,"quux":3},
{"foo":4,"bar":5,"quux":6},
{"foo":7,"bar":8,"quux":9}]
Note that this recipe also keeps numbers as numbers!
In the reverse direction there's a builtin `@csv` format string. This can be used with the second example above to say "turn each array into a CSV row" like so:
$ jq -R 'split(",")' foo.csv | jq -sr '.[]|@csv'
"foo","bar","quux"
"1","2","3"
"4","5","6"
"7","8","9"
And to turn the fuller structure from the third example back into CSV, you can pick out the fields, albeit this one is less friendly with quotes and doesn't spit out a header (probably doable by calling `keys` on `.[0]` only...):
$ jq -cR 'split(",")' foo.csv | jq -csf csv2json.jq | \
> jq -r '.[]|[.foo,.bar,.quux]|@csv'
1,2,3
4,5,6
7,8,9
I don't consider myself much of a jq power user, but I am a huge admirer of its capabilities.
[0] https://github.com/stedolan/jq/wiki/Cookbook#convert-a-csv-f...
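For comparison, the same round trip can be sketched in Python with just the stdlib. This is not a port of the cookbook recipe; the `coerce` helper is a made-up name, and its "try int, then float, else keep the string" rule is an assumption about what "keeps numbers as numbers" should mean:

```python
import csv
import io
import json

# Same content as foo.csv in the jq examples above.
csv_text = "foo,bar,quux\n1,2,3\n4,5,6\n7,8,9\n"

def coerce(text):
    """Hypothetical helper: return an int or float if the cell parses as one."""
    for cast in (int, float):
        try:
            return cast(text)
        except (TypeError, ValueError):
            pass
    return text

# CSV -> list of objects, using the header row as keys.
reader = csv.DictReader(io.StringIO(csv_text))
records = [{key: coerce(value) for key, value in row.items()} for row in reader]
json_text = json.dumps(records)

# List of objects -> CSV, header included this time.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(records)
csv_again = out.getvalue()
```

The coercion is the risky part: with this rule a cell like `012345` becomes the integer 12345, so anything that looks numeric but is really an identifier would need to be excluded by column.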
And to educate! To show each other. To make knowledge discoverable. That’s the reason websites like this are honestly a very good thing.
Some users don't like to install too many desktop applications and rather use simple web apps...
https://github.com/erikmartinjordan/jsonmatic/blob/525b7fbc9...
I read https://css-tricks.com/almanac/properties/o/outline/ and it says
> The outline property in CSS draws a line around the outside of an element. It’s similar to border except that:
> 1. It always goes around all the sides, you can’t specify particular sides
> 2. It’s not a part of the box model, so it won’t affect the position of the element or adjacent elements (nice for debugging!)
> […]
> It is often used for accessibility reasons, to emphasize a link when tabbed to without affecting positioning and in a different way than hover.
I guess this is why you said outline should be used instead in this case.
You could approximate CSV in a JSON response like:
{
"columns": ["a", ..., "z"],
"rows": [[1, ..., 26], ..., [11, 2266]]
}
Or:
{
"a": [1, ..., 11],
...
"z": [26, ..., 2266]
}
which I've never seen, but would save space, and is sort of enforced in the sense that if you trust your serialiser for it as much as you trust an equivalent CSV serialiser, it's fine. (But the same argument could be made for more usual JSON object lists. The only arguable difference is that there's more of an assertion to the client that they should be expected to be homogeneous.)
The fundamental problem with CSV is that it has no canonical form or formal construction. Even the RFC documents it by example and historical reference, rather than from first principles, and does so with liberal use of "maybe". Consequently it is very easy to fling about as a human but much harder to reason about in the abstract, and this is most evident when you get bogged down in the gritty details of writing a CSV importer for your application.
For this reason, your second structure is how I serialize code I’m interacting with in C.
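A rough Python sketch of both layouts discussed above, with conversions to and from the usual list of objects. The function names are invented for illustration, and the code assumes every record has the same keys as the first one (which is exactly the homogeneity these layouts assert):

```python
records = [
    {"a": 1, "z": 26},
    {"a": 11, "z": 2266},
]

def to_table(records):
    """List of objects -> {"columns": [...], "rows": [[...], ...]}."""
    columns = list(records[0]) if records else []
    rows = [[record[name] for name in columns] for record in records]
    return {"columns": columns, "rows": rows}

def from_table(table):
    """Inverse of to_table: zip each row back up with the column names."""
    return [dict(zip(table["columns"], row)) for row in table["rows"]]

def to_columnar(records):
    """List of objects -> {"a": [...], "z": [...]} (one array per column)."""
    columns = list(records[0]) if records else []
    return {name: [record[name] for record in records] for name in columns}

def from_columnar(columnar):
    """Inverse of to_columnar: transpose the column arrays into records."""
    names = list(columnar)
    return [dict(zip(names, values)) for values in zip(*columnar.values())]
```

A record with a missing key raises `KeyError` in either converter, so the homogeneity is enforced at serialisation time rather than left for the client to discover.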
We use JSON Schema heavily to enforce structure in JSON data. https://json-schema.org/
If you’re rendering the table in the DOM, the response size is the least of your issues.
Where it becomes relevant is if each record is stored as a separate document so you can't just compress them all together. Compressing each record separately won't eliminate the duplication, so you're better off with either a columnar format (like a typical database) or a schema-based format (like protobuf.)
I usually need a less dense version, e.g. to send to a jinja2 template, so mostly use to_dict(orient='index').
{ header: [...columnNames], rows: [...values2DArray]}
I see an appropriate beauty in your username representing an information propagation model – of modern waves in modern habitats.
I do hope that my comment about a “rut” and “little formats” doesn’t disparage the work. I try to speak for enlightenment but sometimes I fall into lamenting the darkness.
Nothing’s perfect, of course: you can’t stream records in such a format, so no convenient Unix-style tooling.
[1]: http://www.equi4.com/metakit.html [2]: https://yandex.com/dev/clickhouse/
For data, if we are going to use reserved characters, maybe we just use protobuf and let the serialisation code take the strain.
This commit deletes any kind of data harvesting (removes fingerprint and visit counting):
https://github.com/erikmartinjordan/jsonmatic/commit/7f3fa89...
What was the rationale for such enormous single payloads?
I wonder which parses faster. I guess CSV does but then the consuming code would still have to parse the strings into JS primitives...
I'd love it if someone did a benchmark though.
You could store a CSV in JSONL very easily. In fact, jsonlines' website shows it as its first example: https://jsonlines.org/examples/
And if you wanted something that is json file-wide, you can just add some commas and wrap in [] for a list of rows.
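A small Python sketch of both ideas, using the sample table from the jsonlines examples page. The helper names are made up, and it assumes each row is written as one JSON array per line:

```python
import json

# Header row followed by typed rows, as in the jsonlines.org example.
table = [
    ["Name", "Session", "Score", "Completed"],
    ["Gilbert", "2013", 24, True],
    ["Alexa", "2013", 29, True],
]

def dumps_jsonl(rows):
    """One JSON array per line; json.dumps escapes embedded newlines."""
    return "".join(json.dumps(row) + "\n" for row in rows)

def loads_jsonl(text):
    """Parse each non-empty line on its own, so input could also be streamed."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]

def jsonl_to_json_array(text):
    """'Add some commas and wrap in []' to get one file-wide JSON document."""
    lines = [line for line in text.split("\n") if line.strip()]
    return "[" + ",".join(lines) + "]"

jsonl_text = dumps_jsonl(table)
```

Because a newline inside a string is always escaped by the JSON encoder, one line is always one record, which is what makes line-at-a-time tools safe to use on it.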
» ps aux | grep root | head -n5 | format jsonl
["root","87596","0.0","0.0","4359648","116","??","Ss","10:01am","0:00.02","com.apple.cmio.registerassistantservice"]
["root","81777","0.0","0.0","4321932","88","??","Ss","Wed12am","0:00.01","PlugInLibraryService"]
["root","71784","0.0","0.1","4365572","10504","??","Ss","Tue11pm","0:25.88","PerfPowerServices"]
["root","42906","0.0","0.0","4321572","88","??","Ss","Tue09am","0:00.01","com.apple.ColorSyncXPCAgent"]
["root","47415","0.0","0.0","4303172","88","??","Ss","Sat04am","0:00.01","aslmanager"]
It has the readability of CSV but the stricter formatting of JSON. Win win.
but for some reason just this post made me realize something... i've always been irritated by CSV log files (or space delimited), but would prefer something more structured like JSON, but JSON log files are pretty obnoxious for the reasons mentioned in the comments, a lot of data duplication...
JSON lines for log files seems like a great fit, not sure why i didn't realize this until now, i suppose it was the context of the discussion!
So you just have an array of arrays. Or even a huge array and every X elements, it’s a new record.
If each one has 2 keys,
[
  {
    "key1": "a",
    "key2": "b"
  },
  {
    "key1": "a",
    "key2": "b"
  }
]
Can become,
[
  ["a", "b"],
  ["a", "b"]
]
Or just every 2 will be a new record,
[
  "a",
  "b",
  "a",
  "b"
]
Trying to point out that the original structure allows for more flexibility.
If you only cared about space, this compresses better anyway and uncompressed, it still occupies less space.
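The size claim is easy to measure rather than argue about. A quick Python sketch, assuming 1000 copies of the two-key record above and compact JSON (no whitespace); the compressed numbers will vary with real data, so treat them as something to check on your own payloads:

```python
import gzip
import json

keys = ["key1", "key2"]
as_objects = [{"key1": "a", "key2": "b"} for _ in range(1000)]
as_arrays = [[record[k] for k in keys] for record in as_objects]
as_flat = [value for row in as_arrays for value in row]

def sizes(value):
    """Return (uncompressed, gzipped) byte counts of the compact JSON encoding."""
    raw = json.dumps(value, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))

for name, value in [("objects", as_objects), ("arrays", as_arrays), ("flat", as_flat)]:
    raw_size, gz_size = sizes(value)
    print(f"{name:8} raw={raw_size:6} gzip={gz_size:5}")

# The flat layout only works if the reader knows the record width up front.
width = len(keys)
rebuilt = [
    dict(zip(keys, as_flat[i:i + width]))
    for i in range(0, len(as_flat), width)
]
```

The flat and array layouts get smaller because the key names move out of the payload and into the reader's code, which is the flexibility trade-off mentioned above.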
["Name", "Session", "Score", "Completed"]
["Gilbert", "2013", 24, true]
["Alexa", "2013", 29, true]
["May", "2012B", 14, false]
["Deloise", "2012A", 19, true]
Plenty of tools, including some I've written myself, support it.
1. The whitespace is optional. It's just put there for illustrative purposes.
2. Whitespace in CSV can actually corrupt the data where some parsers make incompatible assumptions vs other CSV parsers. Eg space characters used before or after commas -- do you trim them or include them? Some will delimit on tabs as well as commas rather than either/or. Some handle new lines differently.
3. Continuing off the previous point: new lines in CSVs, if you're following IBM's spec, should be literal new lines in the file. This breaks readability and it makes streaming CSVs more awkward (because you then break the assumption that you can read a file one line at a time). jsonlines is much cleaner (see next point).
4. Escaping is properly defined. eg how do you escape quotation marks in CSV? IBM's CSV spec states double quotes should be doubled up (eg "Hello ""world""") whereas some CSV parsers prefer C-style escaping (eg "Hello \"world\"") and some CSV parsers don't handle that edge case at all. jsonlines already has those edge cases solved.
5. CSV is data type less. This causes issues when importing numbers in different parsers (eg "012345" might need to be a string but might get parsed as an integer with the leading zero removed). Also should `true` be a string or a boolean? jsonlines is typed like JSON.
The entire reason I recommend jsonlines over CSV is because jsonlines has the readability of CSV while having the edge cases covered that otherwise lead to data corruption in CSV files (and believe me, I've had to deal with a lot of that over my extensive career!)
I do get the convenience of CSV and I've used it loads in the past myself. But if ever you're dealing with data whose contents you cannot be 100% sure of, it's safer to use a standard that has strict rules about how to parse control characters.
I'm not saying your use case isn't appropriate though. eg if you're exporting from a DB whose records have already been sanitised and wanting to do some quick analysis then TSV/PSV is probably fine. But if your data hasn't been sanitised and might contain \n, \" or others, then there is a good chance that your parser will handle them differently to your expectations -- and even a slim chance that your parser might just go ahead and slightly corrupt your data rather than warn you about differing column lengths et al. So it's definitely worth being aware that TSV and PSV suffer from all the same weaknesses as CSV.
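Points 3 to 5 can be seen in a few lines of Python. This is a sketch with one made-up row, using the stdlib `csv` writer in its default dialect (doubled quotes, literal newlines inside quoted fields) next to `json.dumps`; other CSV parsers may behave differently, which is rather the point:

```python
import csv
import io
import json

# One awkward row: leading zero, embedded quotes, embedded newline, a boolean.
row = ["012345", 'Hello "world"', "line one\nline two", True]

# CSV: quotes are doubled up and the newline is written literally.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)
csv_text = buf.getvalue()

# Reading back works with the same dialect, but every value is now a string.
csv_back = next(csv.reader(io.StringIO(csv_text)))

# Reading the file "one line at a time" sees two physical lines for one record.
physical_lines = csv_text.splitlines()

# JSON lines: escapes are fixed by the JSON grammar and the types survive.
json_line = json.dumps(row)
json_back = json.loads(json_line)
```

Here `csv_back` ends with the string "True" rather than a boolean, and whether "012345" stays a string is up to whatever consumes it next. The JSON line round-trips to exactly the original row.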