Excel.vim (github.com)
That's quite a sad statement to make nowadays. I guess the old binary format might be worse regarding character sets, but at least the newer ones should use Unicode exclusively, which makes this a very odd restriction.
Taking a piece of software and making all of the UI language localized is one thing. Making sure that your program doesn't blow up if it encounters UTF-8 is another thing. Nowadays if your program chokes on UTF-8, I think it's safe to just consider it broken.
In any case, looks like this is really where the issue may lie:
    # for non-English characters
    def getRealLengh(str):
        length = len(str)
        for s in str:
            if ord(s) > 256:
                length += 1
        return length
and:
    for val in shn.row_values(n):
        try: val = val.replace('\n',' ')
        except: pass
        val = isinstance(val, basestring) and val.strip() or str(val).strip()
        line += val + ' ' * (30 - getRealLengh(val))
    vim.current.buffer.append(line)
In accounting for the fixed-width layout of non-ASCII characters.
Like other people have pointed out, handling Unicode properly does not mean internationalization. Handling utf-8 isn't even difficult if you just keep it in mind.
The notion of non-ASCII characters in user Excel documents IS NOT something rare even for English-speaking nations. There are tons of people with foreign names, addresses and other personal information which is commonly stored in Excel documents. And the funny thing is: it's usually not a lot of additional work to support UTF-8 if you START correctly.
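To make the "not a lot of additional work" point concrete, here is a minimal Python 3 sketch of a column-padding helper that measures display width instead of assuming every code point above 256 is double-width. The names `display_width` and `pad_cell` are hypothetical and not part of the plugin; the sketch assumes East Asian "wide" and "fullwidth" characters take two terminal cells, combining marks take none, and everything else (including "ambiguous" characters such as Cyrillic) takes one. Real terminals may disagree on the ambiguous cases.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal cells `text` occupies (an approximation)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn on top of the previous character.
            continue
        # "W" (wide) and "F" (fullwidth) characters take two cells.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_cell(text, cell_width=30):
    """Right-pad `text` with spaces so it fills `cell_width` cells."""
    return text + " " * max(0, cell_width - display_width(text))


row = ["name", "Жанна", "日本語"]
line = "".join(pad_cell(value, 10) for value in row)
```

Under these assumptions a Cyrillic name is padded as five cells wide, whereas the `ord(s) > 256` check quoted earlier would count it as ten.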
> For vim 7.3 and less, it works well for almost all kinds of file formats,
> ie. .xls,.xlam,.xla,.xlsb,.xlsx,.xlsm,.xltx,.xltm,.xlt etc
Someone already pointed it out on GitHub (https://github.com/yakiang/excel.vim/issues/5): xlrd does not support the XLSB format (and the xlrd authors expressed no interest in building it).
The problem you will encounter is that most programs (Numbers, Google Docs) do not support XLSB.
Shameless plug: https://github.com/SheetJS/js-xlsx supports both XLSX and XLSB (AFAICT the only liberally licensed project that handles the format)
http://vim.wikia.com/wiki/Working_with_CSV_files
I'm not sure if it's what you're looking for, but I've found it very useful.
Edit: To clarify, I think this shows what hacking is all about. Playful cleverness and curiosity.
By the way, it's not utf-8, it's UTF-8. Whatever happened to taking pride in your writing and making it the best it can be?
Everyone has their own criteria for quality, and you can't hope to satisfy everyone. Everyone with even a mildly successful project in open source knows this. Scratch your own itch, make it work, accept any request that meets with your vision, and keep a permissive licence so those that don't can fork. Otherwise, the arrogance being asserted, that you can somehow determine if my contribution is worthy of existing, is baffling.
I'd rather people who build stuff for themselves release it to the rest of us than keep it to themselves.
Regardless, I think people should be applauded for releasing their work instead of shamed. And of course, if it's not a lot of work, a pull request would likely be appreciated ;)
> A Show HN needn't be complicated or look slick. HN users are comfortable with work that's at an early stage.
> Be respectful. Anyone sharing work is making a contribution, however modest.
> When something isn't good, you needn't pretend that it is. But don't be gratuitously negative.
I'm not saying people shouldn't be open to criticism, but I think terms such as "downright shameful" fall under the category of gratuitously negative.
Here is the master list of codepages used by Excel: https://github.com/SheetJS/js-codepage/blob/master/excel.csv (disclaimer: I built this as part of the in-browser XLS parser https://github.com/SheetJS/js-xls)
Consider this: A medical device that people's lives depend on. It only fails in 1/1000 cases causing death. Many people could state, "it works for me, can't be broken!" On the other hand, the families of the dead could argue that it is broken. Who is right?
Obviously this isn't such an extreme case. No one has their life depending on a Vim plugin, but it illustrates a point. "Works for me" doesn't necessarily imply "isn't broken."
The author of this plugin isn't trying to make a spreadsheet competitor, they just released it publicly because other people might find it useful or interesting.
I'd be willing to bet money that at least some of the formats in question aren't UTF-8, they are likely ASCII encoded against a character set or code page.
Then you have to read that codepage, and convert the necessary characters to their Unicode equivalents, and from there do you downcode to utf-8?
Does the language this library is written in support that translation? Are there modules to do that? Is the license for those module(s) necessarily compatible?
Who's going to go through the different document versions to confirm, and adjust for the various encodings for non-ascii characters?
It's not as simple as saying "don't choke on unicode".
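For reference, the "read the codepage, map to Unicode, then encode as UTF-8" round trip described above can be sketched in Python 3 with nothing but the built-in codecs. The function name `codepage_to_utf8` is hypothetical, and the sketch assumes you already know which codec the bytes were written in; discovering that from the file is the harder part and is not shown here.

```python
def codepage_to_utf8(raw, codec_name):
    """Decode code-page bytes to text, then re-encode the text as UTF-8."""
    # Step 1: bytes in a legacy code page -> Unicode text (str).
    text = raw.decode(codec_name)
    # Step 2: Unicode text -> UTF-8 bytes.
    return text.encode("utf-8")


western = codepage_to_utf8(b"caf\xe9", "cp1252")   # Windows-1252 bytes
cyrillic = codepage_to_utf8(b"\xc6", "cp1251")     # Windows-1251 bytes
```

A wrong guess at `codec_name` either raises `UnicodeDecodeError` or silently produces the wrong characters, which is why test files per code page matter.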
Length-prefixed byte arrays encoded using various code pages. There are a small number that Excel uses: https://github.com/SheetJS/js-codepage/blob/master/excel.csv (the columns are CP#, mapping, single/double-byte)
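As a rough illustration of what "length-prefixed byte arrays" means, here is a Python 3 sketch that reads such a string from a buffer. The layout is an assumption made for the example (a single length byte followed by that many bytes in a single-byte code page); it is not a description of the actual XLS record format, and `read_length_prefixed` is a hypothetical helper.

```python
import struct


def read_length_prefixed(buf, offset, codec_name="cp1252"):
    """Read one string at `offset`; return (text, offset of the next field).

    Assumed layout: 1 unsigned length byte, then `length` encoded bytes.
    """
    (length,) = struct.unpack_from("<B", buf, offset)
    start = offset + 1
    raw = buf[start:start + length]
    if len(raw) != length:
        raise ValueError("buffer ends before the declared string length")
    return raw.decode(codec_name), start + length


data = b"\x04caf\xe9\x02ok"
first, next_offset = read_length_prefixed(data, 0)
second, end_offset = read_length_prefixed(data, next_offset)
```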
> Does the language this library is written in support that translation? Are there modules to do that? Is the license for those module(s) necessarily compatible?
If we can put together an Apache2-licensed module in JS in an afternoon (https://github.com/SheetJS/js-codepage), it can be done in Python.
> Who's going to go through the different document versions to confirm, and adjust for the various encodings for non-ascii characters?
Someone already did that: https://github.com/SheetJS/test_files/tree/master/biff5 has artifacts for every language type
I thought Python 2 was Unicode-unfriendly. So not as easy as JS.
It's written in Python, which comes with support for pretty much every major encoding¹ out of the box, so yes.
¹: https://hg.python.org/cpython/file/cb94764bf8be/Lib/encoding...
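A quick way to check the "out of the box" claim for a given code page is to ask the codec registry. This is a small Python 3 sketch; `has_codec` is a hypothetical helper, and the candidate list is just a handful of common Windows and Mac code pages chosen for illustration, not the full Excel list.

```python
import codecs


def has_codec(name):
    """Return True if this Python installation can resolve the codec name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


candidates = ["cp1252", "cp1251", "cp932", "cp936", "cp949", "cp950", "mac_roman"]
available = {name: has_codec(name) for name in candidates}
```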
In ten years it might be better.
Unicode isn't even hard: Use UTF-8. Don't try to measure the length of a string unless you're rendering that string and measuring the length in screen units like pixels. If you do those two things, that's 90% of the effort of making Unicode-safe software.
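A small Python 3 sketch of why "length" is the slippery part: the same string has one size in code points and another in UTF-8 bytes, and neither says how wide it will render. The helper name `lengths` is hypothetical and only exists for this illustration.

```python
def lengths(text):
    """Return (code points, UTF-8 bytes) for `text`; neither is a screen width."""
    return len(text), len(text.encode("utf-8"))


sample = "naïve 日本"
code_points, utf8_bytes = lengths(sample)

# Decode once at the input boundary, work on str, encode once at the output.
incoming = b"  caf\xc3\xa9\n"
cleaned = incoming.decode("utf-8").replace("\n", " ").strip()
outgoing = cleaned.encode("utf-8")
```

Here `sample` is 8 code points but 13 bytes, so padding or truncating by either number can misalign columns or cut a character in half.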
I think both views are valid. Those who don't know how to write Unicode-safe software shouldn't feel shamed into learning Unicode before releasing open source work. Those who already know Unicode should feel happy that they're making other people's lives easier.
As far as I know, UTF-8 will work 100% of the time, and is almost always the best internal representation for software you write due to how simple and uniform it is. If something is encoded in some other format, you can probably find a conversion function online.
I'm not saying that it's really all that hard, but there are multiple document formats, and versions of those formats. The author obviously didn't need Unicode support, so didn't test for it. I'm sure test cases and a pull request would be welcome.