Excel? Just save it as a .csv! Noooooooo!

So you want to make it possible for your site user to download some data for use in Excel? Well just save it as a .csv! You know; a comma separated values file. No problems.

Yes problems

The .csv trick works fine. Unless, of course, you want to use any special characters.

First fail: utf-8

The natural reaction is to use utf-8. So you make sure that the Content-Type of the response is correct.

    res.setHeader('Content-Type', 'text/csv; charset=utf-8');

Wrong! Excel doesn't like utf-8. It wants utf-16. And not any utf-16 - it wants utf-16le. That is: utf-16, little endian.

Second fail: utf-16le

So now you change the response header:

    res.setHeader('Content-Type', 'text/csv; charset=utf-16le');

..and write your content. Well it doesn't work. The file is still downloaded as utf-8 or ascii.

Third fail: Buffer.toString('utf16le')

So now we go low level. First we write the header and then the data as a buffer:

    res.writeHead(200, {
        'Content-Type': 'text/csv; charset=utf-16le; header=present;'
        });
    res.write(new Buffer(myData, 'utf8').toString('utf16le');
    res.end();

...and now you have a file filled with something that looks like Chinese characters.

Fourth fail: Iconv

So I found an npm lib called Iconv (it should be familiar if you run Linux or OS X). It helps with converting between charsets:

    var Iconv = require('Iconv').Iconv;
    var iconv = new Iconv('utf8', 'utf16le');
    var buffer = iconv.convert(myData);
    res.writeHead(200, {
        'Content-Type': 'text/csv; charset=utf-16le; header=present;'
        });
    res.write(buffer);
    res.end();

Now the content of the file looks (almost) right. But the file is downloaded as an octet/stream. Why?

First win: BOM (hedgehog?)

Turns out you have to add a BOM for the system to understand that it is downloading a file in utf-16le format. So the (almost) final code looks like this:

    var Iconv = require('Iconv').Iconv;
    var iconv = new Iconv('utf8', 'utf16le');
    var buffer = iconv.convert(myData);
    res.writeHead(200, {
        'Content-Type': 'text/csv; charset=utf-16le; header=present;'
        });
    res.write(new Buffer([0xff, 0xfe]);
    res.write(buffer);
    res.end();

Why is everything in the same column?

Well I said almost, didn't I? It just so happens that when you use unicode in Excel, csv no longer translates as comma separated value file. It now means TAB separated value file. Duh! So just replace all commas with \t and you're fine!

Author

Johan Öbrink

Started programming on my VIC 20 in 1982. Programmed JavaScript, Java, ActionScript, C# and then some. Love TDD, XP and agile development. Enjoy talking (a lot), building drones and playing with LEGO.

comments powered by Disqus