Tuan-Anh Tran
June 1, 2020

An extremely fast streaming SAX parser for Node.js


TLDR: I wrote a SAX parser for Node.js. It’s available on GitHub: https://github.com/tuananh/sax-parser

I get asked about complete XML parsing with camaro from time to time, and I haven’t yet found the time to implement it.

Initially I thought it should be part of the camaro project, but now I think it makes more sense as a separate package.

The package is still in alpha and should not be used in production, but if you want to try it, it’s available on npm as @tuananh/sax-parser.

Benchmark

The initial benchmark looks pretty good. I took the benchmark script from the node-expat repo and added a few more contenders.

sax x 14,277 ops/sec ±0.73% (87 runs sampled)
@tuananh/sax-parser x 45,779 ops/sec ±0.85% (85 runs sampled)
node-xml x 4,335 ops/sec ±0.51% (86 runs sampled)
node-expat x 13,028 ops/sec ±0.39% (88 runs sampled)
ltx x 81,722 ops/sec ±0.73% (89 runs sampled)
libxmljs x 8,927 ops/sec ±1.02% (88 runs sampled)
Fastest is ltx

The ltx package is the fastest, beating the second fastest (@tuananh/sax-parser) by a factor of about 1.8. However, ltx is not fully compliant with the XML spec. I still include ltx here for reference. If ltx works for you, use it.
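
The relative speeds can be worked out directly from the ops/sec figures in the benchmark output. The snippet below is only a quick arithmetic check: the numbers are copied from the results above, and the `results` object and `speedup` helper are names made up for this illustration.

```javascript
// Throughput figures copied from the benchmark output above (ops/sec).
const results = {
    'sax': 14277,
    '@tuananh/sax-parser': 45779,
    'node-xml': 4335,
    'node-expat': 13028,
    'ltx': 81722,
    'libxmljs': 8927,
}

// Sort fastest first so the top two entries are easy to compare.
const ranked = Object.entries(results).sort((a, b) => b[1] - a[1])
const [[fastestName, fastestOps], [secondName, secondOps]] = ranked

// How many times faster module `a` is than module `b`.
const speedup = (a, b) => results[a] / results[b]

console.log(`${fastestName} vs ${secondName}: ${(fastestOps / secondOps).toFixed(2)}x`)
console.log(`@tuananh/sax-parser vs sax: ${speedup('@tuananh/sax-parser', 'sax').toFixed(2)}x`)
```

So the gap between the top two is a factor of a bit under 2, and @tuananh/sax-parser is roughly three times faster than sax on this particular benchmark.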

module                 ops/sec   native   XML compliant   stream
node-xml                 4,335
libxmljs                 8,927
node-expat              13,028
sax                     14,277
@tuananh/sax-parser     45,779
ltx                     81,722

API

The API is simple and should feel familiar if you have used other SAX parsers. In fact, I took inspiration from them (sax and node-expat) and mostly copied their APIs to make the transition easier.

An example of using @tuananh/sax-parser to prettify XML looks like this:

const { readFileSync } = require('fs')
const SaxParser = require('@tuananh/sax-parser')

const parser = new SaxParser()

let depth = 0
parser.on('startElement', (name) => {
    let str = ''
    for (let i = 0; i < depth; ++i) str += '  ' // indentation
    str += `<${name}>`
    process.stdout.write(str + '\n')
    depth++
})

parser.on('text', (text) => {
    let str = ''
    for (let i = 0; i < depth + 1; ++i) str += '  ' // indentation
    str += text
    process.stdout.write(str + '\n')
})

parser.on('endElement', (name) => {
    depth--
    let str = ''
    for (let i = 0; i < depth; ++i) str += '  ' // indentation
    str += `</${name}>` // closing tag
    process.stdout.write(str + '\n')
})

parser.on('startAttribute', (name, value) => {
    // console.log('startAttribute', name, value)
})

parser.on('endAttribute', () => {
    // console.log('endAttribute')
})

parser.on('cdata', (cdata) => {
    let str = ''
    for (let i = 0; i < depth + 1; ++i) str += '  ' // indentation
    str += `<![CDATA[${cdata}]]>`
    process.stdout.write(str)
    process.stdout.write('\n')
})

parser.on('comment', (comment) => {
    process.stdout.write(`<!--${comment}-->\n`)
})

parser.on('doctype', (doctype) => {
    process.stdout.write(`<!DOCTYPE ${doctype}>\n`)
})

parser.on('startDocument', () => {
    process.stdout.write(`<!--=== START ===-->\n`)
})

parser.on('endDocument', () => {
    process.stdout.write(`<!--=== END ===-->`)
})

const xml = readFileSync(__dirname + '/../benchmark/test.xml', 'utf-8')
parser.parse(xml)
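
To try the handler logic without installing the native package, here is a minimal sketch that wires the same three event names (startElement, text, endElement) to a plain Node.js EventEmitter. The emitter is a hypothetical stand-in that replays a fixed event sequence and does not parse anything. `attachPrettifier` and `indent` are names invented for this sketch, and it indents text one level below its parent element, which differs slightly from the example above.

```javascript
const { EventEmitter } = require('events')

// Build an indentation string for the given nesting depth.
const indent = (depth) => '  '.repeat(depth)

// Attach prettifier handlers to any emitter that fires the
// startElement / text / endElement events used in the example above.
// Lines are collected in an array instead of being written to stdout.
function attachPrettifier(parser) {
    const lines = []
    let depth = 0
    parser.on('startElement', (name) => {
        lines.push(`${indent(depth)}<${name}>`)
        depth++
    })
    parser.on('text', (text) => {
        lines.push(`${indent(depth)}${text}`)
    })
    parser.on('endElement', (name) => {
        depth--
        lines.push(`${indent(depth)}</${name}>`)
    })
    return lines
}

// Hypothetical stand-in for the real parser: it only replays events.
const fake = new EventEmitter()
const lines = attachPrettifier(fake)
fake.emit('startElement', 'root')
fake.emit('startElement', 'item')
fake.emit('text', 'hello')
fake.emit('endElement', 'item')
fake.emit('endElement', 'root')

console.log(lines.join('\n'))
```

Collecting lines in an array rather than writing to process.stdout makes the output easy to assert on in a test. Swapping the stand-in for a real parser instance should only require passing that instance to `attachPrettifier`, assuming it exposes the same `on` method and event names shown in the example.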