This is ericpony's blog

Tuesday, March 4, 2014

Node.js: Reading a Large Text File Line by Line

We all know the famous slogan of Perl: "there's more than one way to do it", which encourages Perl programmers to do things creatively using Perl's exuberant syntax. The advocates of Python, on the other hand, take a minimalist approach and argue that "there should be one-- and preferably only one --obvious way to do it", stressing programming discipline and code readability. As a beginner adopting Node.js in my daily projects, I found that Node.js blends these two opposite philosophies in a funny manner: there are many ways to do it, but only one or two of them are preferable. However, telling the good ones from the rest is far from obvious, even for veteran programmers.
For example, if you want to read and process a large text file one line at a time, in Perl you can do it like this:
while (<>) { chomp; process_line($_); }
This single line of Perl does almost everything you would expect it to do. It:
  • has no limit on file size or number of lines
  • has no limit on length of lines
  • can handle full Unicode in UTF-8
  • can handle *nix, Mac and Windows line endings
  • does not use any external libraries not included in the core language distribution
Surprisingly, it turns out that there is no simple solution in Node.js that fulfills all of the above requirements. Questions about routine jobs of this kind recur on forums like Stack Overflow, and the suggested solutions are often either criticized as inefficient by other posters, or far less intuitive than one would expect from a scripting language.
As for reading a large text file line by line, there are many off-the-shelf third-party modules for the job. However, many of them are either out of date and abandoned by their developers, or reported to have horrifying bugs such as dropping the last line or leaking large amounts of memory. Even where de facto standard modules exist, they are usually under rapid development and lack proven reliability.
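For comparison with the Perl one-liner, here is a minimal sketch of one way to approach the task using only the core fs and readline modules. It assumes a Node.js version recent enough to support async iteration over a readline interface, which is newer than this post. The names forEachLine and processLine are hypothetical and only for illustration, and error handling (for example, a missing file) is left out. Treat it as a sketch under those assumptions rather than a solution proven against every requirement in the list above.

```javascript
// Sketch only: uses the core `fs` and `readline` modules and assumes a
// Node.js version that supports `for await` over a readline interface.
const fs = require('fs');
const readline = require('readline');

// `forEachLine` and `processLine` are hypothetical names for illustration.
// Calls processLine(line) once per line and resolves with the line count.
async function forEachLine(path, processLine) {
  // The file is streamed in chunks, so it is never held in memory as a whole.
  const input = fs.createReadStream(path);

  // crlfDelay: Infinity makes "\r\n" always count as a single line break.
  const rl = readline.createInterface({ input, crlfDelay: Infinity });

  let count = 0;
  for await (const line of rl) {
    processLine(line); // `line` arrives without its line terminator
    count += 1;
  }
  return count;
}
```

A call might look like forEachLine('big.txt', console.log), where big.txt is a placeholder file name. Even in this form it takes noticeably more ceremony than the Perl version.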

(I will finish the rest of this post some day.)
