The Relicans

Alternative (and Not Necessarily Practical) Ways to Read a File in Ruby

Kirk Haines (wyhaines) ・ 7 min read

TMTOWTDI

There's More Than One Way To Do It.

A question was asked the other day:

[Image: the question as originally posted]

So, in the interest of TMTOWTDI, let's explore this a bit.

Calling the class method read() on File, like this -- File.read("filename") -- reads the content of the named file and returns it as a String.

contents = File.read("foo.txt")

This article would be very boring if that were the end of it, though, so let's look at a few other ways that one might achieve the same effect.

Ruby -p

Most of the time, when people use Ruby, they are running scripts or significant pieces of software. However, Ruby has a number of command-line options which make it a viable tool for doing a variety of simple file processing and manipulation tasks.

❯ ruby -h
Usage: ruby [switches] [--] [programfile] [arguments]
  -0[octal]       specify record separator (\0, if no argument)
  -a              autosplit mode with -n or -p (splits $_ into $F)
  -c              check syntax only
  -Cdirectory     cd to directory before executing your script
  -d              set debugging flags (set $DEBUG to true)
  -e 'command'    one line of script. Several -e's allowed. Omit [programfile]
  -Eex[:in]       specify the default external and internal character encodings
  -Fpattern       split() pattern for autosplit (-a)
  -i[extension]   edit ARGV files in place (make backup if extension supplied)
  -Idirectory     specify $LOAD_PATH directory (may be used more than once)
  -l              enable line ending processing
  -n              assume 'while gets(); ... end' loop around your script
  -p              assume loop like -n but print line also like sed
  -rlibrary       require the library before executing your script
  -s              enable some switch parsing for switches after script name
  -S              look for the script using PATH environment variable
  -v              print the version number, then turn on verbose mode
  -w              turn warnings on for your script
  -W[level=2|:category]     set warning level; 0=silence, 1=medium, 2=verbose
  -x[directory]   strip off text before #!ruby line and perhaps cd to directory
  --jit           enable JIT with default options (experimental)
  --jit-[option]  enable JIT with an option (experimental)
  -h              show this message, --help for more info

Take a close look at the -e, -n and -p options above:

-e 'command'    one line of script. Several -e's allowed. Omit [programfile]
-n              assume 'while gets(); ... end' loop around your script
-p              assume loop like -n but print line also like sed

The -e option is used to provide Ruby code right on the command line, instead of loading it from a file.

❯ ruby -e 'puts Time.now'
2021-04-16 12:35:01 -0600

The -n option wraps whatever code Ruby is executing in a while gets(); ... end loop. Each pass reads the next line of input into $_, taking it from the files named on the command line or from STDIN if none are given. This allows one to provide some code that will handle successive lines, without having to write the input handling loop explicitly.

The -p option does the same thing as the -n option, except that it also prints $_ at the end of each pass through the loop.
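
To make the implicit loop concrete, here is a rough sketch of what -p amounts to, written as ordinary Ruby. The emulate_p helper is hypothetical, not something Ruby provides, and it is only an approximation: the real switch works through the $_ global, while this sketch hands each line to a block and prints whatever the block returns.

```ruby
require "stringio"

# Hypothetical helper that approximates `ruby -pe CODE`. The block stands in
# for the -e code and receives each line instead of reading $_.
def emulate_p(input, output = $stdout)
  while (line = input.gets)              # the implicit `while gets` loop that -n adds
    line = yield(line) if block_given?   # the code given with -e
    output.print(line)                   # the extra print that -p adds
  end
end

# Upcase every line, roughly like: ruby -pe '$_.upcase!'
emulate_p(StringIO.new("abc\ndef\n")) { |line| line.upcase }
```

Called without a block, the helper simply copies its input to its output, which is the cat-like behavior explored next.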

So, if one passes a blank program to Ruby, via an empty string argument with the -e flag, and one also specifies the -p flag, Ruby will act more or less like the cat utility -- it will dump all of the contents of a file, a line at a time, to STDOUT.

ruby -pe'' < filename

For all of the following tests, the assumption is that we are working with a very simple data file that looks like this:

abc
def
ghi
jkl
mno
pqr
stu
vwx
yz

So, let's try this!

❯ ruby -pe''<abc.txt
ruby: no code specified for -e (RuntimeError)

Yeah, there is a gotcha there. The shell glues the empty quotes onto -pe, so Ruby receives the single argument -pe with no code following the -e. You can get away with a program that does nothing, such as a lone comment, so this will actually work:

❯ ruby -pe'#' < abc.txt
abc
def
ghi
jkl
mno
pqr
stu
vwx
yz

From there, it is only a short additional step to use this as a technique for reading a file within a Ruby program:

contents = `ruby -pe'#' < abc.txt`

Magic. TMTOWTDI. This one leverages Ruby file processing in a separate external process to replace File.read(). It may not be the most practical thing, but it works.
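
The backtick version goes through a shell, so a filename containing spaces or shell metacharacters would need quoting. As a hedged sketch of the same trick without the shell, the hypothetical subprocess_read helper below passes the command to IO.popen as an array and names the file as an argument rather than redirecting STDIN. It assumes that RbConfig.ruby points at a usable interpreter.

```ruby
require "rbconfig"

# Hypothetical helper: read a file by running `ruby -pe '#' PATH` as a child
# process. Passing the command as an array skips the shell entirely, so the
# filename needs no quoting or escaping.
def subprocess_read(path)
  IO.popen([RbConfig.ruby, "-pe", "#", path], &:read)
end
```

Usage mirrors File.read: contents = subprocess_read("abc.txt").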

A Command of a Different Flavor

There are, of course, many different ways to dump the contents of a file to STDOUT just from the command line. The most obvious one is probably cat, as it works on any Unix-like operating system, as well as in Windows PowerShell (though not in cmd.exe, where one needs to use something like type or more).

There is another command that is a lot of fun, though. It is called dd.

There are a lot of rumors about what dd could stand for, from the descriptive data duplicate to more menacing suggestions such as disk destroyer or data delete, but the truth of the matter is that it was named after the Data Definition (DD) statement in the Job Control Language (JCL) of the IBM System/360. That origin is also why the Unix dd command has such a non-Unix-like syntax: its operand=value style was inspired by JCL DD cards.

As one might glean from the alternative names for dd, one does have to be careful with the command, as careless use of it certainly can completely wipe out important data, or your entire filesystem.

Our example here, though, is not nearly so dangerous, as, in its simplest form, dd can be used to simply read data from somewhere else and return it on STDOUT, just like the cat command.

This usage looks like this:

dd if=abc.txt

The if=abc.txt operand tells dd to read from the abc.txt file and, without an of= operand, it defaults to writing the data it reads to STDOUT. Et voilà!

❯ dd if=abc.txt
abc
def
ghi
jkl
mno
pqr
stu
vwx
yz
0+1 records in
0+1 records out
35 bytes copied, 4.19e-05 s, 835 kB/s

There is a little wrinkle there, as dd reports via STDERR how much data was duplicated, and how long it took.

Ruby can deal with that, though, because inside of a Ruby program, STDERR can be sent to another destination.

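require "shellwords"

def dd_read(filename)
  original_stderr = $stderr.clone
  $stderr.reopen(File::NULL, "w") # Discard dd's statistics on STDERR
  `dd if=#{filename.shellescape}`  # shellescape guards against odd filenames
ensure
  $stderr.reopen(original_stderr)  # Restore STDERR even if dd fails
  original_stderr.close
end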

Using it would look like this:

3.0.0 :008 > puts dd_read("abc.txt")
abc
def
ghi
jkl
mno
pqr
stu
vwx
yz
 => nil 
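
Reopening $stderr changes it for the whole process while dd runs. A sketch of an alternative, assuming the standard library's Open3 is acceptable here, is to capture STDOUT and STDERR separately and throw the statistics away. The dd_read_open3 name is made up for this example.

```ruby
require "open3"

# Hypothetical alternative to dd_read: capture3 returns dd's STDOUT and STDERR
# as separate strings, so the statistics never mix with the file contents.
# Passing the arguments separately also avoids going through a shell.
def dd_read_open3(filename)
  contents, _stats, status = Open3.capture3("dd", "if=#{filename}")
  raise "dd failed for #{filename}" unless status.success?
  contents
end
```

It is called the same way, for example puts dd_read_open3("abc.txt").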

Do It All Inside Ruby

Ruby is a very flexible language, and many of its core capabilities are a friendly layer over lower-level libc functions. This is particularly true for file IO, where many methods are just a nice Ruby veneer that makes the C calls prettier. One can take advantage of this when it comes to reading files.

The first step when rolling your own file read method is to open the file. To keep things as close to the bare metal as possible, one can use IO.sysopen("filename") to open a file. This calls the underlying C library open() function and returns the resulting file descriptor as an Integer, which IO.new can then wrap in an IO object.

Once that is done, one can turn to the IO#sysread method for the actual reading of the data. This method accepts an integer specifying the maximum number of bytes to read. There is no guarantee that the full number of bytes will be read, and if there is no more data available, an EOFError will be raised.
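
A small sketch can show those three behaviors in sequence: a read capped at the requested size, a read that returns fewer bytes than requested, and the EOFError at the end. The sysread_steps helper and the temporary file are made up for the demonstration.

```ruby
require "tempfile"

# Hypothetical demo helper: records what successive sysread calls produce.
def sysread_steps(path)
  fd = IO.sysopen(path)        # a plain Integer file descriptor
  io = IO.new(fd)              # wrap the descriptor in an IO object
  steps = []
  steps << io.sysread(4)       # returns at most 4 bytes
  steps << io.sysread(8192)    # returns only what is left, far fewer than 8192 bytes
  begin
    io.sysread(1)
  rescue EOFError
    steps << :eof              # nothing left to read
  end
  steps
ensure
  io&.close
end

Tempfile.create("sysread_demo") do |tmp|
  tmp.write("abc\ndef\n")
  tmp.flush
  p sysread_steps(tmp.path)
end
```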

So, to use it to read a file, one needs to read chunks of data, adding them to a buffer, until the EOFError is raised, at which point the end-of-file has been reached, and the entire file has been read.

The simplest implementation would look something like this:

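def file_read(filename)
  io = IO.new(IO.sysopen(filename))
  buffer = +"" # Unfrozen, even under frozen_string_literal
  loop { buffer << io.sysread(8192) } # sysread raises EOFError at end-of-file
rescue EOFError
  buffer
ensure
  io&.close # Close the descriptor on every exit path
end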

Like the earlier implementations, this file reading function has little practical value, but it does veer very close to something a bit more interesting.

Imagine that you want to read the file a line at a time, yielding each line to a block. If this sounds familiar, that's because it is what [#each_line](https://ruby-doc.org/core-3.0.1/IO.html#method-i-each_line) does. You can build similar functionality with just a slight variation of the above code.

The primary difference is that the method must be able to yield to a block, and it must identify the end of a line within the buffer, yielding each line to the block. The basic structure of the method won't change, though.

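def file_read_each(filename)
  io = IO.new(IO.sysopen(filename))
  buffer = +""
  loop do
    # Keep reading until the buffer holds at least one complete line
    buffer << io.sysread(512) until buffer.include?($/)
    line, buffer = buffer.split($/, 2)
    yield line
  end
rescue EOFError
  yield buffer unless buffer.empty? # Final line with no trailing separator
ensure
  io&.close
end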
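
For comparison, here is a sketch of the same iteration using what Ruby already ships. File.foreach opens the file, yields each line, and closes it afterwards, and chomp: true strips the line separator so the lines match what the split-based approach yields. The builtin_lines wrapper is hypothetical and exists only to collect the results.

```ruby
# Hypothetical wrapper: collect a file's lines using the built-in iterator.
def builtin_lines(path)
  lines = []
  File.foreach(path, chomp: true) { |line| lines << line }
  lines
end
```

Running both over the same file is a quick way to check a hand-rolled reader against the built-in behavior.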

TMTOWTDI was popularized in the Perl community of the 1990s, but the sentiment is alive and well in Ruby today. And while many of the ways to do it are non-ideal in one way or another, every time you think of a different way to do it, you stand a good chance of learning something.

Good coding practice suggests that reimplementing File.read() probably isn't the best use of a developer's time, but even there, the learning and the understanding that can come from experimentation and exploration can lead a person down new paths of understanding. Exploration is seldom time wasted, so maybe go find something simple, and see if you can come up with another way to do it sometime?


I stream on Twitch for The Relicans. Stop by and follow me at https://www.twitch.tv/wyhaines, and feel free to drop in any time. In addition to whatever I happen to be working on that day, I'm always happy to field questions or to talk about anything that I may have written.
