Misframe

May 18, 2014

Prepending to a file.

Prepending to a file is relatively expensive, but to what extent? I decided to check. I wrote 2^23 integers to a file, in ASCII, separated by a newline. On my ThinkPad with spinning rust, it takes a little longer than 12 seconds. The file size is around 31 MB.

In order to “prepend” an integer to the beginning, we essentially have to shift all of the other elements forward and then write what we need at the beginning. So we write everything again, and then some. I wrote a small Go program that does this, and here are the results:

  āˆ‚ [-]: go run middle_insert.go 
Time to insert 4194304 integers: 12.202411374s
Time to prepend an int to the file: 579.472752ms

Prepending is 21 times faster! Dat cache.

A few notes worth mentioning:

package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

const N = 1 << 22

func exitOnError(err error) {
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}

func main() {
	f, err := os.Create("scratch.txt")
	exitOnError(err)
	defer f.Close()

	start := time.Now()
	// Write a bunch of ints
	for i := 1; i <= N; i++ {
		fmt.Fprintln(f, i)
	}
	f.Sync()
	fmt.Println("Time to insert", N, "integers:", time.Now().Sub(start))

	fStat, err := f.Stat()
	exitOnError(err)

	toBeInserted := fmt.Sprintln(0)

	start = time.Now()
	os.Truncate(f.Name(), fStat.Size()+int64(len(toBeInserted)))

	// Mmap! Let the OS do the work.
	sl, err := syscall.Mmap(int(f.Fd()), 0, int(fStat.Size())+len(toBeInserted),
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	exitOnError(err)

	// This feels like cheating :-P
	copy(sl, append([]byte(toBeInserted), sl...))
	f.Sync()
	syscall.Munmap(sl)
	fmt.Println("Time to prepend an int to the file:", time.Now().Sub(start))
}
Next read these:
Feb 24, 2016