[Author: Ben] List a directory containing 8 million files [not include * in subdirectory]






 Source:

  • https://stackoverflow.com/questions/466521/how-many-files-can-i-put-in-a-directory
  • http://be-n.com/spw/you-can-list-a-million-files-in-a-directory-but-not-with-ls.html
  • https://man7.org/linux/man-pages/man2/getdents.2.html


Rewrite:

You can list a directory containing 8 million files! But not with ls.

I needed to list all files in a directory, but lsfind, and os.listdir all hung. This is my story.

NOTE: there is no good reason that you should ever have 8 million files in the same directory, but if you do, this is your solution :-).

TLDR: Write a C program that calls the syscall getdents directly, with a large buffer size, ignore entries with inode == 0.

Why doesn't ls work?

ls and practically every other method of listing a directory (including python os.listdir, find .) rely on libc readdir(). However readdir() only reads 32K of directory entries at a time, which means that if you have a lot of files in the same directory (i.e. 500M of directory entries) it is going to take an insanely long time to read all the directory entries, especially on a slow disk. For directories containing a large number of files, you will need to dig deeper than tools that rely on readdir(). You will need to use the getdents() syscall directly, rather than helper methods from libc.

How to quickly list a directory with 8 million files

The trick is to understand getdents(), the low level system call that reads directory entries from disk, and returns a directory entry (dirent) data structure.

GETDENTS (filehandle for directory entries, *directory entry pointer, number of bytes to read)

Luckily the man page has a lot of detail, and provides C source code to list all the files in a directory. Read it here and copy the source code to a file, like listdir.c. There are two modifications you will need to do in order quickly list all the files in a directory. First, increase the buffer size from X to something like 5 megabytes.

#define BUF_SIZE 1024*1024*5

Then modify the main loop where it prints out the information about each file in the directory to skip entries with inode == 0. I did this by adding

if (dp->d_ino != 0) printf(...);

In my case I also really only cared about the file names in the directory so I also rewrote the printf() statement to only print the filename.

if(d->d_ino) printf("%s\n ", (char *) d->d_name);

Compile it (it doesn't need any external libraries, so it's super simple to do)

gcc listdir.c -o listdir

Now just run

./listdir [directory with insane number of files]

Presto, you should see all the files in your insanely large directory :-). In my case I did

./listdir [directory with insane number of files] > output.txt

and then used the contents of output.txt to process the files that were previously "unlistable".

Why did we run into this problem in the first place?

Without going into too many details, we needed a simple datastore where we could find an entry based on a key, append values to each key, and expire keys that were older than a certain threshold, while performing a cleanup operation on the stored values. The filesystem actually works really well for this cases as long as there are not that many keys active at the same time. (i.e. on the file system listing all keys is the slowest operation, so if there are not many keys there are nothing to worry about). Something got screwed up, and caused the expiration action to fail. So instead of keeping a small number of files in the directory, we started generating a lot of entries that were not being cleaned up. The next time the cleanup operation ran, it could no longer list all the keys in a reasonable amount of time (reading 32K of directory entries at a time). This compounded the problem and lead to a state where new files were being created, but old files were not cleaned up. Soon we had 8 million files in a single directory. We were able to fix the root cause fairly quickly by resolving a bug in our "sweeper" and creating a new cache directory, but we still have 8 million files that needed to be processed. Keep in mind the timescale here is a matter of hours. (Luckily this problem happened on a weekend, was caught fast, and had very little effect on our customers). One important thing to keep in mind, is at this point we had no idea how many files needed to to be processed, all we knew was that ls and os.listdir() were both hanging when trying to list all the files in a directory. I asked in IRC, and tried searching google without much luck. Other people had run into this problem before, but no one had a good solution. In fact I recall running into a similar problem listing a mailqueue in Qmail many years ago, but I have no idea how we solved it. (If you like solving this sort of problem, we are hiring: jobs@olark.com)

Diagnosing the unknown

Whenever I see something hang without debugging output I turn to strace. strace lets you watch the system calls made during program execution and let's you see what is going on when no output is being printed. I tried strace find . strace ls and strace python import os os.listdir(".") All of these functions produced basically the same strace output: First they open the file containing information about a directory

open(".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5

Then they make the getdents syscall which returns directory entries

getdents(5, /* 586 entries */, 32768) = 32752

It took me a while to figure this out, but basically the getdents call http://www.kernel.org/doc/man- pages/online/pages/man2/getdents.2.html looks like this:

int getdents(unsigned int fd, struct linux_dirent *dirp, unsigned int count);

where count is really size of buffer. (in this case 32K) Of course I didn't notice this until someone in IRC mention that you could do

ls -dl

to see the size of the file storing directory entries. In my directory I got:

drwxr-xr-x 2 root root 537919488 Jul 29 04:55 . (513M)

Putting two and two together I could see that the reason it was taking forever to list the directory was because ls was reading the directory entries file 32K at a time, and the file was 513M.  So it would take around 16416 system calls of getdents() to list the directory.  That is a lot of calls, especially on a slow virtualized disk. This lead me to the solution above, increase the read buffer size for getdents() to decrease the number of system calls and speed up all the disk access :-).

Takeaways:

  1. It is possible to list a directory with 8 million files in it.
  2. strace is your friend
  3. Don't be afraid to compile code and modify it (hell, simple C compiles so fast it could be interpreted)
  4. There is no good reason to have 8 million files in a directory :-), but this was a good learning experience (and possibly a good interview question).

Appendix:

After all this I was still a little curious about where the 32K buffer size in ls came from so I downloaded to source to coreutils (contains ls) and poked around. (NOTE: this is mostly just some notes I took, and not a detailed analysis) Search through the code, and you will find that ls called readdir() with a directory pointer.

while (1) {
/* Set errno to zero so we can distinguish between a readdir failure and when readdir simply finds that there are no more entries. */
errno = 0;
next = readdir (dirp);
if (next)

readdir() is defined in libc. So I continued my search and downloaded the source to libc to figure out how the buffer size for readdir() was determined. getdents() is called inside of readdir() (as expected)

bytes = __GETDENTS (dirp->fd, dirp->data, maxread);
if (bytes <= 0)

And the byte size comes from the size of the directory entry struct, or the dirp->allocation. Given that we were reading multiple entries, I am pretty sure the maxread variable was being set from dirp->allocation.

/* Fixed-size struct; must read one at a time (see below). */
maxread = sizeof *dp;
#else maxread = dirp->allocation;
#endif

If you poke around in

sysdeps/unix/dirstream.h sysdeps/unix/opendir.c

You can see where this value gets set. However, it certainly doesn't appear to changed based on the size of the directory entry file. (Perhaps the buffer should be dynamically set based on the size of the directory entry file) Finally, to learn about the trick for skipping deleted files, I noticed a line:

 if (dp->d_ino == 0) ..

inside of ls, which is used to filter out deleted files from appearing when a directory is listed.


Code:

#define _GNU_SOURCE
#include <dirent.h> /* Defines DT_* constants */
#include <err.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

struct linux_dirent
{
    unsigned long d_ino;
    off_t d_off;
    unsigned short d_reclen;
    char d_name[];
};

#define BUF_SIZE 1024
// #define BUF_SIZE 1024*1024*5

int main(int argc, char *argv[])
{
    int fd;
    char d_type;
    char buf[BUF_SIZE];
    long nread;
    struct linux_dirent *d;

    fd = open(argc > 1 ? argv[1] : ".", O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        err(EXIT_FAILURE, "open");

    for (;;)
    {
        nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
        if (nread == -1)
            err(EXIT_FAILURE, "getdents");

        if (nread == 0)
            break;

        // printf("--------------- nread=%ld ---------------\n", nread);
        // printf("inode#    file type  d_reclen  d_off   d_name\n");
        for (size_t bpos = 0; bpos < nread;)
        {
            d = (struct linux_dirent *)(buf + bpos);

            // custom
            if(d->d_ino) printf("%s\n", (char *) d->d_name);

            // printf("%8lu  ", d->d_ino);
            // d_type = *(buf + bpos + d->d_reclen - 1);
            // printf("%-10s ", (d_type == DT_REG) ? "regular" : (d_type == DT_DIR) ? "directory"
            //                                               : (d_type == DT_FIFO)  ? "FIFO"
            //                                               : (d_type == DT_SOCK)  ? "socket"
            //                                               : (d_type == DT_LNK)   ? "symlink"
            //                                               : (d_type == DT_BLK)   ? "block dev"
            //                                               : (d_type == DT_CHR)   ? "char dev"
            //                                                                      : "???");
            // printf("%4d %10jd  %s\n", d->d_reclen, (intmax_t)d->d_off, d->d_name);

            bpos += d->d_reclen;
        }
    }

    exit(EXIT_SUCCESS);
}

Comments