Inside Python: understanding os.listdir()
Posted on 08 May 2011 in Articles • 2 min read
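Why does os.listdir() return directory entries in no particular order? To answer this question, one has to look inside Python's source. The examples below refer to the source of the stable Python 3.2 release. The os.py module in the source archive's Lib directory doesn't define the listdir() function itself. Instead, on POSIX systems it pulls the function in with a star import: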
    from posix import *
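The star import can be checked from Python itself. The following is a minimal sketch, assuming CPython, where a built-in module named after os.name ('posix' on Unix-like systems, 'nt' on Windows) provides the low-level functions. The variable names backend and same_function are only illustrative.

```python
import importlib
import os

# On CPython, os.name is 'posix' or 'nt', and a built-in module with the
# same name holds the C implementations that os.py re-exports.
backend = importlib.import_module(os.name)

# If os.py simply re-exports the name, both attributes are the same object.
same_function = os.listdir is backend.listdir
print(os.name, same_function)
```

If same_function is true, os.listdir() is not a Python-level wrapper at all, which is why the search has to continue in the C sources.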
Let's take a look at Modules/posixmodule.c:
Note
Pay attention to the comments!
    static PyObject *
    posix_listdir(PyObject *self, PyObject *args)    /* line 2323 */
    {
        /* POSIX-related code, supposed to start from line 2574 */
        /* ... */
        dirp = opendir(name);    /* Opening directory for which os.listdir() was called */
        /* ... */
The opendir() function opens a directory stream corresponding to the directory name, and returns a pointer to the directory stream. The stream is positioned at the first entry in the directory.
—Linux opendir() man page
    /* continuing posix_listdir() */
    /* ... */
    for (;;) {
        ep = readdir(dirp);    /* A crucial readdir() call */
        /* ... */
        /* ... */
        if (ep->d_name[0] == '.' &&    /* skipping '.' and '..' */
            (NAMLEN(ep) == 1 ||
             (ep->d_name[1] == '.' && NAMLEN(ep) == 2)))
            continue;
        if (arg_is_unicode)
            v = PyUnicode_DecodeFSDefaultAndSize(ep->d_name, NAMLEN(ep));
        else
            v = PyBytes_FromStringAndSize(ep->d_name, NAMLEN(ep));
        if (v == NULL) {
            Py_CLEAR(d);
            break;
        }
        if (PyList_Append(d, v) != 0) {    /* appending found path to the return list */
            Py_DECREF(v);
            Py_CLEAR(d);
            break;
        }
        /* ... */
    }
The readdir() function returns a pointer to a dirent structure representing the next directory entry in the directory stream pointed to by dirp. It returns NULL on reaching the end of the directory stream or if an error occurred.
—Linux readdir() man page
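Two details of the C loop are visible from the Python side: the '.' and '..' entries never reach the result list, and the arg_is_unicode branch decides whether names come back as str or bytes. Here is a small sketch of how one might observe both. The helper listing_types() and the file name example.txt are made up for illustration.

```python
import os
import tempfile

def listing_types(directory):
    """Return (str_names, bytes_names) for the same directory."""
    str_names = os.listdir(directory)                 # str path in, str names out
    bytes_names = os.listdir(os.fsencode(directory))  # bytes path in, bytes names out
    return str_names, bytes_names

# Use a throwaway directory holding exactly one file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    str_names, bytes_names = listing_types(tmp)

print(str_names)
print(bytes_names)
```

Neither list should contain '.' or '..', even though readdir() reports both entries for the directory.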
In Linux, the dirent structure is defined as follows:
    struct dirent {
        ino_t          d_ino;       /* inode number */
        off_t          d_off;       /* offset to the next dirent */
        unsigned short d_reclen;    /* length of this record */
        unsigned char  d_type;      /* type of file; */
        char           d_name[256]; /* filename */
    };
As you can see, each readdir() call simply returns the next dirent structure from the directory stream, and there is no guarantee that the entries come back sorted in any way. The order depends on the underlying filesystem.
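In practice this means that code should treat the result of os.listdir() as an unordered collection. The sketch below, with made-up file names, compares a listing against the expected names as a set and never relies on the order of the returned list.

```python
import os
import tempfile

# Hypothetical file names, deliberately created out of alphabetical order.
names = ["b.txt", "a.txt", "c.txt"]

with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        open(os.path.join(tmp, name), "w").close()
    raw = os.listdir(tmp)  # order is whatever the filesystem hands back

# Only the set of names is reliable; the sequence is not.
same_entries = set(raw) == set(names)
print(raw, same_entries)
```

The printed list may differ between filesystems or even between runs, while same_entries stays true.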
So, what can one do when a sorted os.listdir() result is required? It's pretty simple:
    import os

    path = '.'  # directory to list

    lst = sorted(os.listdir(path))

    # sorted files only
    files = sorted(f for f in os.listdir(path)
                   if os.path.isfile(os.path.join(path, f)))

    # sorted dirs only
    dirs = sorted(d for d in os.listdir(path)
                  if os.path.isdir(os.path.join(path, d)))
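When both files and directories are needed, the directory can be read once and split afterwards. This is only a sketch: the function name sorted_entries() and the sample names are invented for the example.

```python
import os
import tempfile

def sorted_entries(path):
    """Return (files, dirs), each sorted by name, from a single os.listdir() call."""
    files, dirs = [], []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            dirs.append(name)
        elif os.path.isfile(full):
            files.append(name)
    return files, dirs

# Try it on a temporary directory with two files and two subdirectories.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "zeta"))
    os.mkdir(os.path.join(tmp, "alpha"))
    for name in ("b.txt", "a.txt"):
        open(os.path.join(tmp, name), "w").close()
    files, dirs = sorted_entries(tmp)

print(files, dirs)
```

Because the list is sorted before it is split, both results come out ordered. Note that sorted() compares strings by code point, so uppercase names sort before lowercase ones unless a key function is supplied.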
Another Python mystery revealed!