Working with filesystems is one of the most routine tasks in programming. Surprisingly, many of us still get it wrong because we tend to represent file paths as plain strings. This is a common anti-pattern that you have probably already seen in many Python repositories.
In today’s article we will discuss why it’s a bad idea to use strings (or even the os module) for representing paths on filesystems. Furthermore, we will discuss best practices and see in action how to use the pathlib module to properly code file paths in Python. Let’s get started!
Why using strings to represent paths is a bad idea
Different operating systems use different naming conventions when it comes to representing paths on their file systems. For example, Unix uses a forward slash / as the directory separator, while Windows uses a backslash \:
# Unix (e.g. Linux, OSX, etc.)
/home/this/is/a/path/to/a/directory

# Windows
C:\home\this\is\a\path\to\a\directory
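To make the difference concrete, here is a minimal sketch that parses both example paths with the pure path classes from pathlib. These classes only manipulate the text of a path and never touch the filesystem, so the snippet should run the same way on any operating system. The variable names are illustrative only.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same directory hierarchy, written in each system's convention
posix_path = PurePosixPath('/home/this/is/a/path/to/a/directory')
windows_path = PureWindowsPath(r'C:\home\this\is\a\path\to\a\directory')

# Splitting into components hides the separator entirely
print(posix_path.parts)
print(windows_path.parts)

# Only the root differs: '/' on Unix, a drive plus root on Windows
print(windows_path.drive)  # C:
```

Apart from the first element, both tuples of components are identical, which is exactly the information a plain string does not expose.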
Code portability refers to a set of principles that enable source code to run in multiple environments with the same behaviour. Representing paths as plain strings works against this, because a hard-coded separator is only valid on some systems, unless we handle paths differently depending on the operating system the code runs on.
But even in that case, we would make our code messy and unnecessarily complex.
# This is a bad practice
import platform

if platform.system() == 'Windows':
    # Raw string, so that \t and \a are not read as escape sequences
    filepath = r'C:\home\this\is\a\path\to\a\directory'
else:  # e.g. 'Darwin' for OSX or 'Linux'
    filepath = '/home/this/is/a/path/to/a/directory'
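For contrast, here is a rough sketch of how the same path could be built from its components without any platform check. The `parts` tuple and the `C:/` drive are assumptions made for illustration; `Path` picks the flavour of the system the code runs on, while the pure classes are used only to show how the result would be rendered on each system.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Hypothetical components, mirroring the example path above
parts = ('home', 'this', 'is', 'a', 'path', 'to', 'a', 'directory')

# Path selects the right flavour for the current OS, so no branching is needed
filepath = Path(*parts)

# Pure flavours render the same components for a specific OS
# without touching the filesystem
posix_view = PurePosixPath('/', *parts)
windows_view = PureWindowsPath('C:/', *parts)

print(posix_view)    # /home/this/is/a/path/to/a/directory
print(windows_view)  # C:\home\this\is\a\path\to\a\directory
```

The separator never appears in the code that builds the path, so there is nothing to special-case per operating system.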
Further operations on strings representing paths also become more complex. Suppose you would like to concatenate two paths: plain string concatenation might result in a malformed path, especially if one or more of the strings start or end with a forward slash or a backslash.
path_1 = '/this/is/a/path/'
path_2 = '/another/path'

# filepath = '/this/is/a/path//another/path'
filepath = path_1 + path_2
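As a minimal sketch of the alternative, the `/` operator on pathlib objects joins segments and takes care of the separators. `PurePosixPath` is used here only so that the output is the same on every system, and `path_2` is assumed to be a relative segment (no leading slash), which differs from the string example above.

```python
from pathlib import PurePosixPath

path_1 = PurePosixPath('/this/is/a/path/')  # trailing slash is normalised away
path_2 = 'another/path'                     # relative segment (assumption)

filepath = path_1 / path_2
print(filepath)  # /this/is/a/path/another/path

# Caveat: an absolute right-hand operand replaces the left-hand side
print(path_1 / '/another/path')  # /another/path
```

The second call is worth keeping in mind: joining with an absolute path does not append it, so leading slashes on the right-hand side still matter.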