In this tutorial, you will learn how to work with Python's os module.
Table of Contents
Introduction
Some Basic Functions
List Folders and Files
Change Working Directory
Create Single and Nested Directory Structure
Remove Single and Multiple Directories Recursively
Example with Data Processing
Conclusion
Introduction
Python is one of the most widely used languages for tasks such as data processing, data analysis, and website building. Many of these tasks depend on the operating system. Python exposes this OS-dependent functionality through the os module, which abstracts the platform and provides functions to navigate, create, delete, and modify files and folders. In this tutorial, you will learn how to import the module, use its basic functions, and apply it in a sample Python project that merges data.
Some Basic Functions
Let's explore the module with some example code.
Import the library:
import os
Let's get the list of methods that we can use with this module.
print(dir(os))
Output:
['DirEntry', 'F_OK', 'MutableMapping', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_NOINHERIT', 'O_RANDOM', 'O_RDONLY', 'O_RDWR', 'O_SEQUENTIAL', 'O_SHORT_LIVED', 'O_TEMPORARY', 'O_TEXT', 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', 'P_NOWAITO', 'P_OVERLAY', 'P_WAIT', 'PathLike', 'R_OK', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'TMP_MAX', 'W_OK', 'X_OK', '_Environ', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_execvpe', '_exists', '_exit', '_fspath', '_get_exports_list', '_putenv', '_unsetenv', '_wrap_close', 'abc', 'abort', 'access', 'altsep', 'chdir', 'chmod', 'close', 'closerange', 'cpu_count', 'curdir', 'defpath', 'device_encoding', 'devnull', 'dup', 'dup2', 'environ', 'errno', 'error', 'execl', 'execle', 'execlp', 'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'extsep', 'fdopen', 'fsdecode', 'fsencode', 'fspath', 'fstat', 'fsync', 'ftruncate', 'get_exec_path', 'get_handle_inheritable', 'get_inheritable', 'get_terminal_size', 'getcwd', 'getcwdb', 'getenv', 'getlogin', 'getpid', 'getppid', 'isatty', 'kill', 'linesep', 'link', 'listdir', 'lseek', 'lstat', 'makedirs', 'mkdir', 'name', 'open', 'pardir', 'path', 'pathsep', 'pipe', 'popen', 'putenv', 'read', 'readlink', 'remove', 'removedirs', 'rename', 'renames', 'replace', 'rmdir', 'scandir', 'sep', 'set_handle_inheritable', 'set_inheritable', 'spawnl', 'spawnle', 'spawnv', 'spawnve', 'st', 'startfile', 'stat', 'stat_float_times', 'stat_result', 'statvfs_result', 'strerror', 'supports_bytes_environ', 'supports_dir_fd', 'supports_effective_ids', 'supports_fd', 'supports_follow_symlinks', 'symlink', 'sys', 'system', 'terminal_size', 'times', 'times_result', 'truncate', 'umask', 'uname_result', 'unlink', 'urandom', 'utime', 'waitpid', 'walk', 'write']
Now, using the getcwd method, we can retrieve the path of the current working directory.
print(os.getcwd())
Output:
C:\Users\hpandya\OneDrive\work\StackAbuse\os_python\os_python\Project
List Folders and Files
Let's list the folders/files in the current directory using listdir:
print(os.listdir())
Output:
['Data', 'Population_Data', 'README.md', 'tutorial.py', 'tutorial_v2.py']
As you can see, I have two folders: Data and Population_Data. I also have three files: the README.md markdown file and two Python files, tutorial.py and tutorial_v2.py.
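Note that os.listdir() returns bare names and does not say which entries are folders. As a minimal sketch, the hypothetical helper split_entries below separates the two with os.path.isdir; the demo directory and its entry names are made up for illustration.

```python
import os
import tempfile


def split_entries(path="."):
    """Return (folders, files) for the entries directly inside `path`."""
    folders, files = [], []
    for name in sorted(os.listdir(path)):
        # os.listdir returns bare names, so join them back onto the parent path
        full_path = os.path.join(path, name)
        if os.path.isdir(full_path):
            folders.append(name)
        else:
            files.append(name)
    return folders, files


# Demo in a throwaway directory (entry names are illustrative only)
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "Data"))
    open(os.path.join(tmp, "README.md"), "w").close()
    demo_folders, demo_files = split_entries(tmp)

print(demo_folders, demo_files)
```

Joining each name back onto the parent path matters because os.path.isdir() resolves relative names against the current working directory, not against the folder that was listed.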
In order to get the entire tree structure of my project folder, let's write a function and then use os.walk() to iterate over all the files in each folder of the current directory.
# function to list files in each folder of the current working directory
def list_files(startpath):
    for root, dirs, files in os.walk(startpath):
        # skip the .git folder by pruning it from the walk in place
        dirs[:] = [d for d in dirs if d != '.git']
        level = root.replace(startpath, '').count(os.sep)
        indent = ' ' * 4 * level
        print('{}{}/'.format(indent, os.path.basename(root)))
        subindent = ' ' * 4 * (level + 1)
        for f in files:
            print('{}{}'.format(subindent, f))
Call this function using the current working directory path, which we saw how to do earlier:
startpath = os.getcwd()
list_files(startpath)
Output:
Project/
    README.md
    tutorial.py
    tutorial_v2.py
    Data/
        uscitiesv1.4.csv
    Population_Data/
        Alabama/
            Alabama_population.csv
        Alaska/
            Alaska_population.csv
        Arizona/
            Arizona_population.csv
        Arkansas/
            Arkansas_population.csv
        California/
            California_population.csv
        Colorado/
            Colorado_population.csv
        Connecticut/
            Connecticut_population.csv
        Delaware/
            Delaware_population.csv
        ...
Note: The output has been truncated for brevity.
As seen from the output, folder names end with a / and the files within each folder are indented four spaces to the right. The Data folder has one CSV file named uscitiesv1.4.csv, which holds population data for each city in the United States. The Population_Data folder has one folder per state, each containing a separate CSV file with that state's population data, extracted from uscitiesv1.4.csv.
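On each iteration, os.walk() yields a (root, dirs, files) tuple. In its default top-down mode, editing the dirs list in place tells the walk which subfolders to skip. The sketch below illustrates this with a hypothetical helper, walk_names, and a throwaway directory whose folder names are made up for the demo.

```python
import os
import tempfile


def walk_names(startpath, skip=(".git",)):
    """Return the directories visited by os.walk, relative to `startpath`."""
    visited = []
    for root, dirs, files in os.walk(startpath):
        # Editing dirs in place (top-down walk) stops os.walk from descending
        # into the removed folders; sorting makes the visit order predictable.
        dirs[:] = sorted(d for d in dirs if d not in skip)
        visited.append(os.path.relpath(root, startpath))
    return visited


# Demo tree: one folder to skip and one to keep (names are illustrative)
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, ".git", "objects"))
    os.makedirs(os.path.join(tmp, "Data"))
    visited_dirs = walk_names(tmp)

print(visited_dirs)
```

The slice assignment dirs[:] = ... is the important detail: rebinding the name with dirs = ... would create a new list that os.walk() never sees.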
Change Working Directory
Let's change the working directory to the folder that holds the data for the state of New York.
os.chdir('Population_Data/New York')
Now let's run the list_files function again, but in this directory.
list_files(os.getcwd())
Output:
NewYork/
    NewYork_population.csv
As you can see, we have entered the New York folder under the Population_Data folder.
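Because os.chdir() changes the working directory for the whole process, every relative path used afterwards resolves against the new location. One way to limit that effect, sketched here with a hypothetical helper named run_in_directory, is to remember the old directory and switch back when the work is done.

```python
import os
import tempfile


def run_in_directory(path, func):
    """Call func() with `path` as the working directory, then switch back."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        return func()
    finally:
        # Restore the original directory even if func() raises
        os.chdir(previous)


with tempfile.TemporaryDirectory() as tmp:
    before = os.getcwd()
    inside = run_in_directory(tmp, os.getcwd)
    after = os.getcwd()
    # samefile compares the underlying directories, ignoring symlinked prefixes
    moved_into_tmp = os.path.samefile(inside, tmp)

print(moved_into_tmp, before == after)
```

The try/finally block is a design choice for safety: without it, an exception inside func() would leave the script stranded in the new directory.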
Create Single and Nested Directory Structure
Now, let's create a new directory called testdir in this directory.
os.mkdir('testdir')
list_files(os.getcwd())
Output:
NewYork/
    NewYork_population.csv
    testdir/
As you can see, it creates the new directory in the current working directory.
Let's create a nested directory with 2 levels.
os.mkdir('level1dir/level2dir')
Output:
Traceback (most recent call last):
  File "<ipython-input-12-ac5055572301>", line 1, in <module>
    os.mkdir('level1dir/level2dir')
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'level1dir/level2dir'
We get an error. To be specific, we get a FileNotFoundError. You might wonder why we get a FileNotFoundError when we are trying to create a directory.
The reason: os.mkdir() expects the parent directory level1dir to already exist so that it can create level2dir inside it. Since level1dir does not exist, it raises a FileNotFoundError.
For cases like this, use the makedirs() function instead, which creates any missing intermediate directories along the path.
os.makedirs('level1dir/level2dir')
Check the current directory tree:
list_files(os.getcwd())
Output:
NewYork/
    NewYork_population.csv
    level1dir/
        level2dir/
    testdir/
As we can see, we now have two subdirectories under the New York folder: testdir and level1dir. level1dir has a directory underneath it called level2dir.
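One detail worth knowing: calling os.makedirs() on a path that already exists raises a FileExistsError unless you pass exist_ok=True. The following minimal sketch shows both behaviors in a temporary directory, so it does not touch the project folders used in this tutorial.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "level1dir", "level2dir")

    os.makedirs(nested)                 # creates level1dir, then level2dir
    os.makedirs(nested, exist_ok=True)  # path already exists: silently accepted

    try:
        os.makedirs(nested)             # without exist_ok, this raises
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True

    created = os.path.isdir(nested)

print(created, second_call_failed)
```

Passing exist_ok=True is handy in scripts that may run more than once, since the directory setup step then works on every run.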
Remove Single and Multiple Directories Recursively
The os module also has methods to modify or remove directories, which I'll show here.
Now, let's remove the directories we just created using rmdir:
os.rmdir('testdir')
Check the current directory tree to verify that the directory no longer exists:
list_files(os.getcwd())
Output:
NewYork/
    NewYork_population.csv
    level1dir/
        level2dir/
As you can see, testdir has been deleted.
Let's try and delete the nested directory structure of level1dir and level2dir.
os.rmdir('level1dir')
Output:
OSError Traceback (most recent call last)
in <module>()
----> 1 os.rmdir('level1dir')
OSError: [WinError 145] The directory is not empty: 'level1dir'
As seen, this raises an OSError, and rightly so. It says the level1dir directory is not empty, which is correct because it has level2dir underneath it.
The rmdir method cannot remove a non-empty directory, just like the Unix rmdir command.
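In a script, you may prefer to handle this error rather than let it stop the program. Here is a small sketch, using a hypothetical helper called try_rmdir, that reports whether the removal succeeded. It also shows that removing the child first makes the parent removable.

```python
import os
import tempfile


def try_rmdir(path):
    """Remove `path` if possible; return True on success, False otherwise."""
    try:
        os.rmdir(path)
        return True
    except OSError:
        # Raised when the directory still has entries (among other reasons)
        return False


with tempfile.TemporaryDirectory() as tmp:
    parent = os.path.join(tmp, "level1dir")
    child = os.path.join(parent, "level2dir")
    os.makedirs(child)

    removed_parent_first = try_rmdir(parent)   # parent still contains child
    removed_child = try_rmdir(child)           # child is empty
    removed_parent_second = try_rmdir(parent)  # parent is empty now

print(removed_parent_first, removed_child, removed_parent_second)
```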
As the counterpart to makedirs(), let's try removedirs(), which removes the leaf directory and then each parent directory in the path for as long as those parents are empty.
os.removedirs('level1dir/level2dir')
Let's see the directory tree structure again:
list_files(os.getcwd())
Output:
NewYork/
    NewYork_population.csv
This brings us to the previous state of the directory.
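Keep in mind that os.removedirs() only removes empty directories. If the tree contains files, it raises an OSError just as rmdir does. For that case, the standard library's shutil module (not part of os) offers shutil.rmtree(). The sketch below contrasts the two in a temporary directory; the file names keep.txt and data.csv are made up for the demo.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A file next to level1dir keeps removedirs from pruning `tmp` itself
    open(os.path.join(tmp, "keep.txt"), "w").close()
    top = os.path.join(tmp, "level1dir")
    leaf = os.path.join(top, "level2dir")

    # Case 1: an empty chain is removed from the leaf upwards
    os.makedirs(leaf)
    os.removedirs(leaf)
    empty_chain_gone = not os.path.exists(top)

    # Case 2: a file inside the leaf makes removedirs fail
    os.makedirs(leaf)
    open(os.path.join(leaf, "data.csv"), "w").close()
    try:
        os.removedirs(leaf)
        removedirs_failed = False
    except OSError:
        removedirs_failed = True

    # shutil.rmtree deletes the directory and everything inside it
    shutil.rmtree(top)
    tree_gone = not os.path.exists(top)

print(empty_chain_gone, removedirs_failed, tree_gone)
```

Since shutil.rmtree() deletes files as well as folders, it is best used only on paths you are sure about.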
Example with Data Processing
So far we have explored how to view, create, and remove a nested directory structure. Now let's see an example of how the os module helps in data processing.
For that, let's go one level up in the directory structure.
os.chdir('../')
With that, let's again view the directory tree structure.
list_files(os.getcwd())
Output:
Population_Data/
    Alabama/
        Alabama_population.csv
    Alaska/
        Alaska_population.csv
    Arizona/
        Arizona_population.csv
    Arkansas/
        Arkansas_population.csv
    California/
        California_population.csv
    Colorado/
        Colorado_population.csv
    Connecticut/
        Connecticut_population.csv
    Delaware/
        Delaware_population.csv
    ...
Note: The output has been truncated for brevity.
Let's merge the data from all of the states by iterating over each state's directory and combining the CSV files.
import os
import pandas as pd

# create a list to hold the data from each state
list_states = []

# iteratively loop over all the folders and add their data to the list
for root, dirs, files in os.walk(os.getcwd()):
    if files:
        list_states.append(pd.read_csv(os.path.join(root, files[0]), index_col=None))

# merge the dataframes into a single dataframe using Pandas library
merge_data = pd.concat(list_states, sort=False)
Thanks in part to the os module we were able to create merge_data, which is a dataframe containing population data from every state.
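If pandas is not available, the same walk-and-collect idea can be sketched with only the standard library's csv module. This is an illustrative variant, not the approach used above: the helper merge_csv_rows, the column names, and the population figures are all made up for the demo.

```python
import csv
import os
import tempfile


def merge_csv_rows(startpath):
    """Read every .csv file under `startpath` and return all rows as dicts."""
    rows = []
    for root, dirs, files in os.walk(startpath):
        dirs.sort()  # visit the state folders in a predictable order
        for name in sorted(files):
            if name.endswith(".csv"):
                with open(os.path.join(root, name), newline="") as handle:
                    rows.extend(csv.DictReader(handle))
    return rows


# Build a tiny stand-in for Population_Data with invented numbers
with tempfile.TemporaryDirectory() as tmp:
    for state, population in [("Alabama", "100"), ("Alaska", "200")]:
        folder = os.path.join(tmp, state)
        os.mkdir(folder)
        path = os.path.join(folder, state + "_population.csv")
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["state", "population"])
            writer.writerow([state, population])
    merged_rows = merge_csv_rows(tmp)

print(merged_rows)
```

Unlike the pandas version, this sketch keeps every value as a string, so numeric columns would need converting before any analysis.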
Conclusion
In this article, we briefly explored different capabilities of Python's built-in os module. We also saw a brief example of how the module can be used in data science and analytics. The os module has a lot more to offer, and much more complex logic can be built on it depending on the developer's needs.