Up to this point, a few things have happened:
- Google authorisation is set up,
- We created access to the drive (for read/write access), and
- The PyDrive package is available to navigate on the drive.
Hopefully, if you are following along and running the code, you will see the image on the right after refreshing the panel. The Shortcut appears in the image as a folder under “__Shared”. The “Shared with me” section is not shown, but because we have the Shortcut, we don’t need to see the “Shared with me” files.
Google Drive works differently from file management in a local operating system. The physical location of a file is not important, because objects are managed by ID in an unstructured data lake, and we access files and folders by that ID.
Unfortunately, while Python’s os module has a walk function (os.walk) to traverse the local file system, a similar method doesn’t exist for Google Drive (or I am not aware of one). However, we can use the pydrive library and walk manually through the folders in the directory tree. Luckily, we know where we want to go from the path of the folder, so we don’t need to walk through the whole structure; we can use the folder names in the data path to go deeper into the folder tree.
So, we loop over the small list (in this example, three items) to find the ID and use this ID to go to the next level. Note that the fourth level is commented out; we will get to this level in the second part of the file handling section of this notebook.
# File handling testing:
# There are in this example three folder levels:
# /content/gdrive/MyDrive/__Shared/<your Project>/DataDevelopment
# Update these to your structure:
folderList1 = ["__Shared", your_Project, "DataDevelopment"]  #, "ExternalData"]
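Since the folder names come straight from the data path, the list can also be derived from the path instead of typed by hand. The following is a minimal sketch, assuming the drive is mounted at /content/gdrive/MyDrive as in the comment above; the helper name folder_list_from_path and the project name "MyProject" are made up for illustration.

```python
from pathlib import PurePosixPath

# Assumed mount point of "My Drive" in the notebook environment.
MOUNT_ROOT = PurePosixPath("/content/gdrive/MyDrive")


def folder_list_from_path(path, mount_root=MOUNT_ROOT):
    """Return the folder names below the mount root, in walking order.

    Raises ValueError if the path is not located under mount_root.
    """
    return list(PurePosixPath(path).relative_to(mount_root).parts)


# "MyProject" is a placeholder for your own project folder name.
print(folder_list_from_path("/content/gdrive/MyDrive/__Shared/MyProject/DataDevelopment"))
# → ['__Shared', 'MyProject', 'DataDevelopment']
```

The result has the same shape as folderList1, so it can be fed to the same loop.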
The loop in the code block below starts at the root. When it finds an item from the list, it uses the ID of that object to move to the next level on the list. If an item is not found, the code reports that the folder is not found and does not look for any folder deeper in the structure. The loop ends with either the ID of the Shortcut folder or a message that the folder was not found.
# Trying to copy the created dummy file:
boo_foundFolder = False
fileID = "root"
level = 0
# View all folders and files in your Google Drive
# First loop over the list:
print("File and Folder structure - check with IDs")
for folderName in folderList1:
    print(f"Checking: {folderName}")
    if boo_foundFolder or fileID == "root":  # first run
        boo_foundFolder = False
        fileList = drive.ListFile({'q': f"'{fileID}' in parents and trashed=false"}).GetList()
        for file in fileList:
            # Testing the name:
            if file['title'] == folderName:
                fileID = file['id']
                boo_foundFolder = True
                level += 1
            # end if
        # end for
        if not boo_foundFolder:
            print("folder not found")
            break
        # end if
    # end if
# end for
print(f"Did we find the folder: {boo_foundFolder}")
if boo_foundFolder:
    print(fileID)
    ShortCutID = fileID
else:
    ShortCutID = 0
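The same walk can be packaged as a small reusable function. This is a sketch rather than the notebook's own code: resolve_folder_id and list_children are names introduced here, and the in-memory FAKE_DRIVE with its IDs is a made-up stand-in so the example runs without Google credentials.

```python
from typing import Callable, Iterable, Optional


def resolve_folder_id(
    folder_names: Iterable[str],
    list_children: Callable[[str], list],
    start_id: str = "root",
) -> Optional[str]:
    """Follow folder_names one level at a time; return the last folder's ID or None."""
    current_id = start_id
    for name in folder_names:
        # Take the first child whose title matches the wanted folder name.
        match = next(
            (child for child in list_children(current_id) if child["title"] == name),
            None,
        )
        if match is None:
            return None  # stop here; there is nothing deeper to search
        current_id = match["id"]
    return current_id


# Hypothetical folder tree keyed by parent ID (stand-in for Google Drive).
FAKE_DRIVE = {
    "root": [{"title": "__Shared", "id": "id-shared"}],
    "id-shared": [{"title": "MyProject", "id": "id-project"}],
    "id-project": [{"title": "DataDevelopment", "id": "id-datadev"}],
}


def fake_list_children(parent_id: str) -> list:
    """Return the children of parent_id from the fake tree."""
    return FAKE_DRIVE.get(parent_id, [])


print(resolve_folder_id(["__Shared", "MyProject", "DataDevelopment"], fake_list_children))
# → id-datadev
print(resolve_folder_id(["__Shared", "Missing"], fake_list_children))
# → None
```

Against the real drive, list_children could wrap the query already used in the loop, for example lambda pid: drive.ListFile({'q': f"'{pid}' in parents and trashed=false"}).GetList(). Returning None instead of 0 for a missing folder is a design choice of this sketch, not of the original code.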
At this moment, we have the local file ID for the working folder, but before we can look for files in this location we need to match this local ID with the target ID of the shared folder. To find this information, we have to look deeper into the Google infrastructure, and to do this, we need a helper: the drive_service. We activated the helper while we were loading the project, and we didn’t get a warning, which means we have access to the service by using the API, and requesting information by ID.
The details we need are best collected through a simple function, like the findTargetID function in the next code block. In this function, the fileID is the Shortcut ID we found by looping over the names in the folders. By calling drive_service.files().get and specifying the fields, we get the target ID of the folder. This will be the same ID as in the URL of the web interface of Google Drive (see Figure 1).
def findTargetID(fileID, drive_service):
    # The ID of the shared file you want to get ShortcutDetails from
    file_id = fileID
    # Stays None if the file is not a shortcut or the request fails
    target_id = None
    try:
        # Get the file details
        file = drive_service.files().get(fileId=file_id,
                                         fields="id, shortcutDetails").execute()
        # Check if the file is a shortcut
        if 'shortcutDetails' in file:
            shortcut_details = file['shortcutDetails']
            target_id = shortcut_details['targetId']
            print("Shortcut Details:")
            print(f"Target ID: {target_id}")
            print(f"Target MIME Type: {shortcut_details['targetMimeType']}")
        else:
            print("The file is not a shortcut.")
        # end if
    except Exception as e:
        print(f"An error occurred: {e}")
    return target_id

if boo_foundFolder:
    targetID = findTargetID(fileID, drive_service)
    print(targetID)
else:
    print("Folder not found")
# end if
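To make the shape of the response easier to see, here is a minimal sketch that works on the metadata dictionary alone, without calling the API. The function name target_folder_id and the sample IDs are made up for illustration; the extra check that the target is a folder is an addition of this sketch and is not part of findTargetID.

```python
from typing import Optional

# MIME type that Google Drive uses for folders.
FOLDER_MIME = "application/vnd.google-apps.folder"


def target_folder_id(metadata: dict) -> Optional[str]:
    """Return the shortcut's target ID if it points to a folder, else None."""
    details = metadata.get("shortcutDetails")
    if not details:
        return None  # not a shortcut
    if details.get("targetMimeType") != FOLDER_MIME:
        return None  # shortcut to a file, not a folder
    return details.get("targetId")


# Hypothetical responses, shaped like the fields requested in findTargetID.
shortcut = {
    "id": "id-shortcut",
    "shortcutDetails": {"targetId": "id-target", "targetMimeType": FOLDER_MIME},
}
plain_folder = {"id": "id-plain"}

print(target_folder_id(shortcut))      # → id-target
print(target_folder_id(plain_folder))  # → None
```

Keeping the parsing separate from the API call makes it possible to test the logic with plain dictionaries.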
With this target ID, we have access to the actual shared folder on the Google Data Server, and we are not working on the shortcut folder anymore.
To recap, the reason we created the shortcut folder was to be able to see the folder in our mounted list of folders. The category “Shared with me” is not mounted, but the shortcuts are. So with this new ID we can look for files.
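Looking for files with the target ID could then reuse the query pattern from the folder walk. The sketch below assumes that the same 'in parents' query works for the target folder; the helper names children_query and list_titles are introduced here and are not part of PyDrive.

```python
def children_query(folder_id):
    """Build the Drive query for the non-trashed children of a folder."""
    return f"'{folder_id}' in parents and trashed=false"


def list_titles(drive, folder_id):
    """Return (title, id) pairs for the items directly inside folder_id.

    `drive` is expected to be the authenticated PyDrive GoogleDrive object
    created earlier in the notebook.
    """
    files = drive.ListFile({"q": children_query(folder_id)}).GetList()
    return [(f["title"], f["id"]) for f in files]


# Usage in the notebook, once targetID has been found:
# for title, file_id in list_titles(drive, targetID):
#     print(title, file_id)
```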