Showing posts with the label Path

A convenient FileHandler to read text from local files and files on AWS S3


A convenient FileHandler to read from local and S3 files with cloudpathlib.AnyPath

How can we build one class that reads from both local files and files in AWS S3? cloudpathlib is a nice package that can handle S3 paths (see my post on cloudpathlib.CloudPath). cloudpathlib.AnyPath is a polymorphic class that automatically instantiates either a CloudPath or a pathlib.Path object, whichever is appropriate for the input string. In this example, we build a pydantic.BaseModel with a single field, file, of type AnyPath. This pydantic model is our FileHandler for reading content from files, whether they live on a local file system or in AWS S3.

The nice part is that AnyPath does all the heavy lifting for us: it instantiates the appropriate Path or S3Path object, and both share a common interface. That is why we can always call .read_text() in the get_content method of our FileHandler. To test the FileHandler class we use moto.mock_s3 (see my post on using moto.mock_s3) to mock calls to AWS S3 and tempfile.NamedTemporaryFile to create a temporary local file. Unfortunately, the simplicity of the FileHandler class is a bit buried in the test setup in this example.
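
Stripped of the test setup, the idea can be sketched roughly as follows. This is a minimal sketch, not the code from the gist: it assumes cloudpathlib and pydantic are installed in mutually compatible versions (such as the ones pinned below), and it only exercises the local-file branch. The S3 branch would additionally need boto3 plus real or mocked credentials.

```python
import tempfile
from pathlib import Path

from cloudpathlib import AnyPath
from pydantic import BaseModel


class FileHandler(BaseModel):
    # AnyPath validates the input and turns it into a pathlib.Path
    # for local paths or a CloudPath for URIs such as "s3://bucket/key".
    file: AnyPath

    def get_content(self) -> str:
        # Both path flavours expose read_text(), so no branching is needed.
        return self.file.read_text()


# Create a temporary local file to read back (delete=False so that it can
# be reopened by name on every platform).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello from a local file")

handler = FileHandler(file=tmp.name)
content = handler.get_content()

# Clean up the temporary file again.
Path(tmp.name).unlink()
```

Passing a string such as "s3://my-bucket/my-key.txt" (a hypothetical bucket and key) instead of tmp.name should give a handler whose file field is an S3Path, while get_content stays unchanged.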


GitHub gist with code

dependencies: python3.9, boto3==1.24.51, cloudpathlib==0.10.0, moto==3.1.18, pydantic==1.9.2

How to get all files in a directory and delete them?


Obtain all files in a pathlib.Path directory and delete them


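If we have a directory as a Path object, we can call .iterdir() on it to obtain a generator that yields every entry in that directory as a Path object. Note that this includes subdirectories as well as files. To delete all files in a given directory, we call .unlink() on every Path object returned by iterdir(). Passing missing_ok=True (available since Python 3.8) guards against a race condition: if another process removes a file between listing and deleting, no FileNotFoundError is raised.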
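
A minimal sketch of this, using a throwaway directory from tempfile so that it is safe to run. The helper name delete_files and the is_file() check are my additions for illustration: without the check, calling unlink() on a subdirectory would raise an error.

```python
import tempfile
from pathlib import Path


def delete_files(directory: Path) -> None:
    """Delete every regular file directly inside `directory`."""
    for entry in directory.iterdir():
        # iterdir() also yields subdirectories; skip them here.
        if entry.is_file():
            # missing_ok=True: no error if the file vanished in the meantime.
            entry.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    # Set up three files and one subdirectory.
    for name in ("a.txt", "b.txt", "c.txt"):
        (directory / name).write_text("data")
    (directory / "sub").mkdir()

    delete_files(directory)
    remaining = sorted(p.name for p in directory.iterdir())
```

After the call, only the subdirectory is left in the directory.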


GitHub gist with code

dependencies: python3.9

How to create a directory and all of its parent directories if none or some of them do not exist yet?



If you want to create a directory and all of its parent directories while working with pathlib.Path objects, you can use Path.mkdir with parents=True. Setting exist_ok=True ensures that no FileExistsError is raised if the directory already exists. This makes the call idempotent: we can run it over and over again and always end up with the same result.
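
A minimal sketch, again inside a temporary directory so nothing is left behind. The nested path data/raw/2022 is just a made-up example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Neither "data" nor "raw" exists yet.
    target = Path(tmp) / "data" / "raw" / "2022"

    # parents=True creates the missing parent directories as well.
    target.mkdir(parents=True, exist_ok=True)
    created = target.is_dir()

    # Running it again does nothing and raises no error.
    target.mkdir(parents=True, exist_ok=True)
    still_there = target.is_dir()

    # Without exist_ok=True the repeated call fails.
    try:
        target.mkdir(parents=True)
        raised = False
    except FileExistsError:
        raised = True
```

Note that exist_ok=True only covers an existing directory: if the path already exists as a regular file, mkdir still raises FileExistsError.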


GitHub gist with code

dependencies: python3.9