Showing posts with the label cloudpathlib

A convenient FileHandler to read text from local files and files on AWS S3


A convenient FileHandler to read from local and S3 files with cloudpathlib.AnyPath

How can we build one class that reads from both local files and files in AWS S3? cloudpathlib is a nice package that can handle S3 paths (see my post on cloudpathlib.CloudPath). cloudpathlib.AnyPath is a polymorphic class that automatically instantiates either a CloudPath or a pathlib.Path object, whichever is appropriate for the input string. In this example, we build a pydantic.BaseModel with a single field, file, of type AnyPath. This pydantic model is our FileHandler for reading content from files, be it on a local file system or in AWS S3. The nice part is that AnyPath does all the heavy lifting for us: it instantiates the appropriate Path or S3Path object, and both share a common interface. That's why we can always just call .read_text() in the get_content method of our FileHandler.

To test our FileHandler class, we use moto.mock_s3 (see my post on using moto.mock_s3) to mock calls to AWS S3 and tempfile.NamedTemporaryFile to create a temporary local file. Unfortunately, the simplicity of the FileHandler class is a bit buried under the test setup in this example.
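The dispatch idea itself can be sketched with the standard library alone. The block below is not the code from the gist, and it uses neither cloudpathlib nor pydantic: InMemoryS3Path, FAKE_S3_STORE and any_path are hypothetical stand-ins that only mimic the behavior described above (pick a path class from the input string, then rely on a shared .read_text()), and FileHandler is a plain dataclass here rather than a pydantic.BaseModel.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Tuple, Union

# Hypothetical in-memory "bucket" so the sketch runs without AWS or moto.
FAKE_S3_STORE: Dict[str, str] = {"s3://my-bucket/hello.txt": "hello from s3"}


class InMemoryS3Path:
    """Hypothetical stand-in for an S3 path object (not cloudpathlib's S3Path)."""

    def __init__(self, url: str) -> None:
        self.url = url

    def read_text(self) -> str:
        # Same method name as pathlib.Path.read_text; that shared name is the point.
        return FAKE_S3_STORE[self.url]


def any_path(value: str) -> Union[Path, InMemoryS3Path]:
    """Hypothetical dispatcher playing the role AnyPath plays in the post."""
    if value.startswith("s3://"):
        return InMemoryS3Path(value)
    return Path(value)


@dataclass
class FileHandler:
    file: str

    def get_content(self) -> str:
        # No branching on the storage backend: both path types offer read_text().
        return any_path(self.file).read_text()


def demo() -> Tuple[str, str]:
    """Read one temporary local file and one fake S3 object through FileHandler."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_file = Path(tmp_dir) / "hello.txt"
        local_file.write_text("hello from disk")
        local_content = FileHandler(file=str(local_file)).get_content()
    s3_content = FileHandler(file="s3://my-bucket/hello.txt").get_content()
    return local_content, s3_content


if __name__ == "__main__":
    print(demo())
```

With the real packages, any_path would presumably be replaced by AnyPath as the field type, and the in-memory store by a mocked bucket; the get_content method would stay a single .read_text() call.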


GitHub gist with code

Dependencies: Python 3.9, boto3==1.24.51, cloudpathlib==0.10.0, moto==3.1.18, pydantic==1.9.2

How to handle AWS S3 paths using cloudpathlib?



You might be aware that pathlib.Path cannot properly deal with S3-like paths: it treats the URL as an ordinary file system path, so the // after s3: is collapsed into a single slash. cloudpathlib is an easy-to-use package that handles AWS S3 paths as well as paths for other cloud providers. In this example, we use cloudpathlib.CloudPath instead of pathlib.Path to instantiate an S3Path object from an S3 URL string. The interface of an S3Path closely follows that of a PosixPath object. For example, we can call .parts on it to obtain all components of the S3Path.
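To make the pathlib limitation concrete, here is a small standard-library-only sketch. The URL is a made-up example, and the urlsplit call is only there to show the pieces (scheme, bucket, key) that an S3-aware path object has to keep apart; it is not meant to describe how cloudpathlib works internally.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

S3_URL = "s3://my-bucket/data/2022/report.csv"  # made-up example URL

# pathlib sees a plain POSIX path and collapses the "//" after the scheme,
# so the result is no longer a valid S3 URL.
mangled = PurePosixPath(S3_URL)
print(str(mangled))   # s3:/my-bucket/data/2022/report.csv
print(mangled.parts)  # ('s3:', 'my-bucket', 'data', '2022', 'report.csv')

# A URL parser keeps scheme, bucket and key separate instead.
parsed = urlsplit(S3_URL)
print(parsed.scheme, parsed.netloc, parsed.path)  # s3 my-bucket /data/2022/report.csv
```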


GitHub gist with code

Dependencies: Python 3.9, cloudpathlib==0.10.0