A convenient FileHandler to read text from local files and files on AWS S3


A convenient FileHandler to read from local and S3 files with cloudpathlib.AnyPath

How can we build one class that handles reads from both local files and files in AWS S3? cloudpathlib is a nice package that can handle S3 paths (see my post on cloudpathlib.CloudPath). cloudpathlib.AnyPath is a polymorphic class that automatically instantiates either a CloudPath or a pathlib.Path object, whichever is appropriate for the input string.

In this example, we build a pydantic.BaseModel with a single field, file, of type AnyPath. This pydantic model is our FileHandler for reading content from files, be it from a local file system or from AWS S3. The nice part is that AnyPath does all the heavy lifting for us: it instantiates the appropriate Path or S3Path object, and both share a common interface. That's why we can always just call .read_text() in the get_content method of our FileHandler.

To test our FileHandler class, we use moto.mock_s3 (see my post on using moto.mock_s3) to mock calls to AWS S3 and tempfile.NamedTemporaryFile to create a temporary local file. Unfortunately, the simplicity of the FileHandler class is a bit buried in the test setup in this example.
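Stripped of all test setup, the core of the idea might look roughly like the sketch below. It is reconstructed from the description above rather than copied from the gist, so details may differ. It assumes that cloudpathlib and pydantic are installed in mutually compatible versions, since AnyPath has to provide the pydantic validation hook.

```python
from cloudpathlib import AnyPath
from pydantic import BaseModel


class FileHandler(BaseModel):
    """Reads text from a local file or an S3 object (sketch, not the gist's exact code)."""

    # AnyPath validates the input and turns it into either a pathlib.Path
    # (local path) or a cloud path such as S3Path ("s3://..." URI).
    file: AnyPath

    def get_content(self) -> str:
        # Both path flavours expose .read_text(), so no branching is needed.
        return self.file.read_text()
```

With this in place, FileHandler(file="some/local/file.txt").get_content() and FileHandler(file="s3://some-bucket/some-key.txt").get_content() would go through the same code path; the file name, bucket, and key here are placeholders, and the S3 variant additionally needs boto3 and valid (or mocked) AWS credentials.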


GitHub gist with code

Dependencies: Python 3.9, boto3==1.24.51, cloudpathlib==0.10.0, moto==3.1.18, pydantic==1.9.2
