Best Way to Download Large Files Python

January 31, 2022

How To Upload and Download Files in AWS S3 with Python and Boto3

Introduction

In this How To tutorial I demonstrate how to perform file storage management with AWS S3 using Python's boto3 AWS library. Specifically, I provide examples of configuring boto3, creating S3 buckets, and uploading and downloading files to and from S3 buckets.

Creating a Boto3 User in AWS For Programmatic Access

As a first step I make a new user in AWS's management console that I'll use in conjunction with the boto3 library to access my AWS account programmatically. It's considered a best practice to create a separate and specific user for use with boto3 as it makes it easier to track and manage.

To start I enter IAM in the search bar of the services menu and select the menu item.

Following that I click the Add user button.

On the following screen I enter a username of boto3-demo, make sure only the Programmatic access item is selected, and click the Next button.

On the next screen I attach a permission policy of AmazonS3FullAccess and then click the Next button.

Then I click Next until the credentials screen is shown. On this screen I click the Download .csv button. I will need these credentials to configure Boto3 to allow me to access my AWS account programmatically.

Installing Boto3

Before writing any Python code I must install the AWS Python library named Boto3, which I will use to interact with the AWS S3 service. To accomplish this I set up a Python3 virtual environment, as I feel that is a best practice for any new project regardless of size and intent.

          $ mkdir aws_s3
          $ cd aws_s3
          $ python3 -m venv venv
          $ source venv/bin/activate
          (venv) $ pip install boto3

Configuring Boto3 and Boto3 User Credentials

With the boto3-demo user created and the Boto3 package installed I can now set up the configuration to enable authenticated access to my AWS account. There are a few different ways to handle this, and the one I like best is to store the access key id and secret access key values as environment variables and then use the Python os module from the standard library to feed them into the boto3 library for authentication. There is a handy Python package called python-dotenv which allows you to put environment variables in a file named .env and then load them into your Python source code so, I'll begin this section by installing it.

          (venv) $ pip install python-dotenv

Following this I make a .env file and place the two variables in it as shown below but, obviously, you'll want to put in your own values for these that you downloaded in the earlier step for creating the boto3 user in the AWS console.

          AWS_ACCESS_KEY_ID=MYACCESSKEYID
          AWS_ACCESS_KEY_SECRET=MYACCESSKEYSECRET

Next I make a Python module named file_manager.py and inside it I import the os and boto3 modules as well as the load_dotenv function from the python-dotenv package. Following that I call the load_dotenv() function, which will automatically find a .env file in the same directory and read its variables into the environment, making them accessible via the os module.

Then I create a function named aws_session(...) that generates an authenticated Session object, reading the environment variables with the os.getenv(...) function and returning the session.

          # file_manager.py

          import os
          import boto3

          from dotenv import load_dotenv

          # Read variables from a .env file into the process environment.
          load_dotenv(verbose=True)


          def aws_session(region_name='us-east-1'):
              # Build an authenticated session from the environment variables.
              return boto3.session.Session(aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
                                           aws_secret_access_key=os.getenv('AWS_ACCESS_KEY_SECRET'),
                                           region_name=region_name)
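Because os.getenv(...) returns None when a variable is not set, a typo in the .env file can go unnoticed until a request fails later on. The following is a minimal sketch of a check that could run before aws_session() is called. The helper names missing_env_vars and require_env_vars are my own for illustration and are not part of boto3 or python-dotenv.

```python
import os

# The variable names used in the .env file from this tutorial.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_ACCESS_KEY_SECRET")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


def require_env_vars(names=REQUIRED_VARS, environ=None):
    """Raise a descriptive error rather than handing None to boto3."""
    missing = missing_env_vars(names, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))


# Demonstration with an explicit mapping instead of the real environment.
print(missing_env_vars(environ={"AWS_ACCESS_KEY_ID": "MYACCESSKEYID"}))
# ['AWS_ACCESS_KEY_SECRET']
```

Calling require_env_vars() right after load_dotenv() would surface a configuration mistake at startup.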

I will then use this session object to interact with the AWS platform via a high-level abstraction object Boto3 provides known as the AWS Resource. When used in conjunction with my aws_session() function I can create an S3 resource like so.

          session = aws_session()
          s3_resource = session.resource('s3')

Creating an S3 Bucket Programmatically with Boto3

I can now move on to making a publicly readable bucket which will serve as the top level container for file objects inside S3. I will do this within a function named make_bucket as shown below.

          def make_bucket(name, acl):
              session = aws_session()
              s3_resource = session.resource('s3')
              # Create the bucket with the given name and access control policy.
              return s3_resource.create_bucket(Bucket=name, ACL=acl)

          s3_bucket = make_bucket('tci-s3-demo', 'public-read')

The key point to note here is that I've used the Resource class's create_bucket method to create the bucket, passing it a string name which conforms to AWS naming rules along with an ACL parameter, which is a string representing an Access Control List policy, in this case one for public reading.
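To make the naming rules a little more concrete, here is a rough sketch of a local check for some of the general-purpose bucket naming rules: 3 to 63 characters, only lowercase letters, digits, dots and hyphens, beginning and ending with a letter or digit, no adjacent dots, and not formatted like an IP address. It covers only a subset of the rules, and the function name looks_like_valid_bucket_name is my own, so treat it as a quick pre-check rather than a replacement for the validation AWS performs.

```python
import re

# 3-63 characters; lowercase letters, digits, dots, hyphens;
# first and last character must be a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Names formatted like an IPv4 address (e.g. 192.168.5.4) are not allowed.
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def looks_like_valid_bucket_name(name):
    """Check a bucket name against a subset of the S3 naming rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:  # adjacent periods are not allowed
        return False
    if _IPV4_RE.fullmatch(name):
        return False
    return True


print(looks_like_valid_bucket_name("tci-s3-demo"))  # True
print(looks_like_valid_bucket_name("TCI_S3_Demo"))  # False
```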

Uploading a File to S3 Using Boto3

At this point I can upload files to this newly created bucket using the Boto3 Bucket resource class. Below is a demo file named children.csv that I'll be working with.

          name, age
          Kallin, 3
          Cameron, 0

In keeping with the good practice of reusability I'll again make a function to upload files given a file path and bucket name as shown below.

          def upload_file_to_bucket(bucket_name, file_path):
              session = aws_session()
              s3_resource = session.resource('s3')
              # Use the file's base name as the S3 object key.
              file_dir, file_name = os.path.split(file_path)

              bucket = s3_resource.Bucket(bucket_name)
              bucket.upload_file(
                Filename=file_path,
                Key=file_name,
                ExtraArgs={'ACL': 'public-read'}
              )

              s3_url = f"https://{bucket_name}.s3.amazonaws.com/{file_name}"
              return s3_url

          s3_url = upload_file_to_bucket('tci-s3-demo', 'children.csv')
          print(s3_url) # https://tci-s3-demo.s3.amazonaws.com/children.csv

Here I use the Bucket resource class's upload_file(...) method to upload the children.csv file. The parameters to this method are a little confusing so let me explain them a little. First you have the Filename parameter, which is actually the path to the file you wish to upload. Then there is the Key parameter, which is a unique identifier for the S3 object and must conform to AWS object naming rules, similar to S3 buckets.

The upload_file_to_bucket(...) function uploads the given file to the specified bucket and returns the AWS S3 resource url to the calling code.
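The URL returned above is assembled with a plain f-string, which works for a simple key such as children.csv. If a key contains spaces or other reserved characters, the URL would need percent-encoding. Here is a small sketch of one way to do that with the standard library. The function name s3_object_url is an assumption of mine, and the URL pattern is simply the one already used in this tutorial.

```python
from urllib.parse import quote


def s3_object_url(bucket_name, key):
    """Build the object URL, percent-encoding the key but keeping '/' intact."""
    return f"https://{bucket_name}.s3.amazonaws.com/{quote(key)}"


print(s3_object_url("tci-s3-demo", "children.csv"))
# https://tci-s3-demo.s3.amazonaws.com/children.csv
print(s3_object_url("tci-s3-demo", "reports/2022 summary.csv"))
# https://tci-s3-demo.s3.amazonaws.com/reports/2022%20summary.csv
```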

Uploading In-Memory Data to S3 using Boto3

While uploading a file that already exists on the filesystem is a very common use case when writing software that utilizes S3 object based storage, there is no need to write a file to disk just for the sole purpose of uploading it to S3. You can instead upload any byte serialized data using the put(...) method on a Boto3 Object resource.

Below I am showing another new reusable function that takes bytes data, a bucket name and an S3 object key, which it then uploads and saves to S3 as an object.

          def upload_data_to_bucket(bytes_data, bucket_name, s3_key):
              session = aws_session()
              s3_resource = session.resource('s3')
              obj = s3_resource.Object(bucket_name, s3_key)
              # Write the bytes straight to S3 without touching the local disk.
              obj.put(ACL='private', Body=bytes_data)

              s3_url = f"https://{bucket_name}.s3.amazonaws.com/{s3_key}"
              return s3_url


          data = [
            'My name is Adam',
            'I live in Lincoln',
            'I have a beagle named Doc Holiday'
          ]
          bytes_data = '\n'.join(data).encode('utf-8')
          s3_url = upload_data_to_bucket(bytes_data, 'tci-s3-demo', 'about.txt')
          print(s3_url) # 'https://tci-s3-demo.s3.amazonaws.com/about.txt'
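The same idea extends to tabular data. As an illustrative sketch, the rows of a CSV file can be serialized to bytes entirely in memory using the standard library's csv and io modules. The helper name rows_to_csv_bytes is hypothetical, and the line terminator is set explicitly to "\n" so the output is predictable.

```python
import csv
import io


def rows_to_csv_bytes(rows, encoding="utf-8"):
    """Serialize an iterable of rows to CSV bytes without writing a file."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue().encode(encoding)


csv_bytes = rows_to_csv_bytes([["name", "age"], ["Kallin", 3], ["Cameron", 0]])
print(csv_bytes)
# b'name,age\nKallin,3\nCameron,0\n'
```

The resulting bytes could then be handed to a function such as upload_data_to_bucket(csv_bytes, 'tci-s3-demo', 'children.csv').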

Downloading a File from S3 using Boto3

Next I'll demonstrate downloading the same children.csv S3 file object that was just uploaded. This is very similar to uploading except you use the download_file method of the Bucket resource class.

          def download_file_from_bucket(bucket_name, s3_key, dst_path):
              session = aws_session()
              s3_resource = session.resource('s3')
              bucket = s3_resource.Bucket(bucket_name)
              # Save the S3 object identified by s3_key to the local dst_path.
              bucket.download_file(Key=s3_key, Filename=dst_path)

          download_file_from_bucket('tci-s3-demo', 'children.csv', 'children_download.csv')
          with open('children_download.csv') as fo:
              print(fo.read())

Which outputs the following from the downloaded file.

          name,age
          Kallin,3
          Cameron,0
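For larger files it can be useful to watch a transfer as it progresses. Boto3's transfer methods, including download_file and upload_file, accept an optional Callback argument that is called with the number of bytes transferred since the previous call. Below is a minimal sketch of such a callback. The ProgressTracker class is my own illustration, and it uses a lock because the callback may be invoked from more than one thread during a transfer.

```python
import threading


class ProgressTracker:
    """Accumulates byte counts reported by a transfer callback."""

    def __init__(self, total_bytes=None):
        self.total_bytes = total_bytes
        self.seen_bytes = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount):
        # bytes_amount is the increment since the last call, not a running total.
        with self._lock:
            self.seen_bytes += bytes_amount
            seen = self.seen_bytes
        if self.total_bytes:
            print(f"{seen} / {self.total_bytes} bytes ({seen / self.total_bytes:.0%})")
        else:
            print(f"{seen} bytes")


# Simulated transfer: two chunks reported by hand instead of by boto3.
tracker = ProgressTracker(total_bytes=1000)
tracker(250)
tracker(750)
```

In a real download the tracker would be passed along as bucket.download_file(Key=s3_key, Filename=dst_path, Callback=tracker).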

Downloading a File from S3 to Memory using Boto3

There will likely be times when you are only downloading S3 object data to immediately process and then throw away without ever needing to save the data locally. Downloading data in this fashion still requires using some sort of file-like object in binary mode but, luckily, the Python language provides the helpful streaming class BytesIO from the io module, which handles in-memory stream handling like this.

To download the S3 object data in this way you will want to use the download_fileobj(...) method of the S3 Object resource class, as demonstrated below by downloading the about.txt file uploaded from in-memory data previously.

          import io  # belongs with the other imports at the top of file_manager.py

          def download_data_from_bucket(bucket_name, s3_key):
              session = aws_session()
              s3_resource = session.resource('s3')
              obj = s3_resource.Object(bucket_name, s3_key)
              io_stream = io.BytesIO()
              obj.download_fileobj(io_stream)

              # Rewind to the start of the stream before reading it back.
              io_stream.seek(0)
              data = io_stream.read().decode('utf-8')

              return data


          about_data = download_data_from_bucket('tci-s3-demo', 'about.txt')
          print(about_data)

Prints the following data.

          My name is Adam
          I live in Lincoln
          I have a beagle named Doc Holiday
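The seek(0) call in the download function is there because writing to a BytesIO leaves the stream position at the end, so a plain read() would otherwise return nothing. A possible alternative is getvalue(), which returns the whole buffer regardless of position. The sketch below shows this offline. FakeS3Object is a hypothetical stand-in that only mimics the download_fileobj(...) method so the example runs without AWS access.

```python
import io


class FakeS3Object:
    """Hypothetical stand-in for an S3 Object resource, for offline use only."""

    def __init__(self, payload):
        self._payload = payload

    def download_fileobj(self, fileobj):
        # Mimics the real method by writing the object's bytes into fileobj.
        fileobj.write(self._payload)


def read_text(obj, encoding="utf-8"):
    """Download an object into memory and decode it as text."""
    stream = io.BytesIO()
    obj.download_fileobj(stream)
    # getvalue() ignores the current position, so no seek(0) is needed.
    return stream.getvalue().decode(encoding)


fake = FakeS3Object(b"My name is Adam\nI live in Lincoln")
print(read_text(fake).splitlines())
# ['My name is Adam', 'I live in Lincoln']
```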

Resources for Learning More

Python Tricks: A Buffet of Awesome Python Features is truly a fantastic collection of Python productivity hacks
Fluent Python: Clear, Concise, and Effective Programming is a must read for any Python programmer serious about code adroitness
Amazon Web Services in Action is a pragmatic book full of splendid use cases of the many services provided by AWS

Conclusion

In this How To article I have demonstrated how to set up and use the Python Boto3 library to transfer files to and from AWS S3 object storage.

For completeness here is the complete source code for the file_manager.py module that was used in this tutorial.

          # file_manager.py

          import os
          import boto3
          import io

          from dotenv import load_dotenv
          load_dotenv(verbose=True)


          def aws_session(region_name='us-east-1'):
              return boto3.session.Session(aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
                                           aws_secret_access_key=os.getenv('AWS_ACCESS_KEY_SECRET'),
                                           region_name=region_name)


          def make_bucket(name, acl):
              session = aws_session()
              s3_resource = session.resource('s3')
              return s3_resource.create_bucket(Bucket=name, ACL=acl)


          def upload_file_to_bucket(bucket_name, file_path):
              session = aws_session()
              s3_resource = session.resource('s3')
              file_dir, file_name = os.path.split(file_path)

              bucket = s3_resource.Bucket(bucket_name)
              bucket.upload_file(
                Filename=file_path,
                Key=file_name,
                ExtraArgs={'ACL': 'public-read'}
              )

              s3_url = f"https://{bucket_name}.s3.amazonaws.com/{file_name}"
              return s3_url


          def download_file_from_bucket(bucket_name, s3_key, dst_path):
              session = aws_session()
              s3_resource = session.resource('s3')
              bucket = s3_resource.Bucket(bucket_name)
              bucket.download_file(Key=s3_key, Filename=dst_path)


          def upload_data_to_bucket(bytes_data, bucket_name, s3_key):
              session = aws_session()
              s3_resource = session.resource('s3')
              obj = s3_resource.Object(bucket_name, s3_key)
              obj.put(ACL='private', Body=bytes_data)

              s3_url = f"https://{bucket_name}.s3.amazonaws.com/{s3_key}"
              return s3_url


          def download_data_from_bucket(bucket_name, s3_key):
              session = aws_session()
              s3_resource = session.resource('s3')
              obj = s3_resource.Object(bucket_name, s3_key)
              io_stream = io.BytesIO()
              obj.download_fileobj(io_stream)

              io_stream.seek(0)
              data = io_stream.read().decode('utf-8')

              return data

As always, I thank you for reading and feel free to ask questions or critique in the comments section below.

Posted by: garzahassaid.blogspot.com