Amazon S3: Direct Browser File Uploads

“If computers of the kind I have advocated become the computers of the future, then computing may someday be organized as a public utility just as the telephone system is a public utility… The computer utility could become the basis of a new and important industry.”

– John McCarthy, speaking at the MIT Centennial in 1961

His statement may have seemed over-ambitious and far-fetched at the time, but today, as we watch cloud-native apps being deployed on fully managed cloud services with virtually no on-premise or self-managed infrastructure, we realize the genius behind his thought. Just as he predicted, computing, storage, networking, API interfaces, and security services have all become public utilities, served on demand with negligible configuration effort.

And the big daddy of the “utility companies” is Amazon – its cloud services wing, Amazon Web Services (AWS), provides every computing utility imaginable, in the cloud and accessible to all.


Amazon S3: The mighty bucket

Amazon Simple Storage Service, as the name suggests, is a very clean abstraction for storing and retrieving files on cloud storage. What lies underneath, though, is an extremely scalable, fault-tolerant, high-performance distributed storage service. It provides “buckets” – containers of data that can be individually configured for security and ownership. Buckets contain “objects” – an abstraction over files and folders (folders are really just key prefixes).

Using S3 is as simple as configuring credentials and sending HTTP requests to store or retrieve objects. The API is RESTful: each object is exposed as a resource with a suitable path and the usual HTTP verb semantics. Amazon also provides its SDK in various languages and platforms, giving you another layer of abstraction over making the HTTP calls directly.
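For instance, storing and reading back an object takes just a couple of SDK calls. A minimal sketch, assuming the AWS SDK for JavaScript v2 with credentials picked up from the environment (the bucket and key names here are purely illustrative):

const AWS = require('aws-sdk');
const s3 = new AWS.S3(); // credentials come from the environment or an IAM role

// Store an object...
s3.putObject({ Bucket: 'my-bucket', Key: 'notes/hello.txt', Body: 'Hello, S3!' }, (err) => {
    if (err) return console.error(err);
    // ...and read it back
    s3.getObject({ Bucket: 'my-bucket', Key: 'notes/hello.txt' }, (err, data) => {
        if (err) return console.error(err);
        console.log(data.Body.toString()); // "Hello, S3!"
    });
});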


Filling the bucket: Uploading files to S3

Wouldn’t it be magical if all your file upload/download tasks were outsourced to someone else, and you just needed to call an API to get them done securely and fast? Let’s take a look at how one can upload files programmatically to S3. We will go over the approaches one by one, and then zero in on our preferred one to explain it further.

  1. Server-to-S3 upload: This is probably the simplest one: the file gets uploaded to your application server in the traditional way (as multipart data or as a binary blob), from where it is uploaded to S3 using the SDK (a minimal sketch follows this list). This approach, while more secure, is really not an ideal one, because it adds to the work done by our server – handling the file upload and then the transfer to S3. For small files this can work, but if the file size grows to the order of several megabytes, it can quickly and unnecessarily make the server sluggish.
  2. Direct browser uploads using HTTP forms: This approach removes the overhead stated above by letting the browser upload files directly to S3 using HTML forms. However, it’s slightly tedious: it requires specifying an Access Control List, creating a policy document, then encoding and signing the policy to produce a signature. All this information needs to be arranged in HTML form inputs as per the Amazon docs, and the form finally POSTed to the S3 URL. It also requires an extra network call.
  3. Direct browser uploads using a signed URL: This one gives us the best of both worlds – none of the server overhead of the first approach, and extremely easy to use, unlike the second. Let’s explore this a bit further.
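For contrast, here is what the first approach might look like – a hedged sketch, assuming an Express server with the multer middleware for multipart parsing (the route, field name, and bucket name are assumptions, not part of the original setup):

const express = require('express');
const multer = require('multer');
const AWS = require('aws-sdk');

const app = express();
const upload = multer({ storage: multer.memoryStorage() }); // keep the file in memory
const s3 = new AWS.S3();

// The file travels twice: browser -> our server -> S3
app.post('/upload', upload.single('file'), (req, res) => {
    const params = {
        Bucket: 'my-bucket',            // assumed bucket name
        Key: req.file.originalname,
        Body: req.file.buffer,
        ContentType: req.file.mimetype
    };
    s3.upload(params, (err, data) => {
        if (err) return res.status(500).send(err.message);
        res.json({ location: data.Location });
    });
});

app.listen(3000);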


Signed URL: Your (time-limited!) key to the bucket

This approach works by providing the client a unique URL for each file upload. The client then uses this URL to upload the data. This can be done as an AJAX call – no HTML forms needed. The URL is secure and time-limited: it won’t accept a file that doesn’t match the one it was intended to receive, and it stays open only for a limited time. The signed URL is generated by a service in your application – and that’s the only interaction with our app server needed in this flow.

If we were to explain the flow of this approach in simple steps:

The client application sends the file name and its type to our app server. This is done using a simple AJAX call:

GET /my/route/signed_url?file_name=test.txt&file_type=text/plain
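On the client, this can be a couple of lines with fetch – a minimal sketch, assuming it runs inside an async function and that file is a File object from an <input type="file"> (the JSON response shape is an assumption until we define it in the next step):

// Ask our app server for a signed URL for this particular file
const response = await fetch(
    `/my/route/signed_url?file_name=${encodeURIComponent(file.name)}` +
    `&file_type=${encodeURIComponent(file.type)}`
);
const { signedUrl } = await response.json(); // assumed response shape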

Our app server (in our case, a Lambda function) runs with an IAM role that can access S3. It connects to S3 using the SDK, gives it the information about the file we want to upload, and receives a pre-signed URL from S3, which it sends back in the response to the above request:

const AWS = require('aws-sdk');

const params = {
    Bucket: BUCKET_NAME,     // the target bucket
    Key: FILE_NAME,          // the object key the client asked for
    Expires: 6000,           // URL validity, in seconds
    ContentType: FILE_TYPE   // must match the Content-Type of the actual upload
};
const s3 = new AWS.S3();
s3.getSignedUrl('putObject', params, callback);


The callback passed to s3.getSignedUrl will be called by the S3 SDK with the signed URL, and your app should return it as the response to the request from the first step.
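Wired into a Lambda handler, the whole service might look like this – a hedged sketch, assuming API Gateway proxy integration and a BUCKET_NAME environment variable (both are assumptions, not part of the original setup):

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = (event, context, callback) => {
    // file_name and file_type arrive as query parameters (see step 1)
    const { file_name, file_type } = event.queryStringParameters;
    const params = {
        Bucket: process.env.BUCKET_NAME, // assumed to be configured on the function
        Key: file_name,
        Expires: 6000,
        ContentType: file_type
    };
    s3.getSignedUrl('putObject', params, (err, signedUrl) => {
        if (err) return callback(err);
        callback(null, {
            statusCode: 200,
            body: JSON.stringify({ signedUrl })
        });
    });
};

Now that the client has a direct way of uploading the file (that is, the URL returned by the above response), it can go ahead and add the file to the bucket by sending a PUT request to the returned URL: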

PUT <signed_url>
Content-Type: <the file type which was sent in step 1>
.. other headers ..

<request body with the file>
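From the browser, this PUT can again be a simple fetch call – a minimal sketch, continuing with the file and signedUrl variables from the earlier snippets:

// Upload the file straight to S3 – our server is not involved anymore
const uploadResponse = await fetch(signedUrl, {
    method: 'PUT',
    headers: { 'Content-Type': file.type }, // must match the type sent in step 1
    body: file
});
if (!uploadResponse.ok) {
    throw new Error(`Upload failed with status ${uploadResponse.status}`);
}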

And that’s it! It works with lightning speed, and S3 returns an HTTP 200 OK status if everything went well. If, however, something does go wrong, the service returns a very helpful error message. Let’s see some common errors and gotchas we faced, and how we overcame them. But first, a neat illustration of the above steps:

Direct browser upload to S3 - illustration

Why this approach is better:

Going with the signed URL approach has many benefits which make the overall application snappy while making the developer’s life easy:

  • No commitments: You don’t need to give out any AWS credentials to the client app.
  • Full control: In step (2) of the above illustration, before asking for the signed URL, the app itself can perform sanity checks and business logic validations (see the sketch after this list). This adds a highly customizable intermediary step where you can decide whether or not to allow the upload.
  • Secure: S3 ensures that the file uploaded by the client was the same file for which the signed URL was generated, by generating a signature and storing it in the signed URL, and then on the upload, re-calculating the signature from the actual file and comparing the two.
  • Time limited: You can specify the timeout to be enforced. Thus, there is no “always open URL” and you always remain in control.
  • No app server effort required: Other than getting the signed URL generated and passing it to the client, that is.
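That “full control” step can be as simple as a guard in front of the getSignedUrl call – a hypothetical sketch (the allowed types, the name limit, and the user check are illustrative, not from the original flow):

// Runs before we ask S3 for a signed URL; reject early if anything looks off
const ALLOWED_TYPES = ['image/png', 'image/jpeg', 'text/plain'];
const MAX_NAME_LENGTH = 255;

function canRequestSignedUrl(fileName, fileType, user) {
    if (!ALLOWED_TYPES.includes(fileType)) return false; // unexpected content type
    if (fileName.length > MAX_NAME_LENGTH) return false; // suspiciously long key
    return user.isAllowedToUpload;                       // business rule check
}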

The Gotchas

There’s always a chance of things going awry in network programming, especially when you have to conform to Amazon’s strict security requirements. We encountered a couple of bummers where the solution was simple but not very intuitive. Here they are:

The Infamous “SignatureDoesNotMatch”

As described earlier, S3 creates a signature from details such as the file type and the file name, and embeds it in the URL. Then, on upload, it reads the signature from the URL and recalculates a fresh signature from the actual request. If the two do not match, it rejects the file. For this to work, take care of the following details:

  • Pass the file’s type in the Content-Type header when sending it.
  • The request should be an HTTP PUT, not a POST.
  • The file type and name should exactly match the ones provided while asking for the signed URL.

However, S3 does return a very helpful error message which makes debugging easier:

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>SignatureDoesNotMatch</Code>
<Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>
<StringToSignBytes>50 55 bla bla bla...</StringToSignBytes>
<RequestId>F7A8F1659DE5909C</RequestId>
<HostId>q+r+2T5K6mWHLKTZw0R9/jm22LyIfZFBTY8GEDznfmJwRxvaVJwPiu/hzUfuJWbW</HostId>
<StringToSign>PUT
    image/png
    1387565829
    x-amz-acl:authenticated-read
    /mybucketname/icons/f5430c16-32da-4315-837f-39a6cf9f47a1</StringToSign>
<AWSAccessKeyId>myaccesskey</AWSAccessKeyId></Error>

Bucket CORS

Since the actual file upload goes straight from the browser to S3, it is naturally a cross-origin request, so CORS needs to be configured on the server side (in this case, the S3 side). For that, go to Permissions > CORS Configuration on the bucket and edit the policy to include your domain:


<!-- Sample policy -->
<CORSConfiguration>
    <CORSRule>
        <AllowedOrigin>*</AllowedOrigin> <!-- restrict this to your domain in production -->
        <AllowedMethod>GET</AllowedMethod>
        <AllowedMethod>PUT</AllowedMethod>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
        <AllowedHeader>Authorization</AllowedHeader>
        <AllowedHeader>Content-Type</AllowedHeader>
    </CORSRule>
</CORSConfiguration>

Take care of the following details:

  • The PUT method must be added in the AllowedMethod section, since we want to be able to upload a file.
  • The Content-Type header must be allowed, since we send it with the upload and it takes part in the signature verification.


S3e you!

Now you know how to leverage S3 for your file storage needs, and how to do the uploads effortlessly. Go get yourself a piece of the cloud action!

About CauseCode: We are a technology company specializing in Healthtech related Web and Mobile application development. We collaborate with passionate companies looking to change health and wellness tech for good. If you are a startup, enterprise or generally interested in digital health, we would love to hear from you! Let's connect at bootstrap@causecode.com
