recursive.codes

The Personal Blog of Todd Sharp

Controlling Your Cloud - Uploading Large Files To Oracle Object Storage

Posted By: Todd Sharp on 1/2/2019 10:42 GMT
Tagged: Cloud, Java, Open Source

In my last post, we took an introductory look at working with the Oracle Cloud Infrastructure (OCI) API with the OCI Java SDK.  I mentioned that my initial motivation for digging into the SDK was to handle large file uploads to OCI Object Storage, and in this post, we'll do just that.  

As I mentioned, HTTP (Hypertext Transfer Protocol) wasn't originally meant to handle large file transfers.  Rather, file transfers were typically (and often still are) handled via FTP (File Transfer Protocol).  But web developers deal with globally distributed clients, and FTP requires server setup, custom desktop clients, different firewall rules and separate authentication, which ultimately means large files end up getting transferred over HTTP/S.  BitTorrent can be a better solution if the circumstances allow, but distributing a file to many peers is rarely the problem web developers are dealing with.  Thankfully, large file transfer over HTTP has become much easier to deal with, largely thanks to "chunked" or "multipart" file upload (not to be confused with HTTP's chunked transfer encoding, which is a separate mechanism).  You can read more about Oracle's support for multipart uploading, but to explain it in the simplest possible way: a file is broken up into several pieces ("chunks"), the pieces are uploaded (in parallel, if desired), and they are reassembled into the original file once all of them have been uploaded.
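To make the "break up, then reassemble" idea concrete, here is a minimal sketch using nothing but the Java standard library.  The class and method names (ChunkSplitter, split, reassemble) are made up for illustration and have nothing to do with the OCI SDK; it only shows that cutting a byte array into fixed-size pieces and joining them in order gives back the original.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative helper (hypothetical name); not part of any SDK.
class ChunkSplitter {

    // Split data into consecutive chunks of at most chunkSize bytes.
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        int offset = 0;
        while (offset < data.length) {
            int length = Math.min(chunkSize, data.length - offset);
            chunks.add(Arrays.copyOfRange(data, offset, offset + length));
            offset += length;
        }
        return chunks;
    }

    // Concatenate the chunks in list order to rebuild the original bytes.
    static byte[] reassemble(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = new byte[25];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }
        List<byte[]> chunks = split(original, 10);
        System.out.println(chunks.size() + " chunks");                   // 3 chunks
        System.out.println(Arrays.equals(original, reassemble(chunks))); // true
    }
}
```

The last chunk is simply whatever is left over (5 bytes here), which is why the pieces need not all be the same size.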

Using the Java SDK for multipart uploading involves, at a minimum, three steps.  Here are the JavaDocs for the SDK in case you're playing along at home and want more info.

  1. Initiate the multipart upload
  2. Upload the individual file parts
  3. Commit the upload
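Before getting into the SDK itself, the three steps above can be modeled with a tiny in-memory stand-in.  Everything here is an assumption made for illustration: InMemoryMultipartStore is a hypothetical class, the method names only echo the steps, the tag format is invented, and a real service takes an explicit list of parts at commit time rather than committing everything it has received.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

// Hypothetical in-memory stand-in for an object store; NOT the OCI SDK.
class InMemoryMultipartStore {
    private final Map<String, String> objectNameByUpload = new HashMap<>();
    private final Map<String, TreeMap<Integer, byte[]>> partsByUpload = new HashMap<>();
    private final Map<String, byte[]> objects = new HashMap<>();

    // Step 1: register the upload and hand back an identifier.
    String createMultipartUpload(String objectName) {
        String uploadId = UUID.randomUUID().toString();
        objectNameByUpload.put(uploadId, objectName);
        partsByUpload.put(uploadId, new TreeMap<Integer, byte[]>());
        return uploadId;
    }

    // Step 2: store one part under its part number and return a tag for it.
    String uploadPart(String uploadId, int partNum, byte[] chunk) {
        TreeMap<Integer, byte[]> parts = partsByUpload.get(uploadId);
        if (parts == null) {
            throw new IllegalArgumentException("unknown uploadId: " + uploadId);
        }
        parts.put(partNum, chunk.clone());
        return "etag-" + partNum; // made-up tag format
    }

    // Step 3: join the parts in part-number order and publish the object.
    void commitMultipartUpload(String uploadId) {
        TreeMap<Integer, byte[]> parts = partsByUpload.remove(uploadId);
        if (parts == null) {
            throw new IllegalArgumentException("unknown uploadId: " + uploadId);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : parts.values()) {
            out.write(chunk, 0, chunk.length);
        }
        objects.put(objectNameByUpload.remove(uploadId), out.toByteArray());
    }

    byte[] getObject(String objectName) {
        return objects.get(objectName);
    }

    public static void main(String[] args) {
        InMemoryMultipartStore store = new InMemoryMultipartStore();
        String uploadId = store.createMultipartUpload("greeting.txt");
        // Parts may arrive in any order; the part number decides the final position.
        store.uploadPart(uploadId, 2, "world!".getBytes(StandardCharsets.UTF_8));
        store.uploadPart(uploadId, 1, "Hello, ".getBytes(StandardCharsets.UTF_8));
        store.commitMultipartUpload(uploadId);
        System.out.println(new String(store.getObject("greeting.txt"), StandardCharsets.UTF_8)); // Hello, world!
    }
}
```

The point of the model is the shape of the conversation: one call to get an identifier, one call per part, and one call to finish.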

The SDK provides methods for all of the steps above, as well as a few additional operations such as listing existing multipart uploads.  Individual parts can be up to 50 GiB.  Using the ObjectClient (see the previous post), the three steps break down as follows:

1.  Call ObjectClient.createMultipartUpload, passing an instance of a CreateMultipartUploadRequest (which contains an instance of CreateMultipartUploadRequestDetails)

To break down step 1, you're just telling the API, "Hey, I want to upload a file.  The object name is 'foo.jpg' and its content type is 'image/jpeg'.  Can you give me an identifier so I can associate different pieces of that file later on?"  The API returns that identifier to you in the form of a CreateMultipartUploadResponse.  Here's the code:

So to create the upload, I make a call to /oci/upload-create and pass the objectName and contentType parameters.  I'm invoking it via Postman, but this could just as easily be a fetch() call in the browser:

So now we've got an upload identifier for further work (see "uploadId", #2 in the image above).  On to step 2 of the process:

2.  Call ObjectClient.uploadPart(), passing an instance of UploadPartRequest (including the uploadId, the objectName, a sequential part number, and the file chunk), which returns an UploadPartResponse.  The response will contain an "ETag" which we'll need to save, along with the part number, to complete the upload later on.

Here's what the code looks like for step 2:

And here's an invocation of step 2 in Postman, which was completed once for each part of the file that I chose to upload.  I'll save the ETag values along with each part number for use in the completion step.
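Since the parts are independent, they can also be sent in parallel, as long as each ETag stays paired with its part number.  Here's a rough sketch of that bookkeeping with a thread pool.  PartSender is a hypothetical interface standing in for whatever actually sends a part (an SDK call, an HTTP request to the service above, etc.), and numbering the parts from 1 is my assumption about what the service expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelPartUploader {

    // Hypothetical stand-in for the call that sends one part and returns its ETag.
    interface PartSender {
        String send(int partNum, byte[] chunk);
    }

    // Sends every chunk on a thread pool and returns part number -> ETag, sorted by part number.
    static Map<Integer, String> uploadAll(List<byte[]> chunks, final PartSender sender, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < chunks.size(); i++) {
                final int partNum = i + 1; // assumption: part numbers start at 1
                final byte[] chunk = chunks.get(i);
                Callable<String> task = () -> sender.send(partNum, chunk);
                pending.add(pool.submit(task));
            }
            Map<Integer, String> etags = new TreeMap<>();
            for (int i = 0; i < pending.size(); i++) {
                etags.put(i + 1, pending.get(i).get()); // blocks until that part is done
            }
            return etags;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while uploading parts", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a part failed to upload", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<byte[]> chunks = Arrays.asList(new byte[10], new byte[10], new byte[5]);
        // Fake sender that just invents a tag; a real one would perform the upload.
        Map<Integer, String> etags = uploadAll(chunks, (partNum, chunk) -> "etag-" + partNum, 2);
        System.out.println(etags); // {1=etag-1, 2=etag-2, 3=etag-3}
    }
}
```

Whatever order the parts finish in, the resulting map has exactly what the completion step needs: every part number with its ETag.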

Finally, step 3 is to complete the upload.

3.  Call ObjectClient.commitMultipartUpload(), passing an instance of CommitMultipartUploadRequest (which contains the object name, uploadId and an instance of CommitMultipartUploadDetails - which itself contains an array of CommitMultipartUploadPartDetails).

Sounds a bit complicated, but it's really not.  The code tells the story here:
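As a rough sketch of the bookkeeping behind step 3 (not the SDK code itself), the saved part numbers and ETags just need to be turned into a list of (part number, ETag) pairs.  CommitPlan and PartDetail are hypothetical stand-ins for the SDK's own detail classes, and sorting by part number is my choice for readability rather than something the post states the service requires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CommitPlan {

    // Hypothetical holder for one (part number, ETag) pair; not an SDK class.
    static final class PartDetail {
        final int partNum;
        final String etag;

        PartDetail(int partNum, String etag) {
            this.partNum = partNum;
            this.etag = etag;
        }
    }

    // Turns the saved part number -> ETag map into a list ordered by part number.
    static List<PartDetail> partsToCommit(Map<Integer, String> etagsByPartNum) {
        if (etagsByPartNum.isEmpty()) {
            throw new IllegalStateException("no parts were uploaded");
        }
        List<PartDetail> parts = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : new TreeMap<>(etagsByPartNum).entrySet()) {
            String etag = entry.getValue();
            if (etag == null || etag.isEmpty()) {
                throw new IllegalStateException("missing ETag for part " + entry.getKey());
            }
            parts.add(new PartDetail(entry.getKey(), etag));
        }
        return parts;
    }

    public static void main(String[] args) {
        Map<Integer, String> saved = new HashMap<>();
        saved.put(2, "b");
        saved.put(1, "a");
        for (PartDetail part : partsToCommit(saved)) {
            System.out.println(part.partNum + " -> " + part.etag);
        }
        // prints:
        // 1 -> a
        // 2 -> b
    }
}
```

Failing early on a missing ETag is cheaper than finding out from a rejected commit after every part has already been sent.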

When invoked, we get a simple result confirming the completion of the multipart upload commit!  If we head over to our bucket in Object Storage, we can see the file details for the uploaded and reassembled file:

And if we visit the object via a presigned URL (or directly, if the bucket is public), we can see the image.  In this case, a picture of my dog Moses:

As I've hopefully illustrated, the Oracle SDK for multipart upload is pretty straightforward to use once it's broken down into the steps required.  There are a number of frontend libraries to assist you with multipart upload once you have the proper backend service in place (in my case, the file was simply broken up using the "split" command on my MacBook).  


