How to actually deploy your own cloud storage solution
15 minute read · 09 Apr 2020

I’ve been entertaining a particular thought for a very long time now: should I be hosting my own personal cloud storage? In this post, we’ll explore the reasoning behind my train of thought, and walk through the steps I followed (and the lessons I had to learn) to deploy my very own Nextcloud instance, aiming to spend as little money as possible.
It’s a journey filled with surprising lessons about cloud infrastructure and the idiosyncrasies of the Nextcloud platform, many of which I haven’t seen properly documented in any of the “beginner” guides out there. So here we are, a post on how to actually deploy your own cloud storage solution.
For those in a hurry, we’ll be using AWS’ Lightsail service as our compute environment and Ubuntu 18.04 as our Linux distro, but the instructions should be fairly similar across all cloud providers / Linux distributions.
I’m also going to assume you’re somewhat familiar with IaaS providers, cloud technology and the terminal. There’s nothing here I’d consider advanced (or even intermediate), but I’m not going to re-explain the basics (as it’s been done to death on every other blog already).
But why roll your own?
Cloud storage solutions a la Dropbox, Google One, etc. are widely available, generally reliable, affordable and easy to use. So why would you bother going through the technical effort of hosting your own solution?
This is a question you’ll have to answer carefully for yourself. Even if you’re a technical person with a lot of experience deploying and managing web apps, it still requires a bit of your time (which is valuable) to maintain your own solution. And of course, if things break, you’ve got to fix it yourself. For a couple bucks per month, it generally makes sense to just pay for it to be somebody else’s problem. Especially if you value your time.
But what if you value more than just your time?
For me personally, the motivation for managing my own cloud storage is more philosophical in nature: I want control and longevity.
Let me explain what I mean.
Control
I want complete control over where my data lives and who has access to it. In particular, I’m not comfortable with my data living in a service such as Dropbox or Google Drive, where there is zero transparency about how things are arranged and who has access to my data. I’m forced to hand over all my data and trust that this third party isn’t going to do anything nefarious (or employ a nefarious individual). I don’t want my data to be mined, used for machine learning, or my usage patterns sold to the highest bidder through some cryptic EULA. I don’t care if my data, generated or otherwise, is anonymized.
What are the odds of this happening in practice? Probably fairly small. Probably. But I don’t want any of my data being accessible to anyone for whatever reason. People do bad things all the time, whether intentional or unintentional. I believe that the best custodian of my data is me, and so I want to keep that role for myself alone.
This begs the question: if I use an IaaS provider, such as AWS or Azure, to run my service and store my data, haven’t I simply exchanged one potential evil for another? Well, yes and no. Yes, technically my data is stored by a third party. But the service is much more generic – it’s only infrastructure. It’s not obvious that I’m running a service that stores personal data, and I have full control over how my data is stored: whether it’s encrypted, its geographical location, etc. Sure, someone can still go pull hard drives out of a server in a datacenter somewhere. But that’s an entirely different class of problem.
AWS’ business model doesn’t solely revolve around storing people’s personal and business data as a remote backup option. I’m a lot more comfortable with my data existing in some generic stratified infrastructure storage service than inside an opaque dedicated service that tells me nothing about the way my personal data is handled.
Longevity
I also desperately want to maintain the longevity of the service. We’ve all had it happen to us – a service suddenly shuts down, gets acquired, changes its pricing model, removes a critical feature, or is intentionally crippled or compromised due to external pressure. Each of these scenarios ends either in frantically searching for a viable alternative, or (worse) in having your data effectively held for ransom. I want a reasonable guarantee that my cloud storage will continue running for as long as possible, unfettered by executive boards, business plans, government pressures and entrepreneurial pivots. And my data should be easily accessible should anything go sideways. This, of course, means running open source software (more on this later).
At the end of the day, for me personally, those two reasons – control and longevity – are why I want my own service.
But that doesn’t mean I’m going to be paying out the wazoo, oh no. We’ll be doing this cheap. I’d like to have my metaphorical cake by trading in some of my time, not by spending more money. Let’s get on with the technical bit.
Hosting your own cloud storage the right way
Welcome to the practical part of this post. We’re going to be doing the following:
- Select a cloud storage solution (Nextcloud).
- Install Nextcloud on an Ubuntu 18.04 instance in the cloud.
- Stop your Nextcloud install from imploding when opening a folder with a lot of image files.
- Set up S3 as a cheap external storage medium (and stop you from bankrupting yourself in the process).
1. Selecting a cloud storage solution
My user-requirements are relatively simple. I want:
- Folder-syncing (a la Dropbox)
- S3 or another cloud storage solution as an external storage option
- Easy installation / set-up
- A web interface
- An open source project
I originally became aware of Nextcloud in 2016, in a Reddit thread discussing the much-publicized split/hard-fork from Owncloud, and earmarked it for exactly a project like this. So for me, the choice was almost immediate. It fulfils each of my user-requirements, particularly easy installation (thanks to snap), which is what ultimately drove my adoption.
I briefly investigated other solutions like Syncthing and Seafile, but neither was exactly what I was looking for. I recommend taking a look at both of them if you’re curious about something other than Nextcloud.
We now have our weapon of choice. Let’s get to deployment.
2. Install Nextcloud on an AWS Lightsail instance
First things first, we’ll need to choose a compute environment. The official docs suggest a minimum of 512MB RAM, so you could technically go for the smallest AWS Lightsail instance (1vCPU, 512MB RAM) for $3.50 per month. This is what I tried originally, but it turned out to be a massive headache running Nextcloud on such tight constraints (lots of instability). To save you the suffering, I’d highly recommend using a compute environment with at least 1GB of RAM, which I’ve found to be the practical minimum for a stable deployment. This runs me $5 per month on AWS Lightsail. You also get a lovely 40GB SSD as part of the deal, which is nice (even though we’ll be using S3 as an additional external storage option).
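Once the instance is up, it’s worth sanity-checking that you actually have roughly 1GB of RAM to work with. A quick sketch (Linux-only) that reads MemTotal from /proc/meminfo:

```shell
# Report total RAM in MB by reading /proc/meminfo (Linux-specific)
awk '/^MemTotal/ { printf "MemTotal: %.0f MB\n", $2 / 1024 }' /proc/meminfo
```

On a 1GB instance this typically reports a bit under 1024 MB, since the kernel reserves some memory for itself.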
I love Digital Ocean. I have used them in the past, and will continue to do so in the future. And, despite using AWS Lightsail for this particular deployment (since I want to avoid network charges when syncing to S3), Digital Ocean still has the best Nextcloud installation instructions on the internet.
So, to install Nextcloud on your compute environment, please follow the instructions on their tutorial (and consider supporting them in the future for their investment in documentation). Here’s an archive.today link should it ever disappear from the internet.
Just a note, I don’t have a domain name (did I mention I’m trying to do this on the cheap?), so I settled for setting up SSL with a self-signed certificate.
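For reference, here’s roughly what generating such a certificate looks like with openssl (a sketch: the file names and the CN are placeholders – use your instance’s public IP). If you installed via snap, it also ships a `nextcloud.enable-https` helper that can handle this for you.

```shell
# Generate a self-signed certificate valid for one year
# (key/cert file names and the CN below are placeholders)
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
  -keyout selfsigned.key -out selfsigned.crt \
  -subj "/CN=203.0.113.10"
```

Browsers will warn about the certificate being untrusted, but the connection is still encrypted, which is the main thing for a single-user deployment without a domain.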
3. Stop your Nextcloud from imploding when viewing images
The first thing you’ll notice is that if you navigate, for the first time, to a folder containing lots of images using the web interface, your Nextcloud deployment will become unresponsive and break.
I know, right.
This forces you to ssh back in and restart Nextcloud (sudo snap restart nextcloud is the command you’ll need).
What happens (and this took me a long time to diagnose) is that when viewing a folder containing media files for the first time on the web interface, Nextcloud will attempt to generate “Previews” in various sizes for each of the images (certain sizes are for thumbnails, others for the “Gallery” view, etc.). I don’t know what the hell is going on internally, but this on-the-fly preview generation immediately saturates the CPU and fills up all the RAM within milliseconds (I suspect Nextcloud tries to spin up a separate process for each image in view, or something along those lines). This throttles the instance for a few minutes before the kernel decides to kill some Nextcloud processes in order to reclaim memory.
Here’s how to fix it. There’s an “easy” way and a “better” way.
The easy way is just to disable the preview generation altogether. If you’re not someone who’ll be viewing lots of images or relying on the thumbnails to find photos on the web interface, this is the fastest option.
SSH into your instance and open config.php with your favourite text editor (don’t forget sudo), and append 'enable_previews' => false to the end of the list of arguments at the bottom of the file. If you installed using snap (as per the Digital Ocean tutorial), the config file should be accessible at /var/snap/nextcloud/current/nextcloud/config/config.php. Save and exit (there’s no need to restart the service; config.php is read each time a request is made, I’m told). Problem solved, albeit without thumbnails or previews.
Your config.php should look something like this:
<?php
$CONFIG = array (
  'apps_paths' =>
  array (
    0 =>
    array (
      'path' => '/snap/nextcloud/current/htdocs/apps',
      'url' => '/apps',
      'writable' => false,
    ),
    1 =>
    array (
      'path' => '/var/snap/nextcloud/current/nextcloud/extra-apps',
      'url' => '/extra-apps',
      'writable' => true,
    ),
  ),
  'supportedDatabases' =>
  array (
    0 => 'mysql',
  ),
  'memcache.locking' => '\\OC\\Memcache\\Redis',
  'memcache.local' => '\\OC\\Memcache\\Redis',
  'redis' =>
  array (
    'host' => '/tmp/sockets/redis.sock',
    'port' => 0,
  ),
  'passwordsalt' => 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  'secret' => 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  'trusted_domains' =>
  array (
    0 => 'localhost',
    1 => 'ip.ip.ip.ip',
  ),
  'datadirectory' => '/var/snap/nextcloud/common/nextcloud/data',
  'dbtype' => 'mysql',
  'version' => '17.0.5.0',
  'overwrite.cli.url' => 'http://localhost',
  'dbname' => 'nextcloud',
  'dbhost' => 'localhost:/tmp/sockets/mysql.sock',
  'dbport' => '',
  'dbtableprefix' => 'oc_',
  'mysql.utf8mb4' => true,
  'dbuser' => 'nextcloud',
  'dbpassword' => 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  'installed' => true,
  'instanceid' => 'XXXXXXXXXXXX',
  'loglevel' => 2,
  'maintenance' => false,
  'enable_previews' => false, // <-- add this line
);
The better solution (and the one I chose) requires us to do two things: limit the dimensions of the generated previews, and then generate the image previews periodically, one by one, in the background. This more controlled preview generation doesn’t murder the tiny compute instance by bombarding it with multiple preview-generation requests the second a user opens a folder with images.
Here’s how to set this up (deep breath).
Edit your config.php file again. We’ll be making sure previews are enabled, but limiting their size to a maximum width and height of 1000 pixels, or a maximum of 10 times the image’s original size (whichever occurs first). This saves both on CPU demand and on storage space (since these previews are persisted after they’re generated).
Make sure the following four lines appear at the end of the argument list at the bottom of your config.php:
'enable_previews' => true,
'preview_max_x' => 1000,
'preview_max_y' => 1000,
'preview_max_scale_factor' => 10,
It should now look something like this:
<?php
$CONFIG = array (
  'apps_paths' =>
  array (
    0 =>
    array (
      'path' => '/snap/nextcloud/current/htdocs/apps',
      'url' => '/apps',
      'writable' => false,
    ),
    1 =>
    array (
      'path' => '/var/snap/nextcloud/current/nextcloud/extra-apps',
      'url' => '/extra-apps',
      'writable' => true,
    ),
  ),
  'supportedDatabases' =>
  array (
    0 => 'mysql',
  ),
  'memcache.locking' => '\\OC\\Memcache\\Redis',
  'memcache.local' => '\\OC\\Memcache\\Redis',
  'redis' =>
  array (
    'host' => '/tmp/sockets/redis.sock',
    'port' => 0,
  ),
  'passwordsalt' => 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  'secret' => 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  'trusted_domains' =>
  array (
    0 => 'localhost',
    1 => 'ip.ip.ip.ip',
  ),
  'datadirectory' => '/var/snap/nextcloud/common/nextcloud/data',
  'dbtype' => 'mysql',
  'version' => '17.0.5.0',
  'overwrite.cli.url' => 'http://localhost',
  'dbname' => 'nextcloud',
  'dbhost' => 'localhost:/tmp/sockets/mysql.sock',
  'dbport' => '',
  'dbtableprefix' => 'oc_',
  'mysql.utf8mb4' => true,
  'dbuser' => 'nextcloud',
  'dbpassword' => 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  'installed' => true,
  'instanceid' => 'XXXXXXXXXXXX',
  'loglevel' => 2,
  'maintenance' => false,
  'enable_previews' => true, // <-- change to true
  'preview_max_x' => 1000, // <-- new
  'preview_max_y' => 1000, // <-- new
  'preview_max_scale_factor' => 10, // <-- new
);
Next, log in with your admin account on the Nextcloud web interface and install and enable the “Preview Generator” app from the Nextcloud app store. The project’s GitHub repo is here.
Head back to the terminal on your instance. We’ll need to execute the preview:generate-all command once after doing the installation. This command scans through your entire Nextcloud and generates previews for every media file (which may take a while if you’ve already uploaded a ton of files). I say again: we only need to run this command once. It is executed using the Nextcloud occ command (again, assuming you installed using snap):
sudo nextcloud.occ preview:generate-all
Next, we need to set up a cron job to run the preview:pre-generate command periodically. The preview:pre-generate command generates previews for every new file added to Nextcloud. Let’s walk through this process step-by-step. If you’re unfamiliar with cron, this is a great beginner’s resource.
A few notes before we set up the cron job. The command must be executed as root (since we installed using snap), so we’ll have to make sure we’re using the root user’s crontab. We’ll set it to run every 10 minutes, as recommended.
Add the service to the root crontab using:
sudo crontab -e
In the just-opened text editor, paste the following line:
*/10 * * * * /snap/bin/nextcloud.occ preview:pre-generate -vvv >> /tmp/mylog.log 2>&1
Save and close. Run sudo crontab -l to list all the scheduled jobs, and make sure our command is in the list.
The above job instructs cron to execute the preview:pre-generate command every 10 minutes – note the */10 in the minute field; a bare 10 would run the job only once per hour, at minute 10. The -vvv flag produces verbose output, which we log to a file. If we see output in this log file that looks reasonable, we know our cron job is set up correctly (otherwise we’d just be guessing). Upload a few new media files to test and go make yourself a cup of coffee.
Once you’re back, and have waited at least 10 minutes, inspect the /tmp/mylog.log file for output:
cat /tmp/mylog.log
If you see something along the lines of:
2020-04-13T19:10:04+00:00 Generating previews for <path-to-file>.jpg
2020-04-13T19:10:05+00:00 Generating previews for <path-to-file>.jpg
2020-04-13T19:10:06+00:00 Generating previews for <path-to-file>.jpg
2020-04-13T19:10:07+00:00 Generating previews for <path-to-file>.jpg
2020-04-13T19:10:08+00:00 Generating previews for <path-to-file>.jpg
2020-04-13T19:10:09+00:00 Generating previews for <path-to-file>.jpg
2020-04-13T19:10:10+00:00 Generating previews for <path-to-file>.jpg
then everything is all set. Every 10 minutes, any new file will have its previews pre-generated. These generated previews will now simply be served on the web interface, no longer wrecking our tiny compute instance.
4. Set up S3 as a cheap external storage medium (and stop you from bankrupting yourself in the process)
Our final step is to add an S3 bucket as external storage. It’s simple enough, but there’s one absolutely crucial setting – “check for changes”, a.k.a. “filesystem check frequency” – that you need to turn off to stop it from burning a hole in your wallet. We’ll get there in a moment, but first things first: let’s add S3 as an external storage option.
To set up external storage, we’ll need to enable the “External Storage” app, create an S3 user with an access key and secret on AWS, and then add the bucket to Nextcloud. This is well-documented in the official Nextcloud manual, so I’m not going to rehash covered ground here. Just make sure to place your S3 bucket in the same region as your Lightsail instance to save on data transfer fees.
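When creating that S3 user, it’s good practice to scope its permissions down to the one bucket Nextcloud uses. Here’s a minimal policy sketch – the bucket name my-nextcloud-bucket is a placeholder, and depending on your Nextcloud version you may need to grant additional S3 actions:

```shell
# Write a minimal, bucket-scoped IAM policy for the Nextcloud S3 user
# ("my-nextcloud-bucket" is a placeholder bucket name)
cat > nextcloud-s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::my-nextcloud-bucket",
        "arn:aws:s3:::my-nextcloud-bucket/*"
      ]
    }
  ]
}
EOF
# Sanity-check the file is valid JSON before pasting it into the AWS console
python3 -m json.tool nextcloud-s3-policy.json > /dev/null && echo "valid JSON"
```

This way, even if the access key leaks, the blast radius is limited to that single bucket rather than your whole AWS account.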
What you need to do next is set the “Filesystem Check Frequency” or “Check for changes” to “Never”.
It’ll be set to “Once per direct access” by default, which will cost you a tremendous amount of money. To understand why, take a look at the AWS S3 pricing page. Pay particular attention to the cost of “PUT, COPY, POST and LIST” requests in comparison to “GET, SELECT and all other requests”. You’ll notice that the former is over ten times more expensive than the latter. By leaving the “Filesystem Check Frequency” on “Once per direct access”, Nextcloud will constantly perform LIST requests on your bucket and stored objects, checking whether the objects stored on S3 have changed (perhaps due to being uploaded or modified by another service connected to your bucket). This constant barrage of LIST requests tallies up costs fast. In my case, it took Nextcloud less than a week to make over 1.4 million LIST requests. Ouch. So unless you really need Nextcloud to constantly scan S3 for changes (which is unlikely if your S3 bucket is only connected to Nextcloud), turn the option off.
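To put a rough number on it: at the 2020-era rate of about $0.005 per 1,000 LIST requests (check the pricing page for your region’s current rates), those 1.4 million requests work out to around $7 in a single week, on top of storage, just for filesystem checks:

```shell
# Back-of-envelope cost of 1.4M LIST requests at ~$0.005 per 1,000 requests
# (assumed 2020-era pricing; verify against the current S3 pricing page)
awk 'BEGIN { printf "LIST cost: $%.2f\n", 1400000 / 1000 * 0.005 }'
# prints: LIST cost: $7.00
```

That’s more than the monthly cost of the Lightsail instance itself, spent on requests that do nothing useful in a single-client setup.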
Fortunately, I made this mistake on your behalf. Since flipping the switch, Nextcloud has made only a handful of queries to S3 (fewer than 100 in a week). Great!
Conclusion
Whew! That was a bit more nitty-gritty than our usual content.
Together, we walked through why you might consider deploying your own cloud storage solution. For me personally, this amounted to control and longevity. From there, we explored the installation process of Nextcloud on a tiny AWS Lightsail instance and how to prevent the thing from falling over by pre-generating our image previews and reducing their size. Lastly, we went over attaching an S3 bucket as an external storage option to your Nextcloud instance, and how to disable one sneaky setting to prevent yourself from blowing a hole in your pocket.
All in all, I hope it’s been useful. It definitely was for me.
Until next time, Michael.