Translated from Chinese version 7ce78f8.
When people notice that I am running something resembling Visual Studio Code on an iPad, their reactions mostly fall into two categories: some are surprised and keep asking me how it is done, while others just leave me a sharp comment:
What’s the point of doing this? 🤣
The story starts here. I am a math student. When I first got into college, I bought an iPad with a fancy stylus and started using it for both note-taking and scratch paper. The life of a math student is quite different from that of someone studying natural science or engineering: our working setup is centered on something you can write on, and you can barely do “math stuff” without scratch paper to serve as your external working memory. This is vastly different from the other extreme, computer science students, who can probably conduct serious work whenever a keyboard is available.
Some math students and professionals are proud of being able to do everything they need without the help of a computer, as if they were intellectual craftspeople rejecting any intellectual power tools out of pride. (I personally think this kind of culture is poisonous.) But sometimes we still need to work on computers, especially when it comes to typesetting your work in $\LaTeX$. My working setup therefore became somewhat funny: I had to find a big enough table and place my laptop and my iPad side by side, vertically.
Since the iPad is advertised heavily as a versatile device, I started to look into how I could do the typesetting (or even coding) on my iPad. Over two years I tried all kinds of weird solutions (thus becoming a weird person in the eyes of other people).
Early Attempts
Berkeley Open Computing Facility and Emacs
The Open Computing Facility is a student organization at UC Berkeley that provides computing services (open source mirrors, printing, web hosting, databases, and HPC) to students and student organizations. They also run a computer lab for Berkeley students, as well as a place to share knowledge about Linux and open source software.
I was lucky enough to obtain a shell account from them in 2022 for access to their public server. It was my first time using a real multi-user *nix host (it felt so cool to run who and find out there were 60 other users online). At that time I usually studied in my apartment instead of going to the library because of the pandemic (we were required to wear masks in the library, but I just can't focus with a mask on), and my desk was too small to fit my full setup, so I started trying to do my homework on the iPad alone.
This should be the simplest way to code on an iPad: find a remote host with your desired toolchain, connect to its shell using an ssh client, and launch your editor of choice in the command line. We cannot install additional software ourselves on the OCF server, but the sysadmins have prepared almost everything you need, which has a somewhat retro feeling.
This software is not installed on this host. Please contact your system administrator.
At that time I was quite into Emacs, so I started working by installing some $\LaTeX$ plugins, and the experience was surprisingly good (except for my pinky finger). Emacs even has a half-finished server/client mode, but the documented usage is limited to reusing a local instance for faster startup. If this architecture could be split across different hosts, it would become a kind of antique-tech VS Code remote development.
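For the curious, this is roughly what that split looks like; a minimal sketch, assuming a host where Emacs is already installed (the ssh alias ocf is hypothetical):

# Start a long-running Emacs server; buffers and state persist between sessions
emacs --daemon
# Attach a terminal frame to it; launching is near-instant
emacsclient -t notes.tex
# A poor man's “remote development”: attach to a daemon running on the remote host
ssh -t ocf emacsclient -t hw.tex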
Runestone and A-Shell
It would always be better to be able to work offline. This complaint also applies to a lot of online services nowadays, where apps with a strongly offline nature are forced to be used with an Internet connection, and some computation is done on the server side even though it could be done totally fine on the client side (in the browser). It feels bad to pay a lot of money for an advanced desktop setup but end up waiting for an overwhelmed server to complete the computation. Well, if this makes me uncomfortable, there is a lot more to come: because I have to pay a tax to lord Apple (irony) just to install my own app on my iPad, the Web has become my main approach for “smuggling” home-made apps onto the iPad.
The bigger problem comes from the mobile nature of the iPad. Protocols like ssh were designed long before the era of mobile computing, so they always assume a stable network connection with semi-static IP addresses. They generate a lot of complaints if you move around with your iPad:
ssh: packet_write_wait: Connection to xxx.xxx.xxx.xxx port 22: Broken pipe
Although an alternative program called Mosh optimizes for mobile ssh sessions, not many client apps on iPad support it.
Since people are always complaining about the iPad being overpowered, it would be neat if we could run our workload locally. Here I have to mention a weird yet mind-blowing work: A-Shell. I was shocked when I first saw that it could run the clang compiler in a native shell on iPad. What, has Apple given up their App Store terms about dynamic code execution? After a closer look, I found out that the compilation target was WebAssembly. Well… now we have the new order of the age of cyberspace. A-Shell packs some common Unix tools and some language toolchains and lets them run in a real, native iPadOS shell.
Fun fact: iOS/iPadOS originate from macOS, and macOS is a genuine Unix® at its core. So, technically, Apple customers are able to claim to be the true successors of the Unix user group. 🤣
But because of Apple's conservative terms on apps with dynamic behavior, producing WebAssembly executables is the best they can do. A-Shell is pre-configured to execute them in the shell, just like normal executables. Interpreted languages like NodeJS, Python, and Lua are not affected by the terms, so it works pretty well if you are just writing scripts without linking external binaries. (Well, you can also pack more programs into A-Shell itself and compile it yourself, but then you need to start fighting with App Store sideloading.)
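If I remember A-Shell's commands correctly, the compile-and-run loop looks roughly like this (hello.c is a placeholder; treat this as a sketch rather than gospel):

# clang here emits WebAssembly rather than native ARM code
clang hello.c -o hello.wasm
# A-Shell ships a wasm runner so the result can be executed like a normal program
wasm hello.wasm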
The most bizarre thing is that A-Shell packs a (somewhat incomplete) TeXLive suite, which can be downloaded on demand. TeXLive has an absurdly large installer package, as large as several gigabytes, and to make it worse, its installer is a bunch of tcl/tk scripts. So it usually takes hours to download and install on a normal PC setup, giving off a sense of scholarly grandeur. It should be engraved on a CD for proper appreciation.
Unfortunately, there is no XeTeX available, because the author had some trouble figuring out how to pack it into the app, so there will be a lot of trouble when it comes to multilingual support. (But it shouldn't be a problem for you Americans, right?) LuaTeX should be a good enough alternative if you are dealing solely with Latin characters.
To make the setup more usable, we can find an editor with a graphical interface. A-Shell comes with Vim installed (but no Emacs; here starts the holy war, in the name of the Church of Emacs!), but it does not provide a proper way to re-map the Esc key, rendering it unusable.
I was introduced to an editor called Runestone by a MacStories article. This new-ish app was developed strictly adhering to the desktop-class iPad app standard introduced at WWDC 2022, so it can work seamlessly with the toolchains in A-Shell, just like in a real *nix environment.
For a long time, iOS has stored app data as if in separate buckets, where access to other apps' files is restricted. This innovative way of managing data is fundamental to the mobile computing era, but the price is that sharing data across apps becomes pretty hard, which has always been a center of criticism for iOS and iPadOS. Apple has thought of many ways to solve this problem without breaking the framework, like the share-menu approach, known for being counter-intuitive, and the later security-scoped URL approach. For the latter, the app asks iPadOS to present the user with a file picker to select a file or a directory, resulting in a restricted URL through which the app can access outside resources. In this way, the working directory of Runestone, or even a folder provided by some cloud storage app, can be linked into A-Shell for ease of use.
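As I recall, A-Shell exposes this flow as a pickFolder command (take the exact name as an assumption on my part):

# Ask iPadOS for its folder picker; the chosen directory becomes accessible to A-Shell
pickFolder
# After picking, the current directory is the chosen folder, e.g. Runestone's workspace
pwd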
Blink Code
Blink provides a restricted native iPadOS shell the same way A-Shell does, with a full-fledged ssh client, a mosh client, and some usual Unix tools. (How geeky is that?) This app discovered these geeky users' weird need to do hacking on an iPad and made a lot of quality-of-life improvements, among which there is a breakthrough towards the dream of doing proper work on an iPad: it comes with a frontend for Code. This frontend supports many backends and can also be used as a simple editor for local files. Since Blink has fairly good support for security-scoped URLs, you can use the simple editor together with A-Shell and even get Git support using Working Copy. One of the quality-of-life improvements is that Blink allows re-mapping the Esc key, so it is very comfortable to code with a Vim plugin.
(However, the OneDrive and Google Drive apps do not have proper support for security-scoped URLs, so they cannot be linked directly into Blink. It is their fault.)
Blink + Cloud Development Environment
As described before, Blink Code can connect to a Code backend on a remote server, allowing us to make use of the full-fledged dev environment on it.
You may or may not know that Visual Studio Code is built with Web technology, so converting it back into a Web App to gain some flexibility is a natural idea. In 2019, some ambitious people implemented this and made it open source; it was later named code-server. This is a plug-and-play single-user Code instance, also available as a Docker image for quick deployment. Soon after, Gitpod extended this idea with heavy use of containers and turned it into a Web service: you can spin up a Docker instance with their own implementation of the Code server, somewhat pre-configured according to your needs, and then pull in your Git repository. The container is stopped and destroyed after it has been inactive for some time, since the service is billed by the amount of time the container is running. In this way, we can build an automatic and reproducible development environment whenever needed. After seeing this concept validated by Gitpod's success, Microsoft also presented their demo, which later became GitHub Codespaces. In early 2023, Blink also shipped their similar but more lightweight solution, still in beta. Apart from these general-purpose products, code-server and its variants also appear in the Web interfaces of most major cloud computing providers, as a way to provide a quick code editing environment in specific contexts. This category of products/services is called the Cloud Development Environment (CDE).
When I first saw Microsoft's demo, I was amazed, but I didn't dive into this solution because using it in a browser on the iPad meant the browser chrome would take away a lot of precious screen real estate. (Well, at that time I probably also thought typesetting on an iPad was kind of pointless; I would just use Overleaf and spend more time on actual work.) Blink Code solves this problem by providing an embedded WebView for Code, which gives a great full-screen experience. It also includes some quality-of-life features, like allowing you to re-map the Esc key. This finally makes Code on iPad somewhat usable. Blink Code supports Gitpod and GitHub Codespaces. I have some experience with the former because it has a free plan (the first 50 hours of usage per month are free). To use it, you only need to register, create a workspace by cloning your Git repository, and use it in the browser after the container starts.
To use it in Blink, copy the URI (which uniquely identifies your workspace) and run the following command in Blink (replace ❄️ with whatever you've got):
code https://❄️❄️❄️❄️.gitpod.io
And here we go.
This setup does have some inconveniences. Gitpod spins up a Docker container every time you start your workspace, and destroys the container after 30 minutes of inactivity, so you have to wait ~30s for Docker to re-start the container. Only the files under /workspace survive the destruction. Gitpod provides two ways to customize your dev environment: either use the default all-in-one image plus a config file that executes additional setup commands, or use a customized image from Docker Hub, or even bring your own by providing a Dockerfile. Creating and destroying the environment at the container level according to usage can be pretty helpful for avoiding all kinds of “why doesn't it work on my machine” problems. For a $\LaTeX$ environment, Gitpod suggests writing your own Dockerfile to add TeXLive on top of the all-in-one image; the result gets cached for future use, to avoid re-downloading the HUGE TeXLive distribution every time the container starts.
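A minimal sketch of what that looks like, following Gitpod's documented conventions (the texlive-full package choice is my assumption; a smaller TeX scheme would also work):

# In the repository root, tell Gitpod to build the workspace from a custom Dockerfile
cat > .gitpod.yml <<'EOF'
image:
  file: .gitpod.Dockerfile
EOF

# Layer TeXLive on top of Gitpod's all-in-one image; Gitpod caches the built image
cat > .gitpod.Dockerfile <<'EOF'
FROM gitpod/workspace-full
RUN sudo apt-get update && sudo apt-get install -y --no-install-recommends texlive-full
EOF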
However, in my opinion, even for serious development projects, using Docker containers to ensure the reproducibility of the dev environment is still overkill. Docker's approach achieves reproducibility and isolation of the environment at the price of poor composability and lost performance, but for a development environment we do not really need the strong isolation provided by virtualization. OK, you are asking about my solution? Try NixOS.
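To make the comparison concrete, here is a minimal Nix sketch (the package choice is an assumption; any texlive scheme works): the environment is declared once, built without any container, and composes with whatever else is on the machine.

# shell.nix: enter the environment with `nix-shell`; no container, no VM
{ pkgs ? import <nixpkgs> {} }:
pkgs.mkShell {
  packages = [
    (pkgs.texlive.combine { inherit (pkgs.texlive) scheme-small latexmk; })
  ];
}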
Self-hosted code-server
Because the frequent construction and destruction of containers, which consumes a lot of energy, generates a lot of pollution, and therefore contributes to global warming, makes me feel bad, I started to host my own code-server. (Truth: I was just too nervous about that 50-hour free limit.)
I first got into renting servers around 2013 or 2014, when I was trying to host a modded Minecraft server with my friends. At that time renting a personal VPS was a remote dream, and we had to rely on donations from players or find sponsorship to keep the server running. Ten years later, VPS rental has become a common business, and computer hardware has developed to the point of being able to fit the whole Twitter infrastructure onto one machine. The increasing density of computing and storage hardware, like this absurd 20-terabyte disk or this CPU with 192 cores in its package, contributes to a further drop in rental costs. Some bigger cloud providers even offer free plans.
Among these cyberspace charities (not serious), Oracle is one of the most generous. (Maybe they feel guilty about earning too much money by just sleeping on their big pile of patents, so they decided to give back to the tech community, who knows.) Oracle's free plan is probably the only one among major cloud service providers that offers “always free” resources (others usually give you a trial period and some credits), and you can run up to 6 VPSs without spending a penny. Two of them are micro x86 instances, while the bizarre thing happens with their free ARM machines (yeah, these instances run on that insane 192-core platform): 4 ARM cores and 24 GB of RAM, free of charge.
I could not quite believe these numbers until I saw someone hosting a GTNH game server on one… (Modded Minecraft servers are known for being performance-hungry, especially the larger modpacks.) Alongside the free VPSs, Oracle also provides some miscellaneous cloud services for free, like a provisioned database, PKI, load balancing, monitoring and auditing, and some object storage space. To get all of this, all you need to do is register with a proper credit card.
According to the tutorial provided by the code-server team, you can quickly spin up a code-server instance on your server using their installation script, pre-built packages, or Docker image. You can synchronize your work using Git, or mount your favorite cloud drive if you don't want to abuse Git.
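For reference, the script route boils down to something like this on a Debian-flavored VPS (the systemd line is what the installer itself suggests):

# Official install script; detects the distro and picks the right package
curl -fsSL https://code-server.dev/install.sh | sh
# Run it as a user service; by default it listens on 127.0.0.1:8080
sudo systemctl enable --now code-server@$USER
# The generated password lives in ~/.config/code-server/config.yaml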
Next comes the most important part: we need to enable access to this service from the iPad, properly.
At first I just wanted to see it up and running immediately, so I exposed the newly installed code-server to the public internet with the default configuration… By default, code-server only has password authentication and no HTTPS, making this authentication vulnerable to man-in-the-middle attacks.
Never, ever expose your service to the public internet without an appropriate security setup, even for a short amount of time! The modern IPv4 internet is a scary place where there are monsters nearby all the time.
At that time, after getting it online, I went to sleep, satisfied. I hadn't realized how serious the problem was until I mentioned it to one of my friends: code-server provides shell access to Web users, and the user running it was a sudoer. If I had left this setup running any longer, my VPS might have been turned into Mordor. Who knows? I had to terminate the instance and create a new one.
To expose this service safely to the Internet, we need the following hardening measures:
- An authentication method, to make sure it is only available to designated people;
- an encryption method, to make sure nobody can steal a valid identity and bypass the authentication;
- if this Web App has to be exposed to the public Internet, an anti-penetration mechanism. code-server itself may not be safe as a Web server, so we need to put a reverse proxy in front of it to “sanitize” the incoming HTTP traffic (see the sketch after this list). Using a reverse proxy also brings many conveniences, but we are not going to dive into those.
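A minimal reverse-proxy sketch with Caddy, which also gets us HTTPS for free (the hostname is a placeholder, and Caddy is my choice here, not the only option):

# Caddyfile: terminate TLS at the proxy and forward sanitized traffic to code-server
code.example.com {
    reverse_proxy 127.0.0.1:8080
}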
code-server itself is a fairly simple Web application that only supports simple passphrase authentication (actually this is becoming the trend in real-world Web Apps; why bother implementing additional authentication if we can offload the job to dedicated auth services?). Although we could simply expose it to the public Internet after setting up HTTPS and a reverse proxy, the passphrase may fall to a brute-force attack if someone has the time (and interest). code-server's documentation provides several fancier ways to add authentication, including:
- Use a reverse proxy called OAuth2 Proxy to enforce authentication by checking visitors' OAuth tokens.
- Use Cloudflare Access. This is a free service provided by yet another Internet charity (still not serious). It works like a bigger version of a reverse proxy, running in a distributed fashion on edge networks around the world. Since the origin server and Cloudflare also communicate through the public Internet, Cloudflare provides a way to set up a VPN tunnel between them, so nobody can bypass the proxy/auth mechanism.
- Use Pomerium, an open source alternative to Cloudflare Access that you can host on your own server. However, we probably cannot build our own edge network, so it can only replace Access, not the other benefits provided by Cloudflare.
SSH Tunnel
Besides the three fancier ways mentioned above, there is an it-just-works method that satisfies all three safety requirements. Blink provides us with a full-fledged ssh client, and we can use its tunneling feature to forward ports from the server to the iPad, in order to reach a code-server hiding behind the server's firewall:
- Authentication is done by ssh's public/private key authentication;
- traffic encryption is provided by the ssh protocol's own transport encryption;
- from the outside, the only open port on the server is port 22, with a battle-proven OpenSSH server behind it, which exposes no known vulnerabilities.
Throughout the whole process, code-server stays hidden behind the firewall, unreachable from the outside.
In practice, suppose code-server is listening on http://localhost:8080 and we want it to be available on the iPad at http://localhost:2333. We can create a Blink terminal tab and run
ssh -L localhost:2333:localhost:8080 <domain or IP of your server>
# The gibberish after -L maps the server's localhost:8080 to the iPad's localhost:2333
# We will see later why we should specify the hostname even when just forwarding ports
# Talking about bad notation...
# Couldn't they use something other than a colon to separate the remote and local parts?
# It gets mixed up with the port number notation.
By the way, Blink does things in a pretty unique way compared to other ssh clients. You don't need to find a setting hidden somewhere to enable port forwarding: just pull up a new terminal and run the ssh port-forwarding command. Pretty straightforward.
This ssh tunneling worked pretty well, and thanks to it, my Code-on-iPad setup became usable for the first time. But the disadvantage of this approach is something we have mentioned before: ssh is not comfortable in a mobile setting; either the client gets killed by iPadOS when idling, or the connection gets broken by a change of network environment. For the first case, Blink provides a… hacky solution: you can let Blink request geolocation info from the system to keep the app running in the background. Moreover, you can set it to exit automatically when you leave a certain geographical area. (Well… some giant Chinese tech companies also screw their users by keeping their apps alive in the background with this very method.) But there is nothing we can do about the disconnection problem when you carry your iPad onto another network.
I ran into a funny thing when I first put ssh tunneling to use. Recall how the port forwarding specification works:
[local endpoint hostname:]<port>:<remote endpoint hostname>:<port>
Under local port forwarding (-L), the remote endpoint hostname and port indicate which host's which port gets forwarded to the local machine. The local endpoint hostname is a little more subtle: it acts like a filter, indicating packets sent to which host and port should be forwarded through the tunnel. By setting it to localhost, we restrict forwarding to packets sent by programs on the iPad (because only programs on the iPad can reach the iPad's localhost), which is a kind of safety measure. Omitting the local endpoint hostname turned out to be equivalent to setting it to 0.0.0.0, i.e. “gotta forward 'em all” (stock OpenSSH actually defaults to loopback here; Blink's client evidently bound all interfaces). The funny thing was, at first I forgot to set it, so everybody on the same LAN was able to access my code-server, without any authentication…
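Side by side, the safe and unsafe variants look like this (my-server is a placeholder; the second line reproduces my mistake):

# Safe: bind to loopback, so only apps on the iPad itself can reach port 2333
ssh -L localhost:2333:localhost:8080 my-server
# My mistake: no bind address, so the forwarded port was reachable from the whole LAN
ssh -L 2333:localhost:8080 my-server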
Tailscale
Now that we are already abusing ssh as a tunnel, why not just switch to some serious tunnel?
I believe almost everybody has had the experience of being tortured by the VPN software provided by their school or company. What these tools do is similar to what we accomplished with ssh port forwarding: connect a computer to a trusted network without physically plugging into it, via an encrypted tunnel. Since authentication and encryption are already taken care of once the tunnel is established, we can access the resources in the trusted network without additional authentication. This was not so obvious in the previous example, but the server and the iPad actually form a trusted virtual network together, and the traffic inside it is un-authenticated and un-encrypted.
However, the ecosystem of open source VPN solutions was kind of crappy until a few years ago. Some were not designed with mobile networking in mind, resulting in timeouts whenever the network was not very stable; some have a somewhat 90s-style retro-futurist taste, using Ars Magica like IPsec, resulting in all kinds of trouble during deployment and usage, sometimes even being knocked out of order by hostile networking infrastructure. Besides these, there are plenty of commercial solutions, which act almost like malware.
Fortunately, someone eventually made a modernized VPN implementation, WireGuard (2015), based on the great progress in networking and cryptography in recent years. Its Linux implementation even ships as a kernel module, making its performance comparable to networking without a tunnel. Moreover, WireGuard is designed with mobile devices in mind and supports roaming, which is exactly what we need. On top of it, a company called Tailscale provides a control plane for WireGuard connections, implementing the so-called Zero Trust security model. This abstraction layer over WireGuard is what we are going to use in this section.
Zero Trust?
We just mentioned that an ordinary VPN is designed to be a virtual extension of a trusted physical network: users on an untrusted public network connect through it to a gateway server located in the trusted network, and access any protected resources using the gateway as a proxy. Since, by design, every host in the trusted network is either physically or virtually present and trusted, these resources do not need to deploy any security measures of their own. But later on people started to realize the problems with this approach. Firstly, since the traffic of all external hosts goes through the gateway, the gateway sometimes gets overwhelmed even while bandwidth on the internal network's Internet link sits unused. Secondly, if one of the hosts on the internal network is penetrated, which is especially likely for those connecting through a tunnel, the attacker can move laterally anywhere in the internal network. This problem has become more serious as computing systems grow more and more distributed: if the servers of one system are spread across data centers around the world, maintaining this “physical boundary plus VPN access” model becomes impractical. One solution is to forget the distinction between internal and public networks, treat every network as untrusted, and authenticate every connection. This is the Zero Trust model I am talking about.
There is a detailed introduction to how Tailscale works; the overall idea is to establish tunnels only between hosts that are allowed to access each other. If a host does not have access to another host, the latter is simply unreachable on the virtual network.
Putting it into practice, we can connect our iPad to our server just by installing Tailscale on both ends and logging into the same account; it is basically an out-of-the-box experience. When added to the Tailscale network, every host is assigned a static private IP address that only makes sense inside the virtual network. (They actually use the carrier-grade NAT address range, which is beyond the scope of this post.) On a connected host, you can simply use this address to reach the corresponding host, even though neither is visible to the public network (behind a NAT or a firewall, or both). It feels like putting all your hosts into one virtual LAN.
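On the server side the whole ritual is about three commands (the install one-liner is from Tailscale's own docs; the iPad side is just the App Store app logged into the same account):

# Install and bring up Tailscale on the VPS
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
# Print this machine's Tailscale address (in the 100.x.y.z CGNAT range);
# point Blink Code on the iPad at http://<that address>:8080
tailscale ip -4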
Although Tailscale operates in a fairly open style, with all basic functionality free and the client software released as free and open source software, some people feel uneasy about its SaaS nature (well, they are doing well right now, but what if one day they run out of the money they got from their investors?). To this end, someone re-implemented Tailscale's control plane server as open source software for anyone in need, called Headscale (well… you guys do have a very distinctive taste in naming things). Tailscale is pretty supportive of this project, even providing integration with third-party control plane servers in their clients.
Actually, quite a long time before Tailscale came into being (2019), there was an older solution called ZeroTier (2011). It also allows users to set up their own control plane service. Tailscale and ZeroTier have slight differences in their technical approaches (which also sometimes turn into religious wars on the Internet). In general, Tailscale focuses more on the out-of-the-box experience and does better in hostile network environments, while ZeroTier allows a higher level of customization without needing to purchase a pro license. Recently there is a newer alternative to ZeroTier's approach called NetMaker (2021), which switches to the beloved WireGuard protocol, with kernel module support for improved performance. (Unfortunately, ZeroTier never had the chance to adopt WireGuard and had to implement their own protocol, because WireGuard was non-existent back then.)
By the way, Tailscale has a blog post explaining the earliest motivation for the product. The computer network has become a completely different place from the perfect playground for hacking and experimenting with crazy ideas: it is now full of complicated and sensitive technology and infrastructure, as well as tons of weird, arcane technical legacies (well, look at the stuff I'm writing about! I think I am trying my best to discuss these things from a modern viewpoint, but the historical burden is unavoidable). You cannot simply experiment with sending information to another computer over the Internet without a crazy amount of knowledge about the underlying technology stack, and the Internet has long drifted from its original vision of supporting arbitrary point-to-point communication between any hosts with IP addresses, because of all kinds of weird boxes like firewalls and NAT devices, and those sneaky ISPs. To make things worse, as we mentioned before, there are always monsters nearby. I think for most of us, besides being forced to do so at gunpoint during an undergraduate computer science course, nobody still has the courage to actually play with networking themselves. So when I saw Tailscale's vision of bringing back “experimenting within a LAN” by means of a secure virtual LAN over the Internet, I was very excited.
Cloudflare Access and Public Internet Access
I used the Tailscale solution described in the previous section to complete most of my writing in the first half of 2023. Tailscale works great for this kind of use case. During the spring recess, when I was traveling at 60 mph on a coach bus with an intermittent mobile network, it still worked smoothly.
Even during the time I went back home in May, I still got a fair experience across the Pacific. (Well, except that the Vim plugin was not working. What a weird architecture. Why does it send all my keystrokes to the server side to process the key-combination logic? You are not writing PHP!)
I thought this was the end of the story. Anyway, for a totally private service like code-server, making it accessible on the public Internet is completely unnecessary. It is totally fine to keep it in a tunnel if I am only using the Internet as a communication and distribution channel for a Web App, and I am using no ordinary tunnel: it is a post-modern, advanced, secure tunnel!
What changed the situation is that during Fall 2023, my advisor suggested I build an academic homepage. I figured I couldn't just host the page on those crappy VPSs I currently have (which sometimes even risk being terminated). That is too far from highly available; although I think not many people are interested in a nobody's homepage like mine, if a professor did take an interest, it would be too bad if the page happened to be down. So I searched for the free static web hosting services available nowadays. The results include GitHub Pages (kinda familiar), Netlify, and Cloudflare Pages. To ensure my homepage is accessible throughout the globe, I chose Cloudflare Pages. My static site generator of choice is Hugo, which is said to be the State-of-the-Art™ solution nowadays (maybe…). I had the website up in just 10 minutes (even during a lecture…), but then I got hooked on playing with Cloudflare services.
Why? Because I was amazed by what is made possible by Serverless Done Right™. Cloudflare has an interesting style of providing services: they have a lot of data centers and networking infrastructure, but unlike traditional cloud service providers, they never sell VPSs. I even suspect this is some kind of technical taste.
How does this relate to Code? We mentioned before that if we want to expose code-server to the public Internet, we need to satisfy three security requirements. If we put a Cloudflare CDN between our code-server and the public Internet, then items 2 and 3 are satisfied; if, in addition, we use the authentication service integrated into the CDN, called Cloudflare Access, then item 1 is also satisfied.
To use Cloudflare Access, we first need to transfer the management of our domain to Cloudflare's DNS servers. When Cloudflare Access is enabled on a domain, Cloudflare uses its DNS servers to temporarily redirect the session to an authentication page for the user to log in.
After logging in, the user is redirected back to Cloudflare's CDN server and can then access the Web service that was originally locked down. As mentioned before, since the origin server and Cloudflare's infrastructure communicate through the public Internet, they need a secure tunnel between them to prevent anyone from bypassing the authentication, with the origin host hidden behind a firewall.
The tunnel is theoretically very easy to deploy, but in practice, because Cloudflare just reorganized their tunnel service into their Zero Trust platform (well, Zero Trust again… together with Federation and Serverless, my blog is becoming an exhibition of post-modern networking technologies…), the documentation is still being updated (as of 2023), so it is kind of confusing. Before the reorganization, when Cloudflare Tunnel was called Argo Tunnel, you needed to fill out a config file (tutorial in Chinese) to properly spin up a tunnel endpoint. Now a new way of configuration is available: just install a client secret on the origin host's endpoint, then do all the configuration in the Web dashboard. The original config-file approach is still available.
OK, back to the actual setup. First we need to lock down a subdomain in Cloudflare Access's dashboard. This should be done first! Don't forget this order! Otherwise, your Web App will be accessible to the public without any authentication, efficiently, via Cloudflare's global network. Then we need to add some access rules.
It may require some explanation of how Cloudflare Access's access rules work. After the user logs into an identity provider (IdP) (maybe an external one like “log in with Google”, an IdP you built yourself, or the simple email OTP provided by Cloudflare as a starter kit), the IdP provides a table of key-value pairs, signed to ensure authenticity. This table can be understood as a credential saying: “I, the IdP you trust, certify that the user of concern has user id xxxx in my database, with an email of [email protected]”, plus much other information, depending on how the IdP is configured. If the IdP communicates with Cloudflare Access using the OpenID Connect protocol, this table of key-value pairs is called OIDC claims. Cloudflare Access differentiates between users by their email addresses, and it has a free limit of 50 active users per month. We could bypass this limit by hosting our own IdP and reporting fake email addresses to let in arbitrarily many active users, but that is well beyond the scope of this post.
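A hypothetical, abridged set of OIDC claims might look like this; the field names are standard OIDC, the values are made up:

{
  "iss": "https://accounts.google.com",
  "sub": "110169484474386276334",
  "email": "[email protected]",
  "email_verified": true
}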
The next step is to set up the tunnel between Cloudflare and the origin server. Cloudflare now recommends the new way of setting up tunnels, i.e. via the web dashboard, which will tell you what to do to install the tunnel service on your host. The following is an example.
# Run on the original server.
# Download latest cloudflared
curl -L --output cloudflared.deb https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
# Install cloudflared
sudo dpkg -i cloudflared.deb
# Install client secret to tunnel client
sudo cloudflared service install ❄️❄️❄️❄️❄️❄️❄️❄️
After creating the tunnel, we need to specify what to forward and where to forward it in the web dashboard, just like in the case of ssh tunnels. We need to add an item under the “Public Hostname” tab and set the hostname to match the subdomain we just locked down with Access. The dashboard comes with detailed documentation for reference.
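For completeness, the older config-file approach expresses the same mapping like this (the UUIDs and hostname are placeholders):

# /etc/cloudflared/config.yml
tunnel: <tunnel-UUID>
credentials-file: /etc/cloudflared/<tunnel-UUID>.json
ingress:
  # Requests for the locked-down subdomain go to the local code-server
  - hostname: code.example.com
    service: http://localhost:8080
  # Everything else is rejected
  - service: http_status:404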
This is the setup I am using now (as of the end of 2023). Compared to the Tailscale setup, it seems fancier, and the Cloudflare CDN may be able to cache some static resources for a speed improvement (not quite sure about this). Moreover, it offers more flexibility, since I can elegantly log in from any public computer and use my Code.
Just like the first time I exposed code-server, I made a serious security mistake while configuring Cloudflare. After (finally) resolving the mystery of the old and new documentation, I started renaming the endpoints of the tunnel (because while trying things out I had plastered the names with words like “test” and “temp”, which looked annoying…), but forgot to change the domain name in Access. And then I just left my service, fully open, on the public Internet for several hours… Thanks to the decision to run code-server in Docker, this time the VPS did not have to be terminated and re-installed. (Well, thank you, Docker!) This taught me an important lesson: for this kind of distributed cloud service, where components communicate through the public Internet, it is quite easy to leak access to highly sensitive resources through some minor misconfiguration.
The original post was finished in a rush before the end of January 1st, 2024, and the translation was done on January 15th, 2024 at the request of a friend of mine.
I think I still need more practice on writing longer posts. It took me almost half a year to finally finish this, and the translation took me two weeks.
🔏 Jack Wang @ Madison, WI, CST 2024/01/01