If you’re doing anything even remotely related to web development (or if you’re just running your own website or blog), you’re probably familiar with the File Transfer Protocol. Regardless of whether this familiarity is voluntary or forced upon you by customers, legacy computer systems or incompetent hosting providers, I sincerely hope that you are aware of the costs of using FTP in this day and age. FTP is by far the most backwards and insecure protocol for transferring files, yet it still enjoys wide popularity due to its simplicity and availability. Speaking bluntly, FTP sucks and I wish for it to go away for the sake of everybody, and here’s why.
So, since you’ve already taken note of my strong dislike for the FTP protocol (RAS Syndrome, anyone?), I might as well give you some details. I hate FTP not only for its (basically) non-existent security concept but also for the generally poor design of the protocol for the tasks at hand. Let me explain a few things first before we dive into the issues.
We’ll have to look back a couple of decades to a time when most of us probably had different things in mind than programming or weren’t even born yet. It’s the 1970s, when the Internet was a more peaceful and quiet place, accessed mostly by universities, certain companies and governments. A place where people trusted each other, where encryption was an unused luxury. Unfortunately, those days are long gone and the Internet has become a hostile environment where you can’t trust anyone, which is exactly why there is no place for FTP anymore.
FTP is a text-based protocol that operates on two channels, namely the command channel and the data channel. For the data channel, we have to differentiate between two data transfer modes, ACTIVE and PASSIVE.
PASSIVE mode is where the client asks the server for an IP address and port number and then opens a separate data channel to the specified destination. Fair enough.
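To make the PASSIVE handshake a bit more tangible: the server answers the PASV command with a 227 reply that packs the IP address and port into six comma-separated numbers. Here is a minimal sketch of how a client could decode it. The function name and the sample address are made up for illustration; only the reply layout comes from the FTP specification.

```python
import re


def parse_pasv_reply(reply: str) -> tuple:
    """Decode a reply like '227 Entering Passive Mode (192,168,1,2,19,137)'.

    Returns (ip, port). The first four numbers are the IPv4 octets,
    the last two are the high and low byte of the port.
    """
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a usable PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    ip = f"{h1}.{h2}.{h3}.{h4}"
    port = p1 * 256 + p2  # high byte first, then low byte
    return ip, port


# Sample reply with an invented address; 19 * 256 + 137 = 5001
ip, port = parse_pasv_reply("227 Entering Passive Mode (192,168,1,2,19,137)")
print(ip, port)  # prints: 192.168.1.2 5001
```

The client would then open a plain TCP connection to that address and port and use it as the data channel.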
ACTIVE mode is the older of the two and is rarely used nowadays. The client basically tells the server where it can be found and then has the server initiate the data connection. The main reason why this mode isn’t really used anymore is that firewalls and Network Address Translation (NAT) generally block all incoming connections. These protective technologies were not in use back in the day, so a mode like this caused no issues. Regardless, I’ve yet to find a realistic use case where ACTIVE mode makes sense, or a good reason why it was introduced in the first place.
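For comparison, this is roughly what the client sends in ACTIVE mode: a PORT command carrying its own address and a port it is listening on. The helper below is only a sketch with a made-up name and a made-up private address. It also hints at why NAT breaks this mode: the address the client knows about is often one the server cannot reach.

```python
def build_port_command(ip: str, port: int) -> str:
    """Build the PORT command announcing where the server should connect.

    The IPv4 octets are followed by the high and low byte of the port,
    all separated by commas.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) < 256 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    high, low = divmod(port, 256)
    return "PORT " + ",".join(octets + [str(high), str(low)])


# A client behind NAT typically only knows its private address,
# which is useless to a server on the public Internet.
print(build_port_command("192.168.1.2", 5001))  # prints: PORT 192,168,1,2,19,137
```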
However, before the data channel is created, the user has to be authenticated. This is done by sending user name and password in plain text to the server and awaiting its response (more on that later).
The following flow chart represents a conversation between client and server, with the client intending to upload a picture into the folder /foo/bar/. It might go something like this:
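For reference, a typical PASSIVE mode upload can also be written out as a plain list of commands and replies. Treat the sketch below as an assumption about what such an exchange commonly looks like, not as a capture from a real server: the user name, password, file name, address and reply texts are invented, and only the command names and the three-digit reply codes follow the FTP specification.

```python
# Each entry is (client command or None, typical server reply).
# None means the server speaks without being asked.
UPLOAD_DIALOGUE = [
    (None, "220 Service ready"),
    ("USER alice", "331 User name okay, need password"),
    ("PASS secret", "230 User logged in"),
    ("CWD /foo/bar/", "250 Directory changed"),
    ("TYPE I", "200 Type set to I"),  # switch to binary mode
    ("PASV", "227 Entering Passive Mode (192,168,1,2,19,137)"),
    # ...here the client opens a second TCP connection for the data...
    ("STOR picture.jpg", "150 Opening data connection"),
    (None, "226 Transfer complete"),
    ("QUIT", "221 Goodbye"),
]


def count_commands(dialogue) -> int:
    """Number of commands the client has to send and wait on."""
    return sum(1 for command, _reply in dialogue if command is not None)


for command, reply in UPLOAD_DIALOGUE:
    if command is not None:
        print("C:", command)
    print("S:", reply)

print("client commands:", count_commands(UPLOAD_DIALOGUE))  # prints: client commands: 7
```

Every command in that list costs at least one round trip on the command channel, and the data connection adds its own TCP handshake on top.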
As made plainly obvious, this is a ridiculous amount of back-and-forth messaging that leads to a huge waste of time due to round-trip delay and control overhead. The delays occur not only on the FTP level but also on the TCP level, since the transfer of every single file requires a new TCP connection to be opened, introducing yet another TCP handshake. Depending on your use case this might be acceptable, though it generally is not if you’re trying to transfer more than a handful of files. On a personal side note: I’ve had the pleasure of transferring hundreds of pictures and text files via a tunneled FTP connection from China to Europe. I did not have any other way of accessing the server, so packing the files into a .tar.gz archive and extracting it on the server via SSH was not possible. In the case of the text files, the bottleneck was clearly the round-trip overhead and not the connection throughput.
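A quick back-of-the-envelope calculation shows how fast this adds up. The sketch below assumes roughly four round trips per file (PASV, the TCP handshake for the data connection, STOR, and waiting for the final reply). That number and the example figures are my own rough assumptions, not measurements.

```python
def control_overhead_seconds(num_files: int, rtt_seconds: float,
                             round_trips_per_file: int = 4) -> float:
    """Time spent purely waiting on round trips, ignoring the payload.

    round_trips_per_file is an assumed average; real clients and
    servers may need more or fewer exchanges per file.
    """
    return num_files * round_trips_per_file * rtt_seconds


# Hypothetical numbers: 500 small files over a link with 300 ms round-trip time
overhead = control_overhead_seconds(500, 0.3)
print(f"about {overhead / 60:.0f} minutes of pure waiting")
```

With tiny text files the actual payload takes a fraction of that time, which matches the experience described above: the latency dominates, not the bandwidth.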
Another problem with data transfers over FTP is that there is basically no verification whether a transfer was completed successfully. The server isn’t even aware of the file size! As far as FTP is concerned, once a stream ends that’s the end of the file. It doesn’t matter if the connection was interrupted and half the file is missing. I’ve encountered this problem dozens of times and it annoys the heck out of me. At least transfers can usually be resumed from where they stopped, provided you notice that the file is corrupted in the first place.
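Since the protocol won’t check for you, the best a client can do is compare file sizes after the upload. Here is a minimal sketch using Python’s ftplib, assuming the server supports the SIZE command (an extension that many, but not all, servers implement). The function name, host and paths are hypothetical, and a matching size only catches truncation, not corrupted content.

```python
import os


def upload_looks_complete(ftp, local_path: str, remote_name: str) -> bool:
    """Compare the local file size with what the server reports.

    `ftp` is expected to be a logged-in ftplib.FTP instance.
    """
    # Many servers refuse SIZE in ASCII mode, so switch to binary first.
    ftp.voidcmd("TYPE I")
    remote_size = ftp.size(remote_name)  # None if the server gives no size
    return remote_size is not None and remote_size == os.path.getsize(local_path)


# Usage sketch (hypothetical host, credentials and file names):
#
#   from ftplib import FTP
#   with FTP("ftp.example.com") as ftp:
#       ftp.login("alice", "secret")
#       with open("picture.jpg", "rb") as fh:
#           ftp.storbinary("STOR picture.jpg", fh)
#       if not upload_looks_complete(ftp, "picture.jpg", "picture.jpg"):
#           print("upload was cut short, try again")
```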
I now want to point out and summarize the major deficiencies and annoyances of the File Transfer Protocol. These are the main reasons why I try to avoid using it whenever possible.
1) Credentials are Transmitted Unencrypted
This alone is already a major deal-breaker. User name and password are sent in plain text, readable to anyone who is able to intercept network traffic. You are basically granting everyone access to your files, starting from your local network along the entire routing path to the FTP server. You might as well shout your password out loud in public, it’s about the same level of security.
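To show just how little effort this takes: the login travels over the command channel as two ordinary lines of text. The sketch below pulls the credentials out of a chunk of captured bytes. The sample capture and the function name are invented, and a real eavesdropper would simply read them off a packet capture.

```python
def extract_credentials(captured: bytes) -> dict:
    """Pick USER and PASS arguments out of raw command channel bytes."""
    credentials = {}
    for line in captured.split(b"\r\n"):
        verb, _sep, argument = line.partition(b" ")
        if verb.upper() in (b"USER", b"PASS"):
            key = verb.upper().decode("ascii")
            credentials[key] = argument.decode("utf-8", errors="replace")
    return credentials


# Invented sample of what a client sends right after connecting
captured = b"USER alice\r\nPASS hunter2\r\nPWD\r\n"
print(extract_credentials(captured))  # prints: {'USER': 'alice', 'PASS': 'hunter2'}
```

No decryption, no guessing, no cleverness required.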
2) Data is Transmitted Unencrypted
Similar to the authentication process, the actual files are also transmitted without any form of encryption or protection. While this doesn’t matter as much for files that are going to be public anyway (e.g. website content, such as this picture of my cat), it is definitely a no-go for sensitive files.
3) Transferring a Ton of Files is Slow as Hell
As explained above, for each file that is to be transferred a new data channel must be opened. This means that TCP handshakes must be performed for every single file before FTP can start transferring the data. The protocol overhead might be negligible for large files but it completely destroys performance when numerous tiny files must be transferred.
4) Directories Can’t Be Deleted Directly
This one is also a major pain in the rear. It’s not possible to delete non-empty folders via FTP. Instead, all files have to be deleted individually, recursively, one by one. This is done by repeatedly changing the current directory and iterating through all files, which introduces even more overhead (due to the fact that FTP is stateful). Only once all contained files and directories have been deleted can the folder itself be removed. Ever tried to delete a deeply nested directory with hundreds of files via FTP? You’ve probably had more than enough time for a cup of coffee…
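This is roughly the work a client has to do behind the scenes. The sketch below uses Python’s ftplib and assumes the server supports the MLSD listing command, which not every server does. It also works with full paths instead of changing directories, which is a simplification on my part. The function name is made up, and the return value just counts how many commands were needed.

```python
import posixpath


def remove_tree(ftp, path: str) -> int:
    """Delete `path` and everything below it; return the number of commands sent.

    `ftp` is expected to be a logged-in ftplib.FTP instance.
    """
    sent = 1  # the MLSD listing of this directory
    for name, facts in list(ftp.mlsd(path)):
        kind = facts.get("type")
        if kind in ("cdir", "pdir"):
            continue  # entries for the directory itself and its parent
        child = posixpath.join(path, name)
        if kind == "dir":
            sent += remove_tree(ftp, child)  # empty the subdirectory first
        else:
            ftp.delete(child)  # one command per file
            sent += 1
    ftp.rmd(path)  # only works now that the directory is empty
    return sent + 1
```

Every file costs a command, every directory costs at least two, and each of them is a round trip of its own.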
5) Special Characters / Unicode
In the ancient times when FTP was developed, nobody was talking about extended character sets. Why would anyone possibly need anything other than English letters? Well, things have changed but unfortunately FTP didn’t. Even though there are apparently some ways to get UTF-8 supported between client and server, it seems to be more of a hack than an actual solution to the problem. UTF-8 path names were specified after the fact in RFC 2640, but since support is far from consistently implemented, it’s better to refrain from using anything but standard characters (as suggested by the FileZilla Wiki). So far I’ve been unable to get it to work with any of the hosting providers I’ve worked with.
Of course there are several alternatives. I won’t go into details here but I want to at least mention FTPS and SFTP. FTPS (FTP Secure) is basically the same rubbish with a little bit of security added. SFTP (SSH File Transfer Protocol) on the other hand is something entirely different and is what I’m using when I have SSH access, and that is exactly where the problem lies. There is no point in denying that plain old FTP is widely used and almost always available while better technologies often draw the short straw. Then again, if a hosting provider doesn’t offer alternatives it’s probably time to look for another company…