I’m at An Event Apart and have just listened to a great presentation by Jeremy Keith on data permanency. He touched on many issues, from technical ones regarding physical recording media and logical file formats, to digital rights management, legal issues like copyright, and what happens when the service which holds your data (delicious, Geocities, Myspace…) goes away or even just reorganizes its URL structure.
I used to operate my own hosting out of my house – apocalypse.org. At the time, the services I used it for were difficult to find for free – web hosting, shell access, email. I used it for myself and my own projects. I also wanted to continue a tradition from MIT of sharing computer access with other people, so I allowed many friends to use it for email and hosting as well.
In the end I gave up on it for a combination of reasons.
First, Internet delivery into the home did not have the level of reliability that I needed. It seemed that every time I traveled, some hardware glitch would happen either in my server or with my ISP. It was bad enough that a system I depended on went offline, but being responsible for dozens of other people depending on it was intolerable to me. Even when I used a “business class” service, my ISP did not seem to believe that businesses needed reliable Internet access or tech support outside of 9AM – 5PM Monday through Friday. Yes, I needed a better ISP, but one was not available at my location. These kinds of reliability issues can be a real killer for a business run from a home server.
Second, possibly because of the domain name, we became the target of many (sometimes successful) break-in attempts. I didn’t mind keeping the system software up to date but I did mind having to track down break-ins and clean up after them.
Finally, over time high quality free or cheap alternatives for email and web hosting became available, so the need to continue operating the system for the benefit of others lessened.
If you do operate your own server out of your home, you’re likely to run into technical issues that will make your life more difficult:
- getting a fixed IP address for your server: some ISPs will offer a fixed IP address for an extra fee. There are ways to update DNS as your IP address changes, and while your ISP will likely not change your IP address frequently, having your server’s IP address change is distinctly non-optimal. Also, because of the way that SSL works, each domain that you host and that requires SSL will need its own IP address (at least until client support for Server Name Indication becomes widespread).
- routing: I have a pre-CIDR class C network number that I have used for my network. “Pre-CIDR” means that it’s routable independently of the bigger block of network numbers it belongs to, a bit like being able to take your cell phone number with you when you switch from AT&T to Verizon. It’s an exotic network number to deal with, and your average ISP’s tech folks may simply not understand it or may have difficulty setting up their routing to handle it. Unfortunately my ISP would often forget to route my network number when they updated their routers, and I would lose Internet connectivity.
- upload (outbound) speeds are generally much slower than download (inbound) speeds. The common technologies for bringing Internet into the home (cable, DSL, satellite) are designed to bring data to you but not so much to deliver data from you. This makes sense for most users, who consume far more bytes than they produce, especially in the age of Internet-delivered digital video. Yes, even a pure consumer transmits some data, and that outbound data is essential, but the requests and acknowledgments the protocols need take very little bandwidth. The asymmetry works against you when you’re being more of a data provider than a consumer. You can move to other Internet delivery services like T-1 lines instead of cable or DSL, but they are substantially more expensive for substantially less bandwidth (perhaps $1000/month for 1.544Mbps versus $100/month for 100Mbps). Cable and DSL also tend to have data caps that may become problems for server operators.
- packet filtering: many ISPs do port filtering that will block email sent to a server in your home. They may also block access to a home web server or filter out other protocols. You may be able to purchase “business service” from them at a higher price with fewer restrictions.
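The dynamic-DNS workaround mentioned above can be as simple as a periodic script. Here’s a sketch assuming a provider that speaks the common “dyndns2” update protocol; the hostname, credentials, update URL and IP-lookup service are all placeholders you’d replace with your provider’s actual details.

```shell
#!/bin/sh
# Sketch: keep a DNS name pointed at a home server whose IP may change.
# HOSTNAME, USER, PASS and the URLs below are placeholders.
HOSTNAME="server.example.com"
USER="username"
PASS="password"

# Discover the current public address (any "what is my IP" service works).
IP=$(curl -s https://ifconfig.me)

# Only send an update when the address has actually changed since last run.
if [ "$IP" != "$(cat /var/tmp/last_ip 2>/dev/null)" ]; then
    curl -s -u "$USER:$PASS" \
        "https://members.example-dyndns.com/nic/update?hostname=$HOSTNAME&myip=$IP"
    echo "$IP" > /var/tmp/last_ip
fi
```

You’d run this from cron every few minutes, e.g. `*/5 * * * * /usr/local/bin/update-dns.sh`. Since the script needs real credentials and a real update endpoint, treat it as a template rather than something to run as-is.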
If you’re interested in self-hosting, I would suggest that a compromise between control and reliability may be to use a colocation service. With a colo, you provide a computer that you own and locate it in someone else’s data center. The upsides are that the people who operate the data center should be more reliable than your local ISP, and you own the hardware that your data resides on. The downside is that the hardware is most likely outside your direct physical control (unless you happen to live next to the colo data center and can actually get access to it).
For best results, you might want to put two machines in different colo data centers and keep them synced with one another.
Further out from the ideal of a home-based self-hosted server is a “virtual private server” of the kind that Slicehost (a Rackspace company) offers. In this case, the data center owns the server and uses virtualization software to lease a share of it to you. You are still responsible for the software on the server and for backups and dealing with security issues. Slicehost offers a backup service that is helpful, where they copy your entire server slice on a set schedule. It’s brute-force compared to a more server-specific backup scheme, but then how many people go to the bother of actually doing backups?
If leasing just a slice of a server doesn’t appeal to you, you can rent a dedicated server from a hosting company like Rackspace, but unless you have serious performance demands beyond what a simple web server or email server would likely need, this probably won’t make financial sense for you.
You can go further still, to shared hosting setups like Dreamhost, where you share a server with many other users. This has many shortcomings, but it still leaves you with more control than you’ll have when your data is completely under the control of a third-party service like Twitter, Facebook or Blogger.
In the end, it’s all about tradeoffs. You’re trading off one thing (increased longevity and control of your data, increased ability to deal with security issues) against another (potentially decreased reliability, increased administrative complexity, possibly dealing with backups and security issues). A lot depends on your level of technical expertise and your willingness to deal with complexity versus delegating the handling of technical matters to strangers.
A bit of irony here… immediately after Jeremy Keith’s talk, I checked my email and found a message from Slicehost announcing that they will be ending operations and transitioning their customers to Rackspace (which owns them but operates independently). The impact of the transition should be minimal, but it was quite a surprise, and quite well-timed.