We’ve recently moved to having our filestore NFS-exported from a cluster. This gives us almost complete resilience to hardware failures, and moves us away from depending on individual end-user systems with locally attached filestore.
Given the inherent insecurity of NFS, we opted to use IPsec authentication (but not encryption) between the hosts involved. The NFS server only accepts connections from a list of hosts, and we know those hosts are who they say they are because IPsec authenticates them. We’ve also made it use privileged ports to ensure local users don’t try any spoofing 🙂
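For flavour, the two halves of that setup might look something like the following. These are sketches only: the paths, hostnames, and addresses are invented for illustration, not our real configuration. The `exports(5)` host list and `secure` option cover the “known hosts, privileged ports” part:

```
# /etc/exports — illustrative only.
# Listing hosts explicitly limits who may mount; "secure" makes the server
# reject requests that don't come from a privileged (<1024) source port.
/export/home  client1.example.com(rw,sync,secure) client2.example.com(rw,sync,secure)
```

and a `setkey(8)` security policy can require AH (authentication-only IPsec, no ESP encryption) on traffic between the pair:

```
# setkey SPD entries — addresses are placeholders.
# Require AH in transport mode in both directions; packets that fail
# authentication are dropped before NFS ever sees them.
spdadd 192.0.2.10 192.0.2.20 any -P out ipsec ah/transport//require;
spdadd 192.0.2.20 192.0.2.10 any -P in  ipsec ah/transport//require;
```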
The trade-off here appears to be latency. I’ve done some completely unscientific tests that involved shovelling UDP data at a fixed rate between two machines. These are the “jitter” figures they produced:
- 0.10ms – direct
- 0.30ms – via router
- 0.70ms – via router with IPsec
Those figures might bear no relation to the latencies seen by actual NFS packets, but they should give an idea of the relative delays added by routing and by IPsec.
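For anyone wanting to repeat the (unscientific) experiment, the idea can be sketched in a few lines of Python. This isn’t the tool I used; the port, packet size, and rate are arbitrary choices, and “jitter” here is simply the mean absolute deviation of inter-arrival gaps from the nominal sending interval, reported in milliseconds:

```python
# Sketch of a fixed-rate UDP jitter test. Run the receiving half on one
# machine and point the sender at it; here both run over loopback.
# PORT, INTERVAL, and COUNT are arbitrary illustrative values.
import socket
import statistics
import threading
import time

PORT = 9999        # arbitrary
INTERVAL = 0.01    # nominal gap between packets: 10 ms
COUNT = 100

def jitter_ms(arrivals, interval):
    """Mean absolute deviation of inter-arrival gaps from the nominal
    interval, in milliseconds."""
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    return statistics.mean(abs(g - interval) for g in gaps) * 1000.0

def receiver(sock, arrivals):
    # Timestamp each datagram as it arrives.
    for _ in range(COUNT):
        sock.recvfrom(64)
        arrivals.append(time.monotonic())

def run_test(host="127.0.0.1"):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, PORT))
    arrivals = []
    t = threading.Thread(target=receiver, args=(sock, arrivals))
    t.start()
    out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for _ in range(COUNT):
        out.sendto(b"x" * 32, (host, PORT))
        time.sleep(INTERVAL)
    t.join()
    out.close()
    sock.close()
    return jitter_ms(arrivals, INTERVAL)
```

Running `run_test()` against a remote host directly, via a router, and via a router with IPsec enabled would give relative numbers of the same flavour as the list above.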
We could, to some extent, reduce those figures by replacing hardware. Quicker routers would undoubtedly remove some of the routing latency, and quicker machines could perform the IPsec calculations faster. But this probably isn’t the cheapest solution.
The first test I want to try is adding a private network between the NFS server and NFS client, with no routing involved. Since the network is private, we can reasonably trust that people won’t be able to spoof packets on it, and so drop the IPsec authentication. In theory, removing both the router hop and the IPsec processing could significantly reduce the latencies involved.
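Concretely, the plan amounts to something like the following. The interface name and addresses are invented for illustration; the point is a second NIC on each machine, on its own subnet, with no router in the path:

```
# On the NFS server:
ip addr add 192.168.100.1/24 dev eth1
# On the client:
ip addr add 192.168.100.2/24 dev eth1
# And the export restricted to that subnet, e.g. in /etc/exports:
#   /export/home  192.168.100.0/24(rw,sync,secure)
```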
We’ll continue to monitor this for a while first, though. We need to keep an eye on load on the NFS server, network usage, and so on. But, at the moment, it seems likely the problems lie in the network part of the NFS communication process.