Imagine you're a system administrator, and an email arrives from your boss. It goes something like this:
"Hey, bud, we need some new storage for Project Qux. We heard that this [insert major company here] uses a product called the Oracle Sun ZFS Storage Appliance as the back-end for their [insert really popular app here]. We want to do something like that at similar scale; can you evaluate how well that compares to XYZ storage we already own?"
So you get in touch with your friendly local ZFS sales dudette, who arranges a meeting that includes a Sales Engineer to talk about technical stuff related to your application. The appliance, however, has an absolutely dizzying array of options. Where do you start?
Without a thorough evaluation of performance characteristics, most people evaluating these appliances end up in one of two scenarios:
To start with, I'll talk about Scenario 1, setting yourself and your ZFS evaluation up to fail: Doing It Wrong.
I bumped into several individuals at OpenWorld who had obviously already made choices that guaranteed the ZFS appliance they purchased was not going to work for them. They just didn't know it yet. And of course, despite my best intentions to help them cope with the mess they made, they remained unsatisfied with their purchase.
Both the choices and outcome were eminently predictable, and apparently motivated by several common factors.
From my point of view, if someone isn't ready to invest six figures in storage, then they aren't yet ready for the kind of performance and reliability an enterprise-grade NAS like the ZFS appliance can offer them. The hardware they can afford won't provide them an accurate picture of how storage performs at scale.
Any enterprise storage one can buy at a four- or five-figure price point is still a toy: a useful one, but a toy compared with its bigger siblings.
It'll be nifty and entertaining if the goal is to familiarize oneself with the operating system and interfaces. It will allow users to get a glimpse of the kinds of awesome advantages ZFS offers. It'll offer a reasonable test platform for bigger & better things later as you explore REST, Analytics, Enterprise Manager, and the Oracle-specific optimizations available to you. And perhaps it might serve reasonably well as a departmental file server or small-scale storage for a few dozen terabytes of data. But it won't offer performance or reliability on a scale similar to what serious enterprises deserve.
Most customers that invest in dedicated storage for the first time don't yet understand their data usage patterns. IOPS? A stab in the dark. Throughput? Maybe a few primitive tests from a prototype workstation. Hot data volume? Read response latency requirements? Burst traffic vs. steady-state traffic? Churn rate? Growth over time? Deduplication or cloning strategies? Block sizes? Tree depth? Filesystem entries per directory? Data structure? Best supported protocol? Protocol bandwidth compared to on-disk usage? Compressibility? Encryption requirements? Replication requirements?
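Even if you can't answer all of these questions, a rough back-of-the-envelope model beats a stab in the dark. Here's a minimal sketch of turning workload guesses into storage requirements; every figure in it is a hypothetical placeholder, not a measurement, so substitute numbers from your own application:

```python
# Rough workload-sizing sketch. All figures below are hypothetical
# placeholders -- replace them with numbers measured from your own app.

def required_iops(transactions_per_sec, ios_per_transaction):
    """I/O operations per second the storage must sustain."""
    return transactions_per_sec * ios_per_transaction

def required_throughput_mb(iops, block_size_kb):
    """Sustained throughput in MB/s implied by that IOPS rate."""
    return iops * block_size_kb / 1024

# Hypothetical example: 500 transactions/s, 8 I/Os each, 8 KiB blocks.
iops = required_iops(500, 8)                  # 4000 IOPS
mb_per_sec = required_throughput_mb(iops, 8)  # 31.25 MB/s

print(f"{iops} IOPS, {mb_per_sec:.2f} MB/s sustained")
```

Crude as it is, an estimate like this gives you and the Sales Engineer a concrete starting point instead of a shrug.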
I'm not saying one has to have all these answers prior to purchasing storage. In fact, the point of this series is to encourage you to purchase a general-purpose hardware platform that performs well across most workloads, and configure it in a way that makes you less likely to shoot yourself in the foot. But over and over, the people with the biggest problems were the ones who didn't understand their data, yet hoped that purchasing some low-end ZFS storage would somehow magically solve their poorly-understood problems.
Most data worth storing is worth backing up. While I'm a big fan of the Oracle StorageTek SL8500 tape silo, not everybody is ready for a tape backup solution that can span the size of a football field or Quidditch pitch.
Nevertheless, trusting that the inherent reliability and self-healing of a filesystem will see a company through a disaster is not a good idea. Earthquakes, tornadoes, errant forklift drivers, newbie admins with root access, and overly-enthusiastic Logistics personnel with a box knife and a typo-ridden list of systems to move are all common. Backups should be considered and implemented long before valuable data is committed to storage.
Capacity planning is crucial in the modern enterprise. While I'm certain our sales guys are really happy to sell systems on an urgent basis with little or no discount in response to poor planning on the part of customers, that kind of decision making is often really hard on the capital expense budget.
A big part of successful capacity planning is forecasting future needs. Products like Oracle Enterprise Manager and ZFS Analytics can help, and home-brewed capacity forecasting is viable and common. A system administrator is at her best when she's already anticipated the needs of the business and has a ready solution for the future problems she knows will arrive eventually. With an enterprise NAS, a modest investment in hardware can continue to yield dividends as an admin comes to better understand her data utilization patterns and learns to use the available tools to intelligently manage it.
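Home-brewed forecasting doesn't have to be fancy. Here's a minimal sketch that fits a straight line to monthly pool usage and estimates when the pool fills; the usage history and pool size are invented for illustration, and real growth is rarely this linear:

```python
# Home-brewed capacity forecast sketch: fit a least-squares line to
# evenly spaced usage samples and estimate when a pool fills.
# The sample data and pool size below are invented for illustration.

def forecast_full(samples_tb, capacity_tb):
    """Return how many more sample intervals until a linear trend
    through samples_tb (TB used) reaches capacity_tb, or None if
    usage is flat or shrinking."""
    n = len(samples_tb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples_tb) / n
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(xs, samples_tb)) \
            / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # no upward trend; no projected fill date
    # Intervals from the last sample until the fit line hits capacity.
    return (capacity_tb - intercept) / slope - (n - 1)

# Hypothetical: six months of usage samples on a 100 TB pool.
months = [40, 44, 49, 53, 58, 62]
remaining = forecast_full(months, 100)
print(f"Pool full in roughly {remaining:.1f} months")
```

Run something like this monthly against your actual usage numbers and you'll rarely be surprised into an urgent, undiscounted purchase.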
Here are the options I would pick if I wanted to set up my ZFS appliance to fail:
If you do the above, you'll pretty much guarantee a bad time for yourself with ZFS storage. Unfortunately, this seems to be the way far too many people try to configure the storage, and they set themselves up for failure right from the start.
So we've talked about Doing It Wrong. How do you Do It Right?
In case you don't know what I do, I co-manage several hundred storage appliances for a living (soon to be over a thousand, with hundreds of thousands of disks among them. Wow. The sheer scope of working for Oracle continues to amaze me!). Without knowing anything else about the workload except that the customer wants high-performance general-purpose file storage, below is the reference configuration I would pick to maximize the workload's chances of success. If I think I need to differ from this reference configuration, it's important to ask "How does this improve on the reference configuration?" This reference configuration has proven its merit time and time again under a dizzying array of workloads, and I'd only depart from it for very compelling reasons.
Such arguments exist, but if they are motivated by price, remember: you are always trading away performance for that lower price!
Guiding this reference configuration are the following priorities:
So here's the hardware configuration we typically use in Oracle IT. It's not the biggest, it's certainly not the most expensive, but it has the advantage of simplicity, flexibility, and stellar performance for the vast majority of our use cases, and it all fits neatly into one little standard 48U rack. I'll hold off on part numbers, though, as those change over time.
Now let's step into software configuration. If you've configured your system as above, random writes are a breeze. Your appliance will rock the writes. The Achilles' heel of the ZFS appliance in a typical general-purpose "capacity" configuration as above is random reads. Not only can they be slow themselves; they can also slow down other I/O. You want to do whatever you can to minimize their impact.
There you have it: an ideal general-purpose file server with good capacity, great performance for average loads, and something that in typical Oracle Database or mixed-use environments will really make you glad you invested in an Oracle Sun ZFS Storage Appliance.