Privacy in a coffee shop

So I have to post you about two things — the outcry regarding FB privacy abuses, and the state’s political response to that outcry.

https://www.wsj.com/articles/for-facebooks-employees-crisis-is-no-big-deal-1523314648

I don’t understand. You put your info and personal intimacies on FB. For years. For 10 years. Everything about yourself. For free. You do all this for free, putting your life online for 10 years. And then you complain when the internet service you’ve been using for free harvests your information? Like all of a sudden your privacy has been violated?

While privacy is at the forefront of the issue, the underlying question to me is the value of self-information, and the value of the exchange of self-information through which that privacy claim is being asserted.

If you and I were in a coffee shop, and we were trying to have a private conversation, and we noticed someone listening in and eavesdropping on our conversation…we’d kick their ass!!! But seriously, the coffee shop is a place where people exchange money with the shop for food and drink. Except there’s something else going on. There’s space to sit and work and relax. There’s wifi. But you don’t have to buy something from the shop to use its internet connection, or to sit, or to meet with others, or to transact personal and private business. The coffee shop proprietor isn’t demanding you buy something to use its various other services.

So people assemble in these private-public coffee shop spaces all the time and yammer away about sensitive personal details with everyone within earshot hearing it. And this doesn’t even cover the supremely annoying people yammering away loudly on their phones.

FB is the internet’s coffee shop. And everyone is hanging out at this place a lot, A LOT, and yammering away about their personal lives, and accepting that, since they’ve never bought anything from the food counter, the shop makes its money by taking all that yammering, which is being given to it for free, and turning around and selling it to advertisers.

When you give your data away for free in exchange for a service, you assign an informational value of zero to yourself. Your data and your privacy are worth nothing to you. That is what is implicit to me. The implicit statement is: my privacy and personal data are worth nothing, because I am giving them freely to a service I use, knowing that service makes money off advertising built on my data, and asking nothing in return from the service making money off my data.

That value exchange of self-information seems to me to be the same whether you stop in the shop one time or one time a minute. The rate of exchange remains the same. The volume of self-information transferred doesn’t alter the fact that its value has been set at zero.

So you’ve been going to this coffee shop for some time, and one day you have a general chat about how hard it is to get your foot in your shoes, and the very next day you show up and at the table where you’re sitting is an advert for shoehorns and other fine accessories. And this goes on for a while, until related ads start showing up the minute after you mention a specific topic. At what point do you get up, leave the coffee shop, and never come back? Especially when you aren’t being forced to use this shop and there are other shops that provide a similar service?

A year ago it became widely known that foreign nations were scraping data from this shop and buying political ads to influence the presidential election. Last month it became widely known that companies were indeed harvesting data from this shop to service those political persuasion campaigns.

Guess what? No one is leaving the coffee shop. A free and non-coerced civic polity continues to give away their data for free.

When something is free then you are the product. And you have assigned your own self-information to be worth $0. So either leave the shop and never go back, or keep going to the shop and know what you are in for. Because it’s not called PrivacyBook.

So, again, this is what I don’t understand. People put personal intimacies on FB for years. For free. And all of a sudden their privacy has been violated?

But then…something far far FAR worse happens. The government decides it must intervene and assert authority, overstepping its role to somehow protect people from their own lack of self-awareness. The government is not our Mom and Dad. The American people are not teenagers. The same thing happened when various levels of government tried to block the rise of Uber and AirBnB. Not only is society using these services, but by using them without reservation it is defending their right to exist. So let them do it. If people have a problem with privacy violations, and there is no illegal activity taking place, then let the people work it out.

(Side note: it just shocks me that politicians, particularly conservative ones, would inject themselves into the fray by attacking a corporate juggernaut and cornerstone of the American economy. While the privacy issue does seem in some ways a media hype job, per the above WSJ article, I’m surprised a conservative administration and legislative leadership is letting this attack happen. But that’s today’s world when all you care about is votes and not principle.)

The bottom line is that I miss the community on Barnson.org. I understand my sentiment may be not just old-fashioned but a fossil emotion in the hyper-now digital world of instantaneous, widespread engagement. But I don’t care. If FB went away tomorrow I wouldn’t miss 90% of the people who are my tagged ‘friends’ at that coffee shop. I miss this coffee shop. I miss the people I know and care about, and the quasi-privacy of our thoughtful, considerate conversation and debate within the back corner of the bigger shop that is the internet.

Trump revokes Washington Post’s campaign press credentials

So I have to post you. I’m no Trump supporter but I did happen to hit the WP yesterday when the headline “Donald Trump suggests President Obama was involved with Orlando shooting” was live.

http://mobile.reuters.com/article/newsOne/idUSKCN0YZ2DA

I was way shocked. I couldn’t believe that to be true. So I went to view Trump’s speech and nowhere did Trump say, at all, that Obama was involved with the Orlando shooting.

Of course I don’t condone revoking press credentials. But I do observe how, for the past several months, the WP has been unusually harsh and increasingly biased against Trump. The WP has gone from reporting the news to reporting its bias. My guess is the WP is doing this out of some internal crusade to protect journalism and defy those who would curtail a free press.

But that’s not the point of my posting you. The point is that I feel neither the WP nor the Trump campaign realizes how this continued siege of negative reporting HELPS Trump. I feel there are many people out there, the DC-dislikers, who consider the negative reporting to be coming from a source representative of a congressional institution they want to change. To these DC-dislikers, the WP is mainstream, legacy media feeding their enmity. The more negative the reports against Trump, the more the DC-dislikers dig in their heels and become more aligned with Trump. It’s a strange and warped psychological situation.

And basically I see two mistakes. I see the editorial mistake of the WP in failing to report plain activity and factual detail, almost allowing the aggressive virulence of the late Hunter Thompson to seep into its writing. And I see the tactical mistake of the Trump campaign in crediting legacy media with the negative, coercive power it believes it still wields.

Handy Space Monitoring on ZFSSA

This is a re-post from my blog at http://blogs.oracle.com/storageops/entry/handy_space_monitoring

Semi-real-time space monitoring is pretty straightforward with ECMAScript & XMLRPC.  I’ve never really been a fan of using used + avail as a metric; it’s simply too imprecise for this kind of work.  With XMLRPC, you can gauge usage down to the byte, and with Javascript/ECMAScript you have some easy date handling for your report.

Here’s a code snippet to monitor fluctuations in your overall pool space usage.  Just copy-paste at the CLI to run it. Let’s call this "Matt’s Handy Pool Space Delta Monitor".  This one will update every 5 seconds; just change the "sleep" interval to whatever you need to increase or decrease the update speed; press CTRL-C a few times rapidly to exit.

There must be a way to get the ECMAScript interpreter to break out of the whole loop in response to the first CTRL-C, rather than just breaking the current iteration and requiring multiple CTRL-C presses, but I’m not exactly certain how to do it:

script
var previousSize = 0,
  currentSize = 0;
while (true) {
  currentDate = new Date();
  currentSize = nas.poolStatus(nas.listPoolNames()[0]).np_used;
  printf('%s bytes delta: %s bytes\n',
    currentDate.toUTCString(),
    currentSize - previousSize);
  previousSize = currentSize;
  run('sleep 5');
}
.

Here’s some sample output from a very busy system which handles some of Oracle’s ZFS bundle analysis uploads.  The system is constantly extracting, compressing, and destroying data, so it’s pretty dynamic.

aueis19nas09:> script
("." to run)> var previousSize = 0,
("." to run)>   currentSize = 0;
("." to run)> while (true) {
("." to run)>   currentDate = new Date();
("." to run)>   currentSize = nas.poolStatus(nas.listPoolNames()[0]).np_used;
("." to run)>   printf(‘%s bytes delta: %s bytes\n’,
("." to run)>     currentDate.toUTCString(),
("." to run)>     currentSize – previousSize);
("." to run)>   previousSize = currentSize;
("." to run)>   run(‘sleep 5’);
("." to run)> }
("." to run)> .
Wed, 08 Jul 2015 17:44:31 GMT bytes delta: 102937482702848 bytes
Wed, 08 Jul 2015 17:44:36 GMT bytes delta: 0 bytes
Wed, 08 Jul 2015 17:44:42 GMT bytes delta: 362925056 bytes
Wed, 08 Jul 2015 17:44:47 GMT bytes delta: 1039872 bytes
Wed, 08 Jul 2015 17:44:52 GMT bytes delta: 424662016 bytes
Wed, 08 Jul 2015 17:44:57 GMT bytes delta: -181739520 bytes
Wed, 08 Jul 2015 17:45:02 GMT bytes delta: 0 bytes
Wed, 08 Jul 2015 17:45:07 GMT bytes delta: -362792960 bytes
Wed, 08 Jul 2015 17:45:13 GMT bytes delta: -56487936 bytes
Wed, 08 Jul 2015 17:45:18 GMT bytes delta: 0 bytes
Wed, 08 Jul 2015 17:45:23 GMT bytes delta: 311884288 bytes
Wed, 08 Jul 2015 17:45:28 GMT bytes delta: -3111936 bytes
Wed, 08 Jul 2015 17:45:33 GMT bytes delta: 329170944 bytes
Wed, 08 Jul 2015 17:45:38 GMT bytes delta: 94827520 bytes
Wed, 08 Jul 2015 17:45:44 GMT bytes delta: -24576 bytes
Wed, 08 Jul 2015 17:45:49 GMT bytes delta: 356221440 bytes
Wed, 08 Jul 2015 17:45:54 GMT bytes delta: -36864 bytes
Wed, 08 Jul 2015 17:45:59 GMT bytes delta: 503583744 bytes
Wed, 08 Jul 2015 17:46:04 GMT bytes delta: 175494144 bytes
Wed, 08 Jul 2015 17:46:10 GMT bytes delta: -342528 bytes
Wed, 08 Jul 2015 17:46:15 GMT bytes delta: 135242240 bytes
Wed, 08 Jul 2015 17:46:20 GMT bytes delta: -39769600 bytes
Wed, 08 Jul 2015 17:46:25 GMT bytes delta: -124416 bytes
Wed, 08 Jul 2015 17:46:30 GMT bytes delta: -136044544 bytes
^CWed, 08 Jul 2015 17:46:31 GMT bytes delta: 0 bytes
^C^Cerror: script interrupted by user
aueis19nas09:>

Caveats:

  • This isn’t actually a 5-second sample; it simply sleeps 5 seconds between sample periods, and due to execution time you will probably get a little drift that will manifest as a displayed interval of 6 seconds here & there if left running a long time.
  • If you wanted to modify this to report GB instead of bytes, you’d replace “currentSize - previousSize” with something like “Math.round((currentSize - previousSize) / 1024 / 1024 / 1024)”, but that will probably just end up with a string of 0 or 1 results with such a short polling interval.  You’d need to see significant and rapid data turnover to get a non-zero result if polling by gigabyte every five seconds!
  • This only monitors the first pool on your system. To monitor other pools, you’d change “nas.listPoolNames()[0]” to “nas.listPoolNames()[1]” or to whatever index your target pool has in the output of the “nas.listPoolNames()” command. A sketch combining both of these tweaks follows this list.
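Here’s a minimal sketch combining those two tweaks: a gigabyte-rounded delta for a pool selected by name rather than by index. The pool name 'mypool' is purely a placeholder; substitute one of the names reported by nas.listPoolNames() on your own appliance, and adjust the sleep interval as before.

script
// Hypothetical variant: GB-rounded delta for a pool chosen by name instead of index.
// 'mypool' is a placeholder; use one of the names printed by nas.listPoolNames().
var poolName = 'mypool',
  previousSize = 0,
  currentSize = 0;
while (true) {
  // np_used is the pool's consumed bytes, the same field the original monitor reads
  currentSize = nas.poolStatus(poolName).np_used;
  printf('%s GB delta: %s GB\n',
    new Date().toUTCString(),
    Math.round((currentSize - previousSize) / 1024 / 1024 / 1024));
  previousSize = currentSize;
  run('sleep 5');
}
.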

Enjoy!

Stuff Blog: Day 1

So I decided to create a “Stuff Blog” to document my adventure trying to sell down all the stuff in my life. Most of it I don’t need, and I want to get rid of as much as is practical.

Day 1 listings: My Garmin VivoSmart smart watch and my ChromeCast. Yeah, I know they are both small personal electronics; I’m going to try something larger tomorrow. Like maybe an old bed or an old desk or something.

Understanding the Oracle Backup Ecosystem

Mirrored at https://blogs.oracle.com/storageops/entry/understanding_the_oracle_backup_ecosystem

Table of Contents

  • Understanding the Oracle Backup Ecosystem
  • Backup/Restore Drivers
      • The “Oops”
      • Defending against and pursuing lawsuits
      • Taxes & Audits
      • Disaster Recovery
      • Reduce Downtime
      • Improve Productivity
  • The Backup/Restore Tiers
      • Tier 1 Backups
      • Tier 2 Backups
      • Tier 3 Backups
      • Tier 4 Backups
  • The Tools
      • ZDLRA
      • SMU
      • OSB
      • ACSLS
      • STA
      • Oracle ZFS Storage
  • Tools For Tiers

Understanding the Oracle Backup Ecosystem

A frequent question I hear these days is something along the lines of “How is Oracle IT leveraging the Zero Data Loss Recovery Appliance, Oracle Secure Backup, and ZFS together?”

Disclaimer 1: The opinions in this blog are my own, and do not necessarily represent the position of Oracle or its affiliates.

Disclaimer 2: In Oracle IT, we “eat our own dog food”. That is, we try to use the latest and greatest releases of our product in production or semi-production environments, and the implementation pain makes us pretty strong advocates for improvements and bug fixes. So what I talk about here is only what we’re doing right now; it’s not what we were doing a year ago, and probably won’t be what we’re doing a year from now. Some of today’s innovative solutions are tomorrow’s deprecated processes. Take it all with a grain of salt!

Disclaimer 3: I’m going to talk about some of my real-world, actual experiences here in Oracle IT over the past decade that influenced my position on backups. Don’t take these experiences as an indictment of our Information Technology groups. Accidents happen; some are preventable, some not. The real key to success is not in not failing, but in moving forward and learning from the experience so we don’t repeat it.

Backup/Restore Drivers

Typically, the need for offline backup & restore is driven by a few specific scenarios.

The “Oops”

Humans are fallible. We make mistakes. The single most common reason for unplanned restores in Oracle IT is human error. This is also true for other large enterprises: Google enjoyed a high-profile incident of corrupted mailboxes several years ago due to a flawed code update. Storing data in the “cloud” is not a protection against human error. The only real protection you have from this kind of incident is some kind of backup that is protected by virtue of being either read-only or offline.

Defending against and pursuing lawsuits

In today’s litigious environment, being able to take “legal hold” offline, non-modifiable, long-retention backups of critical technology is a prerequisite to efficiently defending you and your company from various legal attacks. Trying to back up or restore an environment that has zero backup infrastructure in place is a huge hassle, and can endanger your ability to win a lawsuit. You want to have a mechanism in place to deal with the claims of your attackers – or to support the needs of your Legal team in pursuing infringements – without disrupting your normal operations.

Taxes & Audits

Tax laws in various countries usually require some mandatory minimum of data retention to satisfy potential audit requirements. If you can’t cough up the data required to pass an audit – regardless of the reason, even if it’s a really good one! – you’re probably facing a stiff fine at a minimum.

Disaster Recovery

I’m going to be real here. This is my blog, not some sanitized, glowing sales brochure. Everybody is – or should be! – familiar with what “Disaster Recovery” is. Various natural and man-made disasters have happened in recent decades, and many companies went out of business as a result of inadequate disaster recovery plans. While the chance of a bomb, earthquake, or flood striking your data center is probably very low, it does exist. Here’s a short list of minor disasters I’ve personally observed during my career. There have been many more; I’ll only speak of relatively recent ones.

  • A minor earthquake had an epicenter just two miles from one of our data centers. I was in the data center in question at the time; it felt as if a truck struck the building. Several racks of equipment didn’t have adequate earthquake protection and shifted; they could easily have fallen over and been destroyed.

  • An uninterruptible power supply’s automated transfer switch exploded, resulting in smoke throughout the data center and a small fire that could have spread and destroyed data.

  • Another data center had a failure in the fire prevention system, resulting in sprinklers dousing several racks worth of equipment.

  • Busy staff and a flawed spreadsheet resulted in the wrong rack of equipment being forklifted and shipped to another data center.

  • A data center was in the midst of a major equipment move with very narrow outage windows. During one such time-critical move, facilities staff incorrectly severed the ZFS Appliance “Clustron” cables with a box knife before shipping the unit. I powered the unit up without detecting the break, resulting in a split-brain situation on our appliance that corrupted data. Mea culpa! Seriously, don’t do that. I don’t think the ZFSSA is vulnerable to this anymore as a result of this incident, but it was painful at the time and I don’t want anyone to go through that again…

  • Multiple storage admins on my team have accidentally destroyed the wrong share or snapshot on a storage appliance. When you have hundreds of thousands of similarly-named projects, shares, and snapshots, it’s nearly inevitable, even if the “nodestroy” bit is set: if the service request says to destroy a share, and all the leadership signed off on the change request for destroying it, you destroy it despite the “nodestroy” thing. But it’s quite rare.

  • Admins allowed too many disks to be evicted from the disk pool on an Exadata because ((reasons, won’t go into it)), resulting in widespread data loss and a data restore.

This was the minor stuff. Imagine if it were major! If you don’t have solid, tested disaster recovery plans that include some kind of offline or near-line backup, you’re exposed and likely to go out of business even from a user-induced disaster like the “Oops” category above.

Reduce Downtime

Having a good backup means that you have less downtime for your staff in case of any challenge with your data. Knowing how long it takes to restore your data is a benefit of a regularly-scheduled restore test.

Improve Productivity

Finally, if you don’t have a good backup, the chance is high that you’ll eventually end up having to do some work over again due to lack of good back-out options. This loss of productivity hurts the bottom line.

The Backup/Restore Tiers

In any large enterprise environment, there exist multiple tiers of needs for backup/restore. It’s often helpful to view backup and restore as a single type of tier: if your backup needs tend to be time-sensitive, your restore needs are probably even more so. Therefore, in the interest of simplicity I’ll assume your tier need for restores mirrors your tier for backups.

Here’s how I view these tiers today. They aren’t strictly linear as below – there is a lot of cross-over – but they align nicely with the technologies used to back them up.

  1. Mission-critical, high-visibility, high-impact, unique database content.
  2. Mission-critical, high-visibility, high-impact, unique general purpose content.
  3. Lower-criticality unique database and general purpose content.
  4. Non-unique database and general purpose content.

Tier 1 Backups

For Tier 1 Oracle database backup and restore, there exists one best choice today: The Zero Data Loss Recovery Appliance, or "ZDLRA". While you can perform backups to ZFS or OSB tape directly – which works quite well, and we’ve done it for years in various environments – the ZDLRA has some important advantages I’ll cover below.

That said, the Oracle ZFS Storage Appliance in combination with Oracle Secure Backup can also provide Tier 1-level backups; the “forever-incremental” strategy available on ZDLRA simply isn’t an option there. For Tier 1 non-ZDLRA backups, we resort to more typical strategies: rman backup backupset using a disk-to-disk-to-tape approach, NFS targets, direct-to-tape options, etc.

For Tier 1, you also want multiple options if possible: layer upon layer of protection.

Tier 2 Backups

For Tier 2 general-purpose content, the ZDLRA just isn’t particularly relevant because it doesn’t deal with non-Oracle-Database data. By calling it “Tier 2” I’m not implying it’s less important than Tier 1 backups, just that you have a lot more flexibility with your backup and recovery strategies. Tier 2 also applies to your Oracle database environments that do not merit the expense of ZDLRA; ZFS and tape tend to be considerably cheaper, but with a corresponding rise in recovery time and manageability.

In Tier 2, you’ll have the same kind of backup & restore windows as Tier 1, but will use non-ZDLRA tools to take care of the data: direct-to-tape backups, staging to OSB disk targets for later commitment to tape, etc. Like Tier 1, you want to layer your recovery options. Our typical layers are:

  1. Sound change management process to eliminate the most common category of “Oops” restores.

  2. Snapshots. Usually a week or more, but a minimum of 4 daily automated snapshots to create a 3-day snap recovery window.
  3. Replication to DR sites. For Oracle Database, this usually means “Dataguard”. For non-DB data, ZFS Remote Replication is commonly used and has proven exceptionally reliable, if occasionally a little tricky to set up for extremely large (100+TB) shares.
  4. For Oracle databases, an every-15-minutes archive log backup to tape that is sent off site regularly at the primary and DR site(s).
  5. Weekly incremental backups to tape, using whatever hot-backup technology is available to us on the platform, so that a backup is “clean” and can be restored without corrupted in-flight data, at both the primary & DR site(s).
  6. Monthly full backups to tape at both the primary & DR site(s).
  7. Ad-hoc backups to tape as required.

Tier 3 Backups

Leveraging the same toolset as Tier 2 backups, Tier 3 backups are simply environments that need less-frequent backups of any sort. It’s the kind of stuff that, if you lost access to it for 12-24 hours, your enterprise could keep running, but a bunch of users would be inconvenienced. It’s not stuff that endangers your bottom line – if it’s a revenue-producing service, it must be treated as Tier 1 or Tier 2, or else you might end up owing your customers some money back! – but it would be painful/irritating/time-consuming to reproduce.

In Oracle IT, this tier of data receives second-class treatment. It gets backed up once per week instead of constantly. Restore windows range from a few hours to a couple of days. Retention policies are narrower. Typically, very static environments like those held for Legal Hold or rarely-read data are stored in this tier. The data is important enough to back up, but the restoration window is much more fluid and the demands infrequent.

ZFS Snapshots are critical for this kind of environment, and typically will be held for a much longer period than the few days one might see in a production environment. Because the data is much more static, the growth of snapshots relative to their filesystems is very low.

Tier 4 Backups

The key phrase for backups in this tier is “non-unique”. In other words, the data could easily be reproduced with roughly the same amount of effort it would take to restore from tape. In general, these Tier 4 systems don’t receive much if any backup at all. ZFS snapshots occur on user-modifiable filesystems so that we can recover within a few days from a user “oops” incident, but if we were to lose the entire pool it could be reconstructed within a couple of days. Although it’s important to have some mechanism for tape backup should one be required, they will be the exception and not the rule.

The Tools

Now to the fun part. How do we glue these things together in various tiers? What tools do we use?

ZDLRA

  1. The forever-incremental approach to backups means that there is less CPU and I/O load on your database instance. Backup windows typically generate the heaviest load on your appliance, and since the ZDLRA should never require full backups after the first one, it’s an outstanding choice for backups of I/O-challenged environments.

  2. The ZDLRA easily services a thousands-of-SIDs environment without backup collisions. This is really critical for Cloud-style environments with many small databases, where traditional rman scheduling tends to fall apart pretty easily due to schedule conflicts over limited tape resources.
  3. Autonomous tape archival helps aggregate backups and provide on-demand in-scope Legal Hold, Disaster Recovery, Environment Retirement, and Tax/Audit backups to tape. Many may think “tape is dead”… but they think wrong!

SMU

Oracle’s SMU – the “Snap Management Utility” – is a great way to back up Tier 2 Oracle databases to ZFS. It handles putting your database into hot backup mode so that you can take an application-consistent snapshot of the data and set up restore points along the way. If you can’t afford ZDLRA, SMU + ZFS is a great first step. Just don’t forget to take it to tape too!

OSB

OSB version 12 provides “Disk Targets”. This, in essence, gives users of OSB 12 a pseudo-VTL capability. This new Disk Target functionality provides some other unique benefits:

  1. Aggregate multiple rman backups of smaller-than-a-single-tape size onto a single tape.

  2. With sufficient streams to disk, you can be rid of rman scheduling challenges that often vex thousands-of-SIDs environments when backing up to tape.
  3. By aggregating rman and other data to a single archive tape, you increase the density of data on tape, avoid buffer underruns, and maximize the free time for your tape drive. What often happens with a slow rman backup is that the tape ramps its speed down to match the input stream, doubling or even quadrupling the time the tape drive is busy. By buffering the backups to disk first, you can ensure the tape drive is driven at maximum speed once you’re ready to use “obtool cpinstance” to copy those instances to tape.
  4. Ability to use any kind of common spindle or SSD storage as a disk target. We use a combination of local disks on Sun/Oracle X5-2L servers running Solaris as well as ZFS Storage Appliance targets over 10Gbit Ethernet.

ACSLS

Oracle’s StorageTek Automated Cartridge System Library Software – ACSLS for short – provides a profoundly useful capability: virtualization of our tape silos. We can present a single silo from our smaller SL3000 libraries to the Big Boy SL8500 library as a virtual tape silo to a given instance of OSB. This allows truly isolated multi-tenancy and reporting for individual customers or lines of business. This capability is leveraged to the max across all of our Enterprise, Cloud, and Managed Cloud environments.

STA

Oracle’s StorageTek Tape Analytics (STA) provides predictive failure analysis of tapes and silo components. All storage – tape, SSD, and magnetic spindle – will fail eventually. STA provides valuable insight into the rate of this decay, and works in tandem with ACSLS to pro-actively, predictively fail media out of the library when it’s no longer reliable.

Oracle ZFS Storage

Oracle’s ZFS Storage Appliance provides a uniquely flexible, configurable storage platform to leverage as a disk backup target, an rman “backup backupset” staging area for massive-throughput Oracle database backups, a remote replication source or target, and more. The proven self-healing capabilities of Oracle’s ZFS storage – particularly effective in a once-in, many-out backup situation – help guarantee that backups are healthy and exactly what you intended to commit to tape. In many ways, the ZFS Storage Appliance is the hub around which all our other tools revolve, and its seamless integration as a disk target for OSB over either NFS or NDMP is simple, straightforward, and provides unparalleled analytic ability.

Tools For Tiers

If you’ve read this far, you probably already have a pretty good idea of what to use for which tier. ACSLS, STA, ZFS, and OSB all factor into every tier of backups in one way or another. By tier:

  1. ZDLRA with a sub-15-minute recovery point objective.

  2. ZFS Snapshots, hot backups to tape and/or OSB Disk Targets, and for some specific environments SMU may be appropriate, with a 15-minute recovery point objective.
  3. ZFS Snapshots are the primary “backup”, with a far more generous 24-hour recovery point objective using OSB disk and tape targets.
  4. ZFS Snapshots as the primary or only “backup”; no specific recovery point objective as the environment could be reconstructed if necessary.

I hope this is helpful for you when figuring out how to back up your Red Stack. All the best!

“The Flaw”

Just watched “The Flaw”. It’s an entertaining and surprisingly unbiased documentary covering the myriad causes of the 2008 financial disaster from which the world is still recovering.

The most startling realization of the film for me is that from 1977 to 2007 the American people collectively engaged in the largest redistribution of wealth in world history, transferring money from the poorest 65% to the top 1%, from people who would spend the money to those who tend to invest the money rather than spend it. And we did all of this VOLUNTARILY through debt.

The second most startling realization is that we are still doing this. And it’s accelerating. The poorest among us are once again making the richest richer, and the richest are once again investing in more debt-based money-generating vehicles based on asset bubbles rather than investing in things that have worth due to their utility. All because, ultimately, exploitative debt-based real estate securities generate far more short-term profits than investing in factories and technologies that make real, tangible stuff.

Enjoy the respite from the housing bubble, folks. It’s still ongoing, and we’re still pumping twenty billion dollars a month into keeping up the illusion of wealth growth through home appreciation for the middle class, rather than into real, tangible wage increases and innovation in production.

My thoughts on the Apple Watch keynote

Watched the keynote today. Am I going to get an iWatch? No. Here’s why:

  1. 18-hour “typical day” battery life. Ouch. I expect a watch to last at least a full day on a charge, and less if I’m tracking a fitness activity with it (but I still expect 10+ hours during fitness activities). From early reports, under heavy use this “18 hour” battery life is really about two hours; there’s a reason the very first accessory available for the watch is an expansion battery.
  2. Patents have pretty well locked up the optical heart rate market, so unless Apple licensed one of the two major patent-holders, the optical heart rate is going to be terribly inaccurate under heavy motion, high heart rates, sweat, and for those with dark skin.
  3. No waterproofing. Just splash-resistance. This is the deal-breaker for me. My fitness watch needs to be able to go into the pool, reservoir, or ocean and be 100% fine in an unexpected downpour when I’m on the bike or the run.
  4. Total dependence on an iPhone. I want my wearable to track movement, distance, and activities even if I choose to leave the phone at home while hitting the weights, pool, bike, or track.

You won’t notice “price” on my list. Like most Apple products, when you evaluate the capabilities, weight, and feature set at day of release, Apple products are actually very competitive. At $349, I think it’s going to sell like gangbusters, with a compelling feature set that eclipses much of the similarly-priced competition.

And I hope they sell a gazillion of them so they can eventually address the needs of multisport athletes.

Maybe in version 2.0. Or 3.0…

2015 Mock Sprint Tri Results

I had some issues with my Garmin 910xt, but eventually I fixed the mock tri file. Woot! Next time, I’ll disable all auto lap functionality before starting the tri, because apparently that’s what interferes with the run data & corrupts the file.

Total moving time (not stopped @ stoplights): 112 minutes (1 hr, 52 minutes). Or more or less totally in line with most average beginner times, with a slightly better bike and a considerably worse run. Not at all unexpected.

  • Mock Swim: 7:29. https://connect.garmin.com/modern/activity/715055281
  • T1: 7:26. https://connect.garmin.com/modern/activity/715055283 . I will do way better than this if I’m not DRIVING from the pool to my house for T1.
  • Mock Bike: 47:49 https://connect.garmin.com/modern/activity/715055284
  • T2: 2:05 https://connect.garmin.com/modern/activity/715055285
  • Mock Run: 47:07 https://connect.garmin.com/modern/activity/715055286 (This is the totally broken part)

Glad to have the data & compare it to my first super-sprint from last year:
  • RCStake Swim leg: I’m twice as fast (it was 300m 6x50m, not 700m): https://connect.garmin.com/modern/activity/560790985
  • RCStake Bike leg: 2MPH faster: https://connect.garmin.com/modern/activity/560790991
  • RCStake Run leg: OK, I was a little slower today than on the run leg last year. But the mock tri is nearly twice the length. https://connect.garmin.com/modern/activity/560790995

Observations:
  • My 910xt is finally recognizing my swim strokes as freestyle instead of backstroke! This means my form work is starting to pay off. And those laps I did do backstroke are almost twice as slow as freestyle, which clearly tells me I need to avoid backstroking if at all possible; a slow freestyle is faster than my fastest backstroke!
  • I blew up my legs on the uphill bike leg and didn’t work nearly hard enough on the back half of the ride while mostly cruising downhill. My calves cramped up on the first part of the run, probably from under-use on the second half of the bike ride.
  • I need to learn to aero, or spend more time in the drops. I spent maybe 25% of my time (or less) in aero on my road bike. Sure, they are just little shorty aero bars, but nonetheless it was windy and I think it would have helped.
  • Hydration & electrolytes were OK, but I think I’d do better with some timed nutrition: a little EFS electrolyte drink before the swim, a little on the bike, and my energy levels should stay a little more consistent on the run. More mental than physical, I think.
  • Transitions were rough. Going to optimize them a bit for my first sprint in two weeks.
  • Too much hotfoot & walking on the run. I should use my metatarsal pads on the bike ride and probably Vibrams instead of my clunky running shoes on the run. My turnover will be quicker, and for such a short duration on the run it should help avoid the hotfoot I often get on longer runs well over an hour.

Excited. Clearly I *can* finish the sprint tri in a reasonable amount of time, and I’m pretty certain there will be at least a few non-DNF people behind me at the end. Which is really all I can ask 🙂 — Matthew P. Barnson http://barnson.org/

ZFS Tricks: Scheduling Scrubs

Content mirrored at https://blogs.oracle.com/storageops/entry/zfs_trick_scheduled_scrubs

A frequently asked question on ZFS Appliance-related mailing lists is "How often should I scrub my disk pools?"  The answer is often quite challenging, because it really depends on you and your data.

Usually, when asked a question, I want to first provide the answers to the questions that should have been asked, so that I’m certain our shared conversational contexts match up. So here are some background questions we should answer before tackling the "How often" question.

What is a scrub?

To "scrub" a disk means to read data from all disks in all vdevs in a pool. This process compares blocks of data against their checksums; if any of the blocks don’t match the related checksum, ZFS assumes that data has been corrupted (bit rot happens to every form of storage!) and will look for valid copies of the data. If found, it’ll write a good copy of the data to that storage, marking the old copy as "bad".

What is the benefit of a disk scrub?

Most people have a lot more "stale" data than they think they do: stuff that was written once, and never read from again. If data isn’t read, there’s no way to tell if it’s gone bad due to bit rot or not. ZFS will self-heal data if bad data is found, so a scrub forces a read of all data in the pool to verify that it isn’t currently bit-rotted, and heal the data if it is.

What performance impact is there to a scrub?

The ZFS appliance runs disk scrubs at a very low priority as a nearly-invisible background process. While there is a performance impact to scrubbing disk pools, this very low-priority background process should not have much if any impact to your environment. But the busier your appliance is with other things, and the more data is on-disk, the longer the scrub takes.

How long do scrubs run?

On a fresh system with little data and low utilization, scrubs complete very quickly.  For instance, on a brand-new, quiescent pool with 192 4TB disks, scrubs typically complete in just moments. There is no data to read, therefore the scrubs return almost as soon as we start them.

On very busy systems with very large pools and lots of I/O, it’s possible for scrubs to run for months before completion. For example, a 192-disk, full-rack 7410 with 2TB drives in the Oracle Cloud recently required eight months to complete a pool scrub. The system was used around-the-clock with extreme write loads; the low quantity of RAM (256GB/head), compression (LZJB better than 2:1), and a nearly-full pool (80%+) conspired to force the scrub to run extremely slowly.

If the slow-running, low-impact scrub needs to complete in a shorter time than that, contact Support and ask for a workflow to prioritize your scrubs to run a little faster.  Realize, of course, that if you do so, the performance impact goes up because the scrubs run at higher priority!

Should I scrub my pools?

  1. Is the pool formatted with either RAIDZ or Mirror2 configuration? Although these two options offer higher performance than RAIDZ2 or Mirror3, redundancy is lower. (No, I’m not going to talk about Stripe. That should only ever be used on a simulator; I don’t even know why it exists on a ZFS appliance.)
  2. Are you unable to absolutely, 100% guarantee that every byte of data in the pool is read frequently?  Note that even databases that the DBAs think of as "very busy" often have blocks of data that go un-read for years and are at risk of bit rot. Ask me how I know…
  3. Do you run restore tests of your data less frequently than once per year?
  4. Do you back up every byte of data in your pool less frequently than once per quarter?

If you answer "Yes" to any of the above questions, then you probably want to scrub your pools from time to time to guarantee data consistency.

How often should I scrub my pools?

This question is challenging for Support to answer, because as always the true answer is "It Depends".  So before I offer a general guideline, here are a few tips to help you create an answer more tailored to your use pattern.

  1. What is the expiration of your oldest backup? You should probably scrub your data at least as often as your oldest tapes expire so that you have a known-good restore point.
  2. How often are you experiencing disk failures? While the recruitment of a hot-spare disk invokes a "resilver" — a targeted scrub of just the VDEV which lost a disk — you should probably scrub at least as often as you experience disk failures on average in your specific environment.
  3. How often is the oldest piece of data on your disk read? You should scrub occasionally to prevent very old, very stale data from experiencing bit-rot and dying without you knowing it.

If any of your answers to the above are "I don’t know", I’ll provide a general guideline: you should probably be scrubbing your zpool at least once per quarter. It’s a schedule that works well for most use cases, provides enough time for scrubs to complete before starting up again on all but the busiest & most heavily-loaded systems, and even on very large zpools (192+ disks) should complete fairly often between disk failures.
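
If, like us, you settle on a roughly quarterly cadence, here’s a quick sketch of the arithmetic that turns weeks into the second-based period values used by the appliance’s workflow scheduler; the scheduled-scrub workflow further down uses the 12-week figure.

script
// Worked arithmetic only: weekly, 4-weekly, and 12-weekly periods in seconds,
// matching the sample "period" values in the scheduled-scrub workflow below.
var secondsPerWeek = 7 * 24 * 60 * 60;                          // 604800
printf('Every week:     %s seconds\n', secondsPerWeek);
printf('Every 4 weeks:  %s seconds\n', 4 * secondsPerWeek);     // 2419200
printf('Every 12 weeks: %s seconds\n', 12 * secondsPerWeek);    // 7257600
.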

How do I schedule a pool scrub automatically?

There exists no easy mechanism to schedule pool scrubs from the BUI or CLI as of February 2015. I opened an RFE a few months back for one to be provided, but I’m not certain how far down the development pipeline such a feature is, if it will exist at all. So in Oracle IT, we just rolled our own.

The below code is an example of how this can be accomplished. It is provided as-is, with no warranty expressed or implied. Use it at your own risk.

It’s been working well for many months for us. Simply copy/paste the below code to some convenient filename, such as "safe_scrub.akwf".  Then upload the workflow to your appliance using the "maintenance workflows" BUI screen.  The default schedule runs once every 12 weeks on a Sunday. You can tweak it to match your needs either in the source code before uploading, or by visiting the "maintenance workflows" command-line interface and adjusting the schedule manually after you upload it.

/*globals run, continue, list, printf, print, get, set, choices, akshDump, nas, audit, shell, appliance*/
/*jslint maxerr: 50, indent: 4, plusplus: true, forin: true */

/* safe_scrub.akwf
 * A workflow to initiate a scrub on a schedule.
 * Author: Matthew P. Barnson
 * Update history:
 * 2014-10-09 Initial concept
 * 2014-11-20 EIS deployment
 * 2015-02-19 Sanitized for more widespread use
 * 2015-02-19 Multiple pool functionality added by: Adam Rappner
 */

/* This program is provided 'as is' without warranty of any kind, expressed or
 * implied, including, but not limited to, the implied warranties of
 * merchantability and fitness for a particular purpose.
 */

var MySchedules = [
    // Offset 3 days (Sunday), 9 hours, 00 minutes, week interval.
    // The UNIX Epoch -- January 1, 1970 -- occurred on a Thursday.
    // Therefore the ZFS appliance's week in a schedule starts on Thursday.
    // Sample offset: Every week
    //{offset: (3 * 24 * 60 * 60) + (9 * 60 * 60), period: 604800, units: "seconds"}
    // Sample offset: Every 4 weeks
    //{offset: (3 * 24 * 60 * 60) + (9 * 60 * 60), period: 2419200, units: "seconds"}
    // Sample offset: Once every 12 weeks on a Sunday
    {offset: (3 * 24 * 60 * 60) + (9 * 60 * 60), period: 7257600, units: "seconds"}
];

var workflow = {
    name: 'Scheduled Scrub',
    origin: 'Oracle PDIT mbarnson',
    description: 'Scrub on a schedule',
    version: '1.2',
    hidden: false,
    alert: false,
    setid: true,
    scheduled: true,
    schedules: MySchedules,
    execute: function (params) {
        "use strict";
        var myDate = run('date'),
            myReturn = "",
            pools = nas.listPoolNames(),
            p = 0;
        // Iterate over pools & start scrubs
        for (p = 0; p < pools.length; p = p + 1) {
            myDate = run('date');
            try {
                run('cd /');
                run('configuration storage set pool=' + pools[p]);
                run('configuration storage scrub start');
                myReturn += "New scrub started on pool: " + pools[p] + " ";
                audit('Scrub started on pool: ' + pools[p] + ' at ' + myDate);
            } catch (err) {
                myReturn += "Scrub already running on pool: " + pools[p] + " ";
                audit('Scrub already running on pool: ' + pools[p] + ' at ' + myDate);
            }
        }
        return ('Scrub in progress. ' + myReturn + '\n');
    }
};
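
One note on the offset arithmetic in MySchedules, since the Thursday-based week trips people up: counting from the Thursday start of the appliance’s schedule week, an offset of 3 days lands on Sunday, and the extra 9 hours puts the start at 09:00. Here’s a tiny illustrative sketch of that decoding:

script
// Illustration only: decode the schedule offset used in MySchedules above.
// Appliance schedule weeks start on Thursday (the UNIX Epoch began on a Thursday).
var days = ['Thursday', 'Friday', 'Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday'],
    offset = (3 * 24 * 60 * 60) + (9 * 60 * 60),
    dayIndex = Math.floor(offset / 86400),
    hour = Math.floor((offset % 86400) / 3600);
printf('Offset %s seconds = %s at %s:00\n', offset, days[dayIndex], hour);
.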

Happy scrubbing!

ZFS: Doing It Right

Imagine you’re a system administrator, and an email arrives from your boss. It goes something like this:

"Hey, bud, we need some new storage for Project Qux.  We heard that this [insert major company here] uses a product called the Oracle Sun ZFS Storage Appliance as the back-end for their [insert really popular app here]. We want to do something like that at similar scale; can you evaluate how well that compares to XYZ storage we already own?"

So you get in touch with your friendly local ZFS sales dudette, who arranges a meeting that includes a Sales Engineer to talk about technical stuff related to your application. The appliance, however, has an absolutely dizzying array of options.  Where do you start?

Without a thorough evaluation of performance characteristics, most people evaluating these appliances end up making one of two kinds of choices:

  1. ZFS choices that will almost certainly fail, and
  2. ZFS choices with a reasonable chance of success despite their lack of knowledge.

To start with, I’ll talk about Scenario 1: setting yourself and your ZFS evaluation up to fail: Doing It Wrong.

How Do People Do It Wrong?

I bumped into several individuals at OpenWorld that had obviously already made choices that guaranteed the ZFS appliance they purchased was not going to work for them.  They just didn’t know it yet. And of course, despite my best intentions to help them cope with the mess they made, they remained unsatisfied with their purchase.

Both the choices and outcome were eminently predictable, and apparently motivated by several common factors.

Misplaced Cost-Consciousness

From my point of view if someone isn’t ready to invest six figures in storage, then they aren’t yet ready for the kind of performance and reliability an enterprise-grade NAS like the ZFS appliance can offer them.  The hardware they can afford won’t provide them an accurate picture of how storage performs at scale.

Any enterprise storage one can buy at a four or five-figure price point is still a toy; a useful one, but still a toy compared with its bigger siblings.

It’ll be nifty and entertaining if the goal is to familiarize oneself with the operating system and interfaces. It will allow users to get a glimpse of the kinds of awesome advantages ZFS offers. It’ll offer a reasonable test platform for bigger & better things later as you explore REST, Analytics, Enterprise Manager, and the Oracle-specific optimizations available to you.  And perhaps it might serve reasonably well as a departmental file server or small-scale storage for a few dozen terabytes of data.  But it won’t offer performance or reliability on a scale similar to what serious enterprises deserve.

Misunderstanding Needs

Most customers that invest in dedicated storage for the first time don’t yet understand their data usage patterns. IOPS? A stab in the dark. Throughput? Maybe a few primitive tests from a prototype workstation. Hot data volume? Read response latency requirements? Burst traffic vs. steady-state traffic? Churn rate? Growth over time? Deduplication or cloning strategies? Block sizes? Tree depth? Filesystem entries per directory? Data structure? Best supported protocol? Protocol bandwidth compared to on-disk usage? Compressibility? Encryption requirements? Replication requirements?

I’m not saying one has to have all these answers prior to purchasing storage.  In fact, the point of this series is to encourage you to purchase a good general-purpose hardware platform that is really good at most workloads, and configure it in a way that you’re less likely to shoot yourself in the foot.  But over and over the people with the biggest problems were the ones who didn’t understand their data, yet hoped that purchasing some low-end ZFS storage would somehow magically solve their poorly-understood problems.

Lack Of Backups

Most data worth storing is worth backing up. While I’m a big fan of the Oracle StorageTek SL8500 tape silo, not everybody is ready for a tape backup solution that can span the size of a football field or Quidditch pitch.

Nevertheless, trusting that the inherent reliability and self-healing of a filesystem will see a company through a disaster is not a good idea.  Earthquakes, tornados, errant forklift drivers, newbie admins with root access, and overly-enthusiastic Logistics personnel with a box knife and a typo-ridden list of systems to move are all common.  Backups should be considered and implemented long before valuable data is committed to storage.

Solving Yesterday’s Problems

Capacity planning is crucial in the modern enterprise. While I’m certain our sales guys are really happy to sell systems on an urgent basis with little or no discount in response to poor planning on the part of customers, that kind of decision making is often really hard on the capital expense budget.

A big part of successful capacity planning is forecasting future needs. Products like Oracle Enterprise Manager and ZFS Analytics can help. Home-brewed capacity forecasting is viable and common. A system administrator is at her best when she’s already anticipated the needs of the business and has a ready solution for the future problems she knows will arrive eventually. With an enterprise NAS, a modest investment in hardware can continue to yield dividends as an admin better understands her data utilization patterns and learns to use the available tools to manage it intelligently.

How To Fail At ZFS And Performance Reviews

Here are the options I would pick if I wanted to set up my ZFS appliance to fail:

  • Go with any non-clustered option; reliability suffers. Failure imminent.
  • Choose the lowest RAM option; space pressure will make my bosses really unhappy with the storage as things slow down. Great way to fail.
  • Buy CPUs at the lowest possible specification; taking advantage of CPU speed for compression would make the storage run better, and using CPU for encryption gives us options for handling sensitive data. Don’t want that if our goal is failure!
  • Pick an absurdly low number of high-capacity, low-IOPS spindles, like maybe twenty to forty 7200RPM drives; I/O pressure will drive me nuts troubleshooting, but heck, it’s job security.
  • Don’t invest in Logzillas (SLOG devices). The resultant write IOPS bottleneck will guarantee everybody hates this storage.
  • If I do invest in Logzillas (SLOG devices), use as few as possible and stripe them instead of mirroring them; that kills two birds with one stone: impaired reliability AND impaired performance!
  • Buy Readzillas (L2ARC), but ignore the total RAM available to the system and go for the big, fat, expensive Readzilla SSDs because I think we’re going to have a "lot of reads" without understanding what Readzillas actually do. This will impair RAM performance further, wasting both my money AND squandering performance!

If you do the above, you’ll pretty much guarantee a bad time for yourself with ZFS storage.  Unfortunately, this seems to be the way far too many people try to configure the storage, and they set themselves up for failure right from the start.

So we’ve talked about Doing It Wrong. How do you Do It Right?

Do It Right: Rock ZFS, Rock Your Performance Review

In case you don’t know what I do, I co-manage several hundred storage appliances for a living (soon to be over a thousand, with hundreds of thousands of disks among them. Wow. The sheer scope of working for Oracle continues to amaze me!). Without knowing anything else about the workload except that the customer wants high-performance general-purpose file storage, below is the reference configuration I would pick if I want to maximize the workload’s chances of success.  If I think I need to differ from this reference configuration, it’s important to ask "How does this improve on the reference configuration?"  This reference configuration has proven its merit time and time again under a dizzying array of workloads, and I’d only depart from it under very compelling arguments to do so.

Such arguments exist, but if they are motivated by price, I am always trading away performance for a lower price!

Understanding The Basics

Guiding this reference configuration are the following priorities:

  1. Redundancy. If it’s worth doing, it’s worth protecting; the ZFS appliance is reliable because it’s very fault-tolerant and self-healing, not because the commodity Sun hardware it’s built with is inherently more reliable than competing options.
  2. Mirrored Logzillas (SLOG devices). Balance this with RAM and spindles, though, as too much of any of the three and one or more will be underused.  And for a few obscure technical reasons related to reliability, I strongly prefer Mirrored Logzillas over Striped.
  3. RAM. ZFS typically leverages RAM really well. You’ll want to balance this with Logzilla & spindles, of course, using ratios similar to the reference configuration.
  4. Spindle read IOPS. Ideally, I should have some idea of the total expected read IOPS of my application, and configure sufficient spindles to handle the maximum anticipated random read load.  If this kind of data is unavailable, I’ll default to the reference configuration.
  5. Network. 10Gbit Ethernet is cheap enough these days that any reasonable storage should use it. It’s still a really tough pipe to fill for most organizations since it’s so large, but it is possible.
  6. CPU. It’s almost an afterthought, really; even the lowest CPU configuration of a given appliance that is capable of handling 1TB of RAM per head (2TB per cluster) comes with abundant CPU. But if I want to use ZFS Encryption heavily, or use the more CPU-intensive compression algorithms, CPU becomes a pretty legitimate thing to spend some money on.
  7. Readzilla/L2ARC/Read Cache. The ARC — main memory — is really your best, highest-performing cache on a ZFS appliance, but if there are specific reasons for investing heavily in Readzilla (L2ARC) cache, we’ll know a few months after we start using it. Basically, if my ARC hit rate drops down into the 80% range or lower, I want to add a Readzilla or two to the system. The cool thing is, you can add these any time; you don’t have to put this into the capital expense budget up-front, but it’s something you can do responsively if the storage appliance use pattern starts to suggest you ought to.

Your Best Baseline Hardware Configuration

So here’s the hardware configuration we typically use in Oracle IT. It’s not the biggest, it’s certainly not the most expensive, but it has the advantage of simplicity, flexibility, and stellar performance for the vast majority of our use cases, and it all fits neatly into one little standard 48U rack.  I’ll hold off on part numbers, though, as those change over time.

  • ZS4-4 cluster (two heads).
  • 15 core (or more) processor.
  • 1TB or 1.5TB RAM per head (2TB or 3TB total RAM across the cluster).
  • Dual port 10Gbit NIC per head.  We typically buy two of these for a total of four ports for full redundancy.
  • Two SAS cards per head (required).
  • Clustron (pre-installed) to connect your cluster heads together.
  • 8 shelves. I suggest if you anticipate fairly low IOPS and mostly capacity-related pressures that you opt for the DE2-24C configuration (capacity), but if you think IOPS will be pretty heavy, opting for DE2-24P (performance) is a good alternative but with pretty dramatically reduced capacity.
  • 8x200GB Logzilla SSDs. This is probably overkill, but some few environments can leverage having this much intent log.
  • Fill those shelves with 7200RPM drives as required.  Formatted capacity in TiB as I recommend below will be around 44.5% of raw capacity in TB once spares and the conversion from TB to TiB are taken into account.  Typically in this configuration I’ll have 184 spinning disks, so whatever capacity of disk I buy, I can do the math (see the worked example just after this list).  The cool part is that I’ll roughly double this with LZJB compression on average mixed-use workloads, giving around 67% up to 106% of raw capacity when formatted and used.  Which is, in essence, freakin’ awesome.
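
As a quick illustration of that math, here’s a small sketch using hypothetical 4TB drives; the drive size, and therefore the totals, are placeholders, while the 44.5% formatted ratio and the rough 1.5x-2.4x LZJB range are back-calculated from the 67% and 106% figures above.

script
// Hypothetical worked example: 184 spinning disks at 4TB each (placeholder size).
var disks = 184,
    rawTB = disks * 4,                  // 736 TB raw
    formattedTiB = rawTB * 0.445,       // ~328 TiB formatted (spares and TB-to-TiB conversion included)
    lowLZJB = formattedTiB * 1.5,       // ~491 TiB usable at modest compression (~67% of raw)
    highLZJB = formattedTiB * 2.4;      // ~786 TiB usable at strong compression (~106% of raw)
printf('Raw: %s TB, formatted: ~%s TiB, usable with LZJB: ~%s to ~%s TiB\n',
    rawTB, Math.round(formattedTiB), Math.round(lowLZJB), Math.round(highLZJB));
.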

Fundamental Tuning & Configuration

Now let’s step into software configuration.  If you’ve configured your system as above, random writes are a breeze. Your appliance will rock the writes. The Achilles’ heel of the ZFS appliance in a typical general-purpose "capacity" configuration as above is random reads. They can be both slow themselves, and they can slow down other I/O. You want to do whatever you can to minimize their impact.

  • I’ll create two pools, splitting the shelves down the middle, and when setting up the cluster assign half of each shelf’s resources to a pool.
  • Those pools will be assigned one per head in the cluster configuration. This really lets us exploit maximum performance as long as we’re not failed over.
  • Use LZJB by default for each project. Numerous technical reasons for this; for now, if you don’t know what they are, take it on faith that LZJB typically provides a ZFS appliance a SERIOUS performance boost, but only if it’s applied before data is written… if applied after, it doesn’t do much.  This speeds up random reads considerably.
  • If using an Oracle database, just use OISP. It makes your life so so much easier from configuration to layout: two shares, and done.  If not using OISP, then pay close attention to the best practices for database layout to avoid shooting oneself in the foot!
  • If using an Oracle database, leverage HCC on every table where it’s practical. HCC-compressing the data — despite the CPU cost on your front-end database CPU initially — usually provides a pretty huge I/O boost to the back-end once again for reads. Worth it.
  • Scrub your pools. In a later blog entry I’ll discuss using a scheduled workflow to invoke a scrub, but for now just use Cron on an admin host, or assign some entry-level dude to mash the "scrub" button once a week for data safety. Around about year 3 of use, hard drive failure rates peak and continue failing at a more-or-less predictable rate indefinitely. There are certain extremely rare conditions under which it’s possible to lose data that is written once and very infrequently read in a mirror2 configuration; if you scrub your pools on a regularly-scheduled basis (at the default priority, this means more or less continuously), your exposure to the risk is dramatically lower to the point of "negligible risk".

Wrapping It Up

There you have it: an ideal general-purpose file server with good capacity, great performance for average loads, and something that in typical Oracle Database or mixed-use environments will really make you glad you invested in an Oracle Sun ZFS Storage Appliance.