I’ve been busy

So, yeah. Eleven years ago I basically stopped blogging. The reasons were many: three jobs, insufficient income, and a new gig that reminded me repeatedly and pointedly that having a public presence on the Internet I wasn’t being paid for was a liability. Plus I’d gotten into alternative social media channels for a while, then abandoned them completely as I got immersed in my work and other hobbies.

But lately, I’ve noticed a lack of authentic voices on the Internet. The Internet is such a clickbait-farming, ad-supported wasteland of barely-readable text. When I try to read thoughtful articles, a meager fraction of a paragraph is sandwiched between ads top and bottom, so I must scroll, tap, tap, tap, and scroll again to read the darn thing. Or I turn on “Reader Mode” and get the first paragraph and a subscription link so I can get more spamvertising in my inbox.

So anyway. Here’s my tiny little bump on the corner of the Internet, trying to provide valuable content that interests me. Frequency TBD. Topics TBD. But I hope you find it interesting.

Privacy in a coffee shop

So I have to post you about two things: the outcry regarding FB privacy abuses, and the state’s political reaction to that outcry.

https://www.wsj.com/articles/for-facebooks-employees-crisis-is-no-big-deal-1523314648

I don’t understand. You put your info and personal intimacies on FB. For years. For 10 years. Everything about yourself. For free. You do all this for free, putting your life online for 10 years. And then you complain when the internet service you’ve been using for free harvests your information? Like all of a sudden your privacy has been violated?

While privacy is at the forefront of the issue, the underlying question to me is the value of self-information, and the value of the exchange of self-information over which that privacy is being asserted.

If you and I were in a coffee shop, trying to have a private conversation, and we noticed someone eavesdropping on us…we’d kick their ass!!! But seriously, the coffee shop is a place where people exchange money with the shop for food and drink. Except there’s something else going on. There’s space to sit and work and relax. There’s wifi. But you don’t have to buy something from the shop to use its internet connection, or to sit, or to meet with others, or to transact personal and private business. The proprietor isn’t demanding you buy something to use those other services.

So people assemble in these private-public spaces all the time and yammer away about sensitive personal details with everyone within earshot hearing it. And that doesn’t even cover the supremely annoying people yammering away loudly on their phones.

FB is the internet’s coffee shop. And everyone is hanging out at this place a lot, A LOT, yammering away about their personal lives, and accepting that, since they’re not buying and never have bought anything from the food counter, the shop makes money by taking all that yammering, given to it for free, and turning around and selling it to advertisers.

When you give your data away for free in exchange for a service, you assign an informational value of zero to yourself. Your data and your privacy are worth nothing to you. That is the implicit statement: my privacy and personal data are worth nothing, because I give them freely to a service I know makes money by advertising against that data, and I ask nothing in return.

That value exchange of self-information seems to me to be the same whether you stop in the shop once or once a minute. The rate of exchange remains the same; the volume of self-information transferred doesn’t change its value from zero.

So you’ve been going to this coffee shop for some time, and one day you have a general chat about how hard it is to get your feet into your shoes, and the very next day you show up and at the table where you’re sitting is an advert for shoehorns and other fine accessories. And this goes on for a while, until related ads start showing up the minute after you mention a specific topic. At what point do you get up, leave the coffee shop, and never come back? Especially when you aren’t being forced to use this shop and there are other shops which provide similar service?

A year ago it became widely known that foreign nations were scraping data from this shop and buying political ads to influence the presidential election. Last month it became widely known that companies were indeed harvesting data from this shop to service those political persuasion campaigns.

Guess what? No one is leaving the coffee shop. A free and non-coerced civic polity continues to give away its data for free.

When something is free, you are the product. And you have assigned your own self-information a value of $0. So either leave the shop and never go back, or keep going to the shop and know what you are in for. Because it’s not called PrivacyBook.

So, again, this is what I don’t understand. People put personal intimacies on FB for years. For free. And all of a sudden their privacy has been violated?

But then…something far, far, FAR worse happens. The government decides it must intervene and assert authority, overstepping its role by somehow protecting people from their own lack of self-awareness. The government is not our Mom and Dad. The American people are not teenagers. The same thing happened when various levels of government tried to block the rise of Uber and AirBnB. Not only is society using these services, but by using them without reservation it is defending their right to exist. So let them do it. If people have a problem with privacy violations, and there is no illegal activity taking place, then let the people work it out.

(Side note: it just shocks me that politicians, particularly conservative ones, would inject themselves into the fray by attacking a corporate juggernaut and cornerstone of the American economy. While the privacy issue does seem in some ways a media hype job, per the above WSJ article, I’m surprised a conservative administration and legislative leadership are letting this attack happen. But that’s today’s world, when all you care about is votes and not principle.)

The bottom line is that I miss the community on Barnson.org. I understand my sentiment may not just be old-fashioned but a fossil emotion in the hyper-now digital world that is instantaneous, widespread engagement. But I don’t care. If FB went away tomorrow I wouldn’t miss 90% of the people who are my tagged ‘friends’ at that coffee shop. I miss this coffee shop. I miss the people I know and care about, and the quasi-privacy of our thoughtful, considerate conversation and debate within the back corner of the bigger shop that is the internet.

Trump revokes Washington Post’s campaign press credentials

So I have to post you. I’m no Trump supporter but I did happen to hit the WP yesterday when the headline “Donald Trump suggests President Obama was involved with Orlando shooting” was live.

http://mobile.reuters.com/article/newsOne/idUSKCN0YZ2DA

I was way shocked. I couldn’t believe that to be true. So I went to view Trump’s speech and nowhere did Trump say, at all, that Obama was involved with the Orlando shooting.

Of course I don’t condone revoking press credentials. But I do observe how, for the past several months, the WP has been unusually harsh and increasingly biased against Trump. The WP has gone from reporting the news to reporting its bias. My guess is the WP is doing this out of some internal crusade to protect journalism and defy those who would curtail a free press.

But that’s not the point of my posting you. The point is that I feel neither the WP nor the Trump campaign realize how this continued siege of negative reporting HELPS Trump. I feel there are many people out there, the DC-dislikers, who consider the negative reporting to be coming from a source representative of a Washington institution they want to change. To these DC-dislikers, the WP is mainstream, legacy media feeding their enmity. The more negative the reports against Trump, the more the DC-dislikers dig in their heels and align with Trump. It’s a strange and warped psychological situation.

And basically I see two mistakes. I see the editorial mistake of the WP drifting away from reporting events and plain facts, almost allowing the aggressive gonzo style of the late Hunter Thompson to seep into its writing. And I see the tactical mistake of the Trump campaign crediting legacy media with a coercive power the media only believes it still wields.

Handy Space Monitoring on ZFSSA

This is a re-post from my blog at http://blogs.oracle.com/storageops/entry/handy_space_monitoring

Semi-real-time space monitoring is pretty straightforward with ECMAScript & XMLRPC.  I’ve never really been a fan of using used + avail as a metric; it’s simply too imprecise for this kind of work.  With XMLRPC, you can gauge costs down to the byte, and with Javascript/ECMAScript you have some easy date handling for your report.

Here’s a code snippet to monitor fluctuations in your overall pool space usage.  Just copy-paste at the CLI to run it. Let’s call this "Matt’s Handy Pool Space Delta Monitor".  This one will update every 5 seconds; just change the "sleep" interval to whatever you need to increase or decrease the update speed; press CTRL-C a few times rapidly to exit.

There must be a way to get the ECMAScript interpreter to break out of the whole loop on the first CTRL-C, rather than just interrupting the current iteration and requiring multiple CTRL-C presses, but I’m not exactly certain how to do it:

script
var previousSize = 0,
  currentSize = 0;
while (true) {
  currentDate = new Date();
  currentSize = nas.poolStatus(nas.listPoolNames()[0]).np_used;
  printf('%s bytes delta: %s bytes\n',
    currentDate.toUTCString(),
    currentSize - previousSize);
  previousSize = currentSize;
  run('sleep 5');
}
.

Here’s some sample output from a very busy system which handles some of Oracle’s ZFS bundle analysis uploads.  The system is constantly extracting, compressing, and destroying data, so it’s pretty dynamic.

aueis19nas09:> script
("." to run)> var previousSize = 0,
("." to run)>   currentSize = 0;
("." to run)> while (true) {
("." to run)>   currentDate = new Date();
("." to run)>   currentSize = nas.poolStatus(nas.listPoolNames()[0]).np_used;
("." to run)>   printf(‘%s bytes delta: %s bytes\n’,
("." to run)>     currentDate.toUTCString(),
("." to run)>     currentSize – previousSize);
("." to run)>   previousSize = currentSize;
("." to run)>   run(‘sleep 5’);
("." to run)> }
("." to run)> .
Wed, 08 Jul 2015 17:44:31 GMT bytes delta: 102937482702848 bytes
Wed, 08 Jul 2015 17:44:36 GMT bytes delta: 0 bytes
Wed, 08 Jul 2015 17:44:42 GMT bytes delta: 362925056 bytes
Wed, 08 Jul 2015 17:44:47 GMT bytes delta: 1039872 bytes
Wed, 08 Jul 2015 17:44:52 GMT bytes delta: 424662016 bytes
Wed, 08 Jul 2015 17:44:57 GMT bytes delta: -181739520 bytes
Wed, 08 Jul 2015 17:45:02 GMT bytes delta: 0 bytes
Wed, 08 Jul 2015 17:45:07 GMT bytes delta: -362792960 bytes
Wed, 08 Jul 2015 17:45:13 GMT bytes delta: -56487936 bytes
Wed, 08 Jul 2015 17:45:18 GMT bytes delta: 0 bytes
Wed, 08 Jul 2015 17:45:23 GMT bytes delta: 311884288 bytes
Wed, 08 Jul 2015 17:45:28 GMT bytes delta: -3111936 bytes
Wed, 08 Jul 2015 17:45:33 GMT bytes delta: 329170944 bytes
Wed, 08 Jul 2015 17:45:38 GMT bytes delta: 94827520 bytes
Wed, 08 Jul 2015 17:45:44 GMT bytes delta: -24576 bytes
Wed, 08 Jul 2015 17:45:49 GMT bytes delta: 356221440 bytes
Wed, 08 Jul 2015 17:45:54 GMT bytes delta: -36864 bytes
Wed, 08 Jul 2015 17:45:59 GMT bytes delta: 503583744 bytes
Wed, 08 Jul 2015 17:46:04 GMT bytes delta: 175494144 bytes
Wed, 08 Jul 2015 17:46:10 GMT bytes delta: -342528 bytes
Wed, 08 Jul 2015 17:46:15 GMT bytes delta: 135242240 bytes
Wed, 08 Jul 2015 17:46:20 GMT bytes delta: -39769600 bytes
Wed, 08 Jul 2015 17:46:25 GMT bytes delta: -124416 bytes
Wed, 08 Jul 2015 17:46:30 GMT bytes delta: -136044544 bytes
^CWed, 08 Jul 2015 17:46:31 GMT bytes delta: 0 bytes
^C^Cerror: script interrupted by user
aueis19nas09:>

Caveats:

  • This isn’t actually a 5-second sample; it simply sleeps 5 seconds between sample periods, and due to execution time you will probably get a little drift that will manifest as a displayed interval of 6 seconds here & there if left running a long time.
  • If you wanted to modify this to be GB instead of bytes, you’d replace "currentSize - previousSize" with something like "Math.round((currentSize - previousSize) / 1024 / 1024 / 1024)", but that will probably just end up with a string of 0 or 1 results with such a short polling interval.  You’d need to see significant and rapid data turnover to get a non-zero result if polling by gigabyte every five seconds!
  • This only monitors the first pool on your system. To monitor another pool, you’d change "nas.listPoolNames()[0]" to "nas.listPoolNames()[1]" or whatever index your target pool has in the output of the "nas.listPoolNames()" command. (A sketch combining these last two tweaks appears right after this list.)
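
For example, here’s a variant that pulls those last two caveats together: it reports deltas in whole gigabytes for a pool chosen by name instead of by index, and it runs a fixed number of samples so it simply exits on its own rather than needing several CTRL-C presses. Treat it as a sketch rather than a tested tool; the pool name "mypool" and the sample count are placeholder assumptions you’d adjust for your own system, and everything else uses only the calls shown above.

script
// Placeholder assumptions: change targetPool to one of your own pool names
// (see nas.listPoolNames()) and samples to however many 5-second polls you want.
var targetPool = 'mypool',
  samples = 60,
  previousSize = 0,
  currentSize = 0,
  poolNames = nas.listPoolNames(),
  poolToUse = poolNames[0],
  i, s;
// Fall back to the first pool if the named pool isn't found.
for (i = 0; i < poolNames.length; i++) {
  if (poolNames[i] == targetPool) {
    poolToUse = poolNames[i];
  }
}
// As with the byte-based monitor above, the first "delta" is really the full pool size.
for (s = 0; s < samples; s++) {
  currentSize = nas.poolStatus(poolToUse).np_used;
  printf('%s %s delta: %s GB\n',
    new Date().toUTCString(),
    poolToUse,
    Math.round((currentSize - previousSize) / 1024 / 1024 / 1024));
  previousSize = currentSize;
  run('sleep 5');
}
.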

Enjoy!

Stuff Blog: Day 1

So I decided to create a “Stuff Blog” to document my adventure trying to sell down all the stuff in my life. Most of it I don’t need, and I want to get rid of as much as is practical.

Day 1 listings: My Garmin VivoSmart smart watch and my ChromeCast. Yeah, I know they are both small personal electronics; I’m going to try something larger tomorrow. Like maybe an old bed or an old desk or something.

Understanding the Oracle Backup Ecosystem

Mirrored at https://blogs.oracle.com/storageops/entry/understanding_the_oracle_backup_ecosystem

Table of Contents

  • Understanding the Oracle Backup Ecosystem
  • Backup/Restore Drivers
      • The “Oops”
      • Defending against and pursuing lawsuits
      • Taxes & Audits
      • Disaster Recovery
      • Reduce Downtime
      • Improve Productivity
  • The Backup/Restore Tiers
      • Tier 1 Backups
      • Tier 2 Backups
      • Tier 3 Backups
      • Tier 4 Backups
  • The Tools
      • ZDLRA
      • SMU
      • OSB
      • ACSLS
      • STA
      • Oracle ZFS Storage
  • Tools For Tiers

Understanding the Oracle Backup Ecosystem

A frequent question I hear these days is something along the lines of “How is Oracle IT leveraging the Zero Data Loss Recovery Appliance, Oracle Secure Backup, and ZFS together?”

Disclaimer 1: The opinions in this blog are my own, and do not necessarily represent the position of Oracle or its affiliates.

Disclaimer 2: In Oracle IT, we “eat our own dog food”. That is, we try to use the latest and greatest releases of our product in production or semi-production environments, and the implementation pain makes us pretty strong advocates for improvements and bug fixes. So what I talk about here is only what we’re doing right now; it’s not what we were doing a year ago, and probably won’t be what we’re doing a year from now. Some of today’s innovative solutions are tomorrow’s deprecated processes. Take it all with a grain of salt!

Disclaimer 3: I’m going to talk about some of my real-world, actual experiences here in Oracle IT over the past decade that influenced my position on backups. Don’t take these experiences as an indictment of our Information Technology groups. Accidents happen; some are preventable, some not. The real key to success is not in not failing, but in moving forward and learning from the experience so we don’t repeat it.

Backup/Restore Drivers

Typically, the need for offline backup & restore is driven by a few specific scenarios.

The “Oops”

Humans are fallible. We make mistakes. The single most common reason for unplanned restores in Oracle IT is human error. This is also true for other large enterprises: Google enjoyed a high-profile incident of corrupted mailboxes several years ago due to a flawed code update. Storing data in the “cloud” is not a protection against human error. The only real protection you have from this kind of incident is some kind of backup that is protected by virtue of being either read-only or offline.

Defending against and pursuing lawsuits

In today’s litigious environment, being able to take “legal hold” offline, non-modifiable, long-retention backups of critical technology is a prerequisite to efficiently defending you and your company from various legal attacks. Trying to back up or restore an environment that has zero backup infrastructure in place is a huge hassle, and can endanger your ability to win a lawsuit. You want to have a mechanism in place to deal with the claims of your attackers – or to support the needs of your Legal team in pursuing infringements – without disrupting your normal operations.

Taxes & Audits

Tax laws in various countries usually require some mandatory minimum of data retention to satisfy potential audit requirements. If you can’t cough up the data required to pass an audit – regardless of the reason, even if it’s a really good one! – you’re probably facing a stiff fine at a minimum.

Disaster Recovery

I’m going to be real here. This is my blog, not some sanitized, glowing sales brochure. Everybody is – or should be! – familiar with what “Disaster Recovery” is. Various natural and man-made disasters have happened in recent decades, and many companies went out of business as a result due to inadequate disaster recovery plans. While the chance of a bomb, earthquake, or flood striking your data center is probably very low, it does exist. Here’s a short list of minor disasters I’ve personally observed during my career. There have been many more; I’ll only speak of relatively recent ones.

  • A minor earthquake had an epicenter just two miles from one of our data centers. I was in the data center in question at the time; it felt as if a truck struck the building. Several racks of equipment didn’t have adequate earthquake protection and shifted; they could easily have fallen over and been destroyed.

  • An uninterruptible power supply’s automated transfer switch exploded, resulting in smoke throughout the data center and a small fire that could have spread and destroyed data.

  • Another data center had a failure in the fire prevention system, resulting in sprinklers dousing several racks worth of equipment.

  • Busy staff and a flawed spreadsheet resulted in the wrong rack of equipment being forklifted and shipped to another data center.

  • A data center was in the midst of a major equipment move with very narrow outage windows. During one such time-critical move, facilities staff incorrectly severed the ZFS Appliance “Clustron” cables with a box knife before shipping the unit. I powered the unit up without detecting the break, resulting in a split-brain situation on our appliance that corrupted data. Mea culpa! Seriously, don’t do that. I don’t think the ZFSSA is vulnerable to this anymore as a result of this incident, but it was painful at the time and I don’t want anyone to go through that again…

  • Multiple storage admins on my team have accidentally destroyed the wrong share or snapshot on a storage appliance. When you have hundreds of thousands of similarly-named projects, shares, and snapshots, it’s nearly inevitable, even if the “nodestroy” bit is set: if the service request says to destroy a share, and all the leadership signed off on the change request for destroying it, you destroy it despite the “nodestroy” thing. But it’s quite rare.

  • Admins allowed too many disks to be evicted from the disk pool on an Exadata because ((reasons, won’t go into it)), resulting in widespread data loss and a data restore.

This was the minor stuff. Imagine if it were major! If you don’t have solid, tested disaster recovery plans that include some kind of offline or near-line backup, you’re exposed, and you could go out of business even from a user-induced disaster like those in the “Oops” category above.

Reduce Downtime

Having a good backup means less downtime for your staff when something goes wrong with your data. Knowing how long it takes to restore your data is a benefit of a regularly-scheduled restore test.

Improve Productivity

Finally, if you don’t have a good backup, the chance is high that you’ll eventually end up having to do some work over again due to lack of good back-out options. This loss of productivity hurts the bottom line.

The Backup/Restore Tiers

In any large enterprise environment, there exist multiple tiers of need for backup/restore. It’s often helpful to view backup and restore as a single tier: if your backup needs tend to be time-sensitive, your restore needs are probably even more so. Therefore, in the interest of simplicity I’ll assume your tier for restores mirrors your tier for backups.

Here’s how I view these tiers today. They aren’t strictly linear as below – there is a lot of cross-over – but they align nicely with the technologies used to back them up.

  1. Mission-critical, high-visibility, high-impact, unique database content.
  2. Mission-critical, high-visibility, high-impact, unique general purpose content.
  3. Lower-criticality unique database and general purpose content.
  4. Non-unique database and general purpose content.

Tier 1 Backups

For Tier 1 Oracle database backup and restore, there exists one best choice today: The Zero Data Loss Recovery Appliance, or "ZDLRA". While you can perform backups to ZFS or OSB tape directly – which works quite well, and we’ve done it for years in various environments – the ZDLRA has some important advantages I’ll cover below.

That said, though, the Oracle ZFS Storage Appliance in combination with Oracle Secure Backup can provide Tier 1-level backups, but the “forever-incremental” strategy available on ZDLRA is simply not an option. For Tier 1 non-ZDLRA backups, we resort to more typical strategies: rman backup backupset using a disk-to-disk-to-tape approach, NFS targets, direct-to-tape options, etc.

For Tier 1, you also want multiple options if possible: layer upon layer of protection.

Tier 2 Backups

For Tier 2 general-purpose content, the ZDLRA just isn’t particularly relevant because it doesn’t deal with non-Oracle-Database data. By calling it “Tier 2” I’m not implying it’s less important than Tier 1 backups, just that you have a lot more flexibility with your backup and recovery strategies. Tier 2 also applies to your Oracle database environments that do not merit the expense of ZDLRA; ZFS and tape tend to be considerably cheaper, but with a corresponding rise in recovery time and manageability.

In Tier 2, you’ll have the same kind of backup & restore windows as Tier 1, but will use non-ZDLRA tools to take care of the data: direct-to-tape backups, staging to OSB disk targets for later commitment to tape, etc. Like Tier 1, you want to layer your recovery options. Our typical layers are:

  1. Sound change management process to eliminate the most common category of “Oops” restores.

  2. Snapshots. Usually a week or more, but a minimum of 4 daily automated snapshots to create a 3-day snap recovery window.
  3. Replication to DR sites. For Oracle Database, this usually means “Dataguard”. For non-DB data, ZFS Remote Replication is commonly used and has proven exceptionally reliable, if occasionally a little tricky to set up for extremely large (100+TB) shares.
  4. For Oracle databases, an every-15-minutes archive log backup to tape that is sent off site regularly at the primary and DR site(s).
  5. Weekly incremental backups to tape, using whatever hot backup type of technology is available to us on the platform so that a backup is “clean” and can be restored from without corrupted in-flight data at both the primary & DR site(s).
  6. Monthly full backups to tape at both the primary & DR site(s).
  7. Ad-hoc backups to tape as required.

Tier 3 Backups

Leveraging the same toolset as Tier 2 backups, Tier 3 backups are simply environments that need less-frequent backups of any sort. It’s the kind of stuff that, if you lost access for 12-24 hours, your enterprise could keep running, though a bunch of users would be inconvenienced. It’s not stuff that endangers your bottom line – if it’s a revenue-producing service, it must be treated as Tier 1 or Tier 2, or else you might end up owing your customers some money back! – but it would be painful/irritating/time-consuming to reproduce.

In Oracle IT, this tier of data receives second-class treatment. It gets backed up once per week instead of constantly. Restore windows range from a few hours to a couple of days. Retention policies are narrower. Typically, very static environments like those held for Legal Hold or rarely-read data are stored in this tier. The data is important enough to back up, but the restoration window is much more fluid and the demands infrequent.

ZFS Snapshots are critical for this kind of environment, and typically will be held for a much longer period than the few days one might see in a production environment. Because the data is much more static, the growth of snapshots relative to their filesystems is very low.

Tier 4 Backups

The key phrase for backups in this tier is “non-unique”. In other words, the data could easily be reproduced with roughly the same amount of effort it would take to restore from tape. In general, these Tier 4 systems don’t receive much if any backup at all. ZFS snapshots occur on user-modifiable filesystems so that we can recover within a few days from a user “oops” incident, but if we were to lose the entire pool it could be reconstructed within a couple of days. Although it’s important to have some mechanism for tape backup should one be required, they will be the exception and not the rule.

The Tools

Now to the fun part. How do we glue these things together in various tiers? What tools do we use?

ZDLRA

  1. The forever-incremental approach to backups means that there is less CPU and I/O load on your database instance. Backup windows typically generate the heaviest load on your appliance, and since the ZDLRA should never require a full backup after the first one, it’s an outstanding choice for backing up I/O-challenged environments.

  2. The ZDLRA easily services a thousands-of-SIDs environment without backup collisions. This is really critical for Cloud-style environments with many small databases, where traditional rman scheduling tends to fall apart pretty easily due to scheduling conflicts over limited tape resources.
  3. Autonomous tape archival helps aggregate backups and provide on-demand in-scope Legal Hold, Disaster Recovery, Environment Retirement, and Tax/Audit backups to tape. Many may think “tape is dead”… but they think wrong!

SMU

Oracle’s SMU – the “Snap Management Utility” – is a great way to back up Tier 2 Oracle databases to ZFS. It handles putting your database into hot backup mode so that you can take an ACID-compliant snapshot of the data and set up restore points along the way. If you can’t afford ZDLRA, SMU + ZFS is a great first step. Just don’t forget to take it to tape too!

OSB

OSB version 12 provides “Disk Targets”. This, in essence, gives users of OSB 12 a pseudo-VTL capability. This new Disk Target functionality provides some other unique benefits:

  1. Aggregate multiple rman backups of smaller-than-a-single-tape size onto a single tape.

  2. With sufficient streams to disk, you can be rid of rman scheduling challenges that often vex thousands-of-SIDs environments when backing up to tape.
  3. By aggregating rman and other data to a single archive tape, you increase the density of data on tape, avoid buffer underruns, and maximize the free time for your tape drive. What often happens with a slow rman backup is that the tape ramps its speed down to match the input stream, doubling or even quadrupling the time the tape drive is busy. By buffering the backups to disk first, you can ensure the tape drive is driven at maximum speed once you’re ready to use “obtool cpinstance” to copy those instances to tape.
  4. Ability to use any kind of common spindle or SSD storage as a disk target. We use a combination of local disks on Sun/Oracle X5-2L servers running Solaris as well as ZFS Storage Appliance targets over 10Gb Ethernet.

ACSLS

Oracle’s StorageTek Automated Cartridge System Library Software – ACSLS for short – provides a profoundly useful capability: virtualization of our tape silos. We can present a single silo from our smaller SL3000 libraries to the Big Boy SL8500 library as a virtual tape silo to a given instance of OSB. This allows truly isolated multi-tenancy and reporting for individual customers or lines of business. This capability is leveraged to the max across all of our Enterprise, Cloud, and Managed Cloud environments.

STA

Oracle’s StorageTek Tape Analytics (STA) provides predictive failure analysis of tapes and silo components. All storage – tape, SSD, and magnetic spindle – will fail eventually. STA provides valuable insight into the rate of this decay, and works in tandem with ACSLS to proactively and predictively fail media out of the library when it’s no longer reliable.

Oracle ZFS Storage

Oracle’s ZFS Storage Appliance provides a uniquely flexible, configurable storage platform to leverage as a disk backup target, an rman “backup backupset” staging area for massive-throughput Oracle database backups, a remote replication source or target, and more. The proven self-healing capabilities of Oracle’s ZFS storage – particularly effective in a once-in, many-out backup situation – help guarantee that backups are healthy and exactly what you intended to commit to tape. In many ways, the ZFS Storage Appliance is the fulcrum around which all our other utilities rotate, and its seamless integration as a disk target for OSB over either NFS or NDMP is simple, straightforward, and provides unparalleled analytic ability.

Tools For Tiers

If you’ve read this far, you probably already have a pretty good idea of what to use for which tier. ACSLS, STA, ZFS, and OSB all factor into every tier of backups in one way or another. By tier:

  1. ZDLRA with a sub-15-minute recovery point objective.

  2. ZFS Snapshots, hot backups to tape and/or OSB Disk Targets, and for some specific environments SMU may be appropriate, with a 15-minute recovery point objective.
  3. ZFS Snapshots are the primary “backup”, with a far more generous 24-hour recovery point objective using OSB disk and tape targets.
  4. ZFS Snapshots as the primary or only “backup”; no specific recovery point objective as the environment could be reconstructed if necessary.

I hope this is helpful for you when figuring out how to back up your Red Stack. All the best!

“The Flaw”

Just watched “The Flaw”. It’s an entertaining and surprisingly unbiased documentary covering the myriad causes of the 2008 financial disaster from which the world is still recovering.

The most startling realization of the film for me is that from 1977 to 2007 the American people collectively engaged in the largest redistribution of wealth in world history, transferring money from the poorest 65% to the top 1%, from people who would spend the money to those who tend to invest the money rather than spend it. And we did all of this VOLUNTARILY through debt.

The second most startling realization is that we are still doing this. And it’s accelerating. The poorest among us are once again making the richest richer, and the richest are once again investing in more debt-based money-generating vehicles based on asset bubbles rather than investing in things that have worth due to their utility. All because, ultimately, exploitative debt-based real estate securities generate far more short-term profits than investing in factories and technologies that make real, tangible stuff.

Enjoy the respite from the housing bubble, folks. It’s still ongoing, and we’re still pumping twenty billion dollars a month into trying to keep the illusion of wealth growth through home appreciation for the middle-class rather than real, tangible wage increases and innovation with production.

My thoughts on the Apple Watch keynote

Watched the keynote today. Am I going to get an iWatch? No. Here’s why:

  1. 18-hour “typical day” battery life. Ouch. I expect a watch to last at least a full day on a charge; it’s fine if that drops while I’m tracking a fitness activity, but even then I expect 10+ hours. From early reports, under heavy use this “18 hour” battery life is really about two hours; there’s a reason the very first accessory available for the watch is an expansion battery.
  2. Patents have pretty well locked up the optical heart rate market, so unless Apple licensed one of the two major patent-holders, the optical heart rate is going to be terribly inaccurate under heavy motion, high heart rates, sweat, and for those with dark skin.
  3. No waterproofing, just splash-resistance. This is the deal-breaker for me. My fitness watch needs to be able to go into the pool, reservoir, or ocean, and be 100% fine in an unexpected downpour when I’m on the bike or the run.
  4. Total dependence on an iPhone. I want my wearable to track movement, distance, and activities even if I choose to leave the phone at home while hitting the weights, pool, bike, or track.

You won’t notice “price” on my list. When you evaluate the capabilities, weight, and feature set at day of release, Apple products are actually very competitive. At $349, I think it’s going to sell like gangbusters, with a compelling feature set that eclipses much of the similarly-priced competition.

And I hope they sell a gazillion of them so they can eventually address the needs of multisport athletes.

Maybe in version 2.0. Or 3.0…

2015 Mock Sprint Tri Results

I had some issues with my Garmin 910xt, but eventually I fixed the mock tri file. Woot! Next time, I’ll disable all auto lap functionality before starting the tri, because apparently that’s what interferes with the run data & corrupts the file.

Total moving time (not stopped @ stoplights): 112 minutes (1 hr, 52 minutes). That’s more or less in line with most average beginner times, with a slightly better bike and a considerably worse run. Not at all unexpected.

  • Mock Swim: 7:29. https://connect.garmin.com/modern/activity/715055281
  • T1: 7:26. https://connect.garmin.com/modern/activity/715055283 (I will do way better than this if I’m not DRIVING from the pool to my house for T1.)
  • Mock Bike: 47:49. https://connect.garmin.com/modern/activity/715055284
  • T2: 2:05. https://connect.garmin.com/modern/activity/715055285
  • Mock Run: 47:07. https://connect.garmin.com/modern/activity/715055286 (This is the totally broken part.)

Glad to have the data & compare it to my first super-sprint from last year:

  • RCStake Swim leg: I’m twice as fast (it was 300m 6x50m, not 700m): https://connect.garmin.com/modern/activity/560790985
  • RCStake Bike leg: 2MPH faster: https://connect.garmin.com/modern/activity/560790991
  • RCStake Run leg: OK, I was a little slower today than on the run leg last year. But the mock tri is nearly twice the length. https://connect.garmin.com/modern/activity/560790995

Observations:

  • My 910xt is finally recognizing my swim strokes as freestyle instead of backstroke! This means my form work is starting to pay off. And the laps I did do backstroke are almost twice as slow as freestyle, which clearly tells me I need to avoid backstroking if at all possible; a slow freestyle is faster than my fastest backstroke!
  • I blew up my legs on the uphill bike leg and didn’t work nearly hard enough on the back half of the ride while mostly cruising downhill. My calves cramped up on the first part of the run, probably from under-use on the second half of the bike ride.
  • I need to learn to ride aero, or spend more time in the drops. I spent maybe 25% of my time (or less) in aero on my road bike. Sure, they are just little shorty aero bars, but nonetheless it was windy and I think it would have helped.
  • Hydration & electrolytes were OK, but I think I’d do better with some timed nutrition: a little EFS electrolyte drink before the swim, a little on the bike, and my energy levels should stay a little more consistent on the run. More mental than physical, I think.
  • Transitions were rough. Going to optimize them a bit for my first sprint in two weeks.
  • Too much hotfoot & walking on the run. I should use my metatarsal pads on the bike ride and probably Vibrams instead of my clunky running shoes on the run. My turnover will be quicker, and for such a short duration it should help avoid the hotfoot I often get on runs well over an hour.

Excited. Clearly I *can* finish the sprint tri in a reasonable amount of time, and I’m pretty certain there will be at least a few non-DNF people behind me at the end. Which is really all I can ask 🙂 — Matthew P. Barnson http://barnson.org/

ZFS Tricks: Scheduling Scrubs

Content mirrored at https://blogs.oracle.com/storageops/entry/zfs_trick_scheduled_scrubs

A frequently asked question on ZFS Appliance-related mailing lists is "How often should I scrub my disk pools?" The answer is often quite challenging, because it really depends on you and your data.

Usually when someone asks me a question, I want to first provide the answers to the questions they should have asked, so that I’m certain our shared conversational contexts match up. So here are some background questions we should answer before tackling the "How often" question.

What is a scrub?

To "scrub" a disk means to read data from all disks in all vdevs in a pool. This process compares blocks of data against their checksums; if any of the blocks don’t match the related checksum, ZFS assumes that data has been corrupted (bit rot happens to every form of storage!) and will look for valid copies of the data. If found, it’ll write a good copy of the data to that storage, marking the old copy as "bad".

What is the benefit of a disk scrub?

Most people have a lot more "stale" data than they think they do: stuff that was written once, and never read from again. If data isn’t read, there’s no way to tell if it’s gone bad due to bit rot or not. ZFS will self-heal data if bad data is found, so a scrub forces a read of all data in the pool to verify that it isn’t currently bit-rotted, and heal the data if it is.

What performance impact is there to a scrub?

The ZFS appliance runs disk scrubs at a very low priority as a nearly-invisible background process. While there is a performance impact to scrubbing disk pools, this very low-priority background process should not have much if any impact to your environment. But the busier your appliance is with other things, and the more data is on-disk, the longer the scrub takes.

How long do scrubs run?

On a fresh system with little data and low utilization, scrubs complete very quickly.  For instance, on a brand-new, quiescent pool with 192 4TB disks, scrubs typically complete in just moments. There is no data to read, therefore the scrubs return almost as soon as we start them.

On very busy systems with very large pools and lots of I/O, it’s possible for scrubs to run for months before completion. For example, a 192-disk, full-rack 7410 with 2TB drives in the Oracle Cloud recently required eight months to complete a pool scrub. The system was used around-the-clock with extreme write loads; the low quantity of RAM (256GB/head), compression (LZJB, better than 2:1), and nearly-full pool (80%+) conspired to force the scrub to run extremely slowly.

If the slow-running, low-impact scrub needs to complete in a shorter time than that, contact Support and ask for a workflow to prioritize your scrubs to run a little faster. Realize, of course, that the performance impact goes up if scrubs run at higher priority!

Should I scrub my pools?

  1. Is the pool formatted with either a RAIDZ or Mirror2 configuration? Although these two options offer higher performance than RAIDZ2 or Mirror3, redundancy is lower. (No, I’m not going to talk about Stripe. That should only ever be used on a simulator; I don’t even know why it exists on a ZFS appliance.)
  2. Are you unable to absolutely, 100% guarantee that every byte of data in the pool is read frequently?  Note that even databases the DBAs think of as "very busy" often have blocks of data that go un-read for years and are at risk of bit rot. Ask me how I know…
  3. Do you run restore tests of your data less frequently than once per year?
  4. Do you back up every byte of data in your pool less frequently than once per quarter?

If you answer "Yes" to any of the above questions, then you probably want to scrub your pools from time to time to guarantee data consistency.

How often should I scrub my pools?

This question is challenging for Support to answer, because as always the true answer is "It Depends".  So before I offer a general guideline, here are a few tips to help you create an answer more tailored to your use pattern.

  1. What is the expiration of your oldest backup? You should probably scrub your data at least as often as your oldest tapes expire so that you have a known-good restore point.
  2. How often are you experiencing disk failures? While the recruitment of a hot-spare disk invokes a "resilver" — a targeted scrub of just the VDEV which lost a disk — you should probably scrub at least as often as you experience disk failures on average in your specific environment.
  3. How often is the oldest piece of data on your disk read? You should scrub occasionally to prevent very old, very stale data from experiencing bit-rot and dying without you knowing it.

If any of your answers to the above are "I don’t know", I’ll provide a general guideline: you should probably be scrubbing your zpool at least once per quarter. That schedule works well for most use cases, provides enough time for scrubs to complete before starting up again on all but the busiest and most heavily-loaded systems, and even on very large zpools (192+ disks) will usually complete between one disk failure and the next.
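
Incidentally, if you just want to kick off a scrub on every pool by hand right now, you can paste the same calls the scheduled workflow below uses straight into the CLI. Treat this as a minimal sketch: it assumes only the nas.listPoolNames() and "configuration storage" commands that appear in the workflow later in this post, and that starting a scrub on a pool which is already scrubbing throws an error you can safely ignore.

script
// Start a scrub on every pool. If a scrub is already running on a pool,
// the 'scrub start' command throws, and we just note it and move on.
var pools = nas.listPoolNames(),
  p;
for (p = 0; p < pools.length; p++) {
  try {
    run('cd /');
    run('configuration storage set pool=' + pools[p]);
    run('configuration storage scrub start');
    printf('Scrub started on pool %s\n', pools[p]);
  } catch (err) {
    printf('Scrub already running on pool %s\n', pools[p]);
  }
}
.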

How do I schedule a pool scrub automatically?

There exists no easy mechanism to schedule pool scrubs from the BUI or CLI as of February 2015. I opened an RFE a few months back for one to be provided, but I’m not certain how far down the development pipeline such a feature is, if it will exist at all. So in Oracle IT, we just rolled our own.

The below code is an example of how this can be accomplished. It is provided as-is, with no warranty expressed or implied. Use it at your own risk.

It’s been working well for us for many months. Simply copy/paste the below code to some convenient filename, such as "safe_scrub.akwf", then upload the workflow to your appliance using the "maintenance workflows" BUI screen. The default schedule runs once every 12 weeks on a Sunday. You can tweak the schedule to match your needs either by editing the source code before you upload it, or by visiting the "maintenance workflows" command-line interface and adjusting the schedule manually afterward.

/*globals run, continue, list, printf, print, get, set, choices, akshDump, nas, audit, shell, appliance*/
/*jslint maxerr: 50, indent: 4, plusplus: true, forin: true */

/* safe_scrub.akwf
 * A workflow to initiate a scrub on a schedule.
 * Author: Matthew P. Barnson
 * Update history:
 * 2014-10-09 Initial concept
 * 2014-11-20 EIS deployment
 * 2015-02-19 Sanitized for more widespread use
 * 2015-02-19 Multiple pool functionality added by: Adam Rappner
 */

/* This program is provided 'as is' without warranty of any kind, expressed or
 * implied, including, but not limited to, the implied warranties of
 * merchantability and fitness for a particular purpose. */

var MySchedules = [
    // Offset 3 days (Sunday), 9 hours, 00 minutes, week interval.
    // The UNIX Epoch -- January 1, 1970 -- occurred on a Thursday.
    // Therefore the ZFS appliance's week in a schedule starts on Thursday.
    // Sample offset: Every week
    //{offset: (3 * 24 * 60 * 60) + (9 * 60 * 60), period: 604800, units: "seconds"}
    // Sample offset: Every 4 weeks
    //{offset: (3 * 24 * 60 * 60) + (9 * 60 * 60), period: 2419200, units: "seconds"}
    // Sample offset: Once every 12 weeks on a Sunday
    {offset: (3 * 24 * 60 * 60) + (9 * 60 * 60), period: 7257600, units: "seconds"}
];

var workflow = {
    name: 'Scheduled Scrub',
    origin: 'Oracle PDIT mbarnson',
    description: 'Scrub on a schedule',
    version: '1.2',
    hidden: false,
    alert: false,
    setid: true,
    scheduled: true,
    schedules: MySchedules,
    execute: function (params) {
        "use strict";
        var myDate = run('date'),
            myReturn = "",
            pools = nas.listPoolNames(),
            p = 0;
        // Iterate over pools & start scrubs
        for (p = 0; p < pools.length; p = p + 1) {
            myDate = run('date');
            try {
                run('cd /');
                run('configuration storage set pool=' + pools[p]);
                run('configuration storage scrub start');
                myReturn += "New scrub started on pool: " + pools[p] + " ";
                audit('Scrub started on pool: ' + pools[p] + ' at ' + myDate);
            } catch (err) {
                myReturn += "Scrub already running on pool: " + pools[p] + " ";
                audit('Scrub already running on pool: ' + pools[p] + ' at ' + myDate);
            }
        }
        return ('Scrub in progress. ' + myReturn + '\n');
    }
};

Happy scrubbing!