Judge (Tech) Advice By Results

Bennett Haselton writes "What advice would you give someone who just bought a new laptop? What would you tell someone about how to secure their webserver against attacks? For that matter, how would you tell someone to prepare for their first year at Burning Man? I submit that the metric by which we usually judge tech advice, and advice in general, is fundamentally flawed, and has bred much of the unhelpful tech advice out there." Read below to see what Bennett has to say.

First, take a step back and imagine trying to come up with good advice in an area where results are easy to measure, like weight loss. (For the sake of argument, assume the advice recipients are genuinely medically obese people who can benefit from safe weight loss, not anorexics.) Suppose you were trying to measure the effects of two pieces of weight-loss advice, say, Program 1 and Program 2. You would think the most straightforward way to measure the effectiveness of the programs would be to divide a group of 100 volunteers randomly into two groups of 50, then have Group 1 follow Program 1 and Group 2 follow Program 2 (with some type of monitoring for compliance). At the end of some time period, you simply measure which group has lost more weight (up to some healthy maximum threshold), and whichever program that group was following is the better program. What could be simpler than that? Isn't that the best, most obvious way to compare the two programs?

Actually no. I would say that's a terrible way to measure the two programs' effectiveness, under almost any reasonable set of assumptions about how the programs will be applied in the real world.

First of all, it's trivially easy to devise a program that would score really well under this system -- exercise for an hour and a half total every day, while eating nothing but fruits and vegetables and lean meats (or whatever would be considered a "perfect" diet by people who follow fanatically healthy eating habits -- I have no idea, because I don't). That by itself, though, is not a valid reason to reject this measurement: the fact that it's easy to score well under a particular measurement system doesn't mean the measurement is invalid.

The real problem with this metric is that it has no bearing on what good it would do to give this advice to people in the real world, because in the case of the work-out-and-eat-kale gospel, most people are not going to follow it. So consider an alternative metric: Take 100 volunteers, divide them randomly into two groups, tell Group 1 about Program 1, and tell Group 2 about Program 2. That's it -- but you have no power to force them to actually follow the advice. All you know is that they were all drawn from a pool of volunteers who were sincerely interested in losing weight, but if you make the advice too complicated, they'll tune out, or if you make the advice too hard to follow, they'll lose motivation. And then at the end of some time period, you check in and see which group has lost more weight. You could call this "whole-audience based results" (I promise I'm not trying to coin a neologism, but let's call it WABR), because you're looking at the results achieved by everyone who heard the advice, not just the people who were deemed to have "followed" the advice correctly. (The previously rejected metric, looking only at the results of people who are judged to have followed the advice correctly, could be called Compliance-Based Results or CBR).

Consider that if a fitness fanatic gives weight-loss advice to one particular person, who either doesn't follow it perfectly or quits after a short period, the advice-giver can always claim that the advice was great, the recipient just didn't "do it right". But if you're giving your advice to 50 people in Group 1, and someone else is giving different advice to 50 people in Group 2, the samples are large enough that the proportion of unmotivated people is going to be about the same in each group -- so if Group 2 loses more weight, you probably can't use the excuse that you got stuck with all the unmotivated losers in Group 1. The advice given to Group 2 must simply have worked better, because it struck a better balance between effectiveness and ease of compliance.
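The difference between the two metrics is easy to see in a toy simulation. Every number below -- effect sizes, compliance rates, the noise model -- is invented purely for illustration, but the shape of the result is the point: a drastic program wins under CBR while a modest, easy-to-follow program wins under WABR.

```python
import random

random.seed(0)

def run_trial(effect_kg, compliance_rate, n=50):
    """Simulate one group of n volunteers given a program.

    Each volunteer independently complies with probability
    compliance_rate; compliers lose effect_kg on average,
    non-compliers lose about 0 kg. Returns per-volunteer
    losses and a parallel list of who complied.
    """
    losses, complied = [], []
    for _ in range(n):
        c = random.random() < compliance_rate
        base = effect_kg if c else 0.0
        losses.append(random.gauss(base, 1.0))
        complied.append(c)
    return losses, complied

def cbr(losses, complied):
    """Compliance-Based Results: mean loss among compliers only."""
    kept = [l for l, c in zip(losses, complied) if c]
    return sum(kept) / len(kept) if kept else 0.0

def wabr(losses, complied):
    """Whole-Audience Based Results: mean loss over everyone,
    compliant or not (the compliance list is ignored on purpose)."""
    return sum(losses) / len(losses)

# Program 1: drastic (10 kg if followed) but only 10% comply.
# Program 2: modest (3 kg if followed) but 80% comply.
p1 = run_trial(effect_kg=10.0, compliance_rate=0.1)
p2 = run_trial(effect_kg=3.0, compliance_rate=0.8)

print("CBR  -- Program 1:", round(cbr(*p1), 1), "kg, Program 2:", round(cbr(*p2), 1), "kg")
print("WABR -- Program 1:", round(wabr(*p1), 1), "kg, Program 2:", round(wabr(*p2), 1), "kg")
```

Under CBR, Program 1 looks far better; under WABR, Program 2 wins, because the 90% who ignored Program 1 drag its whole-audience average toward zero.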

Under this metric, it's not as easy to come up with a "program" that would score well. Simply telling people "Just eat less and exercise more," for example, would obviously score terribly under this metric, since (1) "less" and "more" are not defined precisely and (2) most people in the target audience have heard this advice before anyway. You would have to think carefully about what kinds of cooking and diet advice are easy to follow and fairly enjoyable, or what kind of exercise advice would fit into the average person's lifestyle. If someone objects that "No one piece of advice works for everyone" -- fair enough, so you could even design a program that segments your target audience: "If you have lots of time on your hands but not a lot of money for things like fresh produce, do A, B and C. Otherwise, if you have a very busy schedule but you can afford to buy whatever you want, do X, Y, and Z." You could nonetheless combine all that "if-then-else" advice into a single program and call it Program 1 -- as long as the metric for the success of Program 1 is to give it to 50 volunteers who are interested in losing weight, and track how much weight they actually lose, without getting into arguments about whether they "really followed" the program or not.

If Michelle Obama made me her anti-obesity czar, that's more or less what I would do:

  • Recruit a large number of test volunteers who are interested in losing weight.
  • Recruit some (much smaller) number of doctors, nutritionists, and general fitness blowhards who are interested in giving people advice about losing weight.
  • Each advice-giver is allowed to submit a set of instructions on how to lose weight.
  • The volunteer pool is randomly divided into groups, and each group is assigned one of the submitted methods (probably after a panel of doctors pre-screened the methods for medical safety; otherwise, the winning method would probably end up being something involving heroin). That method is distributed to everyone in the volunteer group, but nobody will monitor them for compliance.
  • Check back in with each volunteer pool at the end of some time period. Whichever volunteer group has lost the most weight, the person who submitted the advice given to that group gets a million dollars, plus the glory that is rained down upon them as their winning advice is promoted to all the world.
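The protocol above can be sketched in a few lines. Everything here is hypothetical scaffolding: the function names are made up, and the `measure_loss` callback stands in for what would really be a months-long follow-up, not a function call.

```python
import random

def run_advice_contest(volunteers, methods, measure_loss):
    """Assign each volunteer a randomly chosen, pre-screened method,
    then score each method by the mean weight loss of EVERYONE who
    received it (the WABR principle) -- compliance is never checked.

    volunteers   -- list of volunteer identifiers
    methods      -- dict: submitter name -> instruction text
    measure_loss -- callback (volunteer, instructions) -> kg lost,
                    a stand-in for the real follow-up measurement
    """
    random.shuffle(volunteers)          # randomize before splitting
    names = list(methods)
    groups = {name: [] for name in names}
    for i, v in enumerate(volunteers):
        groups[names[i % len(names)]].append(v)  # round-robin split

    # WABR score: average over the whole group, dropouts included.
    scores = {
        name: sum(measure_loss(v, methods[name]) for v in vs) / len(vs)
        for name, vs in groups.items()
    }
    winner = max(scores, key=scores.get)  # this submitter gets the million
    return winner, scores
```

Note that nothing in the scoring step asks whether a volunteer "really followed" their group's method; the dropouts count against the method just as they would in the real world.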

No, really, seriously. If you want to reduce obesity rates in the country, shouldn't the ideal solution be something WABR-based, very close to this? It does no good to come up with a piece of advice that works well under CBR -- where you can force people to follow the program (or exclude them from the results if they don't) -- because that doesn't predict how the advice will work when distributed to the population at large, where of course you can't force people to follow the program. On the other hand, if the advice works reasonably well for a group of volunteers whose compliance is entirely up to them, then that should be a better predictor of how well it would work on a larger audience.

(Of course, someone might object that the true metric of healthy weight-loss advice is not how much weight you've lost after several months, but whether you've made a permanent lifestyle change that keeps it off even several years later. In that case you would just make that the new prize-winning criterion -- which group has lost the most weight and kept it off three years down the road -- but still sticking to the WABR principle.)

Another advantage of WABR is that it avoids squabbling over whether a person "really" followed the advice if they failed to achieve the desired result. If an advice-giver tells you to "eat less and exercise more," and you eat a little less and exercise a little more but see no noticeable change, then no matter how much less you ate or how much more you exercised, the advice-giver can always say that you didn't reduce your calories or exercise enough -- which makes the advice unfalsifiable, because there's no circumstance under which the advice-giver would have to admit they were wrong. This also applies to advice that's extremely difficult to follow, such as "Eliminate all sugar from your diet" -- if the advice fails, it would be easy for the advice-giver to find ways that the advice recipient deviated from the program (if they ate fruit -- which most doctors recommend -- does fructose count?). WABR means you don't have to adjudicate who actually followed the advice, because the results are collected from everyone who heard it.

Now, back to tech. I've deliberately avoided dwelling on technical examples, because after reading through the weight loss example, you can probably generalize this pretty easily. If Bob tells you to keep your new laptop virus-free by ditching Windows and all of your programs and switching to Linux, and Alice tells you to keep your new laptop virus-free by installing a free anti-virus program, then in a WABR test, I'll bet Alice's group would be left with fewer virus infections at the end of the year than Bob's group, for the simple reason that most people can't or won't follow Bob's advice. I'd even concede that the small number of people who do switch to Linux might have fewer viruses to deal with, but I'd say it's irrelevant. By any reasonable definition, Alice's advice is more helpful, or, simply put, better.
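The back-of-the-envelope arithmetic behind that bet looks like this. The follow rates and infection rates are pure guesses, invented just to show how the expected value shifts once you account for the whole audience:

```python
def wabr_infection_rate(follow_rate, rate_if_followed, rate_if_ignored):
    """Expected infections per user across the WHOLE group that heard
    the advice -- followers and non-followers weighted together."""
    return (follow_rate * rate_if_followed
            + (1 - follow_rate) * rate_if_ignored)

# Guessed numbers: an unprotected user averages 1.0 infection/year.
# Bob: "switch to Linux" -- near-perfect if followed, rarely followed.
bob = wabr_infection_rate(0.05, 0.0, 1.0)
# Alice: "install free AV" -- decent if followed, widely followed.
alice = wabr_infection_rate(0.90, 0.2, 1.0)

print(round(bob, 2))    # 0.95 infections/year per user
print(round(alice, 2))  # 0.28 infections/year per user
```

Even though Bob's advice is perfect for the few who take it, Alice's group ends the year with far fewer infections on average -- compliance dominates the outcome.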

When I wrote "4 Tips For Your New Laptop" for Slashdot last Christmas, I think I was subconsciously using WABR as a metric for how well the advice would work for people. Because if you sincerely want the advice to be helpful (and I did), shouldn't the definition of success be the average benefit across all the people who read or attempt to follow the advice -- rather than a 100% success rate among the readers who can follow it, when only 5% of them can?

One user posted this comment in response to the article:

First, syncing to cloud is not backup. Second, being at the mercy of a provider doesn't strike me as a good idea in long-term.

Better invest in a NAS. A 2-bay Synology would suffice. 2 4TB drives in Mirrored Raid work great. WD has the "red" line of drives specifically made and tested for NAS storage. They are not as fast but run cool, silent, no vibrations.

Most NAS units run on linux so you can easily add syncing, versioning, "personal cloud", maybe use to play movies on smart TVs via DLNA and so on.

Finally, from time to time do proper backups. For home use, proper backup means burning data on DVD/BD - on 2 separate discs.

OK. Let's suppose every word in that comment is correct. Now suppose we gave 50 people the advice from my original article, and 50 other people the advice I just quoted, but we have no power to actually force either group to follow the advice in either case. Which group do you think would have fewer computer catastrophes over the course of the year? (Yes, of course a lot of people would drop out of following the quoted advice because they didn't know what the guy was talking about, but imagine a version that had each sentence fleshed out in more detail explaining the acronyms and describing what the hardware costs. I still think my simpler advice would win.) I don't mean to pick on that guy in particular. Most computing advice out there would not score very well under WABR.

Similarly, when I wrote about how to make your first trip to Burning Man easier, it was partly in response to all the veterans who had given me CBR-based advice, like, "Build a hexayurt to sleep in." Of course, if you look only at a sample of people who actually did build a hexayurt at Burning Man, most of them probably had a great experience there. But if your advice is to tell people to build a hexayurt, only a small proportion of them will try it (and if they try and fail, you can claim that they didn't actually "follow your advice"!). The advice I wrote was to buy a tent and stake it down, because I think that if you tell 50 people to do that, and tell another group of 50 people to build a hexayurt, the people that you tell to buy a tent are on average more likely to have a good experience. (Although it wouldn't be a huge difference, because most people that you tell to build a hexayurt, will eventually figure out that you were fucking with them and will buy a tent anyway.)

Of course, as I said in a previous article about the sorry state of cooking instructions on the Internet (scroll down to the part about jalapeno poppers), the real reason most directions on the Internet suck is that they were written to grab search engine traffic. That just requires some keywords to appear in the title of the page and in multiple spots in the body content, and has nothing to do with whether the directions work. So nothing I say is going to change the minds of people who are farming "how-to" content for some extra clicks.

I'm more concerned about people who are genuinely trying to be helpful, but fall back on advice that would do well under CBR but badly under WABR. Consider -- if your goal in giving the advice is, very generally, to bring the greatest benefit to the average person hearing it, then WABR should be your metric for success, shouldn't it? Obviously I'm not suggesting that it's usually practical to test one piece of advice against another by recruiting 100 volunteers, dividing them into two groups of 50, etc. I'm saying that in cases where it's instinctively very likely that one piece of advice would do much better under WABR than another, then that's the advice you should give to people -- a fact that is lost on the leet hax0rs who think they're being useful by saying things like "Dump Windows and install Linux."

And it's not merely that advice which scores poorly under WABR is unhelpful. WABR measures how helpful a person's advice actually is to other people, so if a person gives advice that they can't possibly sincerely believe would score well by that metric, it comes across as caring more about something other than being helpful. Perhaps the advice-giver wants to sound smart, or simply wants to avoid ever having to admit they were wrong (if you make your advice hard to follow, that reduces the chance of somebody actually climbing that mountain and then pointing out that your suggestion didn't work). So it's not just that the advice-giver is being unhelpful; it's that they're being a dick.

For a long time, I would hear pieces of tech advice that I knew would probably give a good result if I followed them to the letter (i.e. would do well under CBR), but something would nag at me, not only making me think that I probably would not end up with a good result, but making me resent the advice-giver for some reason that I couldn't precisely define. Now, I think, I've precisely defined it: I should have told them, "If you gave this advice to 50 people, and some other comparable advice to another similar group of 50 people, and if we measured the results by looking at everybody in each group without getting into arguments over whether they 'properly followed' the advice or not, you must be aware that the advice you just gave me would score worse than any number of alternatives that you could have supplied with just a little more effort." Unfortunately that's not very compact.

So, if someone asks you for general technical guidance, I submit you will be doing them a favor if you keep WABR in mind. I would also advocate for it as a way to settle disputes over which of two pieces of third-party advice is actually "better".

According to my own rule, though, I'm not sure how many people reading this will actually keep this approach in mind next time they're giving technical advice. On the other hand, it's hard to imagine an alternative exhortation that would achieve a better result.
