Very interesting interview with an Xbox insider
Inside Source Reveals the Truth About Xbox 360 "Red Ring of Death" Failures
The Xbox 360 "Red Ring of Death"
Since its launch in the fall of 2005, Xbox 360 systems all over the world have had major hardware failure problems, resulting in millions of customers having to mail their Xbox back to Microsoft. No one really knows what has been causing these problems, since the official line has never divulged the specific problems or rates of failure. All a person has to do is press the power button on their Xbox 360, and there is a chance that it will just up and fail to boot and show the "Red Rings of Death". Microsoft decided to extend the warranty for the Xbox 360, but the cloud of fear and uncertainty still hangs over the game system.
This past week I met and interviewed an individual who has worked on the Xbox 360 project for many years, and they had some things that they wanted to get out into the public. I have the fullest confidence in the integrity of my confidential source. While respecting and protecting his rights, we were able to have an in-depth interview about working on the Xbox project and just how things progressed to this point. Just keep in mind that a while back I broke the story that Bungie was leaving Microsoft and had all the details a full week before the official PR announcement. Once again I have a confidential source from inside Redmond, and it all checks out to me.
Now on to the Interview:
Q: So what do you think the real failure rate of the Xbox 360 is? Some have estimated it as high as 30%. I got my Xbox in early 2007 and so far so good, but what do you think the chance is that it's going to die on me one day?
It's around 30%, and all will probably fail early. This quarter they are expecting 1 M failures, most of those Xenons. Some of those are repeat failures. Life expectancy is all over the map because the design has very little margin for most of the important parameters. That means it's not a fault tolerant design. So a good unit may last a couple of years, while a bad unit can fail in hours. I have a launch unit and have not had a single problem with it. And it's used a lot. But I don't know anyone else with a 360 that hasn't broken, except you now. There's no way to tell when yours might die. But the cooler you can keep it, the longer it will probably last. So stand it up, keep it in free air, etc.
Note: Xenon was the code name for the first Xbox 360 mother board.
Q: Of all five videogame systems on the market now (PS3, PSP, PS2, Wii and 360), only the Xbox 360 has had such major hardware failure problems. Microsoft is also the only company based in the US making a videogame system. What part of Microsoft's way of doing things do you think caused this situation?
First, MS has under resourced that product unit in all engineering areas since the very beginning. Especially in engineering support functions like test, quality, manufacturing, and supplier management. There just weren't enough people to do the job that needed to be done. The leadership in many of those areas was also lopsided in essential skills and experience. But I hear they are really trying to staff up now based on what has happened, and how cheap staff is compared to a couple of billion in cost of quality.
Second, MS was so focused on beating Sony this cycle that the 360 was rushed to market when all indications were that it had serious flaws. The design qual testing was insufficient and incomplete when the product was released to production. The manufacturing test equipment had major gaps in test coverage and wasn't reliable or repeatable. Manufacturing processes at all levels of suppliers were immature and not in control. Initial end to end yields were in the mid 30%. Low yields always indicate serious design and manufacturing defects. Management chose to continue to ship anyways, and keep the lines running while trying to solve problems and bring the yields up. Whenever something failed and there was a question about whether the test result was false, they would remove that test, retest and ship, or see if the unit would boot a game and run briefly and then ship. 360 is too complex of a machine to get away with that.
In the end I think it was fear of failure, ambition to beat Sony, and the arrogance that they could figure anything out, that led to the decision to keep shipping. That management team had made some pretty bad decisions in the past and had never had to pay a proportional consequence. I'm sure they thought that somehow they would figure it out and everything would end up ok. Plus, they tend to make big decisions like that in terms of dollars. They would rationalize that if the first few million boxes had a high failure rate, a few tens of millions of dollars would cover it. And weighing that cost against a big lead on Sony, they would pay it in a heartbeat. They weren't even thinking about Nintendo.
Compare that to Sony, who delayed their launch, even though they were behind, when their box wasn't ready.
Q: In your opinion, what do you think the main cause of the Red Ring of Death failures has been?
RROD is caused by anything that fails in the "digital backbone" on the mother board. Also known as a core digital error. CPU, GPU, memory, etc. Bad parts, incompatible parts (timing problems) bad manufacturing process (like solder joints), misapplied heat sinks or thermal interface material, missing parts, broken parts, parts of the wrong value, missed test coverage. Any one or more, on any chip, or many other discrete components, would cause this. And many of the failures were obviously infant mortality, where they work when they leave the factory and fail early in use. The main design flaw was the excessive heat on the GPU warping the mother board around it. This would stress the solder joints on the GPU and any bad joints would then fail in early life.
There are also other significantly high failure rates in other areas, like the DVD.
Q: Can some games cause hardware failure more than others? Gears of War and Dead Rising were thought to be system killers when they came out.
Of course. Infant mortality defects, meaning a weakened mechanical "thing" like a solder joint with a void in it, are exercised to failure by cyclic stress. The number of cycles and the amplitude of temperature change from low to high determine how quickly it will fail. Certain games will consume more bandwidth on the GPU, which has the most substandard thermal solution on the mother board, making it a lot hotter, warping the mobo and flexing the solder joints. Weak joints fail quickly. The better the game, the more often it will be played, again accelerating failures.
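The link between temperature swing and cycles to failure described above is often approximated with a simplified Coffin-Manson relation, where cycles to failure scale as the temperature swing raised to a negative power. The sketch below is only an illustration and not something the source stated: the exponent of 2.0 and the two example temperature swings are assumptions picked for the example.

```python
def relative_cycles_to_failure(delta_t_ref, delta_t, exponent=2.0):
    """Ratio of cycles-to-failure at delta_t versus at delta_t_ref.

    Simplified Coffin-Manson scaling: N_f is proportional to
    delta_T ** -exponent. The exponent is an assumed, illustrative value.
    """
    if delta_t_ref <= 0 or delta_t <= 0:
        raise ValueError("temperature swings must be positive")
    return (delta_t_ref / delta_t) ** exponent


# Hypothetical swings: a light game cycles the GPU area by 40 C,
# a demanding one by 60 C.
ratio = relative_cycles_to_failure(40.0, 60.0)
print(f"Joint survives about {ratio:.2f}x as many cycles at the larger swing")
```

Under these assumed numbers a weak joint would last fewer than half as many on/off cycles with the hotter game, which is the effect the answer describes.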
Q: Let's go over some of the rumored reasons for RROD. Could you tell me how close each theory is?
Over heating CPU/GPU due to the lead free solder?
They don't overheat due to PB free. They overheat due to too much power dissipated in too small of an area, w/o a sufficient thermal management design to take the heat away from the junction of the transistors on the chips, the packages themselves, and the mobo. And the overheating is on the GPU. When the CPU heatsink is applied right, it does not overheat.
Defective parts due to overseas subcontractors?
Some defective parts, like BGAs where the solder balls are not of sufficient and uniform size, so they don't solder down evenly, or the substrate is warped, causing some joints to have insufficient solder. Bad chips from marginal or under tested wafers. Others are deficient processes, like misaligning the solder paste to the circuit board, or same on the parts, or not having the thermal profile right in the reflow oven during soldering. Manufacturers new to PB free tend to err on the low temp side thinking they are saving the parts reliability wise from a large thermal load. What they are really doing is not reflowing the PB free solder enough to make a good joint. PB free solder is non eutectic, which means the different metals in the solder alloy melt at different temperatures, unlike leaded solder where everything melts at the same temperature. If you under heat it, it won't bond well to the board or parts, won't form a good joint, leaving voids and other defects in the joints that lead to early failure under normal circumstances. But when you add the extraordinary heat and mother board warpage that goes with it, well you get a catastrophic failure rate like we've all seen on 360.
Defective or insufficient heat sinks?
A heat sink like the one they eventually put on the GPU would have helped a lot, since it stops the GPU heat from warping the mobo and breaking the solder joints. The CPU heatsink was fine. I've heard the memory was running hot too, and contributing to these failures. Not sure if they were heated by contact with the GPU heatsink, proximity on the mother board, or both. But with the new GPU heatsink the failure rate probably would have still been double digits overall. Way too high still.
Corrupt BIOS or OS bricking the system?
Maybe. But haven't heard of this outside of the periodic dash updates bricking boxes.
Is humidity a factor? Is an Xbox 360 in Florida just as likely to fail as a 360 in Seattle?
Humidity is a co-factor with temperature for many failure modes. The hotter the room ambient conditions, the more likely a 360 is to fail, all else being equal. Same for humidity.
Is keeping the 360 horizontal safer than keeping it vertical?
I don't think so. Vertical exposes more surface area and volume to heat exchange with cooler room air. And I think opens more vent holes. Just don't let it fall over.
System wide design problems due to a production schedule that shipped a full year before the competition's systems?
Yes. It just wasn't mature enough. Too many design defects, lack of design margins, immature test processes and equipment, insufficient PB free manufacturing expertise at partner manufacturers who made the mother board.
Or is there no one specific problem but a bunch of possible problems for each console?
Yes. See above.
Q: How have IBM and ATI dealt with the Xbox 360 problems?
Sorry, I don't know. But they were contracted to design and help launch the chips. After that, MS owned the design and tooling. So they didn't have to worry about it. Although I'm sure they were pulled in.
Q: Just what is up with the RROD "Towel Trick" fix?
My best guess is that it somewhat reflows the solder joints on the GPU while it's under a high compressive load from the heatsink clip, causing any open solder joints to make contact again. I don't think it's going to fully reflow them because 1) PB free solder melts above 300 degrees C, and if that happened the GPU would be pulled flat to the mother board with a big puddle of solder under it shorting everything out.
Q: One of the problems that I have run into with my 360 is that the disc tray will fail to eject and not let me swap discs. Have any ideas?
LOL. Reboot and try it again! Sorry, couldn't help myself. You didn't give me enough info. How often does it happen? Notice any conditions that tend to make it happen more repeatably (after long play, unit standing up, right after a previous eject, etc.)? Can you recover and get the tray open at some other time after it fails? What did you have to do? It might be as simple as a bad connection somewhere in the circuit for the eject button. Usually I'd recommend percussive maintenance (hit it) but that would probably damage the disc and could damage the console. So don't. Maybe the disc is jammed in there. Does the tray try to come out and then stop? Maybe there is a misalignment with the box case. See if you can find a place where it might be catching. If you can't find the problem, bring it with you when we meet and I'll look at it.
Q: What do you think of Karla Starr's Seattle Weekly article about video game hardware testing?
I read that when it came out. It's pretty accurate. I've been to VMC a few times where that testing is done. It's kinda brute force last stage game qual testing, after a lot of other testing has been done at the developer and MS. Funny, but you can only automate so much. And then you need to have people touch it and use it to find the unlikely bugs.
Q: How much more reliable is the current generation of Xbox 360 than the previous designs? Original Xenon, Zephyr and Falcon.
I've heard that the failure rate for the current design is sub 10%. Much much better, but still too high imho. And those designs haven't seen much life yet, so no one knows if that failure rate will hold.
Q: Do you think that the "Falcon" Xbox 360 design is the final Xbox 360 hardware iteration or will they come out with a redesigned Xbox 360?
They will come out with new hardware at least once a year until they retire this design. That's the console financial model. Keep the features and functionality the same, reduce cost and price, and improve quality if needed. The 360 roadmap always called for Si die shrink and integration, since that's where most of the cost is. Right now they are working to get the GPU and CPU on the same BGA package for the next mobo. Could lower cost, heat, number of heat sinks, mother board size (maybe squeeze the PS inside too), etc. Too bad that they screwed up and forgot to retain the JTAG (IEEE 1149.1) test functionality, at least what little they had. Now it will be almost impossible for them to tell if that chip is bad if the unit won't boot in the factory. So they will have to troubleshoot by replacing the most expensive part in the system blindly. They keep repeating bad decisions, and everyone is afraid to push issues considered to be bad news.
Q: Do you think that third party fans like the Nyko Intercooler will make things worse? Are they snake oil? I personally have plastic Tiki figures around my Xbox to ward off any evil spirits and so far they have done better in protecting than some of the fan coolers that you see at Gamestop.
I don't know, I'd have to test them. But I'll give you some thoughts. In order for those fans to do any good, they would have to increase the volume of air coming through the box w/o adding heat. I think those things are powered through the USB hub, which is specced at 5 volts, 1/2 an amp. So very little heat added. But the piggybacked fan would have to run at a higher volume than the box fan in order to unload it and make it spin faster, pulling more air over the heatsinks. Would be an easy test to run. Just tape a dry cleaning bag to the back with and w/o the extra fan and time how long it takes to fill. Or if you have access to one, an anemometer is a test instrument that measures airflow and would give a more accurate reading.
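The two back-of-the-envelope checks in this answer can be sketched in a few lines. This is only an illustration: the 5 V and 0.5 A figures come from the answer itself, while the bag volume and fill times below are made-up measurements, not test results.

```python
def usb_power_watts(volts=5.0, amps=0.5):
    """Upper bound on electrical power from the quoted USB figures (P = V * I)."""
    return volts * amps


def airflow_litres_per_second(bag_volume_litres, fill_time_seconds):
    """Average airflow implied by timing how long the exhaust takes to fill a bag."""
    if fill_time_seconds <= 0:
        raise ValueError("fill time must be positive")
    return bag_volume_litres / fill_time_seconds


print(f"USB power budget: {usb_power_watts():.1f} W")

# Hypothetical measurements: a 60 L bag fills in 12 s with the stock fans
# and in 10 s with the add-on cooler attached.
stock = airflow_litres_per_second(60.0, 12.0)
with_fan = airflow_litres_per_second(60.0, 10.0)
print(f"Airflow change: {100 * (with_fan - stock) / stock:+.0f}%")
```

If the fill time does not drop with the add-on attached, the extra fan is not moving more air through the box.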
Note: the Nyko Intercooler draws power from the 360's power supply, and that looks like a surefire way to potentially make things worse.
Q: How many times does an Xbox 360 unit have to be sent in and repaired before they will replace it with a completely new unit?
That's not how it works. You send in a broken box, you get back a working box (hopefully). So there is a rotating stock of the original units that get repaired and returned to service. Plus, they keep finding these caches of launch units here and there and using them too. Didn't you hear during the holidays that bundles were found with units made in 06? Those were pulled back from the retail channel last spring when the new heatsink was done, and had the new heatsink placed on them and then put into the shipping flow like any other box.
Back to the rotating inventory of launch units. You risk getting one of those back until the last one is out of the system. I imagine the next big outrage will be when one of the folks who waited till Falcon to buy a console for reliability reasons has to send it in for service and gets a Xenon back! Even when all of the Xenons are gone, you will likely get a newer gen repaired one back rather than new. Unless the fail rate gets so low there are none available. I'm holding my breath...
Q: How could the wireless racing wheel have overheating problems with the AC adapter? I can't think of any external video game accessory that had similar problems.
I don't know. I heard that one was an overreaction, and no test could have found it. That happens sometimes. A supplier changes something, or it happens so rarely that it can't be seen in any reasonable or even possible sample size. Like Xbox 1s catching on fire. That happened 25 times out of 25 million units. How can you test for that unless you know exactly what causes it? If you know, you design it out.
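The sample-size point can be made concrete with a little arithmetic. The sketch below takes the 25-in-25-million figure from the answer and asks how many units a test would need to have a 95% chance of seeing even one such failure. It assumes each unit fails independently with the same probability, which is a simplification, and the 95% target is an arbitrary choice for illustration.

```python
import math


def units_needed_to_see_failure(failure_rate, confidence=0.95):
    """Smallest sample size with at least `confidence` chance of one or more
    failures, assuming independent units with identical failure probability."""
    if not 0 < failure_rate < 1 or not 0 < confidence < 1:
        raise ValueError("failure_rate and confidence must be between 0 and 1")
    # Solve 1 - (1 - p) ** n >= confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log1p(-failure_rate))


rate = 25 / 25_000_000  # the quoted fire rate, i.e. one in a million
needed = units_needed_to_see_failure(rate)
print(f"Failure rate: {rate:.0e}; units to test: about {needed:,}")
```

Under these assumptions the answer comes out at roughly three million units, far beyond any practical pre-launch test run.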
Q: The original Xbox had a recall of some of the power supply cords. Did that affect the design of the 360?
Safety became a paramount concern. We realized that we could meet all regulations and still have problems. So extra effort was made to have zero safety defects. See the comment about 25 fires from this, above.
Q: There seems to have been an executive exodus from the top of the Xbox project. Seamus Blackley, Peter Moore, James Allard. Do you think that there is something that has been causing the "fathers of Xbox" to want to move on?
Seamus left a long time ago, and I think there was some conflict so that it wasn't entirely voluntary. J Allard left to go do Zune (along with Greg Gibson), and is a big part of the team who owns the strategic vision of MS E&D under Robbie Bach. Peter was a surprise. He sure left in a hurry, and not the way top people usually go, which is usually with a longer notice. And right after the warranty extension announcement. I don't know if they are related, but it looks like they could be in some way. I noticed you didn't mention Ed Fries, who left in 04. I heard he landed at Sony, but can't verify. But I don't see the senior team wanting to move or moving. Very few people who leave do so voluntarily.
Note: I did forget to mention Ed Fries.
Q: Do you see much of a long term future for Microsoft's Entertainment & Devices Division? I saw that they just got a new campus and troubled projects rarely get new expensive buildings. Do you see that division ever turning a profit? So what do you think their overall hardware strategy is? Do you think that they will still be selling videogame systems and music players in five years?
Xbox's mission statement is to preserve the Windows monopoly and extend it into the living room, as a media extender for a Media Center PC, along with a host of other MS and other companies' hardware devices that fit into a digital entertainment lifestyle. MS has the bucks to keep losing money on Xbox for a long time, maybe forever. They've already lost around 6 billion dollars. How are they ever going to make that back on Xbox? They can't. Maybe they don't think they have to. That amount might be just 1 or 2 quarters of profit for an integrated hw/sw portfolio, with Windows, PC hardware, Xbox, Zune, TV, movies, ads, etc., all providing some revenue stream to MS. You should check out their jobs site sometime. You can learn a lot about what they are doing. And their patent applications. They have a team working on making PCs now. That voice activated thing they did for Ford? Where do you think you will see that next? MS devices and sw is my guess.
That new H&E campus says that MS is getting into consumer electronics in a big way, and you can bet they are working to refine a strategy of integrating their offerings into a digital lifestyle universe, with most everything covered that we could want to stay productive, connected and entertained. Not piecemeal, like some companies seem to be approaching electronics. Look at Apple. They are doing great, keep rolling out innovative stuff, but what's their vision and strategy to implement? What's their roadmap and timeline? How does it all go together, work together? I can't tell from what they say or do. But I can see what MS is trying to do. They are just getting started I think. So yes, they will still be doing this in 5 years. But they really need to mature their business and change some blood in there. Hire some key people who have experience running large hardware companies who can put the right organization, process and infrastructure in place. If they don't, they may continue to have quality and operational issues that will really dampen their progress. And with all of the external challenges in consumer markets, even MS can't afford to be its own enemy for too much longer.
Q: Do you think that there is going to be a third generation Xbox?
I understand they are working on it right now. But don't look for it any time soon. It's years away. News flash: Sony and Nintendo are working on their next boxes in some way too.
Q: So do you play games?
Just a little. I lack the hardware abstraction layer in my brain that allows me to translate body motion into controller commands. If I am playing a racing game and I want to turn right I tend to turn the controller to the right. Just like the Wii. Funny thing. In the middle of '03 I tried to convince our director of "innovation" that we needed to do motion control, simple and intuitive controllers, and focus on family oriented and just plain fun content. Well before the Wii came out. He completely disregarded it. Oh well. I bet they wish they had that decision back as a do over.
Last edited: 19 Jan 2008