With the advent of cloud computing and online services, large enterprises rely heavily on their datacenters to serve end users. A large datacenter facility incurs increased maintenance costs, in addition to service unavailability, when failures increase. Among server components, hard disk drives are known to contribute significantly to server failures; however, there is very little understanding of the major determinants of disk failures in datacenters. In this work, we focus on the interrelationship between temperature, workload, and hard disk drive failures in a large-scale datacenter. We present a dense storage case study from a population housing thousands of servers and tens of thousands of disk drives, hosting a large-scale online service at Microsoft. We specifically establish correlation between temperatures and failures observed at different location granularities: (a) across drive locations inside a server chassis, (b) across server locations in a rack, and (c) across multiple racks in a datacenter. We show that temperature exhibits a stronger correlation with drive failures than disk utilization does. We establish that variations in temperature are not significant in datacenters and have little impact on failures. We also explore workload impacts on temperature and disk failures and show that the impact of workload is not significant. We then experimentally evaluate knobs that control disk drive temperature, including workload and chassis design knobs. We corroborate our findings from the real data study and show that workload knobs have minimal impact on temperature. Chassis knobs like disk placement and fan speeds have a larger impact on temperature. Finally, we also show the cost benefit of proposed temperature optimizations that increase hard disk drive reliability.

The reliability of hard disk drives (HDDs) depends on the drive construction, as well as the operational and environmental conditions in which the drive is used. Self-monitoring, analysis and reporting technology (SMART) continuously provides attribute information on HDD usage and degradation characteristics. This paper aims to analyze the failures reported in the Backblaze data set for ST3000DM001 HDDs, which are intended for desktop applications but were used in a data center application. SMART attributes used for predicting failure are discussed and analyzed over the life of many hard drives. A case study on the actual use of SMART, the limitations of the SMART attribute information, the data center's information, and the use of desktop drives in a commercial application is also presented. The analysis showed that when Backblaze started to record the data, the hard disk drives had already been in operation for a while, with a power-on-hours mean and standard deviation of 6,683 and 365 h, respectively. Therefore, it is possible that some SMART attributes experienced critical values that were not recorded by Backblaze. Additionally, 8% of all ST3000DM001 drives that Backblaze labeled as failed did not have raw values above zero for the five attributes that were considered critical. Backblaze recorded 25 SMART attributes in total across all hard disk drive brands; the ST3000DM001, with 83.3% of those attributes recorded, ranked as the drive with the most attributes recorded. Having more recorded attributes with critical values leads to more ST3000DM001 drives being labeled as failed, while hard drives from other brands or part numbers may have experienced more critical SMART attribute values but were not labeled as failed because of the lack of records. It is an original work carried out at the Center for Advanced Life Cycle Engineering, University of Maryland.
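The 8% figure above is the share of failed drives whose critical SMART raw values never rose above zero. As a rough sketch of how such a share could be computed, the Python below works on a list of per-drive records. The column names, the choice of attributes 5, 187, 188, 197 and 198 as the "critical" five, and the toy records are all assumptions made for illustration; the abstract does not name the attributes or describe the authors' method.

```python
# Assumed column names for the five "critical" attributes; the abstract
# does not list them, so treat this tuple as a placeholder.
CRITICAL_COLUMNS = (
    "smart_5_raw",
    "smart_187_raw",
    "smart_188_raw",
    "smart_197_raw",
    "smart_198_raw",
)


def share_failed_without_critical(records, critical_columns=CRITICAL_COLUMNS):
    """Return the fraction of failed drives with no critical raw value above zero.

    Each record is a dict with a "failure" flag (1 = failed) and optional
    raw SMART values. Missing or None values are treated as zero.
    """
    failed = [r for r in records if r.get("failure") == 1]
    if not failed:
        return 0.0
    silent = [
        r for r in failed
        if all((r.get(col) or 0) <= 0 for col in critical_columns)
    ]
    return len(silent) / len(failed)


# Hypothetical toy records, not taken from the Backblaze data set.
records = [
    {"serial": "A", "failure": 1, "smart_5_raw": 12, "smart_187_raw": 0},
    {"serial": "B", "failure": 1, "smart_5_raw": 0, "smart_187_raw": 0},
    {"serial": "C", "failure": 0, "smart_5_raw": 0},
    {"serial": "D", "failure": 1, "smart_197_raw": 8},
]

share = share_failed_without_critical(records)
print(f"Failed drives without critical values: {share:.1%}")
```

Treating a missing value as zero is itself a modeling choice: it folds "attribute not recorded" into "attribute not critical", which is the very limitation the abstract points out.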