Monday, July 15, 2024

Synthesized Google storage I/O traces for systems research

Advancing systems research: community access to synthesized Google storage I/O traces.
Designing large-scale distributed storage systems requires a thorough understanding of how storage hardware and software interact in real life. To help researchers, Google has released synthesized I/O traces from its storage servers and discs. The work behind them is described in the ASPLOS 2024 paper “Thesios: Synthesizing Accurate Counterfactual I/O Traces from I/O Samples.”

I/O traces: what are they and why are they important?

I/O traces record the input/output operations of storage devices and servers, and they are essential for studying storage behavior and performance. Traces that capture the diverse access patterns and demands of exascale data centers such as Google’s are especially valuable. Studying these traces lets researchers:

  • Understand storage system bottlenecks and performance.
  • Create more accurate models and workload simulations.

Thesios: Making Accurate Counterfactual I/O Traces from Samples

Designing large distributed storage systems requires representative I/O modelling. A crucial use case is counterfactual “what-if” evaluation of new storage policies or hardware before deployment. The authors propose Thesios, which accurately synthesizes hypothetical full-resolution I/O traces by combining down-sampled traces from multiple discs installed in various storage servers.

Applying this approach to real-world traces routinely sampled at Google, they show that their synthesized traces achieve 95–99.5% accuracy in read/write request numbers, 90–97% accuracy in utilization, and 80–99.8% accuracy in read latency compared to disc metrics.
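
To make percentages like these concrete, the sketch below scores a synthesized metric against a measured one as 1 minus the relative error, clamped at zero. This definition is an assumption for illustration; the paper's exact scoring may differ, and the example values are made up rather than taken from the paper.

```python
def metric_accuracy(synthesized: float, measured: float) -> float:
    """Return 1 - relative error, clamped to [0, 1].

    Assumption: accuracy is defined via relative error against the
    measured (ground-truth) disc metric; Thesios may score differently.
    """
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    relative_error = abs(synthesized - measured) / abs(measured)
    return max(0.0, 1.0 - relative_error)


# Hypothetical example values, not figures from the paper.
print(f"{metric_accuracy(synthesized=9_700, measured=10_000):.1%}")  # 97.0%
```

Clamping keeps a wildly wrong estimate (for example, one that is three times too large) from producing a negative accuracy.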


Four case studies show how Thesios can synthesize and analyze counterfactual I/O traces for hypothetical policy, hardware, and server changes:

  • Changing disc utilization, fullness, and capacity.
  • Evaluating new data placement policy.
  • Analyzing the power and performance effects of deploying discs with reduced RPM.
  • Understanding the impact of increased buffer cache size on a storage server.

Without Thesios, such counterfactual assessments would require expensive and risky A/B experiments in production, even though they are key to optimizing storage systems for efficiency and reliability. High-quality I/O traces are also difficult to collect, because storage systems are heterogeneous and information must be captured with little overhead. Thesios, the authors’ new technique, addresses these concerns.
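
The reduced-RPM case study rests on a well-known relationship: on average a request waits half a revolution for its sector to rotate under the head, so average rotational latency is 30,000 / RPM milliseconds. The sketch below computes only this one component; seek time, transfer time, queueing, and power are ignored, and the 7200 and 5400 RPM values are illustrative choices, not the drives studied with Thesios.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds.

    One revolution takes 60 / rpm seconds; on average a request waits
    half a revolution, i.e. 0.5 * 60_000 / rpm ms = 30_000 / rpm ms.
    """
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 30_000.0 / rpm


# Illustrative speeds only (not the drives evaluated in the paper).
for rpm in (7200, 5400):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
# 7200 RPM: 4.17 ms
# 5400 RPM: 5.56 ms
```

A full counterfactual analysis would feed this added delay into a queueing or disc model rather than looking at it in isolation.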


Introducing Thesios, an I/O trace synthesis method

Google Cloud created Thesios to build accurate, representative I/O traces. Thesios works from down-sampled I/O traces that Google’s data centres collect from numerous discs across different storage servers.
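
One basic ingredient of working with down-sampled traces from many discs and servers is bringing the records onto a single timeline. The sketch below shows only that step, as a standard k-way merge by timestamp, and assumes each input trace is already time-sorted. The record layout and field names are hypothetical, and this is not Thesios’s synthesis algorithm, which does considerably more than merging.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class IORecord(NamedTuple):
    """Hypothetical trace record; field names are illustrative only."""
    timestamp_us: int
    disc_id: str
    op: str          # "read" or "write"
    size_bytes: int


def merge_sampled_traces(traces: Iterable[Iterable[IORecord]]) -> Iterator[IORecord]:
    """Lazily merge per-disc sampled traces into one time-ordered stream.

    Assumes every input trace is already sorted by timestamp_us.
    heapq.merge performs a k-way merge without loading all inputs at once.
    """
    return heapq.merge(*traces, key=lambda r: r.timestamp_us)


disc_a = [IORecord(10, "a", "read", 4096), IORecord(40, "a", "write", 8192)]
disc_b = [IORecord(20, "b", "read", 65536), IORecord(30, "b", "read", 4096)]
print([r.timestamp_us for r in merge_sampled_traces([disc_a, disc_b])])  # [10, 20, 30, 40]
```

Using a lazy merge keeps memory proportional to the number of input traces rather than the total number of records, which matters at billions of I/Os.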

Thesios is unique in its ability to synthesise counterfactual I/O traces for data-driven “what-if” research. Four case studies show how Thesios enables different kinds of counterfactual I/O-trace synthesis and evaluation of possible policy, hardware, and server changes:

  • Synthesizing disc I/O traces with putative capacities, utilization, and fullness.
  • Using different workload filtering criteria to form hot and cold discs and analyzing their power consumption effects.
  • Assessing the energy and latency effects of a low-RPM disc.
  • Estimating how increasing server buffer cache size affects cache hits.
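
The buffer-cache case study amounts to replaying accesses against caches of different sizes and counting hits. The sketch below assumes an LRU eviction policy and a trace reduced to a plain sequence of block IDs; both are simplifying assumptions, and the paper’s cache model may differ.

```python
from collections import OrderedDict
from typing import Hashable, Sequence


def lru_hit_ratio(block_ids: Sequence[Hashable], capacity: int) -> float:
    """Replay block accesses through an LRU cache and return the hit ratio."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    cache: OrderedDict = OrderedDict()
    hits = 0
    for block in block_ids:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(block_ids) if block_ids else 0.0


# Toy access sequence (hypothetical block IDs, not from the released traces).
accesses = [1, 2, 3, 1, 2, 4, 1, 2, 3, 4]
for capacity in (2, 3, 4):
    print(capacity, lru_hit_ratio(accesses, capacity))
# 2 0.0
# 3 0.4
# 4 0.6
```

Even this toy run shows why the question is worth asking: hit ratio can change sharply, rather than linearly, as the cache grows past the working set.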

Why share these traces?

Google has released two months of synthesised, representative traces from three Google storage clusters, comprising 2.5 billion I/O records. The traces include I/O operations from both user-facing and internal applications. By sharing realistic workloads from its huge data centres to feed storage-systems research, Google hopes to:

  • Encourage storage technology optimizations and developments.
  • Allow more accurate large-scale storage system simulations and modelling.
  • Show how industry and academia may safely share production traces, boosting collaboration and progress.


The authors invite systems researchers to examine the Google I/O traces, which they see as a unique opportunity to explore large-scale storage and make real progress. The released repository contains Thesios-generated I/O traces from Google storage servers and discs.

Thesios synthesises representative I/O traces from down-sampled traces collected from the discs (HDDs) of Google’s distributed storage system, which are attached to numerous storage servers.

The paper “Thesios: Synthesizing Accurate Counterfactual I/O Traces from I/O Samples” focuses on creating accurate counterfactual input/output (I/O) traces from samples. Key points it presumably addresses include the following:

Objective: The primary goal is to produce precise counterfactual I/O traces, that is, traces describing how system I/O would plausibly behave in hypothetical situations. This can benefit debugging, performance optimization, and understanding of system behavior.


Sampling Techniques: The publication likely covers a range of sampling strategies for gathering I/O traces, such as random sampling, stratified sampling, or other statistical techniques that yield unbiased, representative data.
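
As a reference point for the simplest of these strategies, the sketch below keeps each record independently with a fixed probability (Bernoulli sampling) and scales the sample count back up by 1/rate, which gives an unbiased estimate of the full count. The 1% rate and the stand-in record list are illustrative assumptions, not Google’s sampling configuration.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bernoulli_sample(records: Sequence[T], rate: float, seed: int = 0) -> List[T]:
    """Keep each record independently with probability `rate`."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    rng = random.Random(seed)  # seeded for reproducibility
    return [r for r in records if rng.random() < rate]


def estimate_total(sample_size: int, rate: float) -> float:
    """Unbiased estimate of the full count: each kept record stands for 1/rate records."""
    return sample_size / rate


full_trace = list(range(100_000))  # stand-in for 100,000 I/O records
sample = bernoulli_sample(full_trace, rate=0.01)
print(len(sample), estimate_total(len(sample), 0.01))
```

The sample size varies from run to run, so the estimate is only right on average; stratified sampling (for example, sampling per disc or per operation type) can reduce that variance.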

Counterfactual Generation: It probably describes how the sampled data are used to create counterfactual traces. This can involve simulating the system’s behavior and predicting how it responds to different inputs or conditions.
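
To illustrate what simulating a system’s response can mean at its simplest, the sketch below replays request arrival times through a single first-come, first-served device and compares per-request latency under a baseline and a counterfactual service time. The single-queue model and all numbers are illustrative assumptions, far simpler than a realistic disc or server model.

```python
from typing import List, Sequence


def fifo_latencies(arrivals_ms: Sequence[float], service_ms: Sequence[float]) -> List[float]:
    """Per-request latency (queueing + service) for a single FIFO device.

    Assumes arrivals are sorted and the device serves one request at a time.
    """
    latencies = []
    free_at = 0.0  # time at which the device finishes its current work
    for arrival, service in zip(arrivals_ms, service_ms):
        start = max(arrival, free_at)
        free_at = start + service
        latencies.append(free_at - arrival)
    return latencies


arrivals = [0.0, 2.0, 4.0, 6.0]
baseline = fifo_latencies(arrivals, [3.0] * 4)
# Counterfactual: every request takes 1 ms longer (e.g. a slower disc).
slower = fifo_latencies(arrivals, [4.0] * 4)
print(baseline)  # [3.0, 4.0, 5.0, 6.0]
print(slower)    # [4.0, 6.0, 8.0, 10.0]
```

Note how a 1 ms increase in service time grows to a 4 ms increase for the last request once queueing is included, which is why counterfactual studies replay whole traces instead of adjusting averages.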


Performance Analysis: By studying counterfactual I/O traces, researchers and engineers can predict how changes to the system (e.g., hardware upgrades, software enhancements) could affect performance.

Debugging: Counterfactual analysis can help identify the root causes of problems by revealing how the system might behave differently if certain variables were changed.

System Optimization: Insights from counterfactual I/O traces can inform optimizations, helping to enhance efficiency and performance.


Accuracy: One major challenge is ensuring that the counterfactual traces accurately represent plausible real-world situations. The paper most likely covers techniques to validate the accuracy of the generated traces.

Complexity: Modeling complicated systems and generating counterfactuals can be computationally costly. The paper may discuss how to deal with this difficulty.

Conclusion: The paper probably ends with a discussion of what accurate counterfactual I/O trace creation implies for system design, analysis, and optimization.

Future Work: Ideas for future research include refining sampling strategies, improving modelling methodologies, and broadening the use of counterfactual I/O analysis.

Thota Nithya
Thota Nithya has been writing cloud computing articles for govindhtech since April 2023. She is a science graduate and a cloud computing enthusiast.

