24.11.2025

Behind the Algorithm: The Real Story Is the Data

While the race between leading AI models dominates the headlines, the question of data is pushed aside. In a study we conducted with dozens of AI and data experts, we found that sophisticated algorithms and advanced models are not what determine success or failure, but rather the quality of the data throughout the entire lifecycle of an AI project. 

What Do AI and Data Experts Around the World Have in Common? 

To understand where the AI world stands today, you have to talk to the people on the ground. And that’s exactly what we did. 

In a recent study published by Dr. Maayan Nakash and me (“Behind the Algorithm: International Insights into Data-Driven AI Model Development,” conducted with the support of Tomer Yatzkan), we analysed dozens of in-depth interviews with senior AI and data leaders – CTOs, CAIOs, CDOs, engineers, and researchers from a variety of industries and countries. Our goal was to map how experts work with data across the AI lifecycle, and where their most significant bottlenecks lie. 

The picture that emerged was remarkably consistent: working with data is labour-intensive, iterative, and requires coordination across teams, infrastructure, and organizational procedures. More importantly, it is data – not the “magic” of algorithms – that ultimately determines the quality and trustworthiness of AI systems. 

We spoke with AI and data experts from the U.S., Europe, India, and Israel, and all of them emphasized the same point: no matter how advanced the model is, algorithmic sophistication is not the decisive factor in determining whether an AI system succeeds or fails. What truly drives performance, cost, and trust, both in the system and in the organization, is the data infrastructure: the quality of the data across the system’s entire lifecycle. 

Where Does the Problem Come From? 

Data in an organization is not a one-time “input”; it is a living supply chain – collection, cleaning, integration, labeling, documentation, and so on. At every junction, distortions can emerge: missing fields in new databases, inconsistent labeling across teams, incomplete metadata that hides crucial context, operational drift, and more. 
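As a sketch, junction-level distortions of this kind can be caught early with simple automated checks before data reaches a model. The example below is illustrative only: the field names, schema, and thresholds are assumptions, not taken from the study.

```python
# Minimal sketch of junction-level data checks (illustrative schema).
from collections import Counter

REQUIRED_FIELDS = {"id", "text", "label", "source", "collected_at"}  # assumed schema

def check_batch(records):
    """Report common pipeline distortions in a batch of record dicts."""
    report = {"missing_fields": Counter(), "label_variants": Counter(), "no_metadata": 0}
    for rec in records:
        # Missing fields in newly arrived data.
        for field in REQUIRED_FIELDS - rec.keys():
            report["missing_fields"][field] += 1
        # Inconsistent labeling across teams ("Spam" vs "spam " etc.).
        if "label" in rec:
            report["label_variants"][str(rec["label"]).strip().lower()] += 1
        # Incomplete metadata that hides crucial context.
        if not rec.get("source") or not rec.get("collected_at"):
            report["no_metadata"] += 1
    return report

batch = [
    {"id": 1, "text": "ok", "label": "Spam", "source": "crm", "collected_at": "2025-01-02"},
    {"id": 2, "text": "hi", "label": "spam "},  # missing metadata, variant label
    {"id": 3, "label": "ham", "source": "web", "collected_at": "2025-01-03"},  # missing text
]
print(check_batch(batch))
```

Checks like these are the entry point to the observability the experts call for: run them at every junction of the pipeline, and distortions surface where they are introduced rather than in the model’s predictions.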

Data is a living, breathing infrastructure. It changes in real time, carries human biases, and is shaped by cultural, business, and regulatory environments. Without clear visibility into how data flows throughout the organization, and without the ability to continuously improve it (not to mention allocating proper resources), systems often end up learning from outdated, faulty, or corrupted data. Such systems inevitably push the organization away from its desired business outcomes. 

The AI Lifecycle Model 

Data is not just a raw material – it’s the foundation of the entire product. 

In practical business terms, this leads to: 

  • Performance degradation: Noise or “dirty” data directly harms model predictions, leading to partial or incorrect insights – sometimes in critical contexts.
  • Loss of valuable information: Without proper monitoring, key fields are lost along the way. Any missing critical data point weakens the decision-making foundation and prevents the system from reaching its full potential.
  • Operational overhead: Teams find themselves stuck in a marathon of “data firefighting,” lacking real observability into data quality, pipeline flow, and the condition of each checkpoint.
  • Erosion of trust: Repeated errors or unexplained decisions undermine confidence from customers, users, boards, and even regulators. Lack of clarity in the data is fertile ground for uncertainty and reputational risk. 

A Paradigm Shift: From Model-Centric AI to Data-Centric AI 

One of the most interesting findings in our research is the gap between academic discourse and real-world practice. Although data is the “beating heart” of AI systems, most research remains model-centric, with only a small portion focused on data preparation, validation, and quality. In reality, however, practitioners spend most of their time on the painstaking, Sisyphean work of handling data. 

Thankfully, the industry is beginning to shift. 

Our research highlights the transition from model-first to data-first development. For years, the common assumption was that better models could fix everything. Today, AI experts overwhelmingly agree: improving data quality yields more meaningful, more cost-effective, and more trustworthy performance gains than model upgrades. 

The bottom line is simple: a model can be replaced with the click of a button; data culture must be built.
The organizations that will thrive in the AI era are those that recognize that the real, complex, and mission-critical work lies in the data. Those who embrace a data-first mindset today will gain more reliable, fair, and profitable systems tomorrow. Those who overlook data quality are playing “technological roulette” – except here, when the house loses, all of us pay the price. 

 

Based on a qualitative study conducted with dozens of senior data and AI leaders worldwide. For further reading:
Ziv, L., & Nakash, M. (2025). Behind the Algorithm: International Insights into Data-Driven AI Model Development. Machine Learning and Knowledge Extraction, 7(4), 122.
For more information: https://www.linkedin.com/in/limorziv 

 
