The Talent500 Blog
Comprehensive AI Forecasting: Salesforce’s Innovative Approach

Salesforce operates data centers worldwide and continuously monitors infrastructure health metrics in real time. Accurate demand forecasting is essential for provisioning infrastructure capacity: insufficient capacity can lead to customer-impacting incidents, while excess capacity may cause budget overruns. Teams such as capacity planning, finance, and performance engineering depend on reliable forecasts to ensure cloud infrastructure scales effectively, maintaining high availability and cost efficiency.

Expanding AI Capabilities

In early 2023, the Infrastructure Data Science (InfraDS) team faced a challenge: expanding infrastructure health forecasting from the five critical services it had previously covered to all 100+ services at Salesforce. Growing the number of data scientists in proportion was clearly not the right answer. Instead, the team built a new configuration-driven Time Series Forecasting Platform designed to handle this increased scale. As a result, the platform has grown from five to over 70 forecasting use cases and now generates millions of time series forecasts daily.

Moreover, the time required to deploy new models has decreased from weeks to days. This expansion illustrates how Salesforce has scaled its time series artificial intelligence platform to meet the demands of its multi-cloud, billion-dollar infrastructure.

Challenges in AI Forecasting

Forecasting at scale presents unique challenges due to the lack of a universal modeling approach. Each new use case compels data scientists to balance model accuracy, hierarchical coherence, awareness of concept drift, and resilience. For instance, stability becomes crucial in long-range forecasts, which often incorporate economic drivers as inputs.

In contrast, short-range forecasts need to adapt swiftly to data drift and typically exclude much exogenous information. Foundation models like Moirai and TimesFM offer flexible, zero-shot frameworks that handle diverse use cases and show great promise for generic forecasting problems. However, new time series models will keep emerging, and, unlike in fields such as NLP and computer vision where complex models are the norm, simpler models like ARIMA, Prophet, and XGBoost remain essential in time series forecasting for their interpretability and low overhead.

Managing a high volume of these simpler, personalized models — one for each dataset — poses its own set of challenges, making rapid iteration from local experiments to production imperative.
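To make the "one simple model per dataset" idea and the hierarchical coherence requirement concrete, here is a minimal sketch. It is not the InfraDS implementation: a seasonal-naive baseline stands in for models like ARIMA or Prophet, bottom-up aggregation stands in for whatever reconciliation method the team uses, and the series names and hierarchy are hypothetical.

```python
from typing import Dict, List


def seasonal_naive(history: List[float], season_length: int, horizon: int) -> List[float]:
    """Forecast by repeating the last full season of observations."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[step % season_length] for step in range(horizon)]


def forecast_leaves(
    series_by_key: Dict[str, List[float]], season_length: int, horizon: int
) -> Dict[str, List[float]]:
    """Fit one independent model per leaf series (here: a seasonal-naive baseline)."""
    return {
        key: seasonal_naive(history, season_length, horizon)
        for key, history in series_by_key.items()
    }


def bottom_up(
    leaf_forecasts: Dict[str, List[float]], parent_of: Dict[str, str]
) -> Dict[str, List[float]]:
    """Add each leaf forecast to all of its ancestors so parents equal the sum of their children."""
    coherent = {name: list(values) for name, values in leaf_forecasts.items()}
    for leaf, values in leaf_forecasts.items():
        node = parent_of.get(leaf)
        while node is not None:
            running = coherent.get(node, [0.0] * len(values))
            coherent[node] = [a + b for a, b in zip(running, values)]
            node = parent_of.get(node)
    return coherent


# Hypothetical hierarchy: service-level series roll up to data centers, then to a total.
history = {
    "dc1/db": [10.0, 12.0, 11.0, 13.0],
    "dc1/app": [5.0, 6.0, 5.0, 7.0],
    "dc2/db": [7.0, 8.0, 7.0, 9.0],
}
parent_of = {
    "dc1/db": "dc1",
    "dc1/app": "dc1",
    "dc2/db": "dc2",
    "dc1": "total",
    "dc2": "total",
}

leaf = forecast_leaves(history, season_length=2, horizon=3)
coherent = bottom_up(leaf, parent_of)
```

Because every parent is built by summing its children, the data-center and total forecasts cannot disagree with the service-level ones, which is the property "hierarchical coherence" refers to.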

Innovative Solutions for AI Scalability

Inspired by “human-centric” frameworks like Metaflow, the team decided to standardize and abstract the common data and compute infrastructure requirements of each forecasting project while keeping algorithmic development flexible. With the engineering tooling hidden behind a YAML interface, data scientists can focus on modeling while platform maintainers keep the infrastructure running smoothly.

This approach provides a unified interface for the entire model lifecycle, including backtesting, distributed processing, and deployment, with built-in security and scalability. InfraDS simplified model management at scale by treating models as cattle instead of pets: interchangeable units managed uniformly through configuration rather than individually hand-tended deployments.
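As a rough illustration of what a configuration-driven interface can look like, the sketch below validates a forecasting config and turns it into a typed object. The article does not show the platform's actual schema, so every field name here (`name`, `query`, `model`, `horizon_days`, `schedule`, `hyperparameters`) is hypothetical, and a plain dict stands in for a parsed YAML file to keep the example free of third-party dependencies.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping

# Hypothetical required fields; the real platform schema is not described in the article.
REQUIRED_KEYS = ("name", "query", "model", "horizon_days")


@dataclass(frozen=True)
class ForecastConfig:
    name: str
    query: str
    model: str
    horizon_days: int
    schedule: str = "daily"
    hyperparameters: Dict[str, Any] = field(default_factory=dict)


def load_config(raw: Mapping[str, Any]) -> ForecastConfig:
    """Validate a parsed config mapping and return a typed, immutable config object."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    horizon = int(raw["horizon_days"])
    if horizon <= 0:
        raise ValueError("horizon_days must be positive")
    return ForecastConfig(
        name=str(raw["name"]),
        query=str(raw["query"]),
        model=str(raw["model"]),
        horizon_days=horizon,
        schedule=str(raw.get("schedule", "daily")),
        hyperparameters=dict(raw.get("hyperparameters", {})),
    )


# Stand-in for the result of parsing a YAML file.
raw_config = {
    "name": "db_cpu_capacity",
    "query": "cpu_utilization.sql",
    "model": "seasonal_naive",
    "horizon_days": 14,
    "hyperparameters": {"season_length": 7},
}
config = load_config(raw_config)
```

Failing fast on a malformed config keeps errors at the point where a data scientist edits the file, rather than deep inside a scheduled job.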

Key Design Principles

  • Reproducible Environments: Transitioning a model from local to production—and vice versa—should be a straightforward operation to encourage rapid experimentation.
  • Configuration-as-Code: Infrastructure implementations such as network connectivity and compute scaling should be hidden from users.
  • Autonomy: Data scientists should have tools enabling end-to-end ownership of their models with robust monitoring capabilities.

Time series modeling has its own considerations, such as hierarchical reconciliation and seasonality, but these fit well into a small set of core abstractions without limiting data scientists to a single model type:

  • Data: Utilize SQL queries with Jinja templating for ingesting raw metrics.
  • Algorithms: Focus on feature engineering and adjusting model hyperparameters.
  • Post Processing: Set alert thresholds for predicting potential issues.
  • Model Evaluation: Track accuracy metrics for continuous monitoring.
  • Orchestration: Manage scheduling dependencies efficiently.
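A few of the abstractions above can be sketched in a handful of lines. This is an illustrative outline only: the standard library's `string.Template` stands in for Jinja templating, the table and column names are made up, and the threshold check and MAPE metric are generic examples of post-processing and evaluation rather than the platform's actual logic.

```python
from string import Template
from typing import List, Optional

# Data: a templated SQL query (string.Template used here as a stand-in for Jinja).
# The table and column names are hypothetical; parameters are assumed to come from
# trusted configuration, not from end-user input.
QUERY_TEMPLATE = Template(
    "SELECT ts, value FROM $table "
    "WHERE service = '$service' AND ts >= '$start_date' ORDER BY ts"
)


def render_query(table: str, service: str, start_date: str) -> str:
    """Fill in the query template; raises KeyError if a placeholder is left unset."""
    return QUERY_TEMPLATE.substitute(table=table, service=service, start_date=start_date)


def first_breach(forecast: List[float], threshold: float) -> Optional[int]:
    """Post processing: index of the first forecast step at or above the threshold, else None."""
    for step, value in enumerate(forecast):
        if value >= threshold:
            return step
    return None


def mape(actual: List[float], predicted: List[float]) -> float:
    """Model evaluation: mean absolute percentage error in percent (assumes no zero actuals)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


query = render_query("infra_metrics", "db", "2023-01-01")
breach_step = first_breach([60.0, 70.0, 85.0, 90.0], threshold=80.0)
error_pct = mape([100.0, 200.0], [110.0, 180.0])
```

Here `breach_step` identifies how many steps ahead the forecast first crosses the alert threshold, and `error_pct` is the kind of accuracy figure that could be tracked over time for continuous monitoring.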

This structured yet flexible approach ensures effective forecasting services while empowering data scientists.

Ensuring High Availability and Accuracy

Safety Guarantees: Ensuring robust production code quality was essential. Modern Python tooling such as Mypy, a static type checker, established strong code safety guarantees with minimal performance impact, since type checks run ahead of execution rather than at runtime.

Extensible Model Selection: The system was designed to incorporate new models or algorithms without extensive re-architecting. Both custom models and off-the-shelf options are supported.
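One way to combine static type safety with extensible model selection is a typed interface plus a registry, sketched below. This is an assumption about how such a design could look, not the platform's code: the `Forecaster` protocol, the registry, and both toy models are hypothetical. `typing.Protocol` lets a checker like Mypy verify that any registered model exposes `fit` and `predict` with the right signatures.

```python
from typing import Callable, Dict, List, Protocol


class Forecaster(Protocol):
    """Interface every model must satisfy; checked statically, not at runtime."""

    def fit(self, history: List[float]) -> None: ...

    def predict(self, horizon: int) -> List[float]: ...


class MeanForecaster:
    """Toy custom model: predicts the historical mean."""

    def __init__(self) -> None:
        self._mean = 0.0

    def fit(self, history: List[float]) -> None:
        if not history:
            raise ValueError("history must not be empty")
        self._mean = sum(history) / len(history)

    def predict(self, horizon: int) -> List[float]:
        return [self._mean] * horizon


class LastValueForecaster:
    """Toy custom model: repeats the most recent observation."""

    def __init__(self) -> None:
        self._last = 0.0

    def fit(self, history: List[float]) -> None:
        if not history:
            raise ValueError("history must not be empty")
        self._last = history[-1]

    def predict(self, horizon: int) -> List[float]:
        return [self._last] * horizon


# Maps a model name (as it might appear in a config file) to a factory for that model.
MODEL_REGISTRY: Dict[str, Callable[[], Forecaster]] = {}


def register(name: str, factory: Callable[[], Forecaster]) -> None:
    """Add a model to the registry without touching the pipeline code."""
    if name in MODEL_REGISTRY:
        raise KeyError(f"model already registered: {name}")
    MODEL_REGISTRY[name] = factory


def run_forecast(model_name: str, history: List[float], horizon: int) -> List[float]:
    """Look up a model by name, fit it, and return its forecast."""
    model = MODEL_REGISTRY[model_name]()
    model.fit(history)
    return model.predict(horizon)


register("mean", MeanForecaster)
register("last_value", LastValueForecaster)
```

With this shape, calling `run_forecast("mean", [1.0, 2.0, 3.0], 2)` selects the model purely by name, and adding a new algorithm means writing one class and one `register` call.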

Flexible Compute Backends: Seamless integrations with multiple compute backends are enabled for balancing development velocity with production scalability.

Salesforce’s innovative approach demonstrates how artificial intelligence can be leveraged effectively across diverse applications while maintaining reliability and efficiency.

Vishal Singh
