Session Date/Time: 24 Jul 2025 07:30
fantel
Summary
This meeting was a BoF (Birds of a Feather) session to discuss the need for network notifications, particularly in the context of AI/ML workloads and other high-volume, low-latency applications. The session included presentations on use cases, motivations, and problem statements, followed by open discussion and polling to gauge community interest. While there was some support, the overall consensus was that the problem statement needs further refinement and clarification before a working group could be formed.
Key Discussion Points
- AI/ML Networking Requirements: Networks are growing rapidly, especially for AI/ML workloads. GPU clusters with hundreds of thousands of GPUs require high-performance, low-latency networks, and the workloads they run are sensitive to slowdowns and packet loss. Current networking technologies may not be sufficient to meet these demands.
- Use Cases: Presented use cases included intra-datacenter AI/ML traffic, inter-datacenter AI/ML traffic, and general interconnection scenarios (e.g., peering, CDN caching, edge computing).
- Network Notification Definition: The meeting defined a network notification as a mechanism for sending specific network information from one node to another node or to a group of nodes.
- Problem Statement: Existing solutions (IGP/BGP, telemetry) have limitations in terms of speed, event-driven notification, and interoperability. There is a potential vendor lock-in issue with current solutions.
- Data Plane vs. Control Plane: Discussions centered on the need for data plane notifications to achieve the required speed and responsiveness. However, the interplay between data plane and control plane was also acknowledged.
- Action and Scope: Several participants emphasized the importance of defining the actions triggered by network notifications and the scope of the notifications.
- Multi-tenancy and Security: Multi-tenancy and security considerations were raised as important factors in designing a network notification mechanism.
- Congestion Control Collaboration: Participants noted that any future design of a notification mechanism requires collaboration with the congestion control working group, to ensure that the end result improves the network.
- Targeted Notification: Several people noted the importance of not broadcasting network notifications without a clear idea of who would consume them.
- Multi-Hop Challenges: Concern was raised about the challenges of distributing notifications to multiple hops, and the potential for unintended outcomes or instability.
Decisions and Action Items
- Action Item: Proponents need to refine the problem statement, clarifying the scope, target audience, and required actions related to network notifications.
- Action Item: Proponents need to conduct a gap analysis to justify their approach and show the community why existing technologies aren't sufficient.
Next Steps
- Continue discussion on the mailing list.
- Refine the problem statement based on feedback from the BoF session.
- Potentially revisit the topic at a future IETF meeting after addressing the identified issues.