Downtime

Elevated request errors

Jun 16 at 10:00am PDT
Affected services
app.cluely.com
api.cluely.com

Resolved
Jun 16 at 02:50pm PDT

On Monday, June 16, 2025, Cluely experienced a service interruption that affected our platform and enterprise customers for approximately six and a half hours. The incident began at 8:17 AM PDT, when our monitoring systems detected elevated error rates across our infrastructure.

Our investigation revealed a coordinated attack that generated over 1 million malicious requests per hour against our API endpoints. The attack was designed to exploit weaknesses in our traffic management systems, and it triggered cascading failures across our database layer and authentication infrastructure.

Our engineering team executed a systematic response. We immediately implemented enhanced traffic filtering and activated our distributed denial-of-service (DDoS) protection systems. Within the first hour, we had deployed adaptive rate limiting with request validation across all platform endpoints.
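
For readers curious about the mechanics, the sketch below shows the general shape of per-client adaptive rate limiting using a token bucket. This is a simplified Python illustration; the class names, thresholds, and refill rates are hypothetical and do not reflect our production configuration.

    import time
    from collections import defaultdict

    class TokenBucket:
        """Per-client bucket: refills at `rate` tokens/second, up to `capacity`."""

        def __init__(self, rate: float, capacity: float):
            self.rate = rate
            self.capacity = capacity
            self.tokens = capacity
            self.updated = time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False  # Over the limit; the caller would return HTTP 429.

    # Hypothetical defaults: 10 requests/second sustained, bursts up to 50.
    buckets = defaultdict(lambda: TokenBucket(rate=10, capacity=50))

    def is_allowed(client_id: str) -> bool:
        return buckets[client_id].allow()

A token bucket permits short bursts up to its capacity while enforcing a sustained rate, which is why it adapts better to spiky attack traffic than a fixed per-second counter.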

The primary resolution involved upgrading our database infrastructure to a more resilient architecture that provides better connection management and removes the single points of failure present in our previous configuration. We also moved our authentication systems to stateless session management, which resolved compatibility issues and improved overall system performance.
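
As a rough illustration of what stateless session management means in practice, the Python sketch below issues and verifies self-contained, HMAC-signed session tokens, so any API node can validate a session without a shared session store or database lookup. The secret, claim names, and TTL here are hypothetical, not our actual implementation.

    import base64, hashlib, hmac, json, time

    SECRET = b"example-signing-key"  # Hypothetical; in practice, a managed, rotated secret.

    def issue_token(user_id: str, ttl: int = 3600) -> str:
        """Issue a self-contained token; no server-side session record is needed."""
        payload = base64.urlsafe_b64encode(
            json.dumps({"sub": user_id, "exp": int(time.time()) + ttl}).encode()
        )
        sig = base64.urlsafe_b64encode(hmac.new(SECRET, payload, hashlib.sha256).digest())
        return (payload + b"." + sig).decode()

    def verify_token(token: str) -> dict | None:
        """Check signature and expiry; any node holding SECRET can do this locally."""
        try:
            payload, sig = token.encode().rsplit(b".", 1)
            expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
            if not hmac.compare_digest(base64.urlsafe_b64decode(sig), expected):
                return None
            claims = json.loads(base64.urlsafe_b64decode(payload))
            return claims if claims["exp"] > time.time() else None
        except (ValueError, KeyError):
            return None

Because verification requires only the signing key, authentication stays available even when the database layer is degraded, which is the failure mode this incident exposed.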

We have strengthened our infrastructure to prevent similar incidents. Our enhanced monitoring now provides real-time threat detection with automated response capabilities, performance validation is mandatory for all platform updates, and we have established rigorous capacity planning protocols.

Additionally, we've deployed advanced caching for session management and authentication, significantly improving both performance and resilience. Our rollout procedures now include staged deployment with load validation at each phase.
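
To make the caching change concrete, here is a minimal sketch of a read-through cache with a time-to-live, the general pattern for keeping hot session and authentication lookups off the database. The TTL, class, and helper names are illustrative only.

    import time

    class TTLCache:
        """Tiny read-through cache so hot auth lookups can skip the database."""

        def __init__(self, ttl: float = 60.0):
            self.ttl = ttl
            self.store = {}  # key -> (expires_at, value)

        def get_or_load(self, key, loader):
            entry = self.store.get(key)
            if entry and entry[0] > time.monotonic():
                return entry[1]  # Fresh hit: no backend call.
            value = loader(key)  # Miss or expired: fall through to the backend.
            self.store[key] = (time.monotonic() + self.ttl, value)
            return value

    # Hypothetical usage: cache session lookups for 60 seconds.
    sessions = TTLCache(ttl=60)
    # record = sessions.get_or_load(session_id, load_session_from_db)

The trade-off is staleness bounded by the TTL: a revoked session may remain valid in cache for up to 60 seconds under these example settings, in exchange for absorbing repeated lookups during traffic spikes.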

This incident has reinforced our commitment to enterprise-grade reliability. The infrastructure and operational improvements described above meaningfully advance our platform's resilience and performance. We appreciate our customers' patience during this incident and remain dedicated to delivering the robust, reliable service they depend on.

Updated
Jun 16 at 02:14pm PDT

A fix has been implemented and we are continuing to monitor service stability.

Updated
Jun 16 at 11:42am PDT

We have identified the root cause of the issue and are working on a fix. We will share another update soon. Thank you for your patience!

Created
Jun 16 at 10:00am PDT

We are currently experiencing service disruptions due to an unexpected surge in API traffic. At approximately 10:00 AM PDT, our platform began receiving significantly higher than normal request volumes, which has triggered our rate limiting protections and is causing connection timeouts for some users.

Current Impact:
- Users may experience slow loading times or connection errors
- Some API requests are being rate limited
- Intermittent service availability across the platform

What We're Doing:
Our engineering team is actively working to scale our infrastructure to handle the increased load and to adjust rate limiting thresholds where appropriate. We are monitoring the situation closely and adding capacity.

Timeline:
- 10:00 AM PDT: Traffic spike detected, rate limiting activated
- 10:20 AM PDT: Connection timeout errors began occurring
- 10:30 AM PDT: Engineering team notified and investigating

We apologize for any inconvenience and will provide updates as we work to resolve this issue.