
Unlocking Creativity with Azure OpenAI
The Azure OpenAI pay-as-you-go model runs on shared, multi-tenant GPU infrastructure for inferencing, so the Azure OpenAI (AOAI) service places limits on how the resource can be used. In this section we describe the limits and quotas for the different AOAI models and how to prevent throttling by following a few best practices.
At the time of writing, each Azure subscription can create up to 30 Azure OpenAI resources per region. For DALL-E models, the default quota is 2 concurrent requests for DALL-E 2 and 2 capacity units (equivalent to 6 requests per minute) for DALL-E 3. Whisper is limited to 3 requests per minute. The maximum number of prompt tokens per request varies by model; more detail is available in the Azure OpenAI Service models documentation: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=python-secure%2Cglobal-standard%2Cstandard-chat-completions#gpt...
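
When a deployment's quota is exceeded, the service responds with HTTP 429 (too many requests), and the usual mitigation is to retry with exponential backoff rather than hammering the endpoint. The following is a minimal sketch of that pattern using the openai Python package (v1.x); the environment variable names, API version, and the "gpt-4o" deployment name are illustrative assumptions, so substitute the values for your own resource.

import os
import time

from openai import AzureOpenAI, RateLimitError

# Assumed placeholders: set these to your own resource endpoint, key,
# API version, and deployment name.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def chat_with_backoff(messages, deployment="gpt-4o", max_retries=5):
    """Call the chat completions API, backing off exponentially on 429s."""
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model=deployment,  # the name of your AOAI deployment
                messages=messages,
            )
        except RateLimitError:
            # Throttled by the service: wait, then retry with a doubled delay.
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("Request was still throttled after all retries.")

response = chat_with_backoff([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)

Spreading traffic out in this way, together with requesting quota increases for production workloads, keeps individual bursts from exhausting the per-minute limits described above.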