
Unlocking Creativity with Azure OpenAI

Quota lets you control how rate limits are distributed among the deployments in your subscription. In this section, we show you how to manage your Azure OpenAI quota.
Azure OpenAI’s quota feature allows you to allocate rate limits to your deployments, up to an overall limit known as your “quota.” This quota is assigned to your subscription on a per-region, per-model basis and is measured in Tokens-per-Minute (TPM). When you create an Azure OpenAI resource, you receive a default quota for most of the available models (refer to the previous section for the default quota of each model).
As you create deployments, you’ll assign TPM to each one, and the available quota for that model will decrease by the assigned amount. You can continue to create and assign TPM to deployments until you reach your quota limit. Once the quota is reached, you can only create new deployments of that model by reallocating TPM from existing deployments...
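The bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative model, not part of any Azure SDK: the `QuotaPool` class, its method names, the deployment names, and the 240,000 TPM figure are all assumptions made up for the example. It tracks the quota for a single model in a single region, subtracts TPM as deployments are created, and moves TPM between deployments once the quota is used up.

```python
from typing import Dict


class QuotaPool:
    """Hypothetical tracker for one model's TPM quota in one region."""

    def __init__(self, quota_tpm: int) -> None:
        self.quota_tpm = quota_tpm
        # Maps deployment name -> TPM assigned to that deployment.
        self.deployments: Dict[str, int] = {}

    @property
    def available_tpm(self) -> int:
        # Available quota shrinks by whatever has been assigned so far.
        return self.quota_tpm - sum(self.deployments.values())

    def create_deployment(self, name: str, tpm: int) -> None:
        if tpm <= 0:
            raise ValueError("TPM must be positive")
        if name in self.deployments:
            raise ValueError(f"deployment {name!r} already exists")
        if tpm > self.available_tpm:
            raise ValueError(
                f"requested {tpm} TPM, only {self.available_tpm} available"
            )
        self.deployments[name] = tpm

    def reallocate(self, source: str, target: str, tpm: int) -> None:
        # Move TPM from an existing deployment to another one
        # (the target is created if it does not exist yet).
        if tpm <= 0:
            raise ValueError("TPM must be positive")
        if self.deployments.get(source, 0) < tpm:
            raise ValueError(f"{source!r} does not have {tpm} TPM to give")
        self.deployments[source] -= tpm
        self.deployments[target] = self.deployments.get(target, 0) + tpm


# Illustrative quota value only; real defaults vary by model and region.
pool = QuotaPool(quota_tpm=240_000)
pool.create_deployment("chat-prod", 150_000)
pool.create_deployment("chat-dev", 90_000)

# The quota is now fully assigned, so a new deployment of this model
# is only possible by taking TPM from an existing one.
pool.reallocate("chat-prod", "chat-test", 50_000)
print(pool.deployments, pool.available_tpm)
```

In this sketch, a third call to `create_deployment` after the quota is exhausted raises a `ValueError`, which mirrors the point at which you would need to reallocate TPM from existing deployments.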