
Unlocking Creativity with Azure OpenAI

Some applications require synchronous request handling, also known as real-time inferencing, where immediate responses are necessary. In many other situations, however, responses can be deferred, or rate limits restrict how quickly a large number of queries can be processed. In such cases, batch processing jobs prove useful, particularly for tasks such as the following:
The AOAI Batch API provides a user-friendly suite of endpoints. These allow you to bundle multiple requests into a single file, initiate a batch job to process these requests asynchronously, check the batch’s status as the tasks run, and, finally, retrieve the consolidated results once processing is complete.
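The lifecycle described above can be sketched in Python. This is a minimal illustration, not the book's own code: it assumes the `openai` Python SDK with its `AzureOpenAI` client, and the deployment name `gpt-4o-batch` is a placeholder you would replace with your own.

```python
# Sketch of the Azure OpenAI Batch workflow: build a JSONL input file,
# upload it, start a batch job, and later poll for the results.
# Assumes the `openai` SDK and a hypothetical "gpt-4o-batch" deployment.
import json


def build_batch_line(custom_id: str, deployment: str, prompt: str) -> str:
    """One JSONL line: a single chat-completion request in the batch file."""
    return json.dumps({
        "custom_id": custom_id,          # lets you match each result to its request
        "method": "POST",
        "url": "/chat/completions",
        "body": {
            "model": deployment,
            "messages": [{"role": "user", "content": prompt}],
        },
    })


def run_batch(jsonl_path: str) -> str:
    """Upload the input file and start the asynchronous batch job."""
    from openai import AzureOpenAI  # requires credentials in the environment
    client = AzureOpenAI()
    uploaded = client.files.create(file=open(jsonl_path, "rb"), purpose="batch")
    batch = client.batches.create(
        input_file_id=uploaded.id,
        endpoint="/chat/completions",
        completion_window="24h",
    )
    # Poll client.batches.retrieve(batch.id) until its status is "completed",
    # then download client.files.content(batch.output_file_id).
    return batch.id


# Build a two-request input file locally (no network calls here).
with open("batch_input.jsonl", "w") as f:
    f.write(build_batch_line("task-1", "gpt-4o-batch", "Write a haiku about clouds.") + "\n")
    f.write(build_batch_line("task-2", "gpt-4o-batch", "Summarize the water cycle.") + "\n")
```

Bundling requests into one file and keying them by `custom_id` is what lets the service process them asynchronously while still letting you reassemble the results in order afterwards.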
Compared to traditional pay-as-you-go (PAYG) deployments, the Batch API offers the following: