Performance tuning: latency and throughput
Search latency is the time between the start of a query and the delivery of the search result. Throughput is the number of actions the system can sustain in a given time period. Ideally, latency should be one second or less and throughput as high as possible. Both measurements are important factors in determining your hardware needs and infrastructure design.
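To make the two definitions concrete, here is a minimal Python sketch that times a batch of queries issued one after another. The `measure` helper and the `run_query` callable are hypothetical names introduced for illustration; `run_query` stands in for whatever client call sends a query to your search server and waits for the result.

```python
import time


def measure(run_query, queries):
    """Run the queries sequentially and time them.

    Returns (latencies, throughput): per-query latencies in seconds and
    the overall rate in queries per second.
    run_query is a placeholder for your own search client call.
    """
    latencies = []
    started = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        run_query(query)  # blocks until the search result is delivered
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    # Guard against a zero elapsed time on very coarse clocks.
    throughput = len(latencies) / elapsed if elapsed > 0 else float("inf")
    return latencies, throughput
```

Because the queries are sent one at a time, this measures a single client only; a realistic throughput figure needs concurrent clients, which this sketch does not attempt.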
Latency and throughput are correlated; both depend heavily on the available system resources, such as CPU, I/O, and memory. On a single instance, lowering latency naturally increases throughput, because each query occupies those resources for less time. When lowering latency further is not attainable, we have to resort to adding more hardware, increasing the number of instances that serve queries concurrently. Hardware sizing, at this point, is simple arithmetic: divide the desired throughput by a single instance's throughput to determine the number of servers needed for search.
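The sizing arithmetic can be sketched in a few lines of Python. The function name `servers_needed` and the figures in the usage note are hypothetical; the sketch also assumes that throughput scales linearly with the number of instances and leaves no headroom for failover or traffic spikes.

```python
import math


def servers_needed(target_qps, per_instance_qps):
    """Number of instances required to sustain target_qps.

    Divides the desired throughput by one instance's measured throughput
    and rounds up, since a fraction of a server cannot be deployed.
    Assumes linear scaling across instances (an idealization).
    """
    if target_qps < 0:
        raise ValueError("target_qps must not be negative")
    if per_instance_qps <= 0:
        raise ValueError("per_instance_qps must be positive")
    return math.ceil(target_qps / per_instance_qps)
```

For example, if one instance sustains 120 queries per second and the target is 500, `servers_needed(500, 120)` returns 5, because 500 / 120 is about 4.17 and is rounded up.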
How to do it…
There are...