July 12, 2011

Divide to conquer: Parallel load distribution and balance

My first encounter with a machine with several CPUs was back in the university where I developed a paper about Load Balancing in a Massive Parallel Processing environment. I was trying to develop an algorithm that would balance load amongst CPUs in a dynamic way. I guess the purpose was to investigate more than to create, but two decades later the challenges of Load Balancing (LB) in parallel processing still remain a key aspect to a successful use of several processing units. In times like these never the ability to scale in a linear way were so important, hence dividing is still the most effective way to conquer!

In Oracle database, parallel processing, specifically in the query processing arena, is of utmost importance. The last big machine I had my hands on, an X2-8 Exadata, has nothing less than 128 Cores of processing and the smallest error in balancing the load has an immediate effect.

The Parallel Query (PQ) model of processing was introduced in Oracle7 and the basic mechanism still remains the same as you can see here. In this documentation you can read the following:

"(...) there are actually eight query servers involved in the query even though the degree of parallelism is four. This is because a parent and child operator can be performed at the same time."

Later on in Oracle 11g the same piece of documentation referred to this relationship of parenthood in a different manner:

"(...) there are actually eight parallel execution servers involved in the query even though the DOP is 4. This is because a producer and consumer operator can be performed at the same time (inter-operation parallelism)"

My point is that, it's not enough to parallel your load if it's not balanced. With this in mind I think that your top priorities regarding PQ processing should be:

- Does your statement produce a good amount of scattered scanning in order to take advantage of a parallel operation?

- Is your operation load intensive or join intensive?

- How data is distributed amongst producer and consumer parallel processes?

The purpose of this post is to make you aware of the 3rd question!

The same doc I've quoted, has a section about "Balancing the Workload" from where you can read the common sense concept of "workload skewing, which occurs when some parallel execution servers perform significantly more work than the other processes". I bet that most of you out there would think that the Oracle optimizer would always do an equitative distribution of load among parallel processes, but I have news for you: it tries its best, and sometimes "best" and "optimal" are not the same. That's where the human knowledge of the nature of data comes in handy.

You can control this process in three ways:

- Through the wise use of the Degree Of Parallelism (DOP) versus the number of partitions in a table. using a 15 DOP in a 16 partitions table like the doc refers to is a classical error that occurs more times than you might think!

- Enabling correlated (incremental) stats where they apply (the classical example: Car Manufacturer and Model: When I say "Camaro" I don't need to scan the word "Chevrolet"!)

- Using the hint PQ_DISTRIBUTE, but like any other hint please test it thoroughly before you use it in production.

My goal was to make you aware that parallel distribution of load is important, and from here on I recommend that you read some more about it, and if you can, try it too with your data (not in production please!). So here's a list I've put for your reading pleasure:

* About the PQ_DISTRIBUTE hint

* Blog entries about the subject:
The Cost of PX Distribution Method by Christian Antognini

Parallel query distribution methods by Tony Hasler

* Oracle Forum discussion thread about PQ_DISTRIBUTE



No comments:

Post a Comment