From: John Hawkes <hawkes@oss.sgi.com>

A large number of processes pinned to a single CPU results in every
other CPU's load_balance() seeing this overloaded CPU as "busiest", yet
move_tasks() never finds a task to pull-migrate.  This condition occurs
during module unload, but can also be provoked as a denial-of-service
using sys_sched_setaffinity().  Several hundred CPUs performing this
fruitless load_balance() will livelock on the busiest CPU's runqueue
lock.  A smaller number of CPUs will also livelock if the pinned task
count gets high enough.  This simple patch remedies the more common
first problem: after move_tasks() fails to migrate anything, the
balance_interval is incremented.  Using a simple increment, rather than
the more dramatic doubling of the balance_interval, is conservative yet
still effective.
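
For illustration, a minimal userspace sketch of the
sys_sched_setaffinity() scenario described above; the task count of 500
and the target CPU 0 are arbitrary illustrative choices, not part of
this patch:

    /*
     * Hypothetical reproducer: pin several hundred runnable tasks to
     * one CPU, so that every other CPU's load_balance() sees that CPU
     * as "busiest" yet move_tasks() can never pull anything from it.
     */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <unistd.h>

    int main(void)
    {
    	cpu_set_t mask;
    	int i;

    	CPU_ZERO(&mask);
    	CPU_SET(0, &mask);		/* arbitrary target CPU */

    	for (i = 0; i < 500; i++) {	/* arbitrary task count */
    		if (fork() == 0) {
    			sched_setaffinity(0, sizeof(mask), &mask);
    			for (;;)
    				;	/* stay runnable, pinned */
    		}
    	}
    	pause();			/* keep the parent alive */
    	return 0;
    }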

Signed-off-by: John Hawkes <hawkes@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 25-akpm/kernel/sched.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff -puN kernel/sched.c~sched-improved-load_balance-tolerance-for-pinned-tasks kernel/sched.c
--- 25/kernel/sched.c~sched-improved-load_balance-tolerance-for-pinned-tasks	2004-10-21 14:54:28.489592416 -0700
+++ 25-akpm/kernel/sched.c	2004-10-21 14:54:28.495591504 -0700
@@ -1974,11 +1974,19 @@ static int load_balance(int this_cpu, ru
 			 */
 			sd->nr_balance_failed = sd->cache_nice_tries;
 		}
-	} else
-		sd->nr_balance_failed = 0;
 
-	/* We were unbalanced, so reset the balancing interval */
-	sd->balance_interval = sd->min_interval;
+		/*
+		 * We were unbalanced, but unsuccessful in move_tasks(),
+		 * so bump the balance_interval to lessen the lock contention.
+		 */
+		if (sd->balance_interval < sd->max_interval)
+			sd->balance_interval++;
+	} else {
+		sd->nr_balance_failed = 0;
+
+		/* We were unbalanced, so reset the balancing interval */
+		sd->balance_interval = sd->min_interval;
+	}
 
 	return nr_moved;
 
_