Left-Leaning Red-Black Trees
Robert Sedgewick, Princeton University

Original version: Data structures seminar at Dagstuhl (Feb 2008)
• red-black trees made simpler (!)
• full delete() implementation
This version: Analysis of Algorithms meeting at Maresias (Apr 2008)
• back to balanced 4-nodes
• back to 2-3 trees (!)
• scientific analysis
Addendum: observations developed after the talk at Maresias
Java code at www.cs.princeton.edu/~rs/talks/LLRB/Java
Movies at www.cs.princeton.edu/~rs/talks/LLRB/movies

Outline: Introduction • 2-3-4 Trees • LLRB Trees • Deletion • Analysis

Introduction
Red-black trees are now found throughout our computational infrastructure:
• textbooks on algorithms
• library search functions in many programming environments
• popular culture (stay tuned)
Worth revisiting?

Digression: red-black trees are found in popular culture?? Mystery: black door? Mystery: red door? An explanation?
Primary goals
Red-black trees (Guibas-Sedgewick, 1978):
• reduce code complexity
• minimize or eliminate space overhead
• unify balanced tree algorithms
• single top-down pass (for concurrent algorithms)
• find a version amenable to average-case analysis
Current implementations:
• maintenance
• migration
• space not so important (??)
• guaranteed performance
• support full suite of operations
Worth revisiting? YES. Code complexity is out of hand.

2-3-4 Trees
Generalize the BST node to allow multiple keys; keep the tree in perfect balance.
Perfect balance: every path from root to leaf has the same length.
Allow 1, 2, or 3 keys per node:
• 2-node: one key, two children.
• 3-node: two keys, three children.
• 4-node: three keys, four children.
[diagram: example 2-3-4 tree with root K R; subtrees hold keys smaller than K, between K and R, and larger than R]

Search in a 2-3-4 tree
Compare node keys against the search key to guide the search:
• Compare the search key against the keys in the node.
• Find the interval containing the search key.
• Follow the associated link (recursively).
[diagram: search for L follows the link for keys between K and R, then finds L]

Insertion in a 2-3-4 tree
Add new keys at the bottom of the tree.
Insert:
• Search to the bottom for the key.
• 2-node at bottom: convert to a 3-node.
[diagram: insert B; B is not found, and the search ends at the 2-node A, which becomes the 3-node A B]

Insertion in a 2-3-4 tree (continued)
Insert:
• Search to the bottom for the key.
• 2-node at bottom: convert to a 3-node.
• 3-node at bottom: convert to a 4-node.
• 4-node at bottom: no room for the new key.
[diagram: insert X; the search ends at the 3-node Y Z, which becomes the 4-node X Y Z]
[diagram: insert H; the search ends at the 4-node F G J, which has no room for H]

Splitting 4-nodes in a 2-3-4 tree is an effective way to make room for insertions: move the middle key to the parent and split the remainder into two 2-nodes.
[diagram: H does not fit in the 4-node F G J; after G moves to the parent and F, J become 2-nodes, H does fit]
Problem: this doesn't work if the parent is itself a 4-node.
Bottom-up solution (Bayer, 1972):
• Use the same method to split the parent.
• Continue up the tree as long as necessary.
Top-down solution (Guibas-Sedgewick, 1978):
• Split 4-nodes on the way down.
• Insert at the bottom.

Splitting 4-nodes on the way down ensures that the "current" node is not a 4-node.
The transformations that split 4-nodes are local transformations that work anywhere in the tree.
Invariant: the "current" node is not a 4-node.
Consequences:
• The case of a 4-node below a 4-node never happens.
• The bottom node reached is always a 2-node or a 3-node.

Splitting a 4-node below a 2-node is a local transformation that works anywhere in the tree, and so is splitting a 4-node below a 3-node.
[diagrams: in both cases the subtrees A-C, ..., X-Z, which could be huge, are unchanged]

Growth of a 2-3-4 tree happens upward from the bottom.
[diagrams: inserting A, S, E, R, C, D, I, N, B, X in turn; each full 4-node is split before insertion, and the tree grows up one level when the root splits]

Balance in 2-3-4 trees
Key property: all paths from root to leaf are the same length.
Tree height:
• Worst case: lg N [all 2-nodes]
• Best case: log4 N = 1/2 lg N [all 4-nodes]
• Between 10 and 20 for 1 million nodes.
• Between 15 and 30 for 1 billion nodes.
Guaranteed logarithmic performance for both search and insert.

Direct implementation of 2-3-4 trees is complicated by code complexity. Maintaining multiple node types is cumbersome:
• Representation?
• Need multiple compares to move down the tree.
• Large number of cases for splitting.
• Need to convert a 2-node to a 3-node and a 3-node to a 4-node.
Bottom line: could do it, but stay tuned for an easier way.

fantasy code:

private void insert(Key key, Value val)
{
   Node x = root;
   while (x.getTheCorrectChild(key) != null)
   {
      x = x.getTheCorrectChild(key);
      if (x.is4Node()) x.split();
   }
   if      (x.is2Node()) x.make3Node(key, val);
   else if (x.is3Node()) x.make4Node(key, val);
}

Red-Black Trees
Red-black trees (Guibas-Sedgewick, 1978):
1. Represent a 2-3-4 tree as a BST.
2. Use "internal" red edges for 3- and 4-nodes.
Key properties:
• elementary BST search works
• easy to maintain a correspondence with 2-3-4 trees (and several other types of balanced trees)
Note: the correspondence is not 1-1 (3-nodes can lean either way).
Many variants have been studied (details omitted).
NEW VARIANT (this talk): left-leaning red-black trees.

Left-leaning red-black trees
1. Represent a 2-3-4 tree as a BST.
2. Use "internal" red edges for 3- and 4-nodes.
3. Require that 3-nodes be left-leaning.
Key properties:
• elementary BST search works
• easy-to-maintain 1-1 correspondence with 2-3-4 trees
• trees therefore have perfect black-link balance

Disallowed:
• right-leaning 3-node representation
• two reds in a row
(standard red-black trees allow some of these; single-rotation trees allow all of these; the original version of left-leaning trees used a different 4-node representation)

Java data structure for red-black trees adds one bit for color to the elementary BST data structure:

public class BST<Key extends Comparable<Key>, Value>
{
   private static final boolean RED   = true;
   private static final boolean BLACK = false;
   private Node root;

   private class Node
   {
      Key key;
      Value val;
      Node left, right;
      boolean color;   // color of incoming link
      Node(Key key, Value val, boolean color)
      {
         this.key = key;
         this.val = val;
         this.color = color;
      }
   }

   private boolean isRed(Node x)
   {  // helper method to test node color
      if (x == null) return false;
      return (x.color == RED);
   }

   public Value get(Key key)             // search method
   public void put(Key key, Value val)   // insert method
}

Search implementation for red-black trees is the same as for elementary BSTs (but typically runs faster because of better balance in the tree).
Important note: other BST methods also work, e.g. order statistics and iteration.

BST (and LLRB tree) search implementation:

public Value get(Key key)
{
   Node x = root;
   while (x != null)
   {
      int cmp = key.compareTo(x.key);
      if      (cmp == 0) return x.val;
      else if (cmp < 0)  x = x.left;
      else               x = x.right;
   }
   return null;
}

Ex: find the minimum key.

public Key min()
{
   Node x = root;
   if (x == null) return null;
   while (x.left != null) x = x.left;
   return x.key;
}

Insert implementation for LLRB trees is best expressed recursively. Note: the recursion effectively travels down the tree and then back up the tree.
• simplifies the correctness proof
• simplifies code for balanced BST implementations
• could remove the recursion to get a stack-based single-pass algorithm

Recursive insert() implementation for elementary BSTs (associative model, no duplicate keys):

private Node insert(Node h, Key key, Value val)
{
   if (h == null) return new Node(key, val);
   int cmp = key.compareTo(h.key);
   if      (cmp == 0) h.val = val;
   else if (cmp < 0)  h.left  = insert(h.left,  key, val);
   else               h.right = insert(h.right, key, val);
   return h;
}

Balanced tree code is based on local transformations known as rotations. In red-black trees, we only rotate red links (to maintain perfect black-link balance):

private Node rotateLeft(Node h)
{
   Node x = h.right;
   h.right = x.left;
   x.left = h;
   x.color = x.left.color;
   x.left.color = RED;
   return x;
}

private Node rotateRight(Node h)
{
   Node x = h.left;
   h.left = x.right;
   x.right = h;
   x.color = x.right.color;
   x.right.color = RED;
   return x;
}

Insert a new node at the bottom in an LLRB tree. This follows directly from the 1-1 correspondence with 2-3-4 trees:
1. Add the new node as usual, with a red link to glue it to the node above.
2.
Rotate if necessary to get the correct 3-node or 4-node representation.

Splitting a 4-node is accomplished with a color flip: flip the colors of the three nodes.
Key points:
• preserves perfect black-link balance
• passes a RED link up the tree
• reduces the problem to inserting (that link) into the parent

private Node colorFlip(Node h)
{
   h.color = !h.color;
   h.left.color = !h.left.color;
   h.right.color = !h.right.color;
   return h;
}

Splitting a 4-node in an LLRB tree follows directly from the 1-1 correspondence with 2-3-4 trees:
1. Flip colors, which passes the red link up one level.
2. Rotate if necessary to get the correct representation in the parent (using precisely the same transformations as for insert at the bottom).
Parent is a 2-node: two cases (color flip; rotate left, then color flip).
Parent is a 3-node: three cases (color flip; rotate left, then color flip; rotate right, then color flip).

NEW TRICK: do rotates on the way UP the tree.
• left-rotate any right-leaning link on the search path
• right-rotate the top link if two reds in a row are found
• trivial with recursion (do it after the recursive calls)
• no corrections needed elsewhere
Inserting and splitting nodes in LLRB trees are easier when rotates are done on the way up the tree.
Search as usual:
• if the key is found, reset the value, as usual
• if the key is not found, insert a new red node at the bottom
• might leave a right-leaning red or two reds in a row higher up in the tree
Split 4-nodes on the way down the tree:
• flip colors
• might leave a right-leaning red or two reds in a row higher up in the tree

Insert code for LLRB trees is based on four simple operations:
1. Insert a new node at the bottom:
   if (h == null) return new Node(key, val, RED);
2. Split a 4-node:
   if (isRed(h.left) && isRed(h.right)) colorFlip(h);
3. Enforce the left-leaning condition (the red link could be right or left):
   if (isRed(h.right)) h = rotateLeft(h);
4. Balance a 4-node:
   if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);

Insert implementation for LLRB trees is a few lines of code added to elementary BST insert:

private Node insert(Node h, Key key, Value val)
{
   if (h == null)                         // insert at the bottom
      return new Node(key, val, RED);

   if (isRed(h.left) && isRed(h.right))   // split 4-nodes on the way down
      colorFlip(h);

   int cmp = key.compareTo(h.key);        // standard BST insert code
   if      (cmp == 0) h.val = val;
   else if (cmp < 0)  h.left  = insert(h.left,  key, val);
   else               h.right = insert(h.right, key, val);

   if (isRed(h.right))                    // fix right-leaning reds on the way up
      h = rotateLeft(h);
   if (isRed(h.left) && isRed(h.left.left))   // fix two reds in a row on the way up
      h = rotateRight(h);

   return h;
}

LLRB (top-down 2-3-4) insert movie.

A surprise
Q. What happens if we move the color flip to the end?
private Node insert(Node h, Key key, Value val)
{
   if (h == null) return new Node(key, val, RED);
   int cmp = key.compareTo(h.key);
   if      (cmp == 0) h.val = val;
   else if (cmp < 0)  h.left  = insert(h.left,  key, val);
   else               h.right = insert(h.right, key, val);
   if (isRed(h.right)) h = rotateLeft(h);
   if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
   if (isRed(h.left) && isRed(h.right)) colorFlip(h);
   return h;
}

A. It becomes an implementation of 2-3 trees (!)
Insert in a 2-3 tree: attach the new node with a red link (2-node → 3-node, 3-node → 4-node), split the 4-node, pass the red link up to the parent, and repeat: no 4-nodes are left!

Insert implementation for 2-3 trees (!)
is a few lines of code added to elementary BST insert:
• insert at the bottom (standard BST insert code)
• fix right-leaning reds on the way up
• fix two reds in a row on the way up
• split 4-nodes on the way up

LLRB (bottom-up 2-3) insert movie.

Why revisit red-black trees? Which do you prefer?

Very tricky (classic red-black insert, threading a direction bit sw through the recursion):

private Node insert(Node x, Key key, Value val, boolean sw)
{
   if (x == null) return new Node(key, val, RED);
   int cmp = key.compareTo(x.key);
   if (isRed(x.left) && isRed(x.right))
   {
      x.color = RED;
      x.left.color  = BLACK;
      x.right.color = BLACK;
   }
   if (cmp == 0) x.val = val;
   else if (cmp < 0)
   {
      x.left = insert(x.left, key, val, false);
      if (isRed(x) && isRed(x.left) && sw) x = rotR(x);
      if (isRed(x.left) && isRed(x.left.left))
      {
         x = rotR(x);
         x.color = BLACK;
         x.right.color = RED;
      }
   }
   else // if (cmp > 0)
   {
      x.right = insert(x.right, key, val, true);
      if (isRed(x) && isRed(x.right) && !sw) x = rotL(x);
      if (isRed(x.right) && isRed(x.right.right))
      {
         x = rotL(x);
         x.color = BLACK;
         x.left.color = RED;
      }
   }
   return x;
}

Straightforward: the recursive LLRB insert() shown above.

Why revisit red-black trees? Take your pick:
[bar chart: lines of code for insert (lower is better!): 46 for LLRB vs. 331 for TreeMap.java, adapted from CLR by experienced professional programmers (2004); wrong scale!]

Why revisit red-black trees? (1972, 1978, 2008)
The LLRB implementation is far simpler than previous attempts:
• the left-leaning restriction reduces the number of cases
• recursion gives two (easy) chances to fix each node
• take your pick: top-down 2-3-4 or bottom-up 2-3
It improves widely used implementations:
• AVL, 2-3, and 2-3-4 trees
• red-black trees
The same ideas simplify the implementation of other operations:
• delete min, max
• arbitrary delete

Deletion
Lessons learned from the insert() implementation also simplify delete() implementations:
1. Color flips and rotations preserve perfect black-link balance.
2. Fix right-leaning reds and eliminate 4-nodes on the way up.
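Before turning to deletion, the 2-3 insert code above can be assembled into a complete, runnable class. This is a sketch under stated assumptions: the class name LLRB and the demo in main() are illustrative additions, not part of the talk; the rotation bodies use the equivalent form x.color = h.color; h.color = RED.

```java
// Minimal runnable assembly of the bottom-up 2-3 LLRB insert from the talk.
// Class name LLRB and the main() demo are illustrative, not from the source.
public class LLRB<Key extends Comparable<Key>, Value> {
    private static final boolean RED = true;
    private static final boolean BLACK = false;
    private Node root;

    private class Node {
        Key key; Value val;
        Node left, right;
        boolean color;                       // color of incoming link
        Node(Key key, Value val, boolean color) {
            this.key = key; this.val = val; this.color = color;
        }
    }

    private boolean isRed(Node x) { return x != null && x.color == RED; }

    private Node rotateLeft(Node h) {        // fix a right-leaning red link
        Node x = h.right;
        h.right = x.left; x.left = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    private Node rotateRight(Node h) {       // fix two reds in a row
        Node x = h.left;
        h.left = x.right; x.right = h;
        x.color = h.color; h.color = RED;
        return x;
    }

    private void colorFlip(Node h) {         // split a 4-node
        h.color = !h.color;
        h.left.color = !h.left.color;
        h.right.color = !h.right.color;
    }

    public Value get(Key key) {              // elementary BST search
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if (cmp == 0) return x.val;
            x = (cmp < 0) ? x.left : x.right;
        }
        return null;
    }

    public void put(Key key, Value val) {
        root = insert(root, key, val);
        root.color = BLACK;                  // root link is always black
    }

    private Node insert(Node h, Key key, Value val) {
        if (h == null) return new Node(key, val, RED);  // insert at the bottom
        int cmp = key.compareTo(h.key);
        if (cmp == 0) h.val = val;
        else if (cmp < 0) h.left = insert(h.left, key, val);
        else h.right = insert(h.right, key, val);
        if (isRed(h.right)) h = rotateLeft(h);                       // right-leaning red
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h); // two reds in a row
        if (isRed(h.left) && isRed(h.right)) colorFlip(h);           // split 4-node going up
        return h;
    }

    public static void main(String[] args) {
        LLRB<String, Integer> st = new LLRB<>();
        String keys = "ASERCDINBX";          // same key order as the growth example
        for (int i = 0; i < keys.length(); i++)
            st.put(keys.substring(i, i + 1), i);
        System.out.println(st.get("A") + " " + st.get("X") + " " + st.get("Z"));
    }
}
```

Usage: searches behave exactly as in the elementary BST; only put() differs, by the three balancing lines after the recursive calls.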
Delete strategy (works for 2-3 and 2-3-4 trees):
• invariant: the current node is not a 2-node
• introduce 4-nodes if necessary
• remove the key from the bottom
• eliminate 4-nodes on the way up

private Node fixUp(Node h)
{
   if (isRed(h.right))                        // rotate-left right-leaning reds
      h = rotateLeft(h);
   if (isRed(h.left) && isRed(h.left.left))   // rotate-right red-red pairs
      h = rotateRight(h);
   if (isRed(h.left) && isRed(h.right))       // split 4-nodes
      colorFlip(h);
   return h;
}

Warmup 1: delete the maximum
1. Search down the right spine of the tree.
2. If the search ends in a 3-node or 4-node: just remove the key.
3. Removing a 2-node would destroy balance:
• transform the tree on the way down the search path
• invariant: the current node is not a 2-node
Note: the LLRB representation reduces the number of cases (as for insert): combine siblings, or borrow from a sibling.

Delete the maximum by carrying a red link down the right spine of the tree.
Invariant: either h or h.right is RED. Implication: deletion is easy at the bottom.
1. Rotate red links to the right.
2.
Borrow from the sibling if necessary:
• when h.right and h.right.left are both BLACK
• two cases, depending on the color of h.left.left

private Node moveRedRight(Node h)
{
   colorFlip(h);
   if (isRed(h.left.left))
   {  // harder case
      h = rotateRight(h);
      colorFlip(h);
   }
   return h;
}

Easy case (h.left.left is BLACK): color flip only; h.right turns RED.
Harder case (h.left.left is RED): color flip, rotate right, color flip; h.right.right turns RED.

deleteMax() implementation for LLRB trees is otherwise a few lines of code:

public void deleteMax()
{
   root = deleteMax(root);
   root.color = BLACK;
}

private Node deleteMax(Node h)
{
   if (isRed(h.left))       // lean 3-nodes to the right
      h = rotateRight(h);
   if (h.right == null)     // remove node on bottom level (h must be RED by invariant)
      return null;
   if (!isRed(h.right) && !isRed(h.right.left))   // borrow from sibling if necessary
      h = moveRedRight(h);
   h.right = deleteMax(h.right);   // move down one level
   return fixUp(h);         // fix right-leaning reds and eliminate 4-nodes on the way up
}

deleteMax() example 1
[trace: tree with keys A through O; push reds down, remove the maximum, then fix right-leaning reds on the way up (nothing to fix!)]
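The deleteMax() trace above can be reproduced in code. A self-contained sketch follows, assembling the helper methods from the talk into a single class; the class name DeleteMaxDemo, the char-key simplification, and the max() check are illustrative assumptions, not part of the talk.

```java
// Sketch: deleteMax() for a 2-3 LLRB tree, assembled from the talk's helpers.
// Class name, char keys, and max() are illustrative additions.
public class DeleteMaxDemo {
    private static class Node {
        char key; Node left, right; boolean color;   // true = RED
        Node(char key) { this.key = key; this.color = true; }
    }
    private static Node root;

    private static boolean isRed(Node x) { return x != null && x.color; }
    private static Node rotateLeft(Node h) {
        Node x = h.right; h.right = x.left; x.left = h;
        x.color = h.color; h.color = true; return x;
    }
    private static Node rotateRight(Node h) {
        Node x = h.left; h.left = x.right; x.right = h;
        x.color = h.color; h.color = true; return x;
    }
    private static void colorFlip(Node h) {
        h.color = !h.color; h.left.color = !h.left.color; h.right.color = !h.right.color;
    }
    private static Node fixUp(Node h) {       // same three operations as for insert
        if (isRed(h.right)) h = rotateLeft(h);
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left) && isRed(h.right)) colorFlip(h);
        return h;
    }

    static void put(char key) { root = insert(root, key); root.color = false; }
    private static Node insert(Node h, char key) {
        if (h == null) return new Node(key);
        if (key < h.key) h.left = insert(h.left, key);
        else if (key > h.key) h.right = insert(h.right, key);
        return fixUp(h);                      // 2-3 insert: balance on the way up
    }

    private static Node moveRedRight(Node h) {
        colorFlip(h);
        if (isRed(h.left.left)) { h = rotateRight(h); colorFlip(h); }
        return h;
    }

    static void deleteMax() {
        root = deleteMax(root);
        if (root != null) root.color = false; // guard added for the empty case
    }
    private static Node deleteMax(Node h) {
        if (isRed(h.left)) h = rotateRight(h);            // lean 3-nodes right
        if (h.right == null) return null;                 // remove at the bottom
        if (!isRed(h.right) && !isRed(h.right.left)) h = moveRedRight(h);
        h.right = deleteMax(h.right);                     // move down one level
        return fixUp(h);                                  // clean up on the way up
    }

    static char max() { Node x = root; while (x.right != null) x = x.right; return x.key; }

    public static void main(String[] args) {
        for (char c : "ABCDEFGHIJKLMNO".toCharArray()) put(c);
        deleteMax();                          // removes O
        deleteMax();                          // removes N
        System.out.println(max());            // prints M
    }
}
```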
deleteMax() example 2
[trace: push reds down, remove the maximum, fix right-leaning reds on the way up]

LLRB deleteMax() movie.

Warmup 2: delete the minimum is similar but slightly different (since trees lean left).
Invariant: either h or h.left is RED. Implication: deletion is easy at the bottom.
Borrow from the sibling:
• if h.left and h.left.left are both BLACK
• two cases, depending on the color of h.right.left

private Node moveRedLeft(Node h)
{
   colorFlip(h);
   if (isRed(h.right.left))
   {  // harder case
      h.right = rotateRight(h.right);
      h = rotateLeft(h);
      colorFlip(h);
   }
   return h;
}

Easy case (h.right.left is BLACK): color flip only; h.left turns RED.
Harder case (h.right.left is RED): color flip, rotate right, rotate left, color flip; h.left.left turns RED.

deleteMin() implementation for LLRB trees is a few lines of code:

public void deleteMin()
{
   root = deleteMin(root);
   root.color = BLACK;
}

private Node deleteMin(Node h)
{
   if (h.left == null)      // remove node on bottom level (h must be RED by invariant)
      return null;
   if (!isRed(h.left) && !isRed(h.left.left))   // push red link down if necessary
      h = moveRedLeft(h);
   h.left = deleteMin(h.left);   // move down one level
   return fixUp(h);         // fix right-leaning reds and eliminate 4-nodes on the way up
}

deleteMin() example
[trace: tree with keys A through O; push reds down, remove the minimum, fix right-leaning reds on the way up]

LLRB deleteMin() movie.

Deleting an arbitrary node involves the same general strategy.
1. Search down the tree for the key.
2. If the search ends in a 3-node or 4-node: just remove the key.
3. Removing a 2-node would destroy balance:
• transform the tree on the way down the search path
• invariant: the current node is not a 2-node
Difficulty: far too many cases!
Q: How many possible search paths are there in two levels? A: 9·6 + 27·9 + 81·12 = 1269 (!!)
The LLRB representation dramatically reduces the number of cases.

Deleting an arbitrary node reduces to deleteMin(). A standard trick: to delete D, replace its key and value with those of its successor, then delete the successor.

h.key = min(h.right);
h.val = get(h.right, h.key);
h.right = deleteMin(h.right);

[trace: flip colors, delete the node, fix the right-leaning red link]

Deleting an arbitrary node at the bottom can be implemented with the same helper methods used for deleteMin() and deleteMax().
Invariant: h or one of its children is RED.
• search path goes left: use moveRedLeft().
• search path goes right: use moveRedRight().
• delete the node at the bottom
• fix right-leaning reds on the way up
[diagram: deleting a key from a sample tree]

delete() implementation for LLRB trees:

private Node delete(Node h, Key key)
{
   int cmp = key.compareTo(h.key);
   if (cmp < 0)
   {  // LEFT: push red right if necessary, move down (left)
      if (!isRed(h.left) && !isRed(h.left.left))
         h = moveRedLeft(h);
      h.left = delete(h.left, key);
   }
   else
   {  // RIGHT or EQUAL
      if (isRed(h.left))                      // rotate to push red right
         h = rotateRight(h);
      if (cmp == 0 && (h.right == null))      // EQUAL, at bottom: delete node
         return null;
      if (!isRed(h.right) && !isRed(h.right.left))   // push red right if necessary
         h = moveRedRight(h);
      if (cmp == 0)
      {  // EQUAL, not at bottom: replace with successor key, value; delete successor
         h.key = min(h.right);
         h.val = get(h.right, h.key);
         h.right = deleteMin(h.right);
      }
      else h.right = delete(h.right, key);    // move down (right)
   }
   return fixUp(h);   // fix right-leaning red links and eliminate 4-nodes on the way up
}

LLRB delete() movie.

Alternatives
Red-black-tree implementations in widespread use:
• are based on pseudocode with "case bloat"
• use parent pointers (!)
• 400+ lines of code for core algorithms
Left-leaning red-black trees:
• you just saw all the code
• single pass (remove the recursion if concurrency matters)
• <80 lines of code for core algorithms
• less code implies faster insert, delete
• less code implies easier maintenance and migration
[chart: LLRB (2008) accomplishes the same result as the 1972 and 1978 implementations of insert, delete, and helpers with less than 1/5 the code]

Analysis
Worst-case analysis follows immediately from the 2-3-4 tree correspondence:
1. All trees have perfect black balance.
2. No two red links in a row on any path.
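The two properties just stated can be verified mechanically. The following checker is an illustrative sketch, not from the talk: it mirrors the node representation used throughout, and it checks black balance and the no-red-red rule (not the left-leaning condition).

```java
// Sketch of a checker for the two red-black invariants above:
// perfect black-link balance, and no red node with a red child.
// Class and field names are illustrative.
public class InvariantCheck {
    static class Node {
        int key; Node left, right; boolean red;
        Node(int key, boolean red, Node left, Node right) {
            this.key = key; this.red = red; this.left = left; this.right = right;
        }
    }

    static boolean isRed(Node x) { return x != null && x.red; }

    // Returns the black height of the subtree, or -1 if an invariant is violated.
    static int blackHeight(Node x) {
        if (x == null) return 0;                        // null links count as black
        if (x.red && (isRed(x.left) || isRed(x.right)))
            return -1;                                  // two reds in a row
        int lh = blackHeight(x.left), rh = blackHeight(x.right);
        if (lh < 0 || rh < 0 || lh != rh) return -1;    // black imbalance below x
        return lh + (x.red ? 0 : 1);
    }

    public static void main(String[] args) {
        // A legal tree: a black root with a red left child (a 3-node).
        Node ok = new Node(2, false, new Node(1, true, null, null), null);
        // An illegal tree: a red child of a red node.
        Node bad = new Node(3, false,
                new Node(2, true, new Node(1, true, null, null), null), null);
        System.out.println(blackHeight(ok) >= 0);   // true
        System.out.println(blackHeight(bad) >= 0);  // false
    }
}
```

Because every path has the same black height and reds never repeat, a non-negative result immediately yields the lg N / 2 lg N path-length bounds stated next.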
Shortest path: lg N (all black). Longest path: 2 lg N (alternating red-black).
Theorem: With red-black BSTs as the underlying data structure, we can implement an ordered symbol-table API that supports insert, delete, delete the minimum, delete the maximum, find the minimum, find the maximum, rank, select the kth largest, and range count in guaranteed logarithmic time.
Red-black trees are the method of choice for many applications.

One remaining question of interest in typical applications: the number of searches far exceeds the number of inserts.
Q. What is the cost of a typical search?
A. If each tree node is equally likely to be sought, compute the internal path length of the tree and divide by N.
Q. What is the expected internal path length of a tree built with randomly ordered keys (the average cost of a search)?
Ex: N = 8, node depths 0, 1, 1, 2, 2, 2, 2, 3; internal path length: 0 + 1 + 1 + 2 + 2 + 2 + 2 + 3 = 13; average search cost: 13/8 = 1.625.

Average-case analysis of balanced trees deserves another look!
Main questions: Is the average path length in a tree built from random keys ~ c lg N? If so, is c = 1?
Experimental evidence strongly suggests YES.
Ex: Tufte plot of average path length in 2-3 trees (sample mean and sample σ):
• N = 100, 200, . . .
, 50,000
• 100 trees of each size
[Tufte plot: average path length in a 2-3 tree built from random keys, for N from 100 to 50,000; values between 5 and 14, tracking lg N − 1.5]

Experimental evidence can suggest and confirm hypotheses.
[plots: average path length in a 2-3 tree and in a (top-down) 2-3-4 tree built from random keys, each tracking lg N − 1.5]

Some known facts:
• the worst case gives an easy 2 lg N upper bound
• fringe analysis gives an upper bound of ck lg N with ck > 1
• analytic combinatorics gives the path length in random (unbalanced) trees
Are simpler implementations simpler to analyze? Is the better experimental evidence that is now available helpful?
A starting point: study balance at the root (left subtree size).

Left subtree size in left-leaning 2-3 trees
[exact distributions for trees of 4, 5, 6, and 7 keys]
[smoothed version (sizes 32-64): limiting distribution?]
[Tufte plots for sizes up to 64 and up to 500: view of the highway for a bus driver who has had one Caipirinha too many?]
[10,000 trees for each size, smooth factor 10: limiting distribution?]

An exercise in the analysis of algorithms: Find a proof!
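The experiment behind the plots can be sketched directly: build a 2-3 LLRB tree from randomly ordered keys, compute its internal path length, and compare the average search cost against the lg N − 1.5 hypothesis. The class name and the fixed random seed are illustrative assumptions; exact numbers vary from run to run.

```java
// Empirical sketch of the average-path-length experiment described above.
// Build a 2-3 LLRB tree from shuffled keys and report average search cost.
import java.util.Random;

public class PathLengthExperiment {
    static class Node {
        int key; Node left, right; boolean red;
        Node(int k) { key = k; red = true; }
    }

    static boolean isRed(Node x) { return x != null && x.red; }
    static Node rotL(Node h) { Node x = h.right; h.right = x.left; x.left = h;
        x.red = h.red; h.red = true; return x; }
    static Node rotR(Node h) { Node x = h.left; h.left = x.right; x.right = h;
        x.red = h.red; h.red = true; return x; }

    static Node insert(Node h, int k) {      // bottom-up 2-3 LLRB insert
        if (h == null) return new Node(k);
        if (k < h.key) h.left = insert(h.left, k);
        else if (k > h.key) h.right = insert(h.right, k);
        if (isRed(h.right)) h = rotL(h);
        if (isRed(h.left) && isRed(h.left.left)) h = rotR(h);
        if (isRed(h.left) && isRed(h.right)) {   // split 4-node on the way up
            h.red = true; h.left.red = false; h.right.red = false;
        }
        return h;
    }

    // Internal path length: sum of the depths of all nodes.
    static long ipl(Node x, int depth) {
        if (x == null) return 0;
        return depth + ipl(x.left, depth + 1) + ipl(x.right, depth + 1);
    }

    public static void main(String[] args) {
        int n = 50_000;
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) keys[i] = i;
        Random rnd = new Random(42);         // fixed seed, for reproducibility
        for (int i = n - 1; i > 0; i--) {    // Fisher-Yates shuffle
            int j = rnd.nextInt(i + 1);
            int t = keys[i]; keys[i] = keys[j]; keys[j] = t;
        }
        Node root = null;
        for (int k : keys) { root = insert(root, k); root.red = false; }
        double avg = ipl(root, 0) / (double) n;
        double hyp = Math.log(n) / Math.log(2) - 1.5;   // lg N - 1.5
        System.out.printf("avg path %.2f, hypothesis %.2f%n", avg, hyp);
    }
}
```

Averaging over many trees of each size, as the Tufte plots do, amounts to repeating this loop with different seeds and collecting sample means and standard deviations.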
[Plot: average path length in a 2-3 tree built from random keys, tracking lg N − 1.5.]

Addendum: Observations

Observation 1. The percentage of red nodes in a 2-3 tree is between 25 and 25.5%. [Plot: percentage of red nodes in a 2-3 tree built from random keys, converging to about 25.38%.]

Observation 2. The height of a 2-3 tree is ~2 ln N (!!!) [Plot: height of a 2-3 tree built from random keys, about 21.67 at N = 50,000, compared with lg N − 1.5.] Very surprising, because the average path length in an elementary BST is also ~2 ln N ≈ 1.386 lg N.

Observation 3. The percentage of red nodes on each path in a 2-3 tree rises to about 25%, then drops by 2 when the root splits.

Observation 4. In aggregate, the observed number of red links per path log-alternates between periods of steady growth and not-so-steady decrease (because root-split times vary widely).

Self-Adjusting Binary Search Trees

DANIEL DOMINIC SLEATOR AND ROBERT ENDRE TARJAN
AT&T Bell Laboratories, Murray Hill, NJ

Abstract. The splay tree, a self-adjusting form of binary search tree, is developed and analyzed. The binary search tree is a data structure for representing tables and lists so that accessing, inserting, and deleting items is easy. On an n-node splay tree, all the standard search tree operations have an amortized time bound of O(log n) per operation, where by "amortized time" is meant the time per operation averaged over a worst-case sequence of operations. Thus splay trees are as efficient as balanced trees when total running time is the measure of interest. In addition, for sufficiently long access sequences, splay trees are as efficient, to within a constant factor, as static optimum search trees.
The efficiency of splay trees comes not from an explicit structural constraint, as with balanced trees, but from applying a simple restructuring heuristic, called splaying, whenever the tree is accessed. Extensions of splaying give simplified forms of two other data structures: lexicographic or multidimensional search trees and link/cut trees.

Categories and Subject Descriptors: E.1 [Data]: Data Structures - trees; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems - sorting and searching

General Terms: Algorithms, Theory

Additional Key Words and Phrases: Amortized complexity, balanced trees, multidimensional searching, network optimization, self-organizing data structures

1. Introduction

In this paper we apply the related concepts of amortized complexity and self-adjustment to binary search trees. We are motivated by the observation that the known kinds of efficient search trees have various drawbacks. Balanced trees, such as height-balanced trees [2, 22], weight-balanced trees [26], and B-trees [6] and their variants [5, 18, 19, 24], have a worst-case time bound of O(log n) per operation on an n-node tree. However, balanced trees are not as efficient as possible if the access pattern is nonuniform, and they also need extra space for storage of balance information. Optimum search trees [16, 20, 22] guarantee minimum average access time, but only under the assumption of fixed, known access probabilities and no correlation among accesses. Their insertion and deletion costs are also very high. Biased search trees [7, 8, 13] combine the fast average access time of optimum trees with the fast updating of balanced trees, but have structural constraints even more complicated and harder to maintain than the constraints of balanced trees. Finger search trees [11, 14, 19, 23, 24] allow fast access in the vicinity of one or more "fingers" but require the storage of extra pointers in each node.
Authors' address: AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ 07974. © 1985 ACM. Journal of the Association for Computing Machinery, Vol. 32, No. 3, July 1985, pp. 652-686.

These data structures are all designed to reduce the worst-case time per operation. However, in typical applications of search trees, not one but a sequence of operations is performed, and what matters is the total time the sequence takes, not the individual times of the operations. In such applications, a better goal is to reduce the amortized times of the operations, where by "amortized time" we mean the average time of an operation in a worst-case sequence of operations.

One way to obtain amortized efficiency is to use a "self-adjusting" data structure. We allow the structure to be in an arbitrary state, but during each operation we apply a simple restructuring rule intended to improve the efficiency of future operations. Examples of restructuring heuristics that give amortized efficiency are the move-to-front rule on linear lists [9, 30] and path compression in balanced trees [33, 37].

Self-adjusting data structures have several possible advantages over balanced or otherwise explicitly constrained structures:

(i) In an amortized sense, ignoring constant factors, they are never much worse than constrained structures, and since they adjust according to usage, they can be much more efficient if the usage pattern is skewed.

(ii) They need less space, since no balance or other constraint information is stored.
(iii) Their access and update algorithms are conceptually simple and easy to implement.

Self-adjusting structures have two potential disadvantages:

(i) They require more local adjustments, especially during accesses (look-up operations). (Explicitly constrained structures need adjusting only during updates, not during accesses.)

(ii) Individual operations within a sequence can be expensive, which may be a drawback in real-time applications.

In this paper we develop and analyze the splay tree, a self-adjusting form of binary search tree. The restructuring heuristic used in splay trees is splaying, which moves a specified node to the root of the tree by performing a sequence of rotations along the (original) path from the node to the root. In an amortized sense and ignoring constant factors, splay trees are as efficient as both dynamically balanced trees and static optimum trees, and they may have even stronger optimality properties. In particular, we conjecture that splay trees are as efficient on any sufficiently long sequence of accesses as any form of dynamically updated binary search tree, even one tailored to the exact access sequence.

The paper contains seven sections. In Section 2 we define splaying and analyze its amortized complexity. In Section 3 we discuss update operations on splay trees. In Section 4 we study the practical efficiency and ease of implementation of splaying and some of its variants. In Section 5 we explore ways of reducing the amount of restructuring needed in splay trees. In Section 6 we use extensions of splaying to simplify two more complicated data structures: lexicographic or multidimensional search trees [15, 25] and link/cut trees [29, 34]. In Section 7 we make some final remarks and mention several open problems, including a formal version of our dynamic optimality conjecture. The appendix contains our tree terminology.
The work described here is a continuation of our research on amortized complexity and self-adjusting data structures, which has included an amortized analysis of the move-to-front list update rule [9, 30] and the development of a self-adjusting form of heap [31]. Some of our results on self-adjusting heaps and search trees appeared in preliminary form in a conference paper [28]. A survey paper by the second author examines a variety of amortized complexity results [35].

2. Splay Trees

We introduce splay trees by way of a specific application. Consider the problem of performing a sequence of access operations on a set of items selected from a totally ordered universe. Each item may have some associated information. The input to each operation is an item; the output of the operation is an indication of whether the item is in the set, along with the associated information if it is. One way to solve this problem is to represent the set by a binary search tree. This is a binary tree containing the items of the set, one item per node, with the items arranged in symmetric order: if x is a node containing an item i, the left subtree of x contains only items less than i and the right subtree only items greater than i. The symmetric-order position of an item is one plus the number of items preceding it in symmetric order in the tree.

The "search" in "binary search tree" refers to the ability to access any item in the tree by searching down from the root, branching left or right at each step according to whether the item to be found is less than or greater than the item in the current node, and stopping when the node containing the item is reached. Such a search takes Θ(d) time, where d is the depth of the node containing the accessed item. If accessing items is the only operation required, then there are better solutions than binary search trees, e.g., hashing.
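The search just described is only a few lines in any language. The following minimal sketch (Python, with illustrative names; not the paper's guarded-command notation) walks down from the root by symmetric order and returns the node containing the item, or None, in time proportional to the node's depth:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def access(t, key):
    # Search down from the root, branching by symmetric order.
    # Cost is Theta(d), where d is the depth of the node reached.
    while t is not None and t.key != key:
        t = t.left if key < t.key else t.right
    return t

# A small fixed tree: 4 at the root, 2 and 6 below, leaves 1, 3, 5, 7.
t = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
found = access(t, 3)
missing = access(t, 9)
```

A successful search stops at the node holding the key; an unsuccessful one runs off the bottom and returns the null pointer, just as in the paper's conventions.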
However, as we shall see in Section 3, binary search trees also support several useful update operations. Furthermore, we can extend binary search trees to support accesses by symmetric-order position. To do this, we store in each node the number of descendants of the node. Alternatively, we can store in each node the number of nodes in its left subtree and store in a tree header the number of nodes in the entire tree.

When referring to a binary search tree formally, as in a computer program, we shall generally denote the tree by a pointer to its root; a pointer to the null node denotes an empty tree. When analyzing operations on binary search trees, we shall use n to denote the number of nodes and m to denote the total number of operations.

Suppose we wish to carry out a sequence of access operations on a binary search tree. For the total access time to be small, frequently accessed items should be near the root of the tree often. Our goal is to devise a simple way of restructuring the tree after each access that moves the accessed item closer to the root, on the plausible assumption that this item is likely to be accessed again soon. As an O(1)-time restructuring primitive, we can use rotation, which preserves the symmetric order of the tree. (See Figure 1.)

Allen and Munro [4] and Bitner [10] proposed two restructuring heuristics (see Figure 2):

Single rotation. After accessing an item i in a node x, rotate the edge joining x to its parent (unless x is the root).

Move to root. After accessing an item i in a node x, rotate the edge joining x to its parent, and repeat this step until x is the root.

[Figure 1: Rotation of the edge joining nodes x and y. Triangles denote subtrees. The tree shown can be a subtree of a larger tree.]
[Figure 2: Restructuring heuristics (single rotation; move to root). The node accessed is a.]
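Rotation, the O(1) primitive of Figure 1, can be sketched as follows (Python; the key-only nodes and function names are illustrative). Each rotation moves one subtree across the rotated edge and leaves the inorder sequence, i.e. the symmetric order, unchanged:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_left(y):
    # Rotate the edge joining y and its right child x; returns the new
    # subtree root x. The middle subtree (x's old left) crosses the edge.
    x = y.right
    y.right = x.left
    x.left = y
    return x

def rotate_right(y):
    # Mirror image of rotate_left: y and its left child trade places.
    x = y.left
    y.left = x.right
    x.right = y
    return x

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

# 1 with right child 3 (children 2 and 4); rotating left makes 3 the root.
t = Node(1, None, Node(3, Node(2), Node(4)))
t = rotate_left(t)
```

After the rotation the root is 3 and the old root 1 is its left child, while the inorder sequence is still 1, 2, 3, 4, exactly the symmetric-order-preserving behavior the text relies on.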
Unfortunately, neither of these heuristics is efficient in an amortized sense: for each, there are arbitrarily long access sequences such that the time per access is Θ(n) [4]. Allen and Munro did show that the move-to-root heuristic has an asymptotic average access time that is within a constant factor of minimum, but only under the assumption that the access probabilities of the various items are fixed and the accesses are independent. We seek heuristics that have much stronger properties.

Our restructuring heuristic, called splaying, is similar to move-to-root in that it does rotations bottom-up along the access path and moves the accessed item all the way to the root. But it differs in that it does the rotations in pairs, in an order that depends on the structure of the access path. To splay a tree at a node x, we repeat the following splaying step until x is the root of the tree (see Figure 3):

Splaying Step.

Case 1 (zig). If p(x), the parent of x, is the tree root, rotate the edge joining x with p(x). (This case is terminal.)

Case 2 (zig-zig). If p(x) is not the root and x and p(x) are both left or both right children, rotate the edge joining p(x) with its grandparent g(x) and then rotate the edge joining x with p(x).

Case 3 (zig-zag). If p(x) is not the root and x is a left child and p(x) a right child, or vice versa, rotate the edge joining x with p(x) and then rotate the edge joining x with the new p(x).

[Figure 3: A splaying step. The node accessed is x. Each case has a symmetric variant (not shown). (a) Zig: terminating single rotation. (b) Zig-zig: two single rotations. (c) Zig-zag: double rotation.]

Splaying at a node x of depth d takes Θ(d) time, that is, time proportional to the time to access the item in x. Splaying not only moves x to the root, but roughly halves the depth of every node along the access path. (See Figures 4 and 5.)
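The three cases translate directly into code. The sketch below (Python, with explicit parent pointers; all names are illustrative, not the paper's) applies zig, zig-zig, and zig-zag exactly as in the splaying step, and exercises the all-zig-zig path of Figure 5(a):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate_up(x):
    # Rotate the edge joining x and its parent, fixing parent pointers.
    p, g = x.parent, x.parent.parent
    if x is p.left:
        p.left = x.right
        if x.right: x.right.parent = p
        x.right = p
    else:
        p.right = x.left
        if x.left: x.left.parent = p
        x.left = p
    p.parent = x
    x.parent = g
    if g:
        if g.left is p: g.left = x
        else: g.right = x

def splay(x):
    # Repeat the splaying step until x is the root.
    while x.parent:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate_up(x)                     # Case 1 (zig)
        elif (x is p.left) == (p is g.left):
            rotate_up(p); rotate_up(x)       # Case 2 (zig-zig): parent first
        else:
            rotate_up(x); rotate_up(x)       # Case 3 (zig-zag): x twice
    return x

def link(parent, child, side):
    setattr(parent, side, child)
    child.parent = parent

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

# Build the left path 5-4-3-2-1 (all-zig-zig case), then splay the deepest node.
nodes = {k: Node(k) for k in range(1, 6)}
for k in range(5, 1, -1):
    link(nodes[k], nodes[k - 1], "left")
root = splay(nodes[1])
```

After the splay, the accessed node is the root and the symmetric order is intact; the old path has roughly halved in depth, as the text claims.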
This halving effect makes splaying efficient and is a property not shared by other, simpler heuristics, such as move to root. Producing this effect seems to require dealing with the zig-zig and zig-zag cases differently.

We shall analyze the amortized complexity of splaying by using a potential function [31, 35] to carry out the amortization. The idea is to assign to each possible configuration of the data structure a real number called its potential and to define the amortized time a of an operation by a = t + Φ′ − Φ, where t is the actual time of the operation, Φ is the potential before the operation, and Φ′ is the potential after the operation. With this definition, we can estimate the total time of a sequence of m operations by

    Σ_{j=1}^{m} t_j = Σ_{j=1}^{m} (a_j + Φ_{j−1} − Φ_j) = Σ_{j=1}^{m} a_j + Φ_0 − Φ_m,

where t_j and a_j are the actual and amortized times of operation j, Φ_0 is the initial potential, and Φ_j for j ≥ 1 is the potential after operation j. That is, the total actual time equals the total amortized time plus the net decrease in potential from the initial to the final configuration. If the final potential is no less than the initial potential, then the total amortized time is an upper bound on the total actual time.

[Figure 4: Splaying at node a.]
[Figure 5: Extreme cases of splaying. (a) All zig-zig steps. (b) All zig-zag steps.]

To define the potential of a splay tree, we assume that each item i has a positive weight w(i), whose value is arbitrary but fixed. We define the size s(x) of a node x in the tree to be the sum of the individual weights of all items in the subtree rooted at x. We define the rank r(x) of node x to be log s(x).¹ Finally, we define the potential of the tree to be the sum of the ranks of all its nodes.
As a measure of the running time of a splaying operation, we use the number of rotations done, unless there are no rotations, in which case we charge one for the splaying.

LEMMA 1 (ACCESS LEMMA). The amortized time to splay a tree with root t at a node x is at most 3(r(t) − r(x)) + 1 = O(log(s(t)/s(x))).

PROOF. If there are no rotations, the bound is immediate. Thus suppose there is at least one rotation. Consider any splaying step. Let s and s′, r and r′ denote the size and rank functions just before and just after the step, respectively. We show that the amortized time for the step is at most 3(r′(x) − r(x)) + 1 in case 1 and at most 3(r′(x) − r(x)) in case 2 or case 3. Let y be the parent of x and z be the parent of y (if it exists) before the step.

Case 1. One rotation is done, so the amortized time of the step is
    1 + r′(x) + r′(y) − r(x) − r(y)   since only x and y can change rank
    ≤ 1 + r′(x) − r(x)                since r(y) ≥ r′(y)
    ≤ 1 + 3(r′(x) − r(x))             since r′(x) ≥ r(x).

Case 2. Two rotations are done, so the amortized time of the step is
    2 + r′(x) + r′(y) + r′(z) − r(x) − r(y) − r(z)   since only x, y, and z can change rank
    = 2 + r′(y) + r′(z) − r(x) − r(y)                since r′(x) = r(z)
    ≤ 2 + r′(x) + r′(z) − 2r(x)                      since r′(x) ≥ r′(y) and r(y) ≥ r(x).

We claim that this last sum is at most 3(r′(x) − r(x)), that is, that 2r′(x) − r(x) − r′(z) ≥ 2. The convexity of the log function implies that log x + log y for x, y > 0, x + y ≤ 1 is maximized at value −2 when x = y = 1/2. It follows that r(x) + r′(z) − 2r′(x) = log(s(x)/s′(x)) + log(s′(z)/s′(x)) ≤ −2, since s(x) + s′(z) ≤ s′(x). Thus the claim is true, and the amortized time in case 2 is at most 3(r′(x) − r(x)).

Case 3. The amortized time of the step is
    2 + r′(x) + r′(y) + r′(z) − r(x) − r(y) − r(z) ≤ 2 + r′(y) + r′(z) − 2r(x)   since r′(x) = r(z) and r(x) ≤ r(y).

We claim that this last sum is at most 2(r′(x) − r(x)), that is, that 2r′(x) − r′(y) − r′(z) ≥ 2. This follows as in case 2 from the inequality s′(y) + s′(z) ≤ s′(x).
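The convexity claim used in cases 2 and 3 is the crux of the argument; spelled out in the lemma's own notation (a sketch, with a and b introduced only as abbreviations):

```latex
% Claim (zig-zig case): 2r'(x) - r(x) - r'(z) \ge 2.
% Abbreviate a = s(x)/s'(x) and b = s'(z)/s'(x). After the two rotations
% the old subtree of x and the new subtree of z are disjoint parts of the
% new subtree of x, so s(x) + s'(z) \le s'(x), i.e. a + b \le 1, a, b > 0.
\log a + \log b \;\le\; 2\log\frac{a+b}{2} \;\le\; 2\log\frac{1}{2} \;=\; -2,
\qquad\text{hence}\qquad
r(x) + r'(z) - 2r'(x) \;=\; \log a + \log b \;\le\; -2 .
```

Rearranging the final inequality gives exactly the claimed 2r′(x) − r(x) − r′(z) ≥ 2.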
Thus the amortized time in case 3 is at most 2(r′(x) − r(x)) ≤ 3(r′(x) − r(x)). The lemma follows by summing the amortized time estimates for all the splaying steps, since the sum telescopes to yield an estimate of at most 3(r′(x) − r(x)) + 1 = 3(r(t) − r(x)) + 1, where r and r′ are the rank functions before and after the entire splaying operation, respectively. □

¹ Throughout this paper we use binary logarithms.

The analysis in Lemma 1 shows that the zig-zig case is the expensive case of a splaying step. The analysis differs from our original analysis [28] in that the definition of rank uses the continuous instead of the discrete logarithm. This gives us a bound that is tighter by a factor of two. The idea of tightening the analysis in this way was also discovered independently by Huddleston [17].

The weights of the items are parameters of the analysis, not of the algorithm: Lemma 1 holds for any assignment of positive weights to items. By choosing weights cleverly, we can obtain surprisingly strong results from Lemma 1. We shall give four examples. Consider a sequence of m accesses on an n-node splay tree. In analyzing the running time of such a sequence, it is useful to note that if the weights of all items remain fixed, then the net decrease in potential over the sequence is at most Σ_{i=1}^{n} log(W/w(i)), where W = Σ_{i=1}^{n} w(i), since the size of the node containing item i is at most W and at least w(i).

THEOREM 1 (BALANCE THEOREM). The total access time is O((m + n) log n + m).

PROOF. Assign a weight of 1/n to each item. Then W = 1, the amortized access time is at most 3 log n + 1 for any item, and the net potential drop over the sequence is at most n log n. The theorem follows. □

For any item i, let q(i) be the access frequency of item i, that is, the total number of times i is accessed.

THEOREM 2 (STATIC OPTIMALITY THEOREM). If every item is accessed at least once, then the total access time is O(m + Σ_{i=1}^{n} q(i) log(m/q(i))).
PROOF. Assign a weight of q(i)/m to item i. Then W = 1, the amortized access time of item i is O(log(m/q(i))), and the net potential drop over the sequence is at most Σ_{i=1}^{n} log(m/q(i)). The theorem follows. □

Assume that the items are numbered from 1 through n in symmetric order. Let the sequence of accessed items be i₁, i₂, . . . , i_m.

THEOREM 3 (STATIC FINGER THEOREM). If f is any fixed item, the total access time is O(n log n + m + Σ_{j=1}^{m} log(|i_j − f| + 1)).

PROOF. Assign a weight of 1/(|i − f| + 1)² to item i. Then W ≤ 2 Σ_{k=1}^{∞} 1/k² = O(1), the amortized time of the jth access is O(log(|i_j − f| + 1)), and the net potential drop over the sequence is O(n log n), since the weight of any item is at least 1/n². The theorem follows. □

We can obtain another interesting result by changing the item weights as the accesses take place. Number the accesses from 1 to m in the order they occur. For any access j, let t(j) be the number of different items accessed before access j since the last access of item i_j, or since the beginning of the sequence if j is the first access of item i_j. (Note that t(j) + 1 is the position of item i_j in a linear list maintained by the move-to-front heuristic [30] and initialized in order of first access.)

THEOREM 4 (WORKING SET THEOREM). The total access time is O(n log n + m + Σ_{j=1}^{m} log(t(j) + 1)).

PROOF. Assign the weights 1, 1/4, 1/9, . . . , 1/n² to the items in order by first access. (The item accessed earliest gets the largest weight; any items never accessed get the smallest weights.) As each access occurs, redefine the weights as follows. Suppose the weight of item i_j during access j is 1/k². After access j, assign weight 1 to i_j, and for each item i having a weight of 1/(k′)² with k′ < k, assign weight 1/(k′ + 1)² to i. This reassignment permutes the weights 1, 1/4, 1/9, . . . , 1/n² among the items.
Furthermore, it guarantees that, during access j, the weight of item i_j will be 1/(t(j) + 1)². We have W = Σ_{k=1}^{n} 1/k² = O(1), so the amortized time of access j is O(log(t(j) + 1)). The weight reassignment after an access increases the weight of the item in the root (because splaying at the accessed item moves it to the root) and only decreases the weights of other items in the tree. The size of the root is unchanged, but the sizes of other nodes can decrease. Thus the weight reassignment can only decrease the potential, and the amortized time for weight reassignment is either zero or negative. The net potential drop over the sequence is O(n log n). The theorem follows. □

Let us interpret the meaning of these theorems. The balance theorem states that on a sufficiently long sequence of accesses a splay tree is as efficient as any form of uniformly balanced tree. The static optimality theorem implies that a splay tree is as efficient as any fixed search tree, including the optimum tree for the given access sequence, since by a standard theorem of information theory [1] the total access time for any fixed tree is Ω(m + Σ_{i=1}^{n} q(i) log(m/q(i))). The static finger theorem states that splay trees support accesses in the vicinity of a fixed finger with the same efficiency as finger search trees. The working set theorem states that the time to access an item can be estimated as the logarithm of one plus the number of different items accessed since the given item was last accessed. That is, the most recently accessed items, which can be thought of as forming the "working set," are the easiest to access. All these results are to within a constant factor.

Splay trees have all these behaviors automatically; the restructuring heuristic is blind to the properties of the access sequence and to the global structure of the tree. Indeed, splay trees have all these behaviors simultaneously; at the cost of a constant factor we can combine all four theorems into one.
THEOREM 5 (UNIFIED THEOREM). The total time of a sequence of m accesses on an n-node splay tree is

    O(m + n log n + Σ_{j=1}^{m} log min{ m/q(i_j), |i_j − f| + 1, t(j) + 1 }),

where f is any fixed item.

PROOF. Assign to each item a weight equal to the sum of the weights assigned to it in Theorems 2-4 and combine the proofs of these theorems. □

Remark. Since |i_j − f| < n, Theorem 5 implies Theorem 1 as well as Theorems 2-4. If each item is accessed at least once, the additive term n log n in the bound of Theorem 5 can be dropped.

3. Update Operations on Splay Trees

Using splaying, we can implement all the standard update operations on binary search trees. We consider the following operations:

access(i, t): If item i is in tree t, return a pointer to its location; otherwise, return a pointer to the null node.
insert(i, t): Insert item i in tree t, assuming that it is not there already.
delete(i, t): Delete item i from tree t, assuming that it is present.
join(t1, t2): Combine trees t1 and t2 into a single tree containing all items from both trees and return the resulting tree. This operation assumes that all items in t1 are less than all those in t2 and destroys both t1 and t2.
split(i, t): Construct and return two trees t1 and t2, where t1 contains all items in t less than or equal to i, and t2 contains all items in t greater than i. This operation destroys t.

[Figure 6: Splaying after an unsuccessful access of item 80.]

We can carry out these operations on splay trees as follows. To perform access(i, t), we search down from the root of t, looking for i. If the search reaches a node x containing i, we complete the access by splaying at x and returning a pointer to x. If the search reaches the null node, indicating that i is not in the tree, we complete the access by splaying at the last nonnull node reached during the search (the node from which the search ran off the bottom of the tree) and returning a pointer to null.
If the tree is empty, we omit the splaying operation. (See Figure 6.)

Splaying's effect of moving a designated node to the root considerably simplifies the updating of splay trees. It is convenient to implement insert and delete using join and split. To carry out join(t1, t2), we begin by accessing the largest item, say i, in t1. After the access, the root of t1 contains i and thus has a null right child. We complete the join by making t2 the right subtree of this root and returning the resulting tree. To carry out split(i, t), we perform access(i, t) and then return the two trees formed by breaking either the left link or the right link from the new root of t, depending on whether the root contains an item greater than i or not greater than i. (See Figure 7.) In both join and split we must deal specially with the case of an empty input tree (or trees).

[Figure 7: Implementation of join and split. (a) join(t1, t2). (b) split(i, t).]

To carry out insert(i, t), we perform split(i, t) and then replace t by a tree consisting of a new root node containing i, whose left and right subtrees are the trees t1 and t2 returned by the split. To carry out delete(i, t), we perform access(i, t) and then replace t by the join of its left and right subtrees. (See Figure 8.)

[Figure 8: Implementation of insertion and deletion using join and split. (a) insert(i, t). (b) delete(i, t).]

There are alternative implementations of insert and delete that have slightly better amortized time bounds. To carry out insert(i, t), we search for i, then replace the pointer to null reached during the search by a pointer to a new node containing i, and finally splay the tree at the new node. To carry out delete(i, t), we search for the node containing i. Let this node be x and let its parent be y. We replace x as a child of y by the join of the left and right subtrees of x, and then we splay at y. (See Figure 9.)
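These reductions are short once a splay routine is available. The sketch below uses a compact recursive splay (an illustrative reformulation, not the paper's guarded-command code) and builds insert and delete on it as just described: insert splays and makes a new root holding the two split pieces, and delete splays the item out and joins its subtrees by splaying the largest item of the left subtree to that subtree's root.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(t):
    x = t.left; t.left = x.right; x.right = t; return x

def rotate_left(t):
    x = t.right; t.right = x.left; x.left = t; return x

def splay(t, key):
    # Bring the node with `key` (or the last nonnull node on its search
    # path) to the root; recursive zig / zig-zig / zig-zag reformulation.
    if t is None or t.key == key:
        return t
    if key < t.key:
        if t.left is None:
            return t
        if key < t.left.key:                        # zig-zig
            t.left.left = splay(t.left.left, key)
            t = rotate_right(t)
        elif key > t.left.key:                      # zig-zag
            t.left.right = splay(t.left.right, key)
            if t.left.right is not None:
                t.left = rotate_left(t.left)
        return rotate_right(t) if t.left is not None else t
    else:
        if t.right is None:
            return t
        if key > t.right.key:                       # zig-zig
            t.right.right = splay(t.right.right, key)
            t = rotate_left(t)
        elif key < t.right.key:                     # zig-zag
            t.right.left = splay(t.right.left, key)
            if t.right.left is not None:
                t.right = rotate_right(t.right)
        return rotate_left(t) if t.right is not None else t

def insert(t, key):
    # Insert via split: splay at key, hang the two pieces off a new root.
    t = splay(t, key)
    if t is None:
        return Node(key)
    if key < t.key:
        new = Node(key, t.left, t); t.left = None; return new
    if key > t.key:
        new = Node(key, t, t.right); t.right = None; return new
    return t  # already present

def delete(t, key):
    # Delete via join: splay key to the root, then join its subtrees by
    # splaying the largest item of the left subtree to that subtree's root.
    t = splay(t, key)
    if t is None or t.key != key:
        return t
    if t.left is None:
        return t.right
    left = splay(t.left, key)   # key exceeds all left keys: max at root
    left.right = t.right
    return left

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

t = None
for k in [5, 1, 4, 2, 3]:
    t = insert(t, k)
t = delete(t, 2)
t = splay(t, 4)
```

After the deletion the tree holds 1, 3, 4, 5 in symmetric order, and the final splay leaves 4 at the root, mirroring the access(i, t) behavior of the text.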
[Figure 9: Alternative implementations of insertion and deletion. Insertion of 80 followed by deletion of 30.]

In analyzing the amortized complexity of the various operations on splay trees, let us assume that we begin with a collection of empty trees and that every item is only in one tree at a time. We define the potential of a collection of trees to be the sum of the potentials of the trees plus the sum of the logarithms of the weights of all items not currently in a tree. Thus the net potential drop over a sequence of operations is at most Σ_{i∈U} log(w(i)/w′(i)), where U is the universe of possible items and w and w′, respectively, are the initial and final weight functions. In particular, if the item weights never change, the net potential change over the sequence is nonnegative.

For any item i in a tree t, let i− and i+, respectively, denote the item preceding i and the item following i in t (in symmetric order). If i− is undefined, let w(i−) = ∞; if i+ is undefined, let w(i+) = ∞.

LEMMA 2 (UPDATE LEMMA). The amortized times of the splay tree operations have the following upper bounds, where W is the total weight of the items in the tree or trees involved in the operation:

access(i, t): 3 log(W/w(i)) + O(1) if i is in t; 3 log(W/min{w(i−), w(i+)}) + O(1) if i is not in t.
join(t1, t2): 3 log(W/w(i)) + O(1), where i is the last item in t1.
split(i, t): the same bound as access(i, t).
insert(i, t): 3 log((W − w(i))/min{w(i−), w(i+)}) + log(W/w(i)) + O(1).
delete(i, t): 3 log(W/w(i)) + 3 log((W − w(i))/w(i−)) + O(1).

Increasing the weight of the item in the root of a tree t by an amount δ takes at most log(1 + δ/W) amortized time, and decreasing the weight of any item takes negative amortized time.

PROOF. These bounds follow from Lemma 1 and a straightforward analysis of the potential change caused by each operation. Let s be the size function just before the operation.
In the case of an access or split, the amortized time is at most 3 log(s(t)/s(x)) + 1 by Lemma 1, where x is the node at which the tree is splayed. If item i is in the tree, it is in x, and s(x) ≥ w(i). If i is not in the tree, either i− or i+ is in the subtree rooted at x, and s(x) ≥ min{w(i−), w(i+)}. This gives the bound on access and split. The bound on join is immediate from the bound on access: the splaying time is at most 3 log(s(t1)/w(i)) + 1, and the increase in potential caused by linking t1 and t2 is log(W/s(t1)). (We have W = s(t1) + s(t2).) The bound on insert also follows from the bound on access; the increase in potential caused by adding a new root node containing i is log(W/w(i)). The bound on delete is immediate from the bounds on access and join. □

Remark. The alternative implementations of insertion and deletion have slightly better upper bounds on their amortized times; these bounds follow from a modification of the proof of Lemma 1. For the case of equal-weight items, the alternative forms of insertion and deletion each take at most 3 log n + O(1) amortized time on an n-node tree. This bound was obtained independently by Huddleston [17] for the same insertion algorithm and a slightly different deletion algorithm. The bounds in Lemma 2 are comparable to the bounds for the same operations on biased trees [7, 8, 13], but the biased tree algorithms depend explicitly on the weights.

By using Lemma 2, we can extend the theorems of Section 2 in various ways to include update operations. (An exception is that we do not see how to include deletion in a theorem analogous to Theorem 3.) We give here only the simplest example of such an extension.

THEOREM 6 (BALANCE THEOREM WITH UPDATES). A sequence of m arbitrary operations on a collection of initially empty splay trees takes O(m + Σ_{j=1}^{m} log n_j) time, where n_j is the number of items in the tree or trees involved in operation j.

PROOF.
Assign to each item a fixed weight of one and apply Lemma 2. □

We can use splay trees as we would use any other binary search tree; for example, we can use them to implement various abstract data structures consisting of sorted sets or lists subject to certain operations. Such structures include dictionaries, which are sorted sets with access, insertion, and deletion, and concatenatable queues, which are lists with access by position, insertion, deletion, concatenation, and splitting [3, 22]. We investigate two further applications of splaying in Section 6.

4. Implementations of Splaying and Its Variants

In this section we study the implementation of splaying and some of its variants. Our aim is to develop a version that is easy to program and efficient in practice. As a programming notation, we shall use a version of Dijkstra's guarded command language [12], augmented by the addition of procedures and "initializing guards" (G. Nelson, private communication). We restrict our attention to successful accesses, that is, accesses of items known to be in the tree.

Splaying, as we have defined it, occurs during a second, bottom-up pass over an access path. Such a pass requires the ability to get from any node on the access path to its parent. To make this possible, we can save the access path as it is traversed (either by storing it in an auxiliary stack or by using "pointer reversal" to encode it in the tree structure), or we can maintain parent pointers for every node in the tree. If space is expensive, we can obtain the effect of having parent pointers without using extra space, by storing in each node a pointer to its leftmost child and a pointer to its right sibling, or to its parent if it has no right sibling. (See Figure 10.) This takes only two pointers per node, the same as the standard left-child-right-child representation, and allows accessing the left child, right child, or
parent of any node in at most two steps. The drawback of this representation is loss of time efficiency. We shall assume that parent pointers are stored explicitly; however, our programs can easily be modified to use other ways of retrieving the access path.

FIG. 10. Leftmost-child-right-sibling representation of the original tree in Figure 6.

The following procedure is a straightforward implementation of splaying. To avoid an explicit test for p(x) = null, we assume that p(null) = left(null) = right(null) = null.

procedure splay(x);
    {p(null) = left(null) = right(null) = null}
    do x = left(p(x)) →
        if g(x) = null → rotate right(p(x))
         | p(x) = left(g(x)) → rotate right(g(x)); rotate right(p(x))
         | p(x) = right(g(x)) → rotate right(p(x)); rotate left(p(x))
        fi
     | x = right(p(x)) →
        if g(x) = null → rotate left(p(x))
         | p(x) = right(g(x)) → rotate left(g(x)); rotate left(p(x))
         | p(x) = left(g(x)) → rotate left(p(x)); rotate right(p(x))
        fi
    od {p(x) = null}
end splay;

The grandparent function g is defined as follows:

function g(x);
    g := p(p(x))
end g;

The procedure rotate left(y) rotates the edge joining y and its right child. The procedure rotate right(y), whose implementation we omit, is symmetric.

Remark. In the following procedure, the initializing guard "x, z: x = right(y) and z = p(y)," which is always true, declares two local variables, x and z, and initializes them to right(y) and p(y), respectively.

procedure rotate left(y);
    if x, z: x = right(y) and z = p(y) →
        if z = null → skip
         | left(z) = y → left(z) := x
         | right(z) = y → right(z) := x
        fi;
        left(x), right(y) := y, left(x);
        p(x), p(y) := z, x;
        if right(y) = null → skip
         | right(y) ≠ null → p(right(y)) := y
        fi
    fi
end rotate left;

Inspection of the splay program raises the possibility of simplifying it by omitting the second rotation in the zig-zag case. The resulting program, suggested by M. D. McIlroy (private communication), appears below.
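Before turning to that simplification, the bottom-up procedure just given can be transcribed into a present-day language. The following is our own sketch (the names Node, rotate, and splay are ours, not the paper's), with parent pointers stored explicitly as the text assumes:

```python
# Bottom-up splaying with explicit parent pointers (our sketch).

class Node:
    def __init__(self, item):
        self.item = item
        self.left = self.right = self.parent = None

def rotate(x):
    """Rotate the edge joining x and its parent p, fixing all parent pointers."""
    p, g = x.parent, x.parent.parent
    if x is p.left:                       # right rotation at p
        p.left, x.right = x.right, p
        if p.left:
            p.left.parent = p
    else:                                 # left rotation at p
        p.right, x.left = x.left, p
        if p.right:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:                     # reattach x under the old grandparent
        if g.left is p:
            g.left = x
        else:
            g.right = x

def splay(x):
    """Move x to the root by zig, zig-zig, and zig-zag steps."""
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate(x)                     # zig: x's parent is the root
        elif (x is p.left) == (p is g.left):
            rotate(p)                     # zig-zig: rotate the parent edge first,
            rotate(x)                     # then the edge to x
        else:
            rotate(x)                     # zig-zag: two rotations at x
            rotate(x)
    return x
```

As in the guarded-command version, the zig-zig case rotates at the grandparent before rotating at the parent; reversing that order would give a different (and not amortized-efficient) restructuring.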
procedure simple splay(x);
    {p(null) = left(null) = right(null) = null}
    do x = left(p(x)) →
        if p(x) = left(g(x)) → rotate right(g(x))
         | p(x) ≠ left(g(x)) → skip
        fi;
        rotate right(p(x))
     | x = right(p(x)) →
        if p(x) = right(g(x)) → rotate left(g(x))
         | p(x) ≠ right(g(x)) → skip
        fi;
        rotate left(p(x))
    od {p(x) = null}
end simple splay;

An amortized analysis of simple splaying shows that Lemma 1 holds with a constant factor of 3.16+ in place of three. Thus the method, though simpler than the original method, has a slightly higher bound on the number of rotations.

The advantage of implementing splaying using rotation as a primitive is that we can encapsulate all the restructuring, including any required updating of auxiliary fields, in the rotation procedures. The disadvantage is that many unnecessary pointer assignments are done. We can achieve greater efficiency by eliminating the superfluous assignments by expanding the rotation procedures in-line, simplifying, and keeping extra information in the control state.

A bottom-up splaying method is appropriate if we have direct access to the node at which the splaying is to occur. We shall see an example of this in Section 6. However, splaying as used in Sections 2 and 3 only occurs immediately after an access, and it is more efficient to splay on the way down the tree during the access. We present a top-down version of splaying that works in this way.

Suppose we wish to access an item i in a tree. During the access and concurrent splaying operation, the tree is broken into three parts: a left tree, a middle tree, and a right tree. The left and right trees contain all items in the original tree so far known to be less than i and greater than i, respectively. The middle tree consists of the subtree of the original tree rooted at the current node on the access path. Initially the current node is the tree root and the left and right trees are empty.
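The simple splay program given earlier in this section differs from full splaying only in the zig-zag case, where it performs a single rotation. A compact transcription (our sketch, reusing the Node and rotate helpers one would write for the full bottom-up method; all names are ours):

```python
# Simple (McIlroy) bottom-up splaying: one rotation in the zig-zag case (our sketch).

class Node:
    def __init__(self, item):
        self.item = item
        self.left = self.right = self.parent = None

def rotate(x):
    """Rotate the edge joining x and its parent p, fixing all parent pointers."""
    p, g = x.parent, x.parent.parent
    if x is p.left:
        p.left, x.right = x.right, p
        if p.left:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def simple_splay(x):
    """Splay x to the root, omitting the second rotation in the zig-zag case."""
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is not None and (x is p.left) == (p is g.left):
            rotate(p)      # zig-zig: rotate the grandparent edge first
        rotate(x)          # in every case, rotate x over its (possibly new) parent
    return x
```

The zig-zag case now advances x only one level per iteration, which is why the constant in the rotation bound grows from 3 to 3.16+.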
To do the splaying, we search down from the root looking for i, two nodes at a time, breaking links along the access path and adding each subtree thus detached to the bottom right of the left tree or the bottom left of the right tree, with the proviso that if we take two steps down in the same direction (left-left or right-right) we do a rotation before breaking a link. More formally, the splaying operation consists of repeatedly applying the appropriate case among those shown in Figure 11 until none applies, then reassembling the three remaining trees as shown in Figure 12. A similar top-down restructuring method, but without the rotations and consequently with poor amortized efficiency, was proposed by Stephenson [32].

FIG. 11. Top-down splaying step. Symmetric variants of cases are omitted. Initial left tree is L, right tree is R. Labeled nodes are on the access path. (a) Zig: Node y contains the accessed item. (b) Zig-zig. (c) Zig-zag.

FIG. 12. Completion of top-down splaying. Node x contains the accessed item.

The following program implements this method. The program uses three local variables, t, l, and r, denoting the current vertex, the last node in symmetric order in the left tree, and the first node in symmetric order in the right tree, respectively. The updating of these variables occurs in the primitive procedures. These are rotate left and rotate right, which rotate the edge joining t and its right or left child, respectively; link left and link right, which break the link between t and its right or left child and attach the resulting tree to the left or right tree, respectively; and assemble, which completes the splaying by performing the assembly of Figure 12. The program contains an initializing guard that declares l and r and initializes
them both to null. After the first left link, the right child of null is the root of the left tree; after the first right link, the left child of null is the root of the right tree.

procedure topdown splay(i, t);
    if l, r: l = r = null →
        left(null) := right(null) := null;
        do i < item(t) →
            if i = item(left(t)) → link right
             | i < item(left(t)) → rotate right; link right
             | i > item(left(t)) → link right; link left
            fi
         | i > item(t) →
            if i = item(right(t)) → link left
             | i > item(right(t)) → rotate left; link left
             | i < item(right(t)) → link left; link right
            fi
        od {i = item(t)};
        assemble
    fi
end topdown splay;

Implementations of rotate left, link left, and assemble appear below; rotate right and link right, whose definitions we omit, are symmetric to rotate left and link left, respectively.

procedure link left;
    t, l, right(l) := right(t), t, t
end link left;

procedure rotate left;
    t, right(t), left(right(t)) := right(t), left(right(t)), t
end rotate left;

procedure assemble;
    left(r), right(l) := right(t), left(t);
    left(t), right(t) := right(null), left(null)
end assemble;

We can simplify top-down splaying as we did bottom-up splaying, by eliminating the second linking operation in the zig-zag case. The resulting program is as follows:

procedure simple topdown splay(i, t);
    if l, r: l = r = null →
        left(null) := right(null) := null;
        do i < item(t) →
            if i < item(left(t)) → rotate right
             | i ≥ item(left(t)) → skip
            fi;
            link right
         | i > item(t) →
            if i > item(right(t)) → rotate left
             | i ≤ item(right(t)) → skip
            fi;
            link left
        od {i = item(t)};
        assemble
    fi
end simple topdown splay;

Lemma 1 holds for both top-down splaying methods with an unchanged constant factor of three. It is easy to modify the program for either method to carry out the various tree operations discussed in Section 3.
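The simplified top-down program above can be transcribed into Python as follows. This is our own sketch (names Node and topdown_splay are invented); a dummy header node plays the role of null, so that after the first left link the header's right child is the root of the left tree and after the first right link its left child is the root of the right tree, and item i is assumed to be present in the tree:

```python
# Simplified top-down splaying with a dummy header node (our sketch).

class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right

def topdown_splay(i, t):
    """Splay the node containing i (assumed present) to the root of t and return it."""
    header = Node(None)
    l = r = header            # last node of the left tree, first node of the right tree
    while i != t.item:
        if i < t.item:
            if i < t.left.item:          # zig-zig: rotate right before linking
                x = t.left
                t.left, x.right, t = x.right, t, x
            r.left = t                   # link right: t joins the bottom left
            r, t = t, t.left             # of the right tree
        else:
            if i > t.right.item:         # zig-zig: rotate left before linking
                x = t.right
                t.right, x.left, t = x.left, t, x
            l.right = t                  # link left: t joins the bottom right
            l, t = t, t.right            # of the left tree
    l.right, r.left = t.left, t.right    # assemble: hang the left and right
    t.left, t.right = header.right, header.left  # trees off the final root
    return t
```

A single downward pass thus both finds the item and restructures the tree, which is what makes the method attractive when splaying always follows an access.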
For these operations a top-down method is more efficient than a bottom-up method, but the choice between the top-down methods is unclear; the simple method is easier to program but probably somewhat less efficient. Experiments are needed to discover the most practical splaying method.

5. Semi-adjusting Search Trees

A possible drawback of splaying is the large amount of restructuring it does. In this section we study three techniques that reduce the amount of restructuring but preserve