Wednesday, 18 January 2017

Spliterator in Java

Spliterators, like iterators, are for traversing the elements of a source. The source of elements covered by a Spliterator could be, for example, an array, a Collection, an IO channel, or a generator function.

As the name suggests, spliterator can split the source and iterate the splitted parts in parallel. That way a huge data source can be divided into small sized units that can be traversed and processed parallely.

Spliterator interface

Spliterator is a generic interface in Java defined as -

Interface Spliterator<T>

Where T is the type of elements returned by this Spliterator.

Spliterator methods

Though spliterator will increase performance by traversing the collection in parallel but you can also use spliterator even if you are not using parallel execution.

If you use iterator you have to use two methods hasNext() to ensure that there is next element and then next() method to use that element. Spliterator provides methods that combines these two methods into one and making it more convenient to use.

Some of the frequently used methods of Spliterator are -

  • tryAdvance() - If a remaining element exists, performs the given action on it, returning true; else returns false. Its form is -
    tryAdvance(Consumer<? super T> action)
    
    Here action is an instance of Consumer, which is a functional interface, it specifies the function that has to be applied on the next element while traversing the collection (or any other source).
  • forEachRemaining - Performs the given action for each remaining element, sequentially in the current thread, until all elements have been processed or the action throws an exception. Its form is -
    default void forEachRemaining(Consumer<? super T> action)
    
  • estimateSize() - Returns an estimate of the number of elements that would be encountered by forEachRemaining traversal, or returns Long.MAX_VALUE if infinite, unknown, or too expensive to compute. Its form is -
    long estimateSize()
    
  • trySplit() - If current spliterator can be partitioned a new spliterator is created, it partitions the elements of the source so that new spliterator traverse one of the partition while original spliterator traverses the other partition.
  • characteristics() - Returns a set of characteristics of this Spliterator and its elements.

Spliterator characteristics

A Spliterator also reports a set of characteristics() of its structure, source, and elements from among ORDERED, DISTINCT, SORTED, SIZED, NONNULL, IMMUTABLE, CONCURRENT, and SUBSIZED.

These characteristics are defined as constant fields in the Spliterator interface.

Read more about them here - https://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html#characteristics--

To see constant values - https://docs.oracle.com/javase/8/docs/api/constant-values.html#java.util.Spliterator

Using characteristics() method will give you a result represented as ORed values of the characterstics relevant for the given source.

Spliterator example code

If you have a list of names and you want to iterate it and print the names, using iterator it can be done like -

public class IteratorDemo {

    public static void main(String[] args) {
        List<String> nameList = Arrays.asList("Ram", "Sheila", "Mukesh", "Rani", "Nick", "Amy", "Desi", "Margo");
        Iterator<String> itr = nameList.iterator();
        while (itr.hasNext()) {
            System.out.println("name - " + itr.next());   
        }
    }
}

Same thing can be done using spliterator like this -

public class SpliteratorDemo {

    public static void main(String[] args) {
        List<String> nameList = Arrays.asList("Ram", "Sheila", "Mukesh", "Rani", "Nick", "Amy", "Desi", "Margo");
        Spliterator<String> splitStr = nameList.spliterator();
        while(splitStr.tryAdvance((n) -> System.out.println("name - " + n)));
    }

}

Here you can see you need to use only one method tryAdvance() which combines both hasNext() and next() methods of the iterator.

Using forEachRemaining

If you want to convert all the names to lowercase you can use forEachRemaining method.

import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

public class SpliteratorDemo {

    public static void main(String[] args) {
        List<String> nameList = Arrays.asList("Ram", "Sheila", "Mukesh", "Rani", "Nick", "Amy", "Desi", "Margo");
        Spliterator<String> splitStr = nameList.spliterator();
        splitStr.forEachRemaining( 
            (n) -> {
                String x = n.toLowerCase();
                System.out.println("" + x);
            }
        );
    }

}

Using trySplit

If you want to split the original spliterator so that you can traverse the element parallely.

import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

public class SpliteratorDemo {

    public static void main(String[] args) {
        List<String> nameList = Arrays.asList("Ram", "Sheila", "Mukesh", "Rani", "Nick", "Amy", "Desi", "Margo");
        Spliterator<String> splitStr = nameList.spliterator();
        Spliterator<String> splitStr2 = splitStr.trySplit();
        // Check if splitting actually happened, then use it
        if(splitStr2 != null){
            System.out.println("Spliterator-2");
            while(splitStr2.tryAdvance((n) -> System.out.println("name - " + n)));
        }
        // Original spliterator
        System.out.println("Original Spliterator");
        while(splitStr.tryAdvance((n) -> System.out.println("name - " + n)));
    }
        
}

Output

Spliterator-2
name - Ram
name - Sheila
name - Mukesh
name - Rani
Original Spliterator
name - Nick
name - Amy
name - Desi
name - Margo

Here you are splitting the spliterator make sure to check that splitting actually happened by checking for null.

Here note one thing, according to Java docs -

If the original thread hands a spliterator off to another thread for processing, it is best if that handoff occurs before any elements are consumed with tryAdvance(), as certain guarantees (such as the accuracy of estimateSize() for SIZED spliterators) are only valid before traversal has begun.

So make sure you first do the splitting then only start any operation on the elements.

You can split a spliterator into many partitions, if it can’t be split further null would be returned.

See an example here -

 static <T> void parEach(TaggedArray<T> a, Consumer<T> action) {
   Spliterator<T> s = a.spliterator();
   long targetBatchSize = s.estimateSize() / (ForkJoinPool.getCommonPoolParallelism() * 8);
   new ParEach(null, s, action, targetBatchSize).invoke();
 }

 static class ParEach<T> extends CountedCompleter<Void> {
   final Spliterator<T> spliterator;
   final Consumer<T> action;
   final long targetBatchSize;

   ParEach(ParEach<T> parent, Spliterator<T> spliterator,
           Consumer<T> action, long targetBatchSize) {
     super(parent);
     this.spliterator = spliterator; this.action = action;
     this.targetBatchSize = targetBatchSize;
   }

   public void compute() {
     Spliterator<T> sub;
     while (spliterator.estimateSize() > targetBatchSize &&
            (sub = spliterator.trySplit()) != null) {
       addToPendingCount(1);
       new ParEach<>(this, sub, action, targetBatchSize).fork();
     }
     spliterator.forEachRemaining(action);
     propagateCompletion();
   }
 }

That's all for this topic Spliterator in Java. If you have any doubt or any suggestions to make please drop a comment. Thanks!

Referrence - https://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html


Related Topics

  1. Stream API in Java 8
  2. Java Stream API Examples
  3. Primitive type streams in Java Stream API
  4. Collecting in Java Stream API
  5. Lambda expressions in Java 8

You may also like -

>>>Go to Java advance topics page

No comments:

Post a Comment