Free Java Tutorials >> Table of contents >> Collections

3. Sets

What is a Set?

A set is a collection of items where you are only allowed one of each item. For example, a set cannot be {1,2,2,3} because there are two 2's. Most Set's in Java also have no sense of order, so one would treat {1,2,3} as the same as {3,1,2}.

Declaring a HashSet

Like ArrayList, HashSet uses angle brackets to specify the inner type:

HashSet<Integer> s = new HashSet<Integer>();
s.add(5);
s.add(5); //no effect
s.add(42);
for( Integer v : s ) {
  println(v); 
}
//prints 5 and 42, although not necessarily in that order.

Hash sets also have remove(), clear(), and size() functions that are similar to that of ArrayList. A full list of functions (methods) can be found here (Java 6 API Doc): http://docs.oracle.com/javase/6/docs/api/java/util/HashSet.html

Why use a hashset? The primary advantage of a hash set is that you can check if it contains() an item very quickly. Unlike arrays and ArrayLists, a HashSet does not need to loop through all items to check if an item is already present.

LinkedHashSet and TreeSet

A linked hash set preserves the order that items are inserted in. A TreeSet, in contrast, keeps items sorted. Compare the output of these three sets:

Set<Integer> a = new HashSet<Integer>();
a.add(30);
a.add(10);
a.add(20);
a.add(40);
Set<Integer> b = new TreeSet<Integer>();
b.add(30);
b.add(10);
b.add(20);
b.add(40);
Set<Integer> c = new LinkedHashSet<Integer>();
c.add(30);
c.add(10);
c.add(20);
c.add(40);
System.out.println(a); //"random" order (using object.hashcode)
System.out.println(b); //sorted order (using object.compareTo)
System.out.println(c); //original order 

Changing set behavior

Hash Sets use both the .equals method and the hashCode method to determine if two items are equivalent. Thus, you must override these methods in your own classes if you want them to work properly in a Set. Otherwise, by default Sets will operator on the memory address of the object.

TreeSets do not use the hashcode method at all, and they instead use the compareTo of the class. Alternatively, you can pass a comparator to the constructor of the treeset. We will cover this type of functionality more in the inheritance section of this book.

4. Maps

What is a map?

A map, sometimes called dictionary, is a collection of pairs of items. For example, we may store the pair "Avatar Aang" with 12 (years old), "Bilbo Baggins" with 131, etc. In this example, the first item in each pair is a String. The first item in each pair of a Map is called the Key. The second item in each pair is called the Value. What makes a Map different than just having two ArrayLists are two properties: First, each key must be unique (you can't have Avatar Aang twice). Secondly, you get values by giving it a key rather than an index. An example:

//Here we define ages as a Map from keys of Strings to values of Integers:
HashMap<String,Integer> ages = new HashMap<String,Integer>();
ages.put("Avatar Aang",13);
ages.put("Avatar Aang",113);      //overwrites the first one!
ages.put("Bilbo Baggins",131);
ages.put("Arthur Dent",-1000);
System.out.println(ages.size());             //3, Aang, Bilbo and Arthur

System.out.println(ages.get("Avatar Aang")); //outputs 113
System.out.println(ages.get("Arthur Dent")); //-1000

The important difference is that "put" adds items in, and the function takes both the key and value as arguments. The get function, unlike an array, takes a key as an argument rather than index.

Looping through a HashMap

//continuing with ages:
HashMap<String,Integer> ages = new HashMap<String,Integer>();
ages.put("Avatar Aang",13);
ages.put("Avatar Aang",113);      //overwrites the first one!
ages.put("Bilbo Baggins",131);
ages.put("Arthur Dent",-1000);

//To loop through all keys, we need a for-each over keyset:
for(String key : ages.keySet()) {
  System.out.println(key); //prints Avatar Aang, Bilbo Baggins, and Arthur Dent
}

//Loop through each value:
for(Integer value : ages.values()) {
  System.out.println(value);
}

//You can also output values with keys with get
for(String key : ages.keySet()) {
  System.out.println(key+" is "+ages.get(key)); 
}
//This last outputs:
//Avatar Aang is 113
//Bilbo Baggins is 131
//Arthur Dent is -1000

More functions ("methods") can be found at http://docs.oracle.com/javase/6/docs/api/java/util/HashMap.html

LinkedHashMap and TreeMap

A linked hash map preserves the original order that items are added into the map. In practice, this means that the keySet of a LinkedHashMap is a LinkedHashSet.

A tree map sorts all items in the map by their key's compareTo. The keySet of a TreeMap is a TreeSet.

1. Lists

A List is similar to a resizable array, in that it has a number of elements each stored at a position (an index) counting from 0 up. However, Java's arrays cannot be resized. You cannot add things to the beginning middle or end of an array to increase the length. You also cannot remove items. Since there are many situations where we many need a variable number of items, we will introduce Java's built in List here.

The ArrayList class

The syntax for an arraylist is much more complicated than for an array. We could do int[] x = {10,20,30,40,50}; to create an integer array. With an ArrayList, this would be the code:

ArrayList<Integer> x = new ArrayList<Integer>();
x.add(5);  //puts 5 in
x.add(4);  //4 is now in the second position, index 1
x.add(42); //42 is in index 2
System.out.println(x.get(0)); //prints index 0, which is 5
System.out.println(x.get(1)); //prints 4
System.out.println(x.get(2)); //prints 42

Note that you can add into a specific index (rather than the end) with list.add(index, value).

You can change the value of an element with list.set(index, newValue).

Looping through an ArrayList

Looping through an arraylist is similar to looping through an array. However, the .size() method returns the length of the list. The .get method allows you to retrieve elements from the list (using their index as an argument):

ArrayList<Integer> y = new ArrayList<Integer>();
y.add(5);  
y.add(0,1000);  //puts 1000 in the first position, moving 5 to position 1
y.add(42);      //now the list is 1000,5,42
System.out.println(y.size()); //prints 3, the number of items
//notice that for arraylists, we use y.size()
//arrays used bla.length without the parenthesis

//Let's output everything in the ArrayList y:
for(int i=0;i<y.size();i++) {
    System.out.println(y.get(i)); //similar to println(bla[i])
}

Removing one or all items

If you have the arraylist x, then the function x.remove(i) will remove the item at position i. Items with higher indices will be moved down. Thus if you remove 42 from {10,42,20}, the arraylist becomes {10,20}

To remove all elements, use x.clear()

Searching a List

Lists like arraylist require a loop to search for an item. However, a convenience method called "contains" will perform the loop for you and return a boolean:

	ArrayList<Integer> y = new ArrayList<Integer>();
	y.add(5);  
	y.add(0,1000);  //puts 1000 in the first position, moving 5 to position 1
	y.add(42);      //now the list is 1000,5,42
	System.out.println(y.contains(42)); //true
	System.out.println(y.contains(24)); //false

The LinkedList class

The java.util.LinkedList class contains many of the same methods as ArrayList. However, the performance characteristics are very different. LinkedLists require a loop internally to retrieve an item at a particular index, unlike array lists. However, linked lists are faster when removing items in the middle during an "iterator loop". LinkedLists can also add items more quickly to the middle when using the "iterator loop". We will see how to build linked lists in a later section.

addAll(),removeAll,Collections constructors

All of the Java Collections classes have convenience methods for interacting with other collections. List and Set are the most commonly used Collections, as well as the .values() of a Map. The methods include:

  • .addAll(Collection c): Adds all items from the collection c into this collection.
  • .containsAll(Collection c): Returns true if this contains all elements in c.
  • .removeAll(Collection c): Removes all items in c from this.
  • .retainAll(Collection c): Removes all items not in c from this.
  • The constructor(Collection c): similar to a default constructor combined with .addAll.

2. For-each, Iterators

Two for-each loop examples

Both arrays and ArrayLists support a for loop that is shorter to write called a for-each loop. This is best shown through example:

int[] x = {10,20,30,40};
ArrayList y = new ArrayList();
y.add("apple");
y.add("bear");
y.add("cow");

for(int v : x) { //read "for each int v in x"
  println(v);
} //prints 10 20 30 40

for(String v : y) { //read "for each String v in y"
  println(v);
} //prints "apple" "bear" "cow"

The colon ":" is read "in". In both of these examples, the for each loop creates a variable called v that cycles between each value in the array/arraylist.

What are the advantages / disadvantages of a for each loop?

There are three primary advantages of a for each loop. First, a for each loop is shorter to write. Secondly, some data structures like the Set and Map structures do not store a position for each item. Because there's no way to "get index 0", we cannot use a traditional for loop. Finally, we will eventually learn that we can make our own variable types that contain other variables. We can get these types to also support a for each loop.

The primary disadvantage of for each loops is that we have no variable to keep track of index.

Advanced students should note that all Java classes that extend Iterable support the for each loop.

The iterator loop and remove

The for each loop is compiled into an iterator loop underneath:

ArrayList list = new ArrayList();
list.add("apple");
list.add("bear");
list.add("cow");
for (Iterator iter = list.iterator(); iter.hasNext(); ) {
	String element = iter.next();
	System.out.println(element);
	
	//Optionally remove items:
	if(element.equals("bear")) {
		iter.remove();
	}
}
System.out.println(list); //Now list has no bear

As you can see from the above example, you may use the .remove() method. This is the only way to remove items from sets and maps within a traversal loop.

Previous: Generics

Next: Collections Algorithms