Java Program to Remove Duplicate Words from a String

Introduction

Removing duplicate words from a string is a common text-processing task. This can be useful in various scenarios, such as cleaning up user input, preparing data for analysis, or simply improving the readability of text. In this blog post, we'll explore how to remove duplicate words from a string using traditional methods as well as Java 8 features.

Table of Contents

  1. Using a Traditional Approach
  2. Using Java 8 Streams
  3. Complete Example Program
  4. Conclusion

1. Using a Traditional Approach

The traditional approach uses a HashSet to track the words already seen as we iterate through the string. Because a HashSet does not store duplicate values, each word is appended to the result only the first time it appears, which also preserves the original word order.

Example:

import java.util.HashSet;
import java.util.Set;

public class RemoveDuplicateWordsTraditional {
    public static void main(String[] args) {
        String input = "Java is great and Java is fun and Java is powerful";

        String result = removeDuplicateWords(input);

        System.out.println("Original String: " + input);
        System.out.println("String after removing duplicates: " + result);
    }

    public static String removeDuplicateWords(String input) {
        String[] words = input.split("\\s+");
        Set<String> wordSet = new HashSet<>();
        StringBuilder result = new StringBuilder();

        for (String word : words) {
            if (!wordSet.contains(word)) {
                wordSet.add(word);
                result.append(word).append(" ");
            }
        }

        return result.toString().trim();
    }
}

Output:

Original String: Java is great and Java is fun and Java is powerful
String after removing duplicates: Java is great and fun powerful
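The contains-then-add check in the loop above can be collapsed into a single call, because Set.add returns false when the element is already present. The following is a minimal sketch of that variant, not a replacement for the program above. The class name DedupWithAdd is hypothetical, and StringJoiner is swapped in for StringBuilder so that no trailing space has to be trimmed.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical class name, used only for this sketch.
class DedupWithAdd {
    static String removeDuplicateWords(String input) {
        Set<String> seen = new HashSet<>();
        // StringJoiner inserts the separator only between elements.
        StringJoiner result = new StringJoiner(" ");
        for (String word : input.split("\\s+")) {
            // Set.add returns true only if the word was not already in the set.
            if (seen.add(word)) {
                result.add(word);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Prints: Java is great and fun powerful
        System.out.println(removeDuplicateWords(
                "Java is great and Java is fun and Java is powerful"));
    }
}
```

The behavior is the same as the traditional version for the sample input; the only difference is one hash lookup per word instead of two.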

2. Using Java 8 Streams

Java 8 Streams provide a more concise way to handle this task. We can stream the words, collect them into a LinkedHashSet (which drops duplicates while keeping insertion order), and then join the result back into a string.

Example:

import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class RemoveDuplicateWordsStreams {
    public static void main(String[] args) {
        String input = "Java is great and Java is fun and Java is powerful";

        String result = removeDuplicateWords(input);

        System.out.println("Original String: " + input);
        System.out.println("String after removing duplicates: " + result);
    }

    public static String removeDuplicateWords(String input) {
        Set<String> wordSet = Arrays.stream(input.split("\\s+"))
                                    .collect(Collectors.toCollection(LinkedHashSet::new));
        return String.join(" ", wordSet);
    }
}

Output:

Original String: Java is great and Java is fun and Java is powerful
String after removing duplicates: Java is great and fun powerful

Explanation:

  • Traditional Approach:
    • Split the input string into words using split("\\s+").
    • Use a HashSet to store unique words.
    • Iterate through the words, adding each unique word to the HashSet and appending it to the result string.
    • Trim the result string to remove the trailing space.
  • Java 8 Streams:
    • Split the input string into a stream of words using Arrays.stream(input.split("\\s+")).
    • Collect the words into a LinkedHashSet to maintain insertion order while removing duplicates.
    • Join the words back into a single string using String.join(" ", wordSet).
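The streams steps above can also be written without an intermediate set. This is a sketch of one alternative, assuming the standard Stream.distinct() operation, which keeps the first occurrence of each element when the stream is ordered (as a stream over an array is). The class name DedupWithDistinct is hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this sketch.
class DedupWithDistinct {
    static String removeDuplicateWords(String input) {
        return Arrays.stream(input.split("\\s+"))
                // For an ordered stream, distinct() keeps the first occurrence of each word.
                .distinct()
                // Join the surviving words with single spaces.
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Prints: Java is great and fun powerful
        System.out.println(removeDuplicateWords(
                "Java is great and Java is fun and Java is powerful"));
    }
}
```

Whether this or the LinkedHashSet version reads better is largely a matter of taste; the LinkedHashSet version is handy if you also need the set of unique words afterwards.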

3. Complete Example Program

Here is a complete program that demonstrates both methods to remove duplicate words from a string.

Example Code:

import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class RemoveDuplicateWordsExample {
    public static void main(String[] args) {
        String input = "Java is great and Java is fun and Java is powerful";

        // Using Traditional Approach
        String resultTraditional = removeDuplicateWordsTraditional(input);
        System.out.println("Using Traditional Approach:");
        System.out.println("Original String: " + input);
        System.out.println("String after removing duplicates: " + resultTraditional);

        // Using Java 8 Streams
        String resultStreams = removeDuplicateWordsStreams(input);
        System.out.println("\nUsing Java 8 Streams:");
        System.out.println("Original String: " + input);
        System.out.println("String after removing duplicates: " + resultStreams);
    }

    public static String removeDuplicateWordsTraditional(String input) {
        String[] words = input.split("\\s+");
        Set<String> wordSet = new HashSet<>();
        StringBuilder result = new StringBuilder();

        for (String word : words) {
            if (!wordSet.contains(word)) {
                wordSet.add(word);
                result.append(word).append(" ");
            }
        }

        return result.toString().trim();
    }

    public static String removeDuplicateWordsStreams(String input) {
        Set<String> wordSet = Arrays.stream(input.split("\\s+"))
                                    .collect(Collectors.toCollection(LinkedHashSet::new));
        return String.join(" ", wordSet);
    }
}

Output:

Using Traditional Approach:
Original String: Java is great and Java is fun and Java is powerful
String after removing duplicates: Java is great and fun powerful

Using Java 8 Streams:
Original String: Java is great and Java is fun and Java is powerful
String after removing duplicates: Java is great and fun powerful
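Both methods above compare words with String.equals, so "Java" and "java" count as different words. In addition, split("\\s+") produces an empty first token when the input starts with whitespace, which the streams version would join back in as a leading space. The following is a minimal sketch of one way to handle both cases, assuming you want case-insensitive matching that keeps the first spelling seen. The class name DedupIgnoreCase and the choice of Locale.ROOT for lower-casing are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical class name, used only for this sketch.
class DedupIgnoreCase {
    static String removeDuplicateWords(String input) {
        // Key: lower-cased word. Value: the first spelling seen.
        // LinkedHashMap keeps keys in insertion order.
        Map<String, String> firstSeen = new LinkedHashMap<>();
        for (String word : input.trim().split("\\s+")) {
            // Skip the empty token produced by an empty or all-whitespace input.
            if (!word.isEmpty()) {
                firstSeen.putIfAbsent(word.toLowerCase(Locale.ROOT), word);
            }
        }
        return String.join(" ", firstSeen.values());
    }

    public static void main(String[] args) {
        // Prints: Java is fun and great
        System.out.println(removeDuplicateWords("  Java is fun and java IS great"));
    }
}
```

Punctuation is still treated as part of a word here, so "fun" and "fun," remain distinct; stripping punctuation would be a separate decision that depends on the input.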

4. Conclusion

Duplicate words can be removed from a string with either a traditional loop or Java 8 Streams. The traditional approach is straightforward and easy to follow, while the streams version is more concise. Both methods compare words exactly (the comparison is case-sensitive) and produce a string that contains each word once, in the order of its first appearance in the input.

By understanding these different methods, you can choose the one that best fits your needs and coding style. Happy coding!
