Java Program to Find Duplicate Words in a String

Introduction

Finding duplicate words in a string is a common task in text processing and data analysis. This guide will show you how to create a Java program that identifies and displays duplicate words in a given string.

Problem Statement

Create a Java program that:

  • Takes a string as input.
  • Finds and displays all duplicate words in the string, ignoring letter case and punctuation.

Example 1:

  • Input: "This is a test. This test is easy."
  • Output: Duplicate words: this, is, test

Example 2:

  • Input: "Java is great and Java is powerful"
  • Output: Duplicate words: java, is

Solution Steps

  1. Prompt for Input: Use the Scanner class to read a string input from the user.
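  2. Split the String into Words: Convert the string to lowercase, then use the split() method with the regular expression "\\W+" to divide it into individual words, discarding spaces and punctuation.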
  3. Use a HashMap to Track Word Frequencies: Iterate through the array of words and store the frequency of each word in a HashMap.
  4. Identify Duplicate Words: Traverse the HashMap to identify words with a frequency greater than 1.
  5. Display the Duplicate Words: Print the duplicate words found in the string.
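Steps 2 to 4 do not depend on the console, so one way to make them easy to test is to move them into a method that returns the duplicates instead of printing them. The sketch below is illustrative only: the class name DuplicateWords and the method findDuplicateWords are hypothetical. It uses Map.merge, which has the same effect as the getOrDefault-based increment, and a LinkedHashMap so the duplicates come back in the order the words first appear.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateWords {

    // Hypothetical helper: steps 2-4 without any console I/O.
    static List<String> findDuplicateWords(String input) {
        // LinkedHashMap preserves first-appearance order of the words
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : input.toLowerCase().split("\\W+")) {
            // A leading space or punctuation mark produces one empty token; skip it
            if (!word.isEmpty()) {
                // Equivalent to counts.put(word, counts.getOrDefault(word, 0) + 1)
                counts.merge(word, 1, Integer::sum);
            }
        }

        // Keep only the words seen more than once
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > 1) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicateWords("This is a test. This test is easy.")); // [this, is, test]
        System.out.println(findDuplicateWords("Java is great and Java is powerful")); // [java, is]
    }
}
```

Returning a List rather than printing keeps the counting logic separate from the input and output handling, so the same method could back the interactive program.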

Java Program


import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

/**
 * Java Program to Find Duplicate Words in a String
 * Author: https://www.javaguides.net/
 */
public class DuplicateWordsInString {

    public static void main(String[] args) {
        // try-with-resources closes the Scanner when the program is done with it
        try (Scanner scanner = new Scanner(System.in)) {

            // Step 1: Prompt the user for input
            System.out.print("Enter a string to find duplicate words: ");
            String input = scanner.nextLine();

            // Step 2: Lowercase the input and split it on runs of non-word characters,
            // so "This" matches "this" and punctuation such as "." is discarded
            String[] words = input.toLowerCase().split("\\W+");

            // Step 3: Track word frequencies; LinkedHashMap (a HashMap that keeps
            // insertion order) makes the output list words in first-appearance order
            Map<String, Integer> wordCountMap = new LinkedHashMap<>();
            for (String word : words) {
                if (word.isEmpty()) {
                    continue; // input starting with a space or punctuation yields one empty token
                }
                wordCountMap.put(word, wordCountMap.getOrDefault(word, 0) + 1);
            }

            // Step 4: Identify and display duplicate words
            System.out.print("Duplicate words: ");
            boolean hasDuplicates = false;
            for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
                if (entry.getValue() > 1) {
                    System.out.print(entry.getKey() + " ");
                    hasDuplicates = true;
                }
            }

            if (!hasDuplicates) {
                System.out.print("No duplicates found.");
            }

            System.out.println();
        }
    }
}

Explanation

  • Input: The program prompts the user to enter a string.
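  • Splitting the String: The string is converted to lowercase, so "This" and "this" count as the same word, and is then split into individual words using the split() method. The regular expression "\\W+" matches one or more consecutive non-word characters (anything other than letters, digits, and underscores), so both whitespace and punctuation act as separators and "test." is counted as "test".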
  • Tracking Word Frequencies: The program uses a HashMap to count the occurrences of each word in the array. The getOrDefault method simplifies the process of incrementing the count.
  • Finding Duplicates: The program then iterates over the HashMap to identify words with a frequency greater than 1, indicating that they are duplicates.
  • Output: The program prints the duplicate words found in the string. If no duplicates are found, it prints a message indicating this.
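The choice of regular expression decides what counts as a word. The sketch below compares what split() returns for the first example with "\\W+" and with a whitespace-only "\\s+", and shows two edge cases: a leading separator and non-ASCII letters. The class SplitDemo and its show() helper are hypothetical names used only for this illustration; the "(?U)" prefix is Java's embedded flag for Pattern.UNICODE_CHARACTER_CLASS.

```java
import java.util.Arrays;

public class SplitDemo {

    // Hypothetical helper: split text with the given regex and format the tokens
    static String show(String text, String regex) {
        return Arrays.toString(text.split(regex));
    }

    public static void main(String[] args) {
        String text = "This is a test. This test is easy.".toLowerCase();

        // "\\W+" treats punctuation as a separator, so the periods disappear
        System.out.println(show(text, "\\W+"));  // [this, is, a, test, this, test, is, easy]

        // "\\s+" splits only on whitespace, so "test." and "test" would be counted separately
        System.out.println(show(text, "\\s+"));  // [this, is, a, test., this, test, is, easy.]

        // A leading separator produces one empty first token; trailing empty tokens are dropped
        System.out.println(show("...hello world!", "\\W+")); // [, hello, world]

        // By default \W is ASCII-only, so accented letters act as separators
        System.out.println(show("caf\u00e9 caf\u00e9", "\\W+"));     // [caf, caf]
        // (?U) makes \W Unicode-aware, keeping "café" intact
        System.out.println(show("caf\u00e9 caf\u00e9", "(?U)\\W+"));
    }
}
```

The empty leading token appears at most once, so it can never be reported as a duplicate, but skipping empty tokens keeps the frequency map clean. For text in languages other than English, the "(?U)\\W+" form is worth considering.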

Output Example

Example 1:

Enter a string to find duplicate words: This is a test. This test is easy.
Duplicate words: this is test 

Example 2:

Enter a string to find duplicate words: Java is great and Java is powerful
Duplicate words: java is 

Example 3:

Enter a string to find duplicate words: Hello world hello
Duplicate words: hello 

Example 4:

Enter a string to find duplicate words: I love programming in Java
Duplicate words: No duplicates found.

Conclusion

This Java program identifies duplicate words in a string by using a HashMap to track word frequencies. It counts the occurrences of each word and then keeps only the words whose frequency is greater than 1. Because HashMap lookups and updates take constant time on average, the whole process runs in time roughly linear in the number of words. This approach is practical for various text-processing tasks, such as cleaning up data or analyzing textual content.
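When only the duplicate words are needed and not their counts, a pair of sets can replace the frequency map. The sketch below is an alternative, not the approach used by the program above; the class name DuplicateWordsWithSets and the method findDuplicates are hypothetical. It relies on Set.add returning false when the element is already present.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class DuplicateWordsWithSets {

    // Hypothetical alternative: one pass over the words, no counts stored
    static Set<String> findDuplicates(String input) {
        Set<String> seen = new HashSet<>();
        // LinkedHashSet records each duplicate once, in the order its repeat was found
        Set<String> duplicates = new LinkedHashSet<>();
        for (String word : input.toLowerCase().split("\\W+")) {
            // seen.add returns false if the word was already seen, i.e. this is a repeat
            if (!word.isEmpty() && !seen.add(word)) {
                duplicates.add(word);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicates("This is a test. This test is easy.")); // [this, test, is]
        System.out.println(findDuplicates("I love programming in Java"));         // []
    }
}
```

Note the different ordering: this version lists a word when its second occurrence is reached, so the first example yields this, test, is rather than this, is, test. It also cannot report how many times each word occurs, which the HashMap version can.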
