Java Program to Count the Number of Duplicate Words in a String

Introduction

Counting the number of duplicate words in a string is a common task in text processing. Whether you're analyzing text data, cleaning up user input, or performing other kinds of text manipulation, knowing how to identify and count duplicate words is useful. In this blog post, we walk through a Java program that finds the duplicate words in a given string and reports how often each one occurs.

Steps to Solve the Problem

  1. Normalize the String: Convert the string to lowercase to ensure the comparison is case-insensitive.
  2. Split the String: Use a regular expression to split the string into words.
  3. Use a Map: Use a HashMap to store each word and its count.
  4. Count Duplicates: Iterate through the map and print every word whose count is greater than 1, along with that count.
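Before the full program, it can help to see what steps 1 and 2 produce on their own. The sketch below (the class name SplitDemo is hypothetical) prints the tokens that toLowerCase() followed by split("\\W+") yields for the sample sentence. It also shows one pitfall: when the input starts with a non-word character, the resulting array begins with an empty string.

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        String input = "Java is great and Java is fun. Programming in Java is great.";

        // Step 1: normalize case so "Java" and "java" become the same word
        String normalized = input.toLowerCase();

        // Step 2: split on runs of non-word characters (anything other than [a-zA-Z0-9_]).
        // Trailing empty strings (from the final ".") are discarded by split().
        String[] words = normalized.split("\\W+");
        System.out.println(Arrays.toString(words));

        // Pitfall: a leading non-word character produces an empty first token
        String[] leading = "...Java rocks".toLowerCase().split("\\W+");
        System.out.println(Arrays.toString(leading));
    }
}
```

The first line printed is [java, is, great, and, java, is, fun, programming, in, java, is, great]; the second is [, java, rocks], where the empty first element is the leading empty token.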

Example Program

Here is a complete Java program that counts the number of duplicate words in a string.

Example Code:

import java.util.HashMap;
import java.util.Map;

public class DuplicateWordCounter {
    public static void main(String[] args) {
        String input = "Java is great and Java is fun. Programming in Java is great.";

        // Normalize the string by converting it to lower case
        String normalizedInput = input.toLowerCase();

        // Split the string into words using a regular expression
        String[] words = normalizedInput.split("\\W+");

        // Use a HashMap to store each word and its count
        Map<String, Integer> wordCountMap = new HashMap<>();

        // Count the occurrences of each word
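        for (String word : words) {
            // split() returns an empty first element when the input starts
            // with a non-word character, so skip empty tokens
            if (word.isEmpty()) {
                continue;
            }
            if (wordCountMap.containsKey(word)) {
                // Word seen before: increment its count
                wordCountMap.put(word, wordCountMap.get(word) + 1);
            } else {
                // First occurrence: start its count at 1
                wordCountMap.put(word, 1);
            }
        }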

        // Display the duplicate words and their counts
        System.out.println("Duplicate words in the string:");
        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            if (entry.getValue() > 1) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
        }
    }
}

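Output (HashMap does not guarantee iteration order, so these lines may appear in a different order when you run the program):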

Duplicate words in the string:
java: 3
is: 3
great: 2
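If you need a predictable order, one option is a LinkedHashMap, which iterates in insertion order, so duplicates are printed in the order they first appear. The sketch below also prints how many distinct words are duplicated, which is one reading of "count the number of duplicate words". The class name and the findDuplicates helper are hypothetical, and the sketch uses Map.merge() in place of the containsKey()/put() pair.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OrderedDuplicateWordCounter {

    // Hypothetical helper: returns each word that occurs more than once, mapped to
    // its count, in the order the word first appears in the input.
    static Map<String, Integer> findDuplicates(String input) {
        // LinkedHashMap preserves insertion order, unlike HashMap
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : input.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue; // skip the empty token produced by leading punctuation
            }
            // merge() stores 1 for a new word, otherwise adds 1 to the existing count
            counts.merge(word, 1, Integer::sum);
        }
        // Keep only words seen more than once; the remaining order is unchanged
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        String input = "Java is great and Java is fun. Programming in Java is great.";
        Map<String, Integer> duplicates = findDuplicates(input);

        System.out.println("Duplicate words in the string:");
        duplicates.forEach((word, count) -> System.out.println(word + ": " + count));
        System.out.println("Number of duplicate words: " + duplicates.size());
    }
}
```

With the sample input, this prints java: 3, is: 3 and great: 2 in that order, followed by "Number of duplicate words: 3".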

Explanation

  1. Normalize the String:

    • The input string is converted to lower case using toLowerCase() to make the comparison case-insensitive.
  2. Split the String:

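    • The string is split into words using the regular expression \\W+, which matches one or more consecutive non-word characters (any character other than an ASCII letter, digit, or underscore). Splitting on these runs discards spaces and punctuation. Keep in mind that an apostrophe is also a non-word character, so "don't" becomes "don" and "t", and that an input starting with a non-word character produces an empty string as the first element of the array.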
  3. Use a HashMap:

    • A HashMap stores each word and its count. For each word, containsKey() checks whether the word is already in the map; if it is, the loop increments the stored count with put(word, get(word) + 1). Otherwise, the word is added to the map with a count of 1.
  4. Count Duplicates:

    • The program iterates through the map entries using an enhanced for loop. It checks if the count of a word is greater than 1 and prints the word and its count if it is a duplicate.
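As an alternative to the explicit loop and containsKey() check, Java 8 streams can express steps 2 to 4 more declaratively. This sketch groups the words with Collectors.groupingBy and Collectors.counting() (which yields Long counts), then keeps only entries with a count above 1. The class and method names are hypothetical, and LinkedHashMap is used so the result follows first-appearance order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StreamDuplicateWordCounter {

    // Hypothetical helper: counts every word with groupingBy, then keeps counts > 1
    static Map<String, Long> findDuplicates(String input) {
        Map<String, Long> counts = Arrays.stream(input.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())       // drop an empty leading token, if any
                .collect(Collectors.groupingBy(
                        Function.identity(),            // group by the word itself
                        LinkedHashMap::new,             // keep first-appearance order
                        Collectors.counting()));        // number of occurrences as Long

        // Second pass: keep only duplicates, again in a LinkedHashMap to preserve order
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,                    // keys are unique, so never invoked
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        findDuplicates("Java is great and Java is fun. Programming in Java is great.")
                .forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

For the sample input it prints java: 3, is: 3 and great: 2. The stream version is more compact; the loop in the main program is arguably easier to follow when you are new to Java collections.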

Conclusion

This Java program counts duplicate words in a string by making a single pass over the words and recording each word's frequency in a HashMap, so its expected running time grows linearly with the number of words. Because the input is lowercased and split on \\W+, words that differ only in case or in surrounding punctuation (for example "Java" and "java.") are counted as the same word. The same approach can be adapted and extended for other text processing needs.

By understanding and implementing this program, you can handle text data more effectively, making your applications more robust and user-friendly. Happy coding!

Comments

  1. For duplicateWords("Super Man Bat Man Spider Man"); the output should be 3,
    but it is coming as 2. Can you check?
    word.toLowerCase() is causing the issue.

    1. You are correct. Because of toLowerCase(), the result was calculated incorrectly. Fixed it. Thanks for reporting.

  2. Where exactly in the program are we supposed to put toLowerCase()?

