Java Program to Count the Number of Duplicate Words in a String

In this blog post, we will write a Java program that counts the number of duplicate words in a given string.

Duplicate words are words that appear more than once in the string. The program splits the string into words, counts how often each word occurs using a HashMap, and prints every word whose count is greater than 1. Let's dive into the code and see how it works!

Java Program to Count Number of Duplicate Words in a Given String

The comments in the program below explain each step.
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Java program to count number of duplicate words in given string
 * @author javaguides.net
 *
 */
public class DuplicateWordsInString {
    private static void duplicateWords(String inputString) {
        // Converting inputString to lower case and splitting it into words
        final String[] words = inputString.toLowerCase().split("\\W+");

        // Creating one HashMap with words as key and their count as value
        final Map<String, Integer> wordCount = new HashMap<>();

        // Checking each word
        for (String word : words) {
            // whether it is present in wordCount
            if (wordCount.containsKey(word)) {
                // If it is present, incrementing its count by 1
                wordCount.put(word, wordCount.get(word) + 1);
            } else {
                // If it is not present, put that word into wordCount with 1 as
                // its value
                wordCount.put(word, 1);
            }
        }

        // Extracting all keys of wordCount
        final Set<String> wordsInString = wordCount.keySet();

        // Iterating through all words in wordCount
        for (String word : wordsInString) {
            // if word count is greater than 1
            if (wordCount.get(word) > 1) {
                // Printing that word and its count
                System.out.println(word + " : " + wordCount.get(word));
            }
        }
    }

    public static void main(String[] args) {
        duplicateWords("java guides java");

        duplicateWords("Java is java again java");

        duplicateWords("Super Man Bat Man Spider Man");
    }
}

Output:

java : 2
java : 3
man : 3

Explanation:

1. The string is converted to lower case with toLowerCase() so that "Java" and "java" count as the same word. It is then split into words with the split() method, using the regular expression \W+ (written as "\\W+" in a Java string literal), which splits on runs of non-word characters such as punctuation and spaces.

2. A HashMap stores each word as a key and its count as the value.

3. The containsKey() method of HashMap checks whether the word is already present. If it is, its value is incremented by 1. If it is not, the word is added as a key with a value of 1.

4. Finally, we iterate over the HashMap key set and check each key's value. If the value is greater than 1, the word is a duplicate and is printed with its count.
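
The four steps above can also be sketched more compactly. The version below is only an illustration, not the program from this post: the class name DuplicateWordCounter and the method findDuplicates are hypothetical, and it makes two assumptions of its own. It returns the duplicates as a map instead of printing them, and it skips the empty first token that split() produces when the input starts with a non-word character. Map.merge() replaces the containsKey()/put() pair from step 3.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, used for illustration only
class DuplicateWordCounter {

    // Returns every word that occurs more than once, mapped to its count.
    // A LinkedHashMap keeps the words in the order they first appear.
    static Map<String, Integer> findDuplicates(String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : input.toLowerCase().split("\\W+")) {
            // split() yields an empty first token when the input starts
            // with a non-word character, so empty tokens are skipped
            if (!word.isEmpty()) {
                // merge() stores 1 for a new word, otherwise adds 1 to its count
                counts.merge(word, 1, Integer::sum);
            }
        }
        // Keep only the words that were seen more than once
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicates("Super Man Bat Man Spider Man")); // {man=3}
        System.out.println(findDuplicates("...java, Java; guides"));        // {java=2}
    }
}
```

Returning a map rather than printing makes the result easier to reuse, for example to get the number of distinct duplicate words with findDuplicates(text).size().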
Conclusion

In this blog post, we explored a Java program that counts the number of duplicate words in a given string. By utilizing data structures like HashMap, we can efficiently store word frequencies and determine the duplicates. This program serves as a useful tool for analyzing text and identifying repetitive words. 
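
For readers who prefer the Stream API, the same word-frequency idea might be expressed as follows. This is a sketch under assumptions, not part of the original program: the class name DuplicateWordsWithStreams and the method findDuplicates are hypothetical, counts are Long because Collectors.counting() produces Long values, and a TreeMap is chosen only so that the result is sorted alphabetically.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, used for illustration only
class DuplicateWordsWithStreams {

    // Returns every word that occurs more than once, mapped to its count,
    // sorted alphabetically by word.
    static Map<String, Long> findDuplicates(String input) {
        Map<String, Long> counts = Arrays.stream(input.toLowerCase().split("\\W+"))
                // Drop the empty token produced by leading non-word characters
                .filter(word -> !word.isEmpty())
                // Group identical words and count the members of each group
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
        // Keep only the words that were seen more than once
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicates("to be or not to be")); // {be=2, to=2}
    }
}
```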

Feel free to change the strings passed to duplicateWords() in the main() method to test the program with different texts. Happy coding!

Comments

  1. for duplicateWords("Super Man Bat Man Spider Man");
    the output should be 3
    but it is coming as 2
    can you check.
    word.toLowerCase() is causing issue

    Replies
    1. You are correct. Due to toLowerCase(), the result was calculated incorrectly. Fixed it. Thanks for reporting.

  2. Where exactly in the program are we supposed to put the toLowerCase()?
