R Program to Merge Two Dataframes

1. Introduction

In R, a data frame is a two-dimensional table whose columns can hold different types, similar to a table in a database, a spreadsheet, or a dataset in other programming languages. A common operation when working with several data frames is merging them on one or more shared key columns, which corresponds to a JOIN in SQL. Base R's merge() function handles this directly.
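As a minimal sketch of merge()'s defaults (the data frames `x` and `y` and their columns `key`, `left_val`, and `right_val` are hypothetical, made up for illustration): when `by` is omitted, merge() joins on every column name the two data frames share, and with the default `all = FALSE` it keeps only the rows whose keys appear in both, like an SQL INNER JOIN.

```r
# Two small, hypothetical data frames that share the column "key"
x <- data.frame(key = c(1, 2, 3), left_val = c("a", "b", "c"))
y <- data.frame(key = c(2, 3, 4), right_val = c(TRUE, FALSE, TRUE))

# No `by` argument: merge() defaults to intersect(names(x), names(y)),
# which here is just "key". The default all = FALSE gives an inner join,
# so only keys 2 and 3 (present in both) survive.
inner <- merge(x, y)
print(inner)
```

The key column comes first in the result, followed by the remaining columns of `x` and then those of `y`.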

2. Program Overview

1. We will create two sample data frames with a common column.

2. We will then merge these data frames using the merge() function based on the common column.

3. Code Program

# merge() is part of base R, so no additional packages are needed

# Creating two sample data frames
df1 <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c('Alice', 'Bob', 'Charlie', 'David', 'Eva')
)

df2 <- data.frame(
  ID = c(4, 5, 6, 7, 8),
  Score = c(85, 90, 95, 80, 88)
)

# Merging data frames by ID
merged_df <- merge(df1, df2, by = "ID", all = TRUE)

# Displaying the merged data frame
print(merged_df)

Output:

  ID    Name Score
1  1   Alice    NA
2  2     Bob    NA
3  3 Charlie    NA
4  4   David    85
5  5     Eva    90
6  6    <NA>    95
7  7    <NA>    80
8  8    <NA>    88

4. Step By Step Explanation

- We first create two data frames, df1 and df2, each with a common column "ID".

- df1 contains names associated with IDs, while df2 contains scores associated with IDs.

- The merge() function is then used to join these data frames by the "ID" column.

- The argument all = TRUE keeps every row from both data frames, even when an ID has no match in the other one; the columns that cannot be filled are set to NA. This is equivalent to a FULL OUTER JOIN in SQL.

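- In the result, IDs 4 and 5 appear in both data frames, so they have both a Name and a Score. IDs 1 to 3 exist only in df1, so their Score is NA, and IDs 6 to 8 exist only in df2, so their Name is NA. The rows are sorted by ID because merge() sorts on the key columns by default.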
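The all argument is shorthand for setting all.x and all.y together, so the other common SQL joins can be expressed by setting those two separately. The sketch below reuses df1 and df2 from the program above (redefined so the snippet runs on its own); the result names inner_df, left_df, right_df, and full_df are illustrative only.

```r
# Same sample data as in the program above
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  Name = c("Alice", "Bob", "Charlie", "David", "Eva"))
df2 <- data.frame(ID = c(4, 5, 6, 7, 8),
                  Score = c(85, 90, 95, 80, 88))

# INNER JOIN: only IDs found in both data frames (the default, all = FALSE)
inner_df <- merge(df1, df2, by = "ID")

# LEFT OUTER JOIN: every row of df1; Score is NA where df2 has no match
left_df <- merge(df1, df2, by = "ID", all.x = TRUE)

# RIGHT OUTER JOIN: every row of df2; Name is NA where df1 has no match
right_df <- merge(df1, df2, by = "ID", all.y = TRUE)

# FULL OUTER JOIN: the same as all = TRUE in the program above
full_df <- merge(df1, df2, by = "ID", all.x = TRUE, all.y = TRUE)

print(inner_df)   # IDs 4 and 5
print(left_df)    # IDs 1 to 5
print(right_df)   # IDs 4 to 8
print(full_df)    # IDs 1 to 8
```

Choosing between these comes down to which data frame's rows must all be kept: all.x = TRUE protects the first argument, all.y = TRUE the second.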
