<!-- Generated by galley: do not edit by hand -->

{dm} was designed to make connecting to and working with an RDBMS as
straightforward as possible. To this end, a dm object can be created
from any database that has a {[DBI](https://dbi.r-dbi.org/)} backend
available ([see list](https://github.com/r-dbi/backends)).

When a dm object is created via a DBI connection to an RDBMS it can
import all the tables in the database, the active schema or a limited
set. For some RDBMS, such as Postgres and SQL Server, primary and
foreign keys are also imported and do not have to be manually added
afterwards.

To demonstrate, we will connect to a [relational dataset
repository](https://relational.fel.cvut.cz/) with a database server that
is publicly accessible without registration. It hosts a [financial
dataset](https://relational.fel.cvut.cz/dataset/Financial) that contains
loan data along with relevant account information and transactions. We
chose this dataset because the relationships between `loan`, `account`,
and `transactions` tables are a good representation of databases that
record real-world business transactions. The repository uses a MariaDB
server for which {dm} does not currently import primary or foreign keys,
so we will need to add them.

Below we open a connection to the publicly accessible database server
using their documented connection parameters. Connection details vary
from database to database. Before connecting to your own RDBMS you may
want to read `vignette("DBI", package = "DBI")` for further information.

``` r
library(RMariaDB)
my_db <- dbConnect(
  MariaDB(),
  username = "guest",
  password = "ctu-relational",
  dbname = "Financial_ijs",
  host = "relational.fel.cvut.cz"
)
```

Creating a dm object takes a single call to `dm_from_con()` with the DBI
connection object as its argument.

``` r
library(dm)

my_dm <- dm_from_con(my_db)
my_dm
#> ── Table source ───────────────────────────────────────────────────────────
#> src:  mysql  [guest@relational.fel.cvut.cz:NA/Financial_ijs]
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `accounts`, `cards`, `clients`, `disps`, `districts`, … (9 total)
#> Columns: 57
#> Primary keys: 0
#> Foreign keys: 0
```

The components of the `my_dm` object are lazy tables powered by
{[dbplyr](https://dbplyr.tidyverse.org/)}. {dbplyr} translates the
{[dplyr](https://dplyr.tidyverse.org/)} grammar of data manipulation
into queries the database server understands. Lazy tables defer
downloading of table data until results are required for printing or
local processing.

## Building a dm from a subset of tables

A dm can also be constructed from individual tables or views. This is
useful for when you want to work with a subset of a database’s tables,
perhaps from different schemas.

Below we use the `$` notation to extract two tables from the financial
database. Then we create our dm by passing the tables in as arguments.
Note that the tables arguments have to all be from the same source, in
this case `my_db`.

``` r
dbListTables(my_db)
#> [1] "accounts"  "cards"     "clients"   "disps"     "districts" "loans"    
#> [7] "orders"    "tkeys"     "trans"

library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dm':
#> 
#>     ident, sql
loans <- tbl(my_db, "loans")
accounts <- tbl(my_db, "accounts")

my_manual_dm <- dm(loans, accounts)
my_manual_dm
#> ── Table source ───────────────────────────────────────────────────────────
#> src:  mysql  [guest@relational.fel.cvut.cz:NA/Financial_ijs]
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `loans`, `accounts`
#> Columns: 11
#> Primary keys: 0
#> Foreign keys: 0
```

## Define Primary and Foreign Keys

Primary keys and foreign keys are how relational database tables are
linked with each other. A primary key is a column or column tuple that
has a unique value for each row within a table. A foreign key is a
column or column tuple containing the primary key for a row in another
table. Foreign keys act as cross references between tables. They specify
the relationships that gives us the *relational* database. For more
information on keys and a crash course on databases, see
`vignette("howto-dm-theory")`.

The [model
diagram](https://relational.fel.cvut.cz/assets/img/datasets-generated/financial.svg)
provided by our test database loosely illustrates the intended
relationships between tables. In the diagram we can see that the `loans`
table should be linked to the `accounts` table. Below we create those
links in 3 steps:

1.  Add a primary key `id` to the `accounts` table
2.  Add a primary key `id` to the `loans` table
3.  Add a foreign key `account_id` to the `loans` table referencing the
    `accounts` table

Then we assign colors to the tables and draw the structure of the dm.

Note that when the foreign key is created the primary key in the
referenced table does not need to be specified, but the primary key must
already be defined. And, as mentioned above, primary and foreign key
constraints on the database are currently only imported for Postgres and
SQL Server databases, and only when `dm_from_con()` is used. This
process of key definition needs to be done manually for other databases.

``` r
my_dm_keys <-
  my_manual_dm %>%
  dm_add_pk(accounts, id) %>%
  dm_add_pk(loans, id) %>%
  dm_add_fk(loans, account_id, accounts) %>%
  dm_set_colors(green = loans, orange = accounts)

my_dm_keys %>%
  dm_draw()
```

![](/home/kirill/git/R/dm/vignettes/out/howto-dm-db_files/figure-gfm/unnamed-chunk-4-1.png)<!-- -->

Once you have instantiated a dm object you can continue to add tables to
it. For tables from the original source for the dm, use `dm()`

``` r
trans <- tbl(my_db, "trans")

my_dm_keys %>%
  dm(trans)
#> ── Table source ───────────────────────────────────────────────────────────
#> src:  mysql  [guest@relational.fel.cvut.cz:NA/Financial_ijs]
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `loans`, `accounts`, `trans`
#> Columns: 21
#> Primary keys: 2
#> Foreign keys: 1
```

For tables from other sources or from the local environment
`dplyr::copy_to()` is used. `copy_to()` is discussed later in this
article.

## Transient nature of operations

Like other R objects, a dm is immutable and all operations performed on
it are transient unless stored in a new variable.

``` r
my_dm_keys
#> ── Table source ───────────────────────────────────────────────────────────
#> src:  mysql  [guest@relational.fel.cvut.cz:NA/Financial_ijs]
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `loans`, `accounts`
#> Columns: 11
#> Primary keys: 2
#> Foreign keys: 1

my_dm_trans <-
  my_dm_keys %>%
  dm(trans)

my_dm_trans
#> ── Table source ───────────────────────────────────────────────────────────
#> src:  mysql  [guest@relational.fel.cvut.cz:NA/Financial_ijs]
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `loans`, `accounts`, `trans`
#> Columns: 21
#> Primary keys: 2
#> Foreign keys: 1
```

And, like {dbplyr}, results are never written to a database unless
explicitly requested.

``` r
my_dm_keys %>%
  dm_flatten_to_tbl(loans)
#> Renamed columns:
#> * date -> loans.date, accounts.date
#> # Source:   lazy query [?? x 10]
#> # Database: mysql [guest@relational.fel.cvut.cz:NA/Financial_ijs]
#>       id account_id loans.date amount duration payments status district_id
#>    <int>      <int> <date>      <dbl>    <int>    <dbl> <chr>        <int>
#>  1  4959          2 1994-01-05  80952       24     3373 A                1
#>  2  4961         19 1996-04-29  30276       12     2523 B               21
#>  3  4962         25 1997-12-08  30276       12     2523 A               68
#>  4  4967         37 1998-10-14 318480       60     5308 D               20
#>  5  4968         38 1998-04-19 110736       48     2307 C               19
#>  6  4973         67 1996-05-02 165960       24     6915 A               16
#>  7  4986         97 1997-08-10 102876       12     8573 A               74
#>  8  4988        103 1997-12-06 265320       36     7370 D               44
#>  9  4989        105 1998-12-05 352704       48     7348 C               21
#> 10  4990        110 1997-09-08 162576       36     4516 C               36
#> # … with more rows, and 2 more variables: frequency <chr>,
#> #   accounts.date <date>

my_dm_keys %>%
  dm_flatten_to_tbl(loans) %>%
  sql_render()
#> Renamed columns:
#> * date -> loans.date, accounts.date
#> <SQL> SELECT `LHS`.`id` AS `id`, `account_id`, `loans.date`, `amount`, `duration`, `payments`, `status`, `district_id`, `frequency`, `accounts.date`
#> FROM (SELECT `id`, `account_id`, `date` AS `loans.date`, `amount`, `duration`, `payments`, `status`
#> FROM `loans`) `LHS`
#> LEFT JOIN (SELECT `id`, `district_id`, `frequency`, `date` AS `accounts.date`
#> FROM `accounts`) `RHS`
#> ON (`LHS`.`account_id` = `RHS`.`id`)
```

## Performing operations on tables by “zooming”

As the dm is a collection of tables, if we wish to perform operations on
an individual table we set it as the context for those operations using
`dm_zoom_to()`. See `vignette("tech-dm-zoom")` for more detail on
zooming.

dm operations are transient unless persistence is explicitly requested.
To make our chain of manipulations on the selected table permanent we
assign the result of `dm_insert_zoomed()` to a new object,
`my_dm_total`. This is a new dm object, derived from `my_dm_keys`, with
a new lazy table `total_loans` linked to the `accounts` table.

``` r
my_dm_total <-
  my_dm_keys %>%
  dm_zoom_to(loans) %>%
  group_by(account_id) %>%
  summarize(total_amount = sum(amount, na.rm = TRUE)) %>%
  ungroup() %>%
  dm_insert_zoomed("total_loans")
```

Context is set to the table “loans” using `dm_zoom_to(loans)`. You can
learn more about zooming in the tutorial `vignette("tech-dm-zoom")`. We
then use {[dplyr](https://dplyr.tidyverse.org/)} functions on the zoomed
table to generate a new summary table.

`summarize()` returns a temporary table with one row for each group
created by the preceding `group_by()` function. The columns in the
temporary table are constrained to the columns passed as arguments to
the `group_by()` function and the column(s) created by the `summarize()`
function.

`dm_insert_zoomed("total_loans")` adds the temporary table created by
`summarize()` to the data model under a new name, `total_loans`. Because
the grouping variable `account_id` is a primary key, the new derived
table is automatically linked to the `accounts` table.

``` r
my_dm_total %>%
  dm_set_colors(violet = total_loans) %>%
  dm_draw()
```

![](/home/kirill/git/R/dm/vignettes/out/howto-dm-db_files/figure-gfm/unnamed-chunk-9-1.png)<!-- -->

The resulting table `total_loans` can be accessed like any other table
in the dm object.

``` r
my_dm_total$total_loans
#> # Source:   lazy query [?? x 2]
#> # Database: mysql [guest@relational.fel.cvut.cz:NA/Financial_ijs]
#>    account_id total_amount
#>         <int>        <dbl>
#>  1          2        80952
#>  2         19        30276
#>  3         25        30276
#>  4         37       318480
#>  5         38       110736
#>  6         67       165960
#>  7         97       102876
#>  8        103       265320
#>  9        105       352704
#> 10        110       162576
#> # … with more rows
```

It is a *lazy table* powered by the
{[dbplyr](https://dbplyr.tidyverse.org/)} package: the results are not
materialized, instead an SQL query is built and executed each time the
data is requested.

``` r
my_dm_total$total_loans %>%
  sql_render()
#> <SQL> SELECT `account_id`, SUM(`amount`) AS `total_amount`
#> FROM `loans`
#> GROUP BY `account_id`
```

Use `compute()` on a zoomed table to materialize it to a temporary table
and avoid recomputing. See `vignette("howto-dm-copy")` for more details.

## Downloading data

When it becomes necessary to move data locally for analysis or
reporting, the {dm} method `collect()` is used. Operations on dm objects
for databases are limited to report only the first ten results.
`collect()` forces the evaluation of all SQL queries and the generation
of the complete set of results. The resulting tables are transferred
from the RDBMS and stored as local tibbles.

``` r
my_dm_local <-
  my_dm_total %>%
  collect()

my_dm_local$total_loans
#> # A tibble: 682 x 2
#>    account_id total_amount
#>         <int>        <dbl>
#>  1          2        80952
#>  2         19        30276
#>  3         25        30276
#>  4         37       318480
#>  5         38       110736
#>  6         67       165960
#>  7         97       102876
#>  8        103       265320
#>  9        105       352704
#> 10        110       162576
#> # … with 672 more rows
```

Use this method with caution. If you are not sure of the size of the
dataset you will be downloading you can call `dm_nrow()` on your dm for
the row count of your data model’s tables.

``` r
my_dm_total %>%
  dm_nrow()
#>       loans    accounts total_loans 
#>         682        4500         682
```

## Persisting results

It is just as simple to move a local relational model into an RDBMS as
is using `collect()` to download it. The method used is `copy_dm_to()`
and it takes as arguments a database connection and a dm object. In the
example below a local SQLite database is used to demonstrate it, but
{dm} is designed to work with any RDBMS supported by {DBI}.

``` r
destination_db <- DBI::dbConnect(RSQLite::SQLite())

deployed_dm <- copy_dm_to(destination_db, my_dm_local)

deployed_dm
#> ── Table source ───────────────────────────────────────────────────────────
#> src:  sqlite 3.35.5 []
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `loans`, `accounts`, `total_loans`
#> Columns: 13
#> Primary keys: 2
#> Foreign keys: 2
my_dm_local
#> ── Metadata ───────────────────────────────────────────────────────────────
#> Tables: `loans`, `accounts`, `total_loans`
#> Columns: 13
#> Primary keys: 2
#> Foreign keys: 2
```

In the output you can observe that the `src` for `deployed_dm` is the
SQLite database, while for `my_dm_local` the source is the local R
environment.

Persisting tables are covered in more detail in
`vignette("howto-dm-copy")`.

## Conclusion

In this tutorial we have demonstrated how simple it is to load a
database into a dm object and begin working with it. While loading a dm
from most RDBMS currently requires you to manually set key relations, dm
provides methods to make this straightforward. It is planned that future
versions of dm will support automatic key creation for more RDBMS.

The next step is to read `vignette("howto-dm-copy")`, where copying your
tables to and from an RDBMS is covered. `vignette("howto-dm-rows")`
discusses manipulation of individual rows in a database.

## Further reading

`vignette("howto-dm-df")` -- Is your data in local data frames? This
article covers creating a data model from your local data frames,
including building the relationships in your data model, verifying your
model, and leveraging the power of dplyr to operate on your data model.

`vignette("howto-dm-theory")` -- Do you know all about data frames but
very little about relational data models? This quick introduction will
walk you through the key similarities and differences, and show you how
to move from individual data frames to a relational data model.
