Apache Calcite is a query optimizer that does not have a storage system or any other aspect related to RDBMS. That way, it can be attached to engines like Cassandra and MySQL.

Today, we are going to clone it, introduce it with a CSV example, then we will try to build a basic tree.

First we used calcite’s examples, after that. we tried to create our own

Table of Contents

The Setup

First, we clone it like(Git installation https://github.com/git-guides/install-git):

cd d:\\your_directory
#now we are in so we clone
git clone <https://github.com/apache/calcite.git>
# after that, in that directory we can build using the gradle wrapper, this is going
# to take a while 
./gradlew build

After that, we will have our Calcite installed. Now, we need Java 8 - 21 to run Calcite and IntelliJ IDEA for the demo.

Now, reset git and enter this directory:

cd d:\\yourdirectory\\calcite\\example\\csv
# some of the versions might not have it so make sure you have that directory while trying
# to run this demo 
# after that we need to run sqlline, which is a shell a command line that is used 
# to run sql in JDBC connections here it is installed already, but if you wan the githyb
# <https://github.com/julianhyde/sqlline>
./sqlline

Now, here we can run SQL queries on CSV files using calcite as a query optimizer,

Calicte has these things called adapters, which are basically a connection to the source(database, here it is a CSV file, but I could be anything, like MySQL), so we could run queries on it.

So, these are the connections to the source, meaning we have to connect, and to do that, we need:

!connect jdbc:calcite:model=src/test/resources/model.json admin admin

The connect is a connect command line

JDBC is used to specify the kind of adapter to use, here it is calcite, and then we specify the path to the file, here it is a JSON file created by Apache, which is the model.json

Now we can run queries, like:

0: jdbc:calcite:model=src/test/resources/mode> select* from emps;
+-------+-------+--------+--------+---------------+-------+------+---------+---+
| EMPNO | NAME  | DEPTNO | GENDER |     CITY      | EMPID | AGE  | SLACKER | M |
+-------+-------+--------+--------+---------------+-------+------+---------+---+
| 100   | Fred  | 10     |        |               | 30    | 25   | true    | f |
| 110   | Eric  | 20     | M      | San Francisco | 3     | 80   |         | f |
| 110   | John  | 40     | M      | Vancouver     | 2     | null | false   | t |
| 120   | Wilma | 20     | F      |               | 1     | 5    |         | t |
| 130   | Alice | 40     | F      | Vancouver     | 2     | null | false   | t |
+-------+-------+--------+--------+---------------+-------+------+---------+---+
5 rows selected (0.978 seconds)

So, we ran a regular select* on the EMPS table, and we got these, so we have tables, rows, etc.

There are a lot of SQL commands we can run, like:

0: jdbc:calcite:model=src/test/resources/mode> explain plan for select* from emps;
+---------------------------------------------+
|                    PLAN                     |
+---------------------------------------------+
| EnumerableTableScan(table=[[SALES, EMPS]])
 |
+---------------------------------------------+
1 row selected (0.024 seconds)

So, here we asked for the estimated execution plan, and we got

For more tutorials like this, check out the docs.

And, for more on the basic architecture of Calcite.

And, for more on query optimizers, check out our blog here

RelBuilder: building a logical tree

Now, in the previous example, we had an engine built for us, what if we wanted to build a logical tree

One of the key blocks of that is logical operators, now RelBuilder is a simplified way of showing how it does that,

Now in calicte\core\src\test\java\org.apache.calcite\examples\RelBuilderExample

We can find the docs demo(git)

There, they did like the following:

package org.apache.calcite.examples;

//First it starts with the imports
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.JoinRelType;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.test.RelBuilderTest;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.RelBuilder;

We have:

RelOptUtil: gives us the tools to troubleshoot and see the plans
RelNode: the class that contains the nodes of the logical operators
JoinRelType: we need it to specify the join type like inner or outer etc.
SqlStdOperatorTable: contains the standard operators, like equality operators, for example, “=”
RelBuilderTest: contains the in-memory table that we need, here it is the SCOTT database
FrameworkConfig: it is what mixes these together, the glue
RelBuilder: a way to connect the nodes created by RelNode

After that, it declares the class for the example:

public class RelBuilderExample {
  private final boolean verbose;

So, if it is true, it would output it, which is what we want here, since we want to see the relational expressions

public RelBuilderExample(boolean verbose) {
    this.verbose = verbose;
  }

In this constructor, it defines that the variable verbose should equal this verbose; nothing new here

public static void main(String[] args) {
    new RelBuilderExample(true).runAllExamples();
  }

Now, here they create a method that could use command-line arguments,

This method calls the RelBuilderExample we created in the previous line, and sets verbose = true, which means it will run all the examples.

After that, it calls another method, runAllExamples. which would pass the examples

 public void runAllExamples() {
   
    final FrameworkConfig config = RelBuilderTest.config().build();
    final RelBuilder builder = RelBuilder.create(config);
    for (int i = 0; i <= 4; i++) {
      doExample(builder, i);
      final RelNode node = builder.build();
      if (verbose) {
        System.out.println(RelOptUtil.toString(node));
      }
    }
  }

Now, here it declares the method we saw before, runAllExamples()

Now here it is, calling FrameworkConfig where it configures the RelBuilder stuff using the RelBuilderTest and builds it.

Then it calls the RelBuilder, where builder = calling it using create, where the schema is coming from the config

Now, here it loops for each example0, exmaple1 which we can see as a parameters(builder,i),

After that, we see that it calls RelNode to make sure that it would create a node for each logical operator

if we have the verbose true, now here it prints out the utility we imported, specifically the method to string, so we could see the logical operators that we have built:

  private RelBuilder doExample(RelBuilder builder, int i) {
    switch (i) {
    case 0:
      return example0(builder);
    case 1:
      return example1(builder);
    case 2:
      return example2(builder);
    case 3:
      return example3(builder);
    case 4:
      return example4(builder);
    default:
      throw new AssertionError("unknown example" + i);
    }
  }

Now here it calls doExample with the parameters builder and i, based on each i value

If there is no example, the verbose gets false, and no printing

If there is an example, but the example is not defined here, then it outputs the unknown example

Caclite’s first example:

/* Creates a relational expression
   *  equivalent to
   * <blockquote><pre>VALUES(1,TRUE),
   * (NULL,FALSE)</pre></blockquote>
   * this is our fix, we submitted that as a pull request
*/
 private RelBuilder example0(RelBuilder builder) {
    return builder
        .values(new String[] {"a", "b"}, 1, true, null, false);
  }

Here it calls RelBuilders, it defines the Example, inputs RelBuilder builder we defined as create previously as a parameter, it returns the builder, so whatever we create we would see as output, which would be printed if it fits the conditions we provided previously

Now here we are defining example0 as a call to the method .values then we are defining the columns to the call as a and b, which has 1 and null for a, and true and false for b

this equals

values(1,true),(null,false)

If you run the program, you can see the output as:

LogicalValues(tuples=[[{ 1, true }, { null, false }]])

So, you can see the logical operator for this,

Calcite’s example1: creating a table scan

 private RelBuilder example1(RelBuilder builder) {
    return builder
        .scan("EMP");
  }

So, it calls the RelBuilder, creates example1, inputs RelBuilder builder, it outputs the builder with a call to the scan with the table EMP as a parameter,

We could see in the output the following:

LogicalTableScan(table=[[scott, EMP]])

This equals:

select * from EMP

Calcite’s example2: the projection

The projection filters the table scan, the example:

  private RelBuilder example2(RelBuilder builder) {
    return builder
        .scan("EMP")
        .project(builder.field("DEPTNO"), builder.field("ENAME"));
  }

It does the same as before, but here it selects DEPTNO and ENAME, meaning it filters the results by these, meaning:

select DEPTNO,ENAME FROM EMP

Calcite’s example3: group by, simple aggregates, and having

  private RelBuilder example3(RelBuilder builder) {
    return builder
        .scan("EMP")
        .aggregate(builder.groupKey("DEPTNO"),
            builder.count(false, "C"),
            builder.sum(false, "S", builder.field("SAL")))
        .filter(
            builder.call(SqlStdOperatorTable.GREATER_THAN, builder.field("C"),
                builder.literal(10)));
  }

SO:

It builds a scan
Two aggregates, one count and one sum, each of them had an alias, for the first is C and the Second is S
The groupKey = group by DEPTNO
The builder.field SAL, salaries meaning the count and the sum, where salary is grouped by deptno
Now the Filter is the having clause, it gets from SqlStdOperatorTable > and builds based on the count or C, so here it filters by C
where the count> 10 or build.literal(10)

So it equals:

select
    deptno
    count(*) c,
    sum(sal) s,
from 
EMP
group by deptno
having c>10

Calcite’s example4: Joins

  private RelBuilder example4(RelBuilder builder) {
    final RelNode left = builder
        .scan("EMP")
        .scan("DEPT")
        .join(JoinRelType.INNER, "DEPTNO")
        .build();

    final RelNode right = builder
        .scan("EMP")
        .scan("BONUS")
        .join(JoinRelType.INNER, "ENAME")
        .build();

    return builder
        .push(left)
        .push(right)
        .join(JoinRelType.INNER, "ENAME");
  }
}

So:

On left, it called a node for a join
Scans both tables
Then an inner join between them based on the column DEPTNO
The same goes for the right
Then it joins both left and right in the new builder variable inside the builder
joins both on ENAME

equals this:

select * 
from (emp join dept on deptno) as left 
join (emp join bonus on ename) as right 
on left.ename = right.ename;

And the output was like:

LogicalJoin(condition=[=($1, $12)], joinType=[inner])
  LogicalJoin(condition=[=($7, $8)], joinType=[inner])
    LogicalTableScan(table=[[scott, EMP]])
    LogicalTableScan(table=[[scott, DEPT]])
  LogicalJoin(condition=[=($1, $8)], joinType=[inner])
    LogicalTableScan(table=[[scott, EMP]])
    LogicalTableScan(table=[[scott, BONUS]])

or something like that, and right and left here are internal, they are replaced by $etc. here, that we can see, but if we want to see the convention that we want, we need a new import, which complicates a complicated example

So here, for each, it substitutes with an ordinal reference indexed based on the node number for each, which is $1, and the other stuff

What if we want to create our own examples:

Re-Creating RelBuilderExample: RelBuilderExample1

Copy and paste this file(https://github.com/sulees/calcite-relbuilder-examples/blob/main/RelBuilderExample1.java) in a new class you create in the same directory as RelBuilderExample, name the class RelBuilderExample1.

Now, here we did the same:

public class RelBuilderExample1 {
  private final boolean verbose;

So, we created a class that has a verbose instance variable, meaning it would output if it is true, we close it at the end

  public RelBuilderExample1(boolean verbose) {
    this.verbose = verbose;
  }

Now, here we create a method that accepts the parameter where it is this, one not the instance variable

public static void main(String[] args) {
new RelBuilderExample1(true).runAllExamples();
}

Now, here we say that it if it is true, then call the runAllExamples method

public void runAllExamples() {
    final FrameworkConfig config = RelBuilderTest.config().build();
    final RelBuilder builder = RelBuilder.create(config);

    for (int i = 0; i <= 4; i++) {
      RelNode node = doExample(builder, i).build();
      if (verbose) {
        System.out.println(RelOptUtil.toString(node));
      }
    }
  }

Now, here we are gluing the content together, We are calling the config from the FrameworkConfig to build RelBuilderTest and configure, so we could have our in-memory tables and schema to build the logical operators on

Then, we are configuring the builder that will be our parameter for the examples. This would be our logical operator. We create based on config

After that, we are defining a loop that adds one for each value, and calls the doExample method, by passing the builder and i as parameters, where the builder references the example

And, if it is verbose = true, then it outputs the logical operators for the node

 private RelBuilder doExample(RelBuilder builder, int i) {
    switch (i) {
    case 0: return example0(builder);
    case 1: return example1(builder);
    case 2: return example2(builder);
    case 3: return example3(builder);
    case 4: return example4(builder);
    default: throw new AssertionError("Unknown example: " + i);
    }
  }

Now, here we pass the switch, which would define the example0 or whatever that we want to build using the above-created setup

So the doExample would be called for each of these examples, otherwise it would either stop due to verbose = false, or it would throw Unknown example.

Select* from bonus

// now here we have our first example simple select * from BONUS
  private RelBuilder example0(RelBuilder builder) {
    return builder.scan("BONUS");
  }

So, a call, for example0, that passes the builder we configured above as a parameter, which would return the logical operator table scan of bonus, if you run it you would get:

LogicalTableScan(table=[[scott, BONUS]])

So far, so good

Where clause

//here we are scanning and filtering here, as you can see SELECT * FROM EMP WHERE SAL > 2000;
//as you can see here, the filter is not based on the aggregate's column but rather the SAL column itself
  private RelBuilder example1(RelBuilder builder) {
    return builder
        .scan("EMP")
        .filter(
            builder.call(
                SqlStdOperatorTable.GREATER_THAN,
                builder.field("SAL"),
                builder.literal(2000)
            )
        );
  }
  //here we just changed the operators
  private RelBuilder example2(RelBuilder builder) {
    return builder
        .scan("EMP")
        .filter(
            builder.call(
                SqlStdOperatorTable.LESS_THAN,
                builder.field("SAL"),
                builder.literal(2000)
            )
        );
  }

As you can see, call the builder for a scan and a filter, where in there we call the call method to specify the inequality operators, the field that we want to reference, and the literal we want to compare to

We did it with < and> so we got these

LogicalFilter(condition=[>($5, 2000)])
  LogicalTableScan(table=[[scott, EMP]])

LogicalFilter(condition=[<($5, 2000)])
  LogicalTableScan(table=[[scott, EMP]])

$5 is a reference to the column SAL as indexed from zero by the engine, we said before we can change the naming convention, but it would complicate the example

A join

  //now here we are joining two scans, and we projecting two fields which would be like:
  // SELECT ENAME, DNAME FROM EMP INNER JOIN DEPT ON DEPT.DEPTNO = EMP.DEPTNO
  private RelBuilder example3(RelBuilder builder) {
    return builder
        .scan("EMP")
        .scan("DEPT")
        .join(JoinRelType.INNER, "DEPTNO") 
        .project(
            builder.field("ENAME"),
            builder.field("DNAME")
        );
  }

So we call the RelBuilder builder, it calls the scan twice and joins them on DEPTNO while only projecting two columns

We get the following:

LogicalProject(ENAME=[$1], DNAME=[$9])
  LogicalJoin(condition=[=($7, $8)], joinType=[inner])
    LogicalTableScan(table=[[scott, EMP]])
    LogicalTableScan(table=[[scott, DEPT]])

group by and a having based on an aggregate

SELECT DEPTNO, COUNT(*) AS EMP_COUNT 
FROM EMP 
GROUP BY DEPTNO 
HAVING COUNT(*) > 3;

private RelBuilder example4(RelBuilder builder) {
    return builder
        .scan("EMP")
        .aggregate(
            builder.groupKey("DEPTNO"),
            builder.count(false, "EMP_COUNT")
        )
        .filter(
            builder.call(
                SqlStdOperatorTable.GREATER_THAN,
                builder.field("EMP_COUNT"),
                builder.literal(3)
            )
        );
  }

So, we call the builder, that scans EMP

Then, it counts with an alias called EMP_COUNT, while grouping by the group key which is DEPTNO

Which is not selected here, but it would not change much

After that, it filters based on the alias, meaning it is a having clause, so, the where and the having have the same method as logical operators Filter, then we start specifying the details under the call method

The literal would be 3, the field is the alias, and the inequality operator is >.

And with that we finish this blog.