Presto is a distributed query engine that allows querying different data sources such as Kafka, MySQL, MongoDB, Oracle, Cassandra, Hive, etc. using SQL. It has the ability to analyze big data and query multiple data sources together.
In this article, we will discuss how Presto can be used to query Kafka topics. Below is the step-by-step process to set up Presto and Kafka, and connect them together. Here, I have considered MacOS, but similar setups can be done on any other system.
- A MacOS machine with Homebrew and Java installed.
$ brew install presto
- Presto will get installed at /usr/local/Cellar/prestodb by default.
- The following 4 configuration files will be created by default at /usr/local/Cellar/prestodb/<presto version>/libexec/etc:
Note: The settings in the above files can be updated based on the requirements. Though, for the basic working, the default settings will suffice. So we can leave the settings unchanged and proceed. Here, the Presto version refers to the latest version of Presto that has been downloaded. A directory is created by that name (e.g., 0.270). So the actual path, in this case, would be /usr/local/Cellar/prestodb/0.270/libexec/etc
- The data source connectors need to be created at /usr/local/Cellar/prestodb/<presto version>/libexec/etc/catalog. By default, one connector (JMX) is created. Other connectors can be setup as per the requirements. We’ll cover this in later sections of this article.
- Start the Presto server.
$ presto-server start
Note: The above command will start the Presto server in the background. Alternatively, use the following command to start the Presto server in the foreground. Sometimes it is useful to run the server in the foreground to check for any issues that are causing the server start to fail. Any start-up issues will be shown in the console log and can be debugged.
$ presto-server run
- Use the following command to stop the Presto server any time you need to
$ presto-server stop.Note: Alternatively, the Presto server can be stopped using
ctrl+cwhile running in the foreground.
- Access Presto Web UI at http://localhost:8080 while the Presto server is running.
- Presto CLI is downloaded automatically as part of the Presto installation at /usr/local/Cellar/prestodb/<presto version>/libexec. It is in the form of an executable jar (e.g.: presto-cli-0.270-executable.jar).
- Rename the CLI jar to simply ‘presto’ for convenience.
$ mv presto-cli-0.270-executable.jar presto
- Provide execute permissions to CLI.
$ chmod +x presto
- Launch Presto CLI. Presto prompt will appear if CLI is launched successfully.
$ ./presto --server localhost:8080 --catalog <catalog name> presto>
Note: Here, catalog name refers to data source connector (e.g., Kafka, MySQL, jmx, etc.).
If you are able to launch the Presto server, Presto UI, and Presto CLI seamlessly, that means the Presto setup was successful.
$ brew install kafka
- Start Zookeeper service (in terminal 1).
$ zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties
- Start Kafka broker service (in terminal 2).
$ kafka-server-start /usr/local/etc/kafka/server.properties
- Create a topic (in terminal 3).
$ kafka-topics --create --topic test_events --bootstrap-server localhost:9092
- Start the Kafka producer (in terminal 3) and write events to the above topic.
$ kafka-console-producer --topic test_events --bootstrap-server localhost:9092
- Start Kafka consumer (in terminal 4) and read events from the above topic.
$ kafka-console-consumer --topic test_events --from-beginning --bootstrap-server localhost:9092
If you are able to produce and consume the events seamlessly, that means the Kafka setup is successful and we can move on.
Connecting Presto to Kafka and Querying Kafka Topics
- Set up Kafka connector at /usr/local/Cellar/prestodb/<presto version>/libexec/etc/catalog by creating the file kafka.properties and adding the following configurations to it:
connector.name=kafka kafka.nodes=localhost:9092 kafka.table-names=test_schema.test_events kafka.hide-internal-columns=false
Note: Make sure there are no trailing whitespaces in the above settings, else the Presto server may not start successfully. In kafka.table-names, the Kafka topic names are provided (comma separated). Optionally, the schema name can be prefixed to the topic names. If no schema name is prefixed, the topic tables will go into default schema.
- Restart the Presto server after creating the above connector. The Presto server needs to be restarted each time new connectors are added or existing connectors are updated to pick up the latest settings.
$ presto-server start
$ ./presto --server localhost:8080 --catalog kafka --schema test_schema presto:test_schema>
Note: Alternatively, Presto CLI can be launched without providing the schema and schema can be chosen afterwards.
$ ./presto --server localhost:8080 --catalog kafka presto> use test_schema; presto:test_schema>
presto:test_schema> select * from test_events;
- Produce new events to the above Kafka topic and see that they appear in the results of the above query.
$ kafka-console-producer --topic test_events --bootstrap-server localhost:9092 >message1 >message2 >message3
If you are able to produce and query the events seamlessly, that means the above setup was successful.
This brings us to the end of this article. We saw how we can set up Presto and Kafka and query the Kafka topics using Presto. This can also be done programatically using JDBC connection (jdbc:presto://localhost:8080/kafka/test_schema).