Running MySQL/MariaDB and Sphinx in Docker for powerful search

Want to setup a relational database with advanced search features? Just merge two specific software in one powerful application: MariaDB and Sphinx.

We already know, the world of data storing is always changing and as time goes by more and more solutions became available to solve any known problem in information technology. But the most complete solution is not always the most comfortable one, specially in small projects.

We know also that there are some very trendy tool when it comes to searching, for example ElasticSearch. I really like it and using it will give you a lot of satisfaction.
The problems come when you have an already existing website on a MySQL/MariaDB relational database and you want to add a powerful search feature. There will be some obvious solutions.

  • You can use a FULLTEXT index, but that’s not really powerful and probably not what you want.
  • You can send to ElasticSearch only the data you want to search on, but on you website you will probably need to fetch some data also from the database, so two data connection are needed and you app has to be partially refactored.
  • You can send to ElasticSearch all the data you need to generate your pages in form of documents. It’s often a good solution, but you have to synchronize the data anytime something changes.
  • You can move the whole dataset to ElasticSearch dropping the database. That’s ok, but your application has to be totally refactored.

This are all good solutions, with some drawbacks, but if you don’t have the resources to face such a move, and maybe you don’t need the very all features offered by tools like ElasticSearch, you can try a cheaper solution that I will explain you here. The solution will be based on a MySQL/MariaDB database and the Sphinx Search Engine, running here in a Docker service for convenience.

Environment Setup

The only system requirement is a running Docker service. Everything else will run inside containers, so you don’t need nothing more.

And for this example we will need only two containers: one with a database and one for Sphinx.

For the database I will use a official MariaDB container. Please note that a historical bug in MySQL/MariaDB will prevent run the example if you use MariaDB before 10.2.29, 10.3.20 or 10.4.10. As far as I know in MySQL the bug still exists. The bug prevents you to connect to Sphinx using a hostname. With only IP addresses allowed it is very annoying to setup your Docker containers. So update your DB!

For the Sphinx container I will use a custon Dockerfile to build it using Alpine 3.10 and simply addind the Sphinx SE package. You will have the good old 2.2.11 release, but I suppose everything will work also with the 2.3 betas or the more recent version 3.*

The MariaDB database

For this example we will use a very very simple database schema. Supposing we are managing a website we will have only one table for your documents with an ID, a title, a body and a tag. So, we don’t need a Dockerfile for the database.

We have a docker-compose.yml that is something like this.

version: "2"

services:
  sphinx:
    build:
      context: sphinx/
      dockerfile: Dockerfile
  database:
    image: mariadb:10.2
    environment:
      MYSQL_ROOT_PASSWORD: root
    ports:
      - "3306:3306"
    volumes:
      - "./database/:/var/lib/mysql/"

As you can see we will expose the 3306 port of MySQL and use a volume to keep persistent data.
I suggest you to first run only the database container (with docker-compose up -d database) to initialize the database. Create a new database called “test” and create a new table using something like this:

CREATE TABLE foobar (id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT, title VARCHAR(100), body MEDIUMTEXT, tag VARCHAR(32));

Then put some random data into the table. Now we are ready for setting up Sphinx.

The Sphinx SE container

The Dockerfile for Sphinx will look like this.

FROM alpine:latest

RUN apk --update add sphinx
RUN mkdir -p /var/lib/sphinx \
	&& mkdir -p /var/lib/sphinx/data \
	&& mkdir -p /var/log/sphinx \
	&& mkdir -p /var/run/sphinx

COPY sphinx.conf /etc/sphinx/sphinx.conf

EXPOSE 9306
EXPOSE 9312

CMD indexer --all && searchd --nodetach

Install Sphinx, add some working folder, copy the configuration file, expose some ports, index the content and start the service. I will not go into details here because it’s out of the scope of this post, and I think it’is pretty straightforward.
The sphinx.conf is really simple too.

searchd
{
	listen          = 0.0.0.0:9306:mysql41
	listen          = 0.0.0.0:9312
	log             = /var/log/sphinx/searchd.log
	query_log       = /var/log/sphinx/query.log
	read_timeout    = 5
	max_children    = 30
	pid_file        = /var/run/sphinx/searchd.pid
	seamless_rotate = 1
	preopen_indexes = 1
	unlink_old      = 1
	workers         = none # threads # for RT to work
	binlog_path     = /var/lib/sphinx
}

source test_src {
	type        = mysql
	sql_host    = database
	sql_user    = root
	sql_pass    = pass
	sql_db      = test
	sql_query   = SELECT id, title, body, tag FROM foobar
	sql_attr_string = tag
}

index test_idx
{
	source = test_src
	path = /var/lib/sphinx/data/test
	docinfo = extern
}

We configure our search service (ports, paths and so on). Then define our source with a query and setting a attribute. Finally there is our index. This is a very basic Sphinx configuration so I don’t go to explain it in details.

Now you can start the search container: docker-compose up -d sphinx. The container will start, then fetch the data from the database and finally start the service.

Putting pieces together

Now, back to the database. MySQL/MariaDB ships with a SphinxSE engine by a long time. You have to manually enable it, but it’s really straightforward. Just access you database and run this to enable the engine.

INSTALL SONAME 'ha_sphinx';

Finally we have to create a table that will be used as a bridge to the search engine.

CREATE TABLE `foobar_idx` (
  `id` bigint(20) unsigned NOT NULL,
  `weight` int(11) NOT NULL,
  `query` varchar(3072) NOT NULL,
  `tag` varchar(32) DEFAULT NULL,
  KEY `query` (`query`)
) ENGINE=SPHINX CONNECTION='sphinx://sphinx:9312/test_idx'

The id, weight and query columns are mandatory. The tag column reflects the “tag” attribute we defined above, so you have to set this columns in relation to your search schema. The table uses obviously the “sphinx” engine and we have a connection URL where we use the name of the search container as a hostname.
Wow, we are ready for our first query!

SELECT * FROM foobar_idx, foobar WHERE query = 'hello' AND foobar.id = foobar_idx.id;

Conclusions

What are the advantages of such a solution?
Well Sphinx will be used as an extended fulltext search engine for MySQL/MariaDB, so there is some work to do to copy your data info the search engine, but the most of your database remains unchanged and also your web application.
Data retrieving will need only some minor changes. You have to edit your search queries, but the result sets can remain the same and so the rest of your logic. No extra calls are needed, no extra connection to other services, everything will run through the database server as before, a good way to save development time.

Obviously Sphinx is not the most complete search engine, but it covers the most important features. A good solution for the most common needs.