Unlock your Kubernetes to run custom-resource-based microservices at any scale

The Rise of Custom Resources and Their Challenges

I don’t need to introduce Kubernetes in this post, as it is the de facto standard for running microservices at scale. Kubernetes provides a robust infrastructure not only for running workloads, but also for exposing services, securing applications, configuring authorization, and much more. The introduction of custom resource definitions (CRDs) extended Kubernetes’ capabilities, enabling developers to implement custom services and controllers without worrying about the underlying infrastructure. This innovation promised faster development and portability. Over the past few years, custom resources and operators/controllers have become increasingly popular. Crossplane marked another significant shift, extending management to resources outside the cluster.

However, custom controllers face significant challenges when handling large volumes of data. Kubernetes relies on ETCD for all data storage, which limits scalability, flexibility, and performance for complex or high-volume workloads. What are the main issues?

  • ETCD is designed for small configuration data, and its recommended storage size is only a few gigabytes, which caps cluster size.
  • ETCD’s consistency model means every member holds a full replica and all writes are serialized through a single leader, so adding members does not add write capacity.
  • ETCD cannot filter data at the storage level; the Kubernetes API server must fetch all records and filter them client-side.
  • With a single ETCD instance, both cluster configuration and custom data are mixed. High-load custom services can block or delay normal cluster operations, risking overall cluster health.
  • Kubernetes is multi-tenant with isolation methods, but ultimately all data is stored in a single database. A database outage affects the entire cluster, and a malicious attacker could potentially access everything.

These are some of the reasons why custom resources and controllers are mostly used to solve infrastructure problems, and why teams avoid building high-volume data applications on top of them.

We are not without hope…

The Future of Data Management on Kubernetes

Before discussing the solution, let’s highlight the limitations of custom resource controllers in Kubernetes.

| Limitation | Description |
|---|---|
| Eventual Consistency | Updates may not be immediately visible across the system |
| Non-Transactional | No support for ACID transactions |
| Relational Logic | Complex joins or relations between data entities are not supported |
| Limited Data Filtering | No advanced query engine included within Kubernetes |

💡 Don’t worry, the Kubernetes API aggregation layer can help overcome the limitations of the core API server by allowing you to extend the API with custom APIs that are served by a separate backend, or extension API server. This setup enables you to implement specific logic and capabilities that aren’t available in the core API.
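
As a minimal sketch, registering such an extension API server is done with an APIService object. The group, version, service name, and namespace below are hypothetical placeholders, and TLS verification is disabled purely to keep the example short:

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  # The name must be <version>.<group>
  name: v1alpha1.widgets.example.com
spec:
  group: widgets.example.com
  version: v1alpha1
  # The Service fronting your extension API server
  service:
    name: widgets-apiserver
    namespace: widgets-system
    port: 443
  # Demo only: use caBundle in production instead
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 1000
  versionPriority: 15

Once registered, requests to /apis/widgets.example.com/v1alpha1/... are proxied by the core API server to your backend, which is free to implement its own storage and query logic.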

It’s time to meet HariKube. HariKube is a middleware that transparently distributes database load across multiple vendor-agnostic databases, delivering low latency, high throughput, and a true cloud-native development experience. It achieves exceptional performance through data distribution and optimized database routing. By offloading resource-intensive workloads from ETCD, HariKube ensures consistent responsiveness and operational efficiency at scale. It enables strict data separation across namespaces, resource types, or services, helping organizations meet security and compliance requirements without sacrificing scalability or performance.

Additionally, HariKube simplifies developer workflows by abstracting infrastructure complexity. Developers can focus on data structures and business logic while the platform handles data routing and storage. HariKube is fully transparent; Kubernetes does not notice it is not communicating with an ETCD instance. You can use vanilla Kubernetes for development with limited data, and deploy HariKube in production to handle large datasets and distribute data across multiple backends.

HariKube supports multiple backends, each with different capabilities for data access and filtering; some storage engines support storage-level filtering for efficient querying. Find the full list of supported databases and their capabilities here.

Getting Started

First, configure your databases in docker-compose.yml.

docker-compose.yml
version: '3.8'
services:
  etcd2379:
    image: bitnami/etcd:3.5
    container_name: etcd2379
    network_mode: "host"
    command: etcd --auto-compaction-mode=revision --auto-compaction-retention=5m

  mysql3306:
    image: linuxserver/mariadb:10.11.8
    container_name: mysql3306
    network_mode: "host"
    environment:
      - MYSQL_ROOT_PASSWORD=passwd

  pgsql5432:
    image: postgres:17-alpine3.20
    container_name: pgsql5432
    network_mode: "host"
    environment:
      - POSTGRES_PASSWORD=passwd

Run the following command to start the databases.

docker compose up -d
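
Optionally, confirm that all three database containers are up before moving on:

# Expect etcd2379, mysql3306, and pgsql5432 with an "Up" status
docker ps --format '{{.Names}}: {{.Status}}'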

Next, create a data routing configuration file called topology.yml.

topology.yml
backends:
- name: rbac
  endpoint: http://127.0.0.1:2379
  regexp:
    prefix: events/
    key: events/
- name: kube-system
  endpoint: mysql://root:passwd@tcp(127.0.0.1:3306)/kube_system
  namespace:
    namespace: kube-system
- name: pods
  endpoint: postgres://postgres:passwd@127.0.0.1:5432/pods
  prefix:
    prefix: pods
- name: shirts
  endpoint: sqlite:///db/shirts.db?_journal=WAL&cache=shared
  customresource:
    group: stable.example.com
    kind: shirts

Routing Configuration Explained:

  • ETCD with regular expression routing: Routes events to an ETCD store.
  • MySQL endpoint with namespace matching: All objects in the kube-system namespace are routed to a MySQL backend.
  • PostgreSQL endpoint with prefix matching: All pod resources—except pods in the kube-system namespace—are routed to a PostgreSQL backend.
  • SQLite endpoint for specific custom resources: Routes all resources of type shirts in the group stable.example.com to a lightweight embedded SQLite database.
  • All other objects are stored in the default SQLite database.

After preparing the environment, start the middleware.

HariKube images aren’t public yet. If you’d like to try them, request a trial version on the Get Started page.

Start by authenticating your local Docker client with the private registry at registry.harikube.info. This step is essential for pulling images from the registry.

docker login registry.harikube.info
docker run -d \
  --name harikube_middleware \
  --net=host \
  -e TOPOLOGY_CONFIG=/topology.yml \
  -e ENABLE_TELEMETRY_PUSH=true \
  -v $(pwd)/topology.yml:/topology.yml \
  -v harikube_db:/db \
  registry.harikube.info/harikube/middleware:beta-v1.0.0-13 \
  --listen-address=0.0.0.0:2369 --endpoint='multi://sqlite:///db/main.db?_journal=WAL&cache=shared'
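
Before pointing Kubernetes at it, it is worth checking that the middleware started cleanly. The log check is always safe; the etcdctl probe assumes (reasonably, since Kubernetes will treat port 2369 as an ETCD server) that the middleware answers the ETCD maintenance API:

# Look for startup errors in the middleware logs
docker logs harikube_middleware
# Probe the ETCD-compatible endpoint directly
etcdctl --endpoints=http://127.0.0.1:2369 endpoint status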

The final step is to start the Kubernetes cluster. As mentioned, HariKube is transparent to Kubernetes and works out of the box. However, supporting large datasets requires recompiling the Kubernetes API Server and Controller Manager. You can follow the guide here, but for simplicity, this tutorial uses Kind with vanilla Kubernetes.

Create a Kind config in kind-config.yml. Note that etcd-servers points at 172.17.0.1, the default Docker bridge gateway, so the node container can reach the middleware listening on the host; adjust the address if your Docker network uses a different range.

kind-config.yml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  "CustomResourceFieldSelectors": true
  "WatchList": true
  "WatchListClient": true
nodes:
- role: control-plane
  image: kindest/node:v1.33.1
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      extraArgs:
        storage-media-type: application/json
        etcd-servers: "http://172.17.0.1:2369"

Start the cluster with:

kind create cluster --name harikube-cluster --config kind-config.yml
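
Once the cluster is up, verify that the control plane is healthy. Every object you see in the kube-system namespace is now being served through the middleware from the MySQL backend:

kubectl get nodes
kubectl get pods -n kube-system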

You can validate that HariKube has distributed your data according to the topology configuration:

# Default database for other objects
docker run -it --rm -v harikube_db:/data alpine/sqlite /data/main.db "select name from kine"
# Kubernetes events only
etcdctl --endpoints=http://127.0.0.1:2379 get / --prefix --keys-only
# Objects in the `kube-system` namespace
docker exec -t mysql3306 mysql -uroot -ppasswd -Dkube_system -e "select name from kine"
# All pods except those in `kube-system`
docker exec -it pgsql5432 su postgres -c "psql -d pods -c 'select name from kine'"

Now, create your first custom resource type. Apply the CustomResourceDefinition from the Kubernetes examples:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/main/content/en/examples/customresourcedefinition/shirt-resource-definition.yaml
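
You can confirm the definition was registered (the CRD name comes from the upstream example file):

kubectl get crd shirts.stable.example.com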

Then add a few resources:

kubectl apply -f - <<EOF
apiVersion: stable.example.com/v1
kind: Shirt
metadata:
  name: example1
spec:
  color: blue
  size: S
---
apiVersion: stable.example.com/v1
kind: Shirt
metadata:
  name: example2
spec:
  color: blue
  size: M
---
apiVersion: stable.example.com/v1
kind: Shirt
metadata:
  name: example3
spec:
  color: green
  size: M
EOF

You can verify that HariKube has stored all shirts in the selected SQLite database:

# All three `shirts` you created
docker run -it --rm -v harikube_db:/data alpine/sqlite /data/shirts.db "select name from kine"
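
Since kind-config.yml enables the CustomResourceFieldSelectors feature gate and the upstream Shirt CRD declares spec.color and spec.size as selectable fields, you can also filter server-side. With a backend that supports storage-level filtering, the query is pushed down to the database instead of being filtered in the API server:

# Should return example1 and example2 only
kubectl get shirts --field-selector spec.color=blue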

That’s it! Design your own data topology and enhance your Kubernetes experience: lower latency, higher throughput, data isolation, virtually unlimited storage, and simplified development. HariKube supports both flat and hierarchical topologies, allowing you to organize your databases like leaves on a tree.

Thank you for reading, and feel free to share your thoughts.
