Aug 19, 2015

The code of Clusterpoint: how a search engine became a database

Curious how a Latvian database-as-a-service company grew from an idea for indexed search into a worldwide AngelHack sponsor? We sat down with Clusterpoint CEO Zigmars Rascevskis to find out.

Tell us briefly - how was Clusterpoint started?

Clusterpoint was started in 2006 by three Latvians: Gints Ernestons, Jurgis Orups and Oskars Viksna. They started with a vision of a search engine that could easily index data. Step by step, the project grew into a database. Big things started happening after 2014, when Clusterpoint attracted more capital, hired salespeople and headhunted more engineers with international experience.

I joined Clusterpoint in October 2014, and one of the reasons I did was its global vision. We've been approached by investors who are interested in helping us support this vision, which we're thrilled about. I would not be interested in building a company with just a regional reach. What drives me is creating a technology that is relevant to the whole world: a great solution with a big market.

Image: Clusterpoint at QCon New York 2015, Source: Team archives

Did your vision change before or after the investment? 

Yes, the first change came around 2008 to 2009, when we started transforming our search engine into a database platform. Then, in 2014, came a second, more strategic change. Earlier we positioned ourselves as an enterprise database, which essentially means first finding a customer that needs our solution, then entering a long process of negotiation, tailoring the product to the customer's specific needs and finally deploying it. In our case, this is not a model that would allow the Clusterpoint business to grow quickly. You may be able to find a few success cases, but it is generally hard to repeat them without building a larger organization. This is why we changed the vision of our product and our sales strategy. We created a cloud service: our clients do not need to install anything, and all the resources they could need are available right away.

We're also flexible and provide on-premise installations. However, we believe that this form of database deployment is on its way to extinction. We are keeping the option for our clients: if they decide to switch to the on-premise installable version, we can support that, but the Cloud is where we see the future. The Cloud provides speed and scalability benefits that an on-premise deployment cannot match.

Because of this, we invested in a slew of developer-oriented marketing initiatives in the first quarter of 2015. We ran the Clusterpoint Challenge developer contest, supported the Garage48 hackathon, exhibited at several technology conferences, and are continuing with the AngelHack offline hackathon events worldwide. Essentially, we go to developers and say: "Hey, here is a useful tool we created! Try it out. You may like it."

Image: Clusterpoint at hackathon with winning team, Source: Team archives

So, what do those developers say?

Really good things. In fact, many developers have signed up for our service. We do not want to publish exact numbers, but several thousand people have signed up within the first 6 months, exceeding our initial expectations.  

How actively do they use Clusterpoint?

It is too early to tell. We currently track whether our users have uploaded any data; not all of them have, but a considerable number are actively working with the service.

Hackathons and other events have proven to work well for us. We have supported events in the US and Europe, and we have invested in the AngelHack hackathons, which we sponsor globally along with Amazon, IBM and other well-known tech industry players. Over four months, we joined a series of 50 hackathons held in the US, Europe, India and Australia.

Who finds Clusterpoint useful?

Any organization that has more data than fits into a regular Excel spreadsheet needs a database. Since the 1980s, the database market has been dominated by relational databases, Oracle being the most prominent example. Data in relational databases is divided into tables, and these tables are linked, making the data interconnected. Because those links make the data hard to split across machines, you typically need to buy a bigger server when the database grows.

At some point, people realized that relational databases are not always the solution, as some data cannot be processed this way. Think of Google Websearch. Do you think it is stored in a relational database? I know it is not! At Google and elsewhere, people started to understand that they needed to divide their databases across multiple servers. This was the start of NoSQL.

The scalability aspect of this new technology is still a very hot topic. The Clusterpoint database itself is a pretty generic product. You can use it as a database for any purpose. So, even those users who are currently content with relational databases can do the same with Clusterpoint. But the key question is: why switch? 

Image source: Clusterpoint web page

One advantage of the Clusterpoint cloud database, which is particularly appealing to startups, is our fair pricing policy. Initially, the database is available for free, needs no configuration and works right away after signing up. As your amount of data grows, you start paying for exactly the computational resources you use. The database scales as your startup scales.

Another major competitive differentiator is our focus on cloud computing. A regular database is only as fast as the hardware it is installed on. For a company that needs its requests fulfilled within a particular time, this requires very powerful servers. You, as a customer, would buy a server or rent one from the Amazon cloud and install a relational database on it. Or you would buy five machines, install MongoDB on each of them and get exactly the computing power you provisioned. With Clusterpoint, you get instant scalability: when you put data in our cloud, it is spread across multiple machines. When a client sends a request, the database determines which servers to send it to and how many resources it needs. Consequently, hundreds of processors can work on your request, and the cloud measures exactly how much capacity you use, much like an electricity meter.
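To make the scatter-gather and metering idea concrete, here is a minimal Python sketch under stated assumptions. It is not Clusterpoint's implementation, which the interview does not describe: the `Shard`, `Meter`, `distribute` and `scatter_gather` names are hypothetical, documents are placed round-robin, every shard receives every query, and "documents scanned" stands in for the processor time a real service would meter.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Shard:
    """One hypothetical server holding a slice of the documents."""
    docs: list[dict] = field(default_factory=list)

    def search(self, term: str) -> tuple[list[dict], int]:
        # Scan only the local documents; report matches and work done.
        matches = [d for d in self.docs if term in d["text"]]
        return matches, len(self.docs)


@dataclass
class Meter:
    """Accumulates billed work units per customer, like an electricity meter."""
    usage: dict[str, int] = field(default_factory=dict)

    def record(self, customer: str, units: int) -> None:
        self.usage[customer] = self.usage.get(customer, 0) + units


def distribute(docs: list[dict], n_shards: int) -> list[Shard]:
    """Spread documents across shards round-robin (an assumed placement policy)."""
    shards = [Shard() for _ in range(n_shards)]
    for i, doc in enumerate(docs):
        shards[i % n_shards].docs.append(doc)
    return shards


def scatter_gather(shards: list[Shard], term: str, customer: str, meter: Meter) -> list[dict]:
    """Send the query to all shards in parallel, merge results, bill the work."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = list(pool.map(lambda s: s.search(term), shards))
    matches = [doc for found, _ in results for doc in found]
    meter.record(customer, sum(units for _, units in results))
    return matches
```

The design choice worth noting is that billing is derived from the work each shard reports, so the customer pays for what the query actually consumed rather than for how many machines exist.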

Clusterpoint is also a very powerful solution if you need to generate a resource-intensive report once a week. You send the request, the system computes it and returns the results to you, and you do not have to keep 100 processors running the rest of the time. Resource utilization, particularly at smaller companies, is usually irregular. This is why our pricing model, where you pay exactly for what you use, is fair, effective and appreciated by startups.
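The arithmetic behind the "pay for what you use" argument can be sketched in a few lines. This is a hypothetical model, not Clusterpoint's actual price list: usage is a list of (processors, hours) intervals, the rate is an assumed price per processor-hour in cents, and the provisioned alternative is assumed to keep peak capacity running for the whole period.

```python
from __future__ import annotations


def pay_per_use_cost(usage: list[tuple[int, int]], rate: int) -> int:
    """Bill only the processor-hours actually consumed."""
    return sum(processors * hours for processors, hours in usage) * rate


def provisioned_cost(usage: list[tuple[int, int]], rate: int) -> int:
    """Bill peak capacity for the whole period, as with always-on servers."""
    peak = max(processors for processors, _ in usage)
    period_hours = sum(hours for _, hours in usage)
    return peak * period_hours * rate


# Hypothetical week: a 1-hour report on 100 processors, 2 processors otherwise.
week = [(100, 1), (2, 167)]
RATE_CENTS = 5  # assumed price per processor-hour
print(pay_per_use_cost(week, RATE_CENTS))  # 2170
print(provisioned_cost(week, RATE_CENTS))  # 84000
```

In this made-up week, metered usage is 434 processor-hours (2,170 cents), while keeping 100 processors running for all 168 hours costs 84,000 cents. The gap comes entirely from idle capacity.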

Our offer is just that: receive the resources you need exactly when you need them, without paying for idle time. Our clients no longer need to worry whether they will need five or 100 processors, how much RAM to get, or whether to choose solid state drives or more RAM. Our cloud takes care of everything.

The first corporate customer for our Cloud service was the e-mail marketing company Mailigen, which provides services all over the world and sends millions of e-mails on behalf of its customers. We helped Mailigen solve its database scalability issue and remove barriers to business growth by replacing its SQL-based on-premises database infrastructure with our Cloud service. Our instant scalability, together with our pay-per-use model, gives Mailigen instant access to the computational resources it needs in our Cloud to run demanding requests during multi-million e-mail campaigns. Mailigen pays monthly only for the resources it actually uses, without provisioning anything in advance and without overpaying during smaller campaigns.

How did your experience at Google help in designing Clusterpoint? 

I worked on the Google Websearch backend, and this was an excellent experience to understand how large-scale systems work. Because of this, I saw an application of my design skills in a different field - data management software.

A massive challenge that I see is the gap between the leaders and the rest of the market. Developer teams at big tech companies (Google, Facebook, Yahoo) have already abandoned relational databases completely, yet market analytics reports put NoSQL penetration at only about six percent. Closing that gap is also one of my motivators: to create a technology that is relevant to people and enables them to use their infrastructure more effectively.

Additionally, I suspect many of those six percent who use NoSQL still apply it the same way they used SQL technology. They purchase two or three servers and launch, hoping to scale later, but this does not give them enough parallelism; it simply lacks critical mass. To make use of a distributed architecture and the new technology, you need to launch the stack on at least 100 servers and use them simultaneously.

Comparing the capacity of these two approaches is like comparing a car to a spaceship. Both can get you from one point to another, and you probably would not take a spaceship grocery shopping, but if you wanted to go around the world or travel to another galaxy, a car is simply not enough. A NoSQL database combined with cloud computing has the capacity of a spaceship.

Another important aspect is giving the user sufficient control. Making users feel in charge is one of our key design principles, so they can see and control how they use our system. For example, we keep logs of all requests they send to the database, so they can see how long each request takes to process, how much processor capacity it used, which step is the bottleneck, and any other information that helps them understand how they are interacting with our system. This is one of the key principles that separate a good infrastructure program from a bad one: a good program gives you useful information so that you can improve your application.
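A per-request log of the kind described above might look like the following sketch. The record layout (`RequestLog`, per-step millisecond timings, a `cpu_ms` field) and the helper `most_common_bottleneck` are assumptions for illustration, not Clusterpoint's actual log format; the point is that once each step is timed, finding the bottleneck is a simple maximum.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class RequestLog:
    """Hypothetical log record for one database request."""
    request_id: str
    step_ms: dict[str, float]  # wall-clock milliseconds spent in each step
    cpu_ms: float              # processor time consumed by the request

    @property
    def total_ms(self) -> float:
        return sum(self.step_ms.values())

    def bottleneck(self) -> str:
        # The step that took the longest is the first place to optimize.
        return max(self.step_ms, key=self.step_ms.get)


def most_common_bottleneck(logs: list[RequestLog]) -> str:
    """Across many requests, which step is most often the slowest?"""
    return Counter(log.bottleneck() for log in logs).most_common(1)[0][0]
```

For a request whose steps took 2, 40, 8 and 5 ms in parsing, searching, sorting and transfer, `bottleneck()` points at the search step.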

What paid services do you plan for Clusterpoint?

We do not plan to introduce anything new at the moment. There is certainly huge potential to offer various services on top of the database, but right now we are focused on growing the user base for our core business. If our clients' databases are in the cloud, we can complement them with things like business analytics, natural language processing and predictive analytics. There is a whole range of possibilities to choose from.

Where are your data centres located? 

Currently, our servers are in Latvia, and we also have a data centre in the United States. After the next investment round, we plan to build out a worldwide network covering all major regions.

When will you have the next funding round?

We are planning our next funding round closer to the end of 2015.  

Who are your clients in Latvia?

Our startup clients include Mailigen, PhoneAd and a few others. Among business customers, we could highlight Latvijas Talrunis, various business directories and management solutions, Riga City Council, several government institutions, and the article search functionality of the Delfi news portal.

We currently have 30 employees, and more than half of them are engineers. We also have technical support staff and system administrators, sales and marketing people, and finance and office administration departments. After the next funding round, we plan to grow the team significantly. We already have an office in London and will need to open one in the US.

Gathering the team was initially Gints's responsibility, as he had formulated a technically exciting and challenging task, at least for the existing team. We created the technology, the database and the search engine. It is not an easy technology to build, and our ambition to deliver a working solution fast motivates people. We have to keep looking ahead and cannot relax for a minute about our market position, as everything around us is changing very quickly. I check the news from big companies every day to monitor their latest developments.

In my last two years at Google, I was the engineering manager of the Websearch back-end team in Zurich, Google's biggest development center in Europe. Before that, I worked on various projects, including cluster management and product search.

Our sales and marketing organization is led by Peteris Janovskis, who worked at Oracle for 12 years.

Do Latvian universities research and teach your technology? In what format, and is it useful for your business? 

Yes. We work together with professor Girts Karnitis. He teaches database science and theory to computer science students at the University of Latvia, and covers Clusterpoint as a Latvian-built database. Sometimes students even use our database in their workshops.

We also have a transaction algorithm for transactionally consistent document updates. Such updates are hard to implement in NoSQL or distributed databases, because the documents involved are physically located on different servers. We have built quite a complex algorithm that can do this, and we have discussed it extensively with professor Karnitis.
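The interview does not describe how Clusterpoint's transaction algorithm works. To illustrate why the problem is hard, here is a sketch of the textbook alternative, two-phase commit, in which a coordinator asks every server to prepare (validate and lock) its part of the update and commits only if all of them agree. All names are hypothetical, everything runs in one process, and a real system would also need durable logs and crash recovery, which this sketch omits.

```python
from __future__ import annotations


class Participant:
    """Hypothetical server holding some of the documents."""

    def __init__(self, name: str, docs: dict) -> None:
        self.name = name
        self.docs = dict(docs)
        self.staged: dict | None = None
        self.locked: set = set()

    def prepare(self, updates: dict) -> bool:
        # Phase 1: vote yes only if every target document exists and is unlocked.
        if any(key not in self.docs or key in self.locked for key in updates):
            return False
        self.staged = dict(updates)
        self.locked.update(updates)
        return True

    def commit(self) -> None:
        # Phase 2: apply the staged changes, then release the locks.
        self.docs.update(self.staged)
        self._release()

    def abort(self) -> None:
        self._release()

    def _release(self) -> None:
        if self.staged:
            self.locked.difference_update(self.staged)
        self.staged = None


def two_phase_commit(updates_by_server: dict[Participant, dict]) -> bool:
    """Apply all updates atomically: either every server commits or none does."""
    prepared = []
    for server, updates in updates_by_server.items():
        if not server.prepare(updates):
            for p in prepared:
                p.abort()  # undo the servers that had already voted yes
            return False
        prepared.append(server)
    for server in prepared:
        server.commit()
    return True
```

If any server votes no, for example because a target document does not exist on it, every server that had already prepared releases its locks and keeps its previously committed values.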

Are there ready-made solutions, based on Clusterpoint?

We have a subsidiary, ClusterPark, that offers one particular solution: a log analysis tool called GOL. There is also NTSS (Network Traffic Security System), a tool created in cooperation with our partners. You attach it to your infrastructure, and it examines all passing packets, reverse-engineers the protocols and stores the results in the database. You can then access and analyze the data. It is a network surveillance and security tool that helps manage intrusions and incidents, or simply view activity statistics. It is an interesting network security product, but it is developed by our partner company, not by us.

Another great Clusterpoint-based solution is WikiSearch.net, a website that stores a Wikipedia database in Clusterpoint. You could call it our promotional project; it allows us to evaluate our search capabilities. We also collaborate with ZoomCharts to enhance its visual side. It is an experimental project: an alternative search and information visualization solution based on Wikipedia data.

Image source: WikiSearch.net, query: Latvia

We are also considering other partnership models. Even though we are currently focused on business growth, we are open to partnerships that enhance the value proposition for our end customers. Clusterpoint would provide the customer with Cloud resources, and our partner would receive an agency fee for its intermediary services. The idea is to help our partners migrate their end clients' data to the Cloud environment. If the end customers migrate to the Cloud, our partners receive an ongoing commission on the Cloud services, including hardware resources. If the customers choose the on-premise model, they have to buy hardware from hardware vendors, and the partners receive no share of that deal. This ultimately motivates partners to encourage their clients to choose Clusterpoint, migrate to the Cloud and save on hardware.

We stress that Clusterpoint provides database as a service, not a technology in a box. 

What are your planned next steps?

We will improve our support for the developer community with various educational and informational materials. We want to ensure that every user, whatever their prior knowledge, has the necessary documentation, video tutorials and other resources. By the end of the year, we are committed to adding multi-search support.

We are also looking at events outside Latvia, not only hackathons but also traditional conferences. We want to reach veteran developers who work with ready-made solutions, since our cloud service is ready for use right away. Whenever a company is ready to migrate its data infrastructure to the cloud, we are ready to engage in serious business discussions.

Would you sell the company if there was a serious buyer?

Ultimately, this question should be addressed to the shareholders. As an executive, I can advise them with my view of the company's future prospects. The key question, of course, is price: how does the offer today compare with how big the business will grow in the foreseeable future?

Were there any offers so far?

No, but we have also not been actively looking for exit opportunities. Even though we've started to see some consolidation in the database market, there is so much growth potential at the moment that selling does not appear to make sense for us right now.