Tuesday, July 1, 2014

MongoDB map-reduce recommendation implementation

Challenge: 

Modern applications often need large storage capacity and real-time processing of large volumes of data. In a traditional design, data is spread across multiple related tables in an SQL database. The server application layer queries this database and then performs additional calculations. Salesforce, although it can be customized in detail, has a similar architecture. As a result, performance may suffer, because the database has to collect data from multiple tables and send it across the network. These response delays may degrade real-time features.

Solution:

NoSQL solutions are designed to meet the requirements of modern software development and to address the limitations mentioned above. MongoDB is a leading NoSQL database that has gained popularity among Fortune 500 and Global 500 companies. MongoDB is a document store, and its key features include dynamic schema, horizontal scaling, replication, and MapReduce support.

Case:

In the following example of MongoDB usage, VRP developers created a recommendation engine that searches a pool of predefined opportunities for those similar to a given opportunity. We used a collection of 500k predefined opportunities. The recommender request is simple: compare the input opportunity with every opportunity in the pool and count the number of matching products for each pair. With this algorithm, we can find related products to recommend together with the main product. A traditional implementation of the same functionality in a relational database may take hours to complete.
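
To make the matching step concrete before turning to MongoDB, here is a minimal in-memory sketch in Python. It is only an illustration of the counting idea described above, not the engine's actual code: the function name count_matches, the opportunity ids, and the product names are all hypothetical, and each opportunity is assumed to be representable as a set of product names.

```python
from typing import Dict, List, Set, Tuple


def count_matches(
    input_products: Set[str],
    pool: Dict[str, Set[str]],
) -> List[Tuple[str, int]]:
    """Rank pool opportunities by how many products they share with the input.

    Hypothetical helper: `pool` maps an opportunity id to its set of products.
    """
    scores = []
    for opp_id, products in pool.items():
        shared = len(input_products & products)  # number of matching products
        if shared > 0:
            scores.append((opp_id, shared))
    # Highest overlap first; ties are broken by id so the order is deterministic.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


# Made-up sample data, for illustration only.
pool = {
    "opp-1": {"laptop", "mouse", "dock"},
    "opp-2": {"laptop", "monitor"},
    "opp-3": {"printer"},
}

ranked = count_matches({"laptop", "mouse"}, pool)
print(ranked)  # [('opp-1', 2), ('opp-2', 1)]
```

Products that appear in the top-ranked opportunities but not in the input (here, "dock" and "monitor") are the natural candidates to recommend. Scanning the whole pool once per request is the part that becomes expensive at 500k opportunities.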


MongoDB recommender implementation: