The revenue gained with Big Data solutions rose by 66% up to 73.5 billion euro world-wide and 59% up to 6.1 billion in Germany over the past year. One of the core technologies used is Hadoop which creates the base for a broad and rich eco-system containing distributed databases, data and graph processing libraries, query and workflow engines and much more. In one of our former blog posts, we have described how we use Hadoop for storing log messages. Since then, a lot has happened in the Hadoop universe and ecosystem. With the start of our new Big Data series, we want to cover those changes and show best practices in the Big Data world.
In the previous post I have shown that the GarbageFirst (G1) collector in Java 7 (and also 8ea) does a reasonable job but cannot reach the GC throughput of the “classic” collectors as soon as old generation collections come about. This article focuses on G1’s ability to control the duration of GC pauses. To this end, I refined my benchmark from the previous tests and also ran it with a huge heap size of 50 GB for which G1 was designed. I learnt that G1’s control of GC pauses is not only costly but, unfortunately, also weaker than expected.
As mentioned in a first post of this series, Oracle’s GarbageFirst (G1) collector has been a supported option in Java 7 for some time. This post examines in more detail the performance of the G1 garbage collector compared to the other collectors available in the Hotspot JVM. I used benchmark tests for this purpose instead of a real application because they can be executed and modified more easily. I found surprising strengths and weaknesses in several of Hotspot’s garbage collectors and even disclose a fully-fledged bug.
We are currently experiencing a Geospatial Revolution that changes in how we navigate from A to B and how we search for locations like a specific sight or restaurants nearby. Geospatial search technology provides such information. This article shows how commercial applications can utilize geospatial search, e.g. for real estate search (qualifing real estates by their distance to the nearest kindergartens, schools, doctors, etc.), calculating building density in cities and so on.
The Fat Controller is a parallel execution handler that repeatedly runs other programs, a bit like cron and Apache Daemon. It is simple to use yet has some nice features that makes it a great tool for simple and complex background processing tasks. The software is Open Source and licensed under GNU GPL v3.
In May 2013, six publishers of big German online quality news sites started a campaign asking their visitors to turn off their adblockers to “ensure the continuance of a multifaceted journalistic reporting in high quality”. The results? Huge discussions, an increase of adblocker downloads and a reactivation of the paid content debate. mgm technology partners took up the issue to ask its staff: Developers, do you use an adblocker? Here’s what they said.
We recently finished a subproject to integrate our mgm Cosmo insurance software with an external CRM system. Both systems had to exchange XML documents in a reliable and robust manner in order to keep their data in sync. We used Apache Camel as the middleware to handle all the transfers between the Java and .NET based systems. This blog series discusses our solution and shares our experiences with Apache Camel.
I recently had the opportunity to test and tune the performance of several shop and portal applications built with Java and running on the Sun/Oracle JVM, among them some of the most visited in Germany. In many cases garbage collection is a key aspect of Java server performance. In the following article we take a look at the state-of-the-art advanced GC algorithms and important tuning options and compare them for diverse real-world scenarios.
Do you also spend sleepless nights because you have saved the passwords of your users in clear text or near-clear text (MD5)? We will show you a simple method how you can smoothly migrate your password database to a much more secure format. The transition is transparent to the users and instant, i.e. as soon as you have implemented the process, your passwords are safe. If you still store your passwords in an insecure format, you should convert them to a secure format as soon as possible. Do it now!
One of the assets of mgm is dedicated quality for software, including especially portal technology for applications with high-safety and reliance demands. In the first blog within this series, “Using Domain Specific Languages to Implement Interactive Frontends“, we described an approach using a specification language (DSL) family on customer level to specify valid inputs and frontend computations for forms-based interactive or batch systems. Let us continue and focus on the quality benefits of this approach.
When we developed this sales reporting solution for the insurance sector, we went for a mobile, browser-based dashboard that renders the reports on the client-side and thus enable a high degree of interactivity. That means that once the reporting data is delivered, the client should be able to e.g. drill down into the data or slide along the time axis. This article focuses on the technical aspects of the data delivery in JSON format and interactive charting in the browser.
In this blog article we discuss how Fredhopper, an advanced site search and merchandising product, can be integrated into the hybris eCommerce suite not only to search for products, but to create cross selling and campaigns as well. In the used scenario hybris is the foundation of a marketplace with a few million products from thousands of vendors.
My colleague Slavomír Jeleň and I are currently working on a logistics management application for an international food retailer. It’s a data-oriented application that performs pre-calculation steps on billions of rows with PL/SQL stored procedures. In order to ensure the correctness of these calculations, we devised a solution for unit testing the stored procedures in Oracle based on DBUnit.
This is the next of my episode on regular expressions. Today, we look at the worst regex you can possibly come up with, although it looks innocent and simple. You will learn about this backtracking trap that let’s you easily wait for 10^30 steps, as an example of an errant email regex will illustrate. One possible solution we investigate is the use of possessive quantifiers.