The Big Data technology landscape is expanding, and choosing the right frameworks from a growing range of technologies is becoming an increasing challenge in projects. Performance comparisons of frequently used operations such as cluster analysis can aid this selection process, and exactly that was the subject of a cooperation between mgm and the University of Leipzig. In the master's thesis “Skalierbares Clustering geotemporaler Daten auf verteilten Systemen” (“Scalable clustering of geotemporal data in distributed systems”) by Paul Röwer, written at Prof. Dr. Martin Middendorf's chair for parallel processing and complex systems, the k-means algorithm was implemented on four open source technologies of the Apache Software Foundation: Hadoop, Mahout, Spark, and Flink. Benchmarks were carried out comparing runtime and scalability.
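As a reminder of the operation being benchmarked, here is a minimal single-node k-means sketch in plain Java. The distributed implementations in the thesis follow the same assign/update cycle; the 2-D points, initial centroids, and cluster count here are purely illustrative:

```java
import java.util.Arrays;

public class KMeans {
    // Lloyd's algorithm: assign each point to its nearest centroid,
    // then move each centroid to the mean of its points, until stable.
    static int[] cluster(double[][] points, double[][] centroids, int maxIter) {
        int[] assign = new int[points.length];
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: nearest centroid by squared Euclidean distance
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < centroids.length; c++) {
                    double dx = points[i][0] - centroids[c][0];
                    double dy = points[i][1] - centroids[c][1];
                    double d = dx * dx + dy * dy;
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed && iter > 0) break; // converged
            // Update step: recompute each centroid as the mean of its members
            double[][] sum = new double[centroids.length][2];
            int[] count = new int[centroids.length];
            for (int i = 0; i < points.length; i++) {
                sum[assign[i]][0] += points[i][0];
                sum[assign[i]][1] += points[i][1];
                count[assign[i]]++;
            }
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] > 0) {
                    centroids[c][0] = sum[c][0] / count[c];
                    centroids[c][1] = sum[c][1] / count[c];
                }
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.5, 2}, {8, 8}, {9, 9}};
        double[][] centroids = {{0, 0}, {10, 10}};
        System.out.println(Arrays.toString(cluster(points, centroids, 10)));
        // prints [0, 0, 1, 1]: two clusters found
    }
}
```

The interesting part of the benchmark is precisely that this loop must be re-expressed in each framework's programming model (MapReduce jobs in Hadoop versus iterative dataflows in Spark and Flink), which is where the runtime differences originate.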
This second part of the two-part article continues the description of the successful introduction of a proven methodology for quality assurance on the development side according to the principle of “Very Early Testing” (VET). It shows in detail how the VET method was introduced to the ERiC (ELSTER Rich Client) project, carried out by mgm technology partners together with Alexander von Helmersen of the Bayerisches Landesamt für Steuern and their teams.
This two-part article describes the successful introduction of a proven methodology for quality assurance on the development side according to the principle of “Very Early Testing”, applied to ERiC (ELSTER Rich Client), a project carried out by mgm technology partners together with the Bavarian Tax Administration (Bayerisches Landesamt für Steuern) and their teams. Within the framework of ELSTER, the German tax authorities provide all software producers with the ERiC library, which is embedded in all commercial and governmental software used to file tax reports. It validates, compresses and encrypts tax data for the communication with the tax authorities. More than 100 million tax reports are filed via ERiC every year. Due to tax legislation, ERiC development must meet rigid requirements.
The article’s first part describes the very efficient QA method of “Very Early Testing”. Its second part shows in detail how the method was introduced to the ERiC project.
The revenue gained with Big Data solutions rose by 66% to 73.5 billion euros worldwide and by 59% to 6.1 billion euros in Germany over the past year. One of the core technologies used is Hadoop, which forms the basis of a broad and rich ecosystem comprising distributed databases, data and graph processing libraries, query and workflow engines, and much more. In an earlier blog post, we described how we use Hadoop for storing log messages. Since then, a lot has happened in the Hadoop ecosystem. With the start of our new Big Data series, we want to cover those changes and show best practices in the Big Data world.
In the previous post I showed that the GarbageFirst (G1) collector in Java 7 (and also 8ea) does a reasonable job but cannot reach the GC throughput of the “classic” collectors as soon as old generation collections set in. This article focuses on G1’s ability to control the duration of GC pauses. To this end, I refined my benchmark from the previous tests and also ran it with a huge heap size of 50 GB, the kind of heap G1 was designed for. I learnt that G1’s control of GC pauses is not only costly but, unfortunately, also weaker than expected.
As mentioned in the first post of this series, Oracle’s GarbageFirst (G1) collector has been a supported option in Java 7 for some time. This post examines in more detail the performance of the G1 garbage collector compared to the other collectors available in the Hotspot JVM. I used benchmark tests for this purpose instead of a real application because they can be executed and modified more easily. I found surprising strengths and weaknesses in several of Hotspot’s garbage collectors and even uncovered a fully-fledged bug.
We are currently experiencing a Geospatial Revolution that is changing how we navigate from A to B and how we search for locations such as a specific sight or a nearby restaurant. Geospatial search technology provides such information. This article shows how commercial applications can utilize geospatial search, e.g. for real estate search (qualifying properties by their distance to the nearest kindergartens, schools, doctors, etc.), calculating building density in cities, and so on.
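As a minimal illustration of the kind of computation underlying such queries, here is a haversine great-circle distance in plain Java. The Munich coordinates and the 2 km radius are made-up example values, and a real geospatial search engine would of course use spatial indexes rather than computing distances one by one:

```java
public class GeoDistance {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine formula: great-circle distance in km between two lat/lon points
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Example: qualify a property by whether the nearest kindergarten
        // is within a 2 km radius (both points are in central Munich).
        double d = distanceKm(48.1374, 11.5755, 48.1486, 11.5680);
        System.out.printf("%.2f km, within 2 km: %b%n", d, d <= 2.0);
    }
}
```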
The Fat Controller is a parallel execution handler that repeatedly runs other programs, a bit like cron and Apache Daemon. It is simple to use yet has some nice features that make it a great tool for simple and complex background processing tasks. The software is open source and licensed under the GNU GPL v3.
In May 2013, six publishers of major German quality online news sites started a campaign asking their visitors to turn off their adblockers to “ensure the continuance of a multifaceted journalistic reporting in high quality”. The results? Huge discussions, an increase in adblocker downloads, and a reactivation of the paid content debate. mgm technology partners took up the issue and asked its staff: Developers, do you use an adblocker? Here’s what they said.
We recently finished a subproject to integrate our mgm Cosmo insurance software with an external CRM system. Both systems had to exchange XML documents in a reliable and robust manner in order to keep their data in sync. We used Apache Camel as the middleware to handle all the transfers between the Java- and .NET-based systems. This blog series discusses our solution and shares our experiences with Apache Camel.
I recently had the opportunity to test and tune the performance of several shop and portal applications built with Java and running on the Sun/Oracle JVM, among them some of the most visited in Germany. In many cases garbage collection is a key aspect of Java server performance. In the following article we take a look at state-of-the-art GC algorithms and important tuning options, and compare them across diverse real-world scenarios.
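For orientation, these are typical command lines for the main Hotspot collector families compared in such tuning work. The heap and young-generation sizes are placeholder values that must be tuned per application; the flags themselves are standard Hotspot options:

```shell
# Throughput (parallel) collector, the server-class default:
java -Xms4g -Xmx4g -XX:+UseParallelGC -XX:+UseParallelOldGC ...

# Low-pause CMS collector with an explicitly sized young generation:
java -Xms4g -Xmx4g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xmn1g ...

# G1 collector with a pause-time goal:
java -Xms4g -Xmx4g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 ...

# Whichever collector is used, enable GC logging while tuning:
java ... -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log
```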
Do you also spend sleepless nights because you have saved the passwords of your users in clear text or near-clear text (MD5)? We will show you a simple method for smoothly migrating your password database to a much more secure format. The transition is transparent to the users and takes effect immediately, i.e. as soon as you have implemented the process, your passwords are safe. If you still store your passwords in an insecure format, you should convert them to a secure format as soon as possible. Do it now!
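The article describes its own migration process in detail; as a sketch of the general “hash the hash” idea behind such transparent migrations, one can wrap every stored MD5 value in a salted, iterated hash in a single batch run, with no user interaction, and apply the same wrapping at login time. The use of PBKDF2 and the iteration count here are illustrative choices, not necessarily those of the article:

```java
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class PasswordMigration {

    // Step 1 (one-off batch job): wrap every stored MD5 hash in salted PBKDF2.
    // We hash the stored hash, not the password, so no user needs to log in first.
    static String wrapLegacyHash(String md5Hex, byte[] salt) throws Exception {
        PBEKeySpec spec = new PBEKeySpec(md5Hex.toCharArray(), salt, 10000, 256);
        byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1")
                                     .generateSecret(spec).getEncoded();
        return toHex(key);
    }

    // Step 2 (on every login): recompute MD5 of the entered password,
    // apply the same PBKDF2 wrapping, and compare with the stored value.
    static boolean verify(String password, byte[] salt, String storedWrapped)
            throws Exception {
        return wrapLegacyHash(md5Hex(password), salt).equals(storedWrapped);
    }

    static String md5Hex(String s) throws Exception {
        return toHex(MessageDigest.getInstance("MD5").digest(s.getBytes("UTF-8")));
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);          // per-user random salt
        String legacyMd5 = md5Hex("s3cret");         // what the old database held
        String wrapped = wrapLegacyHash(legacyMd5, salt); // new database column
        System.out.println(verify("s3cret", salt, wrapped)); // true
        System.out.println(verify("wrong", salt, wrapped));  // false
    }
}
```

After the batch run, the insecure MD5 column can be dropped: from then on an attacker who steals the database faces salted, slow hashes instead of raw MD5.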
One of mgm's assets is dedicated software quality, especially in portal technology for applications with high demands on safety and reliability. In the first post of this series, “Using Domain Specific Languages to Implement Interactive Frontends”, we described an approach using a family of specification languages (DSLs) on the customer level to specify valid inputs and frontend computations for forms-based interactive or batch systems. Let us continue and focus on the quality benefits of this approach.
When we developed this sales reporting solution for the insurance sector, we went for a mobile, browser-based dashboard that renders the reports on the client side and thus enables a high degree of interactivity. That means that once the reporting data is delivered, the client can, for example, drill down into the data or slide along the time axis. This article focuses on the technical aspects of the data delivery in JSON format and interactive charting in the browser.