Modern user interfaces and exceptional employees are no longer enough on their own to deliver an optimal Digital Experience (DX). It also takes mission-critical technologies that perform well and are easy to use, and exceptional interactions at every touchpoint of an organization. Operational excellence has become more challenging, as it relies on more complex infrastructure and technologies, on users with different backgrounds and experience, and on an ever-evolving world that demands more, faster and better. Organizations need a tool to manage and improve DX, proactively as well as reactively.
These tools are called all-in-one APM-RUM-BPM-Automation solutions. In this article we focus on the RUM side, yet it is critical to note that organizations need a modern, all-in-one monitoring, analytics, and automation platform that provides User Experience, Process, and Technology insights, and that helps not only detect issues proactively but also resolve them.
6 of the best Real User Monitoring Tools and how to choose one
Here is a list comparing 6 of the best real user monitoring (RUM) tools (and we have been in this field since 2001):
germainAPM’s real user experience platform is helping companies like American Airlines, eBay, ANZ Bank, General Electric, Volvo, Pepsi, and United Healthcare improve their user experience at a fraction of the cost of other solutions.
A Real User Monitoring solution that offers 100% visibility into the User Experience of your website or other web application.
This technology is vastly different from traditional APM and RUM systems. See how quickly you can detect and resolve issues when your APM system records user sessions and performs automatic root-cause analysis. This, in turn, leads to lower churn rates and improved user experience.
In light of the corona crisis, businesses are being forced to cut costs, work smarter, and eliminate inefficiencies. germainAPM is offering financial relief through a special offer that has cut expenses by 50-600% for companies.
- Real User Session Recording & Replay
- Proactive Detection
- Automatic Resolution (flexible automation)
- DX to Technology mapping
- $29->$5K/month (1-12 months) & $51->$12K/month (after) | Unlimited & Free Help | Pricing details here
- The cheapest solution, by far!
- End-to-end insights (ux, process, tech)
- Correlation (e.g. for transaction tracking across data sources)
- All-in-one APM, RUM, BPM, BI and Automation
- Lack of collaboration features
- Support from US and Europe timezones
Read more about our RUM solution
Fullstory is one of the best website monitoring services, offering complete real-time visibility into the digital experience. You can monitor the activity of all mobile and web application users across all devices and browsers to assess and improve user satisfaction.
With Fullstory you can also collect business-relevant metrics, allowing you to correlate performance issues with potential business impact.
- Record and Replay
- Expensive. Available on request here
- Modern UI
- Collaboration features
- Lacks User Experience insights
Hotjar’s User Experience monitoring solution offers complete User Experience insights for your website.
Great heatmaps and analytics that help find UX friction. Built for Marketers, Product Managers and UX Designers.
- User Session Recording
- Conversion funnel
- Pricey and available here
- Session Replay
- No User Issue Diagnosis
- Lack of User Behavior insights
- Old UI
Dynatrace is one of the oldest solutions on the market and clearly has a significant number of features in the areas of monitoring, analytics, and automation for your website and technologies. You can map the whole user journey.
Awareness of performance issues and their potential business impact. Ability to resolve problems proactively with real-time data.
- Digital Experience
- Application Performance
- Infrastructure Monitoring
- Digital Business Experience
- Significantly expensive
- Lots of features
- Deep dive in Tech stack
- All-in-one APM, RUM and BPM
- Missing tons of User Behavior and Issue insights
- Not easy to use
- End-to-end visibility
- Incident Troubleshooting (Ajax, Java, Calls, etc)
- Distributed tracing
- Synthetic Lab
- Browser Pageviews and Page Load Times
- Session Traces
- Single page application (SPA) support
- Expensive, check it out here
- Easy to deploy
- Supports many Technologies
- Missing UX Insights
- Lack of Issue Diagnosis
- Hard to scale
AppDynamics Browser RUM
AppDynamics’s RUM tool monitors users’ customer journeys and experience across the globe through a single pane of glass. It helps you understand the regional variability of your website or single page application (SPA) experience and resolve web performance bottlenecks.
- Single page application (SPA) experience
- Customer journey and experience
- Browser snapshot waterfalls
- Dynamic baselining
- User Sessions Tracking
- Pricing available on request here
- Quickly resolve web performance
- Accurate browser-user insights
- Missing tons of UX insights
- No end-to-end issue diagnosis
- Another pricey tool
How to Choose a Real User Monitoring Tool
Some points to take into account before selecting a RUM tool:
Focus on the actual benefits. Features are obviously important, but more importantly, select the tool that will help your organization solve mission-critical issues, at a fraction of the cost. Even the most basic RUM tool should offer:
- Real User Session Record/Replay
- End-to-end User Issue diagnosis (behavior insights, and technology insights at the code, SQL… levels)
- Browser-level synthetic click
- Smart & Flexible Automation
- Intelligent Alerting
- Easy-to-create custom dashboards and reports
- No technical skills needed to find insights
- Collaboration features
Ease of Use
Focus on how quickly the tool helps you solve your critical business problems. Don’t be impressed by tools that quickly show data on a pretty dashboard; focus on the ones that effectively help you detect and solve your mission-critical problems. It is also critical to select tools that follow a consistent approach when building their UI, which makes them easier to use.
Exceptional & Free-of-charge Support
Choose organizations that are willing to help you, at no charge, grow your use of RUM within your organization. They are very hard to find… most charge for this. And that support needs to be effective, provided by DevOps, UX or BizOps experts.
Forget about feature-driven pricing models and opt for the ones that focus on benefit/volume. When you look at volume, is it in the thousands, hundreds of thousands, or millions of transactions, and what type of transactions are they talking about? Finally, select the RUM tool that provides a clear product roadmap that is in line with your mission.
What RUM Tool Will You Use?
A great user experience on a web application, which is what RUM solutions deliver, is just one piece of your business puzzle. If you are looking to grow your organization with a platform, you need more than a RUM tool: you need an all-in-one APM-RUM-BPM-Automation platform, one that is easy to use and can scale, that can track your Digital Experience, and that can correlate customer events across technologies, applications, etc.
Check out germainAPM, an all-in-one APM-RUM-BPM-Automation platform that will help your organization (and your career) scale!
These are the results of an internal pilot of Apache Solr and ElasticSearch with regard to indexing and searching. We performed that pilot 1.5 years ago, but several people asked for the results, so we are sharing them on this blog. We hope this helps. Feel free to reach out to us if you need more details.
- Indexing 100 million rows is 5.5x slower with ElasticSearch out of the box (22 min; 8 min with simple optimization) than with Apache Solr out of the box (4 min).
- Apache Solr does not require an additional tool, as ElasticSearch does (e.g. Logstash).
- Both offer a simple Query API.
- ElasticSearch has a built-in scheduler for updates; Apache Solr does not.
- Both return the entire document as the search result.
- Full-Text Search Features (misspelling, synonyms, ..) are significantly more advanced with Apache Solr.
- When it comes to use cases, ElasticSearch offers Analytical Querying, Filtering, and Grouping, while Apache Solr offers Text Search.
- ElasticSearch offers nested document support.
- Apache Solr is easier to maintain.
- When it comes to full-text search, Apache Solr is slightly better.
Creating The Ultimate UX Isn’t Magic: It’s Real-Time Metrics and Automation (and, which we don’t cover here but is equally if not more important, an organization that executes effectively to leverage these metrics and complement this automation)
What’s the magic formula for attracting customers, keeping them attracted, convincing them to recommend your company to their peers, and converting it all into a steady stream of sales and revenue to boost your bottom line?
Turns out the answer isn’t magic: it’s real-time metrics and automation.
germainAPM is a monitoring and metrics master, equipping Fortune 500 companies and start-ups alike with the proprietary software solution (a real-time monitoring and user experience tool) they need to maximize customer satisfaction and deliver the ultimate user experience (UX).
germainAPM’s UX monitoring, analytics, and automation software tool enables companies either to avoid user issues via a series of automation features or to drill down into their data to deliver smarter, faster, more seamless service to their customers and, if something goes wrong, to immediately locate the root cause of the problem, why it happened, and how they can prevent it from happening again.
What sets germainAPM UX apart is its Web-based Real User Recording and Replay and End-to-End Root-Cause Analysis features.
These real-time Replay sessions and Bottleneck identification features, rather than traditional User Interface screenshots, empower companies to (1) actively monitor every customer’s website mouse click from end to end, in real time, as if they were sitting next to them; (2) reconstruct that customer journey to identify any potential problem points along the way; (3) deliver a set of targeted solutions within minutes that companies can immediately implement to maximize customer satisfaction and deliver smoother UX navigation; (4) identify the technology root cause of any customer issue and offer insights and solutions; and (5) use our proprietary UX behavior flow graph to identify the most frequent UX frictions customers experience.
There’s no need to call the end-user (employee, partner, customer) to understand what a customer experienced while browsing your eCommerce, CRM or ERP business application. You can instead visualize the exact scenario of that real user and identify the root of any technology or process issue.
Clients are not only utilizing germainAPM’s platform of UX applications as a competitive weapon to increase customer satisfaction and build brand loyalty, they are also exploiting them as a critical tool to boost their bottom line.
germainAPM’s UX monitoring, analytics, and automation software tool delivers measurable value: enabling companies to identify and eliminate internal inefficiencies, reduce redundancies, and streamline operations, as well as improve customer response times, reduce customer churn, and increase their sales conversion rates.
germainAPM’s UX tools are more than an off-the-shelf, cookie-cutter software package you rent and plug in. Every germainAPM platform is backed by experienced engineers who will configure a specific software solution tailored to your company’s specific customer needs, for FREE.
And the beauty is that germainAPM is an all-in-one APM/BPM/RUM/Automation software platform, so beyond just UX metrics, you can consolidate all your monitoring, analytics and automation needs on one germainAPM platform.
Click here for a closer look at the proprietary, targeted UX software solutions germainAPM can deliver for your business to help drive your bottom line. There’s a reason we’ve had a 100-percent customer retention rate since 2014.
PS: Best practices for building UIs for optimal UX will be discussed in a separate post.
Mobile devices have long since replaced desktops, and user issues have simply moved over to them. A phone or system upgrade won’t always be the solution to these user frustrations. germainAPM can significantly help reduce these user issues by proactively monitoring the performance and user experience of these users on their mobile phones and tablets, and in the apps that run on them.
Metrics (@ the mobile DEVICE level):
- Application usage
- Battery State
- CPU usage
- Disk usage internal & external (sd card, …)
- IP address
- Memory usage
- Network Bandwidth
- Network State (Wi-Fi, 3G, 4G, …)
- Process (including app name, pid, memory usage, state..)
- Service usage (including service crash count, pid)
- Device information (manufacturer, brand, osVersion, …)
- Integrated “device id” which can be used to correlate data for a given device.
- UUID for a device (this becomes critical for device-server data correlation, for instance when no sessionId is available)
Metrics (@ the Mobile NATIVE APP level):
- User Clicks
- Mouse Moves
- User Errors
Metrics (@ the Mobile HTML APP level):
- User Clicks
- Mouse Moves
- User Errors
- User Session Replays
- For germainAPM cloud users:
- For germainAPM on-premise users: download the JS script here
- and ask for our help in setting this up. We don’t charge for our help; just schedule a work call here
Any metric that Google Chrome provides, including:
- Stall duration
- Download duration
- Wait duration
and some insights into whether the time is spent in stall, connection, wait, download, etc.
- Any html
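As a rough illustration of how stall, connection, wait and download durations can be derived, here is a minimal TypeScript sketch. It assumes the input carries the standard Resource Timing fields (the ones a browser exposes on entries from `performance.getEntriesByType("resource")`). The `ResourceTimingLike` and `TimingBreakdown` types and the `breakDown` function are hypothetical names for this sketch, and the stall figure is only an approximation of what Chrome displays, not its exact definition.

```typescript
// Subset of the standard PerformanceResourceTiming fields (all in ms).
interface ResourceTimingLike {
  name: string;
  startTime: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
}

// Hypothetical result shape for this sketch.
interface TimingBreakdown {
  stallMs: number;      // approximation, see below
  connectionMs: number; // TCP/TLS connection setup
  waitMs: number;       // request sent until first response byte
  downloadMs: number;   // first response byte until last
}

function breakDown(entry: ResourceTimingLike): TimingBreakdown {
  const dnsMs = entry.domainLookupEnd - entry.domainLookupStart;
  const connectionMs = entry.connectEnd - entry.connectStart;
  // Assumption: time before the request is sent that is neither DNS nor
  // connection setup is treated as "stall". Clamped so it never goes negative.
  const stallMs = Math.max(
    0,
    entry.requestStart - entry.startTime - dnsMs - connectionMs,
  );
  return {
    stallMs,
    connectionMs,
    waitMs: entry.responseStart - entry.requestStart,
    downloadMs: entry.responseEnd - entry.responseStart,
  };
}
```

Note that browsers report zeros for most of these fields on cross-origin resources that do not send a Timing-Allow-Origin header, so a real collector would need to treat such entries separately.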
Download (our chrome extension)
Configure (our chrome extension)
Try It (15-day free trial on our cloud):
cross-browser, no extensions or other installation, no interruption of end-user’s experience
The profiling functionality offered in most web browsers can be used to understand where the largest performance bottlenecks are, but it has to be triggered by the end user, so the data is only available to those with access to the specific machine, some time after it becomes known that there is an issue to investigate. The goal here is to display a breakdown of time spent in script, rendering/painting, and waiting on network requests, similar to what the Chrome profiler provides, but using only data that a web application can log for itself by using an extra bit of script.
A screenshot of the donut chart in Chrome’s profiler
There are a wide variety of catalysts that can trigger work within the browser. The Chrome browser refers to these as “activities”. The following table from the profiler includes the vast majority of activity types.
In this article we focus on only the activities that cause the web application’s own script to run. We categorize them into the following five groups based on the different challenges they pose with respect to instrumentation:
1. Initial evaluation: Evaluate Script
2. Time-driven callbacks: Timer Fired, Fire Idle Callback, Animation Frame Fired
3. Event-driven callbacks: Event, Run Microtasks, XHR Ready State Change, XHR Load
4. Incidental: Major GC, Minor GC, DOM GC, Parse HTML, Parse Stylesheet, Recalculate Style, Layout, Hit Test
5. Display: Update Layer Tree, Paint, Composite Layers, Image Decode
Only categories 2 and 3 can be directly instrumented. Incidental activities (4) can be triggered at semi-unpredictable times as a side effect of other script, causing that other script to take longer to complete. Initial evaluation (1) and display (5) activities can only be measured indirectly using some tricky heuristics. The heuristics for measuring the time spent in display activities, and in the purple (rendering) and green (painting) categories in general, are outside the scope of this article.
Root-level occurrences of Function Call are a bit of an outlier: they typically only occur outside the context of any other activity when the browser is calling into an extension the end user may have installed. Browser-extension callbacks cannot be instrumented by script either.
For the callback-based activity types, each has a registration function that sets up a callback into some custom script at a future time or event, e.g. addEventListener, setTimeout, requestIdleCallback, etc. These registration functions can be replaced with instrumentation code that adds timestamps around each future call to the callback. Because this instrumentation only takes place at the base of the call stacks, the overhead per call stack is a negligible constant.
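To make the idea of replacing a registration function concrete, here is a minimal TypeScript sketch. It is not germain APM's implementation. The names `ActivityRecord`, `instrument` and `instrumentTimerRegistration` are hypothetical, the clock is injectable so the sketch can be exercised without a browser, and only function callbacks with a `(callback, delayMs)` registration shape are handled.

```typescript
// One timing record per instrumented callback invocation.
interface ActivityRecord {
  kind: string;  // label chosen for this sketch, e.g. "timer" or "event"
  start: number; // ms
  end: number;   // ms
}

type Clock = () => number;

// Wraps `callback` so that every call is timestamped. try/finally keeps the
// measurement intact while letting exceptions propagate unchanged.
function instrument<A extends unknown[], R>(
  kind: string,
  callback: (...args: A) => R,
  log: ActivityRecord[],
  now: Clock = () => Date.now(),
): (...args: A) => R {
  return function (this: unknown, ...args: A): R {
    const start = now();
    try {
      return callback.apply(this, args);
    } finally {
      log.push({ kind, start, end: now() });
    }
  };
}

// Shape of a setTimeout-like registration function (simplified assumption).
type TimerRegistration<H> = (callback: () => void, delayMs: number) => H;

// Returns a drop-in replacement that wraps the callback before forwarding
// it to the original registration function.
function instrumentTimerRegistration<H>(
  register: TimerRegistration<H>,
  log: ActivityRecord[],
  now?: Clock,
): TimerRegistration<H> {
  return (callback, delayMs) =>
    register(instrument("timer", callback, log, now), delayMs);
}
```

In a page, the result of `instrumentTimerRegistration` would be assigned over the original registration function. The wrapper adds two clock reads and one array push per callback invocation, which is the constant per-call-stack cost described above.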
Many details have to be taken into consideration in the instrumentation, such as:
● letting unhandled exceptions through without interrupting time measurements
● ensuring that calls to removeEventListener get passed the identical function that was passed to addEventListener
● in order to not break some client code, properties of the registration function may need to be mimicked in the instrumentation version, such as name, length, prototype, constructor, toString value, etc.
● using the ‘new’ operator when the callback is an object constructor and cannot be called as a function
● setTimeout and setInterval can be called with a string (of code) instead of a function
● event callbacks can also be registered through assignments to .onclick, .onload, etc.
● event callbacks can also be registered through Element attributes “onclick”, “onload”, etc. whether procedurally or as HTML
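Two of the details above, letting exceptions through and keeping removeEventListener consistent with addEventListener, can be sketched as follows in TypeScript. This is an illustrative sketch under simplifying assumptions, not a complete solution. `ListenerTarget` is a hypothetical stand-in for the parts of EventTarget used here, wrappers are cached per listener only (a real implementation would also account for event type and capture options), and listener objects with a handleEvent method are not handled.

```typescript
type Listener = (event: unknown) => void;

// Minimal stand-in for the parts of EventTarget this sketch needs.
interface ListenerTarget {
  addEventListener(type: string, listener: Listener): void;
  removeEventListener(type: string, listener: Listener): void;
}

// Maps each original listener to its instrumented wrapper, so that removal
// can be handed the same function object that was actually registered.
const wrappers = new WeakMap<Listener, Listener>();

function wrapListener(
  listener: Listener,
  onTiming: (ms: number) => void,
): Listener {
  let wrapped = wrappers.get(listener);
  if (!wrapped) {
    wrapped = function (this: unknown, event: unknown): void {
      const start = Date.now();
      try {
        listener.call(this, event);
      } finally {
        // Runs even when the listener throws; the exception still propagates.
        onTiming(Date.now() - start);
      }
    };
    wrappers.set(listener, wrapped);
  }
  return wrapped;
}

function instrumentTarget(
  target: ListenerTarget,
  onTiming: (ms: number) => void,
): void {
  const add = target.addEventListener.bind(target);
  const remove = target.removeEventListener.bind(target);
  target.addEventListener = (type, listener) =>
    add(type, wrapListener(listener, onTiming));
  // Listeners registered before instrumentation have no wrapper, so fall
  // back to the function the caller passed in.
  target.removeEventListener = (type, listener) =>
    remove(type, wrappers.get(listener) ?? listener);
}
```

A WeakMap is used so that the cache does not keep listeners alive after the application has dropped them.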
Heuristics for Initial Evaluation
The content of script tags is evaluated as part of the process of being added to the DOM. There are a few variants of that, the most fundamental being the parsing of HTML. DOM mutation events are fired for each Element parsed, but script content is only executed at that time if it is not marked as ‘async’ or ‘defer’. Script Elements may also be added programmatically as part of the evaluation of other script, or later during callbacks. The text content or ‘src’ attribute of a script Element may also change procedurally, which can cause additional script evaluation in certain cases. Each of these variations, and each of their combinations, requires its own heuristics, as there is no direct way to instrument timings for the initial evaluation of scripts.
Our application performance monitoring product, germain APM, includes an implementation of measuring script, render/paint, and network time as part of its UX monitoring suite. For any web application, it tracks the total time spent in each of those categories for each meaningful interval of activity between the application and its web services, such as a page navigation or batch of asynchronous transactions. The presentation is designed to closely resemble the donut chart in Chrome though the categories are slightly different.
A screenshot from germain APM
In summary, we can report, with the above-stated caveats:
● the total or proportion of time spent executing script over some interval
● the start and end times of each execution interval or “activity”
● total script time spent per script file
● periods during which the main thread was hung/non-responsive
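The reported figures above can be derived from a list of timed activities. The following TypeScript sketch shows one way to do that, assuming each activity has already been attributed to a script file and that activities do not overlap (they are root-level intervals on a single main thread). The `Activity` and `IntervalReport` types and the `summarize` function are hypothetical names for this sketch, and the 50 ms default threshold is an assumption borrowed from the "long task" definition used by browsers.

```typescript
interface Activity {
  scriptUrl: string; // script file the time is attributed to (assumed known)
  start: number;     // ms
  end: number;       // ms
}

interface IntervalReport {
  totalScriptMs: number;
  scriptProportion: number;         // share of the interval spent in script, 0..1
  perScriptMs: Map<string, number>; // total script time per script file
  hungPeriods: Activity[];          // activities long enough to block the main thread
}

function summarize(
  activities: Activity[],
  intervalStart: number,
  intervalEnd: number,
  hungThresholdMs = 50, // assumption: mirrors the browser "long task" cut-off
): IntervalReport {
  const perScriptMs = new Map<string, number>();
  const hungPeriods: Activity[] = [];
  let totalScriptMs = 0;

  for (const activity of activities) {
    // Clip each activity to the reporting interval.
    const start = Math.max(activity.start, intervalStart);
    const end = Math.min(activity.end, intervalEnd);
    if (end <= start) continue;

    const duration = end - start;
    totalScriptMs += duration;
    perScriptMs.set(
      activity.scriptUrl,
      (perScriptMs.get(activity.scriptUrl) ?? 0) + duration,
    );
    if (duration > hungThresholdMs) hungPeriods.push(activity);
  }

  const intervalMs = intervalEnd - intervalStart;
  return {
    totalScriptMs,
    scriptProportion: intervalMs > 0 ? totalScriptMs / intervalMs : 0,
    perScriptMs,
    hungPeriods,
  };
}
```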
1. Performance API to collect resource URLs… (rather than scanning and tracking changes to the DOM for them)
2. DOM only when it makes sense
3. Assess/Reduce the impact (response, memory and network traffic). Here is an idea of how much data is stored within germainAPM database while monitoring an eCommerce site that has 1000 concurrent users and depending on how heavily these users browse the eCommerce site:
4. Asynchronous Transaction as much as possible
5. Away from the Main Thread whenever possible
6. And focus on providing metrics that will help your developers understand and fix issues, such as:
Break down the user click response to understand how much time is spent on the browser, network, infrastructure and application.
i.e. collect in real time the User Click, Move, and Page response times, broken down into the following:
Network Time/Server Time (code analysis)/Database time(sql, etc)
Rendering & Painting time
Waiting on User input
Then deep dive within the transaction, at the browser, network, application code, integration and database sql (…) levels
at the java/jvm process and code level:
or within the .net/clr process,…:
or within your heavily customized CRM or ERP applications and the transactions that got executed:
Are you ready to hit the ground running with your Application Performance Management resolutions? We certainly are. To better prepare, we asked our experts what they saw in the field last year, and decided to bring you the worst of it. We sincerely hope none of these will happen to you. Unfortunately they probably will…
Our APM predictions
7 bad things that will happen to your mission-critical business applications this year
1. Teams will point fingers at each other for the root cause of a major performance issue or unavailability
You know those meetings called by your VP to discuss the major slowdowns or disconnects that got the front-office team screaming this morning? The ones where the different teams (application, network, infrastructure, DB, etc.) are all looking defensive and blaming each other’s systems? Well, you will still have these. Unless, of course, you implement an end-to-end APM solution that can tell you in what layer (application or infrastructure) the problems were before you are even invited to the meeting.
2. You will learn about a major problem from the business community
You don’t like surprises. And there is nothing worse than hearing from the VP of Sales or the Director of the Call Center that several offices cannot connect or are having major performance issues with one of your applications. Instead of being able to calmly respond that you know about the problem and are working on a solution, you will be the proverbial deer in the headlights, scrambling to find a solution. Unless, of course, you implement an APM solution that proactively monitors the application and the underlying infrastructure layers so you stay ahead of the users.
3. You will never know the root cause of several severe production issues
You will have a severe issue with one of your applications. Teams will scramble to get the system stable and performing. Components will be restarted, servers will be rebooted, memory will be increased, maybe even a recent patch will be rolled back. Bottom line, the system will be stable for a while, and the symptoms will have disappeared. You will be nervous though, because you will have no idea what the real root cause was, and you will never know: you won’t have the time and you might not even have the data to reconstruct what happened and get to the root cause of the problem. You know you won’t sleep well that night. Unless, of course, you implement an APM solution that helps you get to the root cause of problems very fast, and lets you roll back the clock to understand what happened.
4. A release will introduce a major issue that will have you fumbling for answers in production
It’s human. Developers make mistakes in designing, coding and customizing your application. These mistakes are mostly caught through the different phases of QA, but problems do fall through the cracks. Sometime this year, you will have to deal with a bad customization or code getting into production and causing crashes, memory leaks or slow performance. You and your colleagues will drop everything to find the source of the problem and patch it, or, even worse, roll back the release. Sure, there is no magic wand, but you might prevent most of these by having a customization analysis tool perform a thorough review of the customizations, perform regression analysis, and flag and prioritize potential issues, with this review run as early as initial customization or as late as final release.
5. You will not know for sure if a new feature is being used, by whom and how
You don’t know what all of your users are doing with your application, so it can be embarrassing when a business manager or a CIO asks how a new feature is being received by the user community. Sure you have anecdotal data: Julie in accounting will say great things about the new screen. You might even have a survey result to share with management. But beyond that, there will be no hard data to confidently discuss usage of the new feature, how, by whom, and how that’s impacting their experience using the application. Unless you have an APM solution that automatically tracks and analyzes user clicks, transactions, business processes and lets you mine the end-user experience to provide definitive answers.
6. You will miss an important date or kid’s game on a Friday evening because of an issue blowing up
How come nobody will see the signs of trouble ahead? Performance will start to degrade, and a couple of application components will run out of memory. The team will restart the servers and things will be fine for a while, but after a couple of days the problem will start blowing up. Restarting the application components will prevent the problem for only a few hours. It will be Friday afternoon; you will have no choice other than calling an all-hands team meeting to investigate and work into the night to find the cause of the leak. And yes, you will miss that date or that game that you had promised to your family. Unless, of course, you have an APM solution that provides early warnings when performance degrades and servers run out of memory, and then helps you troubleshoot what caused the memory leaks.
7. Good old “restart” process won’t help
There will be a problem. The Call Center won’t be able to process orders; hitting submit just causes the screen to hang. You know the process behind that button is complex with several systems involved. After a quick look at the network and infrastructure, you see that all systems are up. Still, you decide to reset a couple of critical network routers, and then proceed to restart the different components in your integrated Call Center, Order Management, Order Procurement and Financials applications. After this is done – ouch, the users still have the same problem. You start sweating, realizing that something is seriously wrong behind one of the most critical business processes of your company: taking orders. After 24 hours of scrambling in the war room to find the problem, you will trace it back to an API change in one system that was missed by the pre-release process and caused an integration to fail. It will be resolved in 5 minutes, then tested and brought into production within hours. Now you will be able to tell the Front Office team to get back to these 1,456 customers who placed an order and tell them that you are now able to process them…. Unless, of course, you have an APM solution that can immediately tell you where the business process failed, what API was called that threw the error, what the error was, and what layers of code were involved.
Mean Time to Identify (MTTI) refers to the average time it takes to detect an incident (one that affects your users, business processes and/or technology performance) and identify its root cause.
Mean Time to Resolve (MTTR) is the average time between the start and the resolution of an incident.
Business needs are evolving significantly more quickly than before, causing more issues with your applications. As a result, Dev, UX and Ops teams are spending more time troubleshooting. Bouncing the application, or just analyzing the database stats and application logs, is no longer enough.
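As a small worked example of these two definitions, here is a TypeScript sketch that computes both averages from a list of incidents. The `Incident` shape and the function names are hypothetical, and MTTI is measured here from the start of the incident to the moment its root cause is identified, which is one reasonable reading of the definition above.

```typescript
interface Incident {
  startedAt: number;    // epoch ms when the incident began
  identifiedAt: number; // epoch ms when the root cause was identified
  resolvedAt: number;   // epoch ms when the incident was resolved
}

// Arithmetic mean; returns 0 for an empty list to avoid dividing by zero.
function mean(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// MTTI: average of (identifiedAt - startedAt) across incidents, in ms.
function meanTimeToIdentify(incidents: Incident[]): number {
  return mean(incidents.map((i) => i.identifiedAt - i.startedAt));
}

// MTTR: average of (resolvedAt - startedAt) across incidents, in ms.
function meanTimeToResolve(incidents: Incident[]): number {
  return mean(incidents.map((i) => i.resolvedAt - i.startedAt));
}
```

For instance, two incidents identified after 30 and 10 minutes and resolved after 90 and 30 minutes give an MTTI of 20 minutes and an MTTR of 60 minutes.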
How to Improve DevOps MTTR
By monitoring deployments in near-real time, 24×7, and applying intelligence to the collected data, you can drastically reduce MTTI from days to minutes. Germain APM delivers a comprehensive strategy for monitoring user experience, hardware and software performance and events, in real time. Germain APM automatically correlates events to provide better alerting and root-cause analysis. Germain APM also applies predictive models to the data it collects to foresee upcoming issues. All these capabilities are there in germain APM to enable DevOps organizations to proactively identify frictions at the user, process and/or technology levels.
Once you’ve identified an issue, germain APM helps you quickly troubleshoot it and perform root-cause analysis. germain CRT then comes into the mix and helps a developer understand what part of the code/object to fix and how, which dramatically decreases MTTR.
Leveraging Expertise and Automation for Better MTTR
Existing approaches for application monitoring and application performance management are no longer sufficient to provide the complete view into the volume, variety, and velocity of data being generated across the full stack, from bare metal to microservices.
Machine learning is great at identifying patterns, but falls short when it comes to fixing performance issues.
Using the germain APM and germain CRT tools, you can get down to the actual root cause of an issue and know what to do to fix it, down to the code, object or SQL level, saving a tremendous amount of time.
And in addition, with built-in pattern detection, anomaly detection, transaction analytics, and predictive analytics, germain APM provides real-time visibility across thousands of data streams and seamlessly detects and predicts conditions that indicate potential performance, reliability or security issues.