✨ Capture HTTP bodies on Undertow #321

ryandens · 2021-06-10T14:33:14Z

✨ Capture request bodies on Undertow

Background

Undertow is a web server written in Java.
It is used by, or integrated with, several large projects, including WildFly and RESTEasy.
Projects that build on top of Undertow often use its Java EE Servlet API implementation.
That implementation does not always read the request body through javax.servlet.http.HttpServletRequest.getInputStream(), so instrumenting the Servlet API alone can miss the body.

Overview

✨ Adds instrumentation to associate org.xnio.channels.StreamSourceChannel instances with a SpanAndBuffer instance, so that reads on the channel have the proper span context
✨ Adds instrumentation to copy data out of the destination ByteBuffer after an org.xnio.channels.StreamSourceChannel has been read into it, and to associate that data with the request body Span attribute
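The two steps above can be sketched with plain JDK types. This is only an illustration under assumptions, not the code in this PR: `ChannelBodyCaptureSketch`, `register`, `afterRead`, and `capturedBody` are hypothetical names, a `WeakHashMap` stands in for the agent's context store, and `ReadableByteChannel` stands in for `StreamSourceChannel`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for the context store plus the read advice. */
final class ChannelBodyCaptureSketch {

  // Maps a channel to the buffer collecting its request body. Weak keys let
  // an entry disappear once the channel itself is garbage collected.
  private static final Map<ReadableByteChannel, ByteArrayOutputStream> BUFFERS =
      Collections.synchronizedMap(new WeakHashMap<>());

  /** Step 1: called when the request channel is handed out. */
  static void register(ReadableByteChannel channel) {
    BUFFERS.putIfAbsent(channel, new ByteArrayOutputStream());
  }

  /** Step 2: called after channel.read(dst) returned numBytesRead. */
  static void afterRead(ReadableByteChannel channel, ByteBuffer dst, int numBytesRead) {
    ByteArrayOutputStream captured = BUFFERS.get(channel);
    if (captured == null || numBytesRead <= 0) {
      return; // untracked channel, or nothing read (0, or -1 at end of stream)
    }
    // duplicate() shares the content but has its own position and limit, so
    // the application's view of dst is left untouched.
    ByteBuffer view = dst.duplicate();
    view.position(dst.position() - numBytesRead);
    view.limit(dst.position());
    while (view.hasRemaining()) {
      captured.write(view.get());
    }
  }

  /** Returns the bytes captured so far, decoded as UTF-8, or null if untracked. */
  static String capturedBody(ReadableByteChannel channel) {
    ByteArrayOutputStream captured = BUFFERS.get(channel);
    return captured == null
        ? null
        : new String(captured.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws Exception {
    byte[] body = "{\"a\":1}".getBytes(StandardCharsets.UTF_8);
    ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(body));
    register(channel);
    ByteBuffer dst = ByteBuffer.allocateDirect(4); // small, to force several reads
    int n;
    while ((n = channel.read(dst)) != -1) {
      afterRead(channel, dst, n);
      dst.clear();
    }
    System.out.println(capturedBody(channel));
  }
}
```

Only the bytes reported by the read are copied, so stale content left in the destination buffer never reaches the captured body.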

Testing

✅ Adds a test case that reproduces a situation where our existing implementation did not capture the request body on an Undertow deployment that we would otherwise expect to work
Manual tests on a WildFly servlet container with a RESTEasy JAX-RS deployment

Checklist:

My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
Any dependent changes have been merged and published in downstream modules

Documentation

No new documentation is required, as we broadly claim Java Servlet support, of which this Undertow use case is an implementation. This just happens to be a corner case.

pavolloffay

good progress 💯

A couple of things to look at:

avoid using final for local variables in short methods.
the body capture should check the content type and capture only specific types - see the spec in the HT org. Also check the content length, to preallocate the size of the buffer, and the charset.
the instrumentation code should use HT config to check if capture is enabled.
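The checks requested above could look roughly like the following. This is a sketch under assumptions: the class and method names are hypothetical, the allow-list of content types is illustrative only (the real list is defined by the Hypertrace specification), and the `captureEnabled` flag stands in for whatever the HT config exposes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper deciding whether, and how, to capture a request body. */
final class BodyCaptureDecisionSketch {

  // Assumption: an illustrative allow-list, not the list from the HT spec.
  private static final String[] CAPTURABLE_FRAGMENTS = {"json", "x-www-form-urlencoded"};

  /** True only if capture is enabled and the content type is on the allow-list. */
  static boolean shouldCapture(boolean captureEnabled, String contentType) {
    if (!captureEnabled || contentType == null) {
      return false;
    }
    String lower = contentType.toLowerCase(Locale.ROOT);
    for (String fragment : CAPTURABLE_FRAGMENTS) {
      if (lower.contains(fragment)) {
        return true;
      }
    }
    return false;
  }

  /** Extracts the charset parameter of a Content-Type value, or returns fallback. */
  static Charset charsetOf(String contentType, Charset fallback) {
    if (contentType == null) {
      return fallback;
    }
    for (String part : contentType.split(";")) {
      String p = part.trim();
      if (p.regionMatches(true, 0, "charset=", 0, 8)) {
        String name = p.substring(8).trim().replace("\"", "");
        try {
          return Charset.forName(name);
        } catch (IllegalArgumentException e) {
          // Covers both illegal and unsupported charset names.
          return fallback;
        }
      }
    }
    return fallback;
  }

  public static void main(String[] args) {
    String contentType = "application/json; charset=ISO-8859-1";
    System.out.println(shouldCapture(true, contentType));
    System.out.println(charsetOf(contentType, StandardCharsets.UTF_8));
  }
}
```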

...y/javaagent/instrumentation/hypertrace/undertow/v1_4/StreamSourceChannelInstrumentation.java

pavolloffay · 2021-06-11T07:01:34Z

...y/javaagent/instrumentation/hypertrace/undertow/v1_4/StreamSourceChannelInstrumentation.java

+      HypertraceCallDepthThreadLocalMap.reset(StreamSourceChannel.class);
+      final ContextStore<StreamSourceChannel, SpanAndObjectPair> contextStore =
+          InstrumentationContext.get(StreamSourceChannel.class, SpanAndObjectPair.class);
+      final SpanAndObjectPair spanAndObjectPair = contextStore.get(streamSourceChannel);


nit: remove final it does not help here and it just adds noise.

I disagree here - I find that defaulting local variables to final discourages mutating references to objects and results in code that is easier to read and maintain.

final is useful in large methods where there is a higher probability of making a mistake by accidentally reusing the variable. The final should be used to improve the readability of large code blocks or as a "documentation" of the design.
In this PR the final keyword is used for all local variables in very short methods (e.g. 5-10 LOC), which does not make sense to me.

The objective is to write as little code as possible because that is the easiest to read, maintain.... Look at Java std lib/JDK implementation or read the effective java book.

I'm a big fan of Effective Java! I don't recall an item about final and/or non-final local variables, but I personally find that flipping the switch in the IDE and defaulting to final local variables makes the code easier to maintain and build upon over time. The cost of making variables final is low (because it can be done automatically) and IMO it helps make the code a little more readable, because it's clear at declaration time whether a variable can be reassigned, rather than relying on the reader to figure out whether it is effectively final (because it's never reassigned) or actually mutated. This scales well as a function organically grows in length and is modified by multiple authors, who may or may not have context on what the previous person wrote.

I guess I just disagree that the least amount of code possible is the easiest to read and maintain. A good counterexample, in my opinion, is how books like Effective Java encourage the use of composition over inheritance. In general, using composition rather than inheritance results in more code (as measured by LOC), but most folks agree that composition is a better default method of sharing code because it is easier to understand what is happening when the code is read in the future. That's not to say that there isn't a place for inheritance (or non-final local variables), just that it's part of what influences what I tend to do by default.

That being said, in the context of this PR it isn't terribly important, so if you feel strongly about this I'd be happy to table the discussion for another time and remove the final keyword from local variables in this PR. We can address it at a time when we can also add in automation to enforce whatever decision we come to.

...gent/instrumentation/hypertrace/undertow/v1_4/UndertowHttpServerExchangeInstrumentation.java

...in/java/io/opentelemetry/javaagent/instrumentation/hypertrace/undertow/v1_4/utils/Utils.java

pavolloffay

Undertow support should be added to the readme.

Should this instrumentation capture headers as well? This is the first servlet container instrumentation in this project. Could the payloads be captured twice? E.g. first in undertow and then in the servlet instrumentation?

.github/workflows/build.yaml

...y/javaagent/instrumentation/hypertrace/undertow/v1_4/StreamSourceChannelInstrumentation.java

...gent/instrumentation/hypertrace/undertow/v1_4/UndertowHttpServerExchangeInstrumentation.java

...emetry/javaagent/instrumentation/hypertrace/undertow/v1_4/UndertowInstrumentationModule.java

pavolloffay · 2021-06-16T06:49:18Z

...in/java/io/opentelemetry/javaagent/instrumentation/hypertrace/undertow/v1_4/utils/Utils.java

+    final Charset charset = Charset.forName(httpServerExchange.getRequestCharset());
+    final BoundedByteArrayOutputStream boundedByteArrayOutputStream =
+        BoundedBuffersFactory.createStream(
+            (int) httpServerExchange.getRequestContentLength(), charset);


What size is used if the content length is 0, e.g. for chunked transfer?

The BoundedBuffersFactory has some logic to help us out with that. When the stream is created this way, we simply use the default ByteArrayOutputStream constructor, which has a default initial size of 32 bytes. Regardless, the ByteArrayOutputStream will grow to fit its content as we write to it (until restricted by the BoundedByteArrayOutputStream). I'm sure we can improve on this, but I'm trying to keep this PR scoped to the customer's bug, as I don't have great data on how or when we can improve this right now.
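The sizing behavior described above can be sketched as follows. This is not the project's `BoundedBuffersFactory` or `BoundedByteArrayOutputStream`; `BoundedBufferSketch`, `BoundedStream`, and the `maxBytes` parameter are hypothetical, and `maxBytes` is assumed to be positive.

```java
import java.io.ByteArrayOutputStream;

final class BoundedBufferSketch {

  /** Hypothetical stream that silently drops bytes written past maxBytes. */
  static final class BoundedStream extends ByteArrayOutputStream {
    private final int maxBytes;

    BoundedStream(int maxBytes) {
      super(); // JDK default initial capacity of 32 bytes
      this.maxBytes = maxBytes;
    }

    BoundedStream(int initialSize, int maxBytes) {
      super(initialSize);
      this.maxBytes = maxBytes;
    }

    @Override
    public synchronized void write(int b) {
      if (size() < maxBytes) {
        super.write(b);
      }
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) {
      int room = maxBytes - size();
      if (room > 0) {
        super.write(b, off, Math.min(len, room));
      }
    }
  }

  /**
   * Uses the content length as a size hint when it is known. An unknown (-1)
   * or zero length falls back to the default initial capacity, and the
   * stream then grows as data is written, up to maxBytes.
   */
  static BoundedStream createStream(int contentLength, int maxBytes) {
    if (contentLength <= 0) {
      return new BoundedStream(maxBytes);
    }
    return new BoundedStream(Math.min(contentLength, maxBytes), maxBytes);
  }

  public static void main(String[] args) {
    BoundedStream stream = createStream(-1, 8);
    byte[] data = new byte[20];
    stream.write(data, 0, data.length);
    System.out.println(stream.size());
  }
}
```

Capping the initial size at `maxBytes` avoids allocating a large array for a body that would be truncated anyway.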

...elemetry/javaagent/instrumentation/hypertrace/undertow/v1_4/UndertowInstrumentationTest.java

ryandens · 2021-06-22T19:25:27Z

Undertow support should be added to the readme.

I'm avoiding adding Undertow support to the readme because we only fully work on Undertow servlets. Due to the implementation details of APIs like io.undertow.servlet.spec.HttpServletRequestImpl.getParameterMap(), we know that the request body won't always be accessed via the standard servlet body access APIs getReader() and getInputStream(), so we add this as a way to support servlets running on Undertow. It does not mean that we support running on a raw Undertow HTTP server (we don't).

Should this instrumentation capture headers as well?

Theoretically, we could add that over time, but the goal of this PR is to fix the missing request bodies when running Undertow with servlets, one of our supported use cases. Right now, adding full Undertow support isn't our top priority.

This is the first servlet container instrumentation in this project. Could the payloads be captured twice? E.g. first in Undertow and then in the servlet instrumentation?

I'm really glad you asked this! This was indeed happening when you brought this up. The solution I came up with was to add some lightweight instrumentation to undertow-servlet to mark and store in the instrumentation context whether or not an HttpServerExchange should be instrumented with undertow specific body-capturing.
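A rough sketch of that marking idea, using only JDK types: the servlet-level instrumentation marks an exchange, and the channel-level instrumentation checks the mark before capturing. The names here are hypothetical, `Object` stands in for `HttpServerExchange`, and a `WeakHashMap` stands in for the instrumentation context.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

final class CaptureMethodSketch {

  /** Mirrors the idea of the PR's enum; only one value is needed for the sketch. */
  enum RequestBodyCaptureMethod {
    SERVLET
  }

  // Assumption: stands in for the agent's context store keyed by the exchange.
  private static final Map<Object, RequestBodyCaptureMethod> MARKS =
      Collections.synchronizedMap(new WeakHashMap<>());

  /** Called by the servlet-level instrumentation when it will capture the body. */
  static void markServletCapture(Object exchange) {
    MARKS.put(exchange, RequestBodyCaptureMethod.SERVLET);
  }

  /** Called by the channel-level instrumentation before it captures anything. */
  static boolean shouldCaptureAtChannelLevel(Object exchange) {
    return MARKS.get(exchange) != RequestBodyCaptureMethod.SERVLET;
  }

  public static void main(String[] args) {
    Object servletExchange = new Object();
    Object otherExchange = new Object();
    markServletCapture(servletExchange);
    System.out.println(shouldCaptureAtChannelLevel(servletExchange));
    System.out.println(shouldCaptureAtChannelLevel(otherExchange));
  }
}
```

An unmarked exchange defaults to channel-level capture, so a missing mark can at worst duplicate work rather than lose the body.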

pavolloffay · 2021-06-23T08:06:05Z

...t-core/src/main/java/org/hypertrace/agent/core/instrumentation/RequestBodyCaptureMethod.java

+ * #APP_SERVER} instrumentation can short-circuit when it is know that that the preferred {@link
+ * #SERVLET} instrumentation is sufficient to capture the request body
+ */
+public enum RequestBodyCaptureMethod {


We literally do not need this type in the javaagent-core. javaagent-core should contain only types that are used by multiple instrumentations. Which does not seem to be the case here. This type can be moved to undertow module. Also only SERVLET value is used in the codebase.

I think I know your intentions with this type and the overall approach, however it won't work as a generic solution across instrumentations. We should design an API that tells you whether the payload or headers for a specific request were captured, and it should work for both client and server instrumentations (clients have the same problem, e.g. a jax-rs client implementation is often backed by Apache HTTP client...). The approach in this PR won't work because it uses the instrumentation context with specific 3rd party library types, e.g. io.undertow.server.HttpServerExchange.

You might want to look at https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/3e28b01e42311d755db17d698966d8f3bfaea1e3/instrumentation-api/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/Instrumenter.java#L88 to see how to design this using the APIs that are available for all instrumentations.

Definitely agreed, I'll move this type into a common library project shared by both undertow instrumentation projects. Over time, when we have more use-cases like this one, we can design a more general solution. This was definitely premature 🙂

pavolloffay · 2021-06-23T08:11:23Z

...in/java/io/opentelemetry/javaagent/instrumentation/hypertrace/undertow/v1_4/utils/Utils.java

+    final BoundedByteArrayOutputStream boundedByteArrayOutputStream = spanAndBuffer.byteArrayBuffer;
+    for (int i = 0; i < numBytesRead; i++) {
+      final byte b = readOnlyBuffer.get();
+      boundedByteArrayOutputStream.write(b);


This does not seem efficient, ByteBuffer should have an API to return the allocated array.

ByteBuffer.array() does exist, but it's not guaranteed to work (depending on the kind of ByteBuffer being used). As far as I know, it does not work on DirectByteBuffers, only on heap-based buffers. In the case of Undertow, the ByteBuffer subclass being used in the test case is a DirectByteBuffer, so ByteBuffer.array() will throw an UnsupportedOperationException.

My first pass at this, after discovering that ByteBuffer.array() does not suit our needs, was to do something like:

byte[] dst = new byte[numBytesRead];
readOnlyBuffer.get(dst);
boundedByteArrayOutputStream.write(dst);

However, after doing this, I looked at the implementation of ByteBuffer.get(byte[], int, int). It looks like this:

public ByteBuffer get(byte[] dst, int offset, int length) {
    checkBounds(offset, length, dst.length);
    if (length > remaining())
        throw new BufferUnderflowException();
    int end = offset + length;
    for (int i = offset; i < end; i++)
        dst[i] = get();
    return this;
}

Note ByteBuffer.get() is called n times, as it is in this method as well. So, by calling ByteBuffer.get() directly here, it is invoked the same number of times as ByteBuffer.get(byte[]), except that we get to avoid allocating a duplicate byte[] just to have it eligible for GC at the end of this utility method. This could put additional GC pressure on the application and result in degraded performance. I'm definitely open to other options of copying data from the ByteBuffer! But as far as I know, this is our best option today.
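A small self-contained sketch of the situation described above: a direct buffer reports no backing array, and the bytes are copied one at a time straight into the output stream, with no intermediate byte[]. `DirectBufferCopySketch` and `copy` are hypothetical names, and the sketch assumes the source buffer is positioned at the first byte to copy.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DirectBufferCopySketch {

  /**
   * Copies up to numBytes from source into sink, starting at source's current
   * position. A read-only view is used so source's own position is not moved.
   */
  static void copy(ByteBuffer source, int numBytes, ByteArrayOutputStream sink) {
    ByteBuffer readOnly = source.asReadOnlyBuffer();
    for (int i = 0; i < numBytes && readOnly.hasRemaining(); i++) {
      sink.write(readOnly.get());
    }
  }

  public static void main(String[] args) {
    ByteBuffer direct = ByteBuffer.allocateDirect(16);
    direct.put("payload".getBytes(StandardCharsets.UTF_8));
    direct.flip();

    // Direct buffers have no accessible backing array, so array() is not an option.
    System.out.println("hasArray: " + direct.hasArray());

    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    copy(direct, direct.remaining(), sink);
    System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
  }
}
```

Checking `hasArray()` first would allow a bulk write for heap buffers while keeping this loop as the fallback for direct ones.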

ryandens · 2021-06-24T16:11:58Z

Hey @pavolloffay, thanks for taking the time to review these changes! Do you have any further comments/concerns about this PR?

davexroth

Approving this PR as is, as it's time sensitive.

Any additional changes regarding code style etc can be picked up in a later PR.

pavolloffay

Why are there 3 gradle modules? Any real justification?

There is:

common
undertow
undertow-servlet

Does the undertow instrumentation work without servlet? I guess the servlet depends on the core so if the core is not reused I would merge them together.

pavolloffay · 2021-06-25T16:30:15Z

...telemetry/javaagent/instrumentation/hypertrace/undertow/common/RequestBodyCaptureMethod.java

+   * Signals that request body should be captured relying on the instrumentation of APIs defined
+   * inside the app server's implementation that do not relate to {@code javax.servlet} APIs
+   */
+  APP_SERVER


I believe this is not being used anywhere

pavolloffay · 2021-06-25T16:36:42Z

instrumentation/undertow/undertow-1.4/build.gradle.kts

+
+dependencies {
+    implementation("io.opentelemetry.javaagent.instrumentation:opentelemetry-javaagent-undertow-1.4:${versions["opentelemetry_java_agent"]}")
+    library("io.undertow:undertow-core:2.0.0.Final")


I think this should be 1.4.0. It is what the muzzle check is configured against. The idea is to use the lowest version possible.

Works for me 👍 FWIW I copied this part from the OTEL undertow module.

ryandens · 2021-06-25T17:31:53Z

Why are there 3 gradle modules? Any real justification?

There is:

* common
* undertow
* undertow-servlet

Does the undertow instrumentation work without servlet? I guess the servlet depends on the core so if the core is not reused I would merge them together.

Yes, the common module is depended on by both the undertow module and the undertow-servlet module. The undertow module works without the servlet module because we can't count on undertow-servlet always being used. The undertow-servlet instrumentation module adds instrumentation to undertow-servlet APIs in a way that can be detected by our undertow-core instrumentation. This allows us to intelligently avoid capturing the request body twice, depending on which APIs are used to access it and in what context.

✨ add noop instrumentation module for Undertow
🐛 dont assert inverse for now
Revert "♻️ move more code from the servlet into utils to make it debuggable" This reverts commit 765c37a.
:white_check_mark: add failing test for bug
:heavy_plus_sign: add undertwo as implementaiton dependency
:construction: put SpanAndObjectPair in Instrumentationcontext where the context key is the channel returned by getRequestChannel and the object associated with the channel is the HttpServerExchange
:sparkles: instrument StreamSourceChannel#read to add the request body to the span when the channel is read
:heavy_minus_sign: move servlet API from library configuration to just testImplementation as it is not need on the main compile classpath
:construction: WIP avoid String.trim() of request bodies by only writing the number of bytes that were read by StreamSourceChannel.read()
:recycle: change type of SpanAndBuffer to be a BoundedByteArrayOutputStream
:recycle: refactor UndertowHttpServerExchangeInstrumentation to store SpanAndBuffer instead of SpanAndObjectPair
:recycle: refactor StreamSourceChannelInstrumentation to use SpanAndBuffer object and write to bounded outputstream without allocating an extra byte[]
:fire: remove unused utils
:bug: update InputStreamInstrumentationModuleTest to use Bounded OuptutStream
:bug: fix code for muzzle
:ok_hand: move logger to top level, remove unneeded log statements
:ok_hand: update class names to start with uppercase
:rocket: capture JAR as artifact
:ok_hand: only capture request body if configured to do so
:ok_hand: properly handle Throwable for callDepth
:white_check_mark: assertInverse
:art: effectively final local variable
:art: formatting of comment
:bulb: add some javadoc
:recycle: specify Advice classes by reference rather than with Strings
:bug: handle -1 content length

pavolloffay reviewed Jun 11, 2021

ryandens force-pushed the undertow-http-bodies branch from 56b92a9 to 71ec0bd Compare June 14, 2021 21:37

ryandens changed the title ~~🐛 Capture HTTP bodies on Undertow when not accessed via the Servlet API (Draft, work in progress)~~ ✨ Capture HTTP bodies on Undertow Jun 15, 2021

ryandens marked this pull request as ready for review June 15, 2021 05:09

pavolloffay reviewed Jun 16, 2021

ryandens force-pushed the undertow-http-bodies branch 2 times, most recently from 5bd5a0e to f7ae59c Compare June 22, 2021 17:46

ryandens requested review from davexroth, pavolloffay and shashank11p June 22, 2021 21:16

pavolloffay reviewed Jun 23, 2021

ryandens requested a review from pavolloffay June 23, 2021 14:57

pavolloffay mentioned this pull request Jun 25, 2021

Ignore appdynamics/cisco multi tenant agent module #326

Merged

3 tasks

davexroth approved these changes Jun 25, 2021

pavolloffay reviewed Jun 25, 2021

ryandens added 12 commits June 25, 2021 14:21

🐛 fix config

c5dbe2e

🚧 add enum for request body capture strategies to javaagent-core

c28d7b2

✨ add instrumentation to undertow-servlet to mark an exchange as serv…

7e6737a

…let-based

✅ add test to demonstrate optimization is working

d639677

♻️ move final captruing of request body to end of exchange

d333059

⚡ short-circuit undertow instrumentation if servlet instrumentation w…

4767e39

…as detected

⚗️ mark spans with attribute when servlet captured request body

8f8b8d6

✅ test undertow servlet instrumentation optimization

ef28876

🔥 remove span attributes for determining what body detection mechanis…

50bf9b7

…m was used

🔥 remove undertow-core from undertow-servlet muzzle

2bb004d

🐛 inline static method as it causes muzzle issues and doesnt provide …

096435c

…value

ryandens added 15 commits June 25, 2021 14:21

✅ expand muzzle pass range

7b5777b

🔥 delete uplaod JAR task

a3a0beb

👌 add more names to UndertowInstrumentationModule

2f206e8

👌 add more names to UndertowServletInstrumentationModule

31e6155

👌 use singleton map in StreamSourceChannelInstrumentation

333b44f

👌 use singleton map in UndertowHttpServerExchangeInstrumentation

8c9e40b

👌 update test method name to match project convention

8574e5d

✅ add test for GET application/json

e91a9c6

✅ get text/html

736f736

👌 handle content length of 0;

b8744c6

👌 specify advice class name in a string

a612855

🚚 make instrumentation:undertow project and move both undertow instru…

a020e6f

…mentation projects to it

♻️ move RequestBodyCaptureMethod to undertow-specific library

cd6afcb

⬇️ downgrade library dependency on undertow

ad7a0e2

🔥 remove the APP_SERVER enum value as it is unused

61d9630

ryandens force-pushed the undertow-http-bodies branch from 739e435 to 61d9630 Compare June 25, 2021 18:22

ryandens merged commit a2333cf into main Jun 25, 2021

ryandens deleted the undertow-http-bodies branch June 25, 2021 18:54
