Writing this blog post has been really painful. It’s been three months since I last published my introduction to the semantic model, and I’ve been putting off this post for as long as I could. I started a new series called Learn Roslyn Now Quick Tips, I helped build Source Browser, and I even submitted a small pull request to clean up the analysis APIs. Basically, I’ve done everything but learn and write about these analysis APIs. I’ve avoided them for two reasons:
- I’ve struggled to imagine how one would use them in an analyzer or extension.
- They’re weird and unintuitive, and they frighten me.
I put out a tweet asking how others were using them, and it appears they’re only really used within Microsoft to implement the “Extract Method” functionality. A handful of questions on Stack Overflow have mentioned these APIs, so I’m sure someone out there is putting them to good use.
Data Flow Analysis
This API can be used to inspect how variables are read and written within a given block of code. Perhaps you’d like to make a Visual Studio extension that captures and logs all assignments to a certain variable. You could use the data flow analysis API to find the statements, and a rewriter to log them.
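Data flow analysis reports which variables are written in a region, not where they are written, so an extension like that would likely pair it with an ordinary syntax walk. The following is a minimal sketch of the "find" half only, assuming a project that references the Microsoft.CodeAnalysis.CSharp NuGet package. The `Tracker` class, its `Update` method and the `count` variable are made up for illustration.

```csharp
// Assumes a project referencing the Microsoft.CodeAnalysis.CSharp NuGet package.
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical code in which we want to find every assignment to "count".
var tree = CSharpSyntaxTree.ParseText(@"
class Tracker
{
    void Update(int delta)
    {
        int count = 0;
        count = count + delta;
        if (delta > 10)
            count = 10;
    }
}");
var compilation = CSharpCompilation.Create("TrackerDemo",
    new[] { tree },
    new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
var model = compilation.GetSemanticModel(tree);

var body = tree.GetRoot().DescendantNodes().OfType<MethodDeclarationSyntax>().Single().Body!;
var declarator = body.DescendantNodes().OfType<VariableDeclaratorSyntax>()
    .Single(v => v.Identifier.Text == "count");
ISymbol count = model.GetDeclaredSymbol(declarator)!;

// Step 1: analyze every statement in the body to learn whether "count" is written at all.
var flow = model.AnalyzeDataFlow(body.Statements.First(), body.Statements.Last());
bool countIsWritten = flow is { Succeeded: true }
    && flow.WrittenInside.Contains(count, SymbolEqualityComparer.Default);

// Step 2: locate the individual assignments whose left-hand side binds to "count".
var assignments = body.DescendantNodes().OfType<AssignmentExpressionSyntax>()
    .Where(a => SymbolEqualityComparer.Default.Equals(model.GetSymbolInfo(a.Left).Symbol, count))
    .ToList();

Console.WriteLine($"count written: {countIsWritten}; assignments found: {assignments.Count}");
foreach (var assignment in assignments)
    Console.WriteLine($"  {assignment}");
```

Actually wrapping each assignment in a logging call would be the job of a CSharpSyntaxRewriter, which this sketch does not attempt.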
To demonstrate the capabilities of this API, we’ll be looking at a modified piece of code posted on Stack Overflow. I’ve cleaned it up slightly, but it shows a number of interesting behaviors consumers of this API should be aware of.
We can analyze the for-loop in the following code:
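The original snippet did not survive in this copy of the post. As a stand-in, here is a hypothetical reconstruction, assuming a loop shaped to match the results discussed below: outerArray is declared before the loop, index is the loop variable, and innerArray is declared in the body. The program parses the code, compiles it against the core library, and asks the semantic model to analyze the for-statement. `Sample`, `Foo` and `MyCompilation` are placeholder names, and a reference to the Microsoft.CodeAnalysis.CSharp NuGet package is assumed.

```csharp
// Assumes a project referencing the Microsoft.CodeAnalysis.CSharp NuGet package.
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical reconstruction of the sample; Sample and Foo are placeholder names.
var tree = CSharpSyntaxTree.ParseText(@"
public class Sample
{
    public void Foo()
    {
        int[] outerArray = new int[10] { 0, 1, 2, 3, 4, 0, 1, 2, 3, 4 };
        for (int index = 0; index < 10; index++)
        {
            int[] innerArray = new int[10] { 0, 1, 2, 3, 4, 0, 1, 2, 3, 4 };
            index = index + 2;
            outerArray[index - 1] = 5;
        }
    }
}");

// Reference the core library so that int, arrays and object bind correctly.
var coreLib = MetadataReference.CreateFromFile(typeof(object).Assembly.Location);
var compilation = CSharpCompilation.Create("MyCompilation",
    syntaxTrees: new[] { tree },
    references: new[] { coreLib });
var model = compilation.GetSemanticModel(tree);

// The region of interest is the whole for-statement.
var forStatement = tree.GetRoot().DescendantNodes().OfType<ForStatementSyntax>().Single();
DataFlowAnalysis result = model.AnalyzeDataFlow(forStatement)
    ?? throw new InvalidOperationException("No data flow analysis was produced.");

Console.WriteLine($"Succeeded: {result.Succeeded}");
Console.WriteLine($"VariablesDeclared: {string.Join(", ", result.VariablesDeclared.Select(s => s.Name))}");
```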
At this point we’ve got access to a DataFlowAnalysis object. Perhaps the most important property on this object is Succeeded, which tells you whether the data flow analysis completed successfully. In my experience the API has been pretty good at dealing with semantically invalid code: neither invocations of missing methods nor uses of undeclared variables seemed to trip it up. The documentation notes that if the analyzed region does not span a single expression or statement, then analysis is likely to fail.
The DataFlowAnalysis object exposes a pretty rich API for users to consume. It exposes information about unsafe addresses, local variables captured by anonymous methods and much more.
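Because Succeeded gates everything else, it seems worth checking before reading any other property. The sketch below wraps that check in a hypothetical helper, `TryAnalyze`, and uses it on a made-up snippet in which a lambda captures a local, to peek at one of the richer properties, Captured. It assumes a reference to the Microsoft.CodeAnalysis.CSharp NuGet package; `Demo`, `Run`, `total` and `add` are illustrative names only.

```csharp
// Assumes a project referencing the Microsoft.CodeAnalysis.CSharp NuGet package.
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical sample: the lambda assigned to "add" captures the local "total".
var tree = CSharpSyntaxTree.ParseText(@"
class Demo
{
    void Run()
    {
        int total = 0;
        System.Action add = () => total++;
        add();
    }
}");
var compilation = CSharpCompilation.Create("CaptureDemo",
    new[] { tree },
    new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
var model = compilation.GetSemanticModel(tree);

// The region is the declaration of "add" (the last local declaration), which contains the lambda.
var region = tree.GetRoot().DescendantNodes().OfType<LocalDeclarationStatementSyntax>().Last();

var captured = TryAnalyze(model, region) is { } analysis
    ? analysis.Captured.Select(s => s.Name).ToArray()
    : Array.Empty<string>();
Console.WriteLine($"Captured: {string.Join(", ", captured)}");

// Hypothetical helper: return the analysis only when Roslyn reports that it succeeded.
static DataFlowAnalysis? TryAnalyze(SemanticModel semanticModel, StatementSyntax statement)
{
    DataFlowAnalysis? result = semanticModel.AnalyzeDataFlow(statement);
    return result is { Succeeded: true } ? result : null;
}
```

Returning null instead of a possibly incomplete analysis keeps callers from accidentally trusting results from a failed run.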
In our case, we’re interested in the following properties:
- DataFlowAnalysis.AlwaysAssigned: The set of local variables for which a value is always assigned inside a region.
- DataFlowAnalysis.ReadInside: The set of local variables that are read inside a region.
- DataFlowAnalysis.WrittenOutside: The set of local variables that are written outside a region.
- DataFlowAnalysis.WrittenInside: The set of local variables that are written inside a region.
- DataFlowAnalysis.VariablesDeclared: The set of local variables that are declared within a region. Note the region must be bounded by a method’s body or a field’s initializer, so parameter symbols are never included in the result.
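A quick way to get a feel for these five properties is to print the symbol names in each set. The following is a minimal sketch on a made-up method (the `Checker` class, `Check`, `flag`, `a` and `b` are hypothetical), analyzing an if-statement rather than a loop, and assuming a reference to the Microsoft.CodeAnalysis.CSharp NuGet package.

```csharp
// Assumes a project referencing the Microsoft.CodeAnalysis.CSharp NuGet package.
using System;
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical method; the region of interest is the if-statement.
var tree = CSharpSyntaxTree.ParseText(@"
class Checker
{
    void Check(bool flag)
    {
        int a = 1;
        if (flag)
        {
            int b = a;
            a = b + 1;
        }
    }
}");
var compilation = CSharpCompilation.Create("CheckerDemo",
    new[] { tree },
    new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
var model = compilation.GetSemanticModel(tree);
var ifStatement = tree.GetRoot().DescendantNodes().OfType<IfStatementSyntax>().Single();

var analysis = model.AnalyzeDataFlow(ifStatement)
    ?? throw new InvalidOperationException("No data flow analysis was produced.");
if (!analysis.Succeeded)
    throw new InvalidOperationException("Data flow analysis did not succeed.");

// Print each of the five properties as a comma-separated list of symbol names.
Print("AlwaysAssigned", analysis.AlwaysAssigned);
Print("ReadInside", analysis.ReadInside);
Print("WrittenOutside", analysis.WrittenOutside);
Print("WrittenInside", analysis.WrittenInside);
Print("VariablesDeclared", analysis.VariablesDeclared);

static void Print(string property, ImmutableArray<ISymbol> symbols) =>
    Console.WriteLine($"{property}: {string.Join(", ", symbols.Select(s => s.Name))}");
```

Pointing the same printer at a different region only requires changing the line that selects ifStatement.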
To recap, the region we’ve declared interest in is the for-loop.
The results from analysis are as follows:
AlwaysAssigned: index. It is always assigned because it is declared in the initializer of the for-loop, which runs unconditionally. innerArray does not appear here because the loop body might never execute.
WrittenInside: index, innerArray. Both index and innerArray are clearly written within the loop. One important point is that outerArray is not. While we’re mutating the array, we’re not mutating the reference contained within the outerArray variable, so it does not show up in this list.
WrittenOutside: outerArray, this. outerArray is clearly written to outside of the for-loop. However, it surprised me that this showed up as a parameter symbol within the WrittenOutside list. It appears that this is treated as an implicit parameter of every instance member, so, like any other parameter, it counts as written before the loop runs. This appears to be by design, although I suspect most consumers of this API will be surprised and will likely ignore this value.
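If you would rather ignore the implicit this parameter, one option is to filter it out by checking IParameterSymbol.IsThis. A minimal sketch, assuming a reference to the Microsoft.CodeAnalysis.CSharp NuGet package; the `Counter` class, `Bump` method and `step` variable are hypothetical.

```csharp
// Assumes a project referencing the Microsoft.CodeAnalysis.CSharp NuGet package.
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical instance method; the region of interest is the while-loop.
var tree = CSharpSyntaxTree.ParseText(@"
class Counter
{
    void Bump()
    {
        int step = 1;
        while (step < 3)
        {
            step++;
        }
    }
}");
var compilation = CSharpCompilation.Create("CounterDemo",
    new[] { tree },
    new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
var model = compilation.GetSemanticModel(tree);
var loop = tree.GetRoot().DescendantNodes().OfType<WhileStatementSyntax>().Single();

var analysis = model.AnalyzeDataFlow(loop)
    ?? throw new InvalidOperationException("No data flow analysis was produced.");

// Drop the implicit "this" parameter; real locals and ordinary parameters are kept.
var writtenOutside = analysis.WrittenOutside
    .Where(symbol => symbol is not IParameterSymbol { IsThis: true })
    .Select(symbol => symbol.Name)
    .ToList();

Console.WriteLine($"WrittenOutside without this: {string.Join(", ", writtenOutside)}");
```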
ReadInside: index, outerArray. It is clear that the value of index is read within the loop. It surprised me that outerArray is considered to be “read” inside the loop, as we’re not reading its value directly. I suppose that technically we must first read the reference stored in outerArray in order to calculate the offset and locate the correct element of the array. So we’re performing a sort of “implicit read” inside the loop here.
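The difference between writing through a variable and writing to it can be checked directly. The sketch below analyzes two single statements from a made-up method separately: one stores into an element of an array, the other assigns a new array to the variable itself. `Arrays`, `Fill`, `data` and the local helper `Analyze` are hypothetical, and a reference to the Microsoft.CodeAnalysis.CSharp NuGet package is assumed.

```csharp
// Assumes a project referencing the Microsoft.CodeAnalysis.CSharp NuGet package.
using System;
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical method: one element store, then a reassignment of the same array variable.
var tree = CSharpSyntaxTree.ParseText(@"
class Arrays
{
    void Fill()
    {
        int[] data = new int[3];
        data[0] = 5;
        data = new int[4];
    }
}");
var compilation = CSharpCompilation.Create("ArraysDemo",
    new[] { tree },
    new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
var model = compilation.GetSemanticModel(tree);
var statements = tree.GetRoot().DescendantNodes().OfType<ExpressionStatementSyntax>().ToList();

var elementStore = Analyze(statements[0]);  // data[0] = 5;
var reassignment = Analyze(statements[1]);  // data = new int[4];

Console.WriteLine($"data[0] = 5;  ReadInside: {Names(elementStore.ReadInside)}  WrittenInside: {Names(elementStore.WrittenInside)}");
Console.WriteLine($"data = new..; ReadInside: {Names(reassignment.ReadInside)}  WrittenInside: {Names(reassignment.WrittenInside)}");

// Analyze a single statement as its own region.
DataFlowAnalysis Analyze(StatementSyntax statement) =>
    model.AnalyzeDataFlow(statement)
        ?? throw new InvalidOperationException("No data flow analysis was produced.");

static string Names(ImmutableArray<ISymbol> symbols) =>
    string.Join(", ", symbols.Select(s => s.Name));
```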
VariablesDeclared: index, innerArray. This is fairly straightforward: index is declared within the loop initializer and innerArray within the body of the for-loop.
The general weirdness of the data flow analysis API has long kept me from writing about it. The issues with this and with what’s considered a read vs. a write are pretty off-putting to me. I suspect these kinds of issues will prevent a lot of people from taking advantage of this API, but I could be wrong. It’s difficult to say this early in the game, and I have not seen much discussion about this API or the problems above.