Introduction
At this point in the project, I managed to develop a GCC GIMPLE pass that analyzes the code for function clones (based on their base name), analyzes if the clones are significantly similar, and outputs a diagnostic dump. This clone analysis is done thanks to a function signature that gets generated, and compared. Currently, the signature only contains structural information about the function – the number of Basic Blocks, edges, and GIMPLE statement codes, in the order they appear in the function. The statement codes are simply a representation of the type of statement it is, for example – an assignment, function call, return, condition, etc. Two statements can have be of the same type, but have completely different logic – like ==
vs. !=
, or calls to different functions. Although there is a low chance that a pair of function clones that underwent optimization will have the same GIMPLE statement types, in the same order, but are still not the same, I believe it’s worth it to account for the actual function logic, as opposed to comparing the clones based merely on their structure.
In addition to improving the overall signature, in Stage 3 I must ensure that my pass can handle 3 function clones, and perform a correct analysis. For that, I will need to produce some sort of test case that will give me at least 3 function clones to “analyze”.
Goal: improve signature complexity, analyze more than 2 clones at a time.
Improving Signature Complexity
Basic Counts
First, I will keep the basic Edge and BasicBlock counts. Since I am registering the gimple statement codes, I do not need to include the count of the statements.
I will still keep
Basic Block – Predecessors and Successors
Although the instructions mentioned that the numbering of the basic blocks can differ between similar functions, I initially each Basic Block with its index (bb->index
). I tested this thoroughly, and didn’t run into any conflicts – but still decided to eliminate the index count. Logging the predecessors and successors allows me to encode the structure of the tree, or the function’s logic, into the signature.
// Start of basic block, predecessors & successors
signature << " B" << "(" << EDGE_COUNT(bb->preds)
<< ">" << EDGE_COUNT(bb->succs) << ")";
Improving Signature: GIMPLE Statement
The next step is to store some information about the gimple statements in the function. There are a lot of different information that can be chosen to store in the signature, but to be mindful of efficiency – I will only select the ones that are simple, meaningful and most common.
GIMPLE_ASSIGN
– Assignment
Added the
rhs_code
, which denotes the type of expression on the right hand side of an assignment operation.case GIMPLE_ASSIGN: { int rhs_code = gimple_assign_rhs_code(stmt); signature << " =" << rhs_code << "," << gimple_num_ops(stmt); break; }
GIMPLE_CALL
– Function Calls
Records internal (
I
) function call, along with internal numeric identifier, & external (E
) calls, along with the number of arguments.case GIMPLE_CALL: { gcall* call_stmt = as_a<gcall*>(stmt); if (call_stmt) { if (gimple_call_internal_p(call_stmt)) { signature << " I" << (int)gimple_call_internal_fn(call_stmt); } else { signature << " E" << gimple_call_num_args(call_stmt); } } break; }
GIMPLE_COND
– Conditional Statements
Records a code representing the type of conditional statement (
==
,!=
,<
,>=
…).case GIMPLE_COND: { signature << " ?" << (int)gimple_cond_code(stmt); break; }
Testing with Multiple Clones (2+)
Before I tried to modify my code, I wanted to test if my current code will support handling more than 2 clones. I needed to modify my test to include another function clone. Although I attempted several ways, like modifying the test file (clone-test-core.c
), I eventually opted for adding another clone instruction into the Makefile, similar to the popcnt
target-clone direction.
Finding Clone Options
The specific code snippet that I was interested in modifying was this, in the test Makefile:
clone-test-x86-prune: clone-test-core.c ${LIBRARIES}
gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","popcnt") ))'\
My thought was this – if I add another target clone, that will also produce an identical version – just like the current popcnt
clone, I will have myself a perfect tester. But how do I know what values to put there?
While searching the web for clones similar to popcnt
, I stumbled upon this Clang Documentation page, where the following line appears:
Example “subtarget features” from the x86 backend include: “mmx”, “sse”, “sse4.2”, “avx”, “xop” and largely correspond to the machine specific options handled by the front end.
Now I have some possible values to put to the test. I modified that line and added mmx
to the target clones:
... __attribute__((target_clones("default","popcnt","mmx") )) ...
And I ran make clean
, and make
, and inspected the results. The tree-efagin.cc
that I used outputted the following:
gcc -c vol_createsample.c -o vol_createsample.o
gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","popcnt","mmx") ))'\
-march=x86-64 -g -O3 -fno-lto -ftree-vectorize -fdump-tree-efagin \
clone-test-core.c vol_createsample.o -o clone-test-x86-prune
[scale_samples] = PRUNE
[scale_samples] = PRUNE
[scale_samples] = PRUNE
gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","arch=x86-64-v3") ))' \
-march=x86-64 -g -O3 -fno-lto -ftree-vectorize -fdump-tree-efagin \
clone-test-core.c vol_createsample.o -o clone-test-x86-noprune
[scale_samples] = NOPRUNE
Looks like the right decisions were made! Although the output happened 3 times, it looks like it at least made the right decision. Let’s inspect the output further, in the clone-test-x86-prune-clone-test-core.c.265t.efagin
file:
--------Examining Func: scale_samples.mmx-----
CLONE IDENTIFIED: [scale_samples]
#B33 #E44|
B(1>2) 2 2 2 2 ?112 B(1>1) B(1>2) =75,3 =79,3 =96,3 =129,2 =73,3 ?112 B(1>2) =76,3 =77,3 =129,2 ?112 B(1>1) B(1>1) =97,3 =58,2 =129,2 =75,3 B(2>2) 2 2 =166,2 =229,2 =228,2 =75,3 =75,3 =97,3 =97,3 =234,3 =155,2 2 2 2 =73,3 ?116 B(1>1) B(1>2) =102,3 =129,2 ?115 B(1>2) 2 2 =129,2 =75,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>1) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 2 2 B(1>1) B(2>1) =129,2 =75,3 B(2>2) 2 2 =166,2 =129,2 =75,3 =97,3 =129,2 =155,2 2 2 2 =73,3 ?116 B(1>1) B(1>1) B(1>1) B(8>1) B(2>1) B(2>1) 10
CURRENT:
#B33 #E44|
B(1>2) 2 2 2 2 ?112 B(1>1) B(1>2) =75,3 =79,3 =96,3 =129,2 =73,3 ?112 B(1>2) =76,3 =77,3 =129,2 ?112 B(1>1) B(1>1) =97,3 =58,2 =129,2 =75,3 B(2>2) 2 2 =166,2 =229,2 =228,2 =75,3 =75,3 =97,3 =97,3 =234,3 =155,2 2 2 2 =73,3 ?116 B(1>1) B(1>2) =102,3 =129,2 ?115 B(1>2) 2 2 =129,2 =75,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>1) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 2 2 B(1>1) B(2>1) =129,2 =75,3 B(2>2) 2 2 =166,2 =129,2 =75,3 =97,3 =129,2 =155,2 2 2 2 =73,3 ?116 B(1>1) B(1>1) B(1>1) B(8>1) B(2>1) B(2>1) 10
[scale_samples] = PRUNE
CLONE IDENTIFIED: [scale_samples.popcnt]
#B33 #E44|
B(1>2) 2 2 2 2 ?112 B(1>1) B(1>2) =75,3 =79,3 =96,3 =129,2 =73,3 ?112 B(1>2) =76,3 =77,3 =129,2 ?112 B(1>1) B(1>1) =97,3 =58,2 =129,2 =75,3 B(2>2) 2 2 =166,2 =229,2 =228,2 =75,3 =75,3 =97,3 =97,3 =234,3 =155,2 2 2 2 =73,3 ?116 B(1>1) B(1>2) =102,3 =129,2 ?115 B(1>2) 2 2 =129,2 =75,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>1) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 2 2 B(1>1) B(2>1) =129,2 =75,3 B(2>2) 2 2 =166,2 =129,2 =75,3 =97,3 =129,2 =155,2 2 2 2 =73,3 ?116 B(1>1) B(1>1) B(1>1) B(8>1) B(2>1) B(2>1) 10
CURRENT:
#B33 #E44|
B(1>2) 2 2 2 2 ?112 B(1>1) B(1>2) =75,3 =79,3 =96,3 =129,2 =73,3 ?112 B(1>2) =76,3 =77,3 =129,2 ?112 B(1>1) B(1>1) =97,3 =58,2 =129,2 =75,3 B(2>2) 2 2 =166,2 =229,2 =228,2 =75,3 =75,3 =97,3 =97,3 =234,3 =155,2 2 2 2 =73,3 ?116 B(1>1) B(1>2) =102,3 =129,2 ?115 B(1>2) 2 2 =129,2 =75,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>2) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 =73,3 2 2 ?112 B(1>1) B(1>1) 2 2 =73,3 =76,3 =167,2 =129,2 =75,3 =97,3 =76,3 =129,2 =155,2 2 2 2 B(1>1) B(2>1) =129,2 =75,3 B(2>2) 2 2 =166,2 =129,2 =75,3 =97,3 =129,2 =155,2 2 2 2 =73,3 ?116 B(1>1) B(1>1) B(1>1) B(8>1) B(2>1) B(2>1) 10
[scale_samples] = PRUNE
We can see that on top of the usual popcnt
analysis, which outputs its own PRUNE
decision, the mmx
clone also outputs PRUNE – twice, since it gets compared to both clones, the default and the popcnt
. This logic can simply be changed:
auto it = fun_map.find(base_name);
// If this base name already has registered clones, print them
if (it != fun_map.end()) {
for (auto& variant : it->second.first) {
fprintf(dump_file, "CLONE IDENTIFIED: [%s]\n%s\n",
variant.full_name.c_str(),
variant.signature.c_str());
bool is_same = (fdata.signature == variant.signature);
char msg[256];
snprintf(msg, sizeof(msg), "\n[%s] = %s\n",
base_name.c_str(),
is_same ? "PRUNE" : "NOPRUNE");
fprintf(dump_file, msg);
printf(msg);
if (is_same) {
// Signatures match — no need to continue
break;
}
}
}
Now my code goes through the list of function clones, and when it finds a signature match – it stops looking. With this new change, my code should output only 2 PRUNEs.
gcc -c vol_createsample.c -o vol_createsample.o
gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","popcnt","mmx") ))'\
-march=x86-64 -g -O3 -fno-lto -ftree-vectorize -fdump-tree-efagin \
clone-test-core.c vol_createsample.o -o clone-test-x86-prune
[scale_samples] = PRUNE
[scale_samples] = PRUNE
gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","arch=x86-64-v3") ))' \
-march=x86-64 -g -O3 -fno-lto -ftree-vectorize -fdump-tree-efagin \
clone-test-core.c vol_createsample.o -o clone-test-x86-noprune
[scale_samples] = NOPRUNE
Applying Changes to aarch64
To replicate the same results on the aarch64
machine, I needed to find a clone that would produce the same output. This GCC Documentation revealed a couple options, so I picked the first one I saw crc
, added it to the Makefile
, and tested it:
clone-test-aarch64-prune: clone-test-core.c ${LIBRARIES}
gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","rng","crc") ))'\
And the output produced:
gcc -c vol_createsample.c -o vol_createsample.o
gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","rng","crc") ))'\
-march=armv8-a -g -O3 -fno-lto -ftree-vectorize -fdump-tree-efagin \
clone-test-core.c vol_createsample.o -o clone-test-aarch64-prune
clone-test-core.c:28:6: warning: Function Multi Versioning support is experimental, and the behavior is likely to change [-Wexperimental-fmv-target]
28 | void scale_samples(int16_t *in, int16_t *out, int cnt, int volume) {
| ^~~~~~~~~~~~~
[scale_samples] = PRUNE
[scale_samples] = PRUNE
gcc -D 'CLONE_ATTRIBUTE=__attribute__((target_clones("default","sve2") ))' \
-march=armv8-a -g -O3 -fno-lto -ftree-vectorize -fdump-tree-efagin \
clone-test-core.c vol_createsample.o -o clone-test-aarch64-noprune
clone-test-core.c:28:6: warning: Function Multi Versioning support is experimental, and the behavior is likely to change [-Wexperimental-fmv-target]
28 | void scale_samples(int16_t *in, int16_t *out, int cnt, int volume) {
| ^~~~~~~~~~~~~
[scale_samples] = NOPRUNE
I examined the clone functions code at that point in the process, through the dump file, and confirmed that the 3 function clones – default
, rng
, and crc
have identical logic, according to the signature generated.
Conclusion
Although the pass does not transform or optimize the code in any way, the PRUNE/NOPRUNE output provides an important insight about the compiled code. As I mentioned in Stage 2.1, function variants exist to optimize the code for specific environments and conditions, and it depends on multiple factors. Sometimes, even after undergoing optimization, different versions of the function remain essentially the same – meaning there are now redundant copies of the same logic, in the compiled code. Pruning these copies can improve the efficiency of the compiler, removing extra space and reducing memory usage in the final executable. This can be especially important for resource-constrained environments where code size impacts performance and loading times.
Now that my signature is more complex, and I ensured that my pass can handle more than 2 clones at a time, I’ve successfully accomplished all the goals for this project so far. There are a few improvements that I can think of, for the future:
- If there are many clones, under the same base name, there could be a scenario where there are 2 pairs of similar clones – and the
PRUNE
should be instructed for specific subset of clones. - Overall, the
PRUNE
message should only print once per base function. However, since I don’t know the entire purpose of this diagnostic pass, it might be important to keep track of which functions outputted PRUNE and which didn’t. - Now that the signature generation is more complex, there is a higher possibility that the components in it are redundant and inefficient. With more test files and cases, it would be easier to filter out probable and not-probable cases, like a perfect match in GIMPLE codes but not internal logic.
Working with GCC was not easy – the lack of documentation was the most difficult challenge. Having to rely on trial and error, when each build and install takes a couple minutes, was discouraging to say the least. But the satisfaction of completing this pass, understanding the logic and structure of the compiler, has been worth it. I hope I get to work with open-source projects like these more, the idea of contributing to these important projects in the future is very exciting, and now I can see myself being comfortable with large, undocumented code bases in the future, although maybe not quite yet.
Leave a Reply