Stage 1.2: Improving the Custom GCC Pass

Introduction

In my previous post, I described how I got started contributing to gcc, by creating a new pass and adding it to the pass manager. The goal of this blog post, it to improve the newly-created dummy pass, to satisfy the following requirements:

  1. Iterates through the code being compiled;
  2. Prints the name of every function being compiled;
  3. Prints a count of the number of basic blocks in each function; and
  4. Prints a count of the number of gimple statements in each function.

So far, requirements #1 & 2 are complete – the dummy pass iterates through the code, and prints the name of every function. I will be focusing on completing requirements #3 & 4, and exploring what “Basic Block” and “Gimple Statements” mean.

Analyzing Current Version

The execute function of the pass is the one where all the magic happens – it is the part that holds the actual logic. Our current dummy pass, tree-efagin.cc looks like this:

unsigned int execute (function *fun) final override {
  struct cgraph_node *node;
  int func_cnt = 0;

  // macro for iterating thru all functions
  FOR_EACH_FUNCTION (node) {
    if (dump_file) { // if a dumpfile exists
      fprintf(dump_file, "=== Function %d Name '%s' ===\n", ++func_cnt, node->name());
    }
  }

  // output end of dump
  if (dump_file) {
    fprintf(dump_file, "\n\n*** End efagin diagnostic dump ***\n\n");
  }

  return 0;
}

And it produces the following diagnostic information in the dump file:

;; Function main (main, funcdef_no=0, decl_uid=2336, cgraph_uid=1, symbol_order=0)

=== Function 1 Name 'printf' ===
=== Function 2 Name 'main' ===

*** End efagin diagnostic dump ***

But there is another part in the dump file, right below the output above. It seems to be a very complicated, weird version of the same hello.c file we compiled. What does it mean?

int main ()
{
  int D.3930;
  int _3;

  <bb 2> :
  printf ("\nHello, World!\n\n");
  _3 = 0;

  <bb 3> :
<L0>:
  return _3;

}

This is actually showing us GCC’s internal representation of our simple Hello World program. This is how the compiler “thinks about” the code after breaking it down:

  • Basic Blocks: <bb 1> all these lables represent self-contained chunks of code with one entry and exit point. These are helpful to iterate through the code and perform optimizations.
  • Labels: <L0> labels, are created to mark specific points in the code that might be jumped to.
  • The return value:int _3, _3 = 0, and then return _3. This likely happens because of some optimizations that take place, which first declare the value to be returned (in this case, 0), and then initialize it and return it.
  • The placeholder: int D.3930 seems to be created as a placeholder for the return value, by the compiler.

Now that we know what all this means, we can proceed to try to improve our gcc pass, adding more requirements.

Iterating Through Basic Blocks

The current pass file looks like this:

unsigned int execute (function *fun) final override {
  struct cgraph_node *node;
  int func_cnt = 0;

  // macro for iterating thru all functions
  FOR_EACH_FUNCTION (node) {
    if (dump_file) { // if a dumpfile exists
      fprintf(dump_file, "=== Function %d Name '%s' ===\n", ++func_cnt, node->name());
    }
  }

  // output end of dump
  if (dump_file) {
    fprintf(dump_file, "\n\n*** End efagin diagnostic dump ***\n\n");		 
  }

  return 0;
}
Full “tree-efagin.cc” file:

/* Test efagin pass
   Language independent return value optimizations
   Copyright (C) 2004-2025 Free Software Foundation, Inc.

This file is part of GCC

GCC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GCC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

#define INCLUDE_MEMORY
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"
#include "gimple.h"
#include "tree-pass.h"
#include "ssa.h"
#include "tree-pretty-print.h"
#include "gimple-iterator.h"
#include "gimple-walk.h"
#include "internal-fn.h"
#include "gimple-pretty-print.h"

// Added  headers:

#include "gimple.h"
#include "gimple-ssa.h"
#include "cgraph.h"
#include "attribs.h"
#include "pretty-print.h"
#include "tree-inline.h"
#include "intl.h"
#include "basic-block.h"


// for dump_printf
#include "tree-pretty-print.h"
#include "diagnostic.h"
#include "dumpfile.h"
#include "builtins.h"
#include <stdlib.h>


namespace {

const pass_data pass_data_efagin =
{
  GIMPLE_PASS, /* type */
  "efagin", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_NONE, /* tv_id */
  PROP_cfg, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_efagin : public gimple_opt_pass
{
public:
  pass_efagin (gcc::context *ctxt)
    : gimple_opt_pass (pass_data_efagin, ctxt)
  {}

  /* opt_pass methods: (--> all the time, so return 1)*/
  bool gate (function *) final override { return 1; }

	unsigned int execute (function *fun) final override {
		struct cgraph_node *node;
		int func_cnt = 0;

		// macro for iterating thru all functions
		FOR_EACH_FUNCTION (node) {
			if (dump_file) { // if a dumpfile exists
				fprintf(dump_file, "=== Function %d Name '%s' ===\n", ++func_cnt, node->name());
			}
		}

		// output end of dump
		if (dump_file) {
			fprintf(dump_file, "\n\n*** End efagin diagnostic dump ***\n\n");
		}

		return 0;
	}

}; // class pass_efagin

} // anon namespace

gimple_opt_pass *
make_pass_efagin (gcc::context *ctxt)
{
  return new pass_efagin (ctxt);
}

This version iterates through all the functions, using the FOR_EACH_FUNCTION(node) macro, and prints out the name & the count. To implement a basic-block iterator, we can use the FOR_EACH_BB_FN(bb, fun), which iterates thorugh all the basic blocks, when it is passed the function argument from the execute function.

This macro can be used in the following format, as shown in class by our professor:

basic_block bb;
int bb_cnt = 0

// macro for iterating through all basic blocks
FOR_EACH_BB_FN(bb, fun) {
  bb_cnt++;
  if (dump_file) {
    fprintf(dump_file, "\n==== Basic block count: %d ====\n", bb_cnt);
  }
  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) {
    gimple *g = gsi_stmt (gsi);
    stmt_cnt++;
    if (dump_file) {
      fprintf (dump_file, "---- Statement Count: %d ----\n", stmt_cnt);
      print_gimple_stmt (dump_file, g, 0, TDF_VOPS|TDF_MEMSYMS);
    }
  }
}

This code snippet traverses the function’s basic blocks using the FOR_EACH_BB_FN macro, counting each block it encounters, and then uses a clever iterator-based for loop (gimple_stmt_iterator) to walk through each GIMPLE statement within the block—where GIMPLE statements represent the simplified, three-address code form of your program’s operations after initial compilation.

Testing the Changes

The final execute function looks like this:

unsigned int execute (function *fun) final override {
  basic_block bb;
  struct cgraph_node *node;

  int func_cnt = 0, bb_cnt = 0, stmt_cnt = 0;
  if (dump_file) fprintf(dump_file, "\n************** Functions **************\n\n");

  // macro for iterating through all functions
  FOR_EACH_FUNCTION (node) {
    if (dump_file) { // if a dumpfile exists
      fprintf(dump_file, "---- Function %d Name '%s' ----\n", ++func_cnt, node->name());
    }
  }

  if (dump_file) fprintf(dump_file, "\n************* Basic Blocks ************\n");

  // macro for iterating through all basic blocks
  FOR_EACH_BB_FN(bb, fun) {
    bb_cnt++;
    if (dump_file) {
      fprintf(dump_file, "\n==== Basic block count: %d ====\n", bb_cnt);
    }
    for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) {
      gimple *g = gsi_stmt (gsi);
      stmt_cnt++;
      if (dump_file) {
        fprintf (dump_file, "---- Statement Count: %d ----\n", stmt_cnt);
        print_gimple_stmt (dump_file, g, 0, TDF_VOPS|TDF_MEMSYMS);
      }
    }
  }


  if (dump_file) {
    fprintf(dump_file, "\n\n*** End efagin diagnost, strt regular dump of curr gimple ***\n\n");
  }
  return 0;
}

To test the changes, first I must re-build gcc. I will change to the build directory, and execute make, and then install the build:

> (time make -j 24) |& tee build.log
> make install

Now, all that’s left is to compile some tester code, and analyze the dump files.

Downloading Makefile Examples

I wanted to test my pass with some harder examples. I decided to use the makefile examples folder, referenced in the course website.


# Get the examples folder from the website
> wget https://matrix.senecacollege.ca/~chris.tyler/spo600/makefile-examples.tgz

# Extract the files
> tar -xvzf makefile-examples.tgz

# Created files:
> tree makefile-examples
makefile-examples
├── ex1
│   ├── hello.c
│   ├── Makefile
│   └── man8
│       └── hello.8
├── ex2
│   ├── double.c
│   ├── half.c
│   ├── Makefile
│   ├── number.h
│   ├── sauce.c
│   └── sauce.h
└── README

Now, to test these examples, I will add the necassary flags to the Makefile in each example, to ensure a dump file for my pass is produced.

// Makefile
CC=gcc          
CFLAGS=-g -O0 -fno-builtin -fdump-tree-efagin
...

Then, I can execute the make command and view the dump files of all the programs:

// sauce.c

#include <stdio.h>

double specialsauce(double a, double b) {

        double result=a*b;
        return result;
}

And here is the dumpfile produced (sauce.c.265t.efagin):

;; Function specialsauce (specialsauce, funcdef_no=0, decl_uid=2337, cgraph_uid=1, symbol_order=0)


************** Functions **************

---- Function 1 Name 'specialsauce' ----

************* Basic Blocks ************

==== Basic block count: 1 ====
---- Statement Count: 1 ----
result_3 = a_1(D) * b_2(D);
---- Statement Count: 2 ----
_4 = result_3;

==== Basic block count: 2 ====
---- Statement Count: 3 ----
<L0>:
---- Statement Count: 4 ----
# VUSE <.MEM_5(D)>
return _4;


*** End efagin diagnost, strt regular dump current code ***

double specialsauce (double a, double b)
{
  double result;
  double D.2341;
  double _4;

  <bb 2> :
  result_3 = a_1(D) * b_2(D);
  _4 = result_3;

  <bb 3> :
<L0>:
  return _4;

}

Conclusion

In this blog post, I was focusing on improving my dummy GCC pass, adding new functionallities like iterating through basic blocks, and logging GIMPLE statements. Using the macro, and special gimple iterator, made the code quite simple, to write diagnostic text in the dump file. Although the current code doesn’t optimize, or perform any transformations to the code, it helped me gain an understanding of how these GCC passes work, and how to improve them, and build logic in the future.

The build times for small changes in the pass, was approximately 40 seconds on the x86, and around a minute on the aarch64.

Full Pass:

/* Test efagin pass
   Language independent return value optimizations
   Copyright (C) 2004-2025 Free Software Foundation, Inc.


This file is part of GCC

GCC is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GCC is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

#define INCLUDE_MEMORY
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"
#include "gimple.h"
#include "tree-pass.h"
#include "ssa.h"
#include "tree-pretty-print.h"
#include "gimple-iterator.h"
#include "gimple-walk.h"
#include "internal-fn.h"
#include "gimple-pretty-print.h"

// Added  headers:

#include "gimple.h"
#include "gimple-ssa.h"
#include "cgraph.h"
#include "attribs.h"
#include "pretty-print.h"
#include "tree-inline.h"
#include "intl.h"
#include "basic-block.h"


// for dump_printf
#include "tree-pretty-print.h"
#include "diagnostic.h"
#include "dumpfile.h"
#include "builtins.h"
#include <stdlib.h>


namespace {

const pass_data pass_data_efagin =
{
  GIMPLE_PASS, /* type */
  "efagin", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_NONE, /* tv_id */
  PROP_cfg, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
};

class pass_efagin : public gimple_opt_pass
{
public:
  pass_efagin (gcc::context *ctxt)
    : gimple_opt_pass (pass_data_efagin, ctxt)
  {}

  /* opt_pass methods: (--> all the time, so return 1)*/
  bool gate (function *) final override { return 1; }

  unsigned int execute (function *fun) final override {
    basic_block bb;
    struct cgraph_node *node;

    int func_cnt = 0, bb_cnt = 0, stmt_cnt = 0;

    if (dump_file) fprintf(dump_file, "\n************** Functions **************\n\n");

    // macro for iterating through all functions
    FOR_EACH_FUNCTION (node) {
      if (dump_file) { // if a dumpfile exists
        fprintf(dump_file, "---- Function %d Name '%s' ----\n", ++func_cnt, node->name());
      }
    }

    if (dump_file) fprintf(dump_file, "\n************* Basic Blocks ************\n");

    // macro for iterating through all basic blocks
    FOR_EACH_BB_FN(bb, fun) {
      bb_cnt++;
      if (dump_file) {
        fprintf(dump_file, "\n==== Basic block count: %d ====\n", bb_cnt);
      }
      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) {
        gimple *g = gsi_stmt (gsi);
        stmt_cnt++;
        if (dump_file) {
          fprintf (dump_file, "---- Statement Count: %d ----\n", stmt_cnt);
          print_gimple_stmt (dump_file, g, 0, TDF_VOPS|TDF_MEMSYMS);
        }
      }
    }


    if (dump_file) {
      fprintf(dump_file, "\n\n*** End efagin diagnost, strt regular dump of curr gimple ***\n\n");
    }
    return 0;
  }

}; // class pass_efagin

} // anon namespace

gimple_opt_pass *
make_pass_efagin (gcc::context *ctxt)
{
  return new pass_efagin (ctxt);
}

Comments

2 responses to “Stage 1.2: Improving the Custom GCC Pass”

  1. […] will be using the same gcc pass I created in Stage 1, improving it step by step, to achieve the desired outcome. But first, I need to understand what […]

  2. […] implementing a basic dummy pass in Stage 1, and exploring the different concepts that will be involved in Stage 2, it’s time to […]

Leave a Reply

Your email address will not be published. Required fields are marked *