
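To build on top of LLVM you need to be familiar with its basic data structures. These notes are drawn mainly from the LLVM Programmer's Manual (LLVM 17.0.0git documentation).

Core APIs

isa, cast, dyn_cast

Checking and converting the dynamic type of an object is a routine operation in object-oriented C++ code. isa<> tests whether a value is an instance of a class, cast<> is a checked cast that asserts on a type mismatch, and dyn_cast<> returns a null pointer on a mismatch.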

cpp
static bool isLoopInvariant(const Value *V, const Loop *L) {
  if (isa<Constant>(V) || isa<Argument>(V) || isa<GlobalValue>(V))
    return true;

  // Otherwise, it must be an instruction...
  return !L->contains(cast<Instruction>(V)->getParent());
}

if (auto *AI = dyn_cast<AllocaInst>(Val)) {
  // ...
}
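  • isa_and_nonnull: like isa, but returns false when passed a null pointer
  • cast_or_null: like cast, but accepts a null pointer and passes it through
  • dyn_cast_or_null: like dyn_cast, but accepts a null pointer and passes it through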
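The mechanism behind these templates can be sketched without LLVM. The block below is a simplified stand-in, not LLVM's implementation: the Shape hierarchy and the my_isa, my_cast, my_dyn_cast and my_dyn_cast_or_null names are hypothetical. It assumes the usual LLVM-style RTTI pattern, in which each class exposes a static classof that inspects a kind tag stored in the base class.

```cpp
#include <cassert>

// Hypothetical class hierarchy, not LLVM's. Each object records its kind.
struct Shape {
  enum Kind { SK_Circle, SK_Square };
  explicit Shape(Kind K) : TheKind(K) {}
  Kind getKind() const { return TheKind; }

private:
  const Kind TheKind;
};

struct Circle : Shape {
  Circle() : Shape(SK_Circle) {}
  // classof answers the question "is this Shape really a Circle?"
  static bool classof(const Shape *S) { return S->getKind() == SK_Circle; }
};

struct Square : Shape {
  Square() : Shape(SK_Square) {}
  static bool classof(const Shape *S) { return S->getKind() == SK_Square; }
};

// Simplified stand-ins for llvm::isa, llvm::cast and llvm::dyn_cast.
template <typename To, typename From> bool my_isa(const From *V) {
  return To::classof(V);
}

template <typename To, typename From> To *my_cast(From *V) {
  assert(my_isa<To>(V) && "cast<Ty>() argument of incompatible type!");
  return static_cast<To *>(V);
}

template <typename To, typename From> To *my_dyn_cast(From *V) {
  return my_isa<To>(V) ? static_cast<To *>(V) : nullptr;
}

// The null-tolerant variant: null in, null out.
template <typename To, typename From> To *my_dyn_cast_or_null(From *V) {
  return V ? my_dyn_cast<To>(V) : nullptr;
}
```

With a Circle behind a Shape pointer, my_dyn_cast<Circle> returns the object and my_dyn_cast<Square> returns null, which is why the dyn_cast result is normally tested directly in an if statement.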

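String handling

When an API has to accept strings that may contain embedded null characters, use StringRef or Twine. A const char* cannot be used because the first null character is taken as the terminator, and const std::string& is avoided because it forces callers to allocate a std::string.

  • StringRef: a read-only reference to a string that offers the common std::string operations.
  • Twine: built specifically for string concatenation; it defers the concatenation to avoid memory allocation.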
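The embedded-null problem can be reproduced in plain C++. In the sketch below, std::string_view is only an analogy for StringRef (both carry a pointer and a length); the three helper functions are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <string_view>

// Length as seen through a null-terminated pointer: stops at the first '\0'.
std::size_t lengthViaCString(const char *S) { return std::strlen(S); }

// Length as seen through a (pointer, length) view: the embedded '\0' is kept.
std::size_t lengthViaView(std::string_view S) { return S.size(); }

// Builds the three-character string "a\0b" by passing the length explicitly.
std::string makeStringWithEmbeddedNull() { return std::string("a\0b", 3); }
```

For the string built by makeStringWithEmbeddedNull, the C-string path reports a length of 1 while the view reports 3, and no copy of the character data is made when the view is passed by value.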

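String formatting

formatv is dedicated to string formatting. It is a convenient replacement for printf, and its syntax borrows from Python and C# format strings.

Replacement field syntax: {N[[,align]:style]}

Custom formatting

  1. Provide a specialization of format_provider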

    cpp
    namespace llvm {
      template<>
      struct format_provider<MyFooBar> {
        static void format(const MyFooBar &V, raw_ostream &Stream, StringRef Style) {
          // Do whatever is necessary to format `V` into `Stream`
        }
      };
      void foo() {
        MyFooBar X;
        std::string S = formatv("{0}", X);
      }
    }
    
  2. Inherit from llvm::FormatAdapter<T>. When formatv detects that the argument type derives from llvm::FormatAdapter, it calls that object's format method.

cpp
namespace anything {
  struct format_int_custom : public llvm::FormatAdapter<int> {
    explicit format_int_custom(int N) : llvm::FormatAdapter<int>(N) {}
    void format(llvm::raw_ostream &Stream, llvm::StringRef Style) override {
      // Do whatever is necessary to format ``this->Item`` into ``Stream``
    }
  };
}
namespace llvm {
  void foo() {
    std::string S = formatv("{0}", anything::format_int_custom(42));
  }
}
cpp
std::string S;
// Simple formatting of basic types and implicit string conversion.
S = formatv("{0} ({1:P})", 7, 0.35);  // S == "7 (35.00%)"

// Out-of-order referencing and multi-referencing
outs() << formatv("{0} {2} {1} {0}", 1, "test", 3); // prints "1 3 test 1"

// Left, right, and center alignment
S = formatv("{0,7}",  'a');  // S == "      a";
S = formatv("{0,-7}", 'a');  // S == "a      ";
S = formatv("{0,=7}", 'a');  // S == "   a   ";
S = formatv("{0,+7}", 'a');  // S == "      a";

// Custom styles
S = formatv("{0:N} - {0:x} - {1:E}", 12345, 123908342); // S == "12,345 - 0x3039 - 1.24E8"

// Adapters
S = formatv("{0}", fmt_align(42, AlignStyle::Center, 7));  // S == "  42   "
S = formatv("{0}", fmt_repeat("hi", 3)); // S == "hihihi"
S = formatv("{0}", fmt_pad("hi", 2, 6)); // S == "  hi      "

// Ranges
std::vector<int> V = {8, 9, 10};
S = formatv("{0}", make_range(V.begin(), V.end())); // S == "8, 9, 10"
S = formatv("{0:$[+]}", make_range(V.begin(), V.end())); // S == "8+9+10"
S = formatv("{0:$[ + ]@[x]}", make_range(V.begin(), V.end())); // S == "0x8 + 0x9 + 0xA"

Error handling

cpp
assert(isPhysReg(R) && "All virt regs should have been allocated already.");
llvm_unreachable("X should be Foo or Bar here");
The Error type
cpp
// A custom error type
class BadFileFormat : public ErrorInfo<BadFileFormat> {
public:
  static char ID;
  std::string Path;

  BadFileFormat(StringRef Path) : Path(Path.str()) {}

  void log(raw_ostream &OS) const override {
    OS << Path << " is malformed";
  }

  std::error_code convertToErrorCode() const override {
    return make_error_code(object_error::parse_failed);
  }
};

char BadFileFormat::ID; // This should be declared in the C++ file.

Error printFormattedFile(StringRef Path) {
  if (<check for valid format>)
    return make_error<BadFileFormat>(Path); // Return a specific error type; BadFileFormat must derive from ErrorInfo.
  // print file contents.
  return Error::success();
}

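How to use Error: Expected<T> converts to bool in a boolean context (true on success), so it can be tested directly in an if statement.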

cpp
Expected<FormattedFile> openFormattedFile(StringRef Path) {
  // If badly formatted, return an error.
  if (auto Err = checkFormat(Path))
    return std::move(Err);
  // Otherwise return a FormattedFile instance.
  return FormattedFile(Path);
}

Error processFormattedFile(StringRef Path) {
  // Try to open a formatted file
  if (auto FileOrErr = openFormattedFile(Path)) { // Expected<T> defines an explicit operator bool, which is what makes this test work; C++ is full of such traps.
    // On success, grab a reference to the file and continue.
    auto &File = *FileOrErr;
    ...
  } else
    // On error, extract the Error value and return it.
    return FileOrErr.takeError();
}
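The explicit operator bool that makes the if test above compile can be shown in isolation. This is a simplified stand-in, not llvm::Expected: MyExpected, parseNumber and parseOrDefault are hypothetical names, the error is just a string, and LLVM's requirement that every error be checked is not modeled.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-in for llvm::Expected<T>: holds a value or an error message.
template <typename T> class MyExpected {
public:
  MyExpected(T V) : HasValue(true), Value(std::move(V)) {}

  static MyExpected failure(std::string Msg) {
    MyExpected E;
    E.ErrorMsg = std::move(Msg);
    return E;
  }

  // explicit: usable in `if (...)`, but never silently converted elsewhere.
  explicit operator bool() const { return HasValue; }

  T &operator*() {
    assert(HasValue && "no value");
    return Value;
  }
  const std::string &error() const { return ErrorMsg; }

private:
  MyExpected() = default;
  bool HasValue = false;
  T Value{};
  std::string ErrorMsg;
};

// Hypothetical parser: succeeds only for non-empty strings of digits.
MyExpected<int> parseNumber(const std::string &Text) {
  if (Text.empty())
    return MyExpected<int>::failure("empty input");
  int N = 0;
  for (char C : Text) {
    if (C < '0' || C > '9')
      return MyExpected<int>::failure("not a number: " + Text);
    N = N * 10 + (C - '0');
  }
  return N;
}

// Mirrors the if/else shape of processFormattedFile above.
int parseOrDefault(const std::string &Text, int Default) {
  if (auto NumOrErr = parseNumber(Text))
    return *NumOrErr;
  else
    return Default;
}
```

Because the conversion is explicit, a statement such as `int X = parseNumber("1");` does not compile, while the condition of an if statement still works through contextual conversion.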

Simple errors:

cpp
// These two lines of code are equivalent:
make_error<StringError>("Bad executable", errc::executable_format_error);
createStringError(errc::executable_format_error, "Bad executable");

The error type can be checked dynamically:

cpp
auto ChildOrErr = A.getMember(I);
if (auto Err = ChildOrErr.takeError()) {
  if (Err.isA<BadFileFormat>())
    consumeError(std::move(Err));
  else
    return Err; // Any other error must still be handled or propagated.
}

Joining errors: DeferredErrs = joinErrors(std::move(DeferredErrs), std::move(Err));

Passing functions as arguments

  1. A raw function pointer:

void takeCallback(bool (*Callback)(Function *, void *), void *Cookie);

  2. A function template
cpp
template<typename Callable>
void takeCallback(Callable Callback) {
  Callback(1, 2, 3);
}
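  3. function_ref: function_ref<Ret(Param1, Param2, ...)> is implicitly constructible from any callable with a matching signature. C++ keeps adding features while staying compatible with old syntax, so it grows ever more complex, much like the x86 instruction set.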
cpp
void visitBasicBlocks(Function *F, function_ref<bool (BasicBlock*)> Callback) {
  for (BasicBlock &BB : *F)
    if (Callback(&BB))
      return;
}
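How a non-template function can accept any callable is easier to see in a reduced version. The sketch below is a simplified stand-in for llvm::function_ref, not its actual source: my_function_ref and firstMatch are hypothetical names. It assumes the common approach of storing the callable's address together with a small trampoline function that knows the concrete type.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

template <typename Fn> class my_function_ref; // only the partial specialization is defined

// Non-owning reference to a callable: it must not outlive the callable itself.
template <typename Ret, typename... Params>
class my_function_ref<Ret(Params...)> {
  Ret (*Thunk)(std::intptr_t, Params...) = nullptr; // knows the concrete type
  std::intptr_t Callee = 0;                         // type-erased address

  template <typename Callable>
  static Ret callThunk(std::intptr_t C, Params... Ps) {
    return (*reinterpret_cast<Callable *>(C))(std::forward<Params>(Ps)...);
  }

public:
  // Implicit conversion from any callable other than my_function_ref itself.
  template <typename Callable,
            typename = typename std::enable_if<!std::is_same<
                typename std::decay<Callable>::type,
                my_function_ref>::value>::type>
  my_function_ref(Callable &&Fn)
      : Thunk(callThunk<typename std::remove_reference<Callable>::type>),
        Callee(reinterpret_cast<std::intptr_t>(&Fn)) {}

  Ret operator()(Params... Ps) const {
    return Thunk(Callee, std::forward<Params>(Ps)...);
  }
};

// Not a template, yet it accepts any callable of type bool(int).
// Returns the first value the callback accepts, or -1 if there is none.
int firstMatch(const int *Values, int Count,
               my_function_ref<bool(int)> Callback) {
  for (int I = 0; I < Count; ++I)
    if (Callback(Values[I]))
      return Values[I];
  return -1;
}
```

Since only a pointer to the callable is stored, passing a temporary lambda as an argument is fine, but keeping the reference after the call returns would leave it dangling.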

Using LLVM_DEBUG

LLVM_DEBUG is declared in llvm/Support/Debug.h; its output appears only when the tool is run with the -debug option.

LLVM_DEBUG(dbgs() << "I am here!\n");
$ opt < a.bc > /dev/null -mypass
<no output>
$ opt < a.bc > /dev/null -mypass -debug
I am here!

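Finer-grained control is available through DEBUG_TYPE together with the -debug-only option:

cpp
#define DEBUG_TYPE "foo" // What matters is this macro definition, not the 'foo' text in the message below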
LLVM_DEBUG(dbgs() << "'foo' debug type\n");
#undef  DEBUG_TYPE
#define DEBUG_TYPE "bar"
LLVM_DEBUG(dbgs() << "'bar' debug type\n");
#undef  DEBUG_TYPE
$ opt < a.bc > /dev/null -mypass
<no output>
$ opt < a.bc > /dev/null -mypass -debug
'foo' debug type
'bar' debug type
$ opt < a.bc > /dev/null -mypass -debug-only=foo
'foo' debug type
$ opt < a.bc > /dev/null -mypass -debug-only=bar
'bar' debug type
$ opt < a.bc > /dev/null -mypass -debug-only=foo,bar
'foo' debug type
'bar' debug type
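Why the DEBUG_TYPE definition matters rather than the message text can be illustrated with a toy macro. Everything in this sketch is hypothetical (MY_DEBUG, DebugFlag, EnabledTypes, DebugOut, runWith); it is not how llvm/Support/Debug.h is written, only a model of the observable behavior shown above.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-ins for the state that -debug / -debug-only would set.
static bool DebugFlag = false;
static std::set<std::string> EnabledTypes; // empty means "all types"
static std::ostringstream DebugOut;        // stand-in for dbgs()

static bool isTypeEnabled(const char *Type) {
  return DebugFlag && (EnabledTypes.empty() || EnabledTypes.count(Type) != 0);
}

// DEBUG_TYPE is looked up at the point where MY_DEBUG is expanded.
#define MY_DEBUG(X)                                                            \
  do {                                                                         \
    if (isTypeEnabled(DEBUG_TYPE)) {                                           \
      X;                                                                       \
    }                                                                          \
  } while (false)

#define DEBUG_TYPE "foo"
static void runFoo() { MY_DEBUG(DebugOut << "'foo' debug type\n"); }
#undef DEBUG_TYPE

#define DEBUG_TYPE "bar"
static void runBar() { MY_DEBUG(DebugOut << "'bar' debug type\n"); }
#undef DEBUG_TYPE

// Runs both functions under the given settings and returns what was printed.
static std::string runWith(bool Debug, std::set<std::string> Only) {
  DebugFlag = Debug;
  EnabledTypes = std::move(Only);
  DebugOut.str("");
  runFoo();
  runBar();
  return DebugOut.str();
}
```

Calling runWith(true, {"foo"}) yields only the first message, matching the -debug-only=foo run, because the string "foo" was baked into runFoo when the macro was expanded.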

Visualizing a function's CFG

Function::viewCFG()

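Core data types

These live under llvm/ADT/. In most cases the STL containers are sufficient.

  1. llvm::ArrayRef: a non-owning, read-only reference to a contiguous sequence of elements.

  2. TinyPtrVector<Type>: a vector optimized for the case where it holds zero or one element.

  3. SmallVector<Type, N>: a vector with inline storage for N elements, so no malloc is needed while it holds at most N elements; beyond that it falls back to the heap. As a local variable its inline storage lives on the stack, which makes it an effective replacement for alloca.

    Some examples: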

cpp
// DISCOURAGED: Clients cannot pass e.g. raw arrays.
hardcodedContiguousStorage(const SmallVectorImpl<Foo> &In);
// ENCOURAGED: Clients can pass any contiguous storage of Foo.
allowsAnyContiguousStorage(ArrayRef<Foo> In);

void someFunc1() {
  Foo Vec[] = { /* ... */ };
  hardcodedContiguousStorage(Vec); // Error.
  allowsAnyContiguousStorage(Vec); // Works.
}

// DISCOURAGED: Clients cannot pass e.g. SmallVector<Foo, 8>.
hardcodedSmallSize(SmallVector<Foo, 2> &Out);
// ENCOURAGED: Clients can pass any SmallVector<Foo, N>.
allowsAnySmallSize(SmallVectorImpl<Foo> &Out);

void someFunc2() {
  SmallVector<Foo, 8> Vec;
  hardcodedSmallSize(Vec); // Error.
  allowsAnySmallSize(Vec); // Works.
}
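The reason ArrayRef parameters accept "any contiguous storage" is a set of implicit converting constructors. The sketch below is a simplified stand-in with hypothetical names (MyArrayRef, sum), not the llvm::ArrayRef source; it covers only raw arrays and std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for llvm::ArrayRef<T>: a pointer and a length, no ownership.
template <typename T> class MyArrayRef {
  const T *Data = nullptr;
  std::size_t Length = 0;

public:
  MyArrayRef(const T *Ptr, std::size_t Len) : Data(Ptr), Length(Len) {}

  // Implicit conversions are what let callers pass different containers.
  template <std::size_t N>
  MyArrayRef(const T (&Arr)[N]) : Data(Arr), Length(N) {}
  MyArrayRef(const std::vector<T> &Vec)
      : Data(Vec.data()), Length(Vec.size()) {}

  const T *begin() const { return Data; }
  const T *end() const { return Data + Length; }
  std::size_t size() const { return Length; }
};

// One non-template signature serves raw arrays and std::vector alike.
int sum(MyArrayRef<int> Values) {
  int Total = 0;
  for (int V : Values)
    Total += V;
  return Total;
}
```

The view is passed by value because it is just two words; as with any non-owning reference, it must not outlive the storage it points to.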

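Set classes

In essence, the simplest set is a sorted vector: insert everything first, sort and deduplicate once, then query with binary search.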
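A sorted vector used as a set can be sketched with the standard library alone. The function names below are hypothetical, and the sketch assumes the usage pattern of inserting everything first and querying afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Phase 1: bulk insert, then sort and drop duplicates once.
std::vector<int> buildSortedSet(std::vector<int> Items) {
  std::sort(Items.begin(), Items.end());
  Items.erase(std::unique(Items.begin(), Items.end()), Items.end());
  return Items;
}

// Phase 2: membership queries by binary search, O(log n) each.
bool setContains(const std::vector<int> &SortedSet, int Value) {
  return std::binary_search(SortedSet.begin(), SortedSet.end(), Value);
}
```

This layout keeps all elements in one contiguous allocation, which is why it tends to beat node-based sets when insertions and queries do not interleave.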

Map classes

Bit classes

Debugging techniques

source /path/to/llvm/src/utils/gdb-scripts/prettyprinters.py

This gdb script makes print output for LLVM data types easier to read.

Common operations in LLVM

Iterating over a function

cpp
Function &Func = ...;
for (BasicBlock &BB : Func)
  // Print out the name of the basic block if it has one, and then the
  // number of instructions that it contains
  errs() << "Basic block (name=" << BB.getName() << ") has "
             << BB.size() << " instructions.\n";

BasicBlock &BB = ...;
for (Instruction &I : BB)
   // The next statement works since operator<<(ostream&,...)
   // is overloaded for Instruction&
   errs() << I << "\n";

An external iterator can also be used:

cpp
#include "llvm/IR/InstIterator.h"

// F is a pointer to a Function instance
for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
  errs() << *I << "\n";

Another example, counting the calls to a function:

cpp
Function *targetFunc = ...;

class OurFunctionPass : public FunctionPass {
  public:
    static char ID; // Pass identification, required by the FunctionPass base class.

    OurFunctionPass() : FunctionPass(ID), callCounter(0) { }

    bool runOnFunction(Function &F) override {
      for (BasicBlock &B : F) {
        for (Instruction &I : B) {
          if (auto *CB = dyn_cast<CallBase>(&I)) {
            // We know we've encountered some kind of call instruction (call,
            // invoke, or callbr), so we need to determine if it's a call to
            // the function pointed to by targetFunc or not.
            if (CB->getCalledFunction() == targetFunc)
              ++callCounter;
          }
        }
      }
      return false; // The IR was only inspected, not modified.
    }

  private:
    unsigned callCounter;
};

char OurFunctionPass::ID = 0;

Traversing def-use and use-def chains

cpp
Function *F = ...;

// def-use: every User that refers to F.
for (User *U : F->users()) {
  if (Instruction *Inst = dyn_cast<Instruction>(U)) {
    errs() << "F is used in instruction:\n";
    errs() << *Inst << "\n";
  }
}

// use-def: every Value that an instruction uses.
Instruction *pi = ...;

for (Use &U : pi->operands()) {
  Value *v = U.get();
  // ...
}

// Predecessors of a basic block.
#include "llvm/IR/CFG.h"
BasicBlock *BB = ...;

for (BasicBlock *Pred : predecessors(BB)) {
  // ...
}

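Modifying the IR

  1. Creating and inserting instructions
  2. Deleting instructions
  3. Replacing instructions
  4. Deleting global variables

Multithreading in LLVM

When you are done with the LLVM APIs, remember to call llvm_shutdown() to release the memory that LLVM allocated internally.

ManagedStatic solves the static-initialization problem. The object is constructed lazily on first access, and both the single-threaded and the multithreaded cases are already handled, so it can be used directly without extra synchronization around the initialization.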
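Lazy, thread-safe construction of a static object can be sketched with double-checked locking, which is the scheme the Programmer's Manual attributes to ManagedStatic in multithreaded builds. The code below is a simplified stand-in, not the ManagedStatic source: LazyStatic, Registry and TheRegistry are hypothetical names, and destruction at llvm_shutdown() is not modeled.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Simplified stand-in for lazy, thread-safe static construction.
template <typename T> class LazyStatic {
  std::atomic<T *> Ptr{nullptr};
  std::mutex Lock;

public:
  T &get() {
    T *P = Ptr.load(std::memory_order_acquire);
    if (!P) { // First check, without taking the lock.
      std::lock_guard<std::mutex> Guard(Lock);
      P = Ptr.load(std::memory_order_relaxed);
      if (!P) { // Second check, under the lock.
        P = new T();
        Ptr.store(P, std::memory_order_release);
      }
    }
    return *P;
  }

  bool isConstructed() const {
    return Ptr.load(std::memory_order_acquire) != nullptr;
  }

  ~LazyStatic() { delete Ptr.load(); }
};

// Hypothetical resource that counts how many times it was constructed.
struct Registry {
  static int Constructions;
  Registry() { ++Constructions; }
};
int Registry::Constructions = 0;

// Declaring the wrapper constructs nothing; the Registry appears on first get().
static LazyStatic<Registry> TheRegistry;
```

The first call to TheRegistry.get() constructs the Registry, and every later call returns the same object after a single atomic load.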

Core classes

Note that all of these classes model the IR.

Type

  • bool isIntegerTy() const: Returns true for any integer type.
  • bool isFloatingPointTy(): Return true if this is one of the floating point types.
  • bool isSized(): Return true if the type has known size. Things that don’t have a size are abstract types, labels and void.

Subclasses:

  • IntegerType
  • SequentialType (removed in LLVM 11; ArrayType and VectorType now derive directly from Type)
    • ArrayType
    • VectorType
  • PointerType
  • StructType
  • FunctionType

Module

A Module is a collection of functions, global variables, and a symbol table.

Value

The Value class is the most important class in the LLVM Source base. It represents a typed value that may be used (among other things) as an operand to an instruction. There are many different types of Values, such as Constants, Arguments. Even Instructions and Functions are Values.

A Value can be thought of as an instance of some Type.

User

The User class is the common base class of all LLVM nodes that may refer to Values. It exposes a list of “Operands” that are all of the Values that the User is referring to. The User class itself is a subclass of Value.

A User represents something that uses Values.
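The two-way link between Value and User is what makes both def-use and use-def traversal cheap. The miniature below uses hypothetical types (MyValue, MyUser) and plain vectors; LLVM's real classes track the same edges with Use objects rather than vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct MyUser;

// Hypothetical miniature of Value: remembers who uses it (def-use direction).
struct MyValue {
  std::vector<MyUser *> Users;
  virtual ~MyValue() = default;
};

// Hypothetical miniature of User: a Value that has operands (use-def direction).
struct MyUser : MyValue {
  std::vector<MyValue *> Operands;

  // Adding an operand records the edge on both ends.
  void addOperand(MyValue *V) {
    Operands.push_back(V);
    V->Users.push_back(this);
  }
};
```

Because MyUser derives from MyValue, the result of one operation can itself be an operand of another, which mirrors the statement that the User class is a subclass of Value.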

Instruction

An instruction in the IR.

Constant

Represents the various kinds of constants:

  • ConstantInt
  • ConstantFP
  • ConstantArray
  • ConstantStruct
  • GlobalValue

GlobalValue

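For example functions, global variables, and so on.

Linkage determines whether a symbol can be linked against from outside the module. InternalLinkage means it cannot, like a static variable in C.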

Function

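Represents a function.

GlobalVariable

A subclass of GlobalValue that represents a global variable.

BasicBlock

The parent of an Instruction; its own parent is a Function.

Argument

A function argument.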

Sources of LLVM development material:

  • reviews.llvm.org: implementation details and explanations of features
  • LLVM Developers' Meeting: good video tutorials
  • Unit tests